Multi Threading
ConcurrentHashMap fat memory footprint
Mar 23rd
While running product sizing tests, we’ve found that an over enthusiastic usage of ConcurrentHashMap (CHM) had evaporated a good ~170MB of much needed heap space (we ran with a 1.5GB heap).
As it turns out, a empty CHM weighs around 1700B. Yes, I’m talking about a map with no entries at all, just the plumbing!
We used a CHM to store user session attributes, having 100,000 user sessions generated 100K CHM instances worth 170MB of heap (100K times 1.7KB).
We took measurements using the super Eclipse MAT.
The obvious solution for saving these scares 170MB, was to switch from a CHM to a Hashtable. A Hashtable cost only around 150B per instance (8% of a CHM).
Other possible solutions could have been: moving to a list structure (seek time is not an issue as we rarely have more than 4-5 attributes per session), or resorting to a an array of Objects.
Change implications:
1. Performance - The product doesn’t have any user scenario that cause multiple threads to concurrently access the same session attributes map, so we don’t expect any performance loss, on the contrary, I’m expecting a hashtable to prove faster for single thread access, over a CHM.
2. Thread safety is a low risk aspect, as both CHM and HT provide the same basic guarantees for a single API operation (e.g., map.get(key)).
To conclude, a CHM is a good idea when you have a shared map structure suffering from a high R/W thread access contention. But dragging behind itself such a large memory footprint, CHM is not ideal to use in masses, or when concurrency performance is not the focus.
P.S
A CHM automatically allocates 16 segments, each with a 16-element array – one best practice is to measure the average map population during your product’s sizing tests, and initialize the CHM with the minimum initialCapcity and loadFactor, required to contain your usage.
Concurrent Modification Exception
Feb 8th
I ran into a ConcurrentModificationException (CME) during stress testing.
What does CME actually mean?
It means that you’ve modified (add, remove, update) your Collection while you’ve been iterating over it (usually in a multi-threaded fashion, but it can occur in a single thread that modifies while iterating).
A few more things to note about CME:
Best effort detection - If you see a CME printout, first off, consider yourself lucky, CMEs are thrown only in best effort. In another universe, the concurrent modification would not have been detected, causing your collection to become corrupted, instead of fast-failing with a CME.
IDing the problem – Like deadlocks, CME’s are easy to pinpoint once you inspected the exception’s stack trace.
Avoiding CME:
- ListIterator
To modify a collection by the same thread that is currently iterating on it, use a ListIterator that will allow you to perform both.
Drawbacks – single thread solution only. - Naive solution: Synchronizers
Use locks to for mutually excluding traversal and modification operations.
Advantages – easy to code.
Drawbacks – very long lock periods while iterating. - CopyOnWrite
Take advantage of the Java.util.concurrent collections like: CopyOnWriteArrayList, CopyOnWriteArraySet. If you require a map then grab CopyOnWriteMap from Apache (this guys have been doing Sun’s dirty work for years now).
Advantages – very good reading performance (no locks are used, instead visibility is obtained via map member volatility).
Drawbacks – very bad write performance on large maps.
Conclusion – use for seldom mutating collections. - toArray()
toArray will create a new array holding a copy of your Set (Map.keySet() for a Map).
You can then iterate over the array, freely modifying the original collection (the array doesn’t change of course).
Advantages – write operations are cheap.
Disadvantages – copying the entire set could be expensive if it occurs too often, and/or the set is very large. - Concurrent Collections
If you want to go heavyweight, consider using: ConcurrentHashMap (or one of its package friends).
Once you create an iterator over a ConcurrentHashMap (CHM), it does not freeze the collection for traversal, updates to the collection may or may not appear during the traversal (weakly consistent).
The approach I ended up taking
My use case was seldom modifying a ~ten items cache. A copyonwrite map was what I used.
In other cases I had, ConcurrentHashMap was the easiest solution (though make sure your code can live in peace with the CHM weak consistency property).
How does hardware evolution affect progamming language design?
Mar 30th
I’ve recently watched the interesting webcast Programming Language Design and Analysis Motivated by Hardware Evolution by Professor Alan Mycroft (Webcast’s link is accessible only from within the IBM Intranet). Ahead are a few keynotes I’ve kept.
Not everything is kept linear
As chip designers continue to scale down chips and transistors, they begin hitting design walls. Some of these walls are related to the fact that as the transistors` physical size is scaled down, some other properties of the chip do not scale linearly as well. This simplest example of this are dimensions, consider length Vs surface area: reducing a square side to 50% of its original size, will causes the square surface space to reduced by 75%, not a linear change. Different electricity characteristics might change at different rates than the rate in which length is changed.
Where is my 12Ghz CPU?
Moore’s law, which predicts the doubling of transistors quantity on a chip every ~18 months is still in effect, sadly, this doesn’t translate into clock speed. Although that, when transistors are miniaturized the distances within the chip reduces as well, and this should mean an increase in speed, but, due to heat dispersion problems (not all dimensions shrink at the same rate, generated heat is one of them, remember?) chip designers are forced to reduce the voltage in which the chip components operate. Therefore no clock speed gain.
This enables us, however, to squeeze in more cores into that optimal one cm^2 silicon pad. Hence, the multi-core technological path that the industry had resorted to in the last couple of years.
There’s always a trade-off
As the voltage in which the chip operates drops, chip designers are starting to face computation inaccuracy problems. How could we live in peace with these imprecision? the professor ponders, do we must insist on absolute accuracy? Consider the task of rendering video, do we really care about the correctness of each pixel on each of the frames, probably not, just remember those old analog VCR and audio cassettes, they were highly inaccurate and still were able to deliver the goods. We might decide to compromise on accuracy, some of the time, in order to benefit on speed, just another type of trade-off. Programming language designers should assist chip designers by developing programming languages that are able to operate in a world of non absolute certainty.
Also think about the build-in error correction mechanisms put in to network protocol stacks.
Better on one world, worse on the other
A major problem with multi-core chips processing, is that although inter-cores communication enjoy a high bandwidth (2.5GB/s), it is stained by a high latency (~75 clock cycles) .
Another problem is that programs are written based on a shared memory model, in which all cores must coordinate when accessing the shared main memory, core’s caches must also be refreshed quite often. While this doesn’t seems a major problem for dual or quad cores, think on how this heavily limits performance on a, not so futuristic anymore, 128 cores chip.
Trying to refrain from shared main memory access might turn the table on some of the disciplines we got accustomed to think of as obvious. For example, when you code a parametrized function you declare how parameters are passed; either by reference, or by value. Declaring this during coding time (rather then deciding this during runtime) can be regarded as “early-binding”. From a performance perspective, everybody knows that passing by reference is, almost always, faster than passing by value (assuming you don’t intend on changing the passed value). This preferred way of action might not hold true on a multi-core system that will have to incur an expensive overhead when it access the data which the reference point to in the shared main memory, no such price has to be paid if the parameter is past by value. One way in which future programming languages might deal with this is to allow for late-binding of the parameters passing method. When running on a chip with only a few cores, a pass by reference will occur, just as, when running on a cores enriched chip a pass by value will be selected. This is true when the pass by reference/value makes no difference to the program logic (no changes to the parameter’s data are visible to the method caller, nor the parameter data is accessed concurrently), and therefore both could be used interchangeably.
Future languages will need to support this “late-binding” feature and others like it.
Summing up
It will be interesting to keep follow of these hardware to software trends of mutual influences.



Via e-mail