Posts Tagged ‘multithreading’

Concurrent Modification Exception

Monday, February 8th, 2010

I ran into a ConcurrentModificationException (CME) during stress testing.
What does CME actually mean?
It means that you’ve updated your Collection while you’ve been iterating over it (usually in a multi-threaded fashion, but it can occur in a single thread that updates while iterating).
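
To make the single-threaded case concrete, here is a minimal sketch (the class and method names are made up for illustration). It removes an element through the list itself while a for-each loop is still walking it:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.List;

// Hypothetical demo class, not taken from the original stress test.
class CmeDemo {
    // Removes an element through the list itself while a for-each loop
    // (which uses an Iterator under the hood) is still running.
    static boolean triggersCme() {
        List<String> names = new ArrayList<>(Arrays.asList("a", "b", "c", "d"));
        try {
            for (String name : names) {
                if (name.equals("a")) {
                    names.remove(name); // structural change behind the iterator's back
                }
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true; // thrown by the iterator's next() call after the removal
        }
    }

    public static void main(String[] args) {
        System.out.println("CME thrown: " + triggersCme());
    }
}
```

With an ArrayList this prints "CME thrown: true", no second thread required.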

A few more things to know about CME:
Best effort detection
- If you see a CME printout, first off, consider yourself lucky: CMEs are thrown only on a best-effort basis. In another universe, the concurrent modification would not have been detected, and your collection would have been silently corrupted instead of fast-failing with a CME.

IDing the problem – Like deadlocks, CMEs are easy to pinpoint once you’ve inspected the exception’s stack trace.

Getting around it:

  1. ListIterator
    If you’re single threaded, consider solving the CME by manipulating the collection via the ListIterator interface instead of directly.
    Advantages – simple.
    Drawbacks – only suitable for a single-threaded model.
  2. Synchronizers
    Use locks to obtain mutual exclusion while doing collection R/W operations.
    Advantages – easy to code.
    Drawbacks – lock overhead for reading operations.
  3. Copy-on-write
    Take advantage of the java.util.concurrent collections such as CopyOnWriteArrayList and CopyOnWriteArraySet. If you require a map, grab CopyOnWriteMap from Apache (these guys have been doing Sun’s dirty work for years now).
    Advantages – very good reading performance (no locks are used, visibility is obtained via volatility).
    Drawbacks – very bad write performance on large collections (every write copies the whole backing structure).
    Conclusion – use for seldom mutating collections.
  4. Concurrent Collections
    If you want to go heavyweight, consider using: ConcurrentHashMap (or one of its package friends).
    Creating an iterator over a ConcurrentHashMap does not freeze the collection for traversal; updates made during the traversal may or may not be visible to it (the iterator is weakly consistent).
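
The four options above can be sketched roughly as follows. This is only an illustration under my own assumptions: the class and method names are invented, and everything runs in one thread to keep it short.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.ListIterator;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical class; each method illustrates one of the four options.
class CmeWorkarounds {
    // 1. Single thread: remove through the ListIterator, not through the list.
    static List<Integer> removeEvens(List<Integer> source) {
        List<Integer> list = new ArrayList<>(source);
        for (ListIterator<Integer> it = list.listIterator(); it.hasNext();) {
            if (it.next() % 2 == 0) {
                it.remove(); // keeps the iterator and the list in sync
            }
        }
        return list;
    }

    // 2. Locks: a Collections.synchronizedList wrapper still needs an explicit
    //    lock on the wrapper for the whole duration of an iteration.
    static int sumUnderLock(List<Integer> shared) {
        int sum = 0;
        synchronized (shared) {
            for (int value : shared) {
                sum += value;
            }
        }
        return sum;
    }

    // 3. Copy-on-write: the iterator walks the snapshot taken when it was created.
    static int countSeenWhileAppending() {
        List<String> list = new CopyOnWriteArrayList<>(Arrays.asList("a", "b", "c"));
        int seen = 0;
        for (String s : list) {
            list.add(s + "!"); // copies the backing array; the running iterator is unaffected
            seen++;
        }
        return seen; // only the three original elements were visited
    }

    // 4. ConcurrentHashMap: weakly consistent iteration, never throws a CME.
    static int addWhileIterating() {
        Map<String, Integer> map = new ConcurrentHashMap<>();
        map.put("x", 1);
        map.put("y", 2);
        map.put("z", 3);
        for (String key : map.keySet()) {
            map.put("added", 0); // may or may not be visited by this traversal
        }
        return map.size(); // the three original keys plus "added"
    }
}
```

For option 2, the list passed in is assumed to come from Collections.synchronizedList(...), whose documentation asks callers to synchronize on the returned list while iterating.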

The approach I ended up taking:
My use case was populating a cache of about ten items that almost never changes. A copy-on-write map was the best choice, I believe.
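
For a feel of what a copy-on-write map does, here is a hand-rolled sketch. It is not the Apache class mentioned above, and the name TinyCowCache and its methods are invented for illustration. Reads go through a volatile reference to a map that is never modified after publication, and each write copies the whole map:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical hand-rolled copy-on-write cache; not the Apache implementation.
class TinyCowCache<K, V> {
    // Readers only ever see a fully built, never-mutated map through this volatile field.
    private volatile Map<K, V> snapshot = Collections.emptyMap();

    // Lock-free read: one volatile load, then a plain map lookup.
    V get(K key) {
        return snapshot.get(key);
    }

    // Writers copy the whole map, change the copy, then publish it.
    synchronized void put(K key, V value) {
        Map<K, V> copy = new HashMap<>(snapshot);
        copy.put(key, value);
        snapshot = Collections.unmodifiableMap(copy);
    }

    // Iterating over a snapshot cannot throw a CME, because snapshots are never modified.
    Map<K, V> view() {
        return snapshot;
    }
}
```

With ten items and almost no writes, the cost of copying on every put is negligible, which is why this shape fits such a cache.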

Best pic idea I could think of to visualize Threads :)

Myth busting – String.intern() object allocations are never garbage collected

Wednesday, January 6th, 2010

Java is becoming quite old (version 1.0 came out in 1996 if I’m not mistaken). When something turns old, legends, myths, and other perceived truths are quick to form around it (just imagine an old Gothic mansion with its stack of scare tales).
Most of the accumulated knowledge is beneficial and helpful, but some of it is not relevant anymore or just plain wrong.
Remembering that Java is 14 years old (as of 2010), whenever I google for Java info or answers, I always check the date of the article I landed on.
If you stumble upon somebody claiming that Java can or can’t do something, always check the date of the claim. If you see something from 2001, you’d better search for newer references instead of accepting it as is.

Some sites, like http://Javaworld.com, have been there from the get-go and were big back then, but after losing popularity they are now a graveyard for old Java skeletons (I myself have a not-that-relevant article there).

The story with String.intern() is the same: you’ll find people all over the place claiming that overusing it will fill up the perm area, because the perm area is never garbage collected. As discussed here, that’s just not true.

Something I enjoy doing is not taking so-called “facts” for granted, and re-validating them in my IDE.
Thinking that those intern() allocations would never be GCed, I was planning a presentation on how a WeakHashMap-based solution can serve as an alternative cache repository for Strings. I wrote a program to demonstrate an OutOfMemoryError (OOME) caused by intern(), only to find out that intern() is not as bad as I originally thought.
Try stuff yourself. You’ll be surprised…
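
For the curious, the WeakHashMap-based alternative I had in mind looks roughly like the sketch below. The class name WeakStringPool and the method canonical() are invented for illustration; this is one common way to build such a pool, not a reference implementation:

```java
import java.lang.ref.WeakReference;
import java.util.Map;
import java.util.WeakHashMap;

// Hypothetical helper: a canonicalizing String pool whose entries can be garbage collected.
class WeakStringPool {
    // Key and value refer to the same String. The value is wrapped in a WeakReference
    // so that the map itself does not keep the key strongly reachable.
    private final Map<String, WeakReference<String>> pool = new WeakHashMap<>();

    // Returns the pooled instance equal to s, adding s to the pool if none exists yet.
    synchronized String canonical(String s) {
        WeakReference<String> ref = pool.get(s);
        String existing = (ref == null) ? null : ref.get();
        if (existing != null) {
            return existing;
        }
        pool.put(s, new WeakReference<>(s));
        return s;
    }
}
```

Once nobody else holds a pooled String, the entry becomes eligible for collection. Note that string literals stay reachable anyway, so the pool only helps with strings created at runtime.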

Other myths I should write about some day are:

  1. Regular expressions in Java are slow – FALSE! I’ve tested this myself, and after compiling the regex, I was able to run more than 1 million matches per second (on small strings, of course).
  2. Always use StringBuffer to concatenate strings – dead wrong! If you have all the concatenations in a single statement, like the following, the compiler does it for you automatically:
    s = "Hi my name is: " + myName + ". my lucky number is: " + num;
    Run javap on class files compiled with and without an explicit StringBuffer to see that the byte code is the same.
    This piece of code, though, could benefit from a StringBuffer to prevent rapid object creation:
    for (…) {
    s += strOfThisCycle;
    }
    In any case, Java 5 introduced StringBuilder, which is the unsynchronized twin of the synchronized StringBuffer class. I guess you will rarely access the same builder from different threads, therefore StringBuilder should be the default choice for ya.
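
The two points above can be sketched like this. The class name, method names and the digits-only regex are my own choices for illustration: the pattern is compiled once and reused, and the loop appends to a single StringBuilder instead of doing s += on every cycle.

```java
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical class illustrating both myths.
class StringMyths {
    // Compile once and reuse; Pattern instances are immutable and safe to share.
    private static final Pattern DIGITS = Pattern.compile("\\d+");

    // Counts the inputs that consist of digits only.
    static int countNumeric(List<String> inputs) {
        int hits = 0;
        for (String input : inputs) {
            if (DIGITS.matcher(input).matches()) {
                hits++;
            }
        }
        return hits;
    }

    // In a loop, append to one StringBuilder rather than creating a new String per cycle.
    static String concat(List<String> parts) {
        StringBuilder sb = new StringBuilder();
        for (String part : parts) {
            sb.append(part);
        }
        return sb.toString();
    }
}
```

Calling Pattern.compile() inside the loop instead is the usual reason regex code looks slow in a benchmark.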

Why is Thread.sleep() inherently inaccurate

Sunday, August 23rd, 2009

Avi Ribchinsky, a friend and a colleague of mine, is transitioning from C++ to the Java world. He had been playing with Thread.sleep() when he noticed that the sleep method might sleep longer than ordered and, moreover, could also undersleep (see Fig 1). Coming from the C++ world, that surely caught him by surprise ;)

Fig 1.

Thread.sleep() under sleeping


How is sleep implemented in Java anyway?

Avi came asking me if I knew anything about it. I was wondering myself how such a common and important method could be failing in the way shown above. Is it the OS? A bug in the specific JRE version used? Maybe the API doesn’t guarantee millisecond precision to begin with?
Thinking about all of these factors, we realized that we didn’t really know how the JVM implements sleep. My best guess would have been that the process registers itself with the OS for a wake-up call, and the OS wakes the process via a software interrupt. OK, time to search the web.

The following article gives a very detailed answer, explaining that sleep is implemented by a thread giving its OS scheduling quantum back to the scheduler. On the next execution quantum the thread gets, it has the chance to wake up and continue processing, or to go back to sleep.
Therefore, the accuracy of sleep depends directly on the scheduling resolution of the operating system in use. Since the Windows XP scheduling resolution is roughly 10ms, the sleep mechanism in Avi’s example might have preferred to undersleep “a little” rather than oversleep “a lot”, by waking up in the current scheduling quantum rather than in the next one.

The article also mentions that the inaccuracies get worse when a process with a higher scheduling priority than the sleeping process is in a runnable state.

I assume that running on a hypervisor with coarse-grained process scheduling would also produce greater inaccuracies.


Conclusion

You can’t rely on the millisecond accuracy of the sleep method. Take a time measurement before and after to find the actual time spent sleeping, in order to avoid ever-increasing inaccuracies.
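
A minimal sketch of that advice, with class and method names invented for illustration: the first method measures how long a sleep really took, and the second runs a periodic loop whose deadlines are all computed from the original start time, so one early or late wake-up does not pile up on the next ones.

```java
// Hypothetical helper class.
class SleepDrift {
    // Sleeps for the requested time and reports how long the thread really slept.
    static long measuredSleepNanos(long millis) {
        long start = System.nanoTime();
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
        }
        return System.nanoTime() - start;
    }

    // Runs `ticks` periods of `periodMillis` each and returns the total elapsed nanoseconds.
    // Every deadline is derived from the start time, not from the previous wake-up.
    static long runPeriodic(int ticks, long periodMillis) {
        long periodNanos = periodMillis * 1_000_000L;
        long start = System.nanoTime();
        for (int i = 1; i <= ticks; i++) {
            long deadline = start + i * periodNanos;
            long remaining;
            // Re-check after every wake-up: sleep() may return early or late.
            while ((remaining = deadline - System.nanoTime()) > 0) {
                try {
                    Thread.sleep(remaining / 1_000_000L, (int) (remaining % 1_000_000L));
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return System.nanoTime() - start;
                }
            }
            // the periodic work would go here
        }
        return System.nanoTime() - start;
    }
}
```

System.nanoTime() is used rather than System.currentTimeMillis() because it is meant for measuring elapsed time and is not affected by wall-clock adjustments.
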
Sleep tight :)

How does hardware evolution affect programming language design?

Sunday, March 30th, 2008

I’ve recently watched the interesting webcast Programming Language Design and Analysis Motivated by Hardware Evolution by Professor Alan Mycroft (the webcast’s link is accessible only from within the IBM intranet). Below are a few key notes I took.

Not everything is kept linear
As chip designers continue to scale down chips and transistors, they begin hitting design walls. Some of these walls are related to the fact that as the transistors’ physical size is scaled down, some other properties of the chip do not scale linearly with it. The simplest example of this is dimensions. Consider length vs. surface area: reducing a square’s side to 50% of its original size reduces the square’s surface area by 75%, which is not a linear change. Different electrical characteristics might change at rates different from the rate at which length changes.


Where is my 12GHz CPU?
Moore’s law, which predicts the doubling of the number of transistors on a chip every ~18 months, is still in effect. Sadly, this doesn’t translate into clock speed. When transistors are miniaturized, the distances within the chip shrink as well, and this should mean an increase in speed. However, due to heat dissipation problems (not all properties shrink at the same rate, and generated heat is one of them, remember?), chip designers are forced to reduce the voltage at which the chip components operate. Therefore, no clock speed gain.
This enables us, however, to squeeze more cores into that optimal one cm^2 silicon pad. Hence the multi-core technological path that the industry has resorted to in the last couple of years.


There’s always a trade-off
As the voltage at which the chip operates drops, chip designers are starting to face computation inaccuracy problems. How could we live in peace with this imprecision? The professor ponders: must we insist on absolute accuracy? Consider the task of rendering video: do we really care about the correctness of each pixel in each of the frames? Probably not. Just remember those old analog VCR and audio cassettes; they were highly inaccurate and were still able to deliver the goods. We might decide to compromise on accuracy, some of the time, in order to gain speed, which is just another type of trade-off. Programming language designers should assist chip designers by developing programming languages that are able to operate in a world without absolute certainty.
Also think about the built-in error correction mechanisms put into network protocol stacks.

Better in one world, worse in the other
A major problem with multi-core chip processing is that although inter-core communication enjoys a high bandwidth (2.5GB/s), it is marred by a high latency (~75 clock cycles).
Another problem is that programs are written based on a shared memory model, in which all cores must coordinate when accessing the shared main memory, and the cores’ caches must also be refreshed quite often. While this doesn’t seem to be a major problem for dual or quad cores, think of how heavily this limits performance on a not-so-futuristic-anymore 128-core chip.
Trying to refrain from shared main memory access might turn the tables on some of the practices we have grown accustomed to thinking of as obvious. For example, when you code a parameterized function, you declare how parameters are passed: either by reference or by value. Declaring this at coding time (rather than deciding it at runtime) can be regarded as “early-binding”. From a performance perspective, everybody knows that passing by reference is almost always faster than passing by value (assuming you don’t intend to change the passed value). This preference might not hold true on a multi-core system, which incurs an expensive overhead when it accesses the data that the reference points to in the shared main memory; no such price has to be paid if the parameter is passed by value. One way in which future programming languages might deal with this is to allow late-binding of the parameter passing method: when running on a chip with only a few cores, a pass by reference will occur, whereas when running on a core-rich chip, a pass by value will be selected. This only works when passing by reference or by value makes no difference to the program logic (no changes to the parameter’s data are visible to the method caller, and the parameter data is not accessed concurrently), so that both can be used interchangeably.
Future languages will need to support this “late-binding” feature and others like it.

Summing up
It will be interesting to keep track of these trends of mutual influence between hardware and software.