Concurrent Modification Exception

February 8th, 2010

I got a ConcurrentModificationException (CME) during run-time.
What does it mean?
It means that you’ve updated your collection while iterating over it (single or multi-threaded alike).

A few more things to know:
Best effort detection – First off, consider yourself lucky. CMEs are thrown only on a best-effort basis, so in another universe your collection could have become corrupted instead of failing fast, aborting the operation and issuing you a warning.

Identifying the collection – Like deadlocks, CMEs are easy to pinpoint once you have inspected the exception’s stack trace.

Solving it:

  1. ListIterator
    If you’re single threaded, consider solving the CME by manipulating the collection via the ListIterator interface instead of directly.
    Advantages – simple.
    Drawbacks – single threaded model oriented.
  2. Synchronizers
    Use locks to obtain mutual exclusion in collection R/W operations.
    Advantages – easy to code.
    Drawbacks – lock overhead for read operations.
  3. Copy-on-write
    Either use the java.util.concurrent collections, like CopyOnWriteArrayList and CopyOnWriteArraySet, or, for a map, grab CopyOnWriteMap from Apache (these guys have been doing Sun’s dirty work for years now).
    Advantages – very good reading performance (switching locks for volatile).
    Drawbacks – very bad write performance on large maps.
    Conclusion – use for seldom mutating collections.
  4. Concurrent Collections
    If you want to go heavyweight, consider using ConcurrentHashMap (or one of its package friends).
    An iterator created over a ConcurrentHashMap does not freeze the collection for traversal; updates made to the collection during the traversal may or may not show up in it (the iterator is weakly consistent).
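
To make option 1 concrete, here is a minimal single-threaded sketch. The class and method names (CmeDemo, removeDirectly, removeViaIterator) are made up for illustration. The first method removes through the list while a for-each loop is running, the second removes through a ListIterator.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.List;
import java.util.ListIterator;

final class CmeDemo {

    // Removes through the list itself while a for-each loop iterates over it.
    // Returns true if the iterator noticed the change and threw a CME.
    static boolean removeDirectly(List<String> names) {
        try {
            for (String name : names) {
                if (name.startsWith("a")) {
                    names.remove(name); // structural change behind the iterator's back
                }
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    // Removes through the ListIterator, which keeps its own bookkeeping in sync.
    static void removeViaIterator(List<String> names) {
        for (ListIterator<String> it = names.listIterator(); it.hasNext();) {
            if (it.next().startsWith("a")) {
                it.remove();
            }
        }
    }

    public static void main(String[] args) {
        List<String> first = new ArrayList<>(Arrays.asList("alice", "bob", "carol"));
        System.out.println("CME thrown: " + removeDirectly(first));

        List<String> second = new ArrayList<>(Arrays.asList("alice", "bob", "carol"));
        removeViaIterator(second);
        System.out.println("After iterator removal: " + second);
    }
}
```

Keep in mind that detection is best effort: with an ArrayList, removing the second-to-last element inside a for-each loop typically ends the loop quietly without any exception.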

The approach I ended up taking:
My use case was populating a cache of about ten items that almost never changes. A copy-on-write map was the best choice, I believe.
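
For the curious, the copy-on-write idea itself fits in a few lines. The following is only a hand-rolled sketch of the technique, assuming a tiny, rarely written cache. TinyCopyOnWriteCache is a hypothetical name and this is not the Apache class mentioned above.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration class; not the Apache CopyOnWriteMap.
final class TinyCopyOnWriteCache<K, V> {

    // Readers only ever see a fully built map that is never modified again.
    private volatile Map<K, V> snapshot = Collections.emptyMap();

    // Lock-free read: one volatile load, then a plain HashMap lookup.
    V get(K key) {
        return snapshot.get(key);
    }

    // Writers serialize on the lock, copy the whole map and publish the copy.
    synchronized void put(K key, V value) {
        Map<K, V> copy = new HashMap<>(snapshot);
        copy.put(key, value);
        snapshot = Collections.unmodifiableMap(copy);
    }

    // Iterating over a snapshot cannot throw a CME, since it is never mutated.
    Map<K, V> view() {
        return snapshot;
    }

    public static void main(String[] args) {
        TinyCopyOnWriteCache<String, Integer> cache = new TinyCopyOnWriteCache<>();
        cache.put("one", 1);
        cache.put("two", 2);
        for (String key : cache.view().keySet()) {
            cache.put(key + "!", 0); // safe: the loop walks the old snapshot
        }
        System.out.println(cache.view().size()); // prints 4
    }
}
```

Every write copies the entire map, which is exactly why this only pays off for small collections that seldom mutate.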

Looks anything like threads?

NAT in VMWare vSphere/ESX – In a nutshell

February 1st, 2010

This post is about NATing an ESX VM, but first, why do I need NAT:

The SIP protocol is not NAT oblivious. To traverse NAT, our application has to replace the DNS name in the SIP message’s Contact header with the external FQDN that the message receiver will be sending responses to (a NAT with static routing configured).
Therefore I needed to test our software in a NAT topology.
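
To give a feel for the Contact header rewrite described above, here is a rough sketch. It is not our product code: ContactRewriter, the regex and the nat.example.com FQDN are all assumptions made for illustration, and the regex only copes with a hostname or IPv4 host (no bracketed IPv6 literals, no '@' inside URI parameters).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ContactRewriter {

    // Group 1: "sip:" or "sips:" plus an optional "user@" part. Group 2: the host.
    private static final Pattern SIP_URI_HOST =
            Pattern.compile("(sips?:(?:[^@\\s<>]+@)?)([^:;>\\s]+)");

    // Replaces the host of the first SIP URI found in a Contact header line.
    static String rewriteContact(String contactHeader, String externalFqdn) {
        Matcher m = SIP_URI_HOST.matcher(contactHeader);
        if (!m.find()) {
            return contactHeader; // nothing that looks like a SIP URI
        }
        return contactHeader.substring(0, m.start(2))
                + externalFqdn
                + contactHeader.substring(m.end(2));
    }

    public static void main(String[] args) {
        String header = "Contact: <sip:alice@192.168.1.199:5060;transport=tcp>";
        // prints Contact: <sip:alice@nat.example.com:5060;transport=tcp>
        System.out.println(rewriteContact(header, "nat.example.com"));
    }
}
```

A real SIP stack would parse the header properly instead of pattern matching on it, so treat this purely as a picture of what gets replaced.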

In the past, when we used VMWare Player/Workstation, it had a built-in NAT network. Unfortunately, the ESX hypervisor does not provide a NATed network option.
Seeking alternatives at VMWare’s appliance marketplace, I found and downloaded Vyatta’s community edition (VC5) router appliance, which is also downloadable from SourceForge and comes under the GPL license.
After 3-4 hours, guided by the official quick start guide, I had a working NAT configuration in the ESX. Hurray!
Overall, not a hard nut to crack ;) , though I wish VMWare would wise up and just add a built-in NAT option to vSphere.

Left to do:
Obtain some static IPs, so the config won’t break each time the VM reboots and the DHCP lease expires.
Tip #1:
If you want to access your NATed VM by RDP/VNC without setting up extra NAT routing rules, consider adding an additional un-NATed NIC to the VM. When doing so, make sure that the OS routing tables are set to route through the NATed NIC.
Tip #2:
This short vyatta user installation report also helped me a bit.

Here’s the complete configuration script I ended up feeding to the appliance console (the network topology is similar to the one presented in Vyatta’s getting started guide).
Where:
1.2.3.4 is your department’s DNS server.
192.168.1.199 is the VM’s NATed private IP address (provided by the DHCP).
The script contains a NAT forward rule for VNC (port 5900).


configure
set system host-name vyatta-nat
set interfaces ethernet eth0 address dhcp
set service ssh
set service https
commit;
save;
# restart the appliance to switch from console remote desktop to SSH:

#login with user and password
configure
show interfaces

set interfaces ethernet eth1 address 192.168.1.254/24

commit;

delete service dhcp-server
set service dhcp-server shared-network-name ETH1_POOL subnet 192.168.1.0/24 start 192.168.1.100 stop 192.168.1.199
set service dhcp-server shared-network-name ETH1_POOL subnet 192.168.1.0/24 default-router 192.168.1.254
set service dhcp-server shared-network-name ETH1_POOL subnet 192.168.1.0/24 dns-server 1.2.3.4
commit;
show service dhcp-server

set service nat rule 1 source address 192.168.1.0/24
set service nat rule 1 outbound-interface eth0
set service nat rule 1 type masquerade
commit;
show service nat
save;
exit
show nat rules
configure
set service nat rule 20 type destination
set service nat rule 20 inbound-interface eth0
# use a negated fake address so that all incoming communication will be NATed
#set service nat rule 20 destination address !192.168.50.0
# forward traffic to address 192.168.1.199
set service nat rule 20 inside-address address 192.168.1.199
set service nat rule 20 protocol tcp
set service nat rule 20 destination port 5900
commit;
save;
exit

Myth busting – String.intern() object allocations are never garbage collected

January 6th, 2010

Java is becoming quite old (version 1.0 came out in 1996, if I’m not mistaken). When something turns old, legends, myths and other perceived truths are quick to form around it (just imagine an old Gothic mansion with its stack of scary tales).
Most of the accumulated knowledge is beneficial and helpful, but some of it is no longer relevant or is just plain wrong.
Remembering that Java is 14 years old (2010), whenever I google for Java info or answers, I always inspect the date of the article I landed on.
If you stumble upon somebody claiming that Java can or can’t do something, always check the date of the comment. If it is something from 2001, you had better search for newer references instead of accepting it as is.

Some sites, like http://Javaworld.com, have been there from the get-go and were big back then, but after losing popularity they are now a graveyard for old Java skeletons (I myself have a not-that-relevant article there).

The story with String.intern() is the same. You’ll find people all over the place claiming that overusing it will exhaust the perm area, because the perm area is never garbage collected. As discussed here, that’s just not true.

Something I enjoy doing is not taking so-called “facts” for granted, and re-validating them in my IDE.
Believing that those intern() allocations would never be GCed, I was planning a presentation on how a WeakHashMap-based solution can serve as an alternative cache repository for Strings. I wrote a program to demonstrate an OutOfMemoryError (OOME) caused by intern(), only to find out that intern() is not as bad as I originally thought.
Try stuff yourself. You’ll be surprised…

Other myths I should write about some day are:

  1. Regular expressions in Java are slow – FALSE! I’ve tested this myself, and after compiling the regex once, I was able to run more than 1 million matches per second (on small strings, of course).
  2. Always use StringBuffer to concatenate strings – dead wrong! If you have all the concatenations in a single expression, like the following, the compiler automatically does it for you:
    s = "Hi my name is: " + myName + ". my lucky number is: " + num;
    Run javap on class files compiled with and without an explicit buffer to see that the byte code is essentially the same.
    A loop like this one, though, could benefit from an explicit buffer to prevent rapid object creation:
    for (…) {
        s += strOfThisCycle;
    }
    In any case, Java 5 introduced StringBuilder, which is the unsynchronized twin of the synchronized StringBuffer class. I guess you will rarely access the same builder from different threads, therefore StringBuilder should be your default choice.
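
The loop case from myth 2 can be sketched like this. ConcatDemo and its two methods are illustration names only. Both methods return the same text, the difference lies in how many intermediate objects get created along the way.

```java
import java.util.Arrays;
import java.util.List;

final class ConcatDemo {

    // Every += produces a brand-new String on each pass through the loop.
    static String joinWithPlus(List<String> parts) {
        String s = "";
        for (String part : parts) {
            s += part;
        }
        return s;
    }

    // One builder is reused for the whole loop; a single String is created at the end.
    static String joinWithBuilder(List<String> parts) {
        StringBuilder sb = new StringBuilder();
        for (String part : parts) {
            sb.append(part);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> parts = Arrays.asList("Hi", " ", "there");
        System.out.println(joinWithPlus(parts).equals(joinWithBuilder(parts))); // prints true
    }
}
```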

New Java blog out there

January 4th, 2010

A new baby blog was born: Java Tech Sharing.
Proud father: Guy Moshkovich.

I recommend adding it to your RSS/Atom reader.

Utility Frenzy #1 – The log summarizer

October 19th, 2009

Here’s a post I wrote (in Hebrew) that tells the story of the log summarizer utility I’ve written. This story is the first in a line of “utility stories” I’m planning on writing.
My apologies to those of you who won’t be able to read it. Posts on this site do appear in English.

Google is pregnant again – Noop

October 5th, 2009

After the zillion new dynamic languages that have flooded the earth (Groovy, Ruby, …), Google is concocting Noop, a new type-safe language to join Java, Scala and the rest.

The new language sets out to excel in testability, dependency injection and readable code (see the proposed features). More interesting than whether Noop will gain a crowd of enthusiasts is the language’s dynamic and lucid development process, made available through Google Code (Sun’s JSRs are also transparent, but here it’s a whole new language).

What do you think?

Why is Thread.sleep() inherently inaccurate

August 23rd, 2009

Avi Ribchinsky, a friend and a colleague of mine, is transitioning from C++ to the Java world. He had been playing with Thread.sleep() when he noticed that the sleep method might sleep longer than ordered and, moreover, that it could also under sleep (see Fig 1). Coming from the C++ world, that surely caught him by surprise ;)

Fig 1. Thread.sleep() under sleeping

How is sleep implemented in Java anyway?

Avi came asking me whether I knew anything about it. I was wondering myself how such a common and important method could misbehave in the way shown above. Is it the OS? A bug in the specific JRE version used? Maybe the API doesn’t guarantee millisecond precision to begin with?
Thinking about all of these factors, we realized that we didn’t really know how the JVM implements the sleep method. My best guess would have been that the process registers itself with the OS for a wake-up call, and that the OS wakes the process via a software interrupt. OK, time to search the web.

The following article gives a very detailed answer, explaining that sleep is implemented by the thread giving its OS scheduling quantum back to the scheduler. On the next execution quantum the thread gets, it has the chance to wake up and continue processing, or to go on sleeping.
Therefore, the resolution of sleep depends directly on the scheduling resolution of the operating system in use. Since the Windows XP scheduling resolution is roughly 10ms, the sleep mechanism in Avi’s example might have preferred to under sleep “a little” rather than oversleep “a lot”, by waking the thread in the current scheduling quantum rather than in the next one.

The article also mentions that the inaccuracies get worse when a process with a higher scheduling priority than the sleeping process is in a runnable state.

I assume that running on a hypervisor with coarse-grained process scheduling would also produce greater inaccuracies.


Conclusion

You can’t rely on the millisecond accuracy of the sleep method. Take a time measurement before and after the sleep to find the actual time spent sleeping, in order to avoid ever-increasing inaccuracies.
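
One way to apply this advice to a periodic task is to derive every sleep from a fixed start time, so that an oversleep in one cycle shortens the next sleep instead of piling up. The sketch below assumes exactly that approach. SteadyTicker and runTicks are names I made up, and the 20 ms period is arbitrary.

```java
final class SteadyTicker {

    // Runs `ticks` iterations spaced `periodMillis` apart and returns the total
    // elapsed time in nanoseconds. Each sleep is computed from the fixed start
    // time, so individual sleep errors do not accumulate.
    static long runTicks(int ticks, long periodMillis) throws InterruptedException {
        long start = System.nanoTime();
        for (int i = 1; i <= ticks; i++) {
            long deadline = start + i * periodMillis * 1_000_000L;
            long remainingNanos = deadline - System.nanoTime();
            if (remainingNanos > 0) {
                // Thread.sleep(millis, nanos) expects nanos in the range 0..999999
                Thread.sleep(remainingNanos / 1_000_000L, (int) (remainingNanos % 1_000_000L));
            }
            // ... the periodic work would go here ...
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) throws InterruptedException {
        long elapsedNanos = runTicks(10, 20);
        System.out.println("asked for 200 ms, took " + elapsedNanos / 1_000_000L + " ms");
    }
}
```

Each single tick is still only as accurate as the scheduler allows, but the total drift stays bounded by roughly one scheduling quantum rather than growing with the number of cycles.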
Sleep tight :)

ESX Server tuning – quick tour

July 27th, 2009

Our VMWare ESX server does a great job for us.
Running on IBM X3650 hardware with 24GB RAM and 2×4 cores, it can simultaneously run up to 25 virtual machines, each configured with around 1.5 GB of RAM.

After reaching the 25 running VMs mark, we started noticing increasing sluggishness when additional VMs were turned on.

Of course, we did the trivial stuff: making sure that all screen savers are disabled, that the antivirus agents are not scheduled to run at the same point in time, and that all of the VMs are running the latest VMWare Tools agent.
It was time to dig deeper and find out where the bottleneck we came across is.

Someone told me that the reliability of the performance indicators shown by the graphical VI console is questionable, and that using the terminal utilities is recommended. So, I SSHed to the service console VM and ran the top utility. Immediately, I understood that what I was actually doing was surveying the service console VM’s processes, rather than the overall ESX hypervisor activity. A quick dig made me realize that the hypervisor is visible through the esxtop command, which is also executed from within the service console VM.

Even for those of you who know your way around the output of top and Linux’s sysstat package, the data shown by esxtop is rather cryptic.
This great esxtop tutorial did me a great service in understanding the esxtop output.

I started more than 30 machines to reproduce the problem, and quickly went through the list of usual suspects: CPU, memory and IO:

  • CPU
    I verified that it’s not a CPU problem, since the “CPU load average” was around 0.2 and PCPU was much the same.
  • Memory
    Then I switched to the memory display and verified that it’s not a physical memory issue. I saw the “high state” marker, which was a good sign, and there were almost 17GB of ursvd (unreserved memory) in the VMKMEM/MB line.
    SWAP (~3GB) seemed OK.
    VMWare’s ballooning and memory sharing do miracles in broad daylight.
  • I/O
    I didn’t see any queues forming. Read/write rates seemed pretty low.

So, the 25 VMs performance limit will remain a mystery until I have proper time to analyze it more thoroughly or, even better, find someone from IT to do that for me.

Extending your troubleshooting facilities – Always on verbose GC

July 13th, 2009

Getting it right the first time

What happens when customers are experiencing problems with your application in production? The customer would send you the various log artifacts and, ideally, you should be able to diagnose the problem and provide a resolution. If you find yourself sending the customer back and forth in an effort to gather additional types of log artifacts and system information, then you are, most likely, doing something wrong.

Who should be helping you

If you deploy your application on top of an application server platform, like WebSphere Application Server (WAS) in my case, the platform should be assisting with automatic log generation and collection. Our development team has been increasingly relying on such services provided by WAS, like FFDC, WAS Collector and hung threads detection, all of which have honorably earned their production stripes and badges.

One new serviceability artifact that I have long wanted to have in production is the verbose GC. This feature records the JVM garbage collection activity over time, providing insight for resolving issues such as stop-the-world performance freezes, memory leaks, native heap corruption, etc.

Until today, I was reluctant to enable the verbose GC in production, since I believed that there’s no way to direct the verbose GC output from the native stderr (the default) to a dedicated rotating file. Not doing so might lead to files larger than 2GB (a problem on some file systems), or could cause the system to run out of disk space. I was assuming that the performance implications would be negligible, but still, you have to be extra prudent when it comes to live customer environments.

A trigger for action

Last week I had an issue with a WAS component. After opening a ticket with WebSphere support, I was asked to reproduce the scenario in order to generate verbose GC output. I decided that enough is enough! I was going to look into the GC output file rollover issue again and see what solutions exist, what the community has to say about it, or whether there might be some other custom solution (with the Apache web server, for example, file rolling is handled by an external process into which the log output is redirected; that process then manages the rolling files itself).

Following a quick search, I was happy to find that the IBM JVM offers a rolling-over verbose GC log. I quickly found additional hands-on reports: Chris Bailey published verbose GC performance impact results that confirmed my gut feeling that the performance impact is a non-issue.

Here’s the syntax (quoting the IBM Java 6 diagnostics guide):

-Xverbosegclog[:<file>[,<X>,<Y>]]
Causes -verbose:gc output to be written to the specified file. If the file cannot be found, -verbose:gc tries to create the file, and then continues as normal if it is successful. If it cannot create the file (for example, if an invalid filename is passed into the command), it redirects the output to stderr.
If you specify <X> and <Y>, the -verbose:gc output is redirected to X files, each containing Y GC cycles.

Final thoughts

  1. I don’t like having to specify the entire path for the files. The default path should have been the server’s logs directory, or the CWD (which is the profile’s directory, I believe).
  2. Rollover threshold parameter – I would rather specify it in units of max MBs than in the number of GC cycle entries. I’ve empirically found that 1MB of verbose GC log translates to ~700 GC cycle entries (YMMV).
  3. Good enough. I’ll start making the preparations to put this into production.
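
Until the JVM accepts a size limit directly, the MB-to-cycles conversion from point 2 is simple arithmetic. The helper below is only a sketch under the ~700 cycles per MB estimate. VerboseGcOption and the log path are hypothetical, and it is worth checking the diagnostics guide for any file-name tokens your JVM expects for rotated files.

```java
final class VerboseGcOption {

    // Empirical figure from point 2 above; treat it as a rough estimate (YMMV).
    static final int CYCLES_PER_MB = 700;

    // Translates "at most maxMbPerFile MB per file" into the Y (GC cycles) argument.
    static int cyclesFor(int maxMbPerFile) {
        return maxMbPerFile * CYCLES_PER_MB;
    }

    // Builds -Xverbosegclog:<file>,<X>,<Y> for X rotating files of roughly maxMbPerFile MB each.
    static String build(String file, int files, int maxMbPerFile) {
        return "-Xverbosegclog:" + file + "," + files + "," + cyclesFor(maxMbPerFile);
    }

    public static void main(String[] args) {
        // Hypothetical path; keeps roughly 5 x 10 MB of GC history on disk.
        System.out.println(build("/opt/was/logs/verbosegc.log", 5, 10));
        // prints -Xverbosegclog:/opt/was/logs/verbosegc.log,5,7000
    }
}
```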

A hand made freeware windows firewall

June 12th, 2009

I have two windows servers that shouldn’t talk to each other. How do I make sure they don’t?

Right, why not use some firewall? Well, because I can’t just install any software on these servers (company regulations), and Windows’ built-in firewall sucks big time (inbound only, and you have to configure ALL exceptions).
On Linux this is quite a trivial iptables command. Run the following on server#1:

iptables -I INPUT -s server#2 -j DROP
iptables -I OUTPUT -d server#2 -j DROP

Unfortunately, there’s nothing like iptables built into Windows.
Drawing inspiration from the iptables concept of routing packets to the trashcan (“-j DROP”), I realized that much the same could be implemented on Windows by tweaking the OS routing table, causing it to deliver packets destined for server#2 to nowhere.
Here’s my hand-tailored, freeware, no-software-required Windows firewall that sends packets on a vacation in /dev/null:

route ADD 1.1.1.2 MASK 255.255.255.255 1.1.1.0

Where:
Server#1 IP is 1.1.1.1
Server#2 IP is 1.1.1.2
1.1.1.0 isn’t assigned to anyone – our /dev/null for the occasion.

Additional blabber:
If you add the route instruction only to server#1, but not to server#2, then server#2 can still send IP packets to server#1. While this breaks TCP completely (the replies never make it back), server#2 could still send UDP datagrams to server#1.
Make sure the servers are configured with static IPs, otherwise your solution would break over time. In order to make the route persistent across server reboots, add the -p flag.

wrong way! Packet! turn back now!
