Posts tagged out of memory
While running product sizing tests, we found that an overenthusiastic use of ConcurrentHashMap (CHM) had evaporated a good ~170MB of much-needed heap space (we ran with a 1.5GB heap).
As it turns out, an empty CHM weighs around 1,700 bytes. Yes, I’m talking about a map with no entries at all, just the plumbing!
We used a CHM to store user session attributes, so 100,000 user sessions generated 100K CHM instances, worth 170MB of heap (100K times 1.7KB).
We took the measurements using the superb Eclipse MAT.
The obvious solution for saving these scarce 170MB was to switch from a CHM to a Hashtable. A Hashtable costs only around 150 bytes per instance (about 8% of a CHM).
Other possible solutions could have been moving to a list structure (seek time is not an issue, as we rarely have more than 4-5 attributes per session), or resorting to an array of Objects.
1. Performance - The product doesn’t have any user scenario that causes multiple threads to concurrently access the same session attributes map, so we don’t expect any performance loss. On the contrary, I expect a Hashtable to prove faster than a CHM for single-threaded access.
2. Thread safety - This is a low-risk aspect, as both CHM and Hashtable provide the same basic guarantees for a single API operation (e.g., map.get(key)).
To conclude, a CHM is a good idea when you have a shared map structure suffering from high read/write thread contention. But given the large memory footprint it drags behind it, a CHM is not ideal to use in large numbers, or when concurrent performance is not the focus.
A CHM automatically allocates 16 segments, each with a 16-element array. One best practice is to measure the average map population during your product’s sizing tests, and initialize the CHM with the minimum initialCapacity and loadFactor required to contain your usage.
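As a hedged illustration of that sizing advice (the class and constant names here are hypothetical, and the segment arithmetic applies to the pre-Java-8 CHM implementation this post describes), a per-session map could be created with explicit initialCapacity, loadFactor, and concurrencyLevel arguments:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical factory for the per-session attribute map.
class SessionAttributes {
    // Assumption: sizing tests showed sessions rarely hold more than 4-5 attributes.
    private static final int EXPECTED_ATTRIBUTES = 8;

    static Map<String, Object> newAttributeMap() {
        // concurrencyLevel = 1 requests a single segment instead of the
        // default 16, shrinking the empty-map footprint considerably.
        return new ConcurrentHashMap<String, Object>(EXPECTED_ATTRIBUTES, 0.75f, 1);
    }
}
```

Given the low contention described above, even a concurrencyLevel of 1 preserves the thread-safety guarantees of a single API operation.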
Getting it right the first time
What happens when customers experience problems with your application in production? The customer sends you the various log artifacts and, ideally, you should be able to diagnose the problem and provide a resolution. If you find yourself sending the customer back and forth in an effort to gather additional types of log artifacts and system information, then you are, most likely, doing something wrong.
Who should be helping you
If you deploy your application on top of an application server platform, like WebSphere Application Server (WAS) in my case, the platform should assist with automatic log generation and collection. Our development team has been increasingly relying on such services provided by WAS: FFDC, the WAS Collector, hung-thread detection. All of them have honorably earned their production stripes and badges.
One serviceability artifact that I have long wanted to have in production is verbose GC. This feature records the JVM’s garbage collection activity over time, providing insight for resolving issues such as stop-the-world performance freezes, memory leaks, native heap corruption, etc.
Until today, I was reluctant to enable verbose GC in production, since I believed there was no way to direct the verbose GC output from the native stderr (the default) to a dedicated rotating file. Not doing so might lead to files larger than 2GB (a problem on some file systems), or cause the system to run out of disk space. I was assuming that the performance implications would be negligible, but still, you have to be extra prudent when it comes to live customer environments.
A trigger for action
Last week I had an issue with a WAS component. After opening a ticket with WebSphere support, I was asked to reproduce the scenario in order to generate verbose GC output. I decided that enough is enough! I was going to look into the GC output file rollover issue again and see what solutions exist, what the community has to say about it, or whether there might be some other custom solution (with the Apache web server, for example, file rolling is handled by an external process into which the log output is redirected; the process then manages the rolling files itself).
Following a quick search, I was happy to find that the IBM JVM offers a rolling verbose GC log. I quickly found additional hands-on reports, and Chris Bailey published verbose GC performance-impact results that reassured my gut feeling about any performance impact being a non-issue.
Here’s the syntax (quoting the IBM Java 6 diagnostics guide):
Causes -verbose:gc output to be written to the specified file. If the file cannot be found, -verbose:gc tries to create the file, and then continues as normal if it is successful. If it cannot create the file (for example, if an invalid filename is passed into the command), it redirects the output to stderr.
If you specify -Xverbosegclog:<file>,X,Y, the -verbose:gc output is redirected to X files, each containing Y GC cycles.
- I don’t like having to specify the entire path for the log files; the default path should have been the server’s logs directory, or the CWD (the CWD is the profile’s directory, I believe).
- Rollover threshold parameter - I would rather specify it in units of maximum MBs instead of in the number of GC cycle entries. I’ve empirically found that 1MB of verbose GC log translates to ~700 GC cycle entries (YMMV).
- Good enough. I’ll start doing the preparations to put this into production.
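Putting the pieces together, here is an illustrative launch line. The path, file count, and cycle count are assumptions of mine, derived from the ~700 cycles/MB estimate above (5 files of 1,400 cycles is roughly 2MB each):

```shell
# Roll the verbose GC log over 5 files of ~1,400 GC cycles each,
# written under the server's logs directory (path is illustrative).
java -Xverbosegclog:/opt/WAS/profiles/AppSrv01/logs/server1/verbosegc.log,5,1400 \
     -jar app.jar
```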
Disclaimer: I know that this is an old idiom; I’m just presenting my own real-life incident, taken straight from the bloody Java trenches.
Exceptions can be threads assassins
When running on top of a WebSphere thread pool, any runtime exception that isn’t caught by the application code will bubble up the stack, ending up killing the specific thread. WAS helps here by automatically creating a new thread to take the place of the murdered one, but still, killing and immediately creating a thread is everything but the thread pool rationale.
Hiring a thread bodyguard
A simple way to avoid thread death is wrapping the first application layer (e.g., the run() method) with a try block that catches and swallows any Exception thrown from anywhere in the application code.
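A minimal sketch of such a bodyguard wrapper (GuardedTask is a hypothetical name; in practice you would log the exception properly rather than just print it):

```java
// Wraps the first applicative layer so that an uncaught RuntimeException
// is swallowed instead of killing the pool thread.
class GuardedTask implements Runnable {
    private final Runnable delegate;

    GuardedTask(Runnable delegate) {
        this.delegate = delegate;
    }

    @Override
    public void run() {
        try {
            delegate.run();
        } catch (Exception e) {
            // Log and swallow: the thread survives to serve the next task.
            // Note that Errors (e.g. OutOfMemoryError) are NOT caught here.
            System.err.println("Task failed: " + e);
        }
    }
}
```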
Our project’s code also used this concept, but instead of catch (Exception e), it had a catch (Throwable t). When I noticed that, I didn’t rush to fix it, just in case someone before me had done funky stuff with dynamic class loading that might throw a NoClassDefFoundError (although that should be caught at a very localized spot), or maybe it’s there for some other historical reason that, not being one of the code’s forefathers, I’m just not aware of. In any case, I did promise myself that I’d revisit this piece of code in the future.
Getting some bulls to do correct things
Today I finally got the excuse I needed to change the catch Throwable into a catch Exception:
We were running stress tests when the server hit an OOME (OutOfMemoryError). Since the catch Throwable caught and swallowed the OOME (OOME is a subclass of Error, which is a subclass of Throwable), the thread that generated the OOME kept on living instead of dying right there, and so the JVM continued running, crippled and limping, instead of turning to an honorable solution like hara-kiri. Choosing the quick-death route would have been rewarded with a quick resurrection, courtesy of the gracious NodeAgent and its watchdog mechanism, and the end result would have been a newly born, healthy server ready to get back in business. A retreat in order to attack, you might put it.
Instead, the server had to limp along for long minutes, suffering a series of consecutive strokes (OOMEs), until the situation was so bad that the JVM simply had to exit.
The catch Throwable was causing downtime by preventing an imminent restart of the JVM due to an OOME.
- I know that an uncaught exception kills only the specific thread. Does the JVM treat an Error differently? Put another way: if the OOME is not caught, will the entire JVM die, or only the specific thread? I assume the answer is the entire JVM; maybe this is implemented by the JVM itself, or maybe it’s implemented somewhere in the WAS bedrock. If for some reason that’s not the case, one could catch the Error and then execute System.exit(1) in order to hasten the process’s imminent death.
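To make that fail-fast policy explicit rather than relying on the platform, the wrapper could be sketched like this (FailFastGuard is a hypothetical name, and whether to exit on every Error is a design decision for your deployment):

```java
// Catches Exception to keep the pool thread alive, but treats any Error
// (such as OutOfMemoryError) as fatal and exits, so the watchdog can
// restart a fresh, healthy JVM.
class FailFastGuard implements Runnable {
    private final Runnable delegate;

    FailFastGuard(Runnable delegate) {
        this.delegate = delegate;
    }

    @Override
    public void run() {
        try {
            delegate.run();
        } catch (Exception e) {
            // Survivable: log and let the thread serve the next task.
            System.err.println("Task failed, thread survives: " + e);
        } catch (Error err) {
            // Fatal: the JVM can no longer be trusted; die fast and be reborn.
            System.err.println("Fatal error, exiting for a clean restart: " + err);
            System.exit(1);
        }
    }
}
```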