I stumbled across GChisto the other day when looking for some garbage collection analysis tools for Java. Anyone who’s tried tuning Java GC (especially on older JVM’s) knows that it’s about as much fun as watching paint dry. However, many times you can seriously improve performance by taking the time to analyze what your GC’s are doing. GChisto attempts to give you a birds-eye graphical view of your GC operations, and has a very handy comparison feature that allows you to compare the differences between GC logs. A picture is worth a thousand words, so here’s a teaser:
Installation of GChisto
These instructions are tailored for Ubuntu, but should work for any distro. As long as you can do CVS on Windows, these instructions apply there as well. Also note that this overview is focusing only on how GChisto works with GC logs generated by the CMS collector. It works with all types, but the CMS type is all that is covered here. To run GChisto, you’ll need cvs, Java 6, and Ant. While GChisto requires Java 6, it can read GC logs from Java 1.4.2, 5, and 6. On Ubuntu run the following to make sure everything’s installed:
sudo apt-get install cvs sun-java6-jdk ant
First, you need to sign up for a Java.net account. The project doesn’t have any releases, so you need to have an account so you can checkout CVS. Click here to register for an account. Once you’ve registered, we need to checkout the source code. We’ll install it into a directory just under our home directory:
cd ~ cvs -d :pserver:email@example.com:/cvs login cvs -z9 -d :pserver:firstname.lastname@example.org:/cvs checkout gchisto
Enable verbose GC logging in your java app.
This all depends on your particular application, but you need to add the following java command line options to your application startup:
-Xloggc:before-gc.out -verbose:gc -XX:+PrintGCTimeStamps -XX:+PrintGCDetails
This will create a logfile named before-gc.out in the directory from which you start your application that contains all of your GC information. For illustration purposes, let’s assume you make some GC tuning optimizations, and you want to compare before and after. Before starting the app after making your changes, change the before-gc.out to after-gc.out in the above command line options.
Now, we just need to run the appropriately named ant target ‘run’:
cd gchisto ant run
If that doesn’t work, try prepending the JAVA_HOME variable to ant like so:
JAVA_HOME=/path/to/java6/dir ant run
Once ant does it’s thing, you should be presented with the GChisto GUI. Let’s load our logs into it. You should be presented with the “Trace Management” tab, if not, select it. Now let’s load some traces (logs). Make sure that “HotSpot GC Log” is selected from the dropdown box at the lower left corner, and click the “Add” button. Browse to the location of before-gc.out, and load it. Repeat the exact same process for after-gc.out. When done, you should have a window that looks similar to this: Now, let’s start doing some digging around. The first place to visit is the “GC Pause Stats” tab. No pretty graphs right away, but once you understand the numbers being presented to you on the “All GC Stats” page, you will know what the graphs are telling you on the other tabs. First, since we are comparing two files from the same application, place a checkmark in the box “Comparison” in the lower left hand corner. Let’s explain what these numbers are. First, the rows are diveded into these sections:
- All: Totals for all types of GC’s listed below.
- Young GC: Metrics concerning the Eden/New/Young generation of heap.
- Full GC: Metrics concerning any Full GC’s. If you care about response time in your application, avoid these at all costs. For us, even one full GC is regarded as a tuning failure.
- Initial Mark: Metrics concerning the Initial Mark phase of the CMS collector on the Old/Tenured generation of heap. With CMS, this is a ‘stop-the-world’ phase, meaning the entire application is paused for the duration of this phase. In all the cases that I have worked with, this pause is usually very quick, usually faster than GC'ing Eden. Keep an eye on this, but there’s a good chance you won’t have to tune around this one.
Remark: Metrics concerning Remark phase of the CMS collector on the Old/Tenured generation of heap. With CMS, this is a stop-the-world collection phase, meaning that the entire application is stopped during this phase - you want this to be as quick as possible. For our application, it’s the primary focus of our tuning. Each section has three rows:
File : before-gc.out: Metrics from our first file.
- File : after-gc.out: Metrics from our second file.
[Empty]: Comparison statistics. This line shows you the difference between this trace and the first trace in the list. Finally, the breakdown on the columns is as follows:
Num: Total number of occurrences.
- Num (%): Gives you the percent of occurrences - i.e, Young GC’s are > 99% of all GC’s.
- Total GC (sec): Number of seconds total spent in this GC type for the life of the VM.
- Total GC (%): Percentage of time spent on this type of GC.
- Overhead(%): Thanks to the author, Tony Printezis for clearing this one up. Overhead percentage is the percentage of total runtime in seconds (as defined by the last line of the logfile) that is spent in ‘stop-the-world’ garbage collection. Obviously, the lower the better.
- Avg (ms): The average in milliseconds each GC of this type took. This is a very important metric.
- Sigma (ms): The standard deviation between the GC’s. Smaller is better.
- Min(ms): The fastest GC of this type in milliseconds.
- Max(ms): The longest GC of this type in milliseconds. To me, this is the most important metric. Now that you know what these numbers mean, go ahead and click around the various subtabs under the “GC Pause Stats” tab. My personal favorites are “Chart:Max (ms)” and “Chart: Avg (ms)”. Next, we’ll examine the “GC Pause Distribution” tab. Everything here is pretty well self explanatory. Instead of examining every type of GC at once, I usually deselect everything except for one type of GC, and compare those two. Your “Young GC” distrubution should normally be a very steep bell curve, like so: Personally, I don’t use the distribution tab much. I check it for anomalies, but that’s about it. Now, onto the “GC Timeline” tab. The “GC Timeline” tab has the same type of layout as the previous tab, but this one presents to you the frequency and length of each type of GC over the life of the VM. The one thing that I don’t like about this graph is that it doesn’t display a time and date along the X axis, it displays the number of seconds since VM startup. I’ve requested a feature that will convert the lifetime in seconds to a real date/time stamp if you optionally supply the VM startup time on trace file load. Hopefully the devs will implement that one. Anyways, the timeline is very good at showing you if your GC’s are getting longer or more frequent as time goes on. Here’s what the Young GC timeline looks like for me: Now, most of the charting in GChisto is provided by JFreeChart, so you get quite a bit of built in functionality here. The best feature is the ability to drill down on any chart. Simply left click and drag to form a rectangle on any existing chart. When you release the mouse, the chart will zoom into the subset you just gave it. To zoom back out, right click the chart, select “Auto Range->Both Axes”. While you’re at it, dig around the right-click menu. In the properties, you can change colors, axis titles, about anything else you can think of. Another very handy feature is the ability to save any chart as a PNG file. This helps when showing the metrics to people that don’t have GChisto installed and available. I hope this introduction gives you some insight into the power of GChisto. It is definitely going to become permanent in my toolbox of GC utilities. VisualGC is still my favorite real-time GC monitoring solution, but GChisto is now my favorite historical reporting tool, taking the place of PrintGCStats.