It's All Relative

Performance Minded

15 Nov 2012

20 mins read

The useful JVM options

This post relates to JDK6 at the time of writing

HotSpot JVM has a lot of options available. Maybe too much. Sometimes we are looking for the right options, or the “magic” one that can give you a serious boost on your application. Unfortunately, I think it does not exist ! However some can help you for monitoring your application or for tuning some parts.

To find the complete list of options you will find in the globals.hpp file from OpenJDK sources. VM Options Explorer however can help you to navigate through the list.

I have summed up here, in my humble opinion, some of the most useful JVM options.

Heap sizing

You perfectly know -Xms & -Xms options (that can also abbreviate by -ms -mx), but some parts of the Java heap and non-heap can also be sized:

n without unit is expressed in bytes, you can also used k, K, m, M, g & G to expressed respectively kilobytes, megabytes & gigabytes. If NewSize < MaxNewSize, young generation size can be adjust during application life. However resizing does require a FullGC. To avoid this, set the same value for both options.

n without unit is expressed in bytes, you can also used k, K, m, M, g & G to expressed respectively kilobytes, megabytes & gigabytes. If PermSize < MaxPermSize, permanent generation size can be adjust during application life. However resizing does require a FullGC. To avoid this, set the same value for both options.

n without unit is expressed in bytes, you can also used k, K, m, M, g & G to expressed respectively kilobytes, megabytes & gigabytes. Code Cache stores the JITed code. This is an off-heap space, so GC does not reclaim it. If you reached the limit of the ReservedCodeCacheSize, the JIT compiler stops to compile more methods since it cannot store them. So if you have a lot of classes/methods that are needed to be compiled, be aware of these options. When you reach the limit, a warning is emitted on the standard output:

Java HotSpot(TM) Server VM warning: CodeCache is full. Compiler has been disabled"

with -XX:+PrintCompilation you will get also:

7383 COMPILE SKIPPED: code cache is full

GC Settings/Tuning

For configuring GC, there is tons of options, specially for the CMS! But at the end few are really usefull. I will not enter into fine tuning of CMS.

GC Monitoring

For an application, it is absolutely required to log GC activity in a separate file in order to diagnostic any abnormal behavior of memory management. Also note that activating this info has virtually no impact on your application, so there is no excuse to not activate these options.

With only this option you will get the following content:

8.567: [GC 16448K->168K(62848K), 0.0014256 secs]
16.979: [GC 16616K->152K(62848K), 0.0007727 secs]
25.157: [GC 16600K->152K(62848K), 0.0007124 secs]
33.881: [GC 16600K->136K(62848K), 0.0003721 secs]
43.032: [GC 16584K->152K(59968K), 0.0004123 secs]
50.795: [GC 16216K->152K(59584K), 0.0009683 secs]
58.584: [GC 15832K->148K(59136K), 0.0008669 secs]
66.430: [GC 15508K->148K(58816K), 0.0003436 secs]
73.766: [GC 15188K->148K(58688K), 0.0002917 secs]
81.082: [GC 14868K->148K(58176K), 0.0003157 secs]
88.851: [GC 14548K->148K(58048K), 0.0003148 secs]
96.379: [GC 14228K->148K(57536K), 0.0003129 secs]
101.618: [GC 13908K->704K(57472K), 0.0032532 secs]
102.684: [GC 3206K->744K(57280K), 0.0034611 secs]
102.687: [Full GC 744K->723K(57280K), 0.0525762 secs]
109.313: [GC 14227K->755K(57344K), 0.0005165 secs]
115.905: [GC 14003K->779K(56768K), 0.0006688 secs] 

With this option you will get the following content:

9.053: [GC [PSYoungGen: 16448K->168K(19136K)] 16448K->168K(62848K), 0.0014471 secs] [Times: user=0.03 sys=0.00, real=0.00 secs]
16.991: [GC [PSYoungGen: 16616K->152K(19136K)] 16616K->152K(62848K), 0.0008244 secs] [Times: user=0.02 sys=0.00, real=0.00 secs]
24.965: [GC [PSYoungGen: 16600K->152K(19136K)] 16600K->152K(62848K), 0.0006850 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
32.915: [GC [PSYoungGen: 16600K->152K(19136K)] 16600K->152K(62848K), 0.0007040 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
40.857: [GC [PSYoungGen: 16600K->152K(16256K)] 16600K->152K(59968K), 0.0007618 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
48.653: [GC [PSYoungGen: 16216K->152K(15872K)] 16216K->152K(59584K), 0.0007632 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
53.896: [GC [PSYoungGen: 15832K->192K(15552K)] 15832K->620K(59264K), 0.0131835 secs] [Times: user=0.03 sys=0.00, real=0.01 secs]
55.252: [GC [PSYoungGen: 3888K->64K(15424K)] 4317K->716K(59136K), 0.0025925 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
55.255: [Full GC (System) [PSYoungGen: 64K->0K(15424K)] [PSOldGen: 652K->715K(43712K)] 716K->715K(59136K) [PSPermGen: 5319K->5319K(16384K)], 0.0269456 secs] [Times: user=0.03 sys=0.00, real=0.03 secs]
62.544: [GC [PSYoungGen: 15360K->32K(15488K)] 16075K->747K(59200K), 0.0006252 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
69.502: [GC [PSYoungGen: 15072K->32K(14784K)] 15787K->747K(58496K), 0.0003713 secs] [Times: user=0.00 sys=0.00, real=0.00 secs] 

As you can see there is a lot more and valuable info. You can now know the type of GC configured for Young & Old (PSYoungGen & PSOlgGen = ParallelScavenge, ParallelGC), Full GC is triggered by an explicit call to System.GC() and you have the detail of memory usage of young gen & old gen so you can deduce promotion between the two. Also with CMS you have all the CMS phases that are logged with Initial Mark pause time & Remark pause time. This is a MUST HAVE option: No JVM should start without this option.

With this option you will get the following content:

2012-11-08T22:31:12.310+0100: 8.250: [GC [PSYoungGen: 16448K->168K(19136K)] 16448K->168K(62848K), 0.0007445 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
2012-11-08T22:31:20.526+0100: 16.466: [GC [PSYoungGen: 16616K->136K(19136K)] 16616K->136K(62848K), 0.0008255 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
2012-11-08T22:31:28.503+0100: 24.444: [GC [PSYoungGen: 16584K->152K(19136K)] 16584K->152K(62848K), 0.0007196 secs] [Times: user=0.02 sys=0.00, real=0.00 secs]
2012-11-08T22:31:36.497+0100: 32.439: [GC [PSYoungGen: 16600K->152K(19136K)] 16600K->152K(62848K), 0.0007185 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
2012-11-08T22:31:44.565+0100: 40.507: [GC [PSYoungGen: 16600K->152K(16256K)] 16600K->152K(59968K), 0.0007928 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
2012-11-08T22:31:52.694+0100: 48.637: [GC [PSYoungGen: 16216K->152K(15872K)] 16216K->152K(59584K), 0.0007599 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
2012-11-08T22:31:58.204+0100: 54.147: [GC [PSYoungGen: 15832K->192K(15552K)] 15832K->620K(59264K), 0.0072012 secs] [Times: user=0.03 sys=0.00, real=0.01 secs]
2012-11-08T22:31:59.999+0100: 55.942: [GC [PSYoungGen: 4932K->64K(15424K)] 5360K->716K(59136K), 0.0015449 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
2012-11-08T22:32:00.000+0100: 55.944: [Full GC (System) [PSYoungGen: 64K->0K(15424K)] [PSOldGen: 652K->716K(43712K)] 716K->716K(59136K) [PSPermGen: 5311K->5311K(16384K)], 0.0273445 secs] [Times: user=0.03 sys=0.00, real=0.03 secs]
2012-11-08T22:32:07.453+0100: 63.397: [GC [PSYoungGen: 15360K->32K(15488K)] 16076K->748K(59200K), 0.0003439 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
2012-11-08T22:32:14.695+0100: 70.640: [GC [PSYoungGen: 15072K->32K(14784K)] 15788K->748K(58496K), 0.0007333 secs]

To remove elapsed time since JVM’s start you can use -XX:-PrintGCTimeStamps which is activated by default with -Xloggc

With this option you will get the following content:

8.494: [GC8.496: [SoftReference, 0 refs, 0.0000084 secs]8.496: [WeakReference, 3 refs, 0.0000061 secs]8.496: [FinalReference, 6 refs, 0.0000117 secs]8.496: [PhantomReference, 0 refs, 0.0000042 secs]8.496: [JNI Weak Reference, 0.0000034 secs] [PSYoungGen: 16448K->168K(19136K)] 16448K->168K(62848K), 0.0017036 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
16.874: [GC16.874: [SoftReference, 0 refs, 0.0000050 secs]16.874: [WeakReference, 1 refs, 0.0000034 secs]16.874: [FinalReference, 4 refs, 0.0000036 secs]16.874: [PhantomReference, 0 refs, 0.0000028 secs]16.874: [JNI Weak Reference, 0.0000028 secs] [PSYoungGen: 16616K->152K(19136K)] 16616K->152K(62848K), 0.0005199 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
24.889: [GC24.890: [SoftReference, 0 refs, 0.0000078 secs]24.890: [WeakReference, 1 refs, 0.0000053 secs]24.890: [FinalReference, 4 refs, 0.0000056 secs]24.890: [PhantomReference, 0 refs, 0.0000039 secs]24.890: [JNI Weak Reference, 0.0000036 secs] [PSYoungGen: 16600K->152K(19136K)] 16600K->152K(62848K), 0.0008915 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
32.829: [GC32.830: [SoftReference, 0 refs, 0.0000084 secs]32.830: [WeakReference, 1 refs, 0.0000050 secs]32.830: [FinalReference, 4 refs, 0.0000056 secs]32.830: [PhantomReference, 0 refs, 0.0000045 secs]32.830: [JNI Weak Reference, 0.0000045 secs] [PSYoungGen: 16600K->152K(19136K)] 16600K->152K(62848K), 0.0009845 secs]

Helps to troubleshoot issues with special references like Weak, Soft, Phantom & JNI.

With this option you will get the following content:

8.654: [GC 8.655: [ParNew
Desired survivor size 1114112 bytes, new threshold 15 (max 15)
- age   1:     187568 bytes,     187568 total
: 17472K->190K(19648K), 0.0022117 secs] 17472K->190K(63360K), 0.0023006 secs] [Times: user=0.03 sys=0.00, real=0.00 secs]
17.087: [GC 17.087: [ParNew
Desired survivor size 1114112 bytes, new threshold 15 (max 15)
- age   1:      80752 bytes,      80752 total
- age   2:     166752 bytes,     247504 total
: 17662K->286K(19648K), 0.0036440 secs] 17662K->286K(63360K), 0.0037619 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
25.491: [GC 25.491: [ParNew
Desired survivor size 1114112 bytes, new threshold 15 (max 15)
- age   1:     104008 bytes,     104008 total
- age   2:      34256 bytes,     138264 total
- age   3:     166752 bytes,     305016 total
: 17758K->321K(19648K), 0.0035331 secs] 17758K->321K(63360K), 0.0036424 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
34.745: [GC 34.746: [ParNew
Desired survivor size 1114112 bytes, new threshold 15 (max 15)
- age   1:      34400 bytes,      34400 total
- age   2:     104008 bytes,     138408 total
- age   3:      34256 bytes,     172664 total
- age   4:     166752 bytes,     339416 total
: 17793K->444K(19648K), 0.0040891 secs] 17793K->444K(63360K), 0.0042000 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
43.215: [GC 43.215: [ParNew
Desired survivor size 1114112 bytes, new threshold 15 (max 15)
- age   1:     138808 bytes,     138808 total
- age   2:      34400 bytes,     173208 total
- age   3:      34264 bytes,     207472 total
- age   4:      34256 bytes,     241728 total
- age   5:     166752 bytes,     408480 total
: 17916K->586K(19648K), 0.0048752 secs] 17916K->586K(63360K), 0.0049889 secs]

This is the age distribution of the object related to the number of time they are survived from a minor GC (contained in Survivor spaces)

With this option you will get the following content when application terminates on stdout:

         vmop                      [threads: total initially_running wait_to_block] [time: spin block sync cleanup vmop] page_trap_count
4.162: EnableBiasedLocking         [       8          0              0]      [0     0     0     0     0]  0   
8.466: ParallelGCFailedAllocation  [       8          0              0]      [0     0     0     0     2]  0   
23.927: ParallelGCFailedAllocation [       8          0              0]      [0     0     0     0     2]  0   
31.039: RevokeBias                 [       9          0              1]      [0     0     0     0     0]  0   
31.377: RevokeBias                 [      10          0              1]      [0     0     0     0     0]  0   
31.388: RevokeBias                 [      11          1              1]      [2     0     2     0     0]  0   
31.390: ParallelGCFailedAllocation [      11          0              0]      [0     0     0     0     4]  0   
31.452: RevokeBias                 [      12          0              1]      [0     0     0     0     0]  0   
31.487: BulkRevokeBias             [      12          0              1]      [0     0     0     0     0]  0   
31.515: RevokeBias                 [      12          0              1]      [0     0     0     0     0]  0   
31.550: BulkRevokeBias             [      12          0              0]      [0     0     0     0     0]  0   
32.572: ThreadDump                 [      12          0              1]      [0     0     0     0     0]  0   
38.572: ThreadDump                 [      13          0              2]      [0     0     0     0     0]  0   
39.042: ParallelGCFailedAllocation [      13          0              0]      [0     0     0     0     6]  0   
39.575: ThreadDump                 [      13          0              1]      [0     0     0     0     0]  0   
39.775: ParallelGCSystemGC         [      13          0              0]      [0     0     0     0    35]  0   
40.574: ThreadDump                 [      13          0              1]      [0     0     0     0     0]  0   
45.519: HeapDumper                 [      13          0              0]      [0     0     0     0    93]  0   
45.626: ThreadDump                 [      13          0              1]      [0     0     0     0     0]  0   
50.575: no vm operation            [      13          0              0]      [0     0     0     0     0]  0   
50.576: ThreadDump                 [      13          0              1]      [0     0     0     0     0]  0   
50.875: ThreadDump                 [      13          0              1]      [0     0     0     0     0]  0   
50.881: GC_HeapInspection          [      13          0              1]      [0     0     0     0     9]  0   
51.577: ThreadDump                 [      13          0              0]      [0     0     0     0     0]  0   
82.281: ParallelGCFailedAllocation [      13          0              0]      [0     0     0     0     4]  0   
82.578: ThreadDump                 [      13          0              1]      [0     0     0     0     0]  0   
88.578: ThreadDump                 [      13          0              0]      [0     0     0     0     0]  0   
88.999: ParallelGCFailedAllocation [      13          0              0]      [0     0     0     0     4]  0   
94.580: ThreadDump                 [      13          0              0]      [0     0     0     0     0]  0   
95.439: ParallelGCFailedAllocation [      13          0              0]      [0     0     0     0     2]  0   
95.579: ThreadDump                 [      13          0              0]      [0     0     0     0     0]  0   
101.579: ThreadDump                [      13          0              0]      [0     0     0     0     0]  0   
102.276: ParallelGCFailedAllocation[      13          0              0]      [0     0     0     0     5]  0   
102.579: ThreadDump                [      13          0              0]      [0     0     0     0     0]  0   
133.776: no vm operation           [      12          0              1]      [0     0     0     0   332]  0   

Polling page always armed
ThreadDump                       103
HeapDumper                         1
GC_HeapInspection                  1
ParallelGCFailedAllocation        16
ParallelGCSystemGC                 1
EnableBiasedLocking                1
RevokeBias                        57
BulkRevokeBias                     2
    0 VM operations coalesced during safepoint
Maximum sync time      2 ms
Maximum vm operation time (except for Exit VM operation)     93 ms

These stats indicate which vm operation was performed with time taken. It means that during vm operation all threads are stopped (like for STW GC)

Monitoring

Some options are usefull for monitoring the JVM behavior and that are not related to GC.

With this option you will get the following content on stdout:

  1       java.lang.Object::<init> (1 bytes)
---   n   java.lang.System::currentTimeMillis (static)
  2       java.util.ArrayList::add (29 bytes)
  3       java.util.ArrayList::ensureCapacity (58 bytes)
---   n   java.lang.Thread::sleep (static)
  1%      com.bempel.sandbox.TestJIT::allocate @ 7 (84 bytes)
  4       com.bempel.sandbox.TestJIT::call (10 bytes)
  5       java.util.ArrayList::contains (14 bytes)
  6       java.util.ArrayList::indexOf (67 bytes)
  4   made not entrant  (2)  com.bempel.sandbox.TestJIT::call (10 bytes)
  7       com.bempel.sandbox.TestJIT::call (10 bytes)
  8       java.util.concurrent.CopyOnWriteArrayList::contains (22 bytes)
  9       java.util.concurrent.CopyOnWriteArrayList::indexOf (63 bytes)
  2%      com.bempel.sandbox.TestJIT::main @ 60 (83 bytes)

``` Can be usefull to understand why some method are deoptimized.

  • -XX:+LogVMOutput (requires -XX:+UnlockDiagnosticVMOptions) Logs all the vm output (like PrintCompilation to the default hotspot.log file in current directory.

  • -XX:LogFile=file (requires -XX:+UnlockDiagnosticVMOptions) Specifies the path and name for hotpot.log file
  • -XX:+PrintAssembly (requires -XX:+UnlockDiagnosticVMOptions) Prints disassembly of method compiled by JIT compiler. See my post.
  • -XX:ErrorFile=file Specifies where the hs_err_<pid>.log file will be generated in case of a JVM crash.
  • -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=1234 -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false Activates JMX remote connections to be able to connect with VisualVM.

JIT Compiler

These options control the JIT compiler behavior:

  • -XX:CICompilerCount=n Specifies number of JIT compiler threads used to compile methods
  • -XX:CompileThreshold=n Specifies number of call a method is considered as a HotSpot and JIT compiler compiles it.
  • -Xcomp Compiles method first time they are executed. (CompileThreshold=1)
  • -Xbatch Compiles methods in foreground, no execution is stopped until compilation is done.
  • -Xint Interpreted mode only. JIT compiler deactivated. (CompileThreshold=0)

Optimizations

Some options enables or disables certain kind of JIT optimizations

  • -XX:+UseCompressedStrings Handles String objects with byte array instead of char array. Can reduce the size of consumed Heap. Beware of side effects! Disabled by default.
  • -XX:+UseBiasedLocking Enables Biased Locking. Allow to execute code including synchronized block with virtually no overhead if there is no contention on it (only one thread execution the block). However if another thread revoke the Bias, this can be costly (Stop-The-World/safepoint during revoking). Enabled by default. -XX:+TraceBiasedLocking Gives extra information about BiasedLocking optimization. Disabled by default
  • -XX:+PrintBiasedLockingStatistics Gives extra information about BiasedLocking optimization. Disabled by default
  • -XX:+Inline Enables call inlining. Could be disabled for inspecting disassembly and virtual call. Enabled by default.
  • -XX:+AggressiveOpts Enables some new optimizations that are not enabled by default. If optimizations are good they will be enabled in a next release of JDK.