It's All Relative

Performance Minded

30 Dec 2015

PrintAssembly output explained!

If you are a regular reader of my blog, you may have noticed that I am (ab)using the JVM’s PrintAssembly options to examine the code generated by the JIT compiler. It helps me a lot to understand how my code is executed, and how the JIT compiler works and optimizes the code. Even though I use JMH from time to time, I am not a big fan of benchmarking, and especially micro-benchmarking.

Why? Because micro-benchmarking is an idealization of how the production code is executed: tight loops, all data hot in the L1 cache, few branch misses, and the best case for aggressive JIT optimizations (monomorphic calls, etc.).

The thing is, the execution context in production is totally different from that of micro-benchmarks, so what’s the point of exercising code that will not be executed under the same conditions? Are the conclusions that I can draw from the micro-benchmark still valid or still beneficial for my production cases? All of this pushes me away from micro-benchmarks as much as possible and toward other ways of evaluating performance, such as performance counters inserted directly into the application or reading the assembly generated by the JIT compiler. Note that reading assembly is not perfect either, as modern CPUs execute out of order and exploit instruction-level parallelism. So in some situations benchmarking is the only way to assess performance.

Printing assembly also helps me back up assertions about how the JIT optimizes, instead of relying on folklore and urban legends (reordering of instructions, memory barriers, …).

With all of that, PrintAssembly is one of my favorite tools. But I can understand that its output may be difficult to read. Unfortunately, not all developers are familiar with assembly nowadays, but with some basic knowledge and the help of the inserted comments, it becomes less cryptic.

For those who have never used PrintAssembly, please refer to my previous posts about it: How to print dissassembly from JIT code and How to build hsdis-amd64.dll. Chris Newland, creator of the JITWatch tool, also has some useful tips for Mac OS X. Nitsan Wakart wrote an article on this as well.

Is your setup done? Perfect, let’s read some assembly, yeah!

Assembly 101

First of all, I am using the Intel syntax, not the AT&T one. I am used to this syntax, and since we are talking about the x86 instruction set made by Intel, let’s stick to their convention. Reminder: to get this syntax with the disassembler plugin, use the JVM option: -XX:PrintAssemblyOptions=intel
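To have something to disassemble, a small driver like the one below can be used. It is only a sketch: the class name AddDriver and the loop counts are my assumptions, chosen so that ArrayList.add gets called often enough to become hot. The flags in the comment are standard HotSpot diagnostic options, and the hsdis plugin must be installed for any assembly to appear.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical driver (AddDriver is an illustrative name, not from the original post).
// After compiling, run it with something like:
//   java -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly \
//        -XX:PrintAssemblyOptions=intel AddDriver
// Adding -XX:CompileCommand=print,java/util/ArrayList.add should limit the output
// to that single method.
final class AddDriver {

    // Calls ArrayList.add repeatedly so the JIT compiler considers it hot.
    static int fill(int rounds, int elementsPerRound) {
        int total = 0;
        for (int r = 0; r < rounds; r++) {
            List<Object> list = new ArrayList<>();
            for (int i = 0; i < elementsPerRound; i++) {
                list.add(Integer.valueOf(i));
            }
            total += list.size();
        }
        return total;
    }

    public static void main(String[] args) {
        // The loop counts are arbitrary; they only need to be large enough to trigger compilation.
        System.out.println(fill(20_000, 100));
    }
}
```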

Instruction lines are decomposed as follows:

mnemonic parameters

mnemonic is the instruction name (mov, call, add, jmp, cmp, …)

parameters can be registers, memory accesses, or immediate values

Examples:

mov rax, 0x2A
mov rdx, QWORD PTR [rbx+0x571c418]

The mov instruction moves data. The first line moves the constant value 0x2A into the register rax. The second line moves the memory content at the address computed from the value of register rbx plus the constant 0x571c418 into the register rdx. Note that the operand order is reversed in AT&T syntax.
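To make the two forms concrete, here is a toy model in Java of what these mov lines do. It is not how a CPU is implemented: the MovSketch class, the maps standing in for registers and memory, and the sample values placed in rbx and in memory are all assumptions for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model: registers and memory are plain maps (an illustrative assumption).
final class MovSketch {
    final Map<String, Long> registers = new HashMap<>();
    final Map<Long, Long> memory = new HashMap<>();

    // mov dst, imm : copy a constant into a register
    void movImmediate(String dst, long value) {
        registers.put(dst, value);
    }

    // mov dst, QWORD PTR [base+displacement] : load from a computed address
    void movLoad(String dst, String base, long displacement) {
        long address = registers.getOrDefault(base, 0L) + displacement;
        registers.put(dst, memory.getOrDefault(address, 0L));
    }

    static long demo() {
        MovSketch cpu = new MovSketch();
        cpu.movImmediate("rax", 0x2A);               // mov rax, 0x2A
        cpu.registers.put("rbx", 0x1000L);           // made-up value for rbx
        cpu.memory.put(0x1000L + 0x571c418L, 99L);   // made-up memory content
        cpu.movLoad("rdx", "rbx", 0x571c418L);       // mov rdx, QWORD PTR [rbx+0x571c418]
        return cpu.registers.get("rax") + cpu.registers.get("rdx");
    }
}
```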

push/pop instructions move data to/from the stack

add/sub/imul/idiv instructions perform addition/subtraction/multiplication/division on integers

inc/dec instructions increment/decrement a value in a register or in memory

and/or/xor/not/shl/shr instructions perform bitwise operations

jmp instruction performs an unconditional jump to the specified address

jxx instructions perform a conditional jump based on the flags set by the last related operation

cmp instruction performs a comparison between 2 operands

call/ret instructions perform a call to/return from a subroutine
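As an example of a cmp followed by a conditional jump, the ArrayList.add listing analyzed below contains cmp r9d,r10d followed by jae, which looks like the array bounds check. jae ("jump if above or equal") compares the operands as unsigned values, so one jump appears to cover both a negative index and an index past the end. A minimal Java sketch of that condition, with a hypothetical class and method name:

```java
// Models "cmp index, length" followed by "jae outOfBounds".
// The names are illustrative; only the unsigned comparison is the point.
final class BoundsCheckSketch {

    // jae treats both operands as unsigned, so a negative index
    // is seen as a very large value and also takes the jump.
    static boolean jumpsToOutOfBounds(int index, int length) {
        return Integer.compareUnsigned(index, length) >= 0;
    }
}
```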

For more information see this guide for example or the official Intel documentation.

Disassembler comments

Fortunately, the disassembler plugin does not spit out raw instructions but annotates them with useful information. Let’s take an example with the method ArrayList.add(Object) and analyze it:

        public boolean add(E e) {
L458        ensureCapacityInternal(size + 1);  // Increments modCount!!
L459        elementData[size++] = e;
            return true;
        }
  # {method} {0x000000000ac640f8} 'add' '(Ljava/lang/Object;)Z' in 'java/util/ArrayList'
  # this:     rdx:rdx   = 'java/util/ArrayList'
  # parm0:    r8:r8     = 'java/lang/Object'
  #           [sp+0x40]  (sp of caller)
  0x0000000002d2a760: mov    r10d,DWORD PTR [rdx+0x8]
  0x0000000002d2a764: shl    r10,0x3
  0x0000000002d2a768: cmp    rax,r10
  0x0000000002d2a76b: jne    0x0000000002cf5f60  ;   {runtime_call}
  0x0000000002d2a771: data32 xchg ax,ax
  0x0000000002d2a774: nop    DWORD PTR [rax+rax*1+0x0]
  0x0000000002d2a77c: data32 data32 xchg ax,ax
[Verified Entry Point]
  0x0000000002d2a780: mov    DWORD PTR [rsp-0x6000],eax
  0x0000000002d2a787: push   rbp
  0x0000000002d2a788: sub    rsp,0x30           ;*synchronization entry
                                                ; - java.util.ArrayList::add@-1 (line 458)

  0x0000000002d2a78c: mov    QWORD PTR [rsp+0x8],r8
  0x0000000002d2a791: mov    rbp,rdx
  0x0000000002d2a794: mov    r9d,DWORD PTR [rdx+0x10]  ;*getfield size
                                                ; - java.util.ArrayList::add@2 (line 458)

  0x0000000002d2a798: mov    r11d,DWORD PTR [rdx+0x14]
                                                ;*getfield elementData
                                                ; - java.util.ArrayList::ensureCapacityInternal@1 (line 223)
                                                ; - java.util.ArrayList::add@7 (line 458)

  0x0000000002d2a79c: mov    r8d,r9d
  0x0000000002d2a79f: inc    r8d                ;*iadd
                                                ; - java.util.ArrayList::add@6 (line 458)

  0x0000000002d2a7a2: cmp    r11d,0xd5d0e088    ;   {oop(a 'java/lang/Object'[0] )}
  0x0000000002d2a7a9: je     0x0000000002d2a861  ;*if_acmpne
                                                ; - java.util.ArrayList::ensureCapacityInternal@7 (line 223)
                                                ; - java.util.ArrayList::add@7 (line 458)

  0x0000000002d2a7af: inc    DWORD PTR [rdx+0xc]  ;*putfield modCount
                                                ; - java.util.ArrayList::ensureExplicitCapacity@7 (line 231)
                                                ; - java.util.ArrayList::ensureCapacityInternal@19 (line 227)
                                                ; - java.util.ArrayList::add@7 (line 458)

  0x0000000002d2a7b2: mov    r10d,DWORD PTR [r11+0xc]  ; implicit exception: dispatches to 0x0000000002d2a881
  0x0000000002d2a7b6: mov    ebx,r9d
  0x0000000002d2a7b9: sub    ebx,r10d
  0x0000000002d2a7bc: inc    ebx
  0x0000000002d2a7be: test   ebx,ebx
  0x0000000002d2a7c0: jg     0x0000000002d2a80e  ;*ifle
                                                ; - java.util.ArrayList::ensureExplicitCapacity@17 (line 234)
                                                ; - java.util.ArrayList::ensureCapacityInternal@19 (line 227)
                                                ; - java.util.ArrayList::add@7 (line 458)

  0x0000000002d2a7c2: mov    DWORD PTR [rdx+0x10],r8d  ;*return
                                                ; - java.util.ArrayList::ensureExplicitCapacity@25 (line 236)
                                                ; - java.util.ArrayList::ensureCapacityInternal@19 (line 227)
                                                ; - java.util.ArrayList::add@7 (line 458)

  0x0000000002d2a7c6: mov    r10d,DWORD PTR [r11+0xc]
  0x0000000002d2a7ca: cmp    r9d,r10d
  0x0000000002d2a7cd: jae    0x0000000002d2a839
  0x0000000002d2a7cf: mov    r10d,DWORD PTR [r11+0x8]
  0x0000000002d2a7d3: cmp    r10d,0x200022ee    ;   {metadata('java/lang/Object'[])}
  0x0000000002d2a7da: jne    0x0000000002d2a84d  ;*aastore
                                                ; - java.util.ArrayList::add@26 (line 459)

  0x0000000002d2a7dc: mov    r10,r11            ;*getfield elementData
                                                ; - java.util.ArrayList::add@11 (line 459)

  0x0000000002d2a7df: lea    r10,[r11+r9*4+0x10]
  0x0000000002d2a7e4: mov    r11,QWORD PTR [rsp+0x8]
  0x0000000002d2a7e9: mov    r8,r11
  0x0000000002d2a7ec: mov    DWORD PTR [r10],r8d
  0x0000000002d2a7ef: shr    r10,0x9
  0x0000000002d2a7f3: mov    eax,0x1
  0x0000000002d2a7f8: mov    r11d,0x5965000
  0x0000000002d2a7fe: mov    BYTE PTR [r11+r10*1],r12b
                                                ;*synchronization entry
                                                ; - java.util.ArrayList::add@-1 (line 458)

  0x0000000002d2a802: add    rsp,0x30
  0x0000000002d2a806: pop    rbp
  0x0000000002d2a807: test   DWORD PTR [rip+0xfffffffffe5857f3],eax        # 0x00000000012b0000
                                                ;   {poll_return}
  0x0000000002d2a80d: ret                       ;*synchronization entry
                                                ; - java.util.ArrayList::ensureExplicitCapacity@-1 (line 231)
                                                ; - java.util.ArrayList::ensureCapacityInternal@19 (line 227)
                                                ; - java.util.ArrayList::add@7 (line 458)

In the header we can find the following information:

# {method} {0x000000000ac640f8} 'add' '(Ljava/lang/Object;)Z' in 'java/util/ArrayList'
# this:     rdx:rdx   = 'java/util/ArrayList'
# parm0:    r8:r8     = 'java/lang/Object'
#           [sp+0x40]  (sp of caller)

The first line gives the name of the disassembled method, 'add', with its signature: one parameter of type Object, returning a boolean, in the class java.util.ArrayList. But as this is an instance method, there are in fact 2 parameters, as mentioned in the rest of the header: the parameter this, which is stored in register rdx, and the Object parameter in register r8.
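The string '(Ljava/lang/Object;)Z' is a JVM method descriptor: L…; denotes a class type and Z denotes boolean. If the notation is unfamiliar, the standard MethodType API can decode it. The class name DescriptorSketch below is just an illustrative choice.

```java
import java.lang.invoke.MethodType;

// Decodes a JVM method descriptor with the standard java.lang.invoke API.
final class DescriptorSketch {

    static MethodType decode(String descriptor) {
        // A null class loader means the system class loader resolves the class names.
        return MethodType.fromMethodDescriptorString(descriptor, null);
    }

    public static void main(String[] args) {
        MethodType type = decode("(Ljava/lang/Object;)Z");
        System.out.println("parameters: " + type.parameterList());
        System.out.println("returns:    " + type.returnType());
    }
}
```

Note that the implicit this parameter is not part of the descriptor, which is why the header lists it separately.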

Verified Entry Point

  0x0000000002d2a774: nop    DWORD PTR [rax+rax*1+0x0]
  0x0000000002d2a77c: data32 data32 xchg ax,ax
[Verified Entry Point]

After the header, the first instructions of the method begin after the [Verified Entry Point] mark. The nop and xchg ax,ax instructions just before this mark are there for alignment (padding). From this point on, we will look at the comments that come after the semicolon. Comments starting with a star (*) indicate the associated bytecode.

Synchronization Entry

0x0000000002d2a780: mov    DWORD PTR [rsp-0x6000],eax
0x0000000002d2a787: push   rbp
0x0000000002d2a788: sub    rsp,0x30           ;*synchronization entry
                                              ; - java.util.ArrayList::add@-1 (line 458)

The following comment, ; - java.util.ArrayList::add@-1 (line 458), gives us the mapping to the Java code: class name, method name, bytecode offset within the method, and finally the line number in the original Java source file. For this prologue, as there is no specific bytecode associated, we get the offset -1. The first comment, ;*synchronization entry, indicates the prologue of the function: the instructions necessary to prepare the execution (stack allocation, stack banging, saving some registers, …).

Get size field

0x0000000002d2a78c: mov    QWORD PTR [rsp+0x8],r8
0x0000000002d2a791: mov    rbp,rdx
0x0000000002d2a794: mov    r9d,DWORD PTR [rdx+0x10]  ;*getfield size
                                              ; - java.util.ArrayList::add@2 (line 458)

The next comment marks the retrieval of the field named size from the current instance (ArrayList). It is translated to the following assembly line: mov r9d,DWORD PTR [rdx+0x10]. It moves into register r9d the content at the address rdx (the this instance, cf. the method parameters) + offset 0x10, where the size field is located.

Get elementData field

0x0000000002d2a798: mov    r11d,DWORD PTR [rdx+0x14]
                                  ;*getfield elementData
                                  ; - java.util.ArrayList::ensureCapacityInternal@1 (line 223)
                                  ; - java.util.ArrayList::add@7 (line 458)

The following comment is interesting because we have the same kind of bytecode, getfield, but the mapping to the Java code involves 2 methods: java.util.ArrayList::ensureCapacityInternal@1 (line 223) and java.util.ArrayList::add@7 (line 458). Implicitly, it means that the JIT has inlined the first method mentioned and that the bytecode comes from this method.

        private static final Object[] DEFAULTCAPACITY_EMPTY_ELEMENTDATA = {};
    
        private void ensureCapacityInternal(int minCapacity) {
L223        if (elementData == DEFAULTCAPACITY_EMPTY_ELEMENTDATA) {
                minCapacity = Math.max(DEFAULT_CAPACITY, minCapacity);
            }
            
L227        ensureExplicitCapacity(minCapacity);
        }

Empty array test

0x0000000002d2a7a2: cmp    r11d,0xd5d0e088    ;   {oop(a 'java/lang/Object'[0] )}
0x0000000002d2a7a9: je     0x0000000002d2a861 ;*if_acmpne
                                  ; - java.util.ArrayList::ensureCapacityInternal@7 (line 223)
                                  ; - java.util.ArrayList::add@7 (line 458)

{oop(a 'java/lang/Object'[0] )} indicates an instance (oop) of the following type: 'java/lang/Object'[0], that is, an Object array (here of length 0). This is in fact the constant empty array instance against which we are comparing inside the inlined method ensureCapacityInternal.
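The comparison is a reference comparison (if_acmpne in bytecode), which is why the assembly can simply compare the field against the address of one known object. A minimal Java sketch of the same idea, where SHARED_EMPTY is a stand-in I introduce for ArrayList’s private DEFAULTCAPACITY_EMPTY_ELEMENTDATA constant:

```java
// Reference equality against one shared constant instance.
final class EmptyArraySketch {

    // Stand-in for ArrayList's private DEFAULTCAPACITY_EMPTY_ELEMENTDATA (illustrative name).
    static final Object[] SHARED_EMPTY = {};

    // True only when elementData is the very same object as the constant,
    // not merely another empty array.
    static boolean isSharedEmpty(Object[] elementData) {
        return elementData == SHARED_EMPTY;
    }
}
```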

More inlining

0x0000000002d2a7af: inc    DWORD PTR [rdx+0xc]  ;*putfield modCount
                                ; - java.util.ArrayList::ensureExplicitCapacity@7 (line 231)
                                ; - java.util.ArrayList::ensureCapacityInternal@19 (line 227)
                                ; - java.util.ArrayList::add@7 (line 458)

Here we have an additional level of inlining for the ensureExplicitCapacity method.

        private void ensureExplicitCapacity(int minCapacity) {
L231        modCount++;
            // overflow-conscious code
L234        if (minCapacity - elementData.length > 0)
                grow(minCapacity);
        }
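Putting the inlining levels together, the compiled code behaves roughly as if add had been written as one flat method. The sketch below is my own simplified reconstruction, not the JDK source: TinyList, SHARED_EMPTY and the simplified grow are assumptions for illustration.

```java
import java.util.Arrays;

// Simplified stand-in for ArrayList, showing add() with both helpers inlined by hand.
final class TinyList {
    private static final int DEFAULT_CAPACITY = 10;
    private static final Object[] SHARED_EMPTY = {};

    Object[] elementData = SHARED_EMPTY;
    int size;
    int modCount;

    boolean addInlined(Object e) {
        int minCapacity = size + 1;
        if (elementData == SHARED_EMPTY) {             // from ensureCapacityInternal
            minCapacity = Math.max(DEFAULT_CAPACITY, minCapacity);
        }
        modCount++;                                    // from ensureExplicitCapacity
        if (minCapacity - elementData.length > 0) {    // overflow-conscious test
            grow(minCapacity);                         // slow path, not inlined here
        }
        elementData[size++] = e;
        return true;
    }

    // Simplified growth policy (an assumption, not the JDK implementation).
    private void grow(int minCapacity) {
        int grown = elementData.length + (elementData.length >> 1);
        elementData = Arrays.copyOf(elementData, Math.max(minCapacity, grown));
    }
}
```

Reading the assembly comments bottom-up gives the same picture: the innermost frame is listed first and the caller, add, last.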

Implicit null check

0x0000000002d2a7b2: mov    r10d,DWORD PTR [r11+0xc]  ; implicit exception: dispatches to 0x0000000002d2a881

A new kind of comment: here we have an implicit null check because we are dereferencing the object array elementData to get its length (Java code: elementData.length). If elementData is null, the JVM must throw a NullPointerException. But, to avoid generating code for each object dereference, the JIT relies on the OS signal handling for segfaults to handle this rare case. See my article on this technique.
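From the Java side, the contract looks like this: there is no explicit null test in the source, yet the exception is still guaranteed. The class and method names below are illustrative only.

```java
// The source contains no "if (array == null)", but the JVM must still throw.
final class ImplicitNullCheckSketch {

    // Reading the length dereferences the array reference.
    static int lengthOf(Object[] array) {
        return array.length;
    }

    // Reports whether reading the length raised a NullPointerException.
    static boolean throwsNpe(Object[] array) {
        try {
            lengthOf(array);
            return false;
        } catch (NullPointerException expected) {
            return true;
        }
    }
}
```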

Type Check

Let’s skip some regular comments and stop on this one:

0x0000000002d2a7d3: cmp    r10d,0x200022ee    ;   {metadata('java/lang/Object'[])}

We are verifying that the class (metadata) of the current elementData instance is an object array ('java/lang/Object'[]). To do this, we read the class pointer from the instance and compare it to the address of the class loaded by the JVM.
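My understanding of why this check exists: Java arrays are covariant, so a variable of type Object[] may actually refer to a String[], and every reference store (aastore) has to be type-safe. Once the array’s class is known to be exactly Object[], any reference can be stored without further checks. The sketch below only demonstrates the language rule; the names are illustrative.

```java
// Array covariance: a store through an Object[] reference can fail at run time.
final class ArrayStoreSketch {

    // Returns true when the store is rejected with an ArrayStoreException.
    static boolean storeFails(Object[] array, Object value) {
        try {
            array[0] = value;
            return false;
        } catch (ArrayStoreException expected) {
            return true;
        }
    }
}
```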

Card marking

Sometimes the comments are wrong:

0x0000000002d2a7df: lea    r10,[r11+r9*4+0x10]
0x0000000002d2a7e4: mov    r11,QWORD PTR [rsp+0x8]
0x0000000002d2a7e9: mov    r8,r11
0x0000000002d2a7ec: mov    DWORD PTR [r10],r8d
0x0000000002d2a7ef: shr    r10,0x9
0x0000000002d2a7f3: mov    eax,0x1
0x0000000002d2a7f8: mov    r11d,0x5965000
0x0000000002d2a7fe: mov    BYTE PTR [r11+r10*1],r12b
                                              ;*synchronization entry
                                              ; - java.util.ArrayList::add@-1 (line 458)

Here this is not a synchronization entry but a special operation called ‘card marking’, which is performed after writing a reference into a field or a reference array (elementData in our case). The assembly generated for card marking is analyzed in this article. In this case we have card marking for an element of an array; for a regular instance field, the generated assembly is different.
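The arithmetic of these instructions can be modeled in a few lines of Java. This is a toy simulation under assumptions of mine: shr r10,0x9 suggests that one card covers 512 bytes, the lea suggests a 16-byte array header with 4-byte compressed references, and I assume that r12b holds 0 here and that 0 is the "dirty" value. The class, its fields and the byte-array card table are purely illustrative.

```java
import java.util.Arrays;

// Toy model of card marking for a reference stored into an array element.
final class CardMarkSketch {
    static final int CARD_SHIFT = 9;   // shr r10,0x9: one card per 512 bytes
    static final byte DIRTY = 0;       // assumption: a dirty card is marked with 0
    static final byte CLEAN = -1;      // assumption: a clean card holds -1

    final byte[] cardTable;

    CardMarkSketch(int heapBytes) {
        cardTable = new byte[(heapBytes >>> CARD_SHIFT) + 1];
        Arrays.fill(cardTable, CLEAN);
    }

    // lea r10,[r11+r9*4+0x10]: array address + 16-byte header + 4 bytes per reference
    static long elementAddress(long arrayAddress, int index) {
        return arrayAddress + 0x10 + 4L * index;
    }

    // mov BYTE PTR [r11+r10*1],r12b: dirty the card covering the written slot
    void markCard(long slotAddress) {
        cardTable[(int) (slotAddress >>> CARD_SHIFT)] = DIRTY;
    }
}
```

For instance, with a made-up array address of 1024, element 3 lives at address 1052, which falls into card number 2.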

Safepoint poll

 0x0000000002d2a807: test   DWORD PTR [rip+0xfffffffffe5857f3],eax        # 0x00000000012b0000
                                                ;   {poll_return}

Finally, the comment {poll_return} indicates that the instruction performs a safepoint check. You will see this at the end of all methods. For more details about safepoints, please read my article, and a more detailed exploration of safepoints and their impact here.

Voilà! You now have the basics to understand the disassembly output from the PrintAssembly options. Again, if you want to go further, I strongly recommend the wonderful JITWatch tool.

Thanks to Georges Gomes for the review, and a special BIG thanks to The Great Nitsan Wakart, who provided me with tons of comments and corrections!