30 Dec 2015
PrintAssembly output explained!
If you are a regular reader of my blog, you may have noticed that I am (ab)using the PrintAssembly options of the JVM to examine the code generated by the JIT compiler. It helps me a lot to understand how my code is executed, and how the JIT compiler works and optimizes the code. Even though I also use JMH from time to time, I am not a big fan of benchmarking, and especially micro-benchmarking.
Why? Because micro-benchmarking is an idealization of how the production code is executed: tight loops, all data hot in the L1 cache, few branch misses, and the best case for aggressive JIT optimizations (like monomorphic calls, etc.).
The thing is, the execution context in production is totally different from micro-benchmarks, so what’s the point in exercising code that will not be executed under the same conditions? Are the conclusions that I can draw from the micro-benchmark still valid or still beneficial for my production cases? All of this pushes me away from micro-benchmarks as much as possible and makes me look for other ways to evaluate performance, like performance counters inserted directly into the application or reading the assembly generated by the JIT compiler. Note that reading assembly is not perfect either, as nowadays CPUs execute out of order and exploit Instruction Level Parallelism, so the cost of an instruction sequence cannot be read off the listing alone. In some situations, benchmarking is therefore the only way to assess performance.
Printing assembly also helps me back up assertions about how the JIT optimizes, instead of relying on folklore and urban legends (reordering of instructions, memory barriers, …).
With all of that, PrintAssembly is one of my favorite tools. But I can understand that its output may be difficult to read. Nowadays, not all developers are familiar with assembly, unfortunately, but with some basic knowledge and with the help of the inserted comments, it can be less cryptic.
For those who have never used PrintAssembly, please refer to my previous posts about it: How to print disassembly from JIT code and How to build hsdis-amd64.dll. Chris Newland, creator of the JITWatch tool, also has some useful tips for Mac OS X. Nitsan Wakart wrote an article on this as well.
Is your setup done? Perfect, let’s read some assembly, yeah!
Assembly 101
First of all, I am using Intel syntax, not the AT&T one. I am used to this syntax, and since we are talking about the x86 instruction set designed by Intel, let’s stick to their convention.
Reminder: To get this syntax with the disassembler plugin, use the JVM option:
-XX:PrintAssemblyOptions=intel
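To have something to disassemble, a small driver along these lines can be used. This is only a sketch: the class name WarmUpAdd and the iteration counts are hypothetical, and it assumes the hsdis plugin is already installed. PrintAssembly is a diagnostic option, so -XX:+UnlockDiagnosticVMOptions has to appear on the command line as well.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical driver, assuming hsdis is installed. Run with:
//   java -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly \
//        -XX:PrintAssemblyOptions=intel WarmUpAdd
class WarmUpAdd {

    // Calls ArrayList.add often enough that the JIT is likely to compile it.
    static int fill(int count) {
        List<Object> list = new ArrayList<>();
        Object element = new Object();
        for (int i = 0; i < count; i++) {
            list.add(element);
        }
        return list.size();
    }

    public static void main(String[] args) {
        int total = 0;
        // Repeat the call so the hot code has time to be compiled.
        for (int round = 0; round < 200; round++) {
            total += fill(100_000);
        }
        System.out.println(total);
    }
}
```

The output is very verbose. It can usually be restricted to one method with an option such as -XX:CompileCommand=print,java/util/ArrayList.add.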
Instruction lines are decomposed as follows:
mnemonic parameters
- mnemonic is the instruction name (mov, call, add, jmp, cmp, …)
- parameters can be registers, memory accesses, or immediate values
Examples:
mov rax, 0x2A
mov rdx, QWORD PTR [rbx+0x571c418]
The mov instruction is a data movement. The first line moves the constant value 0x2A into the register rax. The second line moves the memory content at the address computed from the value of register rbx plus the constant value 0x571c418 into the register rdx. Note that the operand order is reversed in AT&T syntax.
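The two mov forms can be mimicked with a toy model in Java. This is a sketch for intuition only: the class MovSketch, the base address in rbx and the memory content are all made up, and a real CPU obviously does not work with hash maps.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of the two mov forms shown above (hypothetical names and values).
class MovSketch {
    final Map<String, Long> registers = new HashMap<>();
    final Map<Long, Long> memory = new HashMap<>();   // address -> 64-bit value

    // mov dst, imm : copy a constant into a register
    void movImmediate(String dst, long value) {
        registers.put(dst, value);
    }

    // mov dst, QWORD PTR [base+disp] : load from the computed address
    void movLoad(String dst, String base, long displacement) {
        long address = registers.get(base) + displacement;
        registers.put(dst, memory.get(address));
    }

    public static void main(String[] args) {
        MovSketch cpu = new MovSketch();
        cpu.registers.put("rbx", 0x1000L);              // made-up base address
        cpu.memory.put(0x1000L + 0x571c418L, 99L);      // made-up memory content

        cpu.movImmediate("rax", 0x2A);                  // mov rax, 0x2A
        cpu.movLoad("rdx", "rbx", 0x571c418L);          // mov rdx, QWORD PTR [rbx+0x571c418]

        System.out.println(cpu.registers.get("rax"));   // 42
        System.out.println(cpu.registers.get("rdx"));   // 99
    }
}
```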
- push/pop instructions move data to/from the stack
- add/sub/imul/idiv instructions perform addition/subtraction/multiplication/division on integers
- inc/dec instructions increment/decrement a value in a register or in memory
- and/or/xor/not/shl/shr instructions perform bitwise operations
- jmp instruction performs an unconditional jump to the specified address
- jxx instructions perform a conditional jump based on the result of the last related operation
- cmp instruction performs a comparison between 2 operands
- call/ret instructions perform a call to/return from a subroutine
For more information see this guide for example or the official Intel documentation.
Disassembler comments
Fortunately, the disassembler plugin does not just spit out raw instructions but annotates them with useful information.
Let’s take the method ArrayList.add(Object) as an example and analyze it:
public boolean add(E e) {
L458 ensureCapacityInternal(size + 1); // Increments modCount!!
L459 elementData[size++] = e;
return true;
}
# {method} {0x000000000ac640f8} 'add' '(Ljava/lang/Object;)Z' in 'java/util/ArrayList'
# this: rdx:rdx = 'java/util/ArrayList'
# parm0: r8:r8 = 'java/lang/Object'
# [sp+0x40] (sp of caller)
0x0000000002d2a760: mov r10d,DWORD PTR [rdx+0x8]
0x0000000002d2a764: shl r10,0x3
0x0000000002d2a768: cmp rax,r10
0x0000000002d2a76b: jne 0x0000000002cf5f60 ; {runtime_call}
0x0000000002d2a771: data32 xchg ax,ax
0x0000000002d2a774: nop DWORD PTR [rax+rax*1+0x0]
0x0000000002d2a77c: data32 data32 xchg ax,ax
[Verified Entry Point]
0x0000000002d2a780: mov DWORD PTR [rsp-0x6000],eax
0x0000000002d2a787: push rbp
0x0000000002d2a788: sub rsp,0x30 ;*synchronization entry
; - java.util.ArrayList::add@-1 (line 458)
0x0000000002d2a78c: mov QWORD PTR [rsp+0x8],r8
0x0000000002d2a791: mov rbp,rdx
0x0000000002d2a794: mov r9d,DWORD PTR [rdx+0x10] ;*getfield size
; - java.util.ArrayList::add@2 (line 458)
0x0000000002d2a798: mov r11d,DWORD PTR [rdx+0x14]
;*getfield elementData
; - java.util.ArrayList::ensureCapacityInternal@1 (line 223)
; - java.util.ArrayList::add@7 (line 458)
0x0000000002d2a79c: mov r8d,r9d
0x0000000002d2a79f: inc r8d ;*iadd
; - java.util.ArrayList::add@6 (line 458)
0x0000000002d2a7a2: cmp r11d,0xd5d0e088 ; {oop(a 'java/lang/Object'[0] )}
0x0000000002d2a7a9: je 0x0000000002d2a861 ;*if_acmpne
; - java.util.ArrayList::ensureCapacityInternal@7 (line 223)
; - java.util.ArrayList::add@7 (line 458)
0x0000000002d2a7af: inc DWORD PTR [rdx+0xc] ;*putfield modCount
; - java.util.ArrayList::ensureExplicitCapacity@7 (line 231)
; - java.util.ArrayList::ensureCapacityInternal@19 (line 227)
; - java.util.ArrayList::add@7 (line 458)
0x0000000002d2a7b2: mov r10d,DWORD PTR [r11+0xc] ; implicit exception: dispatches to 0x0000000002d2a881
0x0000000002d2a7b6: mov ebx,r9d
0x0000000002d2a7b9: sub ebx,r10d
0x0000000002d2a7bc: inc ebx
0x0000000002d2a7be: test ebx,ebx
0x0000000002d2a7c0: jg 0x0000000002d2a80e ;*ifle
; - java.util.ArrayList::ensureExplicitCapacity@17 (line 234)
; - java.util.ArrayList::ensureCapacityInternal@19 (line 227)
; - java.util.ArrayList::add@7 (line 458)
0x0000000002d2a7c2: mov DWORD PTR [rdx+0x10],r8d ;*return
; - java.util.ArrayList::ensureExplicitCapacity@25 (line 236)
; - java.util.ArrayList::ensureCapacityInternal@19 (line 227)
; - java.util.ArrayList::add@7 (line 458)
0x0000000002d2a7c6: mov r10d,DWORD PTR [r11+0xc]
0x0000000002d2a7ca: cmp r9d,r10d
0x0000000002d2a7cd: jae 0x0000000002d2a839
0x0000000002d2a7cf: mov r10d,DWORD PTR [r11+0x8]
0x0000000002d2a7d3: cmp r10d,0x200022ee ; {metadata('java/lang/Object'[])}
0x0000000002d2a7da: jne 0x0000000002d2a84d ;*aastore
; - java.util.ArrayList::add@26 (line 459)
0x0000000002d2a7dc: mov r10,r11 ;*getfield elementData
; - java.util.ArrayList::add@11 (line 459)
0x0000000002d2a7df: lea r10,[r11+r9*4+0x10]
0x0000000002d2a7e4: mov r11,QWORD PTR [rsp+0x8]
0x0000000002d2a7e9: mov r8,r11
0x0000000002d2a7ec: mov DWORD PTR [r10],r8d
0x0000000002d2a7ef: shr r10,0x9
0x0000000002d2a7f3: mov eax,0x1
0x0000000002d2a7f8: mov r11d,0x5965000
0x0000000002d2a7fe: mov BYTE PTR [r11+r10*1],r12b
;*synchronization entry
; - java.util.ArrayList::add@-1 (line 458)
0x0000000002d2a802: add rsp,0x30
0x0000000002d2a806: pop rbp
0x0000000002d2a807: test DWORD PTR [rip+0xfffffffffe5857f3],eax # 0x00000000012b0000
; {poll_return}
0x0000000002d2a80d: ret ;*synchronization entry
; - java.util.ArrayList::ensureExplicitCapacity@-1 (line 231)
; - java.util.ArrayList::ensureCapacityInternal@19 (line 227)
; - java.util.ArrayList::add@7 (line 458)
Header
In the header we can find the following information:
# {method} {0x000000000ac640f8} 'add' '(Ljava/lang/Object;)Z' in 'java/util/ArrayList'
# this: rdx:rdx = 'java/util/ArrayList'
# parm0: r8:r8 = 'java/lang/Object'
# [sp+0x40] (sp of caller)
The first line gives the name of the disassembled method, 'add', with its signature (one parameter of type Object, returning a boolean), from the class java.util.ArrayList.
But as this is an instance method, there are in fact 2 parameters, as mentioned in the rest of the header: the parameter this, which is stored in register rdx, and the Object parameter, in register r8.
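The signature string '(Ljava/lang/Object;)Z' is a standard JVM method descriptor, so it can be decoded with the JDK itself. A minimal sketch, in which the class name DescriptorSketch is hypothetical:

```java
import java.lang.invoke.MethodType;

class DescriptorSketch {
    static MethodType parse(String descriptor) {
        // A null loader means the system class loader resolves the types.
        return MethodType.fromMethodDescriptorString(descriptor, null);
    }

    public static void main(String[] args) {
        MethodType type = parse("(Ljava/lang/Object;)Z");
        System.out.println(type.parameterCount());   // 1
        System.out.println(type.parameterType(0));   // class java.lang.Object
        System.out.println(type.returnType());       // boolean
    }
}
```

The descriptor only lists the declared parameter. The implicit this of an instance method is not part of it, which is why the header reports it separately.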
Verified Entry Point
0x0000000002d2a774: nop DWORD PTR [rax+rax*1+0x0]
0x0000000002d2a77c: data32 data32 xchg ax,ax
[Verified Entry Point]
After the header, the first instructions of the method begin after the [Verified Entry Point] mark. The nop-like instructions just before this mark are there for alignment (padding). From this section on, we will look at the comments that come after the semicolon. Comments starting with a star (*) indicate the associated bytecode.
Synchronization Entry
0x0000000002d2a780: mov DWORD PTR [rsp-0x6000],eax
0x0000000002d2a787: push rbp
0x0000000002d2a788: sub rsp,0x30 ;*synchronization entry
; - java.util.ArrayList::add@-1 (line 458)
The comment ; - java.util.ArrayList::add@-1 (line 458) gives us the mapping to the Java code: class name, method name, bytecode offset into the method, and finally the line number in the original Java source file. As this prologue has no specific bytecode associated with it, we get the offset -1. The first comment, ;*synchronization entry, indicates the prologue of the function: the instructions that are necessary to prepare the execution (stack allocation or stack banging, saving some registers, …).
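When scanning a long listing, it can help to extract this mapping mechanically. The sketch below assumes the comment layout is exactly the one shown in this post. The class CommentSketch and its regular expression are my own illustration, not part of any tool.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CommentSketch {
    // Matches e.g. "; - java.util.ArrayList::add@-1 (line 458)"
    static final Pattern MAPPING =
        Pattern.compile(";\\s*-\\s*([\\w.$]+)::([\\w$<>]+)@(-?\\d+) \\(line (\\d+)\\)");

    // Returns {class name, method name, bytecode offset, source line}, or null.
    static String[] parse(String comment) {
        Matcher m = MAPPING.matcher(comment);
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2), m.group(3), m.group(4) };
    }

    public static void main(String[] args) {
        String[] parts = parse("; - java.util.ArrayList::add@-1 (line 458)");
        System.out.println(String.join(" | ", parts));
        // java.util.ArrayList | add | -1 | 458
    }
}
```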
Get size field
0x0000000002d2a78c: mov QWORD PTR [rsp+0x8],r8
0x0000000002d2a791: mov rbp,rdx
0x0000000002d2a794: mov r9d,DWORD PTR [rdx+0x10] ;*getfield size
; - java.util.ArrayList::add@2 (line 458)
The next comment indicates that we retrieve the field named size from the current instance (ArrayList). It is translated to the following assembly line: mov r9d,DWORD PTR [rdx+0x10]
It moves into r9d (the lower 32 bits of the r9 register) the content at the address rdx (the this instance, cf. the method parameters) + offset 0x10, where the size field is located.
Get elementData field
0x0000000002d2a798: mov r11d,DWORD PTR [rdx+0x14]
;*getfield elementData
; - java.util.ArrayList::ensureCapacityInternal@1 (line 223)
; - java.util.ArrayList::add@7 (line 458)
The following comment is interesting because we have the same kind of bytecode, getfield, but the mapping to the Java code involves 2 methods: java.util.ArrayList::ensureCapacityInternal@1 (line 223) and java.util.ArrayList::add@7 (line 458). Implicitly, it means that the JIT has inlined the first method mentioned into the second, and that the bytecode comes from this inlined method.
private static final Object[] DEFAULTCAPACITY_EMPTY_ELEMENTDATA = {};
private void ensureCapacityInternal(int minCapacity) {
L223 if (elementData == DEFAULTCAPACITY_EMPTY_ELEMENTDATA) {
minCapacity = Math.max(DEFAULT_CAPACITY, minCapacity);
}
L227 ensureExplicitCapacity(minCapacity);
}
Empty array test
0x0000000002d2a7a2: cmp r11d,0xd5d0e088 ; {oop(a 'java/lang/Object'[0] )}
0x0000000002d2a7a9: je 0x0000000002d2a861 ;*if_acmpne
; - java.util.ArrayList::ensureCapacityInternal@7 (line 223)
; - java.util.ArrayList::add@7 (line 458)
{oop(a 'java/lang/Object'[0])} indicates an instance (oop) of type 'java/lang/Object'[0], that is, an object array of length 0. This is in fact the constant empty array instance (DEFAULTCAPACITY_EMPTY_ELEMENTDATA) against which we are comparing inside the inlined method ensureCapacityInternal.
More inlining
0x0000000002d2a7af: inc DWORD PTR [rdx+0xc] ;*putfield modCount
; - java.util.ArrayList::ensureExplicitCapacity@7 (line 231)
; - java.util.ArrayList::ensureCapacityInternal@19 (line 227)
; - java.util.ArrayList::add@7 (line 458)
Here we have an additional level of inlining, for the ensureExplicitCapacity method.
private void ensureExplicitCapacity(int minCapacity) {
L231 modCount++;
// overflow-conscious code
if (minCapacity - elementData.length > 0)
L234 grow(minCapacity);
}
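To picture what the JIT works with after these two levels of inlining, here is a hand-inlined version of add. It is a simplified stand-in, not the JDK source: the class InlinedAddSketch and its grow policy are hypothetical. The bytecode names in the comments are the ones that appear in the disassembler comments above.

```java
import java.util.Arrays;

// Simplified stand-in for ArrayList, only meant to show the shape of add()
// once both helper methods have been inlined into it.
class InlinedAddSketch {
    static final Object[] EMPTY = {};
    Object[] elementData = EMPTY;
    int size;
    int modCount;

    boolean add(Object e) {
        int minCapacity = size + 1;                    // getfield size, iadd
        // --- body of ensureCapacityInternal ---
        if (elementData == EMPTY) {                    // if_acmpne
            minCapacity = Math.max(10, minCapacity);
        }
        // --- body of ensureExplicitCapacity ---
        modCount++;                                    // putfield modCount
        if (minCapacity - elementData.length > 0) {    // ifle
            grow(minCapacity);
        }
        // --- back in add ---
        elementData[size++] = e;                       // aastore
        return true;
    }

    // Simplified growth policy (the real ArrayList.grow is more careful).
    void grow(int minCapacity) {
        int newCapacity = Math.max(minCapacity, elementData.length * 2);
        elementData = Arrays.copyOf(elementData, newCapacity);
    }

    public static void main(String[] args) {
        InlinedAddSketch list = new InlinedAddSketch();
        for (int i = 0; i < 25; i++) {
            list.add("item" + i);
        }
        System.out.println(list.size);   // 25
    }
}
```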
Implicit null check
0x0000000002d2a7b2: mov r10d,DWORD PTR [r11+0xc] ; implicit exception: dispatches to 0x0000000002d2a881
New kind of comment: here we have an implicit null check, because we are dereferencing the object array elementData to get its length (Java code: elementData.length). If elementData is null, the JVM must throw a NullPointerException. But to avoid generating an explicit check for each object dereferenced, the JIT relies on the OS signal handling of segfaults to handle this rare case. See my article on this technique.
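Seen from the Java side, the behavior that must be preserved is simply the following. This is a small sketch with a hypothetical class name. How the NullPointerException is produced (explicit test or trapped fault) is up to the JIT and not visible from Java.

```java
class NullCheckSketch {
    // No explicit "if (array == null)" here: the dereference itself is the check.
    static int lengthOf(Object[] array) {
        return array.length;
    }

    static boolean throwsNpe(Object[] array) {
        try {
            lengthOf(array);
            return false;
        } catch (NullPointerException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(throwsNpe(new Object[3]));  // false
        System.out.println(throwsNpe(null));           // true
    }
}
```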
Type Check
Let’s skip some regular comments to stop on this one
0x0000000002d2a7d3: cmp r10d,0x200022ee ; {metadata('java/lang/Object'[])}
We are verifying that the class (metadata) of the current elementData instance is an object array ('java/lang/Object'[]). To do this, we read the class pointer from the instance and compare it to the address of the class loaded by the JVM.
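In the full listing, this comparison is attached to the aastore bytecode. Java arrays are covariant, so a store through an Object[] reference has to be validated at run time. When the array class is exactly Object[], any reference is acceptable. The sketch below (hypothetical class name) shows the case this check protects against.

```java
class StoreCheckSketch {
    // Returns true if storing 'value' into 'array' is rejected at run time.
    static boolean storeFails(Object[] array, Object value) {
        try {
            array[0] = value;   // aastore: the JVM checks the array's element type
            return false;
        } catch (ArrayStoreException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(storeFails(new Object[1], 42));   // false
        System.out.println(storeFails(new String[1], 42));   // true
    }
}
```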
Card marking
Sometimes the comments are wrong:
0x0000000002d2a7df: lea r10,[r11+r9*4+0x10]
0x0000000002d2a7e4: mov r11,QWORD PTR [rsp+0x8]
0x0000000002d2a7e9: mov r8,r11
0x0000000002d2a7ec: mov DWORD PTR [r10],r8d
0x0000000002d2a7ef: shr r10,0x9
0x0000000002d2a7f3: mov eax,0x1
0x0000000002d2a7f8: mov r11d,0x5965000
0x0000000002d2a7fe: mov BYTE PTR [r11+r10*1],r12b
;*synchronization entry
; - java.util.ArrayList::add@-1 (line 458)
This is not a synchronization entry, but a special operation called ‘card marking’ that is performed after writing a reference into a field or into a reference array (elementData in our case). The assembly generated for card marking is analyzed in this article. Here we have card marking for an element of an array; for a regular instance field, the generated assembly is different.
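The arithmetic of these instructions can be replayed in plain Java. This is a rough model under several assumptions of mine: the heap starts at address 0, r12 holds zero so that the stored byte is 0, and references take 4 bytes as suggested by the *4 scale and the DWORD store. The class CardMarkSketch and the sample addresses are made up.

```java
import java.util.Arrays;

class CardMarkSketch {
    static final int CARD_SHIFT = 9;          // shr r10,0x9 -> one card per 512 bytes
    static final byte DIRTY = 0;              // assumption: value stored via r12b
    static final byte CLEAN = (byte) 0xFF;    // assumption: any non-dirty value

    final byte[] cardTable;

    CardMarkSketch(long heapSize) {
        cardTable = new byte[(int) (heapSize >>> CARD_SHIFT) + 1];
        Arrays.fill(cardTable, CLEAN);
    }

    // lea r10,[r11+r9*4+0x10] : array base + 0x10 (array header) + index * 4
    static long elementAddress(long arrayBase, int index) {
        return arrayBase + 0x10 + (long) index * 4;
    }

    // shr r10,0x9 ; mov BYTE PTR [r11+r10*1],r12b
    void markCard(long address) {
        cardTable[(int) (address >>> CARD_SHIFT)] = DIRTY;
    }

    public static void main(String[] args) {
        CardMarkSketch heap = new CardMarkSketch(1 << 20);
        long address = elementAddress(0x2000, 5);     // 0x2000 + 0x10 + 5 * 4 = 0x2024
        heap.markCard(address);
        System.out.println(address >>> CARD_SHIFT);   // 16
    }
}
```

In the real code the card table base (the constant 0x5965000 in the listing) is added to the shifted address instead of indexing a Java array.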
Safepoint poll
0x0000000002d2a807: test DWORD PTR [rip+0xfffffffffe5857f3],eax # 0x00000000012b0000
; {poll_return}
Finally, the comment {poll_return} indicates that the instruction performs a safepoint check. You will see this at the end of all methods. For more details about safepoints, please read my article, and for a more detailed exploration of safepoints and their impact, see here.
Voilà! You now have the basics to understand the disassembly output of the PrintAssembly options. Again, if you want to go further, I strongly recommend using the wonderful JITWatch tool.
References
From this blog:
- Safety first: Safepoints
- How to print disassembly from JIT code
- Null check elimination
- Volatile and memory barriers
From Nitsan’s blog (read all articles but specifically):
- Where is my safepoint?
- Safepoints: Meaning, Side Effects and Overheads
- The JVM Write Barrier: Card Marking
- Experimentation Notes: Java Print Assembly
- Disassembling a JMH Nano-Benchmark
- JMH perfasm explained: Looking at False Sharing on Conditional Inlining
Other sources:
- JITWatch from Chris Newland
- The Black Magic of (Java) Method Dispatch from Aleksey Shipilëv
- PrintAssembly from OpenJDK wiki
- x86 guide
- Stacks with split personalities from Doug Simon (includes stack banging)
Thanks to Georges Gomes for the review, and a special BIG thanks to The Great Nitsan Wakart, who provided me with tons of comments and corrections!