It's All Relative

Performance Minded

02 Aug 2020

19 mins read

When Escape Analysis fails you?

Context

In november 2019, I was attending a presentation by Thomas Wuerthinger about Abstractions Without Regret with GraalVM at Devoxx in Antwerp. He took a simple example using Objects#hash() that was significantly faster with Graal Compiler than C2. I was deeply surprised by that. it was obvious that the allocation of the varargs was the culprit: It was eliminated by Graal but not by C2. Why did the Escape Analysis fail here?

Escape Analysis

This post is not intended to introduce you what Escape Analysis (EA) is, there are plenty of other articles for that (see References). Though I will just say that EA is not an optimization per se, but an analysis phase (hence the name ;-)) that gather information to ba able to apply some optimizations like lock elision, scalar replacement or even stack allocation.

Objects#hashCode

The question is why in this case, that seems simple, C2’s EA fails?

Let’s reproduce the issue, not with a JMH benchmark but with a simple case where we can analyze the JITed code. But first let’s analyze Java code:

    public static int hash(Object... values) {
        return Arrays.hashCode(values);
    }

Objects#hash calls Arrays#hashCode

    public static int hashCode(Object a[]) {
        if (a == null)
            return 0;

        int result = 1;

        for (Object element : a)
            result = 31 * result + (element == null ? 0 : element.hashCode());

        return result;
    }

Nothing fancy, Arrays#hashcode should be inlined and it just iterating on the varargs array to compute hash code on each element with multiplication with a prime number.

Here is my class to reproduce the case and to play with it:

import java.util.Arrays;
import java.util.Objects;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.LockSupport;

public class ObjectsHashJIT {

    Integer integerField = new Integer(42);
    Double doubleField = new Double(3.14);
    Object[] fields = {integerField, doubleField};

    public static void main(String[] args) {
        ObjectsHashJIT objectsHashJIT = new ObjectsHashJIT();
        int res = 0;
        int iters = Integer.getInteger("iters", 20_000);
        for (int i = 0; i < iters; i ++) {
            res += objectsHashJIT.bench(i);
        }
        System.out.println(res);
        long endSleep = Long.getLong("endSleep", 1);
        LockSupport.parkNanos(TimeUnit.SECONDS.toNanos(endSleep));
    }

    private int jdkObjectsHash() {
        return Objects.hash(integerField, doubleField);
    }

    private int noVarArgsObjectsHash() {
        return Arrays.hashCode(fields);
    }

    private int rawVarArgsObjectHash_noargs() {
        return rawVarArgsObjectsHash(integerField, doubleField);
    }

    private int iterVarArgsObjectsHash_noargs() {
        return iterVarArgsObjectsHash(integerField, doubleField);
    }

    private int rawObjectsHash() {
        int res = 1;
        res += 31 * res + integerField.hashCode();
        res += 31 * res + doubleField.hashCode();
        return  res;
    }

    private static int rawVarArgsObjectsHash(Object... elements) {
        int res = 1;
        res += 31 * res + elements[0].hashCode();
        res += 31 * res + elements[1].hashCode();
        return res;
    }

    private static int iterVarArgsObjectsHash(Object... elements) {
        if (elements == null)
            return 0;
        int result = 1;
        for (Object element : elements)
            result += 31 * result + (element == null ? 0 : element.hashCode());
        return result;
    }

    private int bench(int i) {
        int res = jdkObjectsHash();
        //int res = rawObjectsHash();
        //int res = noVarArgsObjectsHash();
        //int res = rawVarArgsObjectHash_noargs();
        //int res = iterVarArgsObjectsHash_noargs();
        return res + i;
    }
}

jdkObjectsHash C2

I run the first method jdkObjectsHash in bench method which exercised the code to be JITed and to analyze the output. I am using OpenJDK 14, and to print assembly I am using the new Compiler Control feature instead of CompileCommand JVM options to just output the bench method.

[
  {
    match: "ObjectsHashJIT::bench",
    PrintAssembly: true,
    PrintInlining: true
  }
]
    private int jdkObjectsHash() {
        return Objects.hash(integerField, doubleField);
    }

First the inlining decision:

	@ 1   ObjectsHashJIT::jdkObjectsHash (22 bytes)   inline (hot)                    
	  @ 18   java.util.Objects::hash (5 bytes)   inline (hot)                         
		@ 1   java.util.Arrays::hashCode (56 bytes)   inline (hot)                    
		  @ 43   java.lang.Integer::hashCode (8 bytes)   inline (hot)                 
		  @ 43   java.lang.Double::hashCode (8 bytes)   inline (hot)                  
		   \-> TypeProfile (4427/8854 counts) = java/lang/Double                      
		   \-> TypeProfile (4427/8854 counts) = java/lang/Integer                     
			@ 4   java.lang.Double::hashCode (13 bytes)   inline (hot)                
			  @ 1   java.lang.Double::doubleToLongBits (16 bytes)   (intrinsic)       
			@ 4   java.lang.Integer::hashCode (2 bytes)   inline (hot)                

Then the assembly output (yes, with the intel syntax):

  0x000002774d3c9bd0:   mov    DWORD PTR [rsp-0x7000],eax
  0x000002774d3c9bd7:   push   rbp
  0x000002774d3c9bd8:   sub    rsp,0x30
  0x000002774d3c9bdc:   mov    QWORD PTR [rsp],rdx
  0x000002774d3c9be0:   mov    ebp,r8d
  0x000002774d3c9be3:   mov    rax,QWORD PTR [r15+0x120]
  0x000002774d3c9bea:   mov    r10,rax
  0x000002774d3c9bed:   add    r10,0x18
  0x000002774d3c9bf1:   cmp    r10,QWORD PTR [r15+0x130]
  0x000002774d3c9bf8:   jae    0x000002774d3c9ca8
  0x000002774d3c9bfe:   mov    QWORD PTR [r15+0x120],r10
  0x000002774d3c9c05:   prefetchw BYTE PTR [r10+0xc0]
  0x000002774d3c9c0d:   mov    QWORD PTR [rax],0x1
  0x000002774d3c9c14:   prefetchw BYTE PTR [r10+0x100]
  0x000002774d3c9c1c:   mov    DWORD PTR [rax+0x8],0x2115   ;   {metadata('java/lang/Object'[])}
  0x000002774d3c9c23:   prefetchw BYTE PTR [r10+0x140]
  0x000002774d3c9c2b:   mov    DWORD PTR [rax+0xc],0x2
  0x000002774d3c9c32:   prefetchw BYTE PTR [r10+0x180]
  0x000002774d3c9c3a:   mov    r10,QWORD PTR [rsp]
  0x000002774d3c9c3e:   mov    r11d,DWORD PTR [r10+0x10]
  0x000002774d3c9c42:   mov    r8d,DWORD PTR [r10+0xc]
  0x000002774d3c9c46:   mov    DWORD PTR [rax+0x10],r8d
  0x000002774d3c9c4a:   mov    DWORD PTR [rax+0x14],r11d
  0x000002774d3c9c4e:   test   r8d,r8d
  0x000002774d3c9c51:   je     0x000002774d3c9cc5
  0x000002774d3c9c53:   mov    r9d,DWORD PTR [r8+0xc]
  0x000002774d3c9c57:   mov    r10d,r9d
  0x000002774d3c9c5a:   shl    r10d,0x5
  0x000002774d3c9c5e:   sub    r10d,r9d
  0x000002774d3c9c61:   test   r11d,r11d
  0x000002774d3c9c64:   je     0x000002774d3c9cd2
  0x000002774d3c9c66:   vmovsd xmm0,QWORD PTR [r11+0x10]
  0x000002774d3c9c6c:   vucomisd xmm0,xmm0
  0x000002774d3c9c70:   jp     0x000002774d3c9c74
  0x000002774d3c9c72:   je     0x000002774d3c9c94
  0x000002774d3c9c74:   mov    eax,0x7ff80000
  0x000002774d3c9c79:   add    eax,r10d
  0x000002774d3c9c7c:   add    eax,ebp
  0x000002774d3c9c7e:   add    eax,0x3c1
  0x000002774d3c9c84:   add    rsp,0x30
  0x000002774d3c9c88:   pop    rbp
  0x000002774d3c9c89:   mov    r10,QWORD PTR [r15+0x110]
  0x000002774d3c9c90:   test   DWORD PTR [r10],eax          ;   {poll_return}
  0x000002774d3c9c93:   ret

Without to explain every line or instructions, we can see the first part which allocates in TLAB the varargs array (around the prefetchw instruction) and if you want more information on how to interpret it, read the post about TLAB allocation. Then some instructions to perform the hash code computation, loop is unrolled.

jdkObjectsHash Graal

Now, I run with Graal Compiler to see the difference:

  0x000002590a605640:   nop    DWORD PTR [rax+rax*1+0x0]                                    
  0x000002590a605645:   mov    eax,DWORD PTR [rdx+0xc]                                      
  0x000002590a605648:   test   eax,eax                                                      
  0x000002590a60564a:   je     0x000002590a6056b9                                           
  0x000002590a605650:   mov    eax,DWORD PTR [rax*1+0xc]                                    
  0x000002590a605657:   mov    r10d,DWORD PTR [rdx+0x10]                                    
  0x000002590a60565b:   test   r10d,r10d                                                    
  0x000002590a60565e:   je     0x000002590a6056c0                                           
  0x000002590a605664:   vmovsd xmm0,QWORD PTR [r10*1+0x10]                                  
  0x000002590a60566e:   vmovq  r10,xmm0                                                     
  0x000002590a605673:   movabs r11,0x7ff8000000000000                                       
  0x000002590a60567d:   vucomisd xmm0,xmm0                                                  
  0x000002590a605681:   mov    r9,r11                                                       
  0x000002590a605684:   cmove  r9,r10                                                       
  0x000002590a605688:   cmovp  r9,r11                                                       
  0x000002590a60568c:   mov    r10,r9                                                       
  0x000002590a60568f:   shr    r10,0x20                                                     
  0x000002590a605693:   xor    r9,r10                                                       
  0x000002590a605696:   mov    r10d,r9d                                                     
  0x000002590a605699:   lea    eax,[rax+0x1f]                                               
  0x000002590a60569c:   mov    r11d,eax                                                     
  0x000002590a60569f:   shl    r11d,0x5                                                     
  0x000002590a6056a3:   sub    r11d,eax                                                     
  0x000002590a6056a6:   add    r10d,r11d                                                    
  0x000002590a6056a9:   add    r8d,r10d                                                     
  0x000002590a6056ac:   mov    eax,r8d                                                      
  0x000002590a6056af:   mov    rcx,QWORD PTR [r15+0x110]                                    
  0x000002590a6056b6:   test   DWORD PTR [rcx],eax          ;   {poll_return}               
  0x000002590a6056b8:   ret                                                                 

Here the code is more compact, no allocation.

rawObjectsHash

Let’s see with my rawObjectsHash method which manually unroll and inline the computation of the hash of the 2 fields.

    private int rawObjectsHash() {
        int res = 1;
        res += 31 * res + integerField.hashCode();
        res += 31 * res + doubleField.hashCode();
        return  res;
    }

Inlining decision:

  @ 1   ObjectsHashJIT::rawObjectsHash (34 bytes)   inline (hot)
    @ 11   java.lang.Integer::hashCode (8 bytes)   inline (hot)
      @ 4   java.lang.Integer::hashCode (2 bytes)   inline (hot)
    @ 26   java.lang.Double::hashCode (8 bytes)   inline (hot)
      @ 4   java.lang.Double::hashCode (13 bytes)   inline (hot)
        @ 1   java.lang.Double::doubleToLongBits (16 bytes)   (intrinsic)

Assembly output:

  0x000001d2abe5ded0:   mov    DWORD PTR [rsp-0x7000],eax
  0x000001d2abe5ded7:   push   rbp
  0x000001d2abe5ded8:   sub    rsp,0x10
  0x000001d2abe5dedc:   mov    r10d,DWORD PTR [rdx+0xc]
  0x000001d2abe5dee0:   mov    r11d,DWORD PTR [r10+0xc]     ; implicit exception: dispatches to 0x000001d2abe5df3b
  0x000001d2abe5dee4:   mov    r10d,DWORD PTR [rdx+0x10]
  0x000001d2abe5dee8:   vmovsd xmm0,QWORD PTR [r10+0x10]    ; implicit exception: dispatches to 0x000001d2abe5df64
  0x000001d2abe5deee:   vucomisd xmm0,xmm0
  0x000001d2abe5def2:   jp     0x000001d2abe5def6
  0x000001d2abe5def4:   je     0x000001d2abe5df27
  0x000001d2abe5def6:   mov    r10d,0x7ff80000
  0x000001d2abe5defc:   mov    ecx,r11d
  0x000001d2abe5deff:   shl    ecx,0x5
  0x000001d2abe5df02:   sub    ecx,r11d
  0x000001d2abe5df05:   add    ecx,r10d
  0x000001d2abe5df08:   add    ecx,r11d
  0x000001d2abe5df0b:   add    r8d,ecx
  0x000001d2abe5df0e:   mov    eax,r8d
  0x000001d2abe5df11:   add    eax,0x400
  0x000001d2abe5df17:   add    rsp,0x10
  0x000001d2abe5df1b:   pop    rbp
  0x000001d2abe5df1c:   mov    r10,QWORD PTR [r15+0x110]
  0x000001d2abe5df23:   test   DWORD PTR [r10],eax          ;   {poll_return}
  0x000001d2abe5df26:   ret    

We have now something that is very similar to the Graal output. I don’t know if the performance is similar or not, this is not the point of this post. At least this is the code I expected to have when calling Objects.hash.

noVarArgsObjectsHash

Let’s try to avoid the allocation of varargs array. The method noVarArgsObjectsHash calls Arrays.hashCode with pre-allocated array.

    private int noVarArgsObjectsHash() {
        return Arrays.hashCode(fields);
    }

Inlining decision:

	@ 1   ObjectsHashJIT::noVarArgsObjectsHash (8 bytes)   inline (hot)            
	  @ 4   java.util.Arrays::hashCode (56 bytes)   inline (hot)                   
		@ 43   java.lang.Integer::hashCode (8 bytes)   inline (hot)                
		@ 43   java.lang.Double::hashCode (8 bytes)   inline (hot)                 
		 \-> TypeProfile (4467/8934 counts) = java/lang/Double                     
		 \-> TypeProfile (4467/8934 counts) = java/lang/Integer                    
		  @ 4   java.lang.Double::hashCode (13 bytes)   inline (hot)               
			@ 1   java.lang.Double::doubleToLongBits (16 bytes)   (intrinsic)      
		  @ 4   java.lang.Integer::hashCode (2 bytes)   inline (hot)

Assembly output:

  0x0000016cc78d9bf0:   mov    DWORD PTR [rsp-0x7000],eax                                                                  
  0x0000016cc78d9bf7:   push   rbp                                                                                         
  0x0000016cc78d9bf8:   sub    rsp,0x30                                                                                    
  0x0000016cc78d9bfc:   mov    edi,r8d                                                                                     
  0x0000016cc78d9bff:   mov    esi,DWORD PTR [rdx+0x14]                                                                    
  0x0000016cc78d9c02:   mov    ebx,DWORD PTR [rsi+0xc]      ; implicit exception: dispatches to 0x0000016cc78d9d6c         
  0x0000016cc78d9c05:   test   ebx,ebx                                                                                     
  0x0000016cc78d9c07:   ja     0x0000016cc78d9c20                                                                          
  0x0000016cc78d9c09:   mov    eax,0x1                                                                                     
  0x0000016cc78d9c0e:   add    eax,edi                                                                                     
  0x0000016cc78d9c10:   add    rsp,0x30                                                                                    
  0x0000016cc78d9c14:   pop    rbp                                                                                         
  0x0000016cc78d9c15:   mov    r10,QWORD PTR [r15+0x110]                                                                   
  0x0000016cc78d9c1c:   test   DWORD PTR [r10],eax          ;   {poll_return}                                              
  0x0000016cc78d9c1f:   ret                                                                                                
  0x0000016cc78d9c20:   mov    r11d,ebx                                                                                    
  0x0000016cc78d9c23:   dec    r11d                                                                                        
  0x0000016cc78d9c26:   cmp    r11d,ebx                                                                                    
  0x0000016cc78d9c29:   jae    0x0000016cc78d9cf8                                                                          
  0x0000016cc78d9c2f:   mov    rbp,rsi                                                                                     
  0x0000016cc78d9c32:   xor    ecx,ecx                                                                                     
  0x0000016cc78d9c34:   mov    r9d,0x1f                                                                                    
  0x0000016cc78d9c3a:   mov    r10d,0x3e8                                                                                  
  0x0000016cc78d9c40:   jmp    0x0000016cc78d9c67                                                                          
  0x0000016cc78d9c42:   mov    eax,DWORD PTR [rdx+0xc]                                                                     
  0x0000016cc78d9c45:   add    eax,r9d                                                                                     
  0x0000016cc78d9c48:   mov    r9d,eax                                                                                     
  0x0000016cc78d9c4b:   shl    r9d,0x5                                                                                     
  0x0000016cc78d9c4f:   sub    r9d,eax                                                                                     
  0x0000016cc78d9c52:   inc    ecx                                                                                         
  0x0000016cc78d9c54:   cmp    ecx,r8d                                                                                     
  0x0000016cc78d9c57:   jl     0x0000016cc78d9c77                                                                          
  0x0000016cc78d9c59:   mov    r11,QWORD PTR [r15+0x110]    ; ImmutableOopMap {rsi=NarrowOop rbp=Oop }                     
                                                            ;*goto {reexecute=1 rethrow=0 return_oop=0}                    
                                                            ; - (reexecute) java.util.Arrays::hashCode@51 (line 4497)      
                                                            ; - ObjectsHashJIT::noVarArgsObjectsHash@4 (line 29)           
                                                            ; - ObjectsHashJIT::bench@1 (line 73)                          
  0x0000016cc78d9c60:   test   DWORD PTR [r11],eax          ;   {poll}                                                     
  0x0000016cc78d9c63:   cmp    ecx,ebx                                                                                     
  0x0000016cc78d9c65:   jge    0x0000016cc78d9c0e                                                                          
  0x0000016cc78d9c67:   mov    r8d,ebx                                                                                     
  0x0000016cc78d9c6a:   sub    r8d,ecx                                                                                     
  0x0000016cc78d9c6d:   cmp    r8d,r10d                                                                                    
  0x0000016cc78d9c70:   cmovg  r8d,r10d                                                                                    
  0x0000016cc78d9c74:   add    r8d,ecx                                                                                     
  0x0000016cc78d9c77:   mov    r11d,DWORD PTR [rsi+rcx*4+0x10]                                                             
  0x0000016cc78d9c7c:   mov    eax,DWORD PTR [r11+0x8]      ; implicit exception: dispatches to 0x0000016cc78d9d2c         
  0x0000016cc78d9c80:   mov    rdx,r11                                                                                     
  0x0000016cc78d9c83:   cmp    eax,0x5d33c                  ;   {metadata('java/lang/Integer')}                            
  0x0000016cc78d9c89:   je     0x0000016cc78d9c42                                                                          
  0x0000016cc78d9c8b:   cmp    eax,0x697fa                  ;   {metadata('java/lang/Double')}                             
  0x0000016cc78d9c91:   jne    0x0000016cc78d9cba                                                                          
  0x0000016cc78d9c93:   vmovsd xmm0,QWORD PTR [rdx+0x10]                                                                   
  0x0000016cc78d9c98:   vucomisd xmm0,xmm0                                                                                 
  0x0000016cc78d9c9c:   jp     0x0000016cc78d9ca0                                                                          
  0x0000016cc78d9c9e:   je     0x0000016cc78d9ca7                                                                          
  0x0000016cc78d9ca0:   mov    eax,0x7ff80000                                                                              
  0x0000016cc78d9ca5:   jmp    0x0000016cc78d9c45                                                                          
  0x0000016cc78d9ca7:   vmovq  r11,xmm0                                                                                    
  0x0000016cc78d9cac:   mov    rdx,r11                                                                                     
  0x0000016cc78d9caf:   shr    rdx,0x20                                                                                    
  0x0000016cc78d9cb3:   xor    rdx,r11                                                                                     
  0x0000016cc78d9cb6:   mov    eax,edx                                                                                     
  0x0000016cc78d9cb8:   jmp    0x0000016cc78d9c45                                                                          

We don’t have the allocation, but still generated code is very convoluted.

rawVarArgsObjectHash

With the method rawVarArgsObjectHash we keep the varargs array but we don’t iterate on it, unrolling manually the loop with distinct calls.

    private static int rawVarArgsObjectsHash(Object... elements) {
        int res = 1;
        res += 31 * res + elements[0].hashCode();
        res += 31 * res + elements[1].hashCode();
        return res;
    }

Inlining decision:

	@ 1   ObjectsHashJIT::rawVarArgsObjectHash_noargs (22 bytes)   inline (hot)              
	  @ 18   ObjectsHashJIT::rawVarArgsObjectsHash (32 bytes)   inline (hot)                 
		@ 10   java.lang.Integer::hashCode (8 bytes)   inline (hot)                          
		  @ 4   java.lang.Integer::hashCode (2 bytes)   inline (hot)                         
		@ 24   java.lang.Double::hashCode (8 bytes)   inline (hot)                           
		  @ 4   java.lang.Double::hashCode (13 bytes)   inline (hot)                         
			@ 1   java.lang.Double::doubleToLongBits (16 bytes)   (intrinsic)                

Assembly output:

  0x0000017aafa23950:   mov    DWORD PTR [rsp-0x7000],eax
  0x0000017aafa23957:   push   rbp
  0x0000017aafa23958:   sub    rsp,0x10
  0x0000017aafa2395c:   mov    r10d,DWORD PTR [rdx+0xc]
  0x0000017aafa23960:   mov    r11d,DWORD PTR [r10+0xc]     ; implicit exception: dispatches to 0x0000017aafa239bb
  0x0000017aafa23964:   mov    r10d,DWORD PTR [rdx+0x10]
  0x0000017aafa23968:   vmovsd xmm0,QWORD PTR [r10+0x10]    ; implicit exception: dispatches to 0x0000017aafa239e4
  0x0000017aafa2396e:   vucomisd xmm0,xmm0
  0x0000017aafa23972:   jp     0x0000017aafa23976
  0x0000017aafa23974:   je     0x0000017aafa239a7
  0x0000017aafa23976:   mov    r10d,0x7ff80000
  0x0000017aafa2397c:   mov    ecx,r11d
  0x0000017aafa2397f:   shl    ecx,0x5
  0x0000017aafa23982:   sub    ecx,r11d
  0x0000017aafa23985:   add    ecx,r10d
  0x0000017aafa23988:   add    ecx,r11d
  0x0000017aafa2398b:   add    r8d,ecx
  0x0000017aafa2398e:   mov    eax,r8d
  0x0000017aafa23991:   add    eax,0x400
  0x0000017aafa23997:   add    rsp,0x10
  0x0000017aafa2399b:   pop    rbp
  0x0000017aafa2399c:   mov    r10,QWORD PTR [r15+0x110]
  0x0000017aafa239a3:   test   DWORD PTR [r10],eax          ;   {poll_return}
  0x0000017aafa239a6:   ret    

No varargs allocation! We have something similar to rawObjectsHash but using the varargs notation.

Can we diagnostic the EA decision? Looking at VM Option Explorer, you can found 2 options: -XX:+PrintEscapeAnalysis and -XX:+PrintEliminateAllocations but both are notproduct meaning you cannot use it with a release build of OpenJDK and requires a fastdebug one. We are also using -XX:+Verbose to have more information about eliminated allocations.

EA report:

======== Connection graph for  ObjectsHashJIT::bench                                                                                                                    
JavaObject NoEscape(NoEscape) [ 266F 263F [ 63 ]]   51  AllocateArray   ===  5  6  7  8  1 ( 49  34  39  33  1  11  1  10 ) [[ 52  53  54  61  62  63 ]]  rawptr:NotNull
 ( int:>=0, java/lang/Object:NotNull *, bool, int ) ObjectsHashJIT::rawVarArgsObjectHash_noargs @ bci:1 ObjectsHashJIT::bench @ bci:1 !jvms: ObjectsHashJIT::rawVarArgsO
bjectHash_noargs @ bci:1 ObjectsHashJIT::bench @ bci:1                                                                                                                  
LocalVar [ 51P [ 266b 263b ]]   63      Proj    ===  51  [[ 64  266  263 ]] #5 !jvms: ObjectsHashJIT::rawVarArgsObjectHash_noargs @ bci:1 ObjectsHashJIT::bench @ bci:1 
                                                                                                                                                                        
Scalar  51      AllocateArray   ===  5  6  7  8  1 ( 49  34  39  33  1  11  1  10 ) [[ 52  53  54  61  62  63 ]]  rawptr:NotNull ( int:>=0, java/lang/Object:NotNull *, 
bool, int ) ObjectsHashJIT::rawVarArgsObjectHash_noargs @ bci:1 ObjectsHashJIT::bench @ bci:1 !jvms: ObjectsHashJIT::rawVarArgsObjectHash_noargs @ bci:1 ObjectsHashJIT:
:bench @ bci:1                                                                                                                                                          
++++ Eliminated: 51 AllocateArray                                                                                                                                       

The allocation of an array was indeed eliminated (Eliminated: 51 AllocateArray)

iterVarArgsObjectsHash

we have kept the varargs but reintroduced the iteration over the varargs array in a form very similar to Arrays.hashCode.

    private static int iterVarArgsObjectsHash(Object... elements) {
        if (elements == null)
            return 0;
        int result = 1;
        for (Object element : elements)
            result += 31 * result + (element == null ? 0 : element.hashCode());
        return result;
    }

Inlining decision:

	@ 1   ObjectsHashJIT::iterVarArgsObjectsHash_noargs (22 bytes)   inline (hot)
	  @ 18   ObjectsHashJIT::iterVarArgsObjectsHash (5 bytes)   inline (hot)
		@ 1   java.util.Arrays::hashCode (56 bytes)   inline (hot)
		  @ 43   java.lang.Integer::hashCode (8 bytes)   inline (hot)
		  @ 43   java.lang.Double::hashCode (8 bytes)   inline (hot)
		   \-> TypeProfile (4290/8580 counts) = java/lang/Double
		   \-> TypeProfile (4290/8580 counts) = java/lang/Integer
			@ 4   java.lang.Double::hashCode (13 bytes)   inline (hot)
			  @ 1   java.lang.Double::doubleToLongBits (16 bytes)   (intrinsic)
			@ 4   java.lang.Integer::hashCode (2 bytes)   inline (hot)

Assembly output:

  0x000001d2ef949bd0:   mov    DWORD PTR [rsp-0x7000],eax
  0x000001d2ef949bd7:   push   rbp
  0x000001d2ef949bd8:   sub    rsp,0x30
  0x000001d2ef949bdc:   mov    QWORD PTR [rsp],rdx
  0x000001d2ef949be0:   mov    ebp,r8d
  0x000001d2ef949be3:   mov    rax,QWORD PTR [r15+0x120]
  0x000001d2ef949bea:   mov    r10,rax
  0x000001d2ef949bed:   add    r10,0x18
  0x000001d2ef949bf1:   cmp    r10,QWORD PTR [r15+0x130]
  0x000001d2ef949bf8:   jae    0x000001d2ef949ca8
  0x000001d2ef949bfe:   mov    QWORD PTR [r15+0x120],r10
  0x000001d2ef949c05:   prefetchw BYTE PTR [r10+0xc0]
  0x000001d2ef949c0d:   mov    QWORD PTR [rax],0x1
  0x000001d2ef949c14:   prefetchw BYTE PTR [r10+0x100]
  0x000001d2ef949c1c:   mov    DWORD PTR [rax+0x8],0x2115   ;   {metadata('java/lang/Object'[])}
  0x000001d2ef949c23:   prefetchw BYTE PTR [r10+0x140]
  0x000001d2ef949c2b:   mov    DWORD PTR [rax+0xc],0x2
  0x000001d2ef949c32:   prefetchw BYTE PTR [r10+0x180]
  0x000001d2ef949c3a:   mov    r10,QWORD PTR [rsp]
  0x000001d2ef949c3e:   mov    r11d,DWORD PTR [r10+0x10]
  0x000001d2ef949c42:   mov    r8d,DWORD PTR [r10+0xc]
  0x000001d2ef949c46:   mov    DWORD PTR [rax+0x10],r8d
  0x000001d2ef949c4a:   mov    DWORD PTR [rax+0x14],r11d
  0x000001d2ef949c4e:   test   r8d,r8d
  0x000001d2ef949c51:   je     0x000001d2ef949cc5
  0x000001d2ef949c53:   mov    r9d,DWORD PTR [r8+0xc]
  0x000001d2ef949c57:   mov    r10d,r9d
  0x000001d2ef949c5a:   shl    r10d,0x5
  0x000001d2ef949c5e:   sub    r10d,r9d
  0x000001d2ef949c61:   test   r11d,r11d
  0x000001d2ef949c64:   je     0x000001d2ef949cd2
  0x000001d2ef949c66:   vmovsd xmm0,QWORD PTR [r11+0x10]
  0x000001d2ef949c6c:   vucomisd xmm0,xmm0
  0x000001d2ef949c70:   jp     0x000001d2ef949c74
  0x000001d2ef949c72:   je     0x000001d2ef949c94
  0x000001d2ef949c74:   mov    eax,0x7ff80000
  0x000001d2ef949c79:   add    eax,r10d
  0x000001d2ef949c7c:   add    eax,ebp
  0x000001d2ef949c7e:   add    eax,0x3c1
  0x000001d2ef949c84:   add    rsp,0x30
  0x000001d2ef949c88:   pop    rbp
  0x000001d2ef949c89:   mov    r10,QWORD PTR [r15+0x110]
  0x000001d2ef949c90:   test   DWORD PTR [r10],eax          ;   {poll_return}
  0x000001d2ef949c93:   ret

Ouch, now we have lost the elimination of the varargs array allocation just by iterating on it!

Confirmed with EA report:

======== Connection graph for  ObjectsHashJIT::bench                                                                                                                    
JavaObject NoEscape(NoEscape) NSR [ 390F 393F 213F 214F [ 63 68 ]]   51 AllocateArray   ===  5  6  7  8  1 ( 49  34  39  33  1  11  1  10 ) [[ 52  53  54  61  62  63 ]]
  rawptr:NotNull ( int:>=0, java/lang/Object:NotNull *, bool, int ) ObjectsHashJIT::iterVarArgsObjectsHash_noargs @ bci:1 ObjectsHashJIT::bench @ bci:1  Type:{0:control
, 1:abIO, 2:memory, 3:rawptr:BotPTR, 4:return_address, 5:rawptr:NotNull} !jvms: ObjectsHashJIT::iterVarArgsObjectsHash_noargs @ bci:1 ObjectsHashJIT::bench @ bci:1     
LocalVar NoEscape(NoEscape) [ 51P [ 68 390b 393b ]]   63        Proj    ===  51  [[ 64  68  390  393 ]] #5  Type:rawptr:NotNull !jvms: ObjectsHashJIT::iterVarArgsObject
sHash_noargs @ bci:1 ObjectsHashJIT::bench @ bci:1                                                                                                                      
LocalVar NoEscape(NoEscape) [ 63 51P [ 213b 214b ]]   68        CheckCastPP     ===  65  63  [[ 406  261  157  144  214  213  231  170  204  204  214 ]]   Type:narrowoo
p: java/lang/Object:BotPTR *[int:2]:NotNull:exact * !jvms: ObjectsHashJIT::iterVarArgsObjectsHash_noargs @ bci:1 ObjectsHashJIT::bench @ bci:1                          
                                                                                                                                                                        
=== No allocations eliminated for  ObjectsHashJIT::bench since there are no scalar replaceable candidates ===                                                           

Despite the fact that all objects are NoEscape!

Conclusion

Escape Analysis seems to fail, for no obvious reason, to elimnate the varargs array allocation which prevents to use freely Objects.hashCode method. Is it something that could be fixed easily?

Thanks to Richard Startin & Charlie Gracie for the review!

Update: 2020-08-23

Nils Eliasson points me to this change in JDK 15 EA that should improve the situation. After double checking it with a 15-ea build from builds.shipilev.net, here are the result for jdkObjectsHash:

Inlining decision (no change):

	@ 1   ObjectsHashJIT::jdkObjectsHash (22 bytes)   inline (hot)
	  @ 18   java.util.Objects::hash (5 bytes)   inline (hot)
		@ 1   java.util.Arrays::hashCode (56 bytes)   inline (hot)
		  @ 43   java.lang.Integer::hashCode (8 bytes)   inline (hot)
		  @ 43   java.lang.Double::hashCode (8 bytes)   inline (hot)
		   \-> TypeProfile (4775/9550 counts) = java/lang/Double
		   \-> TypeProfile (4775/9550 counts) = java/lang/Integer
			@ 4   java.lang.Double::hashCode (13 bytes)   inline (hot)
			  @ 1   java.lang.Double::doubleToLongBits (16 bytes)   (intrinsic)
			@ 4   java.lang.Integer::hashCode (2 bytes)   inline (hot)

Assembly output:

  0x000001f927c7bda0:   mov    DWORD PTR [rsp-0x7000],eax
  0x000001f927c7bda7:   push   rbp
  0x000001f927c7bda8:   sub    rsp,0x30
  0x000001f927c7bdac:   mov    r11d,DWORD PTR [rdx+0x10]
  0x000001f927c7bdb0:   mov    r10d,DWORD PTR [rdx+0xc]
  0x000001f927c7bdb4:   test   r10d,r10d
  0x000001f927c7bdb7:   je     0x000001f927c7be1d
  0x000001f927c7bdbd:   mov    ebx,DWORD PTR [r10+0xc]
  0x000001f927c7bdc1:   mov    r9d,ebx
  0x000001f927c7bdc4:   shl    r9d,0x5
  0x000001f927c7bdc8:   sub    r9d,ebx
  0x000001f927c7bdcb:   test   r11d,r11d
  0x000001f927c7bdce:   je     0x000001f927c7be2a
  0x000001f927c7bdd4:   vmovsd xmm0,QWORD PTR [r11+0x10]
  0x000001f927c7bdda:   nop    WORD PTR [rax+rax*1+0x0]
  0x000001f927c7bde0:   vucomisd xmm0,xmm0
  0x000001f927c7bde4:   jp     0x000001f927c7bde8
  0x000001f927c7bde6:   je     0x000001f927c7be09
  0x000001f927c7bde8:   mov    eax,0x7ff80000
  0x000001f927c7bded:   add    eax,r9d
  0x000001f927c7bdf0:   add    eax,r8d
  0x000001f927c7bdf3:   add    eax,0x3c1
  0x000001f927c7bdf9:   add    rsp,0x30
  0x000001f927c7bdfd:   pop    rbp
  0x000001f927c7bdfe:   mov    r10,QWORD PTR [r15+0x110]
  0x000001f927c7be05:   test   DWORD PTR [r10],eax          ;   {poll_return}
  0x000001f927c7be08:   ret

Which is much better now!

References