Hi Sean, 

  That's an interesting difference between our performance measurements.

  You wrote:
---
Note that 38% of the time is spent in GC; but allocation isn't that  
bad at all, at least with Xprof.  new array millicode is only 1.2%   
The big kahuna in allocation is clone(), which clocks in at 21.7% (up  
from 11.4% in Java 1.3.1).  This *should* basically boil down to a  
pointer increment and a memcpy.  (As opposed to new object(), which  
is a pointer increment and a bzero).  What I think is happening is  
that the cost of other elements are being optimized out in successive  
Java versions, so allocation is becoming more costly relative to them.
---
  
  An alternative explanation is that my inline timing measurements are 
less accurate than -Xprof. A new array allocation may trigger some form 
of GC, in which case my measurement includes the time spent in that GC 
and charges it to the allocation.
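
  One way to check this would be to record the collectors' cumulative 
time around the same region that the inline timer covers. The sketch 
below is only an illustration, not the code I actually measured with: 
the class and method names (AllocTiming, gcMillis, timeAllocations) are 
made up, and it assumes a JVM that provides java.lang.management 
(Java 5 or later). The GC figure is whatever the collectors report, so 
with a concurrent collector it is only a rough indication of how much 
of the wall-clock time was really GC.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class AllocTiming {
    // Keeps the most recent array reachable so the allocation loop is not optimized away.
    static volatile Object sink;

    /** Cumulative collection time in ms, summed over all collectors the JVM reports. */
    static long gcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 if the collector does not report a time
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    /** Returns {wall-clock ms, reported GC ms} for allocating count arrays of size doubles. */
    static long[] timeAllocations(int count, int size) {
        long gcBefore = gcMillis();
        long start = System.nanoTime();
        for (int i = 0; i < count; i++) {
            sink = new double[size];
        }
        long wallMs = (System.nanoTime() - start) / 1000000L;
        long gcMs = gcMillis() - gcBefore;
        return new long[] { wallMs, gcMs };
    }

    public static void main(String[] args) {
        long[] r = timeAllocations(200000, 1000);
        System.out.println("wall ms: " + r[0] + ", reported GC ms: " + r[1]);
    }
}
```

  If the reported GC time accounts for most of the wall-clock time, 
then the inline number is mostly measuring collection rather than the 
allocation itself. Running with -verbose:gc would give a similar picture 
without changing any code.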