That's an interesting difference between our performance measurements.
You wrote:
---
Note that 38% of the time is spent in GC, but allocation isn't that
bad at all, at least with -Xprof. New-array millicode is only 1.2%.
The big kahuna in allocation is clone(), which clocks in at 21.7% (up
from 11.4% in Java 1.3.1). This *should* basically boil down to a
pointer increment and a memcpy (as opposed to new Object(), which
is a pointer increment and a bzero). What I think is happening is
that the costs of other operations are being optimized away in successive
Java versions, so allocation is becoming more costly relative to them.
---
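For reference, here is a minimal sketch of the two allocation paths being compared, written at the Java source level. The class and method names (ClonePaths, viaClone, viaNewAndCopy) are hypothetical, and whether a given VM really compiles clone() down to "bump + memcpy" and new int[n] to "bump + bzero" is VM-dependent; the sketch only shows that both paths produce the same, distinct copy.

```java
import java.util.Arrays;

public class ClonePaths {
    // Path 1: clone() allocates an array of the same length and copies every
    // element from the source (conceptually: pointer bump + memcpy).
    static int[] viaClone(int[] src) {
        return src.clone();
    }

    // Path 2: new int[n] allocates an array whose elements start zeroed
    // (conceptually: pointer bump + bzero); arraycopy then fills it in.
    static int[] viaNewAndCopy(int[] src) {
        int[] dst = new int[src.length];
        System.arraycopy(src, 0, dst, 0, src.length);
        return dst;
    }

    public static void main(String[] args) {
        int[] src = {3, 1, 4, 1, 5};
        int[] a = viaClone(src);
        int[] b = viaNewAndCopy(src);
        System.out.println(Arrays.equals(a, b)); // true
        System.out.println(a != src);            // true: a distinct array
    }
}
```

Path 2 zeroes the memory and then overwrites it, so if the VM does not elide the zeroing, clone() should be the cheaper of the two.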
An alternative explanation is that my inline performance measurements
are not as accurate as -Xprof. The new-array allocation may trigger a
garbage collection, in which case my measurement also includes the GC time.
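One way to test this explanation is to sample the cumulative GC time reported by the JVM's GarbageCollectorMXBeans before and after the timed loop, so GC time can be separated from the inline measurement. The sketch below assumes a loop of array clones as the workload; the class and method names are hypothetical. Collection times are reported in milliseconds, getCollectionTime() returns -1 for collectors that do not report it, and for concurrent collectors the GC time may overlap with (or even exceed) the wall time, so treat the split as approximate.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class InlineTiming {
    // Cumulative collection time (ms) summed over all collectors.
    // Collectors that report -1 (undefined) are skipped.
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime();
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    // Times `iterations` clones of a non-empty `src` array.
    // Returns {wall ms, GC ms during the same window, checksum}.
    static long[] timeClones(int[] src, int iterations) {
        long gcBefore = totalGcMillis();
        long start = System.nanoTime();
        long checksum = 0;
        for (int i = 0; i < iterations; i++) {
            int[] copy = src.clone();
            checksum += copy[i % copy.length]; // keep each copy live
        }
        long wallMillis = (System.nanoTime() - start) / 1_000_000;
        long gcMillis = totalGcMillis() - gcBefore;
        return new long[] {wallMillis, gcMillis, checksum};
    }

    public static void main(String[] args) {
        long[] r = timeClones(new int[1024], 100_000);
        System.out.println("wall ms: " + r[0] + ", GC ms (approx.): " + r[1]);
    }
}
```

If most of the inline time coincides with a jump in GC time, that would support the explanation that the inline numbers are picking up collections triggered by the allocation.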