The short version of the story - By setting gc=false and adding the
following parameter to the VM command line" -XX:+AggressiveHeap, ECJ used
28% less computational time. Same seed.
The performance increase is a result of instructing the JVM to use a
more efficient garbage collection algorithm.
Note, using this switch uses memory more aggressively, so you may want
to read on.
The long version of the story.
I have been using ECJ for the last year and in the past few months
became interested in increasing performance. To that end, I added timers
to each of the major operations of ECJ. Here's what a typical report my
look like:
> Java Default GC Parameters
>
> Run Time:93.94 -> 100.00%
> Setup:3.98 -> 4.23%
> Evaluate:17.94 -> 19.09%
> PostEval Stats:4.32 -> 4.60%
> Breed:67.02 -> 71.34%
> Final Stats:0.90 -> 0.96%
First number is seconds and second number is percent of CPU time.
I was surprised to see that breeding consumed such a large proportion of
the CPU consumption - 71%. Timers were added to specific functions inside
of breeding:
> Run Time:93.94 -> 100.00%
> Setup:3.98 -> 4.23%
> Evaluate:17.94 -> 19.09%
> PostEval Stats:4.32 -> 4.60%
> Breed:67.02 -> 71.34%
> --PrepareToProduce:7.47 -> 7.95%
> --Produce:59.38 -> 63.21%
> ----Produce_BreedingSources:0.78 -> 0.83%
> ----Produce_Produce:57.95 -> 61.68%
> ------CrossOverPipeline_Breed6:33.62 -> 35.79%
> -------CrossOverPipeline_Breed7_childAllocate:8.02 -> 8.54%
> Final Stats:0.90 -> 0.96%
One line of code “obj.children = new GPNode[children.length]” was found to
consume 8.54% of the CPU time. This line of code allocates memory from the
heap for tree children - a very common operation in breeding.
A first tack against this consumer was to think that dynamic object
allocation was eating processor time. This could be cured by statically
defining a collection of nodes that the GP could recycle through without
JVM memory allocation/de-allocation. This was briefly explored until
realizing that creating/maintaining such a collection would require
extensive code changes.
A second and ultimately successful tack was to consider the de-allocation
side. Was the JVM’s garbage collection consuming the processor time?
Garbage collection is performed by the JVM without needing to be
programmed directly – one of the beauties of Java – no memory leaks. In
most programs, this does not introduce performance degradation. However,
most programs don’t allocate millions of objects per second as ECJ.
While Sun Java does offer some access to performing explicit garbage
collection in code, doing this in ECJ only introduced additional
performance degradation. The other way to control GC is through JVM
command line parameters. After a bit of internet research, the following
command GC/Heap parameters were discovered and tested.
> 75000 individuls x 5gen
> Command Line Switch Total Run Time
> --none-- 11.5s
> -XX:+UseParallelGC 13.7s
> -XX:+UseConcMarkSweepGC 16.5s
> -Xincgc 16.7s
> -XX:+UseAdaptiveSizePolicy 11.6s
> -XX:+AggressiveHeap 7.2s
After determining the best command line switch, I tested combinations of
ECJ's garabge collection and the switch.
> 75000 inds x 50gen – testing switch with ECJ code GC
> Command Line Switch ECJ GC Total Run Time
> --none-- none 93.9s
> -XX:+AggressiveHeap none 62.0s
> --none-- normal 100.1s
> -XX:+AggressiveHeap normal 78.8s
> -none aggreesive 111.2s
> -XX:+AggressiveHeap aggressive 91.0s
As can be observed from the above results, using -XX:+AggressiveHeap and
turning off ECJ GC resulted in the much fastred run times by far.
A note on memory usage. -XX:+AggressiveHeap is just that - aggressive. If
you get an "Out of Heap Memory" error when using this switch, then you
will need to setthe VM parameter Xmx???m. This parameter sets the maximum
heap size and units are megabytes. The smallest value I ever use is:
Xmx75m. I'm running symbolic regression with average trees sizes of 18. My
rule of thumb is to allocate 2M of memory for every 1,000 individuals. For
example, a run of 200,000 individuals, I will set the parameter to
Xmx400m. Your needs may be more or less depending on your tree/population
sizes. There's a balance here. It needs to be set high enough so there's
enough heap space. It needs to be set low enough as to not aggressively
consume so much memory that you start doing disk page swaps, especially if
you're running multiple simultaneous runs. Again, this might not be an
issue for smaller populations, non-paralell runs or if you have vast
amounts of memory.
For more information on the -XX:+AggressiveHeap switch and other heap
related switches:
http://java.sun.com/j2se/1.4.2/reference/whitepapers/
I'd love to hear back from any of you who experiment with this.
Also, if there's popular demand, I can include the Performance
Monitoring Class that I've created.
best initentions, Chris Ellingwood
(University of Vermont, Botany)
-----------
Configuration I:
Sun Java: 1.5.0_05
Dual Processor Xeon 3.6 GHZ
Windows XP - 64 bit
ECJ 11
Configuration II:
Sun Java: 1.5.0_05
Pentium 4 - 2.4 GHZ
Windows XP
ECJ 11
|