The short version of the story - By setting gc=false and adding the 
following parameter to the VM command line" -XX:+AggressiveHeap, ECJ used 
28% less computational time. Same seed. 
  The performance increase is a result of instructing the JVM to use a 
more efficient garbage collection algorithm.
  Note, using this switch uses memory more aggressively, so you may want 
to read on.

  The long version of the story. 

  I have been using ECJ for the last year and in the past few months 
became interested in increasing performance. To that end, I added timers 
to each of the major operations of ECJ. Here's what a typical report my 
look like:

> Java Default GC Parameters
> Run Time:93.94 -> 100.00%
> Setup:3.98 -> 4.23%
> Evaluate:17.94 -> 19.09%
> PostEval Stats:4.32 -> 4.60%
> Breed:67.02 -> 71.34%
> Final Stats:0.90 -> 0.96%

  First number is seconds and second number is percent of CPU time. 
  I was surprised to see that breeding consumed such a large proportion of 
the CPU consumption - 71%. Timers were added to specific functions inside 
of breeding:

> Run Time:93.94 -> 100.00%
> Setup:3.98 -> 4.23%
> Evaluate:17.94 -> 19.09%
> PostEval Stats:4.32 -> 4.60%
> Breed:67.02 -> 71.34%
> --PrepareToProduce:7.47 -> 7.95%
> --Produce:59.38 -> 63.21%
> ----Produce_BreedingSources:0.78 -> 0.83%
> ----Produce_Produce:57.95 -> 61.68%
> ------CrossOverPipeline_Breed6:33.62 -> 35.79%
> -------CrossOverPipeline_Breed7_childAllocate:8.02 -> 8.54%
> Final Stats:0.90 -> 0.96%

One line of code “obj.children = new GPNode[children.length]” was found to 
consume 8.54% of the CPU time. This line of code allocates memory from the 
heap for tree children - a very common operation in breeding.

A first tack against this consumer was to think that dynamic object 
allocation was eating processor time. This could be cured by statically 
defining a collection of nodes that the GP could recycle through without 
JVM memory allocation/de-allocation. This was briefly explored until 
realizing that creating/maintaining such a collection would require 
extensive code changes.

A second and ultimately successful tack was to consider the de-allocation 
side. Was the JVM’s garbage collection consuming the processor time? 
Garbage collection is performed by the JVM without needing to be 
programmed directly – one of the beauties of Java – no memory leaks. In 
most programs, this does not introduce performance degradation. However, 
most programs don’t allocate millions of objects per second as ECJ. 

While Sun Java does offer some access to performing explicit garbage 
collection in code, doing this in ECJ only introduced additional 
performance degradation. The other way to control GC is through JVM 
command line parameters. After a bit of internet research, the following 
command GC/Heap parameters were discovered and tested.

> 75000 individuls x 5gen
> Command Line Switch Total Run Time
> --none--                    11.5s
> -XX:+UseParallelGC          13.7s
> -XX:+UseConcMarkSweepGC     16.5s
> -Xincgc                     16.7s
> -XX:+UseAdaptiveSizePolicy  11.6s
> -XX:+AggressiveHeap          7.2s

After determining the best command line switch, I tested combinations of 
ECJ's garabge collection and the switch. 

> 75000 inds x 50gen – testing switch with ECJ code GC
> Command Line Switch ECJ GC Total Run Time
> --none--              none      93.9s
> -XX:+AggressiveHeap   none      62.0s
> --none--              normal      100.1s
> -XX:+AggressiveHeap   normal      78.8s
> -none                 aggreesive  111.2s
> -XX:+AggressiveHeap   aggressive   91.0s

As can be observed from the above results, using -XX:+AggressiveHeap and 
turning off ECJ GC resulted in the much fastred run times by far. 

A note on memory usage. -XX:+AggressiveHeap is just that - aggressive. If 
you get an "Out of Heap Memory" error when using this switch, then you 
will need to setthe VM parameter Xmx???m. This parameter sets the maximum 
heap size and units are megabytes. The smallest value I ever use is: 
Xmx75m. I'm running symbolic regression with average trees sizes of 18. My 
rule of thumb is to allocate 2M of memory for every 1,000 individuals. For 
example, a run of 200,000 individuals, I will set the parameter to 
Xmx400m. Your needs may be more or less depending on your tree/population 
sizes. There's a balance here. It needs to be set high enough so there's 
enough heap space. It needs to be set low enough as to not aggressively 
consume so much memory that you start doing disk page swaps, especially if 
you're running multiple simultaneous runs. Again, this might not be an 
issue for smaller populations, non-paralell runs or if you have vast 
amounts of memory. 

For more information on the -XX:+AggressiveHeap switch and other heap 
related switches:

  I'd love to hear back from any of you who experiment with this. 

  Also, if there's popular demand, I can include the Performance 
Monitoring Class that I've created.

  best initentions, Chris Ellingwood

(University of Vermont, Botany)


Configuration I:
Sun Java: 1.5.0_05
Dual Processor Xeon 3.6 GHZ 
Windows XP - 64 bit
ECJ 11

Configuration II:
Sun Java: 1.5.0_05
Pentium 4 - 2.4 GHZ 
Windows XP
ECJ 11