I actually had a similar question and thought that it was a little
more straightforward than this. Generally, for statistical
significance, you want to run an experiment (or solve a problem) many
times, each time with a different random number seed. You want
detailed statistics on each run, plus aggregate data across runs such
as the number of evaluations and whether or not the ideal individual
was found.
Is there a hook to re-seed the random number generator?
Steve, I'm not positive what you're referring to by re-seeding the RNG,
but the MersenneTwister classes can be reseeded by calling their setSeed
methods.
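Here is a minimal sketch of per-run reseeding. It uses java.util.Random as a stand-in for ECJ's MersenneTwister classes, since both have a setSeed(long) method, so the sketch runs without ECJ on the classpath. The class and method names are hypothetical. It shows that reseeding with the same value replays the same sequence, while a different seed generally gives a different one.

```java
import java.util.Arrays;
import java.util.Random;

// Sketch only: java.util.Random stands in for ECJ's MersenneTwisterFast;
// both expose setSeed(long), which restarts the generator's sequence.
class ReseedDemo {
    // Reseed one generator before each run and draw a few values per run.
    static int[][] drawPerRun(long[] seeds, int draws) {
        Random rng = new Random();
        int[][] results = new int[seeds.length][draws];
        for (int run = 0; run < seeds.length; run++) {
            rng.setSeed(seeds[run]);              // re-seed before each run
            for (int i = 0; i < draws; i++) {
                results[run][i] = rng.nextInt(100);
            }
        }
        return results;
    }

    public static void main(String[] args) {
        int[][] r = drawPerRun(new long[] {31234L, 31235L, 31234L}, 5);
        // Runs 0 and 2 share a seed, so their sequences are identical.
        System.out.println(Arrays.equals(r[0], r[2])); // true
    }
}
```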
But let me delve a little into how to make scripts and how the RNG
works in ECJ. The random number generator's seed parameters, one per
thread, are seed.0, seed.1, etc. Each is set either to a number or to
the string 'time', in which case the wall-clock time is used as the
seed, as in
seed.0 = 31234
or
seed.0 = time
If multiple seeds use 'time', then the time values are incremented, so you
don't have to worry that these two will have the same values:
seed.0 = time
seed.1 = time
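To make the 'time' behavior concrete, here is a hypothetical sketch that turns seed.0, seed.1, ... values into numbers. It assumes each 'time' seed is the clock value plus its thread index. That offset scheme is one way to read "incremented" and is not copied from ECJ's code. SeedResolver is a made-up name.

```java
import java.util.Arrays;

// Hypothetical sketch: resolve per-thread seed strings into numbers.
// ASSUMPTION: a 'time' seed for thread i becomes clock + i, so two
// 'time' seeds never coincide. This is not ECJ's actual implementation.
class SeedResolver {
    static long[] resolve(String[] seedValues, long clock) {
        long[] seeds = new long[seedValues.length];
        for (int thread = 0; thread < seedValues.length; thread++) {
            String v = seedValues[thread].trim();
            if (v.equals("time")) {
                seeds[thread] = clock + thread;    // distinct per thread
            } else {
                seeds[thread] = Long.parseLong(v); // explicit numeric seed
            }
        }
        return seeds;
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        System.out.println(Arrays.toString(resolve(new String[] {"time", "time"}, now)));
    }
}
```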
The other common need is to write different stat files. The standard
statistics file name is set by the stat.file parameter:
stat.file = $out.stat
The '$' means 'relative to the directory where you ran the program' (as
opposed to 'relative to the directory containing the params file').
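Here is a small illustrative sketch of the '$' rule. It assumes the run directory is the JVM's user.dir property and that a value without '$' is resolved against the params file's directory. StatPath is a hypothetical name. ECJ has its own resolution code, which may handle cases this sketch ignores, such as absolute paths.

```java
import java.io.File;

// Hypothetical sketch of the '$' convention: "$name" resolves against the
// directory the program was run from (user.dir); a plain relative name
// resolves against the directory holding the params file. Not ECJ's code.
class StatPath {
    static File resolve(String value, File paramsDir) {
        if (value.startsWith("$")) {
            return new File(System.getProperty("user.dir"), value.substring(1));
        }
        return new File(paramsDir, value);
    }

    public static void main(String[] args) {
        File paramsDir = new File("experiments");
        System.out.println(resolve("$out.stat", paramsDir)); // under the run directory
        System.out.println(resolve("out.stat", paramsDir));  // inside experiments
    }
}
```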
For years we've typically run ECJ from a UNIX shell script. Imagine that
your statistics object writes the final result (the part you care about)
on a single line at the end of the run; this is commonly the case for us
with, say, KozaShortStatistics. Now say you want to run 10 jobs, each
with a different random seed, for each of 7 tournament selection sizes.
You might write a script like this:
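#! /bin/tcsh
# Run 10 jobs for each of 7 tournament sizes, each job with its own seed.
set arbitrary_offset=200031
set job_max=10
foreach tournament (1 2 3 4 5 6 7)
    # Start this tournament size's results file from scratch
    rm -f tourn.${tournament}.out
    foreach job (0 1 2 3 4 5 6 7 8 9)
        # Unique seed for every (tournament, job) pair
        @ seed = $tournament * $job_max + $job + $arbitrary_offset
        java ec.Evolve -file my.params \
            -p seed.0=$seed \
            -p select.tournament.size=$tournament
        # Keep only the final-result line of this run
        tail -1 out.stat >> tourn.${tournament}.out
    end
end
rm out.stat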
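If you want to check the seed arithmetic rather than trust it, here is the same plan in Java. It only builds the argument arrays for the 70 runs, using the script's offset and formula, and does not launch ECJ. RunPlan and its methods are hypothetical names. Because job is always less than job_max, every (tournament, job) pair gets its own seed, from 200041 through 200110.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: the tcsh script's seed arithmetic in Java. It builds the
// ec.Evolve argument list for each run but does not call ECJ.
class RunPlan {
    static final int ARBITRARY_OFFSET = 200031;
    static final int JOB_MAX = 10;

    // Same formula as "@ seed = ..." in the script; unique while job < JOB_MAX.
    static long seedFor(int tournament, int job) {
        return (long) tournament * JOB_MAX + job + ARBITRARY_OFFSET;
    }

    static List<String[]> plan() {
        List<String[]> runs = new ArrayList<>();
        for (int tournament = 1; tournament <= 7; tournament++) {
            for (int job = 0; job < JOB_MAX; job++) {
                runs.add(new String[] {
                    "-file", "my.params",
                    "-p", "seed.0=" + seedFor(tournament, job),
                    "-p", "select.tournament.size=" + tournament
                });
            }
        }
        return runs;
    }

    public static void main(String[] args) {
        for (String[] run : plan()) {
            System.out.println(String.join(" ", run));
        }
    }
}
```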
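And there you go: seven files, tourn.1.out through tourn.7.out, each
holding the final results of 10 jobs, one per line, with a different
seed for every run. If you wanted to save separate statistics files for
each run, you could have added this line to the ec.Evolve command (and
pointed the tail line at that file instead of out.stat; the backslash
keeps tcsh from treating $out as a shell variable):
-p stat.file=\$out.${tournament}.${job}.stat \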
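If you drive runs from Java instead of from the shell, the tail -1 out.stat >> tourn.N.out step can be done there too. Here is a minimal sketch with hypothetical names. It keeps the last non-empty line of a statistics file, which differs slightly from tail -1: tail would return a trailing blank line as-is.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Collections;
import java.util.List;

// Sketch of "tail -1 out.stat >> tourn.N.out": append the last non-empty
// line of a run's statistics file to a per-tournament summary file.
class CollectLastLine {
    static void appendLastLine(Path statFile, Path summary) throws IOException {
        List<String> lines = Files.readAllLines(statFile, StandardCharsets.UTF_8);
        String last = "";
        for (int i = lines.size() - 1; i >= 0; i--) {
            if (!lines.get(i).isEmpty()) {
                last = lines.get(i);
                break;
            }
        }
        // CREATE + APPEND behaves like ">>": make the file if needed, then add to it.
        Files.write(summary, Collections.singletonList(last), StandardCharsets.UTF_8,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }
}
```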
If you wanted different parameter files, say one per tournament size,
called 1.params, 2.params, etc., you could say:
java ec.Evolve -file ${tournament}.params \
You could conceivably also do it in a scripting language like BeanShell
(by the way, if you haven't played with BeanShell, you really should do
so immediately: www.beanshell.org). BeanShell is basically a command
line for Java.
Realizing that not everyone runs in a Linux or MacOS X environment, we
have also added some rudimentary job facilities to ECJ. The simplest
approach is to run ECJ with the jobs parameter:
java ec.Evolve -file my.params -p jobs=10 -p seed.0=0
This will run ECJ 10 times, using seeds 0 through 9 respectively. Each
job's statistics file will be written separately as job.N.out.stat,
where N is the job number. Try it!
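Here is a hypothetical sketch of the job numbering just described. Job N gets seed.0 + N and writes to "job.N." followed by the usual statistics file name. The seed formula is inferred from the example above (jobs=10 with seed.0=0 giving seeds 0 through 9), not taken from ECJ's source, and JobNaming is a made-up name.

```java
// Hypothetical sketch of what the jobs parameter does per job.
// ASSUMPTION: job N uses seed.0 + N; this mirrors the described behavior,
// not ECJ's actual code.
class JobNaming {
    static long seedForJob(long baseSeed, int job) {
        return baseSeed + job;
    }

    static String statFileForJob(String statFile, int job) {
        return "job." + job + "." + statFile;
    }

    public static void main(String[] args) {
        int jobs = 10;
        for (int job = 0; job < jobs; job++) {
            System.out.println(seedForJob(0L, job) + " -> " + statFileForJob("out.stat", job));
        }
    }
}
```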
The jobs parameter reflects our basic default approach to handling jobs
directly in Java code. Want to do something more advanced? If you take a
peek at the Evolve Java code, you'll notice that we've rearranged it
considerably. In particular, the main() function has been rewritten so
that you can write all sorts of job loops, loops within loops, whatever
you need, if you would rather do this in Java code than in an outer
script.
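Here is a sketch of such a loop that also does the aggregation asked about at the top of the thread. It runs a number of jobs with consecutive seeds and tallies evaluations and successes. The runner function and the RunResult and Summary types are hypothetical placeholders for however you launch a single ECJ run and read back its results. They are not part of ECJ's API.

```java
import java.util.function.Function;

// Sketch of a Java-side job loop with aggregation across runs.
// RunResult, Summary, and the runner function are hypothetical stand-ins,
// not ECJ classes.
class JobLoop {
    static final class RunResult {
        final long evaluations;
        final boolean idealFound;
        RunResult(long evaluations, boolean idealFound) {
            this.evaluations = evaluations;
            this.idealFound = idealFound;
        }
    }

    static final class Summary {
        int runs, successes;
        long totalEvaluations;
        double meanEvaluations() {
            return runs == 0 ? 0.0 : (double) totalEvaluations / runs;
        }
    }

    static Summary runJobs(int jobs, long baseSeed, Function<Long, RunResult> runner) {
        Summary s = new Summary();
        for (int job = 0; job < jobs; job++) {
            RunResult r = runner.apply(baseSeed + job); // a different seed per job
            s.runs++;
            s.totalEvaluations += r.evaluations;
            if (r.idealFound) s.successes++;
        }
        return s;
    }

    public static void main(String[] args) {
        // Fake runner for illustration only: pretend even seeds succeed.
        Summary s = runJobs(10, 0L, seed -> new RunResult(1000 + seed, seed % 2 == 0));
        System.out.println(s.successes + "/" + s.runs + " found the ideal; mean evaluations "
                + s.meanEvaluations());
    }
}
```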
Sean