ECJ-INTEREST-L Archives

January 2008

ECJ-INTEREST-L@LISTSERV.GMU.EDU

Options: Use Monospaced Font
Show HTML Part by Default
Condense Mail Headers

Message: [<< First] [< Prev] [Next >] [Last >>]
Topic: [<< First] [< Prev] [Next >] [Last >>]
Author: [<< First] [< Prev] [Next >] [Last >>]

Print Reply
Sender:
ECJ Evolutionary Computation Toolkit <[log in to unmask]>
Date:
Fri, 18 Jan 2008 14:16:46 -0500
MIME-version:
1.0 (Apple Message framework v752.2)
Reply-To:
ECJ Evolutionary Computation Toolkit <[log in to unmask]>
Content-type:
text/plain; charset=US-ASCII; delsp=yes; format=flowed
Subject:
From:
Sean Luke <[log in to unmask]>
In-Reply-To:
Content-transfer-encoding:
7bit
Comments:
To: ECJ Evolutionary Computation Toolkit <[log in to unmask]>
Parts/Attachments:
text/plain (101 lines)
On Jan 17, 2008, at 5:13 AM, Michael Wilson wrote:

> If you create several instances in quick succession you get identical
> RNGs, and System.currentTimeMillis() doesn't have much entropy anyway.

I think this notion betrays a bit of cargo-cult programming, and it's  
worth explaining why.


>   private long defaultSeed() {
>
>     final long xseed = ((iseed*ISEED_MULT)+ISEED_INC)&ISEED_MASK;
>     final long nanos = System.nanoTime(); iseed = xseed;
>     return (nanos>>>32)^(nanos<<32)^System.currentTimeMillis()^
>            xseed^(((long)super.hashCode())<<32)^
>            Runtime.getRuntime().freeMemory();
> }
>
> It combines entropy from the current real time, the number of clock
> ticks since last startup, the new object's memory address (via  
> super.hashCode()), the amount of free memory in the JVM and the
> output of an embedded static RNG which uses the same algorithm as
> java.util.Random.

It's a cute function overall, but I have a few nits;

	- nanoTime is not available on earlier Java versions, *and* it's not  
guaranteed to work at all.  It's probably not a good idea to use.
	- hashCode is the wrong function.  It should be  
System.identityHashCode(...).  It's also not as random as you think:  
it's word aligned and sequentially packed on some systems.


> This produces a reasonable amount of entropy
> for experimental purposes (we have a couple of 'high quality' seeding
> methods too but they take longer to run).

I think this may be mistaken.  It's true that if you used a crummy  
generator with a small seed (such as c's rand()), providing a good  
entropy seed would be important.    But a crummy generator MT is  
not.   Generally speaking, providing seed entropy shouldn't have  
almost any effect at all.  It's waving a dead chicken.

Mersenne Twister has an internal state of 624 longs.  If you fill  
those longs with a poor-entropy value, or successive poor-entropy  
values, after just a few pumps (probably just one pump), you will  
*not* be able to notice from the MT output.  It will be so random as  
to pass every known statistical test.  This is one reason why MT is  
so well respected.  (Eith one exception: if you have a *lot* of 0's  
in your internal state, MT takes a while to warm up).

Even if you're still not willing to trust the math, we're not loading  
the internal state with low-entropy values.  We're loading it with  
relatively high-entropy values generated as the stream output of a  
carefully-selected linear congruential generator whose seed in turn  
is set to the initial 32-bit number you provide.  Ordinarily linear  
congruential generators aren't fantastic to use as random number  
generators for experiments, but they're just fine for providing  
enough entropy to MT so that those cargo-cult programmers who don't  
trust MT theorists can feel like you've waved the necessary dead  
chickens.

This means that you should be able to use seeds like this:

	2345, 2346, 2347, ...

and have pretty high-quality randomly different experimental  
results.  So you're likely not accomplishing anything with your super- 
duper seeder above, other than feeling warm and fuzzy.  If you have  
statistical results that counter this, I (and I think a fair degree  
of the RNG community) would really like to see them published!  (no  
really! I'm not trying to be sarcastic.  What do you have?)

Now as to why we have the defaultSeed(), setSeed(), and 32-bit seed  
choices: because they are the standard for MT19937.  See here:
	http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/MT2002/CODES/ 
mt19937ar.c
	http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/MT2002/emt19937ar.html

We chose to be 100% consistent with the MT standard rather than risk  
wandering off the ranch in search of a seed twice as long (which only  
benefits you if you need to do more than 2^32 experiments).  If you'd  
like something else, you should just go in and set all 624 initial  
values in the MT array: there's a function for that.  Otherwise,  
there are *big* benefits to having a standard RNG.

32 bits, rather than a longg is not a big deal either.  After all,  
java.util.Random uses a only 48-bit seed rather than a long (and in  
fact that's its entire internal state!  java.util.Random is very poor).

Now: how to initialize RNGs in sequence?  Assuming you let MT's  
default seeder do its job, the answer then doesn't lie in seeding  
entropy, but in simply guaranteeing that you have a different seed  
each time.  Just grab System.currentTimeMillis(), use it for your  
first seed, then use the same value + C for your next seed, then the  
next value + 2C for your next seed, then +3C, etc., where C is some  
integer > 0.  Just maintain that internal counter somewhere and you  
should be fine.

Sean

ATOM RSS1 RSS2