On Jan 17, 2008, at 5:13 AM, Michael Wilson wrote:
> If you create several instances in quick succession you get identical
> RNGs, and System.currentTimeMillis() doesn't have much entropy anyway.
I think this notion betrays a bit of cargo-cult programming, and it's
worth explaining why.
> private long defaultSeed() {
>
> final long xseed = ((iseed*ISEED_MULT)+ISEED_INC)&ISEED_MASK;
> final long nanos = System.nanoTime(); iseed = xseed;
> return (nanos>>>32)^(nanos<<32)^System.currentTimeMillis()^
> xseed^(((long)super.hashCode())<<32)^
> Runtime.getRuntime().freeMemory();
> }
>
> It combines entropy from the current real time, the number of clock
> ticks since last startup, the new object's memory address (via
> super.hashCode()), the amount of free memory in the JVM and the
> output of an embedded static RNG which uses the same algorithm as
> java.util.Random.
It's a cute function overall, but I have a few nits;
- nanoTime is not available on earlier Java versions, *and* it's not
guaranteed to work at all. It's probably not a good idea to use.
- hashCode is the wrong function. It should be
System.identityHashCode(...). It's also not as random as you think:
it's word aligned and sequentially packed on some systems.
> This produces a reasonable amount of entropy
> for experimental purposes (we have a couple of 'high quality' seeding
> methods too but they take longer to run).
I think this may be mistaken. It's true that if you used a crummy
generator with a small seed (such as c's rand()), providing a good
entropy seed would be important. But a crummy generator MT is
not. Generally speaking, providing seed entropy shouldn't have
almost any effect at all. It's waving a dead chicken.
Mersenne Twister has an internal state of 624 longs. If you fill
those longs with a poor-entropy value, or successive poor-entropy
values, after just a few pumps (probably just one pump), you will
*not* be able to notice from the MT output. It will be so random as
to pass every known statistical test. This is one reason why MT is
so well respected. (Eith one exception: if you have a *lot* of 0's
in your internal state, MT takes a while to warm up).
Even if you're still not willing to trust the math, we're not loading
the internal state with low-entropy values. We're loading it with
relatively high-entropy values generated as the stream output of a
carefully-selected linear congruential generator whose seed in turn
is set to the initial 32-bit number you provide. Ordinarily linear
congruential generators aren't fantastic to use as random number
generators for experiments, but they're just fine for providing
enough entropy to MT so that those cargo-cult programmers who don't
trust MT theorists can feel like you've waved the necessary dead
chickens.
This means that you should be able to use seeds like this:
2345, 2346, 2347, ...
and have pretty high-quality randomly different experimental
results. So you're likely not accomplishing anything with your super-
duper seeder above, other than feeling warm and fuzzy. If you have
statistical results that counter this, I (and I think a fair degree
of the RNG community) would really like to see them published! (no
really! I'm not trying to be sarcastic. What do you have?)
Now as to why we have the defaultSeed(), setSeed(), and 32-bit seed
choices: because they are the standard for MT19937. See here:
http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/MT2002/CODES/
mt19937ar.c
http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/MT2002/emt19937ar.html
We chose to be 100% consistent with the MT standard rather than risk
wandering off the ranch in search of a seed twice as long (which only
benefits you if you need to do more than 2^32 experiments). If you'd
like something else, you should just go in and set all 624 initial
values in the MT array: there's a function for that. Otherwise,
there are *big* benefits to having a standard RNG.
32 bits, rather than a longg is not a big deal either. After all,
java.util.Random uses a only 48-bit seed rather than a long (and in
fact that's its entire internal state! java.util.Random is very poor).
Now: how to initialize RNGs in sequence? Assuming you let MT's
default seeder do its job, the answer then doesn't lie in seeding
entropy, but in simply guaranteeing that you have a different seed
each time. Just grab System.currentTimeMillis(), use it for your
first seed, then use the same value + C for your next seed, then the
next value + 2C for your next seed, then +3C, etc., where C is some
integer > 0. Just maintain that internal counter somewhere and you
should be fine.
Sean
|