On Jan 21, 2008, at 6:05 AM, Michael Wilson wrote:

> By all means, please independently replicate my tests.

I did.  And I am getting totally different results.  What system did  
you use to compute this?

I downloaded 'ent', a randomness tester from Fourmilab (http://www.fourmilab.ch/random/), and ran it on ec.util.MersenneTwisterFast's output (ec.util.MersenneTwister generated an identical file, of course -- I double-checked anyway).
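For reference, here's roughly the harness: dump ten million raw bytes from the generator into a file and point ent at it. The sketch below uses java.util.Random so it compiles standalone; MersenneTwisterFast exposes the same nextBytes(byte[]) interface, so it drops in the same way. The file name and buffer size are just illustrative.

```java
import java.io.BufferedOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.Random;

public class DumpRandomBytes {
    // Write n pseudorandom bytes from the given generator to path,
    // producing a raw binary file that ent can analyze directly.
    static void dump(Random rng, String path, int n) throws IOException {
        BufferedOutputStream out =
            new BufferedOutputStream(new FileOutputStream(path));
        try {
            byte[] buf = new byte[65536];
            int remaining = n;
            while (remaining > 0) {
                rng.nextBytes(buf);
                int chunk = Math.min(buf.length, remaining);
                out.write(buf, 0, chunk);
                remaining -= chunk;
            }
        } finally {
            out.close();
        }
    }

    public static void main(String[] args) throws IOException {
        // Seed 1000, matching the first test below.
        dump(new Random(1000), "random.bin", 10000000);
    }
}
```

Then just run `ent random.bin` on the result.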

You didn't specify the seed, so I tried a few.  Here's MersenneTwister seeded with 1000, starting cold.

	Entropy = 7.999983 bits per byte.

	Optimum compression would reduce the size
	of this 10000000 byte file by 0 percent.

	Chi square distribution for 10000000 samples is 230.74, and randomly
	would exceed this value 75.00 percent of the times.

	Arithmetic mean value of data bytes is 127.5000 (127.5 = random).
	Monte Carlo value for Pi is 3.141094856 (error 0.02 percent).
	Serial correlation coefficient is -0.000300 (totally uncorrelated = 0.0).


Here's MersenneTwister seeded with 1.

	Entropy = 7.999981 bits per byte.

	Optimum compression would reduce the size
	of this 10000000 byte file by 0 percent.

	Chi square distribution for 10000000 samples is 265.11, and randomly
	would exceed this value 50.00 percent of the times.

	Arithmetic mean value of data bytes is 127.5013 (127.5 = random).
	Monte Carlo value for Pi is 3.140317256 (error 0.04 percent).
	Serial correlation coefficient is 0.000126 (totally uncorrelated = 0.0).


Here's MT seeded with the current time, six tests:

1.	Entropy = 7.999981 bits per byte.

	Optimum compression would reduce the size
	of this 10000000 byte file by 0 percent.

	Chi square distribution for 10000000 samples is 266.75, and randomly
	would exceed this value 50.00 percent of the times.

	Arithmetic mean value of data bytes is 127.5005 (127.5 = random).
	Monte Carlo value for Pi is 3.141267657 (error 0.01 percent).
	Serial correlation coefficient is -0.000034 (totally uncorrelated = 0.0).

2.	Entropy = 7.999982 bits per byte.

	Optimum compression would reduce the size
	of this 10000000 byte file by 0 percent.

	Chi square distribution for 10000000 samples is 255.09, and randomly
	would exceed this value 50.00 percent of the times.

	Arithmetic mean value of data bytes is 127.4979 (127.5 = random).
	Monte Carlo value for Pi is 3.143893258 (error 0.07 percent).
	Serial correlation coefficient is -0.000742 (totally uncorrelated = 0.0).

3.	Entropy = 7.999981 bits per byte.

	Optimum compression would reduce the size
	of this 10000000 byte file by 0 percent.

	Chi square distribution for 10000000 samples is 256.70, and randomly
	would exceed this value 50.00 percent of the times.

	Arithmetic mean value of data bytes is 127.4844 (127.5 = random).
	Monte Carlo value for Pi is 3.141654057 (error 0.00 percent).
	Serial correlation coefficient is 0.000353 (totally uncorrelated = 0.0).

4.	Entropy = 7.999984 bits per byte.

	Optimum compression would reduce the size
	of this 10000000 byte file by 0 percent.

	Chi square distribution for 10000000 samples is 217.05, and randomly
	would exceed this value 95.00 percent of the times.

	Arithmetic mean value of data bytes is 127.5044 (127.5 = random).
	Monte Carlo value for Pi is 3.141740457 (error 0.00 percent).
	Serial correlation coefficient is -0.000518 (totally uncorrelated = 0.0).

5.	Entropy = 7.999982 bits per byte.

	Optimum compression would reduce the size
	of this 10000000 byte file by 0 percent.

	Chi square distribution for 10000000 samples is 252.58, and randomly
	would exceed this value 50.00 percent of the times.

	Arithmetic mean value of data bytes is 127.4936 (127.5 = random).
	Monte Carlo value for Pi is 3.142441257 (error 0.03 percent).
	Serial correlation coefficient is 0.000094 (totally uncorrelated = 0.0).

6.	Entropy = 7.999983 bits per byte.

	Optimum compression would reduce the size
	of this 10000000 byte file by 0 percent.

	Chi square distribution for 10000000 samples is 236.90, and randomly
	would exceed this value 75.00 percent of the times.

	Arithmetic mean value of data bytes is 127.4783 (127.5 = random).
	Monte Carlo value for Pi is 3.141865257 (error 0.01 percent).
	Serial correlation coefficient is 0.000283 (totally uncorrelated = 0.0).

So generally we get chi-square statistics that a random sequence would exceed about 50 percent of the time (extremely good), occasionally around 75, and one hovering at 95.  Overall, very good according to Fourmilab.
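For anyone who wants to see what that chi-square number actually is: ent bins the file's bytes into 256 buckets and computes the classic Pearson statistic, which for a uniform source should land near its 255 degrees of freedom.  A minimal sketch of that computation (using java.util.Random with an arbitrary seed, just so it compiles standalone):

```java
import java.util.Random;

public class ChiSquareBytes {
    // Pearson chi-square statistic over the 256 possible byte values.
    // For genuinely uniform bytes this should land near 255 (the
    // degrees of freedom); a statistic far below that means the byte
    // counts are suspiciously *too* even -- which is exactly what an
    // exceedance percentage of 99.99 is flagging.
    static double chiSquare(byte[] data) {
        long[] counts = new long[256];
        for (int i = 0; i < data.length; i++)
            counts[data[i] & 0xFF]++;
        double expected = data.length / 256.0;
        double chiSq = 0.0;
        for (int v = 0; v < 256; v++) {
            double d = counts[v] - expected;
            chiSq += d * d / expected;
        }
        return chiSq;
    }

    public static void main(String[] args) {
        byte[] data = new byte[10000000];
        new Random(1000).nextBytes(data);
        System.out.println("chi-square = " + chiSquare(data));
    }
}
```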


You indicated that MT was doing worse than java.util.Random.  That's surprising to me, because java.util.Random is rather well known to have poor randomness qualities; see, for example, http://alife.co.uk/nonrandom/ for an astonishing result.

So here are the results for java.util.Random seeded with 1000.

	Entropy = 7.999992 bits per byte.

	Optimum compression would reduce the size
	of this 10000000 byte file by 0 percent.

	Chi square distribution for 10000000 samples is 107.78, and randomly
	would exceed this value 99.99 percent of the times.

	Arithmetic mean value of data bytes is 127.5021 (127.5 = random).
	Monte Carlo value for Pi is 3.140214056 (error 0.04 percent).
	Serial correlation coefficient is -0.000008 (totally uncorrelated = 0.0).


Eesh, that's a bad, bad chi-square.  Here are some java.util.Random results seeded with the current time:

1.	Entropy = 7.999993 bits per byte.

	Optimum compression would reduce the size
	of this 10000000 byte file by 0 percent.

	Chi square distribution for 10000000 samples is 100.67, and randomly
	would exceed this value 99.99 percent of the times.

	Arithmetic mean value of data bytes is 127.5045 (127.5 = random).
	Monte Carlo value for Pi is 3.141442857 (error 0.00 percent).
	Serial correlation coefficient is -0.000018 (totally uncorrelated = 0.0).

2.	Entropy = 7.999993 bits per byte.

	Optimum compression would reduce the size
	of this 10000000 byte file by 0 percent.

	Chi square distribution for 10000000 samples is 97.79, and randomly
	would exceed this value 99.99 percent of the times.

	Arithmetic mean value of data bytes is 127.5001 (127.5 = random).
	Monte Carlo value for Pi is 3.141452457 (error 0.00 percent).
	Serial correlation coefficient is 0.000059 (totally uncorrelated = 0.0).

3.	Entropy = 7.999993 bits per byte.

	Optimum compression would reduce the size
	of this 10000000 byte file by 0 percent.

	Chi square distribution for 10000000 samples is 98.80, and randomly
	would exceed this value 99.99 percent of the times.

	Arithmetic mean value of data bytes is 127.4982 (127.5 = random).
	Monte Carlo value for Pi is 3.141766857 (error 0.01 percent).
	Serial correlation coefficient is -0.000085 (totally uncorrelated = 0.0).


So... not a fluke.



> It would appear that in this case, the chickens were very much alive.
> Incidentally, when you reported a 'bug' in java.util.Random a few  
> years
> back, regarding 'dimensional stability' in reusing an RNG vector  
> element
> three times to generate three bytes, did you run statistical tests or
> did you rely on some combination of intuition and poultry of uncertain
> freshness? :)

I think way back then I relied on a quote from NRC.  They were respectable back then, and I suspect they're still correct now, given java.util.Random's poor performance here.

Sean