LISTSERV mailing list manager LISTSERV 16.0

Help for MASON-INTEREST-L Archives


MASON-INTEREST-L Archives

MASON-INTEREST-L Archives


MASON-INTEREST-L@LISTSERV.GMU.EDU


View:

Message:

[

First

|

Previous

|

Next

|

Last

]

By Topic:

[

First

|

Previous

|

Next

|

Last

]

By Author:

[

First

|

Previous

|

Next

|

Last

]

Font:

Monospaced Font

LISTSERV Archives

LISTSERV Archives

MASON-INTEREST-L Home

MASON-INTEREST-L Home

MASON-INTEREST-L  February 2009

MASON-INTEREST-L February 2009

Subject:

Re: MersenneTwisterFast

From:

Pelle Evensen <[log in to unmask]>

Reply-To:

[log in to unmask]

Date:

Thu, 5 Feb 2009 00:13:14 +0100

Content-Type:

text/plain

Parts/Attachments:

Parts/Attachments

text/plain (150 lines)

Sean Luke wrote:
> Pelle Evensen wrote:
>> Randy Casstevens wrote:
>>> Do some random seeds give better quality results for
>>> MersenneTwisterFast? I have heard that random seeds should be odd
>>> numbers, large numbers, and/or prime numbers, but does it matter for
>>> MersenneTwisterFast?
>>>
>> In short, for any good quality PRNG (the MersenneTwister would be
>> one) the choice of seed does not matter at all as far as "quality" goes.
>
> Actually, many very high-grade RNGs are known to have certain
> pathological seed cases. The crucial issue is: do we know what those
> cases are? If so, we're fine -- just avoid those seed choices.
Using the definition from "Definition 1" of
http://www.iro.umontreal.ca/~lecuyer/myftp/papers/cacm90.pdf (P.
L'Ecuyer, ``Random Numbers for Simulation'', Communications of the ACM,
33 (1990), 85--98),
I distinguish between the *state*, S, and the *seed*, s_0. My point was
that if the API is well designed, s_0 is indeed in S but the set of
possible (reasonable, if you wish) s_0 is a strict subset of S. Well
designed in the sense that the implementer of the PRNG at hand most
likely can make sure that any s_0 is transformed into a unique initial
state for the generator where pathological behaviours are avoided.
> Mersenne Twister is no exception: and it's considered among best of
> the best! MT's internal state is an array of about 600 32-bit
> integers. Its seeding pathology is to fill that array initially with
> lots of zeros. It takes Mersenne Twister a while to work its way out
> of that state, during which time it tends to produce lots of
> relatively poorly random numbers.
And this is exactly why one should distinguish between seed and state;
you may want to be able to access the state to make sure that you can
clone a stream or resume a previous simulation but one should not
manipulate it by direct means, if that makes sense.

The "best of the best" is as usual depending on what you want to use it
for. It does fail some randomness tests;
See page 29 of
http://www.iro.umontreal.ca/~lecuyer/myftp/papers/testu01.pdf

It is possibly also slightly slower (the measurements shown in the
TestU01 paper are from C-code) and has a shorter period than some other
generators tested on pages 28-29.
> This turned out to be a problem for early seeding algorithms common
> among the Mersenne Twister community. The current standard seeding
> algorithm is to take a seed and hand it off to a simple but relatively
> random RNG (in this case, a particular linear congruential generator
> in one of Knuth's books), and pump that generator 600-ish times. Each
> output of the generator gets put into a different slot in the Mersenne
> Twister array. That gives you a nice random collection of numbers,
> and an acceptably small number of initial zeros.
>
> FWIW, my MersenneTwisterFast code follows the current standards.
So this is the seed expansion or seed -> state transformation which
(hopefully) makes sure that you can not set up the MT in any
particularly bad state.
>> The only thing to possibly worry about is if you start many different
>> simulations with nearby seeds. For some generators (depending on how
>> the seed is expanded to the internal state of the generator) this
>> will be a problem and for some others, it won't be. What the state of
>> the Mersenne Twister as implemented in MASON is, I don't know.
>
> Generally MT doesn't have this problem, partly because as an algorithm
> it's pretty robust to that issue, and partly because the Knuth seeding
> procedure described above creates very different initial internal
> array states from very similar random numbers. Overall, most high
> grade RNGs are robust against similar seeds.
A peculiarity about the MersenneTwisterFast class is that the
constructor taking a long and the setSeed(long) (I guess for some
compatibility with java.util.Random) ignores the topmost 32 bits. Since
there is nothing magical about the seed expansion, and Java handles 64
bit arithmetic just fine, why not use a 64-bit generator or two separate
32 bit generators to initialize the state (mt[...]) in an interleaved
fashion? It should still be trivial to guarantee that we won't get a bad
state.

Yet another peculiarity is that all conversions from int to the lower
precision types is made by shifting instead of bitmasking. Is this
faster for most CPU's? If there is an explicit conversion, the masking
will probably take place anyway. I don't know if the javac compiler or
JIT-compiler is clever enough to recognize that the msb:s are all zero.

For a linear congruential generator this makes sense but is there some
results (published or not) that supports the hypothesis that the msb:s
from MT19937 are any "worse" than the lsb:s?
>> A better (or at least safer) approach than the Mersenne Twister would
>> be to have a generator that generates distinct streams. This article
>> discusses the desirability of distinct streams;
>> http://www.iro.umontreal.ca/~lecuyer/myftp/papers/rstream.pdf
> Having a generator which is forkable is a nice feature. But I would
> be very hesitant to rely on an algorithm that doesn't have fairly
> provable features: and the paper you cited is actually a technical
> report for a code library, not an algorithmic description. My
> suggestion about random numbers: BE VERY CAREFUL ABOUT DEVIATING FROM
> WELL-STUDIED ALGORITHMS. I don't get a good feeling about the lack of
> formality in this paper.
I should have pointed to this paper instead;
http://www.iro.umontreal.ca/~lecuyer/myftp/papers/streams00.pdf
<http://www.iro.umontreal.ca/%7Elecuyer/myftp/papers/streams00.pdf> P.
L'Ecuyer, R. Simard, E. J. Chen, and W. D. Kelton, ``An
Objected-Oriented Random-Number Package with Many Long Streams and
Substreams'', Operations Research, 50, 6 (2002), 1073--1075.
This provides an algorithmic description. I don't think anyone should
worry too much about the lack of formality in
http://www.iro.umontreal.ca/~lecuyer/myftp/papers/rstream.pdf. L'Ecuyer
may be the most prolific writer of journal papers on PRNG:s and PRNG
testing. A few of his papers are co-authored with Matsumoto of MT-fame.

Sorry for pointing to the wrong paper; it was mostly the reasoning and
motivation for independent streams I was after.

Sean Luke: Have you measured the performance effects of having the
MersenneTwisterFast implement some sensible interface? (Yeah, so
java.util.Random should *really* be an interface, not a class)?
If one is to do exact replication of some existing program written in a
different environment it may be much easier to replace the PRNG in MASON
than doing it in the original program. Being able to replace the
generator with a quasi-random generator or something easily observed as
deterministic would also simplify some testing and debugging.
> Stream protocols etc. are often implementable directly on top of
> well-regarded generators. MT would be particularly good for this
> given its famously gigantic period. It might be worthwhile to
> investigate the current RNG literature for formal protocols.
Yes, if any one is aware of fast methods of transforming an arbitrary
(but good) generator into one with independent streams, I'd very much
like to hear about it.

It can be done trivially but *not fast* by using any (reasonably strong)
symmetric block cipher like this;
Let $R$ be a simulation-quality PRNG outputting $n$-bit words.
Let $E_k(m)$ be encryption of $k$-bit message $m$ under the key $k$.
Note that $n \le k$ for this to work.

Now for each output of $R$, we run it through $E$ and since we do expect
all permutations to be equally likely, the streams would be independent,
of course under the assumption that the original generator has a period
shorter than $2^k$.

I will hopefully soon find the time to try this with reduced versions of
some common block ciphers. It may very well be that the function call
overhead is so large that can still be considered fast if one does batch
generation of numbers, in a way similar to what the MT19937
implementation does in MASON.

Even though the state space of MT19937 is huge, we don't have any
theoretical guarantees that two different seeds don't make us use two
sequences that are (partially) overlapping.

/Pelle

Top of Message | Previous Page | Permalink

Advanced Options


Options

Log In

Log In

Get Password

Get Password


Search Archives

Search Archives


Subscribe or Unsubscribe

Subscribe or Unsubscribe


Archives

January 2023
November 2022
April 2022
March 2022
January 2022
November 2021
May 2021
April 2021
March 2021
February 2021
August 2020
July 2020
June 2020
February 2020
August 2019
June 2019
May 2019
April 2019
March 2019
February 2019
January 2019
December 2018
October 2018
August 2018
July 2018
June 2018
May 2018
April 2018
March 2018
February 2018
January 2018
December 2017
November 2017
September 2017
June 2017
May 2017
March 2017
February 2017
January 2017
December 2016
October 2016
September 2016
August 2016
July 2016
June 2016
May 2016
April 2016
March 2016
February 2016
January 2016
December 2015
October 2015
September 2015
August 2015
July 2015
June 2015
May 2015
April 2015
March 2015
February 2015
January 2015
December 2014
November 2014
October 2014
September 2014
August 2014
July 2014
June 2014
May 2014
April 2014
March 2014
February 2014
January 2014
December 2013
November 2013
September 2013
August 2013
July 2013
June 2013
May 2013
April 2013
March 2013
February 2013
January 2013
December 2012
November 2012
October 2012
September 2012
August 2012
July 2012
June 2012
May 2012
April 2012
March 2012
February 2012
January 2012
December 2011
November 2011
October 2011
September 2011
August 2011
July 2011
June 2011
May 2011
April 2011
March 2011
February 2011
January 2011
December 2010
November 2010
September 2010
August 2010
July 2010
June 2010
May 2010
April 2010
February 2010
December 2009
November 2009
October 2009
September 2009
August 2009
July 2009
June 2009
May 2009
April 2009
March 2009
February 2009
January 2009
December 2008
November 2008
October 2008
September 2008
August 2008
June 2008
May 2008
April 2008
March 2008
February 2008
January 2008
December 2007
November 2007
October 2007
September 2007
August 2007
July 2007
June 2007
May 2007
April 2007
March 2007
February 2007
January 2007
December 2006
November 2006
October 2006
September 2006
August 2006
July 2006
June 2006
May 2006
April 2006
March 2006
February 2006
January 2006
December 2005
November 2005
October 2005
September 2005
August 2005
July 2005
June 2005
April 2005
March 2005
February 2005
January 2005
December 2004
November 2004
October 2004
September 2004
July 2004
May 2004
April 2004
March 2004

ATOM RSS1 RSS2



LISTSERV.GMU.EDU

CataList Email List Search Powered by the LISTSERV Email List Manager