This is the mail archive of the gsl-discuss@sources.redhat.com mailing list for the GSL project.


Re: True Random Numbers


On Mon, 15 Nov 2004, C J Kenneth Tan -- OptimaNumerics wrote:

> Robert,
> 
> Have you considered the view that putting pseudo-random number
> generators through statistical tests to see when they fail is like
> going fishing and taking a picture when it fails?  This was Pierre
> L'Ecuyer's argument for putting pseudo-random number generators
> through application tests.

There are a number of paradoxical aspects to rng's, beginning with the
oxymoron itself (which leads careful individuals such as yourself to add
the "pseudo" modifier;-).

However, all random number tests are application tests, just as all
benchmarks are application tests.  The only question is the size and
purpose of the application.  All random number tests apply random
numbers from an rng (p understood:-) to the process of generating some
result that is either known from theory or presumed known from
experience.  The deviation of the obtained result from the
known/expected result is then converted into the probability p that a
deviation that large could have arisen if the rng were truly random.  If
that p-value is consistently smaller than it should be by chance, then
you have some reason to doubt that the generator is good (presuming good
control over all other sources of error, e.g. programming errors in the
application that are sensitive to the rng in some otherwise irrelevant
way).
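
Just to make the pattern concrete, here is a minimal sketch of such a
test written against the GSL itself -- the bin count and sample size
are arbitrary choices for illustration, not anything canonical:

    /* Bin uniform deviates and convert the deviation from the flat
     * expectation into a p-value with a Pearson chi-squared test. */
    #include <stdio.h>
    #include <gsl/gsl_rng.h>
    #include <gsl/gsl_cdf.h>

    #define NBINS 100

    int main(void)
    {
        const long n = 1000000;
        long count[NBINS] = {0};
        gsl_rng *r = gsl_rng_alloc(gsl_rng_mt19937);

        for (long i = 0; i < n; i++)
            count[(int)(gsl_rng_uniform(r) * NBINS)]++;

        double expected = (double) n / NBINS, chisq = 0.0;
        for (int b = 0; b < NBINS; b++) {
            double d = count[b] - expected;
            chisq += d * d / expected;
        }

        /* probability of a chi-squared this large if the rng were
         * truly random */
        double p = gsl_cdf_chisq_Q(chisq, NBINS - 1);
        printf("chisq = %g, p = %g\n", chisq, p);

        gsl_rng_free(r);
        return 0;
    }

Run it many times with different seeds and the p's should themselves be
uniformly distributed; p's that are consistently tiny are the "fish".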

This is slightly different from ordinary benchmarking.  The only truly
reliable systems performance benchmark (as a measure of the expected
performance of the system on YOUR CODE) is (unsurprisingly) YOUR CODE.
Hence, I do performance benchmarking with my own Monte Carlo code, in
addition to lots of other microbenchmark measures that tell me e.g. how
fast my system evaluates GSL random numbers -- information that
obviously helps me tune the performance of my application.
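
A microbenchmark of that sort is only a few lines of C; this is a rough
sketch (clock() is coarse, and the loop count is arbitrary):

    /* Time a tight loop of raw gsl_rng_get() calls. */
    #include <stdio.h>
    #include <time.h>
    #include <gsl/gsl_rng.h>

    int main(void)
    {
        const long n = 100000000;
        gsl_rng *r = gsl_rng_alloc(gsl_rng_mt19937);
        unsigned long sink = 0;  /* keep the loop from being optimized away */

        clock_t t0 = clock();
        for (long i = 0; i < n; i++)
            sink += gsl_rng_get(r);
        double secs = (double) (clock() - t0) / CLOCKS_PER_SEC;

        printf("%ld rands in %.2f s (%.1f ns/rand, sink %lu)\n",
               n, secs, 1e9 * secs / n, sink);

        gsl_rng_free(r);
        return 0;
    }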

I have a difficulty, however, with using my own MC application to test
rng's, in spite of the fact that it consumes extremely large numbers of
them.  You see, I don't know the answers it is supposed to be returning
-- I'm doing the computation to OBTAIN the answers when NOBODY knows
them.  The only way a "failure" could be detected is by redoing the
entire computation (done on a beowulf, we're talking GFLOP-years of
numerical effort) with different rngs and hoping to observe a difference
(or not) compared to the numerical resolution of the answers (which in
fact I have done in the past).  Even this doesn't tell which (if either)
of the answers is "correct", only that at least one of them might be
INcorrect.

Mind you, this is the WAY certain failures have been spectacularly found
in the past -- sometimes after garbage results were published.  This is
a tremendous expense both in terms of the money wasted on doing the
computation and then doing it over (often over years in both cases)
and the academic reputations of those involved.  Thus the quality of the
rng used in a really long simulational computation of a truly unknown
result is an inevitable source of ulcers to (I expect) everybody in the
MC game, where the reliability and accuracy of the results are at best
known a posteriori when somebody else redoes the computation with a
different rng, gets a completely different result, and writes a comment
in the literature.

Thus the strong need for "standardized" rng testers.  Everybody who does
numerical work of this sort knows or SHOULD know about the famous
failures of rngs in the past, many of them the built-in/default
generators distributed with Unix, with PC operating systems or
compilers, built into firmware or a numerical library.  They SHOULD know
about the obvious modes of failure -- generators with (too) short
periods for the application, generators that misbehave for e.g. even or
odd seeds or just certain seeds, generators that have to be "warmed up",
the distribution of rands on N-dimensional hyperplanes, simple
bit-distribution bias or deep/occult sequential bias from the iterated
map itself.  Some of those modes of failure may be irrelevant for
certain problems, and some problems may fail even for generators that
have no obvious weaknesses on existing tests -- hence L'Ecuyer's
observation is not without merit.  In fact I have a good friend (Richard
Palmer) who was once doing a numerical simulation on a very complex
system (a spin glass, IIRC) that he claimed was the most difficult test
he'd ever found for an rng -- it was sensitive to failures in uniformity
in some absurdly high dimensionality to the point where he just about
couldn't do the computation at all, at least at that time.
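
The hyperplane failure, at least, is trivial to exhibit, because the
GSL helpfully includes IBM's infamous RANDU as gsl_rng_randu.  As a
sketch (assuming gsl_rng_get() returns RANDU's raw 31-bit state, which
the GSL implementation does), its output satisfies an exact linear
recurrence -- the reason all of its triplets land on just 15 planes in
the unit cube:

    /* Verify x[n+2] = 6*x[n+1] - 9*x[n] (mod 2^31) on RANDU's stream. */
    #include <stdio.h>
    #include <gsl/gsl_rng.h>

    int main(void)
    {
        gsl_rng *r = gsl_rng_alloc(gsl_rng_randu);
        unsigned long long m = 1ULL << 31;
        unsigned long long x0 = gsl_rng_get(r);
        unsigned long long x1 = gsl_rng_get(r);

        for (int i = 0; i < 10; i++) {
            unsigned long long x2 = gsl_rng_get(r);
            /* add 9*m before reducing so the value stays non-negative */
            unsigned long long pred = (6 * x1 + 9 * (m - x0)) % m;
            printf("x2 = %10llu  predicted = %10llu  %s\n",
                   x2, pred, x2 == pred ? "(exact)" : "(mismatch)");
            x0 = x1;
            x1 = x2;
        }

        gsl_rng_free(r);
        return 0;
    }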

I therefore think that most actual simulation workers should and
generally do take a deep and abiding interest in just what tests a given
generator does or does not pass, and sometimes CAN get a pretty good
idea of whether or not their code is likely to be sensitive to entire
classes of failure.  Cryptographers have a different interest, and hence
a different set of quality tests they often look at, and this seems to
me to appear in the design of the two primary testing suites.  The
NIST STS appears to be more concerned with bitlevel randomness in
cryptographic applications (where any sort of bitlevel pattern can
dramatically reduce the space that has to be searched to crack an
encoded message).  Diehard looks like it is a bit more concerned with
uniformity and the ability to generate known high order and sequential
distributions correctly; although it contains a few bitlevel tests as
well, it omits e.g. the STS monobit test (which a number of GSL
routines fail up front).
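
For reference, the monobit test itself is almost embarrassingly simple;
here is a minimal sketch (it treats all 32 bits of gsl_rng_get() as
output, which is only fair to generators whose range actually fills 32
bits, like mt19937):

    /* STS monobit: the +1/-1 balance of n bits should be normally
     * distributed with sigma = sqrt(n) under true randomness. */
    #include <stdio.h>
    #include <math.h>
    #include <gsl/gsl_rng.h>

    int main(void)
    {
        const long nwords = 100000;
        const double nbits = 32.0 * nwords;
        gsl_rng *r = gsl_rng_alloc(gsl_rng_mt19937);
        double s = 0.0;

        for (long i = 0; i < nwords; i++) {
            unsigned long w = gsl_rng_get(r);
            for (int b = 0; b < 32; b++)
                s += ((w >> b) & 1) ? 1.0 : -1.0;
        }

        /* S/sqrt(n) is asymptotically standard normal */
        double p = erfc(fabs(s) / sqrt(nbits) / sqrt(2.0));
        printf("S = %+.0f, p = %g\n", s, p);

        gsl_rng_free(r);
        return 0;
    }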

So yes, but -- no.

Besides, if one DOES have an application one wishes to test an rng for,
it can really help to have an entire testing apparatus with all the
important components common to ANY rational test prebuilt and integrated
with a simple embedded testing mechanism, and that's really what
rand_rate is intended to become.  That's why I keep adding little tests
of my own as they occur to me -- to ensure that adding tests as
"objects" is easy and eventually can be done with a straightforward API,
just as GSL already makes it easy to add rngs within a consistent API.
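
To be clear about what "tests as objects" might mean, here is a purely
hypothetical sketch modeled on GSL's own gsl_rng_type pattern -- the
names rng_test_type and mean_test are my inventions for illustration,
not rand_rate's actual interface:

    #include <stdio.h>
    #include <math.h>
    #include <gsl/gsl_rng.h>

    /* a test is just a name plus a function returning a p-value */
    typedef struct {
        const char *name;
        double (*run)(gsl_rng *r, long nsamples);
    } rng_test_type;

    /* trivial example test: the mean of n uniforms should be near 1/2,
     * with sigma = sqrt(1/12)/sqrt(n) by the central limit theorem */
    static double mean_test(gsl_rng *r, long n)
    {
        double sum = 0.0;
        for (long i = 0; i < n; i++)
            sum += gsl_rng_uniform(r);
        double z = (sum / n - 0.5) / (sqrt(1.0 / 12.0) / sqrt((double) n));
        return erfc(fabs(z) / sqrt(2.0));  /* two-sided normal p-value */
    }

    static const rng_test_type uniform_mean = { "uniform_mean", mean_test };

    int main(void)
    {
        gsl_rng *r = gsl_rng_alloc(gsl_rng_mt19937);
        printf("%s: p = %g\n", uniform_mean.name,
               uniform_mean.run(r, 1000000));
        gsl_rng_free(r);
        return 0;
    }

Adding a test then means writing one function and one struct, exactly
the way adding an rng to the GSL means filling in a gsl_rng_type.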

I should probably come up with a better name for it than rand_rate --
something that indicates that it is an extensible test suite and not
just a canned set of tests that nobody can touch or modify.  That way,
if one DOES wish to comparatively test one's own code to see if at the
very least one gets the same results from a subtractive algorithm like
NR's ran3, from GSL's mt19937, from Knuth's ran2, from a shuffled
linear congruential mix like NR's ran1, and from the generator one
happened to use back when one published a given set of results computed
on a VAX, one can simply plug one's code into the API, run it NxM times
with unique random seeds to generate N vectors of average results
containing M independent samples with a reliable sigma, and do a
Kolmogorov-Smirnov test on the various pairs of rng-generated N-vectors
of results and sigmas -- all without having to write any of the
underlying tests oneself or even understand the central limit theorem
beyond a book definition of what the resulting p-values really are.
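
The KS machinery at the bottom of that is small, too.  A naive sketch
of the asymptotic Kolmogorov distribution (for two equal samples of
size N with maximum CDF distance D, lambda is approximately
D*sqrt(N/2); a more careful implementation would add a finite-N
correction):

    #include <stdio.h>
    #include <math.h>

    /* Q_KS(lambda) = 2 * sum_{k>=1} (-1)^(k-1) exp(-2 k^2 lambda^2),
     * the probability of a KS statistic this large under the null */
    double ks_q(double lambda)
    {
        double sum = 0.0, sign = 1.0;
        for (int k = 1; k <= 100; k++) {
            double term = sign * exp(-2.0 * k * k * lambda * lambda);
            sum += term;
            if (fabs(term) < 1e-12 * fabs(sum))
                break;  /* alternating series has converged */
            sign = -sign;
        }
        return 2.0 * sum;
    }

    int main(void)
    {
        /* e.g. D = 0.05 between two empirical CDFs of N = 1000 each */
        double D = 0.05, lambda = D * sqrt(1000 / 2.0);
        printf("p = %g\n", ks_q(lambda));
        return 0;
    }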

Obviously the GSL is an ideal vehicle for such a testing tool, as it
already embeds a significant fraction of the commonly implemented rngs
known to mankind, and it is clearly the intention of the rng maintainer
to extend this to ALL the published rngs in common use for precisely this sort
of reason and testing.  For this reason I'm hoping to donate the KS
distribution generators themselves back to the GSL, although my
implementation of them may be initially naive since I'm a physicist, not
a mathematician working in statistics.  However, I can read the
literature as well as anybody and code better than most, and having them
in a full GPL open source vehicle ensures that even a naive
implementation will eventually get checked and corrected.

   rgb

-- 
Robert G. Brown	                       http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email: rgb@phy.duke.edu


