Tuesday, September 20, 2011

Pseudo random number generators

In the previous post, we learned that a complex system can exhibit deterministic, but unpredictable behavior. Actually, this aspect has had a strong application in computing for a long time: random number generators.
Many computer programs, for example games or Monte Carlo simulations, require a random function. A computer operating on well-defined binary states that only change at particular clock instants has no true randomness - its operations are deterministic. So pseudo random number generators are typically used to generate number sequences that resemble truly random ones.
In 1946, John von Neumann suggested such an algorithm, named the middle-square method. Von Neumann's algorithm is simple: take a number as the "seed", square it, and take the middle digits of the resulting number (when viewed in decimal format) as your "random number". This number also becomes the new seed. The algorithm is fast, but statistical tests on the generated numbers show that their distribution differs significantly from true random numbers. Worse, once the seed becomes zero, the algorithm gets stuck.
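A minimal sketch of the idea for 4-digit seeds could look like this (the field and method names are just for illustration):
private long seed = 1234; // current 4-digit seed

public long middleSquare() {
  long squared = seed * seed;      // squaring gives up to 8 digits
  seed = (squared / 100) % 10000;  // keep the middle 4 digits as the new seed
  return seed;
}
Starting with 1234, this produces 5227, 3215, 3362, ... and it gets stuck at 0 as soon as the middle four digits of a square are all zero.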
A few decades after von Neumann, the programming language C came with a library facility for generating random numbers, using the functions rand (to pull a number) and srand (to set the seed). To generate one number, the algorithm linearly combines several terms based on constants and the current seed. This means a higher implementation effort, but the generator passes most statistical tests on randomness, such as those given by the Dieharder random test suite.
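The C standard itself sketches a minimal portable example implementation of rand and srand, a linear congruential generator; rendered in Java it might look roughly like this:
private long next = 1; // generator state; srand simply assigns the seed to it

public void srand(long seed) {
  next = seed;
}

public int rand() {
  next = next * 1103515245L + 12345L;   // linear congruential step
  return (int) ((next >>> 16) % 32768); // discard the low-order bits, result in 0..32767
}
Real C libraries differ in the details; some keep a larger state and combine several previous values, which helps with the statistical behavior.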
The famous home computer Commodore 64 had a different, simpler algorithm for random number generation: the last seed is multiplied by a big number, a small number is added, and the resulting number (given in a floating point format) is mixed up by swapping bytes. Although the C64 algorithm was considered practically useful in a publication by Bowman, the Diehard tests quickly reveal a lot of deficiencies of the algorithm.
In 1997, Makoto Matsumoto and Takuji Nishimura came up with a very good algorithm, the Mersenne Twister. While this algorithm passes most of the Dieharder tests, the implementation has also grown in complexity: a common implementation of the algorithm requires a state array of 624 numbers.
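A (slightly condensed) sketch of the common 32-bit MT19937 variant, following the structure of the reference implementation, gives an impression of that complexity:
private static final int N = 624, M = 397;         // state size and recurrence offset
private static final int MATRIX_A   = 0x9908b0df;  // twist constant
private static final int UPPER_MASK = 0x80000000;  // most significant bit
private static final int LOWER_MASK = 0x7fffffff;  // lower 31 bits
private final int[] mt = new int[N];               // the 624-number state array
private int mti = N;                                // index into mt; call setSeed before nextInt

public void setSeed(int seed) {
  mt[0] = seed;
  for (mti = 1; mti < N; mti++) {
    mt[mti] = 1812433253 * (mt[mti - 1] ^ (mt[mti - 1] >>> 30)) + mti;
  }
}

public int nextInt() { // next 32-bit output word (interpret the int bits as unsigned)
  int y;
  if (mti >= N) { // regenerate all 624 words in one go
    for (int kk = 0; kk < N - 1; kk++) {
      y = (mt[kk] & UPPER_MASK) | (mt[kk + 1] & LOWER_MASK);
      mt[kk] = mt[(kk + M) % N] ^ (y >>> 1) ^ ((y & 1) != 0 ? MATRIX_A : 0);
    }
    y = (mt[N - 1] & UPPER_MASK) | (mt[0] & LOWER_MASK);
    mt[N - 1] = mt[M - 1] ^ (y >>> 1) ^ ((y & 1) != 0 ? MATRIX_A : 0);
    mti = 0;
  }
  y = mt[mti++];
  y ^= (y >>> 11);              // tempering improves the equidistribution of the output
  y ^= (y << 7)  & 0x9d2c5680;
  y ^= (y << 15) & 0xefc60000;
  y ^= (y >>> 18);
  return y;
}
Compare this with the three xor-and-shift lines below.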

So - do you now think there is a tradeoff between quality of randomness and computational effort? Wrong!

The late (and great) George Marsaglia showed us a simple way to generate high quality random numbers: the Xorshift algorithm.
private long x = 88172645463325252L; // generator state; any non-zero seed works, zero would stay zero forever

public long randomLong() {
  x ^= (x << 21);  // xor the state with a left-shifted copy of itself
  x ^= (x >>> 35); // ... then with an unsigned right-shifted copy
  x ^= (x << 4);   // ... and with another left-shifted copy
  return x;
}
The computational effort of the algorithm is obviously very low. If you choose "good" values for the three magic numbers (e.g., 21, 35, 4 as in the example above), Xorshift passes the Diehard tests. As is the case in many complex systems, there is a simple way to control the system's behavior as intended - but it is very difficult to find that simple way. In the case of pseudo random numbers it took nearly 60 years from John von Neumann's middle-square to George Marsaglia's Xorshift.
  • J. von Neumann. "Various techniques used in connection with random digits". In A.S. Householder, G.E. Forsythe, and H.H. Germond, eds., Monte Carlo Method, National Bureau of Standards Applied Mathematics Series, 12:36-38, 1951.
  • M. Matsumoto and T. Nishimura. "Mersenne twister: a 623-dimensionally equidistributed uniform pseudo-random number generator". ACM Transactions on Modeling and Computer Simulation, 8(1):3-30, 1998.
  • R. Bowman. "Evaluating pseudo-random number generators". Computers & Graphics, 19(2):315-324, 1995.
  • G. Marsaglia. "Xorshift RNGs". Journal of Statistical Software, 8(14), 2003.
