Random Number Generation

Random number generation is a fascinating topic, and one that is too often glossed over by textbooks. Indeed, very few textbooks present robust random number generation idioms. We discuss that here.

How Do You Generate Random Numbers ?

Generating random numbers is not easy. There are a few approaches:

The approach the standard library uses is the algorithmic approach. Implementations vary from system to system, but some are very simple.

For example, here's a simplified version of the BSD random number implementation, where number is some global that is initialised by srand()

int rand()
{
	number = ( number * 1103515245 + 12345) % ((u_long)RAND_MAX + 1);
	return number;
}

If you think this looks flaky, well, you're right. It produces fairly predictable results under some circumstances, and we'll explore this.

Intro: The Functions

There are three functions that are used. srand(int) initialises the generator, rand() generates a nonnegative random number no greater than the implementation defined constant RAND_MAX, and time(0) returns the number of seconds since Jan 1 1970.

rand1.cc


#include <cstdlib>	// rand() and srand()
#include <ctime>	// time

using namespace std;
int main()
{
	// srand() initialises the random number generator.
	// time(0) returns the number of seconds since Jan 1 1970
	srand (time(0));

	// rand() returns a non-negative integer
	no greater than RAND_MAX
	int a = rand(); 

	return 0;
}

This is all well and good, but usually we want to generate random integers in a certain range. We explore a couple of idioms to do this.

Idioms to Generate Random Numbers

We attempt to address the problem of generating random integers in a given range. We consider several different implementations of a function int_rand(int n) which returns a non-negative integer strictly less than n.

The first common idiom is shown below. While this is commonly used in text books, it is not recommended. (for example, explore the behaviour of rand() % 2 in the above implementation if number is initialised to 1.)

rand2.cc



int int_rand( int n )
{
	return rand() %n ;
}

A better implementation inolves generating a floating point random number, and then rescaling it.

rand3.cc


int int_rand(int n)
{
	return int(rand()/(RAND_MAX + 1.0) * n );
}

Note that RAND_MAX+1.0 is used to divide by. There are two gotchas that this addresses: the first is that it makes sure that floating point division is used, as opposed to integer division. If integer division was used, rand()/RAND_MAX would be either 1 or 0. The other reason is this -- recall that rand() can return RAND_MAX and after scaling, we'd get back a freek result of n (no other outcome of rand() would produce this). Since we want a uniform distribution of outcomes, this would clearly be wrong.

This approach is not without its flaws. The problem is that if RAND_MAX is not divisible by n, then there is no uniform way to distribute the possible outcomes of rand(). If RAND_MAX/n is very large, this is not a problem, but if it's not so large, for example, n = 3*RAND_MAX/4, then some numbers will occur twice as frequently as others. So when RAND_MAX/n is large, we need another strategy.

An improved idiom was introduced to me by Andrew Koenig in a discussion on comp.lang.c++. This example along with several quality examples and explanations are to be found in his book Accelerated C++. Highly recommended. The basic idea is that you simply throw away numbers in such a way as to ensure a uniform distribution. For example, if we wanted to generate a number between 0 and 3*RAND_MAX/4, we'd just keep calling rand() until it returned a number in that range. If we want a random number between 0 and 2, we obviously don't want to throw away all numbers greater than or equal to 3 ! Instead, we pick a large (ie close to RAND_MAX) whole number multiple of 3, and throw away all outcomes of rand() greater than or equal to that.

Another improvement that could be made is that the program should handle out of range arguments. The correct way to do this is throw an exception. The header one needs to include to do this is stdexcept

rand4.cc


int_rand(int n)
{
	if ( n <= 0 || n > RAND_MAX )
		throw domain_error("Argument out of range");
	const int bucket_size = RAND_MAX / n;
	int a;
	do 
	{
		a = rand() / bucket_size;
	}
	while ( a >= n );

	return a;
}