PDA

View Full Version : Distribution of random


AndyP
05-10-2006, 04:54 AM
Hi,

Here's some code to generate a random number in the range [min,max)

int RandomInRange( const int min, const int max )
{
return ( rand( ) % ( max - min ) ) + min;
}


But... this does not give a nice distribution. If (max - min) is not a divisor of the maximum value that rand( ) can create, then the first N values in the range will have a higher probability of being generated (where N is (rand_max + 1) % (max - min)).

e.g. if rand() generates values from 0-5 (rand_max = 5), and by some chance, it generates the sequence

0,1,2,3,4,5,0,1,2,3,4,5

This is a nice uniform distribution, but putting it through RandomInRange(0,5) gives
0,1,2,3,4,0,0,1,2,3,4,0
, which biases 0. RandomInRange(0,4) gives
0,1,2,3,0,1,0,1,2,3,0,1
which biases 0 and 1.

We can fix RandomInRange() by normalising rand():

float NormalisedRandom( )
{
return float( rand( ) ) / float ( RAND_MAX );
}
int RandomInRange( const int min, const int max )
{
return int( ( NormalisedRandom( ) * float( max - min ) ) + float( min ) );
}


My question is: Is there a faster, non-floating-point way of ensuring a uniform distribution in an arbitrary range?

Nick
05-10-2006, 06:21 AM
This might do the job:

short random(short min, short max)
{
return (rand() * (max - min) + RAND_MAX / 2) / RAND_MAX + min;
}

To make it even faster, replace the division by RAND_MAX with a shift right by 15.

monjardin
05-10-2006, 06:21 AM
Don't use rand()? boost (http://www.boost.org/) has some useful RNG tools in the random (http://boost.org/libs/random/index.html) library, but I don't know about the performance.

Also, there was a discussion of random numbers a few months ago in this thread (http://www.devmaster.net/forums/showthread.php?t=5432&highlight=random+.oisyn).

juhnu
05-10-2006, 06:56 AM
Mersenne-Twister has a good performance and random distribution: http://www.bedaux.net/mtrand/

AndyP
05-10-2006, 08:17 AM
juhnu: rand() generates reasonable white-noise; I'm happy with that. The problem is that, no matter what random function I use as my input into RandomInRange(), I'll get a non-uniform distribution because of the modulus.

monjardin: Thanks for the reference! boost is always full of surprises :happy: uniform_int looks promising. The docs indicate that it has the same problem as RandomInRange() - it says they ignore errors caused by the folding of values by the modulus function because if the RNG range is considerably greater than the output range, the error is minimised.

However, looking at the implementation of uniform_int, it appears to simply generate numbers using the base RNG until one falls in the correct range... (of course, this is after only 5 minutes looking at boost code - it takes at least 5 years to fully understand a boost header file, and afterwards you are either blind or insane - so I could be mistaken).

Nick: That looks good I'll give it a go :happy:

.oisyn
05-10-2006, 08:26 AM
Nick's code is pointless, the number of possible outcomes of rand(), N, is finite and it'll therefore always suffer from uneven distributions if the range is not a divisor of N, no matter how you do the distribution.

The only way around it is to dump the extra values, or use an alternative function that guarantees an even distribution on a given range.

(BTW Nick, RAND_MAX isn't necessarily defined as 1<<15, that's just an MSVC++ thing ;). Plus the fact that it's actually (1<<15)-1, so shifting isn't the same)

bramz
05-10-2006, 08:30 AM
If you're happy to go with rand(), then you're happy to go with biased distributions in ranges ...

You don't even want to know how many papers that examine statistical numerical experiments are actually examining the statistic properties of rand().

If you're really serious about it, don't go with rand() ...
Mersenne Twister *cough* =)

AndyP
05-10-2006, 08:41 AM
OK, that other thread said it all :blink:

Blaxill suggested the same normalising method that I am using, .oisyn beat it with a stick, and suggested the method that boost uses (but documents incorrectly). Looks like that might be the best answer...

I guess one way to improve on it is to do something like:

int RandomInRange( const int min, const int max )
{
const int range = max - min;

// Find a divisor for RAND_MAX that is greater than (max-min)
const int mod = FindFirstDivisor(RAND_MAX,range);

int result = range;
while ( result >= range )
{
result = rand( ) % mod;
}

return result + min;
}

which'll cut down on the number of iterations you have to do, especially for small output ranges.

Hey ho.

AndyP
05-10-2006, 08:48 AM
GAH! Smart people post while I am busy formulating my reply :angry:

bramz: Mersenne twister it is.

.oisyn: See my last post, where the light slowly dawns.

Nick
05-10-2006, 01:14 PM
Oh, I thought the focus was on making it fast and 'almost' correct, for small ranges. That's too many wrong assumptions for one topic... My bad! :blush:

kusma
05-15-2006, 06:10 PM
fixed point ;)

long long NormalisedRandom2( )
{
return (long long)rand() * ((1UL << 31) / RAND_MAX);
}

int RandomInRange2( const int min, const int max )
{
return (int)((NormalisedRandom2() * (max - min)) >> 31) + min;
}

.oisyn
05-15-2006, 06:35 PM
And how does this help AndyP's problem?

kusma
05-15-2006, 06:38 PM
...because it is a faster, non-floating point version of the fix he provided himself? (the initial question, that is)

edit: i was under the impression that he was already pleased with the floating point method