The Mersenne Twister, in PHP

Introduction

This page describes a pure-PHP implementation of the Mersenne Twister, in the public domain. It uses namespaces, and therefore requires PHP 5.3.0 at least.

The code is maintained on GitHub; downloads are also available.

Example

<?php
require_once "mersenne_twister.php";

use
mersenne_twister\twister;

#--------------------------------------------

$twister1 = new twister(42);
$twister2 = new twister(42);
/*
42 is a seed for initialising
the random-number generator.
*/

for($i = 0; $i < 10; $i++)
# int32 returns a random 32-bit integer
if($twister1->int32() !== $twister2->int32())
print
"They're different -- " .
"this is not supposed to happen!\n";

#--------------------------------------------

$num_iters = 1000;

$twister3 = new twister(42);
$saved = serialize($twister3);

$sum = 0;
for(
$i = 0; $i < $num_iters; $i++)
$sum += $twister3->rangereal_halfopen(10, 20);
/*
the call to rangereal_halfopen produces a
floating-point number >= 10 and < 20
*/

print "This is the average, " .
"which should be about 15: " .
(
$sum / $num_iters) . "\n";

$twister3 = unserialize($saved);

# run the loop again
#
$sum = 0;
for(
$i = 0; $i < $num_iters; $i++)
$sum += $twister3->rangereal_halfopen(10, 20);

print
"This is the average again, " .
"which should be the same as before: " .
(
$sum / $num_iters) . "\n";

#--------------------------------------------

$twister4 = new twister;

$twister4->init_with_file("/dev/urandom", twister::N);
/*
This reads characters from /dev/urandom and
uses them to initialise the random number
generator.

The second argument is multiplied by 4 and
then used as an upper bound on the number of
characters to read.
*/

if($twister4->rangeint(1, 6) == 6)
print
"You've won -- congratulations!\n";

API

The software resides in a single file, called mersenne_twister.php, which should be loaded with require_once or include_once. Everything in this file is defined in a namespace called mersenne_twister.

Among the things defined is a class called twister, which constitutes the API; apart from this class, everything else in the namespace should be considered private.

The following are the methods of twister.

__construct()

Does nothing if it receives no arguments.

If it receives exactly one argument, it behaves like init_with_integer. In other words, the single line

$twister = new twister(42);
is functionally identical to
$twister = new twister;
$twister->init_with_integer(42);
init_with_integer($integer_seed)

Uses $integer_seed to initialise the random number generator.

It is not necessary for is_integer($integer_seed) to be true, because init_with_integer will call a function named force_32_bit_int, which will do its utmost to obtain a 32-bit unsigned integer from $integer_seed. For example, any of these calls will do something sensible:

$twister->init_with_integer(12.34);
 
$twister->init_with_integer("42");
 
$twister->init_with_integer("-42");
 
$twister->init_with_integer(pow(2, 40));
 
$twister->init_with_integer(-pow(2, 40));
 
$twister->init_with_integer(1 << 40 | 1);
/*
intended for a 64-bit machine; it's not
clear what meaning it has on a 32-bit machine.
*/

force_32_bit_int is not part of the API, and is mentioned here only to make the exposition clearer.

init_with_array($integer_array)

The elements of $integer_array are processed with force_32_bit_int and then used to initialise the random number generator.

The running time of this function is asymptotically linear in count($integer_array). However, all integer arrays whose length is less than or equal to twister::N should have the same running time. (twister::N has the value 624.)

The internal state of a newly-seeded twister object is represented by 624 PHP integers. This means that if $integer_array is randomly chosen, and its entropy is as large as possible, then there is no point in having count($integer_array) be greater than 624.

These remarks apply, mutatis mutandis, to the next three methods (init_with_string, init_with_files, init_with_file), all of which are merely wrappers for init_with_array.

init_with_string($string)

$string is broken up into 4-byte fragments, which are then converted to integers and passed to init_with_array.

If the length of $string is not divisible by 4, we begin by padding it at the end with zero bytes.

The algorithm for converting a fragment to a string could be implemented as follows:

function convert($fragment) {
assert(strlen($fragment) == 4);
 
$result = 0;
 
for($i = 0; $i < 4; $i++) {
$result <<= 8;
$result |= ord(substr($fragment, $i, 1));
}
 
return $result;
}

However, we use unpack, since it's faster.

init_with_files($files)

The contents of the files are concatenated and the resulting string is then passed to init_with_string.

This method has an optional second argument. If supplied, it is multiplied by 4 (because there are 4 bytes in a 32-bit integer) to give an upper bound on the length of the generated string. It also causes the method to process only as many files as it needs to, and read only as many bytes as are necessary.

init_with_file($file)

The contents of $file are read and the resulting string is then passed to init_with_string.

This method has an optional second argument; if supplied, it is multiplied by 4 to give an upper bound on the number of bytes read from $file.

int32()

Returns a pseudo-random 32-bit integer. On a 32-bit machine, the return value may be negative; on a 64-bit machine it will always be positive.

int31()

Returns a pseudo-random 31-bit integer (which must be positive, of course).

real_closed()

Returns a pseudo-random quantity x, of type float, such that 0 ≤ x ≤ 1.

real_open()

Returns a pseudo-random quantity x, of type float, such that 0 < x < 1.

real_halfopen()

Returns a pseudo-random quantity x, of type float, such that 0 ≤ x < 1.

(Mathematicians speak of intervals as being closed, open, or half-open, hence the names of the preceding 3 functions. See MathWorld for an explanation.)

real_halfopen2()

The same as real_halfopen, except that the number returned has 53-bit resolution.

rangeint($lower_bound, $upper_bound)

Returns a pseudo-random integer drawn from the range defined by $lower_bound and $upper_bound, with each integer in the range having an equal probability of being chosen.

The worst-case running time of this function is unbounded, because rangeint calls int32 an indefinite number of times. However, even with the most unfavourable range possible, the expected number of calls is 2 on a 32-bit machine, and 3 otherwise, where the word ‘expected’ is used in the sense which it has in statistics.

Note that rangeint begins with the following code:

$lower_bound = intval($lower_bound);
$upper_bound = intval($upper_bound);
This means that a programmer who writes either of the following two lines (for a 32-bit machine) is probably making a mistake:
$twister->rangeint(-pow(2, 40), pow(2, 40))
$twister->rangeint(pow(2, 40), pow(2, 40) + 10)
rangereal_open($lower_bound, $upper_bound)

Returns a pseudo-random number, of type float, such that the following is guaranteed to be true:

$lower_bound < ($x = $twister->rangereal_open ($lower_bound, $upper_bound)) && $x < $upper_bound

The number generated is uniformly distributed over the specified range (or rather, as uniformly as is permitted by the finite resolution of PHP's float type.)

rangereal_halfopen($lower_bound, $upper_bound)

Returns a pseudo-random number, of type float, such that the following is guaranteed to be true:

$lower_bound <= ($x = $twister->rangereal_open ($lower_bound, $upper_bound)) && $x < $upper_bound

The number generated is uniformly distributed over the specified range (or rather, as uniformly as is permitted by the finite resolution of PHP's float type.)

Informally: this is just like rangereal_open except that $lower_bound is a possible return value.

rangereal_halfopen2($lower_bound, $upper_bound)

The same as rangereal_halfopen, except that its resolution will be higher. The price to pay for this higher resolution is higher CPU usage.

In fact, it is possible to conjure up values of $lower_bound and $upper_bound for which the resolution is no higher than for rangereal_halfopen, but I would expect that these are unlikely to arise for most users of this software.

Note that, even where rangereal_halfopen2 has a higher resolution than rangereal_halfopen, no guarantees are made about its size. In particular, it is not guaranteed to be 53 bits.

test()

Performs a comprehensive test of int32 and real_halfopen. Prints a brief message if the test failed; otherwise, does nothing.

Benchmarks

The table below displays the time, in seconds, taken by various operations. The performance may seem very poor, but bear in mind that these figures were obtained on a desktop PC which was purchased in May 2006. Its CPU is an Intel Celeron 2.66GHz, having an L2 cache of 256KB.
Initialisation
Operation Running time
init_with_integer 0.00839
init_with_array, where the array has 624 elements 0.0300
init_with_array, where the array has 2 × 624 = 1248 elements 0.0430
Generation
Operation Running time
int32 0.00000832
rangeint(1, 100) 0.0000262
rangeint(1, 1000000) 0.0000258

Assumptions

The software makes some assumptions about how numbers are represented. They almost certainly hold on your machine, and are mentioned here only for completeness. The assumptions are:

In fact, the second requirement could be weakened considerably; it is likely that any modern floating-point implementation would work.