Random Number Generation

Download as ppt, pdf, or txt
Download as ppt, pdf, or txt
You are on page 1of 42
At a glance
Powered by AI
The key takeaways are that random numbers should have the properties of uniformity and independence, and pseudo-random numbers are generated using techniques like the linear congruential method to simulate these ideal properties.

A sequence of random numbers must have the properties of uniformity, where the expected number of observations in each interval is equal, and independence, where the probability of an observation in an interval is independent of previous values.

The main techniques discussed for generating pseudo-random numbers are the linear congruential method, combined linear congruential generators, and random number streams.

Random Number

Generation
Outline
 Properties of Random Numbers
 Generation of Pseudo-Random Numbers
(PRN)
 Techniques for Generating Random
Numbers
 Tests for Random Numbers
Properties of Random Numbers
• A sequence of random numbers R1, R2, …, must have two
important statistical properties:
– Uniformity
– Independence.
• Random Number, Ri, must be independently drawn from a
uniform distribution with pdf: pdf for random
1, 0  x  1 numbers
f ( x)  
0, otherwise
1
2
1 x 1

E ( R )  xdx 
0 2
0

2
3 1 2
x 1 1 1 1
V ( R )   x dx   E  R  
1 2
2
     
0 3 0 2 3 4 12 3
Properties of Random Numbers

• Uniformity: If the interval [0,1] is divided


into n classes, or subintervals of equal
length, the expected number of observations
in each interval is N/n, where N is the total
number of observations.

• Independence: The probability of


observing a value in a particular interval is
independent of the previous value drawn.
4
Generation of Pseudo-Random Numbers

• “Pseudo”, because generating numbers using a


known method removes the potential for true
randomness.
If the method is known, the set of random numbers
can be replicated!!
• Goal: To produce a sequence of numbers in
[0,1] that simulates, or imitates, the ideal
properties of random numbers (RN) - uniform
distribution and independence.

5
Important considerations in RN routines

 The routine should be fast. Individual computations are


inexpensive, but a simulation may require many millions of
random numbers.
 Portable to different computers – ideally to different
programming languages. This ensures the program produces
same results.
 Have sufficiently long cycle. The cycle length, or period
represents the length of random number sequence before
previous numbers begin to repeat in an earlier order.
 Replicable. Given the starting point, it should be possible to
generate the same set of random numbers, completely
independent of the system that is being simulated.
 Closely approximate the ideal statistical properties of
uniformity and independence.
Techniques for Generating Random Numbers

 Linear Congruential Method (LCM)


Most widely used technique for generating
random numbers
 Combined Linear Congruential Generators (CLCG).
Extension to yield longer period (or cycle)
 Random-Number Streams

7
Congruential Property
• Number theory is the systematic study of the
natural numbers.
• Divisible means that there is no reminder after
division. For example, 6 ÷ 3 = 2.
• The reminder when a number x is divided by n has
n possibilities, 0, 1, 2, …, n-1.
• If 11 is divided by 5, the reminder is 1.
• If x is divided by 5, the reminders could be 0, 1, 2,
3, or is 4.
• 11 ≡ 1 (mod 5)
• This is stated as 11 is congruent to 1 (modulo
5).
• i.e., when 11 is divided by 5, the reminder is 1.

• Xi+1 ≡ (a xi + c) mod m
The Random Number Cycle
• Almost all random number
generators have as their basis a
sequence of pseudorandom integers.
• The integers or ``fixed point''
numbers are manipulated
arithmetically to yield floating point
or ``real'' numbers.
• The Nature of the cycle
– the sequence has a finite number
of integers
– the sequence gets traversed in a
particular order
– the sequence repeats if the period
of the generator is exceeded
– the integers need not be distinct;
that is, they may repeat.
Techniques for Generating Random Numbers:
Linear Congruential Method
• To produce a sequence of integers, X1, X2, … between 0 and m-1
by following a recursive relationship:
X i 1  (aX i  c) mod m, i  0,1,2,...

The The The


multiplier increment modulus

• X0 is called the seed


• The selection of the values for a, c, m, and X0 drastically affects
the statistical properties and the cycle length.
• If c  0, then it is called mixed congruential method.
• When c = 0, it is called multiplicative congruential method.
11
• The random integers are being generated in the range
[0,m-1], and to convert the integers to random
numbers:

Xi
Ri  , i  1,2,...
m

12
• EXAMPLE 1: Use X0 = 27, a = 17, c = 43, and m = 100.
• The Xi and Ri values are:
X1 = (17*27+43) mod 100 = 502 mod 100 = 2, R1 = 0.02;
X2 = (17*2+43) mod 100 = 77 mod 100 =77, R2 = 0.77;
X3 = (17*77+43) mod 100 = 1352 mod 100 = 52, R3 = 0.52;

• Notice that the numbers generated assume values only from the
set I = {0,1/m,2/m,….., (m-1)/m} because each Xi is an integer in
the set {0,1,2,….,m-1}
• Thus, each Ri is discrete on I, instead of continuous on interval
[0,1].

13
• EXAMPLE 2: LCG (5, 1, 16, 1)
– Let us consider a simple example with a = 5, c = 1, m
= 16, and X0 =1.
– The sequence of pseudorandom integers generated by
this algorithm is:
1,6,15,12,13,2,11,8,9,14,7,4,5,10,3,0,1,6,15,12,13,2,11,8,
9,14, ..
• Maximum Density
– Such that the values assumed by Ri, i = 1,2,…, leave no
large gaps on [0,1]
– Problem: Instead of continuous, each Ri is discrete.
– Solution: a very large integer for modulus m (e.g., 231-1,
248)
• Maximum Period
– To achieve maximum density and avoid cycling.
– Achieved by: proper choice of a, c, m, and X0.
• Most digital computers use a binary representation of numbers
– Speed and efficiency are aided by a modulus, m, to be (or
close to) a power of 2.

15
Maximum Period or Cycle Length:
•For m a power of 2, say m=2b, and c 0, the longest
possible period is P=m=2b, which is achieved when c is
relatively prime to m (greatest common divisor of c and
m is 1) and a=1+4 k, where k is an integer.
•For m a power of 2, say m=2b, and c=0, the longest
possible period is P=m/4=2b-2, which is achieved if the
seed X0 is odd and if the multiplier a is given by a=3+8k
or a = 5+8k for some k = 0,1,….
•For m a prime number and c=0, the longest possible
period is P = m-1, which is achieved whenever the
multiplier a has the property that the smallest integer k
such that ak-1 is divisible by m is k = m-1.
16
Example - 3
Using the multiplicative congruential method,
find the period of the generator for a = 13, m =
26, and X0 = 1, 2, 3, and 4.
The solution is given in next slide.
When the seed is 1 and 3, the sequence has
period 16.
When the seed is 2, the sequence has period 8.
When the seed is 4, the sequence has period 4.
• Period Determination Using Various seeds

• i Xi Xi Xi Xi

• 0 1 2 3 4
• 1 13 26 39 52
• 2 41 18 59 36
• 3 21 42 63 20
• 4 17 34 51 4
• 5 29 58 23
• 6 57 50 43
• 7 37 10 47
• 8 33 2 35
• 9 45 7
• 10 9 27
• 11 53 31
• 12 49 19
• 13 61 55
• 14 25 11
• 15 5 15
• 16 1 3
Linear Congruential Generators
• Some famous values for a and m (assuming c = 0)
 a = 23, m = 108 + 1 (first LCG; original 1951 implementation )
 a = 65539, m = 231 –1 (RANDU generator of 1960s; poor
because of correlated output)
 a = 16807, m = 231–1 (has been discussed as minimum standard
for RNGs; used in Matlab version 4)

Lehmer,D. H. (1951), “Mathematical Methods in Large-Scale


Computing Units,” Annals of the Computation Laboratory of
Harvard University, no. 26, pp. 141–146.
Random Number Generators Used in Common
Software Packages
• MATLAB:
Versions earlier than 5: LCG with a = 75 = 16807, c =
0, m = 231–1
Versions 5 to 7.3: lagged Fibonacci generator
combined with shift register random integer generator
with period 21492 (“ziggurat algorithm”)
Versions 7.4 and later: “Mersenne twister”
(sophisticated linear algorithm with huge period 219937
)
• EXCEL: Uk = fractional part (9821Uk –1 + 0.211327);
period 223
• SAS (vers. 6): LCG with period 231–1
Random-Number Streams
• The seed for a linear congruential random-number generator:
 It is the integer value X0 that initializes the random-number
sequence.
Any value in the sequence can be used to “seed” the generator.
• A random-number stream:
 refers to a starting seed taken from the sequence X0, X1, …, XP.
 If the streams are b values apart, then stream i could be defined
by starting seed: Si  X b ( i 1)
– Older generators: b = 105; Newer generators: b = 1037.
• A single random-number generator with k streams can act like k
distinct virtual random-number generators
• To compare two or more alternative systems.
– Advantageous to dedicate portions of the pseudo-random
number sequence to the same 21
purpose in each of the simulated
systems.
Tests for Random Numbers
1. Frequency test. Uses the Kolmogorov-Smirnov
or the chi-square test to compare the distribution
of the set of numbers generated to a uniform
distribution.
2. Runs test. Tests the runs up and down or the
runs above and below the mean by comparing
the actual values to expected values. The
statistic for comparison is the chi-square.
3. Autocorrelation test. Tests the correlation
between numbers and compares the sample
correlation to the expected correlation of zero.
Tests for Random Numbers
(cont.)
4. Gap test. Counts the number of digits that
appear between repetitions of a particular
digit and then uses the Kolmogorov-
Smirnov test to compare with the expected
number of gaps.
5. Poker test. Treats numbers grouped together
as a poker hand. Then the hands obtained are
compared to what is expected using the chi-
square test.
Tests for Random Numbers
(cont.)
In testing for uniformity, the hypotheses are as
follows:
H0: Ri ~ U[0,1]
H1: Ri  U[0,1]
The null hypothesis, H0, reads that the numbers
are distributed uniformly on the interval [0,1].
Test for Random Numbers
(cont.)
In testing for independence, the hypotheses are as
follows;
H0: Ri ~ independently
H1: Ri  independently

This null hypothesis, H0, reads that the numbers


are independent.
Failure to reject the null hypothesis means that no
evidence of dependence has been detected on the
basis of this test. This does not imply that further
testing of the generator for independence is
unnecessary.
Test for Random Numbers
(cont.)
Level of significance 
 = P(reject H0 | H0 true)
Frequently,  is set to 0.01 or 0.05
(Hypothesis)
Actually True Actually False
Accept 1 - 
(Type II error)
Reject  1 - 
(Type I error)
• When to use these tests:
If a well-known simulation language or random-
number generator is used, it is probably
unnecessary to test.
If the generator is not explicitly known or
documented, e.g., spreadsheet programs,
symbolic/numerical calculators, tests should be
applied to many sample numbers.
• Types of tests:
Theoretical tests: evaluate the choices of m, a, and c
without actually generating any numbers.
Empirical tests: applied to actual sequences of
numbers produced.
Frequency Tests
• Test of uniformity
• Two different methods
Chi-square test
Kolmogorov-Smirnov test

• Both these tests measure the degree of agreement between the


distribution of a sample of generated random numbers and the
theoretical uniform distribution.
• Both tests are based on null hypothesis of no significant
difference between the sample distribution and the theoretical
distribution.
Chi-square test
• Chi-square test uses the sample statistic:
n is the # of classes Ei is the expected #
in the ith class
(Oi  Ei )
n 2
  2
0
i 1 Ei Oi is the observed #
in the ith class

– Approximately the chi-square distribution with n-1 degrees of freedom


where the critical values are obtained from statistical tables.
– For the uniform distribution, Ei, the expected number in the each class
is: N
Ei  , where N is the total # of observations
n

• Valid only for large samples, e.g. N >= 50


29
Frequency Test - Chi-Square Test

• Example - 4: Use Chi-square test for the data shown below


with =0.05. The test uses n=10 intervals of equal length,
namely [0, 0.1], [0.1, 0.2], …., [0.9, 1.0].
• The value of 02=3.4; The critical value from statistical
Table is
0.05,92 = 16.9. Therefore, the null hypothesis is not
rejected.
Kolmogorov-Smirnov Test [Frequency Test]

• Compares the continuous cdf, F(x), of the uniform distribution


with the empirical cdf, SN(x), of the N sample observations.
– We know: F ( x)  x, 0  x  1
– If the sample from the RN generator is R1, R2, …, RN, then
the empirical cdf, SN(x) is:
number of R1 , R2 ,..., Rn which are  x
S N ( x) 
N

• Based on the statistic: D = max| F(x) - SN(x)|


– Sampling distribution of D is known (a function of N, tabulated in Table
A.8.)

32
Goodness-of-fit test
1 1

n 5
Let X be a random variable that is
uniformly distributed in [0,1]. The x1 x2 x3 x4 x5
cumulative distribution function (CDF) is
the diagonal from (0,0) to (1,1).
Suppose x(1), x(2),…, x(n) are samples of X.
x1, x2,…, xn be the ordered x(i)’s. x1  x2  x3  ....  xn
The Empirical distribution is
number of xi ' s such that xi  x
Fn ( x) 
n
33
1 1

n 5

x1 x2 x3 x4 x5
Bad fit
Good fit

34
Kolmogorov-Smirnov Test
• Steps:
1. Rank the data from smallest to largest. Let R(i) denote ith
smallest observation, so that R(1)R(2)…R(N)
2. Compute i   i  1
D   max   R i  ; D   max  R i   
1 i  N N N
  1i  N
 

3. Compute D= max(D+, D-)


4. Locate in Statistical Table, the critical value D, for the
specified significance level  and the sample size N.
5. If the sample statistic D is greater than the critical value D,
the null hypothesis is rejected. If D  D, conclude there is
no difference.

35
Kolmogorov-Smirnov Test [Frequency Test]

• Example 5: Suppose 5 generated numbers are 0.44, 0.81, 0.14, 0.05,


0.93.
Arrange R(i) from
R(i) 0.05 0.14 0.44 0.81 0.93 smallest to largest
Step 1:
i/N 0.20 0.40 0.60 0.80 1.00
D+ = max {i/N – R(i)}
i/N – R(i) 0.15 0.26 0.16 - 0.07
Step 2:
R(i) – (i-1)/N 0.05 - 0.04 0.21 0.13 D- = max {R(i) - (i-1)/N}

Step 3: D = max(D+, D-) = 0.26


Step 4: For  = 0.05,
D = 0.565 > D

Hence, H0 is not rejected.


Test for Autocorrelation
• The test for autocorrelation are concerned with the dependence
between numbers in a sequence.
• Consider:

• Though numbers seem to be random, every fifth number is a


large number in that position.
• This may be a small sample size, but the notion is that
numbers in the sequence might be related

37
• Testing the autocorrelation between every m numbers (m is
a.k.a. the lag), starting with the ith number
 The autocorrelation im between numbers: Ri, Ri+m,
Ri+2m, Ri+(M+1)m i  (M  1 )m  N
 M is the largest integer such that
• Hypothesis:
H 0 :  im  0, if numbers are independent
H1 :  im  0, if numbers are dependent

• If the values are uncorrelated:


– For large values of M, the distribution of the estimator
of im, denoted ̂ im is approximately normal.

38
ˆ im
• Test statistic: Z0 
ˆ ˆim

Z0 is distributed normally with mean = 0 and variance = 1,


and:
1 
M

ρˆ im   
M  1  k 0
Ri  km Ri (k 1 )m   0.25

13M  7
σˆ ρim 
12(M  1 )

• If im > 0, the subsequence has positive autocorrelation:


High random numbers tend to be followed by high ones,
and vice versa.
If im < 0, the subsequence has negative autocorrelation
Low random numbers tend to be followed by high ones,
and vice versa.
39
• After computing Z0, do not reject the hypothesis of
independence if –z/2 Z0  z/2
  is the level of significance and z/2 is obtained
from Statistical Tables.

40
• Example 5: Test whether the 3rd, 8th, 13th, and so on, for the
output on Slide 30 are auto-correlated or not.
– Hence,  = 0.05, i = 3, m = 5, N = 30, and M = 4. M is the
largest integer such that 3+(M+1)5  30.

1 (0.23)(0.28)  (0.25)(0.33)  (0.33)(0.27)


ρˆ 35     0.25
4  1  (0.28)(0.05)  (0.05)(0.36) 
 0.1945
13(4)  7
σˆ ρ35   0.128
12( 4  1 )
0.1945
Z0    1.516
0.1280

– From Statistical Tables, z0.025 = 1.96. Hence, the hypothesis


is not rejected.
41

You might also like