Lecture 07 - Random Numbers

Chapter 7
Uniform Random Generators
Testing of Uniform Random Generators
Methods of Generating Non-Uniform Variables
  Inversion, Rejection, Composition

(C) Pramod Parajuli, 2004
Source: www.csitnepal.com
Introduction

Random numbers R1, R2, R3, . . . are a sequence of values that should be uniformly distributed between 0 and 1 and independent of one another.
Continuous Uniformly Distributed RNs

For a random number R uniformly distributed on [0, 1]:

Mean:      E(R) = ∫_0^1 x dx = 1/2

Variance:  V(R) = ∫_0^1 x² dx − [E(R)]² = 1/3 − 1/4 = 1/12
Generation of RNs – RNGs (RN Generators)
- Should be fast
- Should be portable
- Should have a sufficiently long cycle
- Should be replicable
- Should hold ideal statistical properties (uniformity and independence)
Linear Congruential Generator (LCG)
The generator produces integers x_i via the recurrence

x_{i+1} = (a · x_i + c) mod m,   i = 0, 1, 2, . . .

where a is the multiplier, c is the increment, m is the modulus, and x_0 is the seed.

Random numbers on [0, 1) are then obtained as R_i = x_i / m
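As a concrete illustration, here is a minimal C sketch of an LCG; the constants a, c, m and the seed below are arbitrary example values, not ones recommended by the lecture.

#include <stdio.h>

/* Minimal LCG sketch: x_{i+1} = (a*x_i + c) mod m, R_i = x_i / m.
   All constants are illustrative only. */
static unsigned long lcg_state = 1UL;        /* seed x_0 */

double lcg_next(void)
{
    const unsigned long a = 1103515245UL;    /* multiplier (example) */
    const unsigned long c = 12345UL;         /* increment  (example) */
    const unsigned long m = 2147483648UL;    /* modulus 2^31 (example) */

    lcg_state = (a * lcg_state + c) % m;
    return (double) lcg_state / (double) m;  /* R_i in [0, 1) */
}

int main(void)
{
    for (int i = 0; i < 5; i++)
        printf("%f\n", lcg_next());
    return 0;
}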
Multiplicative Congruential Generator (MCG)
- If c = 0, the LCG becomes a multiplicative congruential generator:
  x_{i+1} = (a · x_i) mod m
- Example: a = 13, m = 2^6 = 64, x_0 = 1, 2, 3, and 4
Guidelines for selecting the values
One recommended choice of multiplier is

a = 2^(wordsize/2) + 3

or a = 65,539 (= 2^16 + 3 for a 32-bit word)
Testing Uniform Random Generators
Commonly used tests:
- Frequency tests (chi-square, Kolmogorov-Smirnov), for uniformity
- Runs tests, for independence
- Autocorrelation test
- Gap test
- Poker test
Frequency Tests (uniformity test)
Are the numbers produced by the generator uniformly distributed?

Use the chi-square (goodness of fit) test.
Chi-square (χ²) test

A statistical test that checks how well observed data fit the theoretically expected data.

First, divide the observed data (here, the random numbers) into k non-overlapping classes; k must be >= 3.

Then count O_i, the number of times the observed data fall in each class i, i = 1, 2, 3, 4, . . .

Then measure how far the observed frequencies deviate from the expected frequencies:

χ² = Σ_{i=1}^{k} (O_i − E_i)² / E_i
Chi-square (χ²) test

χ² = Σ_{i=1}^{k} (O_i − E_i)² / E_i

O_i is the observed number of occurrences of value x_i and E_i is the expected number of occurrences of x_i.

The statistic sums the squared differences between the observed and expected values (divided by the expected values to normalize them).

Now consider two hypotheses:
NULL hypothesis, H0: Y matches the distribution of X
Hypothesis H1: Y does not match the distribution of X

If χ² is small(ish), we do not reject the null hypothesis (i.e. the distributions may match).
Chi-square test

The more the observed data depart from the expected values, the larger the value of χ² becomes.

In our example, E_i = 500 for i = 1, 2, 3, . . . , 10

 i   Range     O_i observed   E_i expected   (O_i − E_i)²   (O_i − E_i)²/E_i
 1   000-099        468            500            1024            2.048
 2   100-199        519            500             361            0.722
 3   200-299        480            500             400            0.8
 4   300-399        495            500              25            0.05
 5   400-499        508            500              64            0.128
 6   500-599        426            500            5476           10.952
 7   600-699        497            500               9            0.018
 8   700-799        515            500             225            0.45
 9   800-899        463            500            1369            2.738
10   900-999        529            500             841            1.682
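As a check on the arithmetic, a small C sketch that recomputes χ² for this table (observed counts as listed, E_i = 500); the function and variable names are just illustrative.

#include <stdio.h>

/* Chi-square statistic: sum over classes of (O_i - E_i)^2 / E_i. */
double chi_square(const double *obs, const double *expected, int k)
{
    double chi2 = 0.0;
    for (int i = 0; i < k; i++) {
        double d = obs[i] - expected[i];
        chi2 += d * d / expected[i];
    }
    return chi2;
}

int main(void)
{
    double obs[10] = {468, 519, 480, 495, 508, 426, 497, 515, 463, 529};
    double expected[10];
    for (int i = 0; i < 10; i++)
        expected[i] = 500.0;                    /* 5000 numbers / 10 classes */
    printf("chi2 = %.3f\n", chi_square(obs, expected, 10));   /* 19.588 */
    return 0;
}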
Chi-square test

χ² = 19.588

For large values, the hypothesis is rejected if

χ² > χ²_{k−1, 1−α}

i.e. if the computed statistic exceeds the critical value of the chi-square distribution.

Degrees of freedom: v = k − 1, where k is the number of classes.

α = Pr(reject H0 | H0 true), literally, the level of significance.
Chi-square test

Let's take the probability of rejecting H0 (when it is true) to be 0.05, or alternatively 0.1.

Then,

α = 0.05                    α = 0.1
1 − α = 0.95                1 − α = 0.9
k = 10                      k = 10
k − 1 = 9                   k − 1 = 9
χ²_{9, 0.95} = ??           χ²_{9, 0.9} = ??
Kolmogorov-Smirnov Test

Another kind of frequency test.

Compares the continuous cdf, F(x), of the uniform distribution to the empirical cdf, S_N(x), of the sample of N observations.

F(x) = x,   0 ≤ x ≤ 1

If the sample from the random-number generator is R1, R2, . . . , RN, then the empirical cdf S_N(x) is defined by

S_N(x) = (number of R1, R2, . . . , RN for which R_i ≤ x is true) / N
Kolmogorov-Smirnov Test

This test is based on the largest absolute deviation between F(x) and S_N(x) over the range of the random variable.

Steps
1. Rank the data from smallest to largest, i.e.
   R(1) ≤ R(2) ≤ R(3) ≤ . . . ≤ R(N)
2. Compute

   D⁺ = max_{1≤i≤N} ( i/N − R(i) )

   D⁻ = max_{1≤i≤N} ( R(i) − (i − 1)/N )
Kolmogorov-Smirnov Test
3. Compute D = max(D+, D-)
4. Determine the critical value D_α from the Kolmogorov-Smirnov critical values table
5. If D > D_α, the null hypothesis is rejected; otherwise it is accepted
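A minimal C sketch of steps 1 to 3 for a sample tested against U(0, 1); the sample values and the helper names are illustrative, and the comparison with the tabulated D_α is left out.

#include <stdio.h>
#include <stdlib.h>

static int cmp_double(const void *a, const void *b)
{
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Kolmogorov-Smirnov statistic for a sample tested against U(0,1). */
double ks_statistic(double *r, int n)
{
    qsort(r, n, sizeof(double), cmp_double);   /* step 1: rank the data */
    double dplus = 0.0, dminus = 0.0;
    for (int i = 1; i <= n; i++) {             /* step 2 */
        double dp = (double)i / n - r[i - 1];
        double dm = r[i - 1] - (double)(i - 1) / n;
        if (dp > dplus)  dplus  = dp;
        if (dm > dminus) dminus = dm;
    }
    return dplus > dminus ? dplus : dminus;    /* step 3: D = max(D+, D-) */
}

int main(void)
{
    double r[] = {0.44, 0.81, 0.14, 0.05, 0.93};   /* example sample */
    printf("D = %.4f\n", ks_statistic(r, 5));
    return 0;
}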
Runs test – runs up and runs down

A direct test of the independence assumption (only).

The chi-square test might show that the numbers are uniformly distributed, but for independence the ordering matters.

Look at the example on page 270 of Discrete Event System Simulation (Banks, Carson, Nelson, and Nicol).

The general idea is to look at the patterns of runs up and runs down.

E.g.
0.08 0.18 0.23 0.36 0.42 0.55 0.63 0.72 0.89 0.91
   One run – a run up
0.08 0.93 0.15 0.96 0.26 0.84 0.28 0.79 0.36 0.57
   Nine runs – five up, four down
Runs test – runs up and runs down

The run pattern is likely to be somewhere between the two extremes.

If N is the number of numbers in the sequence, the maximum number of runs is N − 1 and the minimum number of runs is one.

Now, if a is the total number of runs in a truly random sequence, the mean and variance of a are given by

µ_a = (2N − 1) / 3

σ_a² = (16N − 29) / 90

For N > 20, the distribution of a is approximated by the normal distribution N(µ_a, σ_a²).
Runs test – runs up and runs down

Therefore, the standardized normal test statistic is

Z_0 = (a − µ_a) / σ_a

Z_0 = ( a − [(2N − 1)/3] ) / sqrt( (16N − 29)/90 )

where Z_0 is approximately N(0, 1).

Therefore, the hypothesis cannot be rejected if

−Z_{α/2} ≤ Z_0 ≤ Z_{α/2}

(Figure: standard normal curve with acceptance region between −Z_{α/2} and Z_{α/2} and tail areas of α/2 on each side.)

Class demonstration: Example 7.8, page 272
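A C sketch of this statistic; the helper name is hypothetical, the data are the ten numbers from the second example above, and the normal approximation is really only meant for N > 20.

#include <stdio.h>
#include <math.h>

/* Runs up and down: count runs a, then Z0 = (a - mu_a) / sigma_a. */
double runs_updown_z0(const double *r, int n)
{
    int a = 1;                                  /* at least one run */
    for (int i = 1; i < n - 1; i++) {
        int up_prev = r[i] > r[i - 1];          /* direction of previous step */
        int up_next = r[i + 1] > r[i];          /* direction of next step */
        if (up_prev != up_next)
            a++;                                /* direction changed: new run */
    }
    double mu  = (2.0 * n - 1.0) / 3.0;
    double var = (16.0 * n - 29.0) / 90.0;
    return (a - mu) / sqrt(var);
}

int main(void)
{
    /* The example sequence with nine runs (illustration only; N = 10 <= 20). */
    double r[] = {0.08, 0.93, 0.15, 0.96, 0.26, 0.84, 0.28, 0.79, 0.36, 0.57};
    printf("Z0 = %.3f\n", runs_updown_z0(r, 10));
    return 0;
}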
Runs test – runs above and below the mean

Counts the runs of observations above and below the mean; use the standard normal distribution to test.
Runs test – runs above and below the mean

If n1 and n2 are the numbers of individual observations above and below the mean, respectively, and b is the total number of runs, then

mean:      µ_b = 2·n1·n2 / N + 1/2

variance:  σ_b² = 2·n1·n2·(2·n1·n2 − N) / ( N²(N − 1) )

If n1 or n2 is greater than 20, b is approximately normally distributed.
Runs test – runs above and below the mean

Let's define the standardized variable Z_0 for the test:

Z_0 = ( b − 2·n1·n2/N − 1/2 ) / [ 2·n1·n2·(2·n1·n2 − N) / ( N²(N − 1) ) ]^{1/2}

Failure to reject the hypothesis of independence occurs when −Z_{α/2} ≤ Z_0 ≤ Z_{α/2}.

Class demonstration: Example 7.9, page 274
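Along the same lines, a hedged C sketch of this statistic; the sample values are made up, the theoretical mean 0.5 is assumed for U(0, 1) data, and the normal approximation really wants n1 or n2 > 20.

#include <stdio.h>
#include <math.h>

/* Runs above and below the mean: n1 above, n2 below, b = number of runs. */
double runs_mean_z0(const double *r, int n, double mean)
{
    int n1 = 0, n2 = 0, b = 0, prev = 0;
    for (int i = 0; i < n; i++) {
        int above = r[i] >= mean;
        if (above) n1++; else n2++;
        if (i == 0 || above != prev)
            b++;                         /* a new run starts here */
        prev = above;
    }
    double N   = (double)n;
    double mu  = 2.0 * n1 * n2 / N + 0.5;
    double var = 2.0 * n1 * n2 * (2.0 * n1 * n2 - N) / (N * N * (N - 1.0));
    return (b - mu) / sqrt(var);
}

int main(void)
{
    /* Made-up example data (illustration only). */
    double r[] = {0.41, 0.68, 0.89, 0.94, 0.74, 0.91, 0.55, 0.62, 0.36, 0.27};
    printf("Z0 = %.3f\n", runs_mean_z0(r, 10, 0.5));
    return 0;
}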
Runs test – length of runs
Given a sequence of random numbers, we can expect a
certain number of runs of each length 1, 2, 3, …
If the run lengths in our empirical tests differ too much from
the expected distribution, we can reject the generator
The idea is to determine the empirical distribution of runs of various lengths and then compare it with the distribution expected for random data.
The distributions are compared using the chi-square test.
Runs test – length of runs
Derivations – on blackboard
Autocorrelation test

Tests the correlation between numbers m positions apart in the sequence (starting from the i-th number) against the zero correlation expected under independence.

H_0: ρ_im = 0
H_1: ρ_im ≠ 0
Autocorrelation test

The estimator ρ̂_im has standard deviation

σ_ρ̂_im = sqrt(13M + 7) / ( 12(M + 1) )

where M + 1 is the number of lagged pairs used in the estimate.
Autocorrelation test

The test statistic Z_0 = ρ̂_im / σ_ρ̂_im is approximately N(0, 1) under H_0.

If −Z_{α/2} ≤ Z_0 ≤ Z_{α/2}, H_0 is not rejected, which implies zero autocorrelation.
Gap test

Examines the gaps between repetitions of a particular digit, e.g. the gaps between successive 3's in

4, 1, 3, 5, 1, 7, 2, 8, 2, 0, 7, 9, 1, 3, 5, 2, 7, 9,
4, 1, 6, 3, 3, 9, 6, 3, 4, 8, 2, 3, 1, . .

P(gap of 10) = P(no 3) · . . . · P(no 3) · P(3) = (0.9)^10 (0.1)
Gap test
Steps
1. Specify the cdf for the theoretical frequency
distribution given by P(gap<=x) based on the
selected class interval width
2. Arrange the observed sample of gaps in a cumulative distribution with these same classes
Gap test
Steps
3. Find D, the maximum deviation between F(x) and S_N(x), as in
   D = max | F(x) − S_N(x) |
4. Determine the critical value, D_α, from the Kolmogorov-Smirnov critical values table for the specified value of α and the sample size N
5. If the calculated value of D is greater than the tabulated value of D_α, the null hypothesis of independence is rejected
Poker test

A test for independence based on the frequency with which certain digits are repeated in a series of numbers.

e.g.
0.255, 0.577, 0.331, 0.414, 0.828, 0.909, 0.303, 0.001, . .

Looking at three digits at a time, there are three possibilities:
1. The individual digits can all be different
2. The individual digits can all be the same
3. There can be one pair of like digits

The probability associated with each of these possibilities is given as:

P(three different digits) = P(second different from the first) × P(third different from the first and second)
                          = (0.9)(0.8)
                          = 0.72
Poker test

P(three like digits) = P(second digit same as the first) × P(third digit same as the first)
                     = (0.1)(0.1) = 0.01

P(exactly one pair) = 1 − 0.72 − 0.01 = 0.27

or, equivalently,

P(exactly one pair) = (3C2)(0.1)(0.9) = 3(0.1)(0.9) = 0.27

Class demonstration: Example 7.14, page 283
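A small C sketch of the classification step (names hypothetical; it assumes three-decimal-digit inputs like those above). It buckets each number as all-different, all-same, or one-pair; the resulting counts could then be compared with 0.72N, 0.01N and 0.27N via the chi-square test.

#include <stdio.h>

enum { ALL_DIFFERENT, ALL_SAME, ONE_PAIR };

/* Classify the three leading decimal digits of u (0 <= u < 1).
   Rounding assumes exactly three decimal digits, as in the example. */
int poker_class(double u)
{
    int n  = (int)(u * 1000.0 + 0.5);     /* e.g. 0.577 -> 577 */
    int d1 = n / 100, d2 = (n / 10) % 10, d3 = n % 10;
    if (d1 == d2 && d2 == d3) return ALL_SAME;
    if (d1 != d2 && d1 != d3 && d2 != d3) return ALL_DIFFERENT;
    return ONE_PAIR;
}

int main(void)
{
    double sample[] = {0.255, 0.577, 0.331, 0.414, 0.828, 0.909, 0.303, 0.001};
    int count[3] = {0, 0, 0};
    for (int i = 0; i < 8; i++)
        count[poker_class(sample[i])]++;
    printf("different=%d same=%d pair=%d\n",
           count[ALL_DIFFERENT], count[ALL_SAME], count[ONE_PAIR]);
    return 0;
}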
Code snippet
/* Returns values between 0 and 1 */
/* Depends on 32-bit (wraparound) representation for long integers */
double randu( long int *seed )
{
    long int a = 16807,
             mod = 2147483647;           /* 2^31 - 1 */
    double dmod = 2147483647.0;          /* 2^31 - 1 */

    *seed = *seed * a;                   /* relies on 32-bit overflow wraparound */
    if ( *seed < 0 )
        *seed += mod;

    return( (double) *seed / dmod );
}
Discrete distributions

In real life, one kind of event/data often has a different probability of occurring than another kind.

Normally, there is a need for a discrete but non-uniform distribution.

Therefore, a uniform random number U is generated and compared with the cumulative values y. If the value falls in an interval y_i < U ≤ y_{i+1} (i = 0, 1, . . . , 4), the corresponding value x_{i+1} is taken as the desired output.

No. of items (x)   Probability p(x)   Cumulative prob. (y)
       0                0                   0
       1                0.10                0.10
       2                0.51                0.61
       3                0.19                0.80
       4                0.15                0.95
       5                0.05                1.00
Discrete distributions

When a number U is tested against the table, only 10% of the numbers are resolved by the first entry, and only 61% have been resolved once the second entry has been checked.

For such cases the table can be rearranged, since the ordering doesn't matter for the test:

Cumulative prob. (y)   No. of items (x)
       0.51                  2
       0.70                  3
       0.85                  4
       0.95                  1
       1.00                  5

Now the first entry alone resolves 51% of the numbers.
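A minimal C sketch of the table-lookup generation for the original (unordered) table above; rand()/srand() stand in for the uniform generator, and the function name is illustrative.

#include <stdio.h>
#include <stdlib.h>

/* Table lookup for the discrete distribution on the slide:
   cumulative probabilities y_i for x = 0 .. 5. */
static const double cum[] = {0.00, 0.10, 0.61, 0.80, 0.95, 1.00};

int discrete_sample(double u)             /* u is a U(0,1) variate */
{
    for (int x = 1; x <= 5; x++)
        if (u <= cum[x])                  /* first x with y_{x-1} < u <= y_x */
            return x;
    return 5;                             /* u == 1.0 edge case */
}

int main(void)
{
    srand(12345);
    for (int i = 0; i < 5; i++) {
        double u = (double)rand() / ((double)RAND_MAX + 1.0);
        printf("U = %.4f  ->  x = %d\n", u, discrete_sample(u));
    }
    return 0;
}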
Non-uniform continuously distributed RNs

To generate non-uniform continuously distributed random numbers, several methods are used:
Inverse transformation (inversion), Rejection, Composition, Convolution etc.
Non-uniform continuously distributed RNs

If the desired pdf is

f(x) = a(1 − x)   (0 ≤ x ≤ 1)
f(x) = 0          otherwise

then the cumulative distribution is

F(x) = a( x − x²/2 )
Non-uniform continuously distributed RNs

Let U = F(x). Solving for x (with a = 2, so that f integrates to 1) gives

x = 1 − (1 − U)^{1/2}

   U        X
 0.1009    69.23
 0.3754    88.48
 0.0842    66.65
 0.9901   142.33
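A minimal C sketch of this analytic inversion for f(x) = 2(1 − x); the driver uses the C library rand() purely for illustration.

#include <stdio.h>
#include <math.h>
#include <stdlib.h>

/* Inverse transform for f(x) = 2(1 - x), 0 <= x <= 1:
   F(x) = 2x - x^2, so U = F(x) gives x = 1 - sqrt(1 - U). */
double sample_triangular_left(double u)
{
    return 1.0 - sqrt(1.0 - u);
}

int main(void)
{
    srand(1);
    for (int i = 0; i < 5; i++) {
        double u = (double)rand() / ((double)RAND_MAX + 1.0);
        printf("U = %.4f  ->  x = %.4f\n", u, sample_triangular_left(u));
    }
    return 0;
}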
Non-uniform continuously distributed RNs

For the exponential distribution, R is treated as the cumulative probability, and inverting F gives

x = −(1/λ) ln(R)
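A minimal C sketch of this exponential inverse transform; lambda and the seed are example values, and the uniform variate is kept strictly inside (0, 1) so that log() is defined.

#include <stdio.h>
#include <math.h>
#include <stdlib.h>

/* Exponential variate by inverse transform: x = -(1/lambda) * ln(R). */
double expon(double lambda)
{
    double r = ((double)rand() + 1.0) / ((double)RAND_MAX + 2.0);  /* r in (0,1) */
    return -log(r) / lambda;
}

int main(void)
{
    srand(7);
    double lambda = 2.0;                 /* example rate */
    for (int i = 0; i < 5; i++)
        printf("%.4f\n", expon(lambda));
    return 0;
}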
Non-uniform continuously distributed RNs
(Figure: graphical view of the inversion, with the uniform input U shown on an axis running from 0 to 1.0.)
Non-uniform continuously distributed RNs
(Figure: piecewise-linear cdf with labels "input", "output" and an "interpolated interval"; the input U is mapped to the output x by interpolating within the interval that contains it.)
Non-uniform continuously distributed RNs

The standard curve and its points do not change.

Let's define the slope of segment i as

a_i = (x_{i+1} − x_i) / (y_{i+1} − y_i)   (i = 0, 1, 2, 3, . . . , N−1)

Now, if y_i < U < y_{i+1}   (i = 0, 1, 2, 3, . . . , N−1)

then

x = x_i + a_i (U − y_i)

   U        X
 0.1009    69.29
 0.3754    68.48
 0.6842    66.65
   . .       . .
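A C sketch of this interpolation scheme; the breakpoint arrays below are made-up example values, not the data behind the slide's table.

#include <stdio.h>

/* Piecewise-linear inversion of an empirical cdf:
   breakpoints (y_i, x_i) with y_0 = 0 and y_{N-1} = 1;
   find the interval containing U and interpolate. */
double empirical_inverse(double u, const double *y, const double *x, int n)
{
    for (int i = 0; i < n - 1; i++) {
        if (u <= y[i + 1]) {
            double slope = (x[i + 1] - x[i]) / (y[i + 1] - y[i]);   /* a_i */
            return x[i] + slope * (u - y[i]);
        }
    }
    return x[n - 1];
}

int main(void)
{
    /* Illustrative breakpoints only. */
    double y[] = {0.0, 0.25, 0.60, 1.00};
    double x[] = {60.0, 70.0, 90.0, 150.0};
    double u[] = {0.1009, 0.3754, 0.6842};
    for (int i = 0; i < 3; i++)
        printf("U = %.4f  ->  x = %.2f\n", u[i], empirical_inverse(u[i], y, x, 4));
    return 0;
}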
Non-uniform continuously distributed RNs

Disadvantages of the inverse-transform method
1. For the normal and gamma distributions, there is no closed-form mathematical expression for F⁻¹
2. For a given distribution, it may not be the fastest way

Advantages
1. It can generate output even if the distribution is truncated
2. It facilitates variance-reduction techniques that rely on correlation
Non-uniform continuously distributed RNs

Composition

Applies when the distribution function F from which we wish to generate can be expressed as a convex combination of other distributions F1, F2, . . .

For all x,

F(x) = Σ_{j=1}^{∞} p_j F_j(x)

where p_j ≥ 0, Σ_{j=1}^{∞} p_j = 1, and each F_j is a distribution function.
Non-uniform continuously distributed RNs

If p_k > 0 and p_j = 0 for j > k, then the combination is finite.

The pdf is

f(x) = Σ_{j=1}^{∞} p_j f_j(x)

Algorithm
1. Generate a positive random integer J such that P(J = j) = p_j for j = 1, 2, 3, 4, . . .
2. Return X with distribution function F_J
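A C sketch of this two-step algorithm for a small illustrative case: a mixture of two exponentials with weights 0.4 and 0.6 (all of these constants are assumptions, not from the slides).

#include <stdio.h>
#include <math.h>
#include <stdlib.h>

static double uniform01(void)
{
    return ((double)rand() + 1.0) / ((double)RAND_MAX + 2.0);
}

static double expon(double lambda)
{
    return -log(uniform01()) / lambda;     /* inverse transform */
}

/* Composition: pick component J with probability p_j, then sample from F_J.
   Example mixture: 0.4 * Exp(1.0) + 0.6 * Exp(5.0). */
double mixture_sample(void)
{
    double u = uniform01();                /* step 1: choose the component */
    if (u <= 0.4)
        return expon(1.0);                 /* step 2: sample from F_1 */
    else
        return expon(5.0);                 /* ... or from F_2 */
}

int main(void)
{
    srand(3);
    for (int i = 0; i < 5; i++)
        printf("%.4f\n", mixture_sample());
    return 0;
}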
Non-uniform continuously distributed RNs
Convolution
The probability distribution of a sum of two or more independent random variables is the convolution of the distributions of the original variables.

Algorithm
Let F be the distribution function of X and G be the distribution function of each Y_j.
1. Generate Y1, Y2, Y3, . . . , Ym, independent and identically distributed random variables, each with distribution function G
2. Return X = Y1 + Y2 + Y3 + . . . + Ym
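A C sketch of the convolution algorithm, using the classic illustrative case where each Y_j is exponential, so the sum X is an Erlang variate (the parameters are example values).

#include <stdio.h>
#include <math.h>
#include <stdlib.h>

static double uniform01(void)
{
    return ((double)rand() + 1.0) / ((double)RAND_MAX + 2.0);
}

/* Convolution: X = Y1 + Y2 + ... + Ym, each Yj ~ Exp(lambda) i.i.d.
   The sum is an Erlang(m, lambda) variate. */
double erlang(int m, double lambda)
{
    double x = 0.0;
    for (int j = 0; j < m; j++)
        x += -log(uniform01()) / lambda;    /* one exponential Yj */
    return x;
}

int main(void)
{
    srand(11);
    for (int i = 0; i < 5; i++)
        printf("%.4f\n", erlang(3, 2.0));   /* m = 3, lambda = 2 (example) */
    return 0;
}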
Non-uniform continuously distributed RNs

Rejection method

Applied when the pdf f(x) has a lower and an upper limit to its range, a and b, and an upper bound c (with f(x) ≤ c).
However, the exact value of the upper bound c is not very critical.

Steps
1. Compute the values of two independent uniformly distributed variates U1 and U2
2. Compute X0 = a + U1(b − a)
3. Compute Y0 = c·U2
4. If Y0 ≤ f(X0), accept X0; otherwise repeat with two new uniform variates
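A C sketch of these four steps for the earlier density f(x) = 2(1 − x) on [0, 1], where c = f(0) = 2 works as the bound; the helper names are illustrative.

#include <stdio.h>
#include <stdlib.h>

static double uniform01(void)
{
    return (double)rand() / ((double)RAND_MAX + 1.0);
}

static double f(double x)                 /* target pdf: f(x) = 2(1 - x) on [0, 1] */
{
    return 2.0 * (1.0 - x);
}

/* Rejection method on [a, b] with upper bound c >= max f(x). */
double rejection_sample(double a, double b, double c)
{
    for (;;) {
        double x0 = a + uniform01() * (b - a);   /* step 2 */
        double y0 = c * uniform01();             /* step 3 */
        if (y0 <= f(x0))                         /* step 4: accept or retry */
            return x0;
    }
}

int main(void)
{
    srand(5);
    for (int i = 0; i < 5; i++)
        printf("%.4f\n", rejection_sample(0.0, 1.0, 2.0));
    return 0;
}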
Non-uniform continuously distributed RNs
Therefore, is this a valid method?
Non-uniform continuously distributed RNs
Given X0, X0 is accepted if

Y0 ≤ f(X0)

(Figure: the pdf f(x) plotted over [a, b], showing the point X0 on the x axis, the height f(X0), and a narrow strip of width ∆x at X0.)
Non-uniform continuously distributed RNs

Let ∆x be a small interval. If Y0 lies under the curve within the small strip of width ∆x, the output falls in the range X0 − ∆x to X0.

In such a case,

P(X ≤ X0) = P(Y0 falls under the curve to the left of X0)

P(accept and X ≤ X0) = [ ∫_a^{X0} f(x) dx / ( c(X0 − a) ) ] · [ (X0 − a)/(b − a) ]

Dividing by the overall acceptance probability 1/(c(b − a)), therefore

F(X0) = ∫_a^{X0} f(x) dx

so the accepted values do follow the desired distribution.
Non-uniform continuously distributed RNs

Disadvantages
Two uniform variates must be computed for each trial, and the cost grows further when many trials (rejections) are needed.

Therefore, it is usable only if:
- The probability density function is given by a mathematical function
- The pdf is limited, i.e. f(x) = 0 for x < a and for x > b. If not, then use the inverse-transform method.
Sample function
/* C implementation of Park & Miller's random function */
double random( long int *seed )
{
    long int a = 16807,          /* 7^5 */
             m = 2147483647,     /* 2^31 - 1 */
             q = 127773,         /* m / a (int divide) */
             r = 2836,           /* m % a */
             lo, hi, test;
    double dm = 2147483647.0;

    hi = *seed / q;
    lo = *seed % q;
    test = a * lo - r * hi;      /* Schrage's trick: (a * seed) mod m without overflow */
    *seed = test > 0 ? test : test + m;

    return( (double) *seed / dm );
}
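A self-contained usage sketch (not from the original slides): the same Schrage update inlined in a loop, so it can be compiled on its own; the seed value is arbitrary.

#include <stdio.h>

/* Usage sketch of the Park & Miller generator above: one seed, repeated updates. */
int main(void)
{
    long int seed = 12345;                        /* any seed in 1 .. m-1 */
    const long int a = 16807, m = 2147483647, q = 127773, r = 2836;

    for (int i = 0; i < 5; i++) {
        long int hi = seed / q, lo = seed % q;
        long int test = a * lo - r * hi;          /* never overflows 32 bits */
        seed = test > 0 ? test : test + m;
        printf("%f\n", (double)seed / (double)m);
    }
    return 0;
}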