Acta Numerica (1998), pp. 1-49 © Cambridge University Press, 1998

Monte Carlo and quasi-Monte Carlo methods

Russel E. Caflisch*
Mathematics Department, UCLA, Los Angeles, CA 90095-1555, USA
E-mail: [email protected]

Monte Carlo is one of the most versatile and widely used numerical methods. Its convergence rate, $O(N^{-1/2})$, is independent of dimension, which shows Monte Carlo to be very robust but also slow. This article presents an introduction to Monte Carlo methods for integration problems, including convergence theory, sampling methods and variance reduction techniques. Accelerated convergence for Monte Carlo quadrature is attained using quasi-random (also called low-discrepancy) sequences, which are a deterministic alternative to random or pseudo-random sequences. The points in a quasi-random sequence are correlated to provide greater uniformity. The resulting quadrature method, called quasi-Monte Carlo, has a convergence rate of approximately $O((\log N)^k N^{-1})$. For quasi-Monte Carlo, both theoretical error estimates and practical limitations are presented. Although the emphasis in this article is on integration, Monte Carlo simulation of rarefied gas dynamics is also discussed. In the limit of small mean free path (that is, the fluid dynamic limit), Monte Carlo loses its effectiveness because the collisional distance is much less than the fluid dynamic length scale. Computational examples are presented throughout the text to illustrate the theory. A number of open problems are described.

* Research supported in part by the Army Research Office under grant number DAAH04-95-1-0155.

CONTENTS
1 Introduction
2 Monte Carlo integration
3 Generation and sampling methods
4 Variance reduction
5 Quasi-random numbers
6 Quasi-Monte Carlo techniques
7 Monte Carlo methods for rarefied gas dynamics
References

1. Introduction

Monte Carlo provides a direct method for performing simulation and integration. Because it is simple and direct, Monte Carlo is easy to use. It is also robust, since its accuracy depends on only the crudest measure of the complexity of the problem. For example, Monte Carlo integration converges at a rate $O(N^{-1/2})$ that is independent of the dimension of the integral. For this reason, Monte Carlo is the only viable method for a wide range of high-dimensional problems, ranging from atomic physics to finance.

The price for its robustness is that Monte Carlo can be extremely slow. The order $O(N^{-1/2})$ convergence rate is decelerating, since an additional factor of 4 increase in computational effort only provides an additional factor of 2 improvement in accuracy.

The result of this combination of ease of use, wide range of applicability and slow convergence is that an enormous amount of computer time is spent on Monte Carlo computations. This represents a great opportunity for researchers in computational science. Even modest improvements in the Monte Carlo method can have substantial impact on the efficiency and range of applicability for Monte Carlo methods. Indeed, much of the effort in the development of Monte Carlo has been in construction of variance reduction methods which speed up the computation. A description of some of the most common variance reduction methods is given in Section 4. Variance reduction methods accelerate the convergence rate by reducing the constant in front of the $O(N^{-1/2})$ for Monte Carlo methods using random or pseudo-random sequences. An alternative approach to acceleration is to change the choice of sequence.
Quasi-Monte Carlo methods use quasi-random (also known as low-discrepancy) sequences instead of random or pseudo-random ones. Unlike pseudo-random sequences, quasi-random sequences do not attempt to imitate the behaviour of random sequences. Instead, the elements of a quasi-random sequence are correlated to make them more uniform than random sequences. For this reason, Monte Carlo integration using quasi-random points converges more rapidly, at a rate $O(N^{-1}(\log N)^k)$, for some constant $k$. Quasi-random sequences are described in Sections 5 and 6.

In spite of their importance in applications, Monte Carlo methods receive relatively little attention from numerical analysts and applied mathematicians. Instead, it is number theorists and statisticians who design the pseudo-random, quasi-random and other types of sequence that are used in Monte Carlo, while the innovations in Monte Carlo techniques are developed mainly by practitioners, including physicists, systems engineers and statisticians.

The reasons for the near neglect of Monte Carlo in numerical analysis and applied mathematics are related to its robustness. First, Monte Carlo methods require less sophisticated mathematics than other numerical methods. Finite difference and finite element methods, for example, require careful mathematical analysis because of possible stability problems, but stability is not an issue for Monte Carlo. Instead, Monte Carlo nearly always gives an answer that is qualitatively correct, but acceleration (error reduction) is always needed. Second, Monte Carlo methods are often phrased in nonmathematical terms. In rarefied gas dynamics, for example, Monte Carlo allows for direct simulation of the dynamics of the gas of particles, as described in Section 7. Finally, it is often difficult to obtain definitive results on Monte Carlo, because of the random noise. Thus computational improvements often come more from experience than from a particular insightful calculation.

This article is intended to provide an introduction to Monte Carlo methods for numerical analysts and applied mathematicians. In spite of the reasons cited above, there are ample opportunities for this community to make significant contributions to Monte Carlo. First of all, any improvements can have a big impact, because of the prevalence of Monte Carlo computations. Second, the methodology of numerical analysis and applied mathematics, including well controlled computational experiments on canonical problems, is needed for Monte Carlo. Finally, there are some outstanding problems on which a numerical analysis or applied mathematics viewpoint is clearly needed; for example:

• design of Monte Carlo simulation for transport problems in the diffusion limit (Section 7)
• formulation of a quasi-Monte Carlo method for the Metropolis algorithm (Section 6)
• explanation of why quasi-Monte Carlo behaves like standard Monte Carlo when the dimension is large and the number of simulations is of moderate size (Section 6).

Some older, but still very good, general references on Monte Carlo are Kalos and Whitlock (1986) and Hammersley and Handscomb (1965).

The focus of this article is on Monte Carlo for integration problems. Integration problems are simply stated, but they can be extremely challenging. In addition, integration problems contain most of the difficulties that are found in more general Monte Carlo computations, such as simulation and optimization.
The next section formulates the Monte Carlo method for integration and describes its convergence. Section 3 describes random number generators and sampling methods. Variance reduction methods are discussed in Section 4 and quasi-Monte Carlo methods in Section 5. Effective use of quasi-Monte Carlo requires some modification of standard Monte Carlo techniques, as described in Section 6. Monte Carlo methods for rarefied gas dynamics are described in Section 7, with emphasis on the loss of effectiveness for Monte Carlo in the fluid dynamic limit.

2. Monte Carlo integration

The integral of a Lebesgue integrable function $f(x)$ can be expressed as the average or expectation of the function $f$ evaluated at a random location. Consider an integral on the one-dimensional unit interval

$$I[f] = \int_0^1 f(x)\,dx = \bar f. \qquad (2.1)$$

Let $x$ be a random variable that is uniformly distributed on the unit interval. Then

$$I[f] = E[f(x)]. \qquad (2.2)$$

For an integral on the unit cube $I^d = [0,1]^d$ in $d$ dimensions,

$$I[f] = \int_{I^d} f(x)\,dx = E[f(x)], \qquad (2.3)$$

in which $x$ is a uniformly distributed vector in the unit cube.

The Monte Carlo quadrature formula is based on the probabilistic interpretation of an integral. Consider a sequence $\{x_n\}$ sampled from the uniform distribution. Then an empirical approximation to the expectation is

$$I_N[f] = \frac{1}{N}\sum_{n=1}^{N} f(x_n). \qquad (2.4)$$

According to the Strong Law of Large Numbers (Feller 1971), this approximation is convergent with probability one; that is,

$$\lim_{N\to\infty} I_N[f] = I[f]. \qquad (2.5)$$

In addition, it is unbiased, which means that the average of $I_N[f]$ is exactly $I[f]$ for any $N$; that is,

$$E[I_N[f]] = I[f], \qquad (2.6)$$

in which the average is over the choice of the points $\{x_n\}$. In general, define the Monte Carlo integration error

$$e_N[f] = I[f] - I_N[f], \qquad (2.7)$$

so that the bias is $E[e_N[f]]$ and the root mean square error (RMSE) is

$$E[e_N[f]^2]^{1/2}. \qquad (2.8)$$

2.1. Accuracy of Monte Carlo

The Central Limit Theorem (CLT) (Feller 1971) describes the size and statistical properties of Monte Carlo integration errors.

Theorem 2.1 For $N$ large,

$$e_N[f] \approx \sigma N^{-1/2}\nu, \qquad (2.9)$$

in which $\nu$ is a standard normal ($N(0,1)$) random variable and the constant $\sigma = \sigma[f]$ is the square root of the variance of $f$; that is,

$$\sigma[f] = \left(\int_{I^d} (f(x) - I[f])^2\,dx\right)^{1/2}. \qquad (2.10)$$

A more precise statement is that

$$\lim_{N\to\infty} \mathrm{Prob}\left(a < \frac{e_N}{\sigma N^{-1/2}} < b\right) = \mathrm{Prob}(a < \nu < b) = \int_a^b (2\pi)^{-1/2} e^{-x^2/2}\,dx. \qquad (2.11)$$

This says that the error in Monte Carlo integration is of size $O(N^{-1/2})$ with a constant that is just $\sigma[f]$, the square root of the variance of the integrand $f$. Moreover, the statistical distribution of the error is approximately a normal random variable. In contrast to the usual results of numerical analysis, this is a probabilistic result. It does not provide an absolute upper bound on the error; rather it says that the error is of a certain size with some probability. On the other hand, this result is an equality, so that the bounds it provides are tight. The use of this result will be discussed at the end of this section.

Now we present a partial proof of the Central Limit Theorem, which proves that the error size is $O(N^{-1/2})$. Derivation of the Gaussian distribution for the error is more difficult (Feller 1971). First define $\xi_i = \sigma^{-1}(f(x_i) - \bar f)$ for $x_i$ uniformly distributed. Then

$$E[\xi_i] = 0, \qquad E[\xi_i \xi_j] = \delta_{ij}, \quad\text{since } E[\xi_i^2] = \int \sigma^{-2}(f(x) - \bar f)^2\,dx = 1. \qquad (2.12)$$

The last equality is due to the independence of the $x_i$. Now consider the sum

$$e_N[f] = -\sigma N^{-1} \sum_{i=1}^N \xi_i. \qquad (2.13)$$

Its variance is

$$E[e_N^2] = \sigma^2 N^{-2}\, E\left[\left(\sum_{i=1}^N \xi_i\right)^2\right] = \sigma^2 N^{-2} \sum_{i=1}^N \sum_{j=1}^N E[\xi_i\xi_j] = \sigma^2 N^{-1}. \qquad (2.14)$$

Therefore

$$E[e_N^2]^{1/2} = \sigma N^{-1/2}, \qquad (2.15)$$

which shows that the RMSE is of size $O(\sigma N^{-1/2})$.
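To make the preceding formulas concrete, here is a minimal sketch (in Python, assuming NumPy is available) of the quadrature rule (2.4) together with the CLT scale $\sigma N^{-1/2}$ from (2.15); the integrand and all names are illustrative choices, not from the original text.

```python
import numpy as np

rng = np.random.default_rng(0)

def mc_integrate(f, d, N):
    """Monte Carlo estimate I_N[f] = (1/N) sum f(x_n), x_n uniform on [0,1]^d."""
    x = rng.random((N, d))
    return f(x).mean()

# Illustrative integrand: f(x) = prod_i (x_i + 1/2); its exact integral is 1.
f = lambda x: np.prod(x + 0.5, axis=1)

d, N = 5, 100_000
I_N = mc_integrate(f, d, N)

# CLT: the error e_N is roughly sigma * N^{-1/2}, with sigma^2 = Var f.
sigma = np.sqrt(np.var(f(rng.random((N, d)))))
print(f"estimate {I_N:.5f}, error {abs(1.0 - I_N):.2e}, "
      f"CLT scale {sigma / np.sqrt(N):.2e}")
```

Running this shows the observed error fluctuating around the predicted $\sigma N^{-1/2}$ scale, independent of the dimension $d$.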
The converse of the Central Limit Theorem is useful for determining the size of $N$ required for a particular computation. Since the error bound from the CLT is probabilistic, the precision of the Monte Carlo integration method can only be ensured within some confidence level. To ensure an error of size at most $\varepsilon$ with confidence level $c$ requires the number of sample points $N$ to be

$$N = \varepsilon^{-2} \sigma^2 s(c)^2, \qquad (2.16)$$

in which $s$ is the confidence function for a normal variable; that is,

$$c = \mathrm{erf}\bigl(s(c)/\sqrt{2}\bigr). \qquad (2.17)$$

For example, 95 per cent confidence in the error size requires that $s = 2$, approximately.

In an application, the exact value of the variance is unknown (it is as difficult to compute as the integral itself), so the formula (2.16) cannot be directly used. There is an easy way around this, which is to determine the empirical error and variance (Hogg and Craig 1995). Perform $M$ computations using independent points $x_i$ for $1 \le i \le MN$. For each $j$ obtain values $I_N^{(j)}$ for $1 \le j \le M$. The empirical RMSE is then $\hat e_N$, given by

$$\hat e_N = \left(M^{-1} \sum_{j=1}^M \bigl(I_N^{(j)} - \bar I_N\bigr)^2\right)^{1/2}, \qquad (2.18)$$

in which

$$\bar I_N = M^{-1} \sum_{j=1}^M I_N^{(j)}. \qquad (2.19)$$

The empirical variance is $\hat\sigma$, given by

$$\hat\sigma = N^{1/2}\, \hat e_N. \qquad (2.20)$$

This value can be used for $\sigma$ in (2.16) to determine the value of $N$ for a given precision level $\varepsilon$ and a given confidence level $c$.

2.2. Comparison to grid-based methods

Most people who see Monte Carlo for the first time are surprised that it is a viable method. How can a random array be better than a grid? There are several ways to answer this question.

First, compare the convergence rate of Monte Carlo with that of a grid-based integration method such as Simpson's rule. The convergence rate for grid-based quadrature is $O(N^{-k/d})$ for an order $k$ method in dimension $d$, since the grid with $N$ points in the unit cube has spacing $N^{-1/d}$. On the other hand, the Monte Carlo convergence rate is $O(N^{-1/2})$ independent of dimension. So Monte Carlo beats a grid in high dimension $d$, if

$$k/d < 1/2. \qquad (2.21)$$

On the other hand, for an analytic function on a periodic domain, the value of $k$ is infinite, so that this simple explanation fails.

A more realistic explanation for the robustness of Monte Carlo is that it is practically impossible to lay down a grid in high dimension. The simplest cubic grid in $d$ dimensions requires at least $2^d$ points. For $d = 20$, which is not particularly large, this requires more than a million points. Moreover, it is practically impossible to refine a grid in a high dimension, since a refinement requires increasing the number of points by a factor $2^d$. In contrast to these difficulties for a grid in high dimension, the accuracy of Monte Carlo quadrature is nearly independent of dimension and each additional point added to the Monte Carlo quadrature formula provides an incremental improvement in its accuracy. To be sure, the value of $N$ at which the $O(N^{-1/2})$ error estimate becomes valid (that is, the length of the transient) is difficult to predict, but experience shows that, for problems of moderate complexity in moderate dimension (for instance $d = 20$), the $O(N^{-1/2})$ error size is typically attained for moderate values of $N$.

Two additional interpretations of Monte Carlo quadrature are worthwhile. Consider the Fourier representation of a periodic function with period one

$$f(x) = \sum_{k=-\infty}^{\infty} \hat f(k)\, e^{2\pi i k x}. \qquad (2.22)$$

The integral $I[f]$ is just $\hat f(0)$; that is, the contributions to the integral are 0 from all wave-numbers $k \ne 0$. For a grid of spacing $1/n$, the grid-based quadrature formula is

$$I_n[f] = n^{-1} \sum_{j=0}^{n-1} f(j/n). \qquad (2.23)$$
The contributions to this sum are 0 (as they should be) from wave-numbers $k \ne mn$ for every integer $m$, and the contributions are $\hat f(k)$ for wave-numbers $k = mn$. That is, the accuracy of grid-based quadrature is 100 per cent for $k \ne mn$, but 0 per cent for $k = mn$ (with $m \ne 0$). Monte Carlo quadrature using a random array is partially accurate for all $k$, which is superior to a grid-based method if the Fourier coefficients decay slowly.

Finally, insight into the relative performance of grid-based and Monte Carlo methods is gained by considering the points themselves in a high dimension. For a regular grid, the change from one point to the next is only a variation of one component at a time, that is, $(0,0,\dots,0,0) \mapsto (0,0,\dots,0,1/n)$. In many problems, this is an inefficient use of the unvaried components. In a random array, all components are varied in each point, so that the state space is sampled more fully. This accords well with the global nature of Monte Carlo: each point of a Monte Carlo integration formula is an estimate of the integral over the entire domain.

3. Generation and sampling methods

3.1. Random number generators

The numbers generated by computers are not random, but pseudo-random, which means that they are made to have many of the properties of random number sequences (Niederreiter 1992). While this is a well-developed subject, occasional problems still occur, mostly with very long sequences ($N > 10^9$). The methods used to generate pseudo-random numbers are mostly linear congruential methods. There is a series of reliable pseudo-random number generators in the popular book Numerical Recipes (Press, Teukolsky, Vetterling and Flannery 1992). It is important to use the routines ran0, ran1, ran2, ran3, ran4 from the second edition (for instance, ran1 is recommended for $N < 10^8$); the routines RAN0, RAN1, RAN2, RAN3 from the first edition of this book had some deficiencies (see Press and Teukolsky (1992)). For very large problems requiring extremely large values of $N$ (as high as $10^{13}$!), reliable sequences can be obtained from the project Scalable Parallel Random Number Generators Library for Parallel Monte Carlo Computations (http://www.ncsa.uiuc.edu/Apps/SPRNG/).

3.2. Sampling methods

Standard (pseudo-) random number generators produce uniformly distributed variables. Non-uniform variables can be sampled through transformation of uniform variables. For a non-uniform random variable with density $p(x)$, the expectation of a function $f(x)$ is

$$I[f] = E[f(x)] = \int f(x)\, p(x)\,dx. \qquad (3.1)$$

For a sequence of random numbers $\{x_n\}$ distributed according to the density $p$, the empirical approximation to the expectation is

$$I_N[f] = \frac{1}{N}\sum_{n=1}^N f(x_n), \qquad (3.2)$$

and the resulting quadrature error is

$$e_N[f] = I[f] - I_N[f]. \qquad (3.3)$$

As in the one-dimensional case, the Central Limit Theorem says that

$$e_N[f] \approx N^{-1/2}\sigma\nu, \qquad (3.4)$$

in which $\nu$ is $N(0,1)$ and

$$\sigma^2 = \int (f - \bar f)^2\, p(x)\,dx. \qquad (3.5)$$

Next we discuss methods for generating non-uniform random variables starting from uniform random variables.

3.3. Transformation method

This is a general method for producing a random variable $x$ with density $p(x)$, through transformation of a uniform random variable. Let $y$ be a uniform variable and look for a function $X(y)$, so that $x = X(y)$ has the desired density $p(x)$. Define the cumulative distribution function

$$P(x) = \int_{-\infty}^x p(x')\,dx'. \qquad (3.6)$$

Determination of the mapping $X(y)$ is through the following computation of the expectation. For any function $f$,

$$E_p[f(x)] = E_{\mathrm{unif}}[f(X(y))], \qquad (3.7)$$
so that, using a change of variables,

$$\int f(x)\, p(x)\,dx = \int f(X(y))\,dy = \int f(x)\,(\mathrm{d}y/\mathrm{d}x)\,dx. \qquad (3.8)$$

This implies that $p(x) = \mathrm{d}y/\mathrm{d}x = 1/X'(y)$, so that $\int_{-\infty}^{X(y)} p(x)\,dx = y$; i.e.,

$$P(X(y)) = y, \qquad X(y) = P^{-1}(y). \qquad (3.9)$$

This formulation is convenient and explicit but not necessarily easy to implement, as it may be difficult to compute the inverse of the function $P$.

3.4. Gaussian random variables

As an example of the transformation method, a Gaussian (normal) random variable has density $p$ and cumulative distribution function $P$ given by

$$p(x) = (2\pi)^{-1/2} e^{-x^2/2}, \qquad P(x) = \tfrac12 + \tfrac12\,\mathrm{erf}(x/\sqrt2), \qquad (3.10)$$

in which erf is the error function defined by

$$\mathrm{erf}(x) = \frac{2}{\sqrt\pi}\int_0^x e^{-t^2}\,dt. \qquad (3.11)$$

One way to sample from a Gaussian distribution is to apply the inverse of the Gaussian cumulative distribution function $P$ to a uniform random variable $y$. Sample a normal variable $x$, starting from a uniform variable $y$, by

$$y = P(x) = \tfrac12 + \tfrac12\,\mathrm{erf}(x/\sqrt2), \qquad (3.12)$$

that is,

$$x = \sqrt2\,\mathrm{erf}^{-1}(2y - 1). \qquad (3.13)$$

Approximate formulas for $P^{-1}(y)$ or $\mathrm{erf}^{-1}$ are found in Kennedy and Gentle (1980).

For the Gaussian distribution, as well as for a number of other distributions, special transformations are a useful alternative to the general transformation method. The simplest of these is the Box-Muller method. It provides a direct way of generating normal random variables without inverting the error function. Starting with two uniform variables $y_1, y_2$, two normal variables $x_1, x_2$ are obtained by

$$x_1 = \sqrt{-2\log(y_1)}\,\cos(2\pi y_2), \qquad x_2 = \sqrt{-2\log(y_1)}\,\sin(2\pi y_2). \qquad (3.14)$$

Box-Muller is based on the following observation. First change from rectangular to polar coordinates, that is,

$$(x_1, x_2) = (r\cos\theta,\; r\sin\theta), \qquad (3.15)$$

so that

$$dx_1\,dx_2 = r\,dr\,d\theta. \qquad (3.16)$$

The corresponding transformation of the probability density function is

$$(2\pi)^{-1} e^{-(x_1^2 + x_2^2)/2}\,dx_1\,dx_2 = (2\pi)^{-1} e^{-r^2/2}\, r\,dr\,d\theta. \qquad (3.17)$$

This shows that the angular variable $y_2 = \theta/(2\pi)$ is uniformly distributed over the unit interval. Next the variable $r$ is easily sampled, since it has density $r e^{-r^2/2}$ and corresponding cumulative distribution function

$$P(r) = \int_0^r e^{-r'^2/2}\, r'\,dr' = 1 - e^{-r^2/2}. \qquad (3.18)$$

Therefore $r$ can be sampled by

$$r = \sqrt{-2\log(1 - y_1)}. \qquad (3.19)$$

After replacing $y_1$ by $1 - y_1$, the resulting transform

$$(y_1, y_2) \mapsto (r, \theta) \mapsto (x_1, x_2) \qquad (3.20)$$

is given in (3.14). The only disadvantages of the Box-Muller method are that it requires evaluation of transcendental functions and that it generates two random variables when only one may be needed. See Marsaglia (1991) for examples of efficient use of Box-Muller.
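As an illustration of the two approaches just described, the sketch below draws normal samples both by the transformation method (3.13) and by Box-Muller (3.14). It is a minimal demonstration, assuming NumPy and SciPy are available (scipy.special.erfinv supplies the inverse error function); it is not code from the original text.

```python
import numpy as np
from scipy.special import erfinv  # inverse error function

rng = np.random.default_rng(1)

def normal_inverse_cdf(n):
    """Transformation method (3.13): x = sqrt(2) * erfinv(2y - 1)."""
    y = rng.random(n)
    return np.sqrt(2.0) * erfinv(2.0 * y - 1.0)

def normal_box_muller(n):
    """Box-Muller (3.14): two uniforms give two independent normals."""
    y1 = 1.0 - rng.random(n // 2)   # shifted into (0, 1] to avoid log(0)
    y2 = rng.random(n // 2)
    r = np.sqrt(-2.0 * np.log(y1))
    return np.concatenate([r * np.cos(2 * np.pi * y2),
                           r * np.sin(2 * np.pi * y2)])

for draw in (normal_inverse_cdf, normal_box_muller):
    x = draw(100_000)
    print(draw.__name__, f"mean {x.mean():+.3f}, var {x.var():.3f}")  # ~0, ~1
```

Both routines produce samples with mean near 0 and variance near 1, as expected for $N(0,1)$.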
3.5. Acceptance-rejection method

Another general way of producing random variables from a given density $p(x)$ is based on a probabilistic approach. This method shows the power and flexibility of stochastic methods. For a given density function $p(x)$, suppose that we know a function $q(x)$ satisfying

$$q(x) \ge p(x), \qquad (3.21)$$

and that we have a way to sample from the probability density

$$\tilde q(x) = q(x)/I[q], \qquad (3.22)$$

in which $I[q] = \int q(x')\,dx'$. In practice, $q$ is often chosen to be a constant.

The acceptance-rejection procedure goes as follows. Pick two random variables, $x'$, $y$, in which $x'$ is a trial variable chosen according to the probability density $\tilde q(x')$, and $y$ is a decision variable chosen according to the uniform density on $0 < y < 1$. The decision is to

• accept if $0 < y < p(x')/q(x')$
• reject if $p(x')/q(x') < y < 1$.

This procedure is repeated until a value $x'$ is accepted. Once a value $x'$ is accepted, take

$$x = x'. \qquad (3.23)$$

This procedure is depicted graphically in Figure 1.

Fig. 1. Typical choice of $p$, $q$ and $p/q$. Accept for $y < p/q$ and reject for $y > p/q$.

Here is a derivation of the acceptance-rejection method. Since $0 \le p \le q$,

$$p(x) = q(x) \int_0^1 \chi\bigl(p(x)/q(x) > y\bigr)\,dy. \qquad (3.24)$$

So,

$$\int f(x)\, p(x)\,dx = I[q]\int\!\!\int_0^1 f(x)\,\chi\bigl(p(x)/q(x) > y\bigr)\,\tilde q(x)\,dy\,dx \approx \frac{I[q]}{N'}\sum_{p(x'_n)/q(x'_n) > y_n} f(x'_n) \approx \frac{1}{N}\sum_{\text{accepted pts}} f(x'_n), \qquad (3.25)$$

in which

$$N' = \text{total number of trial points}, \qquad N = \text{total number of accepted points} \approx N'/I[q]. \qquad (3.26)$$

The acceptance-rejection method has some obvious advantages that often make it the method of choice for generating random variables with a given distribution. It does not require inversion of the cumulative distribution function $P$. In addition, it works even if the density function $p$ has not been normalized to have integral 1. One disadvantage of the method is that it may be inefficient, requiring many trials before acceptance. In practice this is often not a serious deficiency, since the largest share of the computation is on the integrand rather than on the random point selection. An extension of the acceptance-rejection method called the Metropolis algorithm (Kalos and Whitlock 1986) is used to find the equilibrium distribution for a stochastic process.

4. Variance reduction

In Monte Carlo integration, the error $\varepsilon$ and the number $N$ of samples are related by

$$\varepsilon = O(\sigma N^{-1/2}), \qquad (4.1)$$
$$N = O\bigl((\sigma/\varepsilon)^2\bigr). \qquad (4.2)$$

The computational time required for the method is proportional to the sample number $N$ and thus is of size $O(\sigma/\varepsilon)^2$, which shows that computational time grows rapidly as the desired accuracy is tightened.

There are two options for acceleration (error reduction) of Monte Carlo. The first is variance reduction, in which the integrand is transformed in a way that reduces the variance constant $\sigma$. The second is to modify the statistics, that is, to replace the random variables by an alternative sequence which improves the exponent 1/2. In this section, various strategies for variance reduction are described. In Section 5, we discuss quasi-random variables, an example of the second approach. One note of caution is that the acceleration methods described here may require extra computational time, which must be balanced against savings gained by reduced $N$. In most examples, however, the savings due to variance reduction are quite significant.

4.1. Antithetic variables

Antithetic variables is one of the simplest and most widely used variance reduction methods. The method is as follows: for each (multi-dimensional) sample value $x$, also use the value $-x$. The resulting Monte Carlo quadrature rule is

$$I_N[f] = \frac{1}{2N}\sum_{n=1}^N \bigl(f(x_n) + f(-x_n)\bigr). \qquad (4.3)$$

For example, the vector $x$ could be the discrete states in a random walk, so that the dimension would be the number of time-steps in the walk. When antithetic variables are used for a problem involving a random walk, then for each path $x = (x_1, \dots, x_d)$, we also use the path $-x = (-x_1, \dots, -x_d)$.

The use of antithetic variables is motivated by an expansion for small values of the variance. Consider, for example, the expectation $E[f(x)]$ in which $x$ is an $N(0, \sigma^2)$ random variable with $\sigma$ small. Set

$$x = \sigma\tilde x. \qquad (4.4)$$

The Taylor expansion of $f = f(\sigma\tilde x)$ (for small $\sigma$) is

$$f(\sigma\tilde x) = f(0) + \sigma\tilde x\, f'(0) + O(\sigma^2). \qquad (4.5)$$

Since the distribution of $\tilde x$ is symmetric about 0, the average $E[\tilde x]$ of the linear term is zero. In a standard Monte Carlo integral of $f$, these terms do not cancel exactly, so that the Monte Carlo error is proportional to $\sigma$. With antithetic variables, on the other hand, the linear terms cancel exactly, so that the remaining error is proportional to $\sigma^2$.
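A minimal sketch of the antithetic rule (4.3), for an assumed integrand of a Gaussian variable with small $\sigma$; the cancellation of the linear term shows up as a much smaller error than plain Monte Carlo at the same cost.

```python
import numpy as np

rng = np.random.default_rng(2)
sigma = 0.1
f = lambda x: np.exp(x)          # illustrative integrand; E[f] = exp(sigma^2/2)
exact = np.exp(sigma**2 / 2)

N = 50_000
x = sigma * rng.standard_normal(N)

plain = f(x).mean()                          # standard Monte Carlo
antithetic = 0.5 * (f(x) + f(-x)).mean()     # rule (4.3): pair x with -x

print(f"plain      error {abs(plain - exact):.2e}")
print(f"antithetic error {abs(antithetic - exact):.2e}")
```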
4.2. Control variates

The idea of control variates is to use an integrand $g$, which is similar to $f$ and for which the integral $I[g] = \int g(x)\,dx$ is known. The integral $I[f]$ is then written as

$$\int f(x)\,dx = \int \bigl(f(x) - g(x)\bigr)\,dx + \int g(x)\,dx. \qquad (4.6)$$

Monte Carlo quadrature is used on the integral of $f - g$ to obtain the formula

$$I_N[f] = \frac{1}{N}\sum_{n=1}^N \bigl(f(x_n) - g(x_n)\bigr) + I[g]. \qquad (4.7)$$

The resulting integration error $e_N[f] = I[f] - I_N[f]$ is of size

$$e_N[f] \approx \sigma_{f-g}\, N^{-1/2}, \qquad (4.8)$$

in which the relevant variance is

$$\sigma_{f-g}^2 = \int \bigl(\tilde f(x) - \tilde g(x)\bigr)^2\,dx, \qquad (4.9)$$

using the notation

$$\tilde f(x) = f(x) - I[f], \qquad \tilde g(x) = g(x) - I[g]. \qquad (4.10)$$

This shows that the control variate method is effective if

$$\sigma_{f-g} < \sigma_f. \qquad (4.11)$$

Optimal use of a control variate is made by introducing a multiplier $\lambda$. For a given function $g$, write the integral of $f$ as

$$\int f(x)\,dx = \int \bigl(f(x) - \lambda g(x)\bigr)\,dx + \lambda \int g(x)\,dx. \qquad (4.12)$$

The error in Monte Carlo quadrature of the first integral is proportional to the variance

$$\sigma_{f-\lambda g}^2 = \int (\tilde f - \lambda\tilde g)^2\,dx. \qquad (4.13)$$

The optimal value of $\lambda$ is found by minimizing $\sigma_{f-\lambda g}$, to obtain

$$\lambda = E[\tilde f\tilde g]\,/\,E[\tilde g^2]. \qquad (4.14)$$

4.3. Matching moments method

Monte Carlo integration error is partly due to statistical sampling error, that is, differences between the desired density function $p(x)$ and the empirical density function of the sampled points $\{x_n\}_{n=1}^N$. Part of this difference can be seen directly through comparison of the moments of the two distributions. Define the first and second moments $m_1$ and $m_2$ of $p$ as

$$m_1 = \int x\, p(x)\,dx, \qquad m_2 = \int x^2 p(x)\,dx. \qquad (4.15)$$

The first and second moments $\mu_1$ and $\mu_2$ of the sample $\{x_n\}_{n=1}^N$ are

$$\mu_1 = N^{-1}\sum_{n=1}^N x_n, \qquad \mu_2 = N^{-1}\sum_{n=1}^N x_n^2. \qquad (4.16)$$

The moment error is due to the inequality of these moments, that is,

$$\mu_1 \ne m_1, \qquad \mu_2 \ne m_2. \qquad (4.17)$$

A partial correction to the statistical sampling error is to make the moments exactly match. This can be done by a simple transformation of the sample points. To match the first moment of the sample with that of $p$, replace $x_n$ by

$$y_n = (x_n - \mu_1) + m_1. \qquad (4.18)$$

This satisfies

$$N^{-1}\sum_{n=1}^N y_n = m_1, \qquad (4.19)$$

so that the first moment is exactly right. To match the first two moments, replace $x_n$ by

$$y_n = (x_n - \mu_1)\,c + m_1, \qquad c = \sqrt{\frac{m_2 - m_1^2}{\mu_2 - \mu_1^2}}. \qquad (4.20)$$

Then

$$N^{-1}\sum_{n=1}^N y_n = m_1, \qquad N^{-1}\sum_{n=1}^N y_n^2 = m_2. \qquad (4.21)$$

These transformed sequences with the correct moments must be used with some caution. Since the transformation involves $\mu_1$ and $\mu_2$, the new sample points $y_n$ are no longer independent, and Monte Carlo error estimates are not so straightforward. For example, the Central Limit Theorem is not directly applicable, and the method may be biased. Actually, this is an example of the second approach to acceleration of Monte Carlo, through modification of the properties of the random sequence. It is presented here along with variance reduction methods, however, because the resulting improvement in Monte Carlo accuracy is comparable to that gained through variance reduction. Pullin (1979) has formulated a method in which the sample points are independent and have a prescribed empirical mean and variance, but come from a Gaussian distribution with a randomly chosen mean and variance.
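A sketch of a control variate with the optimal multiplier (4.14), estimated from the samples themselves; the integrand and the control $g$ are illustrative assumptions. Here $g(x) = x$ is used to integrate $f(x) = e^x$ on $[0,1]$, for which $I[g] = 1/2$ is known.

```python
import numpy as np

rng = np.random.default_rng(3)

f = lambda x: np.exp(x)   # integrand; exact integral is e - 1
g = lambda x: x           # control variate with known integral I[g] = 1/2
I_g, exact = 0.5, np.e - 1.0

N = 100_000
x = rng.random(N)
fx, gx = f(x), g(x)

# Optimal multiplier (4.14): lambda = E[f~ g~] / E[g~^2], estimated empirically.
lam = np.cov(fx, gx)[0, 1] / gx.var()

plain = fx.mean()
controlled = (fx - lam * gx).mean() + lam * I_g   # rule (4.12)

print(f"plain error      {abs(plain - exact):.2e}")
print(f"controlled error {abs(controlled - exact):.2e}, lambda ~ {lam:.3f}")
```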
4.4. Stratification

Stratification combines the benefits of a grid with those of random variables. In the simplest case, stratification is based on a regular grid with uniform density in one dimension. Split the integration region $\Omega = [0,1]$ into $M$ pieces $\Omega_k$ given by

$$\Omega_k = [(k-1)/M,\; k/M], \qquad k = 1, \dots, M. \qquad (4.22)$$

Then, each subset has the same measure, $|\Omega_k| = 1/M$. Define the averages over each $\Omega_k$ by

$$\bar f(x) = \bar f_k = |\Omega_k|^{-1}\int_{\Omega_k} f(x')\,dx' \quad\text{for } x \in \Omega_k. \qquad (4.23)$$

For each $k$, sample $N_k = N/M$ points $\{x_i^{(k)}\}$ uniformly distributed in $\Omega_k$. Then the stratified quadrature formula is just the sum of the quadratures over each subset, that is,

$$I_N = N^{-1}\sum_{k=1}^M \sum_{i=1}^{N/M} f\bigl(x_i^{(k)}\bigr). \qquad (4.24)$$

The Monte Carlo quadrature error for this stratified sum is $e_N \approx \sigma_s N^{-1/2}$, in which

$$\sigma_s^2 = \int \bigl(f(x) - \bar f(x)\bigr)^2\,dx = \sum_{k=1}^M \int_{\Omega_k} \bigl(f(x) - \bar f_k\bigr)^2\,dx. \qquad (4.25)$$

For this stratified quadrature rule, there is a simple result. Stratified Monte Carlo quadrature always beats the unstratified quadrature, since

$$\sigma_s \le \sigma. \qquad (4.26)$$

The proof of this inequality is straightforward. For each $k$, $c = \bar f_k$ is the minimizer of

$$\int_{\Omega_k} \bigl(f(x) - c\bigr)^2\,dx. \qquad (4.27)$$

In particular,

$$\int_{\Omega_k} \bigl(f(x) - \bar f_k\bigr)^2\,dx \le \int_{\Omega_k} \bigl(f(x) - \bar f\bigr)^2\,dx. \qquad (4.28)$$

Add this over all $k$ to get

$$\sigma_s^2 = \sum_{k=1}^M \int_{\Omega_k} \bigl(f(x) - \bar f_k\bigr)^2\,dx \le \sum_{k=1}^M \int_{\Omega_k} \bigl(f(x) - \bar f\bigr)^2\,dx = \sigma^2. \qquad (4.29)$$

Stratification can be phrased more generally as follows. Split the integration region $\Omega$ into $M$ pieces $\Omega_k$ with

$$\Omega = \cup_{k=1}^M \Omega_k. \qquad (4.30)$$

Take $N_k$ random variables in each piece $\Omega_k$ with

$$\sum_{k=1}^M N_k = N. \qquad (4.31)$$

In each subset $\Omega_k$ choose points $x_n^{(k)}$ distributed with density $p^{(k)}(x)$ in which

$$p^{(k)}(x) = p(x)/p_k, \qquad p_k = \int_{\Omega_k} p(x)\,dx. \qquad (4.32)$$

The stratified quadrature formula is the following sum over $k$:

$$I_N[f] = \sum_{k=1}^M p_k\, N_k^{-1} \sum_{n=1}^{N_k} f\bigl(x_n^{(k)}\bigr). \qquad (4.33)$$

The resulting integration error is

$$e_N[f] = I[f] - I_N[f] = \sum_{k=1}^M e_N^{(k)}[f]. \qquad (4.34)$$

The components of this error are

$$e_N^{(k)}[f] \approx p_k\, \sigma^{(k)} N_k^{-1/2}\, \nu_k, \qquad (4.35)$$

in which the variances are

$$\sigma^{(k)} = \left(\int_{\Omega_k} \bigl(f(x) - \bar f_k\bigr)^2 p^{(k)}(x)\,dx\right)^{1/2} \qquad (4.36)$$

and the averages are

$$\bar f_k = \int_{\Omega_k} f(x)\, p(x)\,dx\,/\,p_k. \qquad (4.37)$$

Stratification always lowers the integration error if the distribution of points is balanced. The balance condition is that, for all $k$,

$$p_k/N_k = 1/N, \qquad (4.38)$$

that is, the number of points in set $\Omega_k$ is proportional to its weighted size $p_k$. The resulting error for stratified quadrature is

$$e_N[f] \approx N^{-1/2}\left(\sum_{k=1}^M p_k\, \sigma^{(k)2}\right)^{1/2}. \qquad (4.39)$$

Since the variance over a subset is always less than the variance over the whole set, that is,

$$\sigma_s \le \sigma, \qquad (4.40)$$

then stratification always lowers the integration error. Actually, a better choice than the balance condition may be to put more points where $f$ has largest variation. An adaptive method for stratification was formulated by Lepage (1978) and is described by Press et al. (1992).

4.5. Importance sampling

Importance sampling is probably the most widely used Monte Carlo variance reduction method. Consider the simple integral $\int f(x)\,dx$ and rewrite it by introducing a density $p$ as

$$I[f] = \int f(x)\,dx = \int \frac{f(x)}{p(x)}\, p(x)\,dx. \qquad (4.41)$$

Now think of this as an integral with density function $p$. Sample points $x_n$ from the density $p(x)$ and form the Monte Carlo estimate

$$I_N[f] = \frac{1}{N}\sum_{n=1}^N \frac{f(x_n)}{p(x_n)}. \qquad (4.42)$$

The resulting error $e_N[f] = I[f] - I_N[f]$ has size

$$e_N[f] \approx \sigma_p\, N^{-1/2}, \qquad (4.43)$$

in which the variance $\sigma_p$ is

$$\sigma_p^2 = \int \left(\frac{f(x)}{p(x)} - I[f]\right)^2 p(x)\,dx. \qquad (4.44)$$

So importance sampling will reduce the Monte Carlo quadrature error if

$$\sigma_p \ll \sigma. \qquad (4.45)$$

Importance sampling is an effective method when $f/p$ is nearly constant, so that $\sigma_p$ is small. One difficulty in this method is that sampling from the density $p$ is required, but this can be performed using acceptance-rejection, if necessary. One use of importance sampling is to emphasize rare but important events, i.e., small regions of space in which the integrand $f$ is large.
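A sketch of importance sampling (4.41)-(4.42) for a peaked, assumed integrand: sampling from a density $p$ shaped like $f$ makes $f/p$ nearly constant and shrinks $\sigma_p$.

```python
import numpy as np

rng = np.random.default_rng(4)

# Illustrative integrand on [0, inf): f(x) = x * exp(-x); its integral is 1.
f = lambda x: x * np.exp(-x)
exact = 1.0

N = 100_000

# Plain Monte Carlo on a truncated interval [0, 20] (volume factor 20).
x = 20.0 * rng.random(N)
plain = (20.0 * f(x)).mean()

# Importance sampling with p(x) = exp(-x), an Exponential(1) density:
# the ratio f/p = x has much smaller variance than 20*f under the uniform.
x = rng.exponential(1.0, N)
weighted = (f(x) / np.exp(-x)).mean()

print(f"plain error      {abs(plain - exact):.2e}")
print(f"importance error {abs(weighted - exact):.2e}")
```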
4.6. Russian roulette

Some Monte Carlo computations involve infinite sums, for instance, due to iteration of an integral equation. The Russian roulette method converts an infinite sum into a sum that is of finite length for each sample. Consider the sum

$$S = \sum_{n=0}^{\infty} a_n \qquad (4.46)$$

and suppose that the terms $a_n$ are exponentially decreasing, that is,

$$|a_n| \le c\lambda^n, \qquad (4.47)$$

in which $0 < \lambda < 1$. Choose $\lambda < \kappa < 1$, and let $M$ be chosen according to a discrete exponential distribution, so that

$$\mathrm{Prob}(M \ge n) = \kappa^n. \qquad (4.48)$$

Define the random sum

$$\tilde S = \sum_{n=0}^{M} \kappa^{-n} a_n. \qquad (4.49)$$

Since $|\kappa^{-n} a_n| \le c(\lambda/\kappa)^n$ and $\lambda/\kappa < 1$, this sum is uniformly bounded for all $M$. Then

$$E[\tilde S] = \sum_{n=0}^{\infty} \mathrm{Prob}(M \ge n)\,\kappa^{-n} a_n = S. \qquad (4.50)$$

This formula leads to the following Monte Carlo method for computation of the infinite sum (4.46):

$$S_N = N^{-1}\sum_{i=1}^{N} \sum_{n=0}^{M_i} \kappa^{-n} a_n, \qquad (4.51)$$

in which the values $M_i$ are chosen according to the probability distribution (4.48). In this method, the terms $a_n$ could also be computed by a Monte Carlo integral.
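A sketch of Russian roulette (4.48)-(4.51) for assumed geometric terms $a_n = \lambda^n$, whose exact sum is $1/(1-\lambda)$; the truncation level $M$ is sampled with $\mathrm{Prob}(M \ge n) = \kappa^n$ by counting Bernoulli successes.

```python
import numpy as np

rng = np.random.default_rng(5)

lam, kappa = 0.5, 0.8           # need lam < kappa < 1
a = lambda n: lam**n            # illustrative terms; S = 1/(1 - lam) = 2
exact = 1.0 / (1.0 - lam)

def sample_M():
    """Sample M with Prob(M >= n) = kappa^n: count successes until failure."""
    M = 0
    while rng.random() < kappa:
        M += 1
    return M

N = 100_000
Ms = [sample_M() for _ in range(N)]
# Truncated, reweighted sums (4.49): sum_{n=0}^{M} kappa^{-n} a_n.
est = np.mean([sum(kappa**-n * a(n) for n in range(M + 1)) for M in Ms])

print(f"Russian roulette estimate {est:.4f} (exact {exact})")
```

Each sample is a finite sum, yet the reweighting by $\kappa^{-n}$ keeps the estimator unbiased, exactly as (4.50) asserts.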
4.7. Example of variance reduction

As an example of the use of variance reduction, consider the three-dimensional Gaussian integral

$$I[u] = (2\pi)^{-3/2}\int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} u\; e^{-(x_1^2 + x_2^2 + x_3^2)/2}\,dx_1\,dx_2\,dx_3, \qquad (4.52)$$

in which the integrand $u$ is defined by

$$u = (1 + r_0)^{-1}(1 + r_1)^{-1}(1 + r_2)^{-1}(1 + r_3)^{-1}, \qquad r_i = r_{i-1}\, e^{\sigma x_i - \sigma^2/2}, \quad i = 1, 2, 3. \qquad (4.53)$$

This integral can be interpreted as the present value of a payment of \$1 four years from now, for an interest rate of $r_i$ in year $i$. The interest rates are allowed to evolve according to a lognormal model with variance $\sigma$, that is,

$$r_i = r_{i-1}\, e^{\sigma x_i - \sigma^2/2}, \qquad (4.54)$$

in which the $x_i$'s are independent $N(0,1)$ Gaussian random variables. The factors $(1 + r_i)^{-1}$ are the discount factors. For the computations below, we take $r_0 = 0.10$ and $\sigma = 0.1$.

We evaluate $I[u]$ by sampling $x_i$ from an $N(0,1)$ distribution using the transformation method, that is, by direct inversion of the cumulative distribution function for a normal random variable. Then we apply antithetic variables and control variates to this problem. Approximate $(1 + r_i)^{-1}$ by $1 - r_i$ for $i \ge 1$. Then form the control variate as

$$g = (1 + r_0)^{-1}(1 - r_1)(1 - r_2)(1 - r_3). \qquad (4.55)$$

Since $g$ consists of a sum of linear exponentials, its integral can be performed exactly, for instance

$$(2\pi)^{-1/2}\int_{-\infty}^{\infty} e^{\sigma x - \sigma^2/2}\, e^{-x^2/2}\,dx = 1. \qquad (4.56)$$

Numerical results are presented in Figure 2. This compares the quadrature error from standard Monte Carlo, antithetic variables and control variates, using both standard pseudo-random points and the quasi-random (Sobol') points that will be discussed in the next section. In order to make a meaningful comparison, we have performed an average (root mean square) of the error over 20 runs for each value of $N$. The computations are all independent, that is, the same random number is never used twice. The figure also shows the results from a single run without averaging or variance reduction. The results show the following points.

• Both control variates and antithetic variables provide significant error reduction.
• Control variates and antithetic variables together perform better than either one alone. This shows that the error reductions from the two methods are different.
• Quasi-Monte Carlo, which will be discussed in Section 5, has smaller error and a faster rate of convergence.
• Control variates and antithetic variables, both separately and together, provide further error reduction when used with quasi-Monte Carlo.

Fig. 2. Discounted cashflow, Monte Carlo (top) and quasi-Monte Carlo (bottom); curves show a single run, the averaged error, antithetic variables, control variates, and control variates with antithetic variables.

5. Quasi-random numbers

Quasi-random sequences are a deterministic alternative to random or pseudo-random sequences (Kuipers and Niederreiter 1974, Hua and Wang 1981, Niederreiter 1992, Zaremba 1968). In contrast to pseudo-random sequences, which try to mimic the properties of random sequences, quasi-random sequences are designed to provide better uniformity than a random sequence, and hence faster convergence for quadrature formulas. Uniformity of a sequence is measured in terms of its discrepancy, which is defined in Section 5.2 below, and for this reason quasi-random sequences are also called low-discrepancy sequences. Some have objected to the name quasi-random, since these sequences are intentionally not random. We continue to use this name, however, because of the unpredictability of quasi-random sequences.

5.1. Monte Carlo versus quasi-Monte Carlo

The difference between standard Monte Carlo and quasi-Monte Carlo is best illustrated by Figure 3, which compares a pseudo-random sequence with a quasi-random sequence (a Sobol' sequence) in two dimensions.

Standard Monte Carlo methods use pseudo-random sequences and provide a convergence rate of $O(N^{-1/2})$ for Monte Carlo quadrature using $N$ samples. In addition to integration problems, standard Monte Carlo is applicable to optimization and simulation problems. The limiting factor in accuracy of standard Monte Carlo is the clumping that occurs in the points of a random or pseudo-random sequence. Clumping of points, as well as spaces that have no points, is clearly seen in the pseudo-random sequence in Figure 3. The reason for this clumping is that the points are independent. Since different points know nothing about each other, there is some small chance that they will lie very close together. A simple argument shows that about $\sqrt N$ out of $N$ points lie in clumps.

Quasi-Monte Carlo methods use quasi-random sequences, which are deterministic, with correlations between the points to eliminate clumping. The resulting convergence rate is $O((\log N)^k N^{-1})$. Because of the correlations, quasi-random sequences are less versatile than random or pseudo-random sequences. They are designed for integration, rather than simulation or optimization. On the other hand, the desired result of a simulation can often be written as an expectation, which is an integral, so that quasi-Monte Carlo is then applicable. As discussed below, this often introduces high dimensionality, which can limit the effectiveness of quasi-Monte Carlo sequences.

Fig. 3. Two-dimensional projection of a pseudo-random sequence (top) and a Sobol' sequence (bottom).
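The contrast is easy to reproduce; a sketch assuming SciPy's stats.qmc module (which provides a Sobol' generator) compares quadrature error for pseudo-random and Sobol' points on an illustrative smooth integrand.

```python
import numpy as np
from scipy.stats import qmc  # Sobol' generator, assumed available (SciPy >= 1.7)

d = 5
f = lambda x: np.prod(1.0 + 0.5 * (x - 0.5), axis=1)  # smooth; exact integral 1
rng = np.random.default_rng(6)

for m in (8, 12, 16):                                  # N = 2^m points
    N = 2**m
    x_pr = rng.random((N, d))                          # pseudo-random
    x_qr = qmc.Sobol(d, scramble=False).random(N)      # quasi-random (Sobol')
    e_pr = abs(f(x_pr).mean() - 1.0)
    e_qr = abs(f(x_qr).mean() - 1.0)
    print(f"N=2^{m}: pseudo-random error {e_pr:.1e}, Sobol' error {e_qr:.1e}")
```

The pseudo-random errors shrink like $N^{-1/2}$, while the Sobol' errors decay at close to $N^{-1}$ on this smooth, moderate-dimensional integrand.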
5.2. Discrepancy

Quasi-random sequences were invented by number theorists who were interested in the uniformity properties of numerical sequences (Kuipers and Niederreiter 1974, Hua and Wang 1981). An important first step is formulating a quantitative measure of uniformity. Uniformity of a sequence of points is measured in terms of its discrepancy. For a sequence of $N$ points $\{x_n\}$ in the unit cube $I^d$, define

$$R_N(J) = \frac{1}{N}\,\#\{x_n \in J\} - m(J) \qquad (5.1)$$

for any subset $J$ of $I^d$, in which $m(J)$ is the measure (volume) of $J$. $R_N(J)$ is just the Monte Carlo quadrature error in measuring the volume of $J$. The discrepancy is then defined by some norm applied to $R_N(J)$. First, restrict $J$ to be a rectangular set, and define the set of all such subsets to be $E$. Since a rectangular set can be defined by two antipodal vertices, a rectangle $J$ can be identified as $J = J(x, y)$ in which the points $x$ and $y$ are antipodal vertices. Now define two discrepancies: $D_N$, which is an $L^\infty$ norm on $J$, and $T_N$, which is an $L^2$ norm, defined by

$$D_N = \sup_{J \in E} |R_N(J)|, \qquad T_N = \left(\int_{J(x,y)\in I^{2d},\, x_i < y_i} R_N(J(x,y))^2\,dx\,dy\right)^{1/2}. \qquad (5.2)$$

It is useful to also define the discrepancy over rectangular sets with one vertex at 0, as

$$D_N^* = \sup_{J \in E^*} |R_N(J)|, \qquad (5.3)$$

in which $E^*$ is the set $\{J(0, x)\}$. Other definitions include terms from lower-dimensional projections of the sequence (Hickernell 1998).

5.3. Koksma-Hlawka inequality

Quasi-random sequences are useful for integration because they lead to much smaller error than standard Monte Carlo quadrature. The basis for analysing quasi-Monte Carlo quadrature error is the Koksma-Hlawka inequality. Consider the integral

$$I[f] = \int_{I^d} f(x)\,dx \qquad (5.4)$$

and the Monte Carlo approximation

$$I_N[f] = \frac{1}{N}\sum_{n=1}^N f(x_n), \qquad (5.5)$$

and define the quadrature error

$$\varepsilon[f] = I[f] - I_N[f]. \qquad (5.6)$$

Define the variation (in the Hardy-Krause sense) of $f$, a function of a single variable, as

$$V[f] = \int_0^1 \left|\frac{df}{dt}\right| dt. \qquad (5.7)$$

In $d$ dimensions, the variation is defined as

$$V[f] = \int_{I^d} \left|\frac{\partial^d f}{\partial t_1 \cdots \partial t_d}\right| dt_1 \cdots dt_d + \sum_{i=1}^d V[f_1^{(i)}], \qquad (5.8)$$

in which $f_1^{(i)}$ is the restriction of the function $f$ to the boundary $x_i = 1$. Since these restrictions are functions of $d - 1$ variables, this definition is recursive.

Theorem 5.1 (Koksma-Hlawka theorem) For any sequence $\{x_n\}$ and any function $f$ with bounded variation, the integration error $\varepsilon$ is bounded as

$$|\varepsilon[f]| \le V[f]\, D_N^*. \qquad (5.9)$$

It is instructive to compare the Koksma-Hlawka result to the root mean square error (RMSE) of standard Monte Carlo integration using a random sequence, that is,

$$E[|\varepsilon_N[f]|^2]^{1/2} = \sigma[f]\, N^{-1/2}, \qquad (5.10)$$

in which

$$\sigma[f] = \left(\int_{I^d} \bigl(f(x) - I[f]\bigr)^2\,dx\right)^{1/2}. \qquad (5.11)$$

In comparing the Koksma-Hlawka inequality (5.9) and the RMSE for standard Monte Carlo integration (5.10), note the following points.

• Both error bounds are a product of one factor that depends on the sequence (i.e., $D_N^*$ for Koksma-Hlawka and $N^{-1/2}$ for RMSE) and a factor that depends on the function $f$ (i.e., $V[f]$ for Koksma-Hlawka and $\sigma[f]$ for RMSE).
• The Koksma-Hlawka inequality is a worst-case bound, whereas the RMSE (5.10) is a probabilistic bound.
• The RMSE (5.10) is an equality, and so it is tight, whereas Koksma-Hlawka is an upper bound.
• Our experience is that $V[f]$ in Koksma-Hlawka is usually a gross overestimate, while the discrepancy is indicative of actual performance.

A further, more subtle point is that for standard Monte Carlo each point is an estimate of the entire integral. A typical quasi-random sequence, on the other hand, has a hierarchical structure, so that the initial points sample the integrand on a coarse scale and later points sample it on a finer scale. This is the source of the $\log N$ terms in the discrepancy bounds cited below.

5.4. Proof of the Koksma-Hlawka inequality

First consider functions $f$ that vanish on the boundary of the unit cube. Note that

$$\varepsilon[f] = \int_{I^d} f(x)\left(\frac{1}{N}\sum_{n=1}^N \delta(x - x_n) - 1\right) dx. \qquad (5.12)$$

Since there are no boundary terms, integration by parts gives

$$|\varepsilon[f]| = \left|\pm\int_{I^d} R(x)\,df(x)\right| \le \left(\sup_x |R(x)|\right) V[f] = D_N^*\, V[f], \qquad (5.13)$$

in which $R(x) = R_N(J(0,x))$ as defined in Section 5.2. For $f$ that is nonzero on the boundary of the unit cube, the terms from integration by parts are bounded by the boundary terms in $V[f]$.
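In one dimension the star discrepancy $D_N^*$ has a classical closed form for sorted points, $D_N^* = 1/(2N) + \max_i |x_{(i)} - (2i-1)/(2N)|$; a sketch comparing a pseudo-random sample with an evenly spread one (the function name is an illustrative choice):

```python
import numpy as np

def star_discrepancy_1d(x):
    """D*_N for points in [0,1): 1/(2N) + max_i |x_(i) - (2i-1)/(2N)|."""
    x = np.sort(x)
    N = len(x)
    i = np.arange(1, N + 1)
    return 1.0 / (2 * N) + np.max(np.abs(x - (2 * i - 1) / (2.0 * N)))

rng = np.random.default_rng(7)
N = 1024
print("pseudo-random:", star_discrepancy_1d(rng.random(N)))          # ~ N^{-1/2}
print("centered grid:", star_discrepancy_1d((np.arange(N) + 0.5) / N))  # = 1/(2N)
```

The centered grid attains the minimal value $1/(2N)$, while the pseudo-random points sit near the $N^{-1/2}$ scale, in line with (5.24) below.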
5.5. Average integration error

Most computational experience confirms that discrepancy is a good indicator of quasi-Monte Carlo performance, while variation is not a typical indicator. In 1990, Wozniakowski (Wozniakowski 1991) proved the following surprising result.

Theorem 5.2 We have

$$E\bigl[\varepsilon[f]^2\bigr]^{1/2} = \left(\int_{I^d} R_N(J(0,x))^2\,dx\right)^{1/2}, \qquad (5.14)$$

that is, the $L^2$ analogue of the discrepancy $D_N^*$, in which the expectation $E$ is taken with respect to functions $f$ distributed according to the Brownian sheet measure.

The Brownian sheet measure is a measure on function space. It is a natural generalization of simple Brownian motion $b(x)$ to multi-dimensional 'time' $x$. With probability one, a function $f(x)$, chosen from the Brownian sheet measure, is approximately 'half-differentiable'. In particular, its variation is infinite. In this context, we can definitely say that variation is not a good indicator of integration error. Since Wozniakowski's identity (5.14) is an equality, it follows that the $L^2$ discrepancy $T_N$ does agree with the typical size of quasi-Monte Carlo integration error. Our computational experience is that $T_N$ and $D_N$ are usually of the same size.

Denote $x' = (x'_i)_{i=1}^d$ in which $x'_i = 1 - x_i$; also denote the finite difference operator $D_i f(x') = f(x' + \Delta_i e_i) - f(x')$, in which $e_i$ is the $i$th coordinate vector. The Brownian sheet is based at the point $x' = 0$, that is, $x = (1, \dots, 1)$, and has $f(x' = 0) = f(1, \dots, 1) = 0$. For any point $x'$ in $I^d$ and any set of positive lengths $\Delta_i$ (with $x'_i + \Delta_i \le 1$), the multi-dimensional difference $D_1 \cdots D_d f(x')$ is a normally distributed random variable with mean zero and variance

$$E\bigl[(D_1 \cdots D_d f(x'))^2\bigr] = \Delta_1 \cdots \Delta_d. \qquad (5.15)$$

This implies that

$$E[\,df(x)\,df(y)\,] = \delta(x - y)\,dx\,dy, \qquad (5.16)$$

in which $df$ is understood in the sense of the Ito calculus (Karatzas and Shreve 1991). Moreover $f(x') = 0$ if $x'_i = 0$ for any $i$. For any $x$ in $I^d$, $f(x)$ is normally distributed with mean zero and variance

$$E[f(x)^2] = \prod_{i=1}^d x'_i. \qquad (5.17)$$

Wozniakowski's proof of (5.14) was a straightforward calculation of both sides of the equality. The following proof, derived by Morokoff and Caflisch (1994), is based on the properties of the Brownian sheet measure and follows the same lines as the proof of the Koksma-Hlawka inequality.

Proof of Wozniakowski's identity. First rewrite the integration error $\varepsilon[f]$ using integration by parts, following the steps of the proof of the Koksma-Hlawka inequality. Note that $R(x) = R_N(J(0,x))$, as defined in Section 5.2, satisfies $R(x) = 0$ if $x_i = 0$, and $f(x) = 0$ if $x_i = 1$, for any $i$. This implies that the boundary terms all disappear in the following integration by parts:

$$\varepsilon[f] = \int_{I^d} f(x)\,dx - \frac{1}{N}\sum_{n=1}^N f(x_n) = \pm\int_{I^d} R(x)\,df(x). \qquad (5.18)$$

The quantity $df$ in this identity is defined here through the Ito calculus, even though $V[f] = \infty$ with probability one. It follows from (5.16) that the average square error is

$$E\bigl[\varepsilon[f]^2\bigr] = \int_{I^d}\int_{I^d} R(x)\,R(y)\, E[\,df(x)\,df(y)\,] = \int_{I^d} R(x)^2\,dx. \qquad (5.19)$$

5.6. Quasi-random number generators

An infinite sequence $\{x_n\}$ is uniformly distributed if

$$\lim_{N\to\infty} D_N = 0. \qquad (5.20)$$

The sequence is quasi-random if

$$D_N \le c(\log N)^k N^{-1}, \qquad (5.21)$$

in which $c$ and $k$ are constants that are independent of $N$, but may depend on the dimension $d$. In particular, it is possible to construct sequences for which $k = d$. It is also common to say that a sequence is quasi-random only if the exponent $k$ is equal to the dimension $d$ of the sequence.

The simplest example of a quasi-random sequence is the Van der Corput sequence in one dimension ($d = 1$) (Niederreiter 1992). It is described most simply as follows. Write out $n$ in base 2. Then obtain the $n$th point $x_n$ by reversion of the bits of $n$ around the decimal point, that is,

$$n = a_m a_{m-1}\dots a_1 a_0 \text{ (base 2)}, \qquad x_n = 0.a_0 a_1 \dots a_{m-1} a_m \text{ (base 2)}. \qquad (5.22)$$

Halton sequences (Halton 1960) are a generalization of this procedure. In the $k$th dimension, expand $n$ in base $p_k$, the $k$th prime, and form the $k$th component by reversion of this expansion around the decimal point. The discrepancy of a Halton sequence is bounded by

$$D_N(\text{Halton}) \le c_d (\log N)^d N^{-1}, \qquad (5.23)$$

in which $c_d$ is a constant depending on $d$. On the other hand, the average discrepancy of a random sequence is

$$E\bigl[T_N^2(\text{random})\bigr]^{1/2} = c'_d\, N^{-1/2}. \qquad (5.24)$$
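Digit reversion as in (5.22) is a short computation known as the radical inverse; a sketch generating the Van der Corput sequence (base 2) and, per the generalization just described, Halton points from the first few primes (names are illustrative):

```python
def radical_inverse(n, base=2):
    """Reverse the base-b digits of n around the radix point, as in (5.22)."""
    x, scale = 0.0, 1.0 / base
    while n > 0:
        n, digit = divmod(n, base)
        x += digit * scale
        scale /= base
    return x

def halton(N, d, primes=(2, 3, 5, 7, 11, 13, 17, 19)):
    """First N points of the d-dimensional Halton sequence (d <= 8 here)."""
    return [[radical_inverse(n, primes[k]) for k in range(d)]
            for n in range(1, N + 1)]

print([radical_inverse(n) for n in range(1, 9)])
# Van der Corput: [0.5, 0.25, 0.75, 0.125, 0.625, 0.375, 0.875, 0.0625]
print(halton(4, 2))
```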
Additional sequences go by the names of Haselgrove (Haselgrove 1961), Faure (Faure 1982), Sobol' (Sobol' 1967, 1976), Niederreiter (Niederreiter 1992, Xing and Niederreiter 1995), and Owen (Owen 1995, 1997, 1998, Hickernell 1996). Niederreiter has formulated a general theory for quasi-random sequences in terms of $(t, s)$-nets (Niederreiter 1992). All of these sequences satisfy bounds like (5.23), but the more recent constructions have much better constants $c_d$. An algorithm for construction of a Sobol' sequence is found in Press et al. (1992) and for Niederreiter sequences in Bratley, Fox and Niederreiter (1994). Various quasi-random number generators are found in the software collection ACM-Toms at http://www.netlib.org/toms/index.html.

For quasi-random sequences with discrepancy of size $O((\log N)^d N^{-1})$, the Koksma-Hlawka inequality implies that the integration error is of this same size, that is,

$$|\varepsilon[f]| \le c\, V[f]\, (\log N)^d N^{-1}. \qquad (5.25)$$

Note that the Koksma-Hlawka inequality applies for any sequence. For quasi-random sequences, the discrepancy is small, so that the integration error is also small.

5.7. Limitations: smoothness and dimension

The Koksma-Hlawka inequality and discrepancy bounds for a quasi-random sequence together imply that quasi-Monte Carlo quadrature converges much faster than standard Monte Carlo quadrature. On the other hand, there are several distinct limitations to the effectiveness of quasi-Monte Carlo methods (Morokoff and Caflisch 1994, 1995).

First, there is no theoretical basis for empirical estimates of accuracy of quasi-Monte Carlo methods. Recall from Section 2 that the Central Limit Theorem can be used to test the accuracy of standard Monte Carlo quadrature as the computation proceeds, and then to predict the required number of samples $N$ for a given accuracy level. Since there is no Central Limit Theorem for quasi-random sequences, there is no corresponding empirical error estimate for quasi-Monte Carlo. On the other hand, confidence in the accuracy of quasi-Monte Carlo integration comes from refining a set of computations.

Second, quasi-Monte Carlo methods are designed for integration and are not directly applicable to simulation. This is because of the correlations between the points of a quasi-random sequence. However, in many simulations the desired result is an expectation of some quantity, which can then be written as a high-dimensional integral on which quasi-Monte Carlo can be used. In fact, we can think of the different dimensions of a quasi-random point as independent, so that quasi-random sequences can represent a simulation by allocating one dimension per time-step or decision. This approach often requires a high-dimensional quasi-random sequence.

Third, quasi-Monte Carlo integration is found to lose its effectiveness when the dimension of the integral becomes large. This can be anticipated from the bound $(\log N)^d N^{-1}$ on discrepancy. For large dimension $d$, this bound is dominated by the $(\log N)^d$ term unless

$$N > 2^d. \qquad (5.26)$$
Fourth, quasi-Monte Carlo integration is found to lose its effectiveness if the integrand $f$ is not smooth. The factor $V[f]$ in the Koksma-Hlawka inequality (5.9) is an indicator of this dependence, although we actually find that a much smaller amount of smoothness, somewhere between continuity and differentiability, is usually enough. The limitation on smoothness is significant, because many Monte Carlo methods involve decisions of some sort, which usually correspond to multiplication by 0 or 1. In the next subsection, some computational evidence for limitations on dimension and smoothness will be presented. Techniques for overcoming these limitations will be discussed in Section 6.

An important lesson from our computational experience is that quasi-Monte Carlo integration is almost always as accurate as standard Monte Carlo integration. So the 'loss of effectiveness' cited above means that quasi-Monte Carlo performs no better than standard Monte Carlo. The reasons for this are not clear, but it is consistent with the discrepancy computations presented in the next subsection.

5.8. Dependence on dimension

To demonstrate how quasi-random sequences depend on dimension, we will present results from two computational tests. The first is a computation of $L^2$ discrepancy for a quasi-random sequence. The second is an examination of the one- and two-dimensional projections of quasi-random sequences.

The $L^2$ discrepancy $T_N$ can be computed directly (Morokoff and Caflisch 1994) by an explicit formula. Figure 4 shows computation of $T_N$ for a Sobol' sequence for dimension 4 and 16. In addition to $T_N$, each graph shows a plot of $N^{-1/2}$ with a constant coming from the expected discrepancy of a random sequence. Each graph also shows a plot of $N^{-1}$ (with a constant of no significance). Since the plots are on a log-log scale, these reference curves are lines of slope $-1/2$ and $-1$.

Fig. 4. Log-log (base 10) plot of discrepancy $T_N$ for a Sobol' sequence in 4 dimensions (top) and for 16 dimensions (bottom).

The figure for small dimension (4) shows that, for large $N$, $T_N \sim O(N^{-1})$. We have not tried to match the $\log N$ terms, since we do not seem to be able to detect them reliably. On the other hand, the graph for large dimension (16) shows that, for moderate sized $N$, $T_N \sim O(N^{-1/2})$, with a constant that agrees with the discrepancy of a random sequence. Eventually, as $N$ gets very large, $T_N$ must be nearly of size $O(N^{-1})$. The agreement between discrepancy of quasi-random and random sequences for large $d$ and moderate $N$ has not been explained.

Next we consider one- and two-dimensional projections of a quasi-random sequence. Many, but not all, quasi-random sequences have the following special property. All one-dimensional projections of the sequence (i.e., all single components) are equally well distributed. The Sobol' sequence, for example, has this property. This implies that any function $f$ that consists of a sum of one-dimensional functions will be well integrated by quasi-Monte Carlo, in other words, functions of the form

$$f(x) = \sum_{n=1}^d f_n(x_n). \qquad (5.27)$$

In particular, quasi-Monte Carlo performs very well for linear functions

$$f(x) = \sum_{i=1}^d a_i x_i. \qquad (5.28)$$

Now consider two-dimensional projections. Figure 5 shows a 'good' and a 'bad' two-dimensional projection of a Sobol' sequence. The top plot of the 'good' projection is very uniform, while the bottom plot of the 'bad' projection shows gaps and a repetitive structure to the points.
For this projection, the holes would be filled in as the further points of the sequence are laid down. At certain special values of $N$, the sequence would be quite uniform. We have observed patterns of this sort in a variety of quasi-random sequences (Morokoff and Caflisch 1994), but they are not expected to occur in the randomized quasi-random points of Owen (1995, 1997).

Fig. 5. Good and bad two-dimensional projections of a Sobol' sequence (panels labelled 'Dimension 2' and 'Dimension 27').

6. Quasi-Monte Carlo techniques

In this section, two techniques are described for overcoming the smoothness and high-dimension limitations of quasi-Monte Carlo. The first method is a smoothing method and will be demonstrated on the acceptance-rejection method, which involves a characteristic function in its standard form, coming from the decision to accept or reject. The second technique is a rearrangement of dimension, so as to put most of the weight of an integral into the leading dimensions of the quasi-random sequence.

6.1. Smoothing

The effectiveness of quasi-Monte Carlo is often lost on problems involving discontinuities, such as decisions in the acceptance-rejection method. One way to overcome this difficulty is to smooth out the integrand without changing the value of the integral. Here we describe a smoothed version of the acceptance-rejection method and compare it with standard acceptance-rejection, when used with a quasi-random sequence.

Standard acceptance-rejection can be expressed in an integration problem in the following way. Consider the integral of a function $f$ multiplying a function $p$ on the unit interval, with $0 \le p(x) \le 1$. Rewrite the integral as

$$\int_0^1 f(x)\, p(x)\,dx = \int_0^1\!\int_0^1 f(x)\,\chi\bigl(y < p(x)\bigr)\,dy\,dx. \qquad (6.1)$$

The acceptance-rejection method is a slight modification of the usual Monte Carlo formula for this integral, that is,

• sample $(x, y)$ uniformly
• weight $w = \chi(y < p(x)) = 1$ (accept), if $y < p(x)$
• weight $w = \chi(y < p(x)) = 0$ (reject), if $y > p(x)$
• repeat until the number of accepted points is $N$.

The Monte Carlo quadrature formula is then

$$I_N = N^{-1}\sum_{i=1}^M w_i\, f(x_i), \qquad (6.2)$$

in which

$$\sum_{i=1}^M w_i = N. \qquad (6.3)$$

The smoothed acceptance-rejection method (Spanier and Maize 1994, Moskowitz and Caflisch 1996) is formulated by replacing the discontinuous function $\chi(y < p(x))$ by a smooth function $q(x, y)$ satisfying

$$\int_0^1 q(x, y)\,dy = p(x). \qquad (6.4)$$

For example, one choice of $q$ is a piecewise linear function:

$$q(x, y) = \begin{cases} 1 & \text{if } y < p(x) - \epsilon, \\ 0 & \text{if } y > p(x) + \epsilon, \\ \text{linear} & \text{if } p(x) - \epsilon \le y \le p(x) + \epsilon. \end{cases} \qquad (6.5)$$

The integral can then be rewritten as

$$\int_0^1 f(x)\, p(x)\,dx = \int_0^1\!\int_0^1 f(x)\, q(x, y)\,dy\,dx, \qquad (6.6)$$

and the sampling procedure is the following:

• sample $(x, y)$ uniformly
• set the weight $w = q(x, y)$
• repeat until

$$\sum_{i=1}^M q(x_i, y_i) \approx N. \qquad (6.7)$$

The quadrature formula, using smoothed acceptance-rejection, is then

$$I_N = N^{-1}\sum_{i=1}^M q(x_i, y_i)\, f(x_i), \qquad (6.8)$$

in which

$$\sum_{i=1}^M q(x_i, y_i) \approx N. \qquad (6.9)$$

Fig. 6. Smoothed acceptance-rejection for Monte Carlo (top) and quasi-Monte Carlo (bottom). Empirical convergence rates: for Monte Carlo, raw Monte Carlo $-0.52$, weighted uniform sampling $-0.49$, rejection method $-0.50$, smoothed rejection ($\delta = 0.2$) $-0.51$; for quasi-Monte Carlo, $-0.96$, $-0.96$, $-0.66$ and $-0.88$, respectively.
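A sketch contrasting the indicator weight $\chi(y < p(x))$ in (6.1) with the piecewise-linear smoothed weight (6.5), for an assumed density factor $p$; both weights are unbiased, and with quasi-random $(x, y)$ pairs the smooth weight is what restores the fast convergence reported below.

```python
import numpy as np

def chi_weight(x, y, p):
    """Standard acceptance-rejection weight: chi(y < p(x))."""
    return (y < p(x)).astype(float)

def smooth_weight(x, y, p, eps=0.2):
    """Smoothed weight (6.5): ramps linearly from 1 to 0 across p(x) +/- eps."""
    return np.clip((p(x) + eps - y) / (2 * eps), 0.0, 1.0)

p = lambda x: 0.5 + 0.3 * np.sin(np.pi * x)   # illustrative, keeps p +/- eps in [0,1]
f = lambda x: x**2                            # illustrative integrand

xg = (np.arange(1_000_000) + 0.5) / 1_000_000
exact = (f(xg) * p(xg)).mean()                # dense midpoint reference for (6.6)

rng = np.random.default_rng(8)
x, y = rng.random(100_000), rng.random(100_000)
for w in (chi_weight, smooth_weight):
    est = (w(x, y, p) * f(x)).mean()          # MC estimate of the double integral
    print(w.__name__, f"error {abs(est - exact):.2e}")
```

With pseudo-random points the two weights perform alike, as the text notes; the payoff of the smooth weight appears when the $(x, y)$ pairs come from a quasi-random sequence, as in Figure 6.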
Figure 6 shows the results of standard and smoothed acceptance-rejection for Monte Carlo using both a pseudo-random sequence and a quasi-random sequence. The integral evaluated here (Moskowitz and Caflisch 1996) has the form

$$I = \int_{I^7} f(x)\, p(x)\,dx, \qquad (6.10)$$

in which $I^7 = [0,1]^7$; the (unnormalized) density $p$, an exponential of a sum of squared sines, and the integrand $f$, an exponential of an arcsine, are smooth functions whose precise formulas (6.11) are given in Moskowitz and Caflisch (1996).

The integral $I$ is evaluated in several ways. First, by 'raw Monte Carlo', by which we mean evaluation of the product $f(x)p(x)$ at points from a uniformly distributed sequence. Second, points are sampled from the (unnormalized) density function $p$ using acceptance-rejection, then $f$ is evaluated at these points. Third, standard acceptance-rejection is replaced by smoothed acceptance-rejection. Finally, all three of these methods are performed both with pseudo-random and quasi-random points. These numerical results show the following.

• Importance sampling, using acceptance-rejection, reduces variance and leads to smaller errors for both Monte Carlo and quasi-Monte Carlo.
• For Monte Carlo, smoothing has little effect on errors, since it does not change the variance.
• Without importance sampling, the quasi-Monte Carlo method converges at a rapid rate, but with a constant that is larger than for the method with importance sampling.
• For quasi-Monte Carlo, the error for unsmoothed acceptance-rejection decreases at a slower rate, because of discontinuities involved in the acceptance-rejection decision.
• For quasi-Monte Carlo, the rapid convergence rate is regained using smoothing.

Although smoothed acceptance-rejection regains much of the effectiveness of quasi-Monte Carlo, it entails a loss of efficiency. Since fractional weights are involved, more accepted samples are required to get total weight $N$. Moreover, we do not have an effective quasi-Monte Carlo method for the Metropolis algorithm (Kalos and Whitlock 1986), which is a stochastic process involving acceptance-rejection.

6.2. Dimension reduction: Brownian bridge method

For problems in a dimension of moderate size, quasi-Monte Carlo provides a significant speed-up over standard Monte Carlo. Many of the most important applications of Monte Carlo, however, involve integrals of a very high dimension. One class of examples comprises path integrals involving Brownian motion $b(t)$. A typical example is a Feynman-Kac integral (Karatzas and Shreve 1991) of the form

$$I = E\left[f(b(t))\,\exp\left(\int_0^t \lambda(b(s), s)\,ds\right)\right]. \qquad (6.12)$$

In order to evaluate this expectation by Monte Carlo, we need to discretize the time in the Brownian motion to obtain a random walk with $M$ steps. Then the integral becomes an integral over the $M$ steps of a random walk, each of which is distributed by a normal distribution. An accurate representation of the integral requires a large number $M$ of time-steps, which results in an integral of large dimension $M$. Because of the high dimension, we find that quasi-Monte Carlo loses much of its effectiveness on such a problem. The main point of this section is to show how a rearrangement of the dimensions can regain the effectiveness of quasi-Monte Carlo for this problem.

The standard discretization of a random walk is to represent the position $b(t + \Delta t)$ in terms of the previous position $b(t)$ by the formula

$$b(t + \Delta t) = b(t) + \sqrt{\Delta t}\,\nu, \qquad (6.13)$$

in which $\nu$ is an $N(0,1)$ random variable. Using a sequence of independent samples of $\nu$, we can generate the random walk sequentially by

$$y_0 = 0, \quad y_1 = b(\Delta t), \quad y_2 = b(2\Delta t), \quad \dots. \qquad (6.14)$$

This representation leads to the $M$-dimensional integral discussed above.
The standard discretization leads to the M-dimensional integral discussed above.

[Fig. 7. Standard discretization of random walk.]

The effectiveness of quasi-Monte Carlo will be regained using an alternative representation of the random walk, the Brownian bridge discretization, which was first introduced as a quasi-Monte Carlo technique by Moskowitz and Caflisch (1996). This representation relies on the following Brownian bridge formula (Karatzas and Shreve 1991) for b(t + Δt₁), given b(t) and b(T), with T = t + Δt₁ + Δt₂:

$$b(t + \Delta t_1) = \alpha\,b(t) + (1 - \alpha)\,b(T) + c\,\nu, \qquad (6.15)$$

in which ν is an independent N(0,1) variable and

$$\alpha = \Delta t_2/(\Delta t_1 + \Delta t_2), \qquad c = \sqrt{\Delta t_1 \Delta t_2/(\Delta t_1 + \Delta t_2)}. \qquad (6.16)$$

Using this representation, the random walk can be generated by successive subdivision. Suppose for simplicity that M is a power of 2. Then generate the random walk in the following order:

$$y_0 = 0, \quad y_M, \quad y_{M/2}, \quad y_{M/4}, \quad y_{3M/4}, \ \ldots. \qquad (6.17)$$

The standard discretization and the Brownian bridge discretization are represented schematically in Figures 7 and 8.

[Fig. 8. Brownian bridge discretization of random walk.]

The significance of this representation is that it first chooses the large time-steps, over which the changes in b(t) are large. Then it fills in the small time-steps in between, in which the changes in b(t) are quite small. The advantage of this representation is that it concentrates the variance into the early, large time-steps. This is much like a principal component analysis (Acworth, Broadie and Glasserman 1997). Although the actual dimension of the problem is not changed, in some sense the effective dimension of the problem is lowered, so that quasi-Monte Carlo retains its effectiveness.

To make this statement quantitative, suppose that, at some given value of N, the discrepancy is of size N⁻¹ for dimension d (omitting logarithmic terms for simplicity), but is of size N^(-1/2) for the remaining dimensions. We expect that the integration error is roughly of the size of the variance times the discrepancy. Using the Brownian bridge discretization, the variance over the first d dimensions is σ₀, which is about the same size as the original value of σ, whereas the variance over the remaining M − d dimensions is σ₁, which is much smaller; that is,

$$\sigma_1 \ll \sigma_0 \approx \sigma. \qquad (6.18)$$

Denote by ε_s and ε_bb the errors for quasi-Monte Carlo using the standard discretization and the Brownian bridge discretization, respectively. Then, approximately,

$$\varepsilon_s = \sigma N^{-1/2}, \qquad \varepsilon_{bb} = \sigma_0 N^{-1} + \sigma_1 N^{-1/2}. \qquad (6.19)$$

[Fig. 9. Log-log plot for Feynman-Kac integral, 75 runs, T = 0.08, M = 32. Observed convergence rates: pseudo-random, standard discretization -.51; Halton sequence, standard discretization -.77; pseudo-random, alternative discretization -.50; Halton sequence, alternative discretization -.95.]

The ordering (6.18) then implies that

$$\varepsilon_{bb} \ll \varepsilon_s. \qquad (6.20)$$

This shows that the Brownian bridge discretization can provide a significant improvement in quasi-Monte Carlo integration for path integral problems.

As an example, consider an integral of the form (6.12), in which

$$\lambda(x,t) = \frac{1}{1+t} + \frac{1}{1+x^2} - \frac{4x^2}{(1+x^2)^2}. \qquad (6.21)$$

According to the Feynman-Kac formula (Karatzas and Shreve 1991), the integral I = F(x, t), in which x is the starting point of the Brownian motion, solves the following linear parabolic differential equation:

$$\frac{\partial F}{\partial t}(x,t) = \frac{1}{2}\frac{\partial^2 F}{\partial x^2}(x,t) + \lambda(x,t)\,F(x,t), \qquad F(x,0) = f(x). \qquad (6.22)$$

The exact solution is F(x,t) = (t + 1)(x² + 1)⁻¹.
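A minimal Python sketch of the Brownian bridge construction (6.15)-(6.17), under the same conventions as the sketch above (M a power of 2, inverse-CDF normals; the function name is ours):

    import numpy as np
    from scipy.stats import norm

    def brownian_bridge_path(u, T=1.0):
        # Map u in (0,1)^M (M a power of 2) to a Brownian path on [0, T],
        # spending the first coordinates of u on the largest time scales.
        M = len(u)
        t = np.linspace(0.0, T, M + 1)
        z = norm.ppf(np.asarray(u))    # independent N(0,1) variates
        b = np.zeros(M + 1)
        b[M] = np.sqrt(T) * z[0]       # endpoint first: most of the variance
        k, step = 1, M
        while step > 1:                # successive subdivision, cf. (6.17)
            for left in range(0, M, step):
                right, mid = left + step, left + step // 2
                dt1, dt2 = t[mid] - t[left], t[right] - t[mid]
                alpha = dt2 / (dt1 + dt2)                   # (6.16)
                c = np.sqrt(dt1 * dt2 / (dt1 + dt2))
                # conditional mean plus bridge noise, cf. (6.15)
                b[mid] = alpha * b[left] + (1 - alpha) * b[right] + c * z[k]
                k += 1
            step //= 2
        return t, b

Both constructions produce paths with exactly the same distribution; only the assignment of the coordinates of u to features of the path differs, which is why the pseudo-random errors below are unchanged while the quasi-random errors improve.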
Computational results for this problem from Moskowitz and Caflisch (1996) are presented in Figure 9, at time T = 0.08 with M = 32. The results show the following.

• For Monte Carlo using pseudo-random numbers, the error is the same for the standard and Brownian bridge discretizations. This is because the variance of the two is the same.
• For quasi-Monte Carlo with the standard discretization, the error is only a little smaller than for standard Monte Carlo; that is, the effectiveness of quasi-Monte Carlo has been lost for this high dimension (M = 32).
• With the Brownian bridge discretization, the error for quasi-Monte Carlo is substantially reduced, in terms of both the rate of convergence and the constant; that is, the effectiveness of quasi-Monte Carlo has been regained.

It is necessary to use the transformation method (Section 3.3) rather than the Box-Muller method (Section 3.4), since the latter method has large gradients (Morokoff and Caflisch 1993). Similar results have been obtained on problems from computational finance by Caflisch, Morokoff and Owen (1997b).

7. Monte Carlo methods for rarefied gas dynamics

Computations for rarefied gas dynamics, as occurs in the outer parts of the atmosphere, are difficult, since the gas is represented by a distribution function over space, time and molecular velocity. For general problems, the only effective method has been a Monte Carlo method, as described below. One reason for including this application of Monte Carlo in this survey is that the formulation of an effective Monte Carlo method in the fluid dynamic limit is an open problem.

7.1. The Boltzmann equation

The kinetic theory of gases describes the behaviour of a gas in which the density is not too large, so that the only interactions between gas particles are binary collisions. The resulting nonlinear Boltzmann equation, for the molecular distribution function F(x, ξ, t), is a basic equation of nonequilibrium statistical mechanics (Cercignani 1988, Chapman and Cowling 1970):

$$\frac{\partial F}{\partial t} + \xi\cdot\nabla_x F = \varepsilon^{-1}\,Q(F,F). \qquad (7.1)$$

The collision operator is

$$Q(F,F)(\xi) = \int \big(F(\xi_1')F(\xi') - F(\xi_1)F(\xi)\big)\,B(\omega, |\xi - \xi_1|)\,d\omega\,d\xi_1, \qquad (7.2)$$

in which ξ, ξ₁ represent velocities before a collision, ξ', ξ₁' represent velocities after a collision, and the collision parameters are represented by ω ∈ S², where S² is the unit sphere in ℝ³. The parameter ε is a measure of the 'mean free time', which is defined as the characteristic time between collisions of a particle.

For a molecular distribution F, the macroscopic (or fluid dynamic) variables are the density ρ, velocity u and temperature T, defined by

$$\rho = \int F\,d\xi, \qquad u = \rho^{-1}\int \xi\,F\,d\xi, \qquad T = (3\rho)^{-1}\int |\xi - u|^2\,F\,d\xi. \qquad (7.3)$$

The importance of these moments of F is that they correspond to conserved quantities, namely mass, momentum and energy, for the collisional process, which is expressed as follows:

$$\int Q(F,F)\,d\xi = 0, \qquad \int \xi\,Q(F,F)\,d\xi = 0, \qquad \int |\xi|^2\,Q(F,F)\,d\xi = 0. \qquad (7.4)$$

These also provide the degrees of freedom for the equilibrium case, the Maxwellian distributions

$$M(\xi; \rho, u, T) = \rho\,(2\pi T)^{-3/2}\exp\big(-|\xi - u|^2/2T\big). \qquad (7.5)$$
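For a finite ensemble of particles, the moments (7.3) become sample averages. A minimal sketch, under our own convention that the velocities are treated as samples from the normalized distribution F/ρ, with the density ρ tracked separately; the function name is ours.

    import numpy as np

    def fluid_moments(xi):
        # Macroscopic variables (7.3) from velocity samples xi of shape (N, 3).
        u = xi.mean(axis=0)                            # bulk velocity
        T = ((xi - u) ** 2).sum(axis=1).mean() / 3.0   # temperature
        return u, T

    # Consistency check against the Maxwellian (7.5): each component of
    # xi - u is N(0, T), so sampling from M(xi; rho, u, T) recovers u and T.
    rng = np.random.default_rng(0)
    xi = rng.normal(loc=[1.0, 0.0, 0.0], scale=np.sqrt(2.0), size=(200_000, 3))
    print(fluid_moments(xi))   # approximately (array([1., 0., 0.]), 2.0)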
In the limit ε → 0, the solution F of (7.1) is given by the Hilbert (or Chapman-Enskog) expansion, in which the leading term is a Maxwellian distribution of the form (7.5), with ρ, u, T satisfying the compressible Euler (or Navier-Stokes) equations of fluid dynamics (Chapman and Cowling 1970). The Euler equations are

$$\rho_t + \nabla\cdot(u\rho) = 0, \qquad (\rho u)_t + \nabla\cdot(uu\rho) + \nabla(\rho T) = 0, \qquad \big(\tfrac12\rho|u|^2 + \tfrac32\rho T\big)_t + \nabla\cdot\big(u(\tfrac12\rho|u|^2 + \tfrac52\rho T)\big) = 0. \qquad (7.6)$$

This expansion is not valid in layers around shocks, boundaries and nonequilibrium initial data, where special boundary, shock or initial layer expansions can be constructed (Caflisch 1983). By varying this limit, taking the velocity u to be of size ε as well, the compressible equations are replaced by the incompressible fluid equations (Bardos, Golse and Levermore 1991, 1993, De Masi, Esposito and Lebowitz 1989).

7.2. Particle methods

In particle methods for transport theory, the distribution F(x, ξ, t) is represented as the sum

$$F_N(x, \xi, t) = \sum_{n=1}^{N} \delta(\xi - \xi_n(t))\,\delta(x - x_n(t)), \qquad (7.7)$$

and the positions x_n(t) and velocities ξ_n(t) are evolved in time to simulate the effects of convection and collisions. The most common particle methods use random collisions between a reasonable number of particles (e.g., 10³ to 10⁶) to simulate the dynamics of many particles (e.g., 10²³).

In the direct simulation Monte Carlo (DSMC) method pioneered by Bird (1976, 1978), the numerical method is designed to simulate the physical processes as closely as possible. This makes it easy to understand and to insert new physics; it is also numerically robust. First, space and time are discretized into spatial cells of size Δx³ and time-steps of duration Δt. In each time-step the evolution is divided into two steps: transport and collisions. For the transport step, each particle is moved from position x_n(t) to

$$x_n(t + \Delta t) = x_n(t) + \Delta t\,\xi_n(t).$$

In the collision step, random collisions are performed between particles within each spatial bin. In each collision, particles ξ_n and ξ_m are chosen randomly from the full set of particles in a bin, with probability p_mn given by

$$p_{mn} = \frac{S(|\xi_m - \xi_n|)}{\sum_{1 \le i < j \le N_0} S(|\xi_i - \xi_j|)}, \qquad (7.8)$$

in which N₀ is the number of particles in the spatial bin, and

$$S(|\xi_i - \xi_j|) = \int_{S^2} B(\omega, |\xi_i - \xi_j|)\,d\omega \qquad (7.9)$$

is the total collision rate between particles of velocity ξ_i and ξ_j. As written, this choice requires O(N²) operations to evaluate the sum in the denominator of (7.8). By a standard acceptance-rejection scheme, however, each choice can be made in O(1) steps, so that the total method requires only O(N) steps. Next, the collision parameters ω are randomly chosen from a uniform distribution on S². The outcome of the collision is two new velocities ξ'_n and ξ'_m, which replace the old velocities ξ_n and ξ_m.

The number of collisions performed in each time-step has been determined by several methods. In the original 'time-counter' (TC) method, a collision time Δt_c is determined, equal to the reciprocal of the frequency for that collision type; that is,

$$\Delta t_c = 2\,\big(n N_0 S(|\xi_m - \xi_n|)\big)^{-1}, \qquad (7.10)$$

in which n is the number density of particles. This is added to the time counter t_c = Σ Δt_c. In the time interval of length Δt beginning at t, collisions are continued until t_c exceeds the final time, that is, until t_c > t + Δt. For N particles this method has operation count O(N). The unlikely possibility of choosing a collision with very small frequency can result in a large collisional time-step, which may cause relatively large errors. To remove such errors, Bird (1976) developed a 'no time-counter' (NTC) method. This method uses a maximum collision probability S_max. The number of collisions to be performed in time-step Δt is chosen as if the collision frequency were exactly S_max for all collision pairs. For each selection of a collision pair, the collision between ξ_m and ξ_n is then performed with probability S(|ξ_m − ξ_n|)/S_max, as in an acceptance-rejection scheme.
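A minimal sketch of one NTC collision step in a single spatial bin, assuming hard-sphere-like scattering so that the post-collision relative velocity is rotated uniformly over S². The candidate-pair count below follows Bird's NTC estimate, roughly (1/2)N₀ n S_max Δt, whose normalization depends on conventions; all names here are ours.

    import numpy as np

    def ntc_collision_step(xi, n, dt, S, S_max, rng):
        # xi: (N0, 3) velocities in one bin; n: number density; S(g): total
        # collision rate (7.9) at relative speed g; S_max: bound with S <= S_max.
        N0 = len(xi)
        candidates = int(0.5 * N0 * n * S_max * dt)  # as if every pair had rate S_max
        for _ in range(candidates):
            m, k = rng.choice(N0, size=2, replace=False)
            g = np.linalg.norm(xi[m] - xi[k])        # relative speed
            if rng.random() < S(g) / S_max:          # acceptance-rejection thinning
                omega = rng.normal(size=3)
                omega /= np.linalg.norm(omega)       # uniform direction on S^2
                center = 0.5 * (xi[m] + xi[k])       # conserves momentum and energy
                xi[m], xi[k] = center + 0.5 * g * omega, center - 0.5 * g * omega
        return xi

Because rejected candidates cost O(1) each, the step remains O(N) per bin, in contrast to evaluating the full pair sum in (7.8).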
The DSMC method with the time-counter or no-time-counter algorithm has been enormously successful. A rigorous convergence result for DSMC was proved by Wagner (1992). Several related methods have been developed by other researchers, including Koura (1986), Nanbu (1986), and Goldstein, Sturtevant and Broadwell (1988). Additional modifications of Nanbu's method, including the use of quasi-random sequences, have been developed by Babovsky, Gropengiesser, Neunzert, Struckmeier and Wiesen (1990).

7.3. Methods with the correct diffusion limit

One of the most difficult problems of transport theory is that there may be wide variation of mean free paths within a single problem. In regions where the mean free path is large, there are few collisions, so that large numerical time-steps may be taken, whereas in regions where the mean free path is small, the time- and space-steps must be small. Thus the highly collisional regions determine the numerical time-step, which may make computations impractically slow. On the other hand, much of the extra effort in the collisional regions seems wasted, since in those regions the gas will be nearly in fluid dynamic equilibrium, so that a fluid dynamic description (7.6) should be valid.

A partial remedy to this problem is to use a numerical method that turns into a numerical method for the correct fluid equations in regions where the mean free time is small. This allows large, fluid dynamic time-steps in the collisional region and large collisional time-steps in the region of large mean free time.

Consider a discretization of the Boltzmann equation (7.1) with discrete space and time scales Δx and Δt. If Δx, Δt are much smaller than ε, then the Boltzmann solution is well resolved. On the other hand, in regions of small mean free path, we want to let the discretization scale be much larger than the collision scale ε. So we consider the limit in which the numerical parameters Δx, Δt are held fixed, while ε → 0. We say that the discretized equation has the correct diffusion limit if, in this limit, the solution of the discrete equations for (7.1) goes to the solution of a discretization of the fluid equation (7.6).

In the context of linear neutron transport, Larsen and co-workers (Larsen, Morel and Miller 1987, Borgers, Larsen and Adams 1992) have investigated the diffusion limit for a variety of difference schemes and have found that many of them have the correct diffusion limit only in special regimes. They have constructed some alternative methods that always have the correct diffusion limit. Jin and Levermore (1993) have applied a similar procedure to an interface problem to get a constraint on the quadrature set for the discretization of a scattering integral. Another class of methods that uses information from the diffusion limit to improve numerical transport methods is the family of 'diffusion synthetic acceleration' methods (Larsen 1984).

For the Broadwell model of the Boltzmann equation, finite difference numerical methods with the correct diffusion limit have been developed (Caflisch, Jin and Russo 1997a, Jin, Pareschi and Toscani 1998). These use implicit differencing for the collision step, because it is a stiff problem with rate ε⁻¹.
For the nonlinear Boltzmann equation, on the other hand, Monte Carlo methods are the most practical, because of the large number of degrees of freedom. Application of such methods would require some kind of implicit step to handle the collisions, but no such method has been formulated so far. We believe that a method that uses fluid dynamic information to improve the collisional step for rarefied gas dynamic computations would have great impact. Some important partial steps in this direction have been taken (Gabetta, Pareschi and Toscani 1997), but they have not yet been extended to particle methods such as DSMC.

REFERENCES

P. Acworth, M. Broadie and P. Glasserman (1997), A comparison of some Monte Carlo and quasi-Monte Carlo techniques for option pricing, in Monte Carlo and Quasi-Monte Carlo Methods 1996 (G. Larcher, H. Niederreiter, P. Hellekalek and P. Zinterhof, eds), Springer.
H. Babovsky, F. Gropengiesser, H. Neunzert, J. Struckmeier and J. Wiesen (1990), 'Application of well-distributed sequences to the numerical simulation of the Boltzmann equation', J. Comput. Appl. Math. 31, 15-22.
C. Bardos, F. Golse and C. D. Levermore (1991), 'Fluid dynamic limits of kinetic equations: I. Formal derivations', J. Statist. Phys. 63, 323-344.
C. Bardos, F. Golse and C. D. Levermore (1993), 'Fluid dynamic limits of kinetic equations: II. Convergence proofs for the Boltzmann equation', Comm. Pure Appl. Math. 46, 667-753.
G. A. Bird (1976), Molecular Gas Dynamics, Oxford University Press.
G. A. Bird (1978), 'Monte Carlo simulation of gas flows', Ann. Rev. Fluid Mech. 10, 11-31.
C. Borgers, E. W. Larsen and M. L. Adams (1992), 'The asymptotic diffusion limit of a linear discontinuous discretization of a two-dimensional linear transport equation', J. Comput. Phys. 98, 285-300.
P. Bratley, B. L. Fox and H. Niederreiter (1994), 'Algorithm 738: Programs to generate Niederreiter's low-discrepancy sequences', ACM Trans. Math. Software 20, 494-495.
R. E. Caflisch (1983), Fluid dynamics and the Boltzmann equation, in Nonequilibrium Phenomena I: The Boltzmann Equation (E. W. Montroll and J. L. Lebowitz, eds), Vol. 10 of Studies in Statistical Mechanics, North Holland, pp. 193-223.
R. E. Caflisch, S. Jin and G. Russo (1997a), 'Uniformly accurate schemes for hyperbolic systems with relaxation', SIAM J. Numer. Anal. 34, 246-281.
R. E. Caflisch, W. Morokoff and A. B. Owen (1997b), 'Valuation of mortgage-backed securities using Brownian bridges to reduce effective dimension', J. Comput. Finance 1, 27-46.
C. Cercignani (1988), The Boltzmann Equation and its Applications, Springer.
S. Chapman and T. G. Cowling (1970), The Mathematical Theory of Non-Uniform Gases, Cambridge University Press.
A. De Masi, R. Esposito and J. L. Lebowitz (1989), 'Incompressible Navier-Stokes and Euler limits of the Boltzmann equation', Comm. Pure Appl. Math. 42, 1189-1214.
H. Faure (1982), 'Discrépance de suites associées à un système de numération (en dimension s)', Acta Arithmetica 41, 337-351.
W. Feller (1971), An Introduction to Probability Theory and its Applications: Vol. I, Wiley.
E. Gabetta, L. Pareschi and G. Toscani (1997), 'Relaxation schemes for nonlinear kinetic equations', SIAM J. Numer. Anal. 34, 2168-2194.
D. Goldstein, B. Sturtevant and J. E. Broadwell (1988), Investigations of the motion of discrete-velocity gases, in Rarefied Gas Dynamics: Theoretical and Computational Techniques (E. P. Muntz, D. P. Weaver and D. H. Campbell, eds), Vol. 118 of Progress in Astronautics and Aeronautics, Proceedings of the 16th International Symposium on Rarefied Gas Dynamics, pp. 100-117.
J. H. Halton (1960), 'On the efficiency of certain quasi-random sequences of points in evaluating multi-dimensional integrals', Numer. Math. 2, 84-90.
J. M. Hammersley and D. C. Handscomb (1965), Monte Carlo Methods, Wiley, New York.
C. Haselgrove (1961), 'A method for numerical integration', Math. Comput. 15, 323-337.
F. J. Hickernell (1996), 'The mean square discrepancy of randomized nets', ACM Trans. Model. Comput. Simul. 6, 274-296.
F. J. Hickernell (1998), 'A generalized discrepancy and quadrature error bound', Math. Comput. 67, 299-322.
R. V. Hogg and A. T. Craig (1995), Introduction to Mathematical Statistics, Prentice Hall.
L. K. Hua and Y. Wang (1981), Applications of Number Theory to Numerical Analysis, Springer, Berlin/New York.
S. Jin and C. D. Levermore (1993), 'Fully-discrete numerical transfer in diffusive regimes', Transp. Theory Statist. Phys. 22, 739-791.
S. Jin, L. Pareschi and G. Toscani (1998), 'Diffusive relaxation schemes for discrete-velocity kinetic equations'. Preprint.
M. H. Kalos and P. A. Whitlock (1986), Monte Carlo Methods, Vol. I: Basics, Wiley, New York.
I. Karatzas and S. E. Shreve (1991), Brownian Motion and Stochastic Calculus, Springer, New York.
W. J. Kennedy and J. E. Gentle (1980), Statistical Computing, Dekker, New York.
K. Koura (1986), 'Null-collision technique in the direct-simulation Monte Carlo method', Phys. Fluids 29, 3509-3511.
L. Kuipers and H. Niederreiter (1974), Uniform Distribution of Sequences, Wiley, New York.
E. W. Larsen (1984), 'Diffusion-synthetic acceleration methods for discrete-ordinates problems', Transp. Theory Statist. Phys. 13, 107-126.
E. W. Larsen, J. E. Morel and W. F. Miller (1987), 'Asymptotic solutions of numerical transport problems in optically thick, diffusive regimes', J. Comput. Phys. 69, 283-324.
G. Lepage (1978), 'A new algorithm for adaptive multidimensional integration', J. Comput. Phys. 27, 192-203.
G. Marsaglia (1991), 'Normal (Gaussian) random variables for supercomputers', J. Supercomputing 5, 49-55.
W. Morokoff and R. E. Caflisch (1993), 'A quasi-Monte Carlo approach to particle simulation of the heat equation', SIAM J. Numer. Anal. 30, 1558-1573.
W. Morokoff and R. E. Caflisch (1994), 'Quasi-random sequences and their discrepancies', SIAM J. Sci. Statist. Comput. 15, 1251-1279.
W. Morokoff and R. E. Caflisch (1995), 'Quasi-Monte Carlo integration', J. Comput. Phys. 122, 218-230.
B. Moskowitz and R. E. Caflisch (1996), 'Smoothness and dimension reduction in quasi-Monte Carlo methods', J. Math. Comput. Modeling 23, 37-54.
K. Nanbu (1986), Theoretical basis of the direct simulation Monte Carlo method, in Proceedings of the 15th International Symposium on Rarefied Gas Dynamics (V. Boffi and C. Cercignani, eds), Vol. I, pp. 369-383.
H. Niederreiter (1992), Random Number Generation and Quasi-Monte Carlo Methods, SIAM, Philadelphia, PA.
A. B. Owen (1995), Randomly permuted (t, m, s)-nets and (t, s)-sequences, in Monte Carlo and Quasi-Monte Carlo Methods in Scientific Computing (H. Niederreiter and P. J.-S. Shiue, eds), Springer, New York, pp. 299-317.
A. B. Owen (1997), 'Monte Carlo variance of scrambled net quadrature', SIAM J. Numer. Anal. 34, 1884-1910.
A. B. Owen (1998), 'Scrambled net variance for integrals of smooth functions', Ann. Statist. In press.
W. H. Press and S. A. Teukolsky (1992), 'Portable random number generators', Comput. Phys. 6, 522-524.
W. H. Press, S. A. Teukolsky, W. T. Vetterling and B. P. Flannery (1992), Numerical Recipes in C: The Art of Scientific Computing, 2nd edn, Cambridge University Press.
D. I. Pullin (1979), 'Generation of normal variates with given sample mean and variance', J. Statist. Comput. Simul. 9, 303-309.
I. M. Sobol' (1967), 'The distribution of points in a cube and the accurate evaluation of integrals', Zh. Vychisl. Mat. Mat. Fiz. 7, 784-802. In Russian.
I. M. Sobol' (1976), 'Uniformly distributed sequences with additional uniformity property', USSR Comput. Math. Math. Phys. 16, 1332-1337.
J. Spanier and E. H. Maize (1994), 'Quasi-random methods for estimating integrals using relatively small samples', SIAM Rev. 36, 18-44.
W. Wagner (1992), 'A convergence proof for Bird's direct simulation Monte Carlo method for the Boltzmann equation', J. Statist. Phys. 66, 1011-1044.
H. Wozniakowski (1991), 'Average case complexity of multivariate integration', Bull. Amer. Math. Soc. 24, 185-194.
C. P. Xing and H. Niederreiter (1995), 'A construction of low-discrepancy sequences using global function fields', Acta Arithmetica 73, 87-102.
S. K. Zaremba (1968), 'The mathematical basis of Monte Carlo and quasi-Monte Carlo methods', SIAM Rev. 10, 303-314.