Driver
Contents
Part Homework Problems

-1 Math 280B Homework Problems
   -1.1 Homework 1. Due Monday, January 22, 2007
   -1.2 Homework 2. Due Monday, January 29, 2007
   -1.3 Homework #3 Due Monday, February 5, 2007
   -1.4 Homework #4 Solutions (Due Friday, February 16, 2007)
   -1.5 Homework #5 Solutions (Due Friday, February 23, 2007)
   -1.6 Homework #6 Solutions (Due Friday, March 2, 2007)
0 Math 280A Homework Problems
   0.1 Homework 1. Due Friday, September 29, 2006
   0.2 Homework 2. Due Friday, October 6, 2006
   0.3 Homework 3. Due Friday, October 13, 2006
   0.4 Homework 4. Due Friday, October 20, 2006
   0.5 Homework 5. Due Friday, October 27, 2006
   0.6 Homework 6. Due Friday, November 3, 2006
   0.7 Homework 7. Due Monday, November 13, 2006
      0.7.1 Corrections and comments on Homework 7 (280A)
   0.8 Homework 8. Due Monday, November 27, 2006
   0.9 Homework 9. Due Noon, on Wednesday, December 6, 2006
Part II Formal Development

3 Preliminaries
   3.1 Set Operations
   3.2 Exercises
   3.3 Algebraic sub-structures of sets
4 Finitely Additive Measures
   4.1 Finitely Additive Measures
   4.2 Examples of Measures
   4.3 Simple Integration
   4.4 Simple Independence and the Weak Law of Large Numbers
   4.5 Constructing Finitely Additive Measures
5 Countably Additive Measures
   5.1 Distribution Function for Probability Measures on (R, B_R)
   5.2 Construction of Premeasures
   5.3 Regularity and Uniqueness Results
   5.4 Construction of Measures
   5.5 Completions of Measure Spaces
   5.6 A Baby Version of Kolmogorov's Extension Theorem
6 Random Variables
   6.1 Measurable Functions
   6.2 Factoring Random Variables
7 Independence
   7.1 π–λ and Monotone Class Theorems
      7.1.1 The Monotone Class Theorem
   7.2 Basic Properties of Independence
      7.2.1 An Example of Ranks
   7.3 Borel-Cantelli Lemmas
   7.4 Kolmogorov and Hewitt-Savage Zero-One Laws
8 Integration Theory
   8.1 A Quick Introduction to Lebesgue Integration Theory
   8.2 Integrals of positive functions
   8.3 Integrals of Complex Valued Functions
   8.4 Densities and Change of Variables Theorems
   8.5 Measurability on Complete Measure Spaces
   8.6 Comparison of the Lebesgue and the Riemann Integral
   8.7 Exercises
      8.7.1 Laws of Large Numbers Exercises
9 Functional Forms of the π–λ Theorem
10 Multiple and Iterated Integrals
   10.1 Iterated Integrals
   10.2 Tonelli's Theorem and Product Measure
   10.3 Fubini's Theorem
   10.4 Fubini's Theorem and Completions
   10.5 Lebesgue Measure on R^d and the Change of Variables Theorem
   10.6 The Polar Decomposition of Lebesgue Measure
   10.7 More Spherical Coordinates
   10.8 Exercises
11 L^p spaces
   11.1 Modes of Convergence
   11.2 Jensen's, Hölder's and Minkowski's Inequalities
   11.3 Completeness of L^p spaces
   11.4 Relationships between different L^p spaces
      11.4.1 Summary
   11.5 Uniform Integrability
   11.6 Exercises
   11.7 Appendix: Convex Functions
Part III Convergence Results

12 Laws of Large Numbers
   12.1 Main Results
   12.2 Examples
      12.2.1 Random Series Examples
      12.2.2 A WLLN Example
   12.3 Strong Law of Large Number Examples
   12.4 More on the Weak Laws of Large Numbers
   12.5 Maximal Inequalities
   12.6 Kolmogorov's Convergence Criteria and the SLLN
   12.7 Strong Law of Large Numbers
   12.8 Necessity Proof of Kolmogorov's Three Series Theorem
13 Weak Convergence Results
   13.1 Total Variation Distance
   13.2 Weak Convergence
   13.3 Derived Weak Convergence
   13.4 Skorohod and the Convergence of Types Theorems
   13.5 Weak Convergence Examples
   13.6 Compactness and Tightness
   13.7 Weak Convergence in Metric Spaces
14 Characteristic Functions (Fourier Transform)
   14.1 Basic Properties of the Characteristic Function
   14.2 Examples
   14.3 Continuity Theorem
   14.4 A Fourier Transform Inversion Formula
   14.5 Exercises
   14.6 Appendix: Bochner's Theorem
   14.7 Appendix: A Multi-dimensional Weierstrass Approximation Theorem
   14.8 Appendix: Some Calculus Estimates
15 Weak Convergence of Random Sums
   15.1 Infinitely Divisible and Stable Symmetric Distributions
      15.1.1 Stable Laws

Part IV Conditional Expectations and Martingales

16 Hilbert Space Basics
17 The Radon-Nikodym Theorem
18 Conditional Expectation
   18.1 Examples
   18.2 Additional Properties of Conditional Expectations
   18.3 Regular Conditional Distributions
   18.4 Appendix: Standard Borel Spaces
References
Part
Homework Problems:
Note that P(s) = ∑_{k=0}^∞ p_k s^k is well defined and continuous (by DCT) for s ∈ [−1, 1]. So the derivative makes sense to compute for s ∈ (−1, 1) with no qualifications. When s = 1 you should interpret the derivative as the one-sided derivative

   (d/ds)|_{s=1} P(s) := lim_{h↓0} [P(1) − P(1 − h)]/h,

and you will need to allow for this limit to be infinite in case ∑_{k=1}^∞ k p_k = ∞. In computing (d/ds)|_{s=1} P(s), you may wish to use the fact (draw a picture or give a calculus proof) that

   (1 − s^k)/(1 − s) increases to k as s ↑ 1.

Hint for Exercise 8.20: Start by observing that

   E[(S_n/n − μ)^4] = (1/n^4) E[(∑_{k=1}^n (X_k − μ))^4].

Then analyze for which groups of indices (k, j, l, p), E[(X_k − μ)(X_j − μ)(X_l − μ)(X_p − μ)] ≠ 0.
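As a quick numerical sanity check of the first hint (my own illustration, not part of the assignment), the sketch below takes p_k geometric, an arbitrary choice, and watches the one-sided difference quotients climb toward ∑_k k p_k:

```python
# Numerical illustration (assumption: p_k = (1-r) * r**k, a geometric
# distribution chosen only as an example) that the one-sided difference
# quotients (P(1) - P(1-h)) / h approach sum_k k*p_k as h -> 0.
r = 0.5
K = 10_000                                   # truncation level for the series
p = [(1 - r) * r**k for k in range(K)]

def P(s: float) -> float:
    """Generating function P(s) = sum_k p_k s^k (truncated at K terms)."""
    return sum(pk * s**k for k, pk in enumerate(p))

mean = sum(k * pk for k, pk in enumerate(p))  # sum_k k*p_k = r/(1-r) = 1 here

for h in [0.1, 0.01, 0.001]:
    dq = (P(1.0) - P(1.0 - h)) / h
    print(f"h={h:6g}  difference quotient={dq:.6f}  target={mean:.6f}")
```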
* For Problem 18, please add the missing assumption that the random variables should have mean zero. (The assertion to prove is false without this assumption.) With this assumption, Var(X) = E[X^2]. Also note that Cov(X, Y) = 0 is equivalent to E[XY] = EX · EY; indeed, Cov(X, Y) := E[XY] − EX · EY.
Part I
Background Material
1 Limsups, Liminfs and Extended Limits

Lemma 1.2. Suppose that {a_n}_{n=1}^∞ and {b_n}_{n=1}^∞ are convergent sequences in R̄ := R ∪ {±∞}. Then:

1. If a_n ≤ b_n for(1) a.a. n, then lim_{n→∞} a_n ≤ lim_{n→∞} b_n.
2. If c ∈ R, then lim_{n→∞} (c a_n) = c lim_{n→∞} a_n.
3. {a_n + b_n}_{n=1}^∞ is convergent and

   lim_{n→∞} (a_n + b_n) = lim_{n→∞} a_n + lim_{n→∞} b_n    (1.1)

   provided the right side is not of the form ∞ − ∞.
4. {a_n · b_n}_{n=1}^∞ is convergent and

   lim_{n→∞} (a_n · b_n) = lim_{n→∞} a_n · lim_{n→∞} b_n    (1.2)

   provided the right hand side is not of the form 0 · ∞ or ∞ · 0.

(1) Here we use "a.a. n" as an abbreviation for almost all n. So a_n ≤ b_n a.a. n iff there exists N < ∞ such that a_n ≤ b_n for all n ≥ N.

Before going to the proof consider the simple example where a_n = n and b_n = −αn with α > 0. Then

   lim_{n→∞} (a_n + b_n) = ∞ if α < 1, 0 if α = 1, and −∞ if α > 1,

while

   lim_{n→∞} a_n + lim_{n→∞} b_n "=" ∞ − ∞.

This shows that the requirement that the right side of Eq. (1.1) is not of form ∞ − ∞ is necessary in Lemma 1.2. Similarly, considering the examples a_n = n and b_n = n^{−α} with α > 0 shows the necessity for assuming the right hand side of Eq. (1.2) is not of the form 0 · ∞.

Proof. The proofs of items 1. and 2. are left to the reader.

Proof of Eq. (1.1). Let a := lim_{n→∞} a_n and b = lim_{n→∞} b_n. Case 1., suppose b = ∞ in which case we must assume a > −∞. In this case, for every M > 0, there exists N such that b_n ≥ M and a_n ≥ a − 1 for all n ≥ N and this implies a_n + b_n ≥ M + a − 1 for all n ≥ N. Since M is arbitrary it follows that a_n + b_n → ∞ as n → ∞. The cases where b = −∞ or a = ±∞ are handled similarly. Case 2. If a, b ∈ R, then for every ε > 0 there exists N ∈ N such that |a − a_n| ≤ ε and |b − b_n| ≤ ε for all n ≥ N. Therefore,

   |a + b − (a_n + b_n)| = |a − a_n + b − b_n| ≤ |a − a_n| + |b − b_n| ≤ 2ε

for all n ≥ N. Since ε > 0 is arbitrary, it follows that lim_{n→∞} (a_n + b_n) = a + b.

Proof of Eq. (1.2). It will be left to the reader to prove the case where lim a_n and lim b_n exist in R. I will only consider the case where a = lim_{n→∞} a_n ≠ 0 and lim_{n→∞} b_n = ∞ here. Let us also suppose that a > 0 (the case a < 0 is handled similarly) and let α := min(a/2, 1). Given any M < ∞, there exists N ∈ N such that a_n ≥ α and b_n ≥ M for all n ≥ N and for this choice of N, a_n b_n ≥ M α for all n ≥ N. Since α > 0 is fixed and M is arbitrary it follows that lim_{n→∞} (a_n b_n) = ∞ as desired.

For any subset Λ ⊂ R̄, let sup Λ and inf Λ denote the least upper bound and greatest lower bound of Λ respectively. The convention being that sup Λ = ∞ if ∞ ∈ Λ or Λ is not bounded from above and inf Λ = −∞ if −∞ ∈ Λ or Λ is not bounded from below. We will also use the conventions that sup ∅ = −∞ and inf ∅ = +∞.

Notation 1.3 Suppose that {x_n}_{n=1}^∞ ⊂ R̄ is a sequence of numbers. Then

   lim inf_{n→∞} x_n = lim_{n→∞} inf{x_k : k ≥ n} and    (1.3)
   lim sup_{n→∞} x_n = lim_{n→∞} sup{x_k : k ≥ n}.    (1.4)

We will also write lim for lim inf_{n→∞} and lim for lim sup_{n→∞}.

Remark 1.4. Notice that if a_n := inf{x_k : k ≥ n} and b_n := sup{x_k : k ≥ n}, then {a_n} is an increasing sequence while {b_n} is a decreasing sequence. Therefore the limits in Eq. (1.3) and Eq. (1.4) always exist in R̄ and

   lim inf_{n→∞} x_n = sup_n inf{x_k : k ≥ n} and
   lim sup_{n→∞} x_n = inf_n sup{x_k : k ≥ n}.

The following proposition contains some basic properties of liminfs and limsups.

Proposition 1.5. Let {a_n}_{n=1}^∞ and {b_n}_{n=1}^∞ be two sequences of real numbers. Then

1. lim inf_{n→∞} a_n ≤ lim sup_{n→∞} a_n.
2. lim_{n→∞} a_n exists in R̄ iff lim inf_{n→∞} a_n = lim sup_{n→∞} a_n ∈ R̄, in which case the common value is lim_{n→∞} a_n.
3. lim sup_{n→∞} (a_n + b_n) ≤ lim sup_{n→∞} a_n + lim sup_{n→∞} b_n    (1.5)
   whenever the right side of this equation is not of the form ∞ − ∞.
4. If a_n ≥ 0 and b_n ≥ 0 for all n ∈ N, then
   lim sup_{n→∞} (a_n b_n) ≤ lim sup_{n→∞} a_n · lim sup_{n→∞} b_n,    (1.6)
   provided the right hand side of (1.6) is not of the form 0 · ∞ or ∞ · 0.

Proof. Items 1. and 2. will be proved here leaving the remaining items as an exercise to the reader.

Since inf{a_k : k ≥ n} ≤ sup{a_k : k ≥ n} for all n, it follows that lim inf_{n→∞} a_n ≤ lim sup_{n→∞} a_n.

Now suppose that lim inf_{n→∞} a_n = lim sup_{n→∞} a_n = a ∈ R̄. Then for all ε > 0, there is an integer N such that

   a − ε ≤ inf{a_k : k ≥ N} ≤ sup{a_k : k ≥ N} ≤ a + ε,

i.e. a − ε ≤ a_k ≤ a + ε for all k ≥ N. Hence by the definition of the limit, lim_{k→∞} a_k = a. If lim inf_{n→∞} a_n = ∞, then we know for all M ∈ (0, ∞) there is an integer N such that M ≤ inf{a_k : k ≥ N} and hence lim_{n→∞} a_n = ∞. The case where lim sup_{n→∞} a_n = −∞ is handled similarly.

Conversely, suppose that lim_{n→∞} a_n = A ∈ R̄ exists. If A ∈ R, then for every ε > 0 there exists N(ε) ∈ N such that |A − a_n| ≤ ε for all n ≥ N(ε), i.e. A − ε ≤ a_n ≤ A + ε for all n ≥ N(ε). From this we learn that

   A − ε ≤ lim inf_{n→∞} a_n ≤ lim sup_{n→∞} a_n ≤ A + ε.

Since ε > 0 is arbitrary, it follows that A = lim inf_{n→∞} a_n = lim sup_{n→∞} a_n. If A = ∞, then for all M > 0 there exists N = N(M) such that a_n ≥ M for all n ≥ N. This shows that lim inf_{n→∞} a_n ≥ M and since M is arbitrary it follows that

   ∞ ≤ lim inf_{n→∞} a_n ≤ lim sup_{n→∞} a_n.

The proof for the case A = −∞ is analogous to the A = ∞ case.
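As a quick numerical illustration of Notation 1.3 and Remark 1.4 (my own sketch, not part of the original notes), the following computes tail infima and suprema of a_n = (−1)^n (1 + 1/n), whose lim inf is −1 and lim sup is +1:

```python
# Tail infima increase to liminf = -1 and tail suprema decrease to
# limsup = +1 for a_n = (-1)**n * (1 + 1/n); min/max over a long finite
# truncation stand in for inf/sup here.
N = 10_000
a = [(-1) ** n * (1 + 1 / n) for n in range(1, N + 1)]

for n in [1, 10, 100, 1000]:
    tail = a[n - 1:]                  # {a_k : k >= n}
    print(n, min(tail), max(tail))    # inf increases to -1, sup decreases to 1
```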
Proposition 1.6 (Tonelli's theorem for sums). If {a_{kn}}_{k,n=1}^∞ is any sequence of non-negative numbers, then

   ∑_{k=1}^∞ ∑_{n=1}^∞ a_{kn} = ∑_{n=1}^∞ ∑_{k=1}^∞ a_{kn}.

Here we allow for one and hence both sides to be infinite.

Proof. Let

   M := sup{ ∑_{k=1}^K ∑_{n=1}^N a_{kn} : K, N ∈ N } = sup{ ∑_{n=1}^N ∑_{k=1}^K a_{kn} : K, N ∈ N }

and

   L := ∑_{k=1}^∞ ∑_{n=1}^∞ a_{kn}.

Since

   L = ∑_{k=1}^∞ ∑_{n=1}^∞ a_{kn} = lim_{K→∞} ∑_{k=1}^K ∑_{n=1}^∞ a_{kn} = lim_{K→∞} lim_{N→∞} ∑_{k=1}^K ∑_{n=1}^N a_{kn}

and ∑_{k=1}^K ∑_{n=1}^N a_{kn} ≤ M for all K and N, it follows that L ≤ M. Conversely,

   ∑_{k=1}^K ∑_{n=1}^N a_{kn} ≤ ∑_{k=1}^∞ ∑_{n=1}^∞ a_{kn} = L

and therefore taking the supremum of the left side of this inequality over K and N shows that M ≤ L. Thus we have shown

   ∑_{k=1}^∞ ∑_{n=1}^∞ a_{kn} = M.

By symmetry (or by a similar argument), we also have that ∑_{n=1}^∞ ∑_{k=1}^∞ a_{kn} = M and hence the proof is complete.
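The non-negativity hypothesis in Proposition 1.6 cannot be dropped; the following standard counterexample is added here for illustration. Take a_{kk} = 1 and a_{k+1,k} = −1 for all k ∈ N, with all other entries zero. Each column then sums to 0, while the first row sums to 1 and every other row sums to 0, so that ∑_{n=1}^∞ ∑_{k=1}^∞ a_{kn} = 0 ≠ 1 = ∑_{k=1}^∞ ∑_{n=1}^∞ a_{kn}.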
2 Basic Probabilistic Notions

Definition 2.1. A sample space Ω is a set which is to represent all possible outcomes of an "experiment."

Example 2.2.

1. The sample space for flipping a coin one time could be taken to be Ω = {0, 1}.
2. The sample space for flipping a coin N times could be taken to be Ω = {0, 1}^N and for flipping an infinite number of times, Ω = {ω = (ω_1, ω_2, . . . ) : ω_i ∈ {0, 1}} = {0, 1}^ℕ.
3. If we have a roulette wheel with 40 entries, then we might take Ω = {00, 0, 1, 2, . . . , 36} for one spin, Ω = {00, 0, 1, 2, . . . , 36}^N for N spins, and Ω = {00, 0, 1, 2, . . . , 36}^ℕ for an infinite number of spins.
4. If we throw darts at a board of radius R, we may take Ω = D_R := {(x, y) ∈ R^2 : x^2 + y^2 ≤ R^2} for one throw, Ω = D_R^N for N throws, and Ω = D_R^ℕ for an infinite number of throws.
5. Suppose we release a perfume particle at location x ∈ R^3 and follow its motion for all time, 0 ≤ t < ∞. In this case, we might take Ω = {ω ∈ C([0, ∞), R^3) : ω(0) = x}.

Definition 2.3. An event is a subset of Ω.

Example 2.4. Suppose that Ω = {0, 1}^ℕ is the sample space for flipping a coin an infinite number of times. Here ω_n = 1 represents the fact that a head was thrown on the n-th toss, while ω_n = 0 represents a tail on the n-th toss.

1. A = {ω : ω_3 = 1} represents the event that the third toss was a head.
2. A = ∪_{i=1}^∞ {ω : ω_i = ω_{i+1} = 1} represents the event that (at least) two heads are tossed in a row at some time.
3. A = ∩_{N=1}^∞ ∪_{n≥N} {ω : ω_n = 1} is the event where there are infinitely many heads tossed in the sequence.
4. A = ∪_{N=1}^∞ ∩_{n≥N} {ω : ω_n = 1} is the event where heads occurs from some time onwards, i.e. ω ∈ A iff there exists N = N(ω) such that ω_n = 1 for all n ≥ N.

Ideally we would like to assign a probability, P(A), to all events A ⊂ Ω. Given a physical experiment, we think of assigning this probability as follows. Run the experiment many times to get sample points, ω(n) ∈ Ω for each n ∈ N, then try to define P(A) by

   P(A) = lim_{N→∞} (1/N) #{1 ≤ k ≤ N : ω(k) ∈ A}.    (2.1)

That is, we think of P(A) as being the long term relative frequency that the event A occurred for the sequence of experiments, {ω(k)}_{k=1}^∞.

Similarly, suppose that A and B are two events and we wish to know how likely the event A is given that we know that B has occurred. Thus we would like to compute:

   P(A|B) = lim_{n→∞} #{k : 1 ≤ k ≤ n and ω_k ∈ A ∩ B} / #{k : 1 ≤ k ≤ n and ω_k ∈ B},

which represents the frequency that A occurs given that we know that B has occurred. This may be rewritten as

   P(A|B) = lim_{n→∞} [(1/n) #{k : 1 ≤ k ≤ n and ω_k ∈ A ∩ B}] / [(1/n) #{k : 1 ≤ k ≤ n and ω_k ∈ B}] = P(A ∩ B)/P(B).

Definition 2.5. If B is a non-null event, i.e. P(B) > 0, define the conditional probability of A given B by

   P(A|B) := P(A ∩ B)/P(B).

There are of course a number of problems with this definition of P in Eq. (2.1) including the fact that it is not mathematical nor necessarily well defined. For example the limit may not exist. But ignoring these technicalities for the moment, let us point out three key properties that P should have.

1. P(A) ∈ [0, 1] for all A ⊂ Ω.
2. P(∅) = 0 and P(Ω) = 1.
3. Additivity. If A and B are disjoint events, i.e. A ∩ B = AB = ∅, then

   P(A ∪ B) = lim_{N→∞} (1/N) #{1 ≤ k ≤ N : ω(k) ∈ A ∪ B}
            = lim_{N→∞} (1/N) [#{1 ≤ k ≤ N : ω(k) ∈ A} + #{1 ≤ k ≤ N : ω(k) ∈ B}]
            = P(A) + P(B).
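The relative frequency heuristic of Eq. (2.1) is easy to simulate. The following sketch (my own illustration, with arbitrarily chosen events) estimates P(A) and the conditional frequency P(A|B) for fair coin flips:

```python
import random

random.seed(0)
N = 100_000

# Each sample point is (w1, w2, w3): three independent fair coin flips.
samples = [tuple(random.randint(0, 1) for _ in range(3)) for _ in range(N)]

A = lambda w: w[2] == 1    # event: third toss is a head
B = lambda w: w[0] == 1    # event: first toss is a head

freq_A = sum(A(w) for w in samples) / N
freq_A_given_B = (sum(A(w) and B(w) for w in samples)
                  / sum(B(w) for w in samples))

print(freq_A)          # relative frequency of A, approximately P(A) = 1/2
print(freq_A_given_B)  # approximately P(A|B) = 1/2, by independence
```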
Example 2.6. Let us consider the tossing of a coin N times with a fair coin. In this case we would expect that every ω ∈ Ω is equally likely, i.e. P({ω}) = 2^{−N}. Assuming this we are then forced to define

   P(A) = #(A)/2^N.

Observe that this probability has the following property. Suppose that σ ∈ {0, 1}^k is a given sequence, then

   P({ω : (ω_1, . . . , ω_k) = σ}) = 2^{N−k}/2^N = 2^{−k}.

That is, if we ignore the flips after time k, the resulting probabilities are the same as if we only flipped the coin k times.

Example 2.7. The previous example suggests that if we flip a fair coin an infinite number of times, so that now Ω = {0, 1}^ℕ, then we should define

   P({ω ∈ Ω : (ω_1, . . . , ω_k) = σ}) = 1/2^k    (2.2)

for any k ≥ 1 and σ ∈ {0, 1}^k. Assuming there exists a probability, P : 2^Ω → [0, 1] such that Eq. (2.2) holds, we would like to compute, for example, the probability of the event B where an infinite number of heads are tossed. To try to compute this, let

   A_n = {ω ∈ Ω : ω_n = 1} = {heads at time n},
   B_N := ∪_{n≥N} A_n = {at least one heads at time N or later},

and

   B = ∩_{N=1}^∞ B_N = {A_n i.o.} = ∩_{N=1}^∞ ∪_{n≥N} A_n.

Since

   B_N^c = ∩_{n≥N} A_n^c ⊂ ∩_{M≥n≥N} A_n^c = {ω ∈ Ω : ω_N = · · · = ω_M = 0},

we see that

   P(B_N^c) ≤ 1/2^{M−N} → 0 as M → ∞.

Therefore, P(B_N) = 1 for all N. If we assume that P is continuous under taking decreasing limits we may conclude, using B_N ↓ B, that

   P(B) = lim_{N→∞} P(B_N) = 1.

Without this continuity assumption we would not be able to compute P(B).

The unfortunate fact is that we can not always assign a desired probability function, P(A), for all A ⊂ Ω. For example we have the following negative theorem.

Theorem 2.8 (No-Go Theorem). Let S = {z ∈ C : |z| = 1} be the unit circle. Then there is no probability function, P : 2^S → [0, 1], such that P(S) = 1, P is invariant under rotations, and P is continuous under taking decreasing limits.

Proof. We are going to use the fact, proved below, that the continuity condition on P is equivalent to the σ-additivity of P. For z ∈ S and N ⊂ S let

   zN := {zn ∈ S : n ∈ N},    (2.3)

that is to say e^{iθ} N is the set N rotated counterclockwise by angle θ. By assumption, we are supposing that

   P(zN) = P(N)    (2.4)

for all z ∈ S and N ⊂ S.

Let R := {z = e^{i2πt} : t ∈ Q ∩ [0, 1)}, a countable subgroup of S. As above R acts on S by rotations and divides S up into equivalence classes, where z, w ∈ S are equivalent if z = rw for some r ∈ R. Choose (using the axiom of choice) one representative point n from each of these equivalence classes and let N ⊂ S be the set of these representative points. Then every point z ∈ S may be uniquely written as z = nr with n ∈ N and r ∈ R. That is to say

   S = ⨿_{r∈R} (rN)    (2.5)

where ⨿_α A_α is used to denote the union of pairwise disjoint sets {A_α}. By Eqs. (2.4) and (2.5),

   1 = P(S) = ∑_{r∈R} P(rN) = ∑_{r∈R} P(N).    (2.6)

We have thus arrived at a contradiction, since the right side of Eq. (2.6) is either equal to 0 or to ∞ depending on whether P(N) = 0 or P(N) > 0.

To avoid this problem, we are going to have to relinquish the idea that P should necessarily be defined on all of 2^Ω. So we are going to only define P on particular subsets, B ⊂ 2^Ω. We will develop this below.
Part II
Formal Development
3 Preliminaries
3.1 Set Operations
Let N denote the positive integers, N_0 := N ∪ {0} be the non-negative integers and Z = N_0 ∪ (−N) the positive and negative integers including 0, Q the rational numbers, R the real numbers, and C the complex numbers. We will also use F to stand for either of the fields R or C.

Notation 3.1 Given two sets X and Y, let Y^X denote the collection of all functions f : X → Y. If X = N, we will say that f ∈ Y^ℕ is a sequence with values in Y and often write f_n for f(n) and express f as {f_n}_{n=1}^∞. If X = {1, 2, . . . , N}, we will write Y^N in place of Y^{1,2,...,N} and denote f ∈ Y^N by f = (f_1, f_2, . . . , f_N) where f_n = f(n).

Notation 3.2 More generally if {X_α : α ∈ A} is a collection of non-empty sets, let X_A = ∏_{α∈A} X_α and π_α : X_A → X_α be the canonical projection map defined by π_α(x) = x_α. If X_α = X for all α ∈ A, we will write ∏_{α∈A} X_α as X^A rather than X_A. Recall that an element x ∈ X_A is a choice function, i.e. an assignment x_α := x(α) ∈ X_α for each α ∈ A. The axiom of choice states that X_A ≠ ∅ provided that X_α ≠ ∅ for each α ∈ A.

Notation 3.3 Given a set X, let 2^X denote the power set of X, the collection of all subsets of X including the empty set. The reason for writing the power set of X as 2^X is that if we think of 2 as meaning {0, 1}, then an element a ∈ 2^X = {0, 1}^X is completely determined by the set A := {x ∈ X : a(x) = 1} ⊂ X. In this way elements in {0, 1}^X are in one to one correspondence with subsets of X.

For A ∈ 2^X let A^c := X \ A = {x ∈ X : x ∉ A} and more generally if A, B ⊂ X let B \ A := {x ∈ B : x ∉ A} = A^c ∩ B. We also define the symmetric difference of A and B by A △ B := (B \ A) ∪ (A \ B). As usual if {A_α}_{α∈I} is an indexed collection of subsets of X we define the union and the intersection of this collection by

   ∪_{α∈I} A_α := {x ∈ X : there exists α ∈ I such that x ∈ A_α} and
   ∩_{α∈I} A_α := {x ∈ X : x ∈ A_α for all α ∈ I}.

Notation 3.4 We will also write ⨿_{α∈I} A_α for ∪_{α∈I} A_α in the case that {A_α}_{α∈I} are pairwise disjoint, i.e. A_α ∩ A_β = ∅ if α ≠ β.

Notice that ∪ is closely related to ∃ and ∩ is closely related to ∀. For example let {A_n}_{n=1}^∞ be a sequence of subsets from X and define

   inf_{k≥n} A_k := ∩_{k≥n} A_k,   sup_{k≥n} A_k := ∪_{k≥n} A_k,
   lim sup_{n→∞} A_n := {A_n i.o.} := {x ∈ X : x ∈ A_n for infinitely many n}, and
   lim inf_{n→∞} A_n := {A_n a.a.} := {x ∈ X : x ∈ A_n for all but finitely many n}.

(One should read {A_n i.o.} as "A_n infinitely often" and {A_n a.a.} as "A_n almost always".) Then x ∈ {A_n i.o.} iff for all N ∈ N there exists n ≥ N such that x ∈ A_n, and this may be expressed as

   {A_n i.o.} = ∩_{N=1}^∞ ∪_{n≥N} A_n.

Similarly, x ∈ {A_n a.a.} iff there exists N ∈ N such that x ∈ A_n for all n ≥ N, which may be written as

   {A_n a.a.} = ∪_{N=1}^∞ ∩_{n≥N} A_n.
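For a concrete example (added here as an illustration): if A_n = E for n even and A_n = F for n odd, then a point lies in infinitely many of the A_n exactly when it lies in E ∪ F, and lies in all but finitely many exactly when it lies in E ∩ F; hence {A_n i.o.} = E ∪ F and {A_n a.a.} = E ∩ F.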
Definition 3.5. Given a set A ⊂ X, let

   1_A(x) = 1 if x ∈ A and 1_A(x) = 0 if x ∉ A

be the characteristic function of A.

Lemma 3.6. We have:

1. {A_n i.o.}^c = {A_n^c a.a.},
2. lim sup_{n→∞} A_n = {x ∈ X : ∑_{n=1}^∞ 1_{A_n}(x) = ∞},
3. lim inf_{n→∞} A_n = {x ∈ X : ∑_{n=1}^∞ 1_{A_n^c}(x) < ∞},
4. sup_{k≥n} 1_{A_k}(x) = 1_{∪_{k≥n} A_k} = 1_{sup_{k≥n} A_k},
5. inf_{k≥n} 1_{A_k}(x) = 1_{∩_{k≥n} A_k} = 1_{inf_{k≥n} A_k},
6. 1_{lim sup_{n→∞} A_n} = lim sup_{n→∞} 1_{A_n}, and
7. 1_{lim inf_{n→∞} A_n} = lim inf_{n→∞} 1_{A_n}.

Definition 3.7. A set X is said to be countable if it is empty or there is an injective function f : X → N; otherwise X is said to be uncountable.

Lemma 3.8 (Basic Properties of Countable Sets).

1. If A ⊂ X is a subset of a countable set X then A is countable.
2. Any infinite subset Λ ⊂ N is in one to one correspondence with N.
3. A non-empty set X is countable iff there exists a surjective map, g : N → X.
4. If X and Y are countable then X × Y is countable.
5. Suppose for each m ∈ N that A_m is a countable subset of a set X, then A = ∪_{m=1}^∞ A_m is countable. In short, the countable union of countable sets is still countable.
6. If X is an infinite set and Y is a set with at least two elements, then Y^X is uncountable. In particular 2^X is uncountable for any infinite set X.

Proof.

1. If f : X → N is an injective map then so is the restriction, f|_A, of f to the subset A.
2. Let f(1) = min Λ and define f inductively by f(n + 1) = min(Λ \ {f(1), . . . , f(n)}). Since Λ is infinite the process continues indefinitely. The function f : N → Λ defined this way is a bijection.
3. If g : N → X is a surjective map, let f(x) = min g^{-1}({x}) = min{n ∈ N : g(n) = x}. Then f : X → N is injective which combined with item 2. (taking Λ = f(X)) shows X is countable. Conversely if f : X → N is injective let x_0 ∈ X be a fixed point and define g : N → X by g(n) = f^{-1}(n) for n ∈ f(X) and g(n) = x_0 otherwise.
4. Let us first construct a bijection, h, from N to N × N. To do this put the elements of N × N into an array of the form

   (1,1) (1,2) (1,3) . . .
   (2,1) (2,2) (2,3) . . .
   (3,1) (3,2) (3,3) . . .
   . . .

and then "count" these elements by counting the sets {(i, j) : i + j = k} one at a time. For example let h(1) = (1,1), h(2) = (2,1), h(3) = (1,2), h(4) = (3,1), h(5) = (2,2), h(6) = (1,3) and so on. If f : N → X and g : N → Y are surjective functions, then the function (f × g) ∘ h : N → X × Y is surjective where (f × g)(m, n) := (f(m), g(n)) for all (m, n) ∈ N × N.
5. If A = ∅ then A is countable by definition so we may assume A ≠ ∅. Without loss of generality we may assume A_1 ≠ ∅ and by replacing A_m by A_1 if necessary we may also assume A_m ≠ ∅ for all m. For each m ∈ N let a_m : N → A_m be a surjective function and then define f : N × N → ∪_{m=1}^∞ A_m by f(m, n) := a_m(n). The function f is surjective and hence so is the composition, f ∘ h : N → ∪_{m=1}^∞ A_m, where h : N → N × N is the bijection defined above.
6. Let us begin by showing 2^ℕ = {0, 1}^ℕ is uncountable. For sake of contradiction suppose f : N → {0, 1}^ℕ is a surjection and write f(n) as (f_1(n), f_2(n), f_3(n), . . . ). Now define a ∈ {0, 1}^ℕ by a_n := 1 − f_n(n). By construction f_n(n) ≠ a_n for all n and so a ∉ f(N). This contradicts the assumption that f is surjective and shows 2^ℕ is uncountable. For the general case, since Y_0^X ⊂ Y^X for any subset Y_0 ⊂ Y, if Y_0^X is uncountable then so is Y^X. In this way we may assume Y_0 is a two point set which may as well be Y_0 = {0, 1}. Moreover, since X is an infinite set we may find an injective map x : N → X and use this to set up an injection, i : 2^ℕ → 2^X, by setting i(A) := {x_n : n ∈ A} ⊂ X for all A ⊂ N. If 2^X were countable we could find a surjective map f : 2^X → N in which case f ∘ i : 2^ℕ → N would be surjective as well. However this is impossible since we have already seen that 2^ℕ is uncountable.

We end this section with some notation which will be used frequently in the sequel.

Notation 3.9 If f : X → Y is a function and E ⊂ 2^Y let

   f^{-1}E := f^{-1}(E) := {f^{-1}(E) | E ∈ E}.

If G ⊂ 2^X, let

   f_*G := {A ∈ 2^Y | f^{-1}(A) ∈ G}.
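The counting scheme in item 4. of the proof of Lemma 3.8 above is easy to make algorithmic. The following sketch (an added illustration) generates the pairs in exactly the order h(1) = (1, 1), h(2) = (2, 1), h(3) = (1, 2), . . . used in the proof:

```python
import itertools

def diagonal_pairs():
    """Enumerate N x N one anti-diagonal {(i, j) : i + j = k} at a time,
    reproducing h(1) = (1,1), h(2) = (2,1), h(3) = (1,2), h(4) = (3,1), ..."""
    k = 2
    while True:
        for i in range(k - 1, 0, -1):  # i = k-1, ..., 1 with j = k - i
            yield (i, k - i)
        k += 1

for n, pair in enumerate(itertools.islice(diagonal_pairs(), 6), start=1):
    print(n, pair)  # 1 (1, 1), 2 (2, 1), 3 (1, 2), 4 (3, 1), 5 (2, 2), 6 (1, 3)
```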
Definition 3.10. Let E ⊂ 2^X be a collection of sets, A ⊂ X, i_A : A → X be the inclusion map (i_A(x) = x for all x ∈ A) and

   E_A = i_A^{-1}(E) = {A ∩ E : E ∈ E}.

3.2 Exercises

Let f : X → Y be a function and {A_i}_{i∈I} be an indexed family of subsets of Y; verify the following assertions.

Exercise 3.1. (∪_{i∈I} A_i)^c = ∩_{i∈I} A_i^c.

Exercise 3.2. Suppose that B ⊂ Y, show that B \ (∪_{i∈I} A_i) = ∩_{i∈I} (B \ A_i).

Exercise 3.3. f^{-1}(∪_{i∈I} A_i) = ∪_{i∈I} f^{-1}(A_i).

Exercise 3.4. f^{-1}(∩_{i∈I} A_i) = ∩_{i∈I} f^{-1}(A_i).

Exercise 3.5. Find a counterexample which shows that f(C ∩ D) = f(C) ∩ f(D) need not hold.

Example 3.11. Let X = {a, b, c} and Y = {1, 2} and define f(a) = f(b) = 1 and f(c) = 2. Then ∅ = f({a} ∩ {b}) ≠ f({a}) ∩ f({b}) = {1} and {1, 2} = f({a}^c) ≠ f({a})^c = {2}.

3.3 Algebraic sub-structures of sets

Recall the basic definitions: a collection A ⊂ 2^X is a π-system (or multiplicative system) if it is closed under taking finite intersections; A is an algebra (field) if it contains ∅ and X and is closed under complementation and finite unions (and hence finite intersections); and B ⊂ 2^X is a σ-algebra (σ-field) if it is an algebra which is also closed under countable unions.

Example 3.15. Here are some examples of algebras.

1. B = 2^X; then B is a σ-algebra.
2. B = {∅, X} is a σ-algebra called the trivial σ-field.
3. Let X = {1, 2, 3}; then A = {∅, X, {1}, {2, 3}} is an algebra while S := {∅, X, {2, 3}} is not an algebra but is a π-system.

Proposition 3.16. Let E be any collection of subsets of X. Then there exists a unique smallest algebra A(E) and σ-algebra σ(E) which contains E.

Proof. Simply take

   A(E) := ∩{A : A is an algebra such that E ⊂ A}

and

   σ(E) := ∩{M : M is a σ-algebra such that E ⊂ M}.

Example 3.17. Suppose X = {1, 2, 3} and E = {∅, X, {1, 2}, {1, 3}}, see Figure 3.1. Then

   A(E) = σ(E) = 2^X.

On the other hand if E = {{1, 2}}, then A(E) = {∅, X, {1, 2}, {3}}.

Exercise 3.6. Suppose that E_i ⊂ 2^X for i = 1, 2. Show that A(E_1) = A(E_2) iff E_1 ⊂ A(E_2) and E_2 ⊂ A(E_1). Similarly show, σ(E_1) = σ(E_2) iff E_1 ⊂ σ(E_2) and E_2 ⊂ σ(E_1). Give a simple example where A(E_1) = A(E_2) while E_1 ≠ E_2.
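Example 3.17 can be checked mechanically for finite X. The sketch below (my own illustration) closes a collection under complements and unions, which for a finite set X generates A(E) = σ(E):

```python
from itertools import combinations

def generated_algebra(X, E):
    """Close E under complementation and pairwise union; for a finite set
    X this yields the algebra A(E), which here coincides with sigma(E)."""
    X = frozenset(X)
    algebra = {frozenset(), X} | {frozenset(A) for A in E}
    changed = True
    while changed:
        changed = False
        for A in list(algebra):
            if X - A not in algebra:           # closure under complements
                algebra.add(X - A); changed = True
        for A, B in combinations(list(algebra), 2):
            if A | B not in algebra:           # closure under unions
                algebra.add(A | B); changed = True
    return algebra

X = {1, 2, 3}
print(len(generated_algebra(X, [{1, 2}, {1, 3}])))       # 8, i.e. all of 2^X
print([set(A) for A in generated_algebra(X, [{1, 2}])])  # empty, X, {1,2}, {3}
```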
Definition 3.18. Let X be a set. We say that a family of sets F ⊂ 2^X is a partition of X if distinct members of F are disjoint and if X is the union of the sets in F.

Example 3.19. Let X be a set and E = {A_1, . . . , A_n} where A_1, . . . , A_n is a partition of X. In this case

   A(E) = σ(E) = {∪_{i∈Λ} A_i : Λ ⊂ {1, 2, . . . , n}}

where ∪_{i∈Λ} A_i := ∅ when Λ = ∅. Notice that

   #(A(E)) = #(2^{1,2,...,n}) = 2^n.

Example 3.20. Suppose that X is a finite set and that A ⊂ 2^X is an algebra. For each x ∈ X let

   A_x = ∩{A ∈ A : x ∈ A} ∈ A,

wherein we have used that A is finite to insure A_x ∈ A. Hence A_x is the smallest set in A which contains x. Let C = A_x ∩ A_y ∈ A. I claim that if C ≠ ∅, then A_x = A_y. To see this, let us first consider the case where {x, y} ⊂ C. In this case we must have A_x ⊂ C and A_y ⊂ C and therefore A_x = A_y. Now suppose either x or y is not in C. For definiteness, say x ∉ C, i.e. x ∉ A_y. Then x ∈ A_x \ A_y ∈ A from which it follows that A_x = A_x \ A_y, i.e. A_x ∩ A_y = ∅.

Let us now define {B_i}_{i=1}^k to be an enumeration of {A_x}_{x∈X}. It is now a straightforward exercise to show

   A = {∪_{i∈Λ} B_i : Λ ⊂ {1, 2, . . . , k}}.

Proposition 3.21. Suppose that B ⊂ 2^X is a σ-algebra and B is at most a countable set. Then there exists a unique finite partition F of X such that F ⊂ B and every element B ∈ B is of the form

   B = ∪{A ∈ F : A ⊂ B}.    (3.1)

In particular B is actually a finite set and #(B) = 2^n for some n ∈ N.

Proof. We proceed as in Example 3.20. For each x ∈ X let

   A_x = ∩{A ∈ B : x ∈ A} ∈ B,

wherein we have used that B is a countable σ-algebra to insure A_x ∈ B. Just as above either A_x ∩ A_y = ∅ or A_x = A_y and therefore F = {A_x : x ∈ X} ⊂ B is a (necessarily countable) partition of X for which Eq. (3.1) holds for all B ∈ B.

Enumerate the elements of F as F = {P_n}_{n=1}^N where N ∈ N or N = ∞. If N = ∞, then the correspondence

   a ∈ {0, 1}^ℕ → A_a = ∪{P_n : a_n = 1} ∈ B

is bijective and therefore, by Lemma 3.8, B is uncountable. Thus any countable σ-algebra is necessarily finite. This finishes the proof modulo the uniqueness assertion which is left as an exercise to the reader.

Example 3.22 (Countable/Co-countable σ-Field). Let X = R and E := {{x} : x ∈ R}. Then σ(E) consists of those subsets, A ⊂ R, such that A is countable or A^c is countable. Similarly, A(E) consists of those subsets, A ⊂ R, such that A is finite or A^c is finite. More generally we have the following exercise.

Exercise 3.7. Let X be a set, I be an infinite index set, and E = {A_i}_{i∈I} be a partition of X. Prove the algebra, A(E), and the σ-algebra, σ(E), generated by E are given by

   A(E) = {∪_{i∈Λ} A_i : Λ ⊂ I with #(Λ) < ∞ or #(Λ^c) < ∞}

and

   σ(E) = {∪_{i∈Λ} A_i : Λ ⊂ I with Λ countable or Λ^c countable}

respectively. Here we are using the convention that ∪_{i∈Λ} A_i := ∅ when Λ = ∅.

Proposition 3.23. Let X be a set and E ⊂ 2^X. Let E^c := {A^c : A ∈ E} and E_c := E ∪ {X, ∅} ∪ E^c. Then

   A(E) := {finite unions of finite intersections of elements from E_c}.    (3.2)

Proof. Let A denote the right member of Eq. (3.2). From the definition of an algebra, it is clear that E ⊂ A ⊂ A(E). Hence to finish the proof it suffices to show A is an algebra. The proof of these assertions are routine except for possibly showing that A is closed under complementation. To check A is closed under complementation, let Z ∈ A be expressed as

   Z = ∪_{i=1}^N ∩_{j=1}^K A_{ij}

where A_{ij} ∈ E_c. Therefore, writing B_{ij} = A_{ij}^c ∈ E_c, we find that

   Z^c = ∩_{i=1}^N ∪_{j=1}^K B_{ij} = ∪_{j_1,...,j_N=1}^K (B_{1j_1} ∩ B_{2j_2} ∩ · · · ∩ B_{Nj_N}) ∈ A

wherein we have used the fact that B_{1j_1} ∩ B_{2j_2} ∩ · · · ∩ B_{Nj_N} is a finite intersection of sets from E_c.
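Example 3.20 and Proposition 3.21 are also easy to test numerically. In the sketch below (an added illustration using a hand-picked algebra, chosen for this example only) the atoms A_x partition X and every member of the algebra is the union of the atoms it contains, as in Eq. (3.1):

```python
X = frozenset({1, 2, 3, 4})
# A hand-picked algebra on X (an assumption for this example); its atoms
# turn out to be {1}, {2, 3} and {4}, so it has 2**3 = 8 members.
algebra = {frozenset(s) for s in
           [(), (1,), (4,), (2, 3), (1, 4), (1, 2, 3), (2, 3, 4), (1, 2, 3, 4)]}

def atom(x):
    """A_x = intersection of all members of the algebra containing x."""
    out = X
    for A in algebra:
        if x in A:
            out &= A
    return out

atoms = {atom(x) for x in X}
print(sorted(sorted(a) for a in atoms))  # [[1], [2, 3], [4]]

# Eq. (3.1): every B in the algebra is the union of the atoms inside it.
assert all(B == frozenset().union(*(a for a in atoms if a <= B))
           for B in algebra)
```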
Remark 3.24. One might think that in general σ(E) may be described as the countable unions of countable intersections of sets in E_c. However this is in general false, since if

   Z = ∪_{i=1}^∞ ∩_{j=1}^∞ A_{ij}

with A_{ij} ∈ E_c, then

   Z^c = ∪_{j_1=1, j_2=1, ...}^∞ (∩_{ℓ=1}^∞ A_{ℓ,j_ℓ}^c),

which is now an uncountable union. Thus the above description is not correct. In general it is complicated to explicitly describe σ(E), see Proposition 1.23 on page 39 of Folland for details. Also see Proposition 3.21.

Exercise 3.8. Let τ be a topology on a set X and A = A(τ) be the algebra generated by τ. Show A is the collection of subsets of X which may be written as finite unions of sets of the form F ∩ V where F is closed and V is open.

Solution to Exercise (3.8). In this case τ_c is the collection of sets which are either open or closed. Now if the V_i are open subsets of X and the F_j are closed subsets of X, then (∩_{i=1}^n V_i) ∩ (∩_{j=1}^m F_j) is simply a set of the form V ∩ F where V is open and F is closed. Therefore the result is an immediate consequence of Proposition 3.23.

Definition 3.25. The Borel σ-field, B = B_R = B(R), on R is the smallest σ-field containing all of the open subsets of R.

Exercise 3.9. Verify the σ-algebra, B_R, is generated by any of the following collections of sets:

1. {(a, ∞) : a ∈ R},
2. {(a, ∞) : a ∈ Q}, or
3. {[a, ∞) : a ∈ Q}.

Hint: make use of Exercise 3.6.

Exercise 3.10. Suppose f : X → Y is a function, F ⊂ 2^Y and B ⊂ 2^X. Show f^{-1}F and f_*B (see Notation 3.9) are algebras (σ-algebras) provided F and B are algebras (σ-algebras).

Lemma 3.26. Suppose that f : X → Y is a function, E ⊂ 2^Y and A ⊂ Y. Then

   σ(f^{-1}(E)) = f^{-1}(σ(E))    (3.3)

and

   (σ(E))_A = σ(E_A).    (3.4)

Proof. By Exercise 3.10, f^{-1}(σ(E)) is a σ-algebra and since E ⊂ σ(E), f^{-1}(E) ⊂ f^{-1}(σ(E)). It now follows that σ(f^{-1}(E)) ⊂ f^{-1}(σ(E)). For the reverse inclusion, notice that

   f_* σ(f^{-1}(E)) = {B ⊂ Y : f^{-1}(B) ∈ σ(f^{-1}(E))}

is a σ-algebra which contains E and thus σ(E) ⊂ f_* σ(f^{-1}(E)). Hence for every B ∈ σ(E) we know that f^{-1}(B) ∈ σ(f^{-1}(E)), i.e. f^{-1}(σ(E)) ⊂ σ(f^{-1}(E)). Applying Eq. (3.3) with X = A and f = i_A being the inclusion map implies

   (σ(E))_A = i_A^{-1}(σ(E)) = σ(i_A^{-1}(E)) = σ(E_A).
Example 3.27. Let E = {(a, b] : −∞ < a < b < ∞} and B = σ(E) be the Borel σ-field on R. Then

   E_{(0,1]} = {(a, b] : 0 ≤ a < b ≤ 1}

and we have

   B_{(0,1]} = σ(E_{(0,1]}).

In particular, if A ∈ B is such that A ⊂ (0, 1], then A ∈ σ(E_{(0,1]}).

Definition 3.28. A function, f : Ω → Y, is said to be simple if f(Ω) ⊂ Y is a finite set. If A ⊂ 2^Ω is an algebra, we say that a simple function f : Ω → Y is measurable if {f = y} := f^{-1}({y}) ∈ A for all y ∈ Y. A measurable simple function, f : Ω → C, is called a simple random variable relative to A.

Notation 3.29 Given an algebra, A ⊂ 2^Ω, let S(A) denote the collection of simple random variables from Ω to C. For example if A ∈ A, then 1_A ∈ S(A) is a measurable simple function.

Lemma 3.30. For every algebra A ⊂ 2^Ω, the set of simple random variables, S(A), forms an algebra.

Proof. Let us observe that 1_Ω = 1 and 1_∅ = 0 are in S(A). If f, g ∈ S(A) and c ∈ C \ {0}, then

   {f + cg = λ} = ∪_{a,b∈C : a+cb=λ} ({f = a} ∩ {g = b}) ∈ A    (3.5)

and

   {f · g = λ} = ∪_{a,b∈C : a·b=λ} ({f = a} ∩ {g = b}) ∈ A    (3.6)

from which it follows that f + cg and f · g are back in S(A).

Definition 3.31. A simple function algebra, S, is a subalgebra of the bounded complex functions on X such that 1 ∈ S and each function, f ∈ S, is a simple function. If S is a simple function algebra, let

   A(S) := {A ⊂ X : 1_A ∈ S}.

(It is easily checked that A(S) is a sub-algebra of 2^X.)

Lemma 3.32. Suppose that S is a simple function algebra, f ∈ S and λ ∈ f(X). Then {f = λ} ∈ A(S).

Proof. Let {λ_i}_{i=0}^n be an enumeration of f(X) with λ_0 = λ. Then

   g := [∏_{i=1}^n (λ − λ_i)]^{-1} ∏_{i=1}^n (f − λ_i 1) ∈ S.

Moreover, we see that g = 0 on ∪_{i=1}^n {f = λ_i} while g = 1 on {f = λ}. So we have shown g = 1_{f=λ} ∈ S and therefore that {f = λ} ∈ A(S).

Exercise 3.11. Continuing the notation introduced above:

1. Show A(S) is an algebra of sets.
2. Show S(A) is a simple function algebra.
3. Show that the map

   A ∈ {algebras ⊂ 2^X} → S(A) ∈ {simple function algebras on X}

is bijective and the map, S → A(S), is the inverse map.

Solution to Exercise (3.11).

1. Since 0 = 1_∅ and 1 = 1_X are in S, it follows that ∅ and X are in A(S). If A ∈ A(S), then 1_{A^c} = 1 − 1_A ∈ S and so A^c ∈ A(S). Finally, if A, B ∈ A(S) then 1_{A∩B} = 1_A · 1_B ∈ S and thus A ∩ B ∈ A(S).
2. If f, g ∈ S(A) and c ∈ F, then

   {f + cg = λ} = ∪_{a,b∈F : a+cb=λ} ({f = a} ∩ {g = b}) ∈ A

and

   {f · g = λ} = ∪_{a,b∈F : a·b=λ} ({f = a} ∩ {g = b}) ∈ A

from which it follows that f + cg and f · g are back in S(A).
3. If f : Ω → C is a simple function such that 1_{f=λ} ∈ S for all λ ∈ C, then f = ∑_{λ∈C} λ 1_{f=λ} ∈ S. Conversely, by Lemma 3.32, if f ∈ S then 1_{f=λ} ∈ S for all λ ∈ C. Therefore, a simple function, f : X → C, is in S iff 1_{f=λ} ∈ S for all λ ∈ C. With this preparation, we are now ready to complete the verification. First off,

   A ∈ A(S(A)) ⟺ 1_A ∈ S(A) ⟺ A ∈ A

which shows that A(S(A)) = A. Similarly,

   f ∈ S(A(S)) ⟺ {f = λ} ∈ A(S) for all λ ∈ C ⟺ 1_{f=λ} ∈ S for all λ ∈ C ⟺ f ∈ S

which shows S(A(S)) = S.
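It may help to note (an observation added here, not in the original text) that the function g in the proof of Lemma 3.32 is just a Lagrange interpolation polynomial in f: if, say, f(X) = {0, 1, 2} and λ = 0, then g = [(0 − 1)(0 − 2)]^{-1} (f − 1)(f − 2) = (1/2)(f − 1)(f − 2), which takes the value 1 where f = 0 and the value 0 where f ∈ {1, 2}.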
4 Finitely Additive Measures

4.1 Finitely Additive Measures

Definition 4.1. Suppose that E ⊂ 2^X is a collection of subsets of X and μ : E → [0, ∞] is a function.

1. μ is monotonic if μ(E) ≤ μ(F) for all E, F ∈ E with E ⊂ F.
2. μ is sub-additive (finitely sub-additive) on E if

   μ(E) ≤ ∑_{i=1}^n μ(E_i)    (4.1)

   whenever E = ∪_{i=1}^n E_i ∈ E with each E_i ∈ E and n ∈ N ∪ {∞} (respectively n ∈ N).
3. μ is super-additive (finitely super-additive) on E if μ(E) ≥ ∑_{i=1}^n μ(E_i) whenever E = ⨿_{i=1}^n E_i ∈ E with each E_i ∈ E and n ∈ N ∪ {∞} (respectively n ∈ N).
4. μ is additive or finitely additive on E if

   μ(E) = ∑_{i=1}^n μ(E_i)    (4.2)

   whenever E = ⨿_{i=1}^n E_i ∈ E with E_i ∈ E for i = 1, 2, . . . , n < ∞.
5. If E = A is an algebra, μ(∅) = 0, and μ is finitely additive on A, then μ is said to be a finitely additive measure.
6. μ is σ-additive (or countably additive) on E if item 4. holds even when n = ∞.
7. If E = A is an algebra, μ(∅) = 0, and μ is σ-additive on A then μ is called a premeasure on A.
8. A measure is a premeasure, μ : B → [0, ∞], where B is a σ-algebra. We say that μ is a probability measure if μ(X) = 1.

Proposition 4.2 (Basic properties of finitely additive measures). Suppose μ is a finitely additive measure on an algebra A ⊂ 2^X, E, F ∈ A with E ⊂ F, and {E_j}_{j=1}^n ⊂ A. Then:

1. (μ is monotone) μ(E) ≤ μ(F) if E ⊂ F.
2. For A, B ∈ A, the following strong additivity formula holds:

   μ(A ∪ B) + μ(A ∩ B) = μ(A) + μ(B).    (4.3)

3. (μ is finitely sub-additive) μ(∪_{j=1}^n E_j) ≤ ∑_{j=1}^n μ(E_j).
4. μ is sub-additive on A iff

   μ(A) ≤ ∑_{i=1}^∞ μ(A_i) for A = ⨿_{i=1}^∞ A_i,    (4.4)

   where A ∈ A and {A_i}_{i=1}^∞ ⊂ A are pairwise disjoint sets.
5. (μ is countably super-additive) If A = ⨿_{i=1}^∞ A_i with A_i, A ∈ A, then

   μ(⨿_{i=1}^∞ A_i) ≥ ∑_{i=1}^∞ μ(A_i).

6. A finitely additive measure, μ, is a premeasure iff μ is sub-additive.

Proof.

1. Since F is the disjoint union of E and (F \ E) and F \ E = F ∩ E^c ∈ A, it follows that μ(F) = μ(E) + μ(F \ E) ≥ μ(E).
2. Since

   A ∪ B = [A \ (A ∩ B)] ⨿ [B \ (A ∩ B)] ⨿ (A ∩ B),

we have

   μ(A ∪ B) = μ(A ∪ B \ (A ∩ B)) + μ(A ∩ B) = μ(A \ (A ∩ B)) + μ(B \ (A ∩ B)) + μ(A ∩ B).

Adding μ(A ∩ B) to both sides of this equation proves Eq. (4.3).
3. Let E'_j = E_j \ (E_1 ∪ · · · ∪ E_{j−1}) so that the E'_j's are pairwise disjoint and E := ∪_{j=1}^n E_j = ⨿_{j=1}^n E'_j. Since E'_j ⊂ E_j it follows from the monotonicity of μ that

   μ(E) = ∑_{j=1}^n μ(E'_j) ≤ ∑_{j=1}^n μ(E_j).

4. If A = ∪_{i=1}^∞ B_i with A ∈ A and B_i ∈ A, then A = ⨿_{i=1}^∞ A_i where A_i := B_i \ (B_1 ∪ . . . ∪ B_{i−1}) ∈ A and B_0 = ∅. Therefore using the monotonicity of μ and Eq. (4.4),

   μ(A) ≤ ∑_{i=1}^∞ μ(A_i) ≤ ∑_{i=1}^∞ μ(B_i).

5. Suppose that A = ⨿_{i=1}^∞ A_i with A_i, A ∈ A. Then ⨿_{i=1}^n A_i ⊂ A for all n and so by the monotonicity and finite additivity of μ, ∑_{i=1}^n μ(A_i) ≤ μ(A). Letting n → ∞ in this equation shows μ is countably super-additive.
6. This is a combination of items 4. and 5.
Proposition 4.3. Suppose that P is a nitely additive probability measure on an algebra, A 2 . Then the following are equivalent: 1. P is additive on 2. For all An A such 3. For all An A such 4. For all An A such 5. For all An A such A. that that that that An An An An A A, P (An ) P (A) . A A, P (An ) P (A) . , P (An ) 1. , P (An ) 1.
Remark 4.4. Observe that the equivalence of items 1. and 2. in the above proposition hold without the restriction that P ( ) = 1 and in fact P ( ) = may be allowed for this equivalence. Denition 4.5. Let (, B ) be a measurable space, i.e. B 2 is a algebra. A probability measure on (, B ) is a nitely additive probability measure, P : B [0, 1] such that any and hence all of the continuity properties in Proposition 4.3 hold. We will call (, B , P ) a probability space. Lemma 4.6. Suppose that (, B , P ) is a probability space, then P is countably sub-additive. Proof. Suppose that An B and let A1 := A1 and for n 2, let An := An \ (A1 . . . An1 ) B . Then
P ( n=1 An ) = P (n=1 An ) = n=1
Proof. We will start by showing 1 2 3. 1 = 2. Suppose An A such that An A A. Let An := An \ An1 with A0 := . Then {An }n=1 are disjoint, An = n k=1 Ak and A = k=1 Ak . Therefore,
n
P (An )
n=1
P (An ) .
P (A) =
k=1
P (Ak ) = lim
P (A) = lim P
N
N n=1 An
= lim
P (An ) =
n=1 n=1
P (An ) .
c 2 = 3. If An A such that An A A, then Ac n A and therefore, c lim (1 P (An )) = lim P (Ac n ) = P (A ) = 1 P (A) . n
Ac n
A and therefore we
Then P (A) :=
A
p ( ) for all A
denes a measure on 2 .
macro: svmonob.cls date/time: 23-Feb-2007/15:20
Page: 26
job: prob
27
Example 4.8. Suppose that X is any set and x X is a point. For A X, let x (A) = 1 if x A 0 if x / A.
(A) =
F A
() =
F
()1A
Then = x is a measure on X called the Dirac delta measure at x. Example 4.9. Suppose that is a measure on X and > 0, then is also a measure on X. Moreover, if {j }j J are all measures on X, then = j =1 j , i.e.
where 1A is one if A and zero otherwise. We may check that is a measure on B . Indeed, if A = i=1 Ai and F , then A i Ai for one and hence exactly one Ai . Therefore 1A = i=1 1Ai and hence
(A) =
F
()1A =
F
()
i=1
1Ai
(A) =
j =1
=
i=1 F
()1Ai =
i=1
(Ai )
is a measure on X. (See Section 3.1 for the meaning of this sum.) To prove this we must show that is countably additive. Suppose that {Ai }i=1 is a collection of pair-wise disjoint subsets of X, then
as desired. Thus we have shown that there is a one to one correspondence between measures on B and functions : F [0, ]. The following example explains what is going on in a more typical case of interest to us in the sequel. Example 4.12. Suppose that = R, A consists of those sets, A R which may be written as nite disjoint unions from S := {(a, b] R : a b } . We will show below the following:
( i=1 Ai )
(Ai ) =
i=1 i=1 j =1
j (Ai ) j ( i=1 Ai )
j =1
=
j =1 i=1
j (Ai ) = ( i=1 Ai )
wherein the third equality we used Theorem 1.6 and in the fourth we used that fact that j is a measure. Example 4.10. Suppose that X is a set : X [0, ] is a function. Then :=
xX
1. A is an algebra. (Recall that BR = (A) .) 2. To every increasing function, F : R [0, 1] such that F () := lim F (x) = 0 and
x x
(x)x
F (+) := lim F (x) = 1 there exists a nitely additive probability measure, P = PF on A such that
(x)
P ((a, b] R) = F (b) F (a) for all a b . 3. P is additive on A i F is right continuous. 4. P extends to a probability measure on BR i F is right continuous. Let us observe directly that if F (a+) := limxa F (x) = F (a) , then (a, a + 1/n] while P ((a, a + 1/n]) = F (a + 1/n) F (a) F (a+) F (a) > 0. Hence P can not be additive on A in this case.
macro: svmonob.cls date/time: 23-Feb-2007/15:20
for all A X. Example 4.11. Suppose that F 2X is a countable or nite partition of X and B 2X is the algebra which consists of the collection of sets A X such that A = { F : A} . (4.5) Any measure : B [0, ] is determined uniquely by its values on F . Conversely, if we are given any function : F [0, ] we may dene, for A B ,
Page: 27
job: prob
28
1. If = 0, then E(f ) =
y C{}
y P (f = y ) =
y C{}
y P (f = y/)
yP (f = y ).
(4.6) =
z P (f = z ) = E(f ).
z C{}
Example 4.14. Suppose that A A, then E1A = 0 P (Ac ) + 1 P (A) = P (A) . (4.7)
Remark 4.15. Let us recall that our intuitive notion of P (A) was given as in Eq. (2.1) by 1 P (A) = lim # {1 k N : (k ) A} N N where (k ) was the result of the k th independent experiment. If we use this interpretation back in Eq. (4.6), we arrive at E(f ) =
y C
z P (f + g = z ) z P (a+b=z {f = a, g = b})
z C
= =
z C
z
a+b=z
yP (f = y ) = lim 1 N N 1 N N
N
1 N
y # {1 k N : f ( (k )) = y }
y C
=
z C a+b=z
= lim
y
y C N k=1
1f ((k))=y = lim
1 N N
f ( (k )) 1f ((k))=y
k=1 y C
=
a,b
(a + b) P ({f = a, g = b}) .
= lim
k=1
a
b
P ({f = a, g = b})
Thus informally, Ef should represent the average of the values of f over many independent experiments. Proposition 4.16. The expectation operator, E = EP , satises: 1. If f S(A) and C, then E(f ) = E(f ). 2. If f, g S (A) , then E(f + g ) = E(g ) + E(f ). (4.9) 3. E is positive, i.e. E(f ) 0 if f is a non-negative measurable simple function. 4. For all f S (A) , |Ef | E |f | . (4.10)
Page: 28 job: prob
=
a
= and similarly,
(4.8)
Equation (4.9) is now a consequence of the last three displayed equations. 3. If f 0 then E(f ) = aP (f = a) 0.
a0
macro: svmonob.cls
date/time: 23-Feb-2007/15:20
29
|| 1f =
1 1A = 1Ac =
n=1 M
1 Ac = n
n=1 k
and therefore, E |f | = E
C
= || 1f = =
C
(1)
k=0 M
|| E1f = =
C
|| P (f = ) max |f | . =
(1)
k=0
1An1 Ank
|| P (f = ) = E |f | . 1M = 1A = n=1 An
(1)
k=1
1An1 Ank .
(4.13)
Remark 4.17. Every simple measurable function, f : C, may be written as N f = j =1 j 1Aj for some j C and some Aj C. Moreover if f is represented this way, then
N N N
Taking expectations of this equation then gives Eq. (4.12). Remark 4.20. Here is an alternate proof of Eq. (4.13). Let and by relabeling the sets {An } if necessary, we may assume that A1 Am and / Am+1 AM for some 0 m M. (When m = 0, both sides of Eq. (4.13) are zero and so we will only consider the case where 1 m M.) With this notation we have
M
Ef = E
j =1
j 1 Aj =
j =1
j E1Aj =
j =1
j P (Aj ) .
Remark 4.18 (Chebyshevs Inequality). Suppose that f S(A), > 0, and p > 0, then P ({|f | }) = E 1|f | E Observe that |f | =
C p
(1)
k=1 m
|f | p 1|f | p E |f | . p || 1{f =}
|| p
(4.11)
=
k=1 m
(1)
k+1
=
k=1
(1)
m
k+1
m k
k nk
{f = } A as well.
=1
k=0
(1) (1)
m
m k
Lemma 4.19 (Inclusion Exclusion Formula). If An A for n = 1, 2, . . . , M such that M n=1 An < , then
M
= 1 (1 1)
= 1.
This veries Eq. (4.13) since 1M ( ) = 1. n=1 An Example 4.21 (Coincidences). Let be the set of permutations (think of card A) shuing), : {1, 2, . . . , n} {1, 2, . . . , n} , and dene P (A) := #( n! to be the uniform distribution (Haar measure) on . We wish to compute the probability of the event, B, that a random permutation xes some index i. To do this, let Ai := { : (i) = i} and observe that B = n i=1 Ai . So by the Inclusion Exclusion Formula, we have
macro: svmonob.cls date/time: 23-Feb-2007/15:20
M n=1 An =
k=1
(1)
(An1 Ank ) .
(4.12)
Proof. This may be proved inductively from Eq. (4.3). We will give a different and perhaps more illuminating proof here. Let A := M n=1 An . c M c Since Ac = M A = A , we have n n=1 n=1 n
Page: 29 job: prob
30
P (B ) =
k=1
(1)
P (Ai1 Aik ) .
4 2 = 6 3
Since P (Ai1 Aik ) = P ({ : (i1 ) = i1 , . . . , (ik ) = ik }) = and # {1 i1 < i2 < i3 < < ik n} = we nd P (B ) =
k=1 n
(1)
k=1
k+1
1 1 2 1 =1 + = k! 2 6 3
(n k )! n! n , k
k+1
and EN =
1 (3 + 1 + 1 + 0 + 0 + 1) = 1. 6
(1)
k+1
n (n k )! = k n!
(1)
k=1
1 . k!
P (B ) =
k=1
(1)
1 = e1 1 = 0.632. k!
Example 4.22. Continue the notation in Example 4.21. We now wish to compute the expected number of xed points of a random permutation, , i.e. how many cards in the shued stack have not moved on average. To this end, let X i = 1 Ai and observe that
n n
p ( ) for all A 2 .
(4.14)
N ( ) =
i=1
Xi ( ) =
i=1
1(i)=i = # {i : (i) = i} .
Exercise 4.1 (Simple Independence 1.). Suppose qi : [0, 1] are funcn tions such that qi () = 1 for i = 1, 2, . . . , n and If p ( ) = i=1 qi (i ) . Show for any functions, fi : R that
n n n
EP
i=1
fi (Xi ) =
i=1
EP [fi (Xi )] =
i=1
EQi fi
EN =
i=1
EXi =
i=1
P (Ai ) =
i=1
(n 1)! = 1. n!
where Qi ( ) =
qi () for all .
2 3 1 3 1 2
3 2 3 1 2 1
N ( ) 3 1 1 0 0 1
job: prob
Exercise 4.2 (Simple Independence 2.). Prove the converse of the previous exercise. Namely, if
n n
EP
i=1
fi (Xi ) =
i=1
EP [fi (Xi )]
(4.15)
for any functions, fi : R, then there exists functions qi : [0, 1] with n qi () = 1, such that p ( ) = i=1 qi (i ) .
macro: svmonob.cls date/time: 23-Feb-2007/15:20
31
Exercise 4.3 (A Weak Law of Large Numbers). Suppose that R n is a nite set, n N, = n , p ( ) = i=1 q (i ) where q : [0, 1] such that q () = 1, and let P : 2 [0, 1] be the probability measure dened as in Eq. (4.14). Further let Xi ( ) = i for i = 1, 2, . . . , n, := EXi , 2 2 := E (Xi ) , and Sn = 1. Show, =
Proof. Let x [0, 1] , = {0, 1} , q (0) = 1 x, q (1) = x, = n , and P Pn 1 n i=1 i Px ({ }) = q (1 ) . . . q (n ) = x i=1 i (1 x) . As above, let Sn =
1 n
1 (X1 + + Xn ) . n
q () and
2
( ) q () =
q () .
(4.16) Ex [f (Sn )] =
f
k=0
2. Show, ESn = . 3. Let ij = 1 if i = j and ij = 0 if i = j. Show E [(Xi ) (Xj )] = ij . 4. Using Sn may be expressed as,
1 n 2 n i=1 2
k n
n k nk x (1 x) = pn (x) . k
Hence we nd |pn (x) f (x)| = |Ex f (Sn ) f (x)| = |Ex [f (Sn ) f (x)]| Ex |f (Sn ) f (x)| = Ex [|f (Sn ) f (x)| : |Sn x| ] + Ex [|f (Sn ) f (x)| : |Sn x| < ] 2M Px (|Sn x| ) + () where M := max |f (y )| and
y [0,1]
E (Sn ) =
1 2 . n 1 2 . n2
5. Conclude using Eq. (4.17) and Remark 4.18 that P (|Sn | ) (4.18)
So for large n, Sn is concentrated near = EXi with probability approaching 1 for n large. This is a version of the weak law of large numbers. Exercise 4.4 (Bernoulli Random Variables). Let = {0, 1} , , X : R be dened by X (0) = 0 and X (1) = 1, x [0, 1] , and dene Q = x1 + (1 x) 0 , i.e. Q ({0}) = 1 x and Q ({1}) = x. Verify, (x) := EQ X = x and 2 (x) := EQ (X x) = (1 x) x 1/4. Theorem 4.23 (Weierstrass Approximation Theorem via Bernsteins Polynomials.). Suppose that f C ([0, 1] , C) and
n 2
() := sup {|f (y ) f (x)| : x, y [0, 1] and |y x| } is the modulus of continuity of f. Now by the above exercises, Px (|Sn x| ) and hence we may conclude that max |pn (x) f (x)| M + () 2n2 1 4n2 (see Figure 4.1)
x[0,1]
pn (x) :=
k=0
n f k
k n
xk (1 x)
nk
Then
n x[0,1]
lim
32
Proof. Let A denote the collection of sets which may be written as nite disjoint unions of sets from S . Clearly S A A(S ) so it suces to show A is an algebra since A(S ) is the smallest algebra containing S . By the properties of S , we know that , X A. Now suppose that Ai = F i F A where, for i = 1, 2, . . . , n, i is a nite collection of disjoint sets from S . Then
n n
Ai =
i=1 i=1 F i
=
(F1 ,,...,Fn )1 n
(F1 F2 Fn )
and this is a disjoint (you check) union of elements from S . Therefore A is closed under nite intersections. Similarly, if A = F F with being a nite collection of disjoint sets from S , then Ac = F F c . Since by assumption F c A for F S and A is closed under nite intersections, it follows that Ac A.
Fig. 4.1. Plots of Px (Sn = k/n) versus k/n for n = 100 with x = 1/4 (black), x = 1/2 (red), and x = 5/6 (green).
be as in Example Example 4.27. Let X = R and S := (a, b] R : a, b R 4.25. Then A(S ) may be described as being those sets which are nite disjoint unions of sets from S . Proposition 4.28 (Construction of Finitely Additive Measures). Suppose S 2X is a semi-algebra (see Denition 4.24) and A = A(S ) is the algebra generated by S . Then every additive function : S [0, ] such that () = 0 extends uniquely to an additive measure (which we still denote by ) on A. Proof. Since (by Proposition 4.26) every element A A is of the form A = i Ei for a nite collection of Ei S , it is clear that if extends to a measure then the extension is unique and must be given by (A) = (Ei ).
i
(4.19)
To prove existence, the main point is to show that (A) in Eq. (4.19) is well dened; i.e. if we also have A = j Fj with Fj S , then we must show (Ei ) =
i j
(Fj ).
(4.20) (Ei
Exercise 4.5. Let A 2X and B 2Y be semi-elds. Show the collection E := {A B : A A and B B} is also a semi-eld. Proposition 4.26. Suppose S 2X is a semi-eld, then A = A(S ) consists of sets which may be written as nite disjoint unions of sets from S .
i
But Ei = j (Ei Fj ) and the additivity of on S implies (Ei ) = Fj ) and hence (Ei ) =
i j
(Ei Fj ) =
i,j
(Ei Fj ).
Similarly,
macro: svmonob.cls date/time: 23-Feb-2007/15:20
Page: 32
job: prob
(Fj ) =
j i,j
(Ei Fj )
which combined with the previous equation shows that Eq. (4.20) holds. It is now easy to verify that extended to A as in Eq. (4.19) is an additive measure on A. Proposition 4.29. Let X = R, S be a semi-algebra S = {(a, b] R : a b }, (4.21)
and A = A(S ) be the algebra formed by taking nite disjoint unions of elements from S , see Proposition 4.26. To each nitely additive probability measures : [0, 1] such that A [0, ], there is a unique increasing function F : R F () = 0, F () = 1 and . ((a, b] R) = F (b) F (a) a b in R (4.22)
[0, 1] such that F () = 0, Conversely, given an increasing function F : R F () = 1 there is a unique nitely additive measure = F on A such that the relation in Eq. (4.22) holds. Proof. Given a nitely additive probability measure , let . F (x) := ((, x] R) for all x R Then F () = 1, F () = 0 and for b > a, F (b) F (a) = ((, b] R) ((, a]) = ((a, b] R) . [0, 1] as in the statement of the theorem is Conversely, suppose F : R given. Dene on S using the formula in Eq. (4.22). The argument will be completed by showing is additive on S and hence, by Proposition 4.28, has a unique extension to a nitely additive measure on A. Suppose that
n
(a, b] =
i=1
(ai , bi ].
By reordering (ai , bi ] if necessary, we may assume that a = a1 < b1 = a2 < b2 = a3 < < bn1 = an < bn = b. Therefore, by the telescoping series argument,
n n
[F (bi ) F (ai )] =
i=1
((ai , bi ] R).
So suppose A =
n=1
A=
k j =1
Ej with Ej S and An =
Theorem 5.4. To each function F : R [0, 1] satisfying properties 1. 4. in Lemma 5.3, there exists a unique probability measure, PF , on BR such that PF ((a, b]) = F (b) F (a) for all < a b < . Proof. The uniqueness assertion in the theorem is covered in Exercise 5.1 below. The existence portion of the Theorem follows from Proposition 5.7 and Theorem 5.19 below. Example 5.5 (Uniform Distribution). The function,
Ej = A Ej =
n=1
An Ej =
n=1 i=1
En,i Ej
(Ej )
n=1 i=1
(En,i Ej ) .
36
(A) =
j =1
(Ej )
j =1 n=1 i=1 Nn k
(En,i Ej )
Nn
II
n=1
o J n
n=1
n . J
=
n=1 i=1 j =1
(En,i Ej ) =
n=1 i=1
(En,i ) =
n=1
(An ) ,
which proves (using Proposition 4.2) the sub-additivity of on A. Now suppose that F : R R be an increasing function, F () := limx F (x) and = F be the nitely additive measure on (R, A) described in Proposition 4.29. If happens to be a premeasure on A, then, letting An = (a, bn ] with bn b as n , implies F (bn ) F (a) = ((a, bn ]) ((a, b]) = F (b) F (a). Since was an arbitrary sequence such that bn b, we have shown limyb F (y ) = F (b), i.e. F is right continuous. The next proposition shows the converse is true as well. Hence premeasures on A which are nite on bounded sets are in one to one correspondences with right continuous increasing functions which vanish at 0. Proposition 5.7. To each right continuous increasing function F : R R there exists a unique premeasure = F on A such that F ((a, b]) = F (b) F (a) < a < b < . Proof. As above, let F () := limx F (x) and = F be as in Proposition 4.29. Because of Proposition 5.6, to nish the proof it suces to show is sub-additive on S . First suppose that < a < b < , J = (a, b], Jn = (an , bn ] such that
{bn }n=1
F (b) F ( a) = (I )
n=1
n ) (J
n=1
n ). (J
n J (5.2)
=
n=1
(Jn ) +
n=1
n \ Jn ). (J
Given > 0, we may use the right continuity of F to choose bn so that n \ Jn ) = F ( (J bn ) F (bn ) 2n n N. Using this in Eq. (5.2) shows
(J ) = ((a, b])
n=1
(Jn ) +
J=
n=1
Jn . We wish to show
which veries Eq. (5.1) since > 0 was arbitrary. The hard work is now done but we still have to check the cases where a = or b = . For example, suppose that b = so that
(J )
n=1
(Jn ).
J = (a, ) =
n=1
Jn
IM := (a, M ] = J IM =
1 n=1
Jn IM
o J n
o n To see this, let c := sup x b : [ a, x] is nitely covered by J . If c < b, n=1 o o m for some m and there exists x J m such that [ then c J a, x] is nitely covered o o n n by , say by J . We would then have that J n=1 n=1 n=1 o m covers [a, c ] for all c J . But this contradicts the denition of c.
n o o
n o
o n J
n oN
n omax(m,N )
F (M ) F (a) = (IM )
n=1
(Jn IM )
n=1
(Jn ).
nitely
Page: 36
job: prob
37
((a, )) = F () F (a)
n=1
(Jn ). Therefore,
The other cases where a = and b R and a = and b = are handled similarly. Before continuing our development of the existence of measures, we will pause to show that measures are often uniquely determined by their values on a generating sub-algebra. This detour will also have the added benet of motivating Carathoedorys existence proof to be given below.
(C \ A) =
( i=1
[Ci \ A])
i=1
(Ci \ A)
i=1
(Ci \ Ai ) < .
Since C \ AN C \ A, it also follows that C \ AN < for suciently large N and this shows B = i=1 Bi B0 . Hence B0 is a sub- -algebra of B = (A) which contains A which shows B0 = B . Many theorems in the sequel will require some control on the size of a measure . The relevant notion for our purposes (and most purposes) is that of a nite measure dened next. Denition 5.11. Suppose X is a set, E B 2X and : B [0, ] is a function. The function is nite on E if there exists En E such that (En ) < and X = n=1 En . If B is a algebra and is a measure on B which is nite on B we will say (X, B , ) is a nite measure space. The reader should check that if is a nitely additive measure on an algebra, B , then is nite on B i there exists Xn B such that Xn X and (Xn ) < . Corollary 5.12 ( Finite Regularity Result). Theorem 5.10 continues to hold under the weaker assumption that : B [0, ] is a measure which is nite on A. Proof. Let Xn A such that n=1 Xn = X and (Xn ) < for all n.Since A B n (A) := (Xn A) is a nite measure on A B for each n, by Theorem 5.10, for every B B there exists Cn A such that B Cn and (Xn [Cn \ B ]) = n (Cn \ B ) < 2n . Now let C := n=1 [Xn Cn ] A and observe that B C and (C \ B ) = ( n=1 ([Xn Cn ] \ B ))
n=1
([Xn Cn ] \ B ) =
n=1
Applying this result to B c shows there exists D A such that B c D and (B \ Dc ) = (D \ B c ) < . So if we let A := Dc A , then A B C and (C \ A) = ([B \ A] [(C \ B ) \ A]) (B \ A) + (C \ B ) < 2 and the result is proved.
macro: svmonob.cls date/time: 23-Feb-2007/15:20
38
Exercise 5.1. Suppose A 2X is an algebra and and are two measures on B = (A) . a. Suppose that and are nite measures such that = on A. Show = . b. Generalize the previous assertion to the case where you only assume that and are nite on A. Corollary 5.13. Suppose A 2X is an algebra and : B = (A) [0, ] is a measure which is nite on A. Then for all B B, there exists A A and C A such that A B C and (C \ A) = 0. Proof. By Theorem 5.10, given B B , we may choose An A and Cn A such that An B Cn and (Cn \ B ) 1/n and (B \ An ) 1/n. N By replacing AN by N n=1 An and CN by n=1 Cn , we may assume that An and Cn as n increases. Let A = An A and C = Cn A , then A B C and (C \ A) = (C \ B ) + (B \ A) (Cn \ B ) + (B \ An ) 2/n 0 as n .
Proposition 5.15. Let be a premeasure on an algebra A, then has a unique extension (still called ) to a function on A satisfying the following properties. 1. (Continuity) If An A and An A A , then (An ) (A) as n . 2. (Monotonicity) If A, B A with A B then (A) (B ) . 3. (Strong Additivity) If A, B A , then (A B ) + (A B ) = (A) + (B ) . (5.3)
( n=1 An )
n=1
(An ) .
(5.4)
5. ( - Additivity on A ) The function is countably additive on A . Proof. Let A, B be sets in A such that A B and suppose {An }n=1 and {Bn }n=1 are sequences in A such that An A and Bn B as n . Since Bm An An as m , the continuity of on A implies, (An ) = lim (Bm An ) lim (Bm ) .
m m
Exercise 5.2. Let B = BRn = ({open subsets of Rn }) be the Borel algebra on Rn and be a probability measure on B . Further, let B0 denote those sets B B such that for every > 0 there exists F B V such that F is closed, V is open, and (V \ F ) < . Show: 1. B0 contains all closed subsets of B . Hint: given a closed subset, F R and k N, let Vk := xF B (x, 1/k ) , where B (x, ) := {y Rn : |y x| < } . Show, Vk F as k . 2. Show B0 is a algebra and use this along with the rst part of this exercise to conclude B = B 0 . Hint: follow closely the method used in the rst step of the proof of Theorem 5.10. 3. Show for every > 0 and B B , there exist a compact subset, K Rn , such that K B and (B \ K ) < . Hint: take K := F {x Rn : |x| n} for some suciently large n.
n
(5.5)
Using this equation when B = A, implies, limn (An ) = limm (Bm ) whenever An A and Bn A. Therefore it is unambiguous to dene (A) by; (A) = lim (An )
n
for any sequence A such that An A. With this denition, the continuity of is clear and the monotonicity of follows from Eq. (5.5). Suppose that A, B A and {An }n=1 and {Bn }n=1 are sequences in A such that An A and Bn B as n . Then passing to the limit as n in the identity, (An Bn ) + (An Bn ) = (An ) + (Bn )
{An }n=1
proves Eq. (5.3). In particular, it follows that is nitely additive on A . Let {An }n=1 be any sequence in A and choose {An,i }i=1 A such that An,i An as i . Then we have,
N N
N n=1 An,N
n=1
(An,N )
n=1
(An )
n=1
(An ) .
(5.6)
macro: svmonob.cls
date/time: 23-Feb-2007/15:20
39
(An ) = lim
n=1
Denition 5.17 (Measurable Sets). Suppose is a nite premeasure on an algebra A 2X . We say that B X is measurable if for all > 0 there exists A A and C A such that A B C and (C \ A) < . We will denote the collection of measurable subsets of X by B = B () . We also dene : B [0, (X )] by (B ) = inf { (C ) : B C A } . (5.8)
The previous two inequalities show is additive on A . Suppose is a nite premeasure on an algebra, A 2X , and A A A . Since A, Ac A and X = A Ac , it follows that (X ) = (A) + (Ac ) . From this observation we may extend to a function on A A by dening (A) := (X ) (Ac ) for all A A . (5.7)
Remark 5.18. If B B, > 0, A A and C A are such that A B C and (C \ A) < , then (A) (B ) (C ) and in particular, 0 (B ) (A) < , and 0 (C ) (B ) < . Indeed, if C A with B C , then A C and so by Lemma 5.16, (A) (C \ A) + (A) = (C ) from which it follows that (A) (B ) . The fact that (B ) (C ) follows directly from Eq. (5.8). Theorem 5.19 (Finite Premeasure Extension Theorem). Suppose is a nite premeasure on an algebra A 2X . Then B is a algebra on X which contains A and is a additive measure on B . Moreover, is the unique measure on B such that |A = . Proof. It is clear that A B and that B is closed under complementation. Now suppose that Bi B for i = 1, 2 and > 0 is given. We may then choose Ai Bi Ci such that Ai A , Ci A , and (Ci \ Ai ) < for i = 1, 2. Then with A = A1 A2 , B = B1 B2 and C = C1 C2 , we have A A B C A . Since C \ A = (C1 \ A) (C2 \ A) (C1 \ A1 ) (C2 \ A2 ) , it follows from the sub-additivity of that with (C \ A) (C1 \ A1 ) + (C2 \ A2 ) < 2. Since > 0 was arbitrary, we have shown that B B . Hence we now know that B is an algebra. Because B is an algebra, to verify that B is a algebra it suces to show that B = n=1 Bn B whenever {Bn }n=1 is a disjoint sequence in B . To prove B B , let > 0 be given and choose Ai Bi Ci such that Ai A , Ci A , and (Ci \ Ai ) < 2i for all i. Since the {Ai }i=1 are pairwise disjoint we may use Lemma 5.16 to show,
macro: svmonob.cls date/time: 23-Feb-2007/15:20
(5.9)
Lemma 5.16. Suppose is a nite premeasure on an algebra, A 2X , and has been extended to A A as described in Proposition 5.15 and Eq. (5.7) above. 1. If A A and An A such that An A, then (A) = limn (An ) . 2. is additive when restricted to A . 3. If A A and C A such that A C, then (C \ A) = (C ) (A) . Proof.
c 1. Since Ac n A A , by the denition of (A) and Proposition 5.15 it follows that
2. Suppose A, B A are disjoint sets and An , Bn A such that An A and Bn B, then An Bn A B and therefore, (A B ) = lim (An Bn ) = lim [ (An ) + (Bn ) (An Bn )]
n n
= (A) + (B ) wherein the last equality we have used Proposition 4.3. 3. By assumption, X = Ac C. So applying the strong additivity of on A in Eq. (5.3) with A Ac A and B C A shows (X ) + (C \ A) = (Ac C ) + (Ac C ) = (Ac ) + (C ) = (X ) (A) + (C ) .
Page: 39
job: prob
40
(Ci ) =
i=1 i=1
( (Ai ) + (Ci \ Ai ))
n n
Theorem 5.20. Suppose that is a nite premeasure on an algebra A. Then (B ) := inf { (C ) : B C A } B (A) (5.11) 2i . denes a measure on (A) and this measure is the unique extension of on A to a measure on (A) . Proof. Let {Xn }n=1 A be chosen so that (Xn ) < for all n and Xn X as n and let (5.10) n (A) := n (A Xn ) for all A A.
n i=1
(n i=1 Ai )
+
i=1
(Ci \ Ai ) (X ) +
i=1
(Ci ) (X ) + < .
i=1 n Let B = i=1 Bi , C := i=1 Ci A and for n N let A := n n Then A A B C A , C \ A A and
Ai A .
(C \ An )
i=1
(Ci \ Ai ) +
i=n+1
(Ci )
Each n is a premeasure (as is easily veried) on A and hence by Theorem 5.19 each n has an extension, n , to a measure on (A) . Since the measure n are increasing, := limn n is a measure which extends . The proof will be completed by verifying that Eq. (5.11) holds. Let B (A) , Bm = Xm B and > 0 be given. By Theorem 5.19, there exists Cm A such that Bm Cm Xm and (Cm \ Bm ) = m (Cm \ Bm ) < 2n . Then C := m=1 Cm A and
(C \ B ) (Ci ) as n .
m=1
(Cm \ B )
m=1
(Cm \ B )
m=1
(Cm \ Bm ) < .
+
i=n+1
Thus (B ) (C ) = (B ) + (C \ B ) (B ) + which, since > 0 is arbitrary, shows satises Eq. (5.11). The uniqueness of the extension is proved in Exercise 5.1. Example 5.21. If F (x) = x for all x R, we denote F by m and call m Lebesgue measure on (R, BR ) . 2i < .
i=1
Since > 0 is arbitrary, it follows that B B . Moreover by repeated use of Remark 5.18, we nd
| (B ) (An )| < +
i=n+1 n n
(Ci ) and
n n
(Bi ) (An ) =
i=1 i=1
[ (Bi ) (Ai )]
i=1
| (Bi ) (Ai )|
Theorem 5.22. Lebesgue measure m is invariant under translations, i.e. for B BR and x R, m(x + B ) = m(B ). (5.12) Moreover, m is the unique measure on BR such that m((0, 1]) = 1 and Eq. (5.12) holds for B BR and x R. Moreover, m has the scaling property m(B ) = || m(B ) (5.13)
(B )
i=1
(Bi ) < 2 +
i=n+1
(Ci )
where R, B BR and B := {x : x B }. (Bi ) 2. Proof. Let mx (B ) := m(x + B ), then one easily shows that mx is a measure on BR such that mx ((a, b]) = b a for all a < b. Therefore, mx = m by the uniqueness assertion in Exercise 5.1. For the converse, suppose that m is translation invariant and m((0, 1]) = 1. Given n N, we have
macro: svmonob.cls date/time: 23-Feb-2007/15:20
(B )
i=1
Since > 0 is arbitrary, we have shown (B ) = i=1 (Bi ) . This completes the proof that B is a - algebra and that is a measure on B .
Page: 40 job: prob
41
k1 k , ] = n k=1 n n
k1 1 + (0, ] . n n
Denition 5.24. A measure space (X, B , ) is complete if every subset of a null set is in B , i.e. for all F X such that F E B with (E ) = 0 implies that F B . Proposition 5.25 (Completion of a Measure). Let (X, B , ) be a measure space. Set N = N := {N X : F B such that N F and (F ) = 0} , := {A N : A B and N N } and B=B (A N ) := (A) for A B and N N , is a algebra, , see Fig. 5.1. Then B is a well dened measure on B is the which extends on B , and (X, B , unique measure on B ) is complete measure , is called the completion of B relative to and space. The -algebra, B , is called the completion of . . Let A B and N N and choose F B such Proof. Clearly X, B
1 = m((0, 1]) =
k=1 n
1 k1 + (0, ] n n
=
k=1
That is to say 1 ]) = 1/n. n l ]) = l/n for all l, n N and therefore by the translation Similarly, m((0, n invariance of m, m((0, m((a, b]) = b a for all a, b Q with a < b. Finally for a, b R such that a < b, choose an , bn Q such that bn b and an a, then (an , bn ] (a, b] and thus m((a, b]) = lim m((an , bn ]) = lim (bn an ) = b a,
n n
i.e. m is Lebesgue measure. To prove Eq. (5.13) we may assume that = 0 1 since this case is trivial to prove. Now let m (B ) := || m(B ). It is easily checked that m is again a measure on BR which satises m ((a, b]) = 1 m ((a, b]) = 1 (b a) = b a if > 0 and m ((a, b]) = || if < 0. Hence m = m.
1
m ([b, a)) = ||
(b a) = b a that N F and (F ) = 0. Since N c = (F \ N ) F c , (A N )c = Ac N c = Ac (F \ N F c ) = [Ac (F \ N )] [Ac F c ] is closed under where [Ac (F \ N )] N and [Ac F c ] B . Thus B complements. If Ai B and Ni Fi B such that (Fi ) = 0 then since Ai B and Ni Fi and (Ai Ni ) = (Ai ) (Ni ) B (Fi ) (Fi ) = 0. Therefore, B is a algebra. Suppose A N1 = B N2 with A, B B and N1 , N2 , N . Then A A N1 A N1 F2 = B F2 which shows that (A) (B ) + (F2 ) = (B ).
macro: svmonob.cls date/time: 23-Feb-2007/15:20
42
Similarly, we show that (B ) (A) so that (A) = (B ) and hence (A N ) := (A) is well dened. It is left as an exercise to show is a measure, i.e. that it is countable additive.
Theorem 5.27 (Kolmogorovs Extension Theorem I.). Continuing the notation above, every nitely additive probability measure, P : A [0, 1] , has a unique extension to a probability measure on (A) . Proof. From Theorem 5.19, it suces to show limn P (An ) = 0 whenever {An }n=1 A with An . However, by Lemma 5.26, if An A and An , we must have that An = for a.a. n and in particular P (An ) = 0 for a.a. n. This certainly implies limn P (An ) = 0. Given a probability measure, P : (A) [0, 1] and n N and (1 , . . . , n ) n , let pn (1 , . . . , n ) := P ({ : 1 = 1 , . . . , n = n }) . (5.15)
Exercise 5.4 (Consistency Conditions). If pn is dened as above, show: 1. p1 () = 1 and 2. for all n N and (1 , . . . , n ) n , pn (1 , . . . , n ) =
pn+1 (1 , . . . , n , ) .
Exercise 5.5 (Converse to 5.4). Suppose for each n N we are given functions, pn : n [0, 1] such that the consistency conditions in Exercise 5.4 hold. Then there exists a unique probability measure, P on (A) such that Eq. (5.15) holds for all n N and (1 , . . . , n ) n . Example 5.28 (Existence of iid simple R.V.s). Suppose now that q : [0, 1] is a function such that q () = 1. Then there exists a unique probability measure P on (A) such that, for all n N and (1 , . . . , n ) n , we have P ({ : 1 = 1 , . . . , n = n }) = q (1 ) . . . q (n ) . This is a special case of Exercise 5.5 with pn (1 , . . . , n ) := q (1 ) . . . q (n ) .
macro: svmonob.cls
date/time: 23-Feb-2007/15:20
6 Random Variables
6.1 Measurable Functions
Denition 6.1. A measurable space is a pair (X, M), where X is a set and M is a algebra on X. To motivate the notion of a measurable function, suppose (X, M, ) is a measure space and f : X R+ is a function. Roughly speaking, we are going to dene f d as a certain limit of sums of the form,
X
Corollary 6.7. Suppose that (X, M) is a measurable space. Then the following conditions on a function f : X R are equivalent: 1. f is (M, BR ) measurable, 2. f 1 ((a, )) M for all a R, 3. f 1 ((a, )) M for all a Q, 4. f 1 ((, a]) M for all a R. Exercise 6.1. Prove Corollary 6.7. Hint: See Exercise 3.9. Exercise 6.2. If M is the algebra generated by E 2X , then M is the union of the algebras generated by countable subsets F E . Exercise 6.3. Let (X, M) be a measure space and fn : X R be a sequence of measurable functions on X. Show that {x : limn fn (x) exists in R} M. Exercise 6.4. Show that every monotone function f : R R is (BR , BR ) measurable. Denition 6.8. Given measurable spaces (X, M) and (Y, F ) and a subset A X. We say a function f : A Y is measurable i f is MA /F measurable. Proposition 6.9 (Localizing Measurability). Let (X, M) and (Y, F ) be measurable spaces and f : X Y be a function. 1. If f is measurable and A X then f |A : A Y is measurable. 2. Suppose there exist An M such that X = n=1 An and f |An is MAn measurable for all n, then f is M measurable. Proof. 1. If f : X Y is measurable, f 1 (B ) M for all B F and therefore 1 1 f | (B ) MA for all B F . A (B ) = A f
For this to make sense we will need to require f 1 ((a, b]) M for all a < b. Because of Corollary 6.7 below, this last condition is equivalent to the condition f 1 (BR ) M. Denition 6.2. Let (X, M) and (Y, F ) be measurable spaces. A function f : X Y is measurable of more precisely, M/F measurable or (M, F ) measurable, if f 1 (F ) M, i.e. if f 1 (A) M for all A F . Remark 6.3. Let f : X Y be a function. Given a algebra F 2 , the algebra M := f 1 (F ) is the smallest algebra on X such that f is (M, F ) - measurable . Similarly, if M is a - algebra on X then F = f M ={A 2Y |f 1 (A) M} is the largest algebra on Y such that f is (M, F ) - measurable. Example 6.4 (Characteristic Functions). Let (X, M) be a measurable space and 1 A X. Then 1A is (M, BR ) measurable i A M. Indeed, 1 A (W ) is either 1 c , X, A or A for any W R with 1A ({1}) = A. Example 6.5. Suppose f : X Y with Y being a nite set and F = 2 . Then f is measurable i f 1 ({y }) M for all y Y. Proposition 6.6. Suppose that (X, M) and (Y, F ) are measurable spaces and further assume E F generates F , i.e. F = (E ) . Then a map, f : X Y is measurable i f 1 (E ) M.
Y
44
6 Random Variables
2. If B F , then
1 1 f 1 (B ) = (B ) An = n=1 f n=1 f |An (B ).
Since each An M, MAn M and so the previous displayed equation shows f 1 (B ) M. The proof of the following exercise is routine and will be left to the reader. Proposition 6.10. Let (X, M, ) be a measure space, (Y, F ) be a measurable space and f : X Y be a measurable map. Dene a function : F [0, ] by (A) := (f 1 (A)) for all A F . Then is a measure on (Y, F ) . (In the future we will denote by f or f 1 and call f the push-forward of by f or the law of f under . Theorem 6.11. Given a distribution function, F : R [0, 1] let G : (0, 1) R be dened (see Figure 6.1) by, G (y ) := inf {x : F (x) y } . Then G : (0, 1) R is Borel measurable and G m = F where F is the unique measure on (R, BR ) such that F ((a, b]) = F (b) F (a) for all < a < b < . To give a formal proof of Eq. (6.1), G (y ) = inf {x : F (x) y } x0 , there exists xn x0 with xn x0 such that F (xn ) y. By the right continuity of F, it follows that F (x0 ) y. Thus we have shown {G x0 } (0, F (x0 )] (0, 1) . For the converse, if y F (x0 ) then G (y ) = inf {x : F (x) y } x0 , i.e. y {G x0 } . Indeed, y G1 ((, x0 ]) i G (y ) x0 . Observe that G (F (x0 )) = inf {x : F (x) F (x0 )} x0 and hence G (y ) x0 whenever y F (x0 ) . This shows that (0, F (x0 )] (0, 1) G1 ((0, x0 ]) . As a consequence we have G m = F . Indeed, (G m) ((, x]) = m G1 ((, x]) = m ({y (0, 1) : G (y ) x}) = m ((0, F (x)] (0, 1)) = F (x) .
Fig. 6.1. A pictorial denition of G.
Fig. 6.2. As can be seen from this picture, G (y ) x0 i y F (x0 ) and similalry, G (y ) x1 i y x1 .
See section 2.5.2 on p. 61 of Resnick for more details. Theorem 6.12 (Durrets Version). Given a distribution function, F : R [0, 1] let Y : (0, 1) R be dened (see Figure 6.3) by, Y (x) := sup {y : F (y ) < x} . Then Y : (0, 1) R is Borel measurable and Y m = F where F is the unique measure on (R, BR ) such that F ((a, b]) = F (b) F (a) for all < a < b < .
macro: svmonob.cls date/time: 23-Feb-2007/15:20
Proof. Since G : (0, 1) R is a non-decreasing function, G is measurable. We also claim that, for all x0 R, that G1 ((0, x0 ]) = {y : G (y ) x0 } = (0, F (x0 )] R, see Figure 6.2.
Page: 44 job: prob
(6.1)
45
(G ) = f 1 g 1 (G ) f 1 (F ) M.
Denition 6.14 ( Algebras Generated by Functions). Let X be a set and suppose there is a collection of measurable spaces {(Y , F ) : A} and functions f : X Y for all A. Let (f : A) denote the smallest algebra on X such that each f is measurable, i.e.
1 (f : A) = ( f (F )).
Example 6.15. Suppose that Y is a nite set, F = 2Y , and X = Y N for some N N. Let i : Y N Y be the projection maps, i (y1, . . . , yN ) = yi . Then, as the reader should check, (1 , . . . , n ) = A N n : A n . Proposition 6.16. Assuming the notation in Denition 6.14 and additionally let (Z, M) be a measurable space and g : Z X be a function. Then g is (M, (f : A)) measurable i f g is (M, F )measurable for all A. Proof. () If g is (M, (f : A)) measurable, then the composition f g is (M, F ) measurable by Lemma 6.13. () Let
1 G = (f : A) = A f (F ) .
Proof. Since Y : (0, 1) R is a non-decreasing function, Y is measurable. Also observe, if y < Y (x) , then F (y ) < x and hence, F (Y (x) ) = lim F (y ) x.
y Y (x)
and so we have shown F (Y (x) ) x F (Y (x)) . We will now show {x (0, 1) : Y (x) y0 } = (0, F (y0 )] (0, 1) . (6.2) and therefore
For the inclusion , if x (0, 1) and Y (x) y0 , then x F (Y (x)) F (y0 ), i.e. x (0, F (y0 )] (0, 1) . Conversely if x (0, 1) and x F (y0 ) then (by denition of Y (x)) y0 Y (x) . From the identity in Eq. (6.2), it follows that Y is measurable and (Y m) ((, y0 )) = m Y
1
1 1 g 1 A f (F ) = A g 1 f (F ) M.
Hence
1 g 1 (G ) = g 1 A f (F ) 1 = (g 1 A f (F ) M
which shows that g is (M, G ) measurable. Denition 6.17. A function f : X Y between two topological spaces is Borel measurable if f 1 (BY ) BX . Proposition 6.18. Let X and Y be two topological spaces and f : X Y be a continuous function. Then f is Borel measurable.
macro: svmonob.cls date/time: 23-Feb-2007/15:20
Therefore, Law (Y ) = F as desired. Lemma 6.13 (Composing Measurable Functions). Suppose that (X, M), (Y, F ) and (Z, G ) are measurable spaces. If f : (X, M) (Y, F ) and g : (Y, F ) (Z, G ) are measurable functions then g f : (X, M) (Z, G ) is measurable as well.
Page: 45 job: prob
46
6 Random Variables
Proof. This is an application of Lemma 6.20 and Proposition 6.16. Corollary 6.22. Let (X, M) be a measurable space and f, g : X C be (M, BC ) measurable functions. Then f g and f g are also (M, BC ) measurable. Proof. Dene F : X C C, A : C C C and M : C C C by F (x) = (f (x), g (x)), A (w, z ) = w z and M (w, z ) = wz. Then A and M are continuous and hence (BC2 , BC ) measurable. Also F is (M, BC2 ) measurable since 1 F = f and 2 F = g are (M, BC ) measurable. Therefore A F = f g and M F = f g, being the composition of measurable functions, are also measurable. As an example of this material, let us give another proof of the existence of i.i.d. simple random variables see Example 5.28 above. Theorem 6.23 (Existence of i.i.d simple R.V.s). This Theorem has been moved to Theorem 7.22 below. (6.3) Corollary 6.24 (Independent variables on product spaces). This Corollary has been moved to Corollary 7.23 below. Lemma 6.25. Let C, (X, M) be a measurable space and f : X C be a (M, BC ) measurable function. Then F (x) := is measurable. Proof. Dene i : C C by i(z ) = For any open set V C we have i1 (V ) = i1 (V \ {0}) i1 (V {0}) Because i is continuous except at z = 0, i1 (V \ {0}) is an open set and hence in BC . Moreover, i1 (V {0}) BC since i1 (V {0}) is either the empty set or the one point set {0} . Therefore i1 (C ) BC and hence i1 (BC ) = i1 ( (C )) = (i1 (C )) BC which shows that i is Borel measurable. Since F = i f is the composition of measurable functions, F is also measurable. Remark 6.26. For the real case of Lemma 6.25, dene i as above but now take z to real. From the plot of i, Figure 6.26, the reader may easily verify that 1 i1 ((, a]) is an innite half interval for all a and therefore i is measurable. x
macro: svmonob.cls date/time: 23-Feb-2007/15:20
1 f (x)
(BY ) = f
( (Y )) = (f
(Y )) (X ) = BX .
Example 6.19. For i = 1, 2, . . . , n, let i : Rn R be dened by i (x) = xi . Then each i is continuous and therefore BRn /BR measurable. Lemma 6.20. Let E denote the collection of open rectangle in Rn , then BRn = (E ) . We also have that BRn = (1 , . . . , n ) and in particular, A1 An BRn whenever Ai BR for i = 1, 2, . . . , n. Therefore BRn may be described as the algebra generated by {A1 An : Ai BR } . Proof. Assertion 1. Since E BRn , it follows that (E ) BRn . Let E0 := {(a, b) : a, b Qn
n
a < b} ,
where, for a, b R , we write a < b i ai < bi for i = 1, 2, . . . , n and let (a, b) = (a1 , b1 ) (an , bn ) .
Since every open set, V Rn , may be written as a (necessarily) countable union of elements from E0 , we have V (E0 ) (E ) , i.e. (E0 ) and hence (E ) contains all open subsets of Rn . Hence we may conclude that BRn = (open sets) (E0 ) (E ) BRn . Assertion 2. Since each i is BRn /BR measurable, it follows that (1 , . . . , n ) BRn . Moreover, if (a, b) is as in Eq. (6.3), then (a, b) =
1 n i=1 i
if if
f (x) = 0 f (x) = 0
((ai , bi )) (1 , . . . , n ) .
if z = 0 0 if z = 0.
1 z
(Ai ) (1 , . . . , n ) = BRn .
Corollary 6.21. If (X, M) is a measurable space, then f = (f1 , f2 , . . . , fn ) : X Rn is (M, BRn ) measurable i fi : X R is (M, BR ) measurable for each i. In particular, a function f : X C is (M, BC ) measurable i Re f and Im f are (M, BR ) measurable.
Page: 46 job: prob
47
be a funcCorollary 6.28. Let (X, M) be a measurable space and f : X R tion. Then the following are equivalent 1. f is (M, BR ) - measurable, 2. f 1 ((a, ]) M for all a R, 3. f 1 ((, a]) M for all a R, 4. f 1 ({}) M, f 1 ({}) M and f 0 : X R dened by f 0 (x) := 1R (f (x)) = is measurable. = R {} . When talking We will often deal with functions f : X R dened about measurability in this context we will refer to the algebra on R by (6.4) BR := ({[a, ] : a R}) . Proposition 6.27 (The Structure of BR be as above, then ). Let BR and BR : A R BR }. BR = {A R In particular {} , {} BR and BR BR . Proof. Let us rst observe that
c {} = , n=1 [, n) = n=1 [n, ] BR {} = n=1 [n, ] BR and R = R\ {} BR .
be functions Corollary 6.29. Let (X, M) be a measurable space, f, g : X R and (f + g ) : X R using the conventions, 0 = 0 and dene f g : X R and (f + g ) (x) = 0 if f (x) = and g (x) = or f (x) = and g (x) = . Then f g and f + g are measurable functions on X if both f and g are measurable. Exercise 6.5. Prove Corollary 6.28 noting that the equivalence of items 1. 3. is a direct analogue of Corollary 6.7. Use Proposition 6.27 to handle item 4. Exercise 6.6. Prove Corollary 6.29. Proposition 6.30 (Closure under sups, infs and limits). Suppose that (X, M) is a measurable space and fj : (X, M) R for j N is a sequence of M/BR measurable functions. Then supj fj , inf j fj , lim sup fj and lim inf fj
j j
(6.5)
[a, ] : a R [a, ] R : a R
i1 ([a, ]) : a R
= ({[a, ) : a R}) = BR .
are all M/BR measurable functions. (Note that this result is in generally false when (X, M) is a topological space and measurable is replaced by continuous in the statement.) Proof. Dene g+ (x) := sup j fj (x), then {x : g+ (x) a} = {x : fj (x) a j } = j {x : fj (x) a} M so that g+ is measurable. Similarly if g (x) = inf j fj (x) then {x : g (x) a} = j {x : fj (x) a} M. Since lim sup fj = inf sup {fj : j n} and
j j n
BR = i1 (BR ) = {A R : A BR }. This implies: 1. A BR A R BR and = is such that A R BR there exists B BR 2. if A R such that A R = B R. Because AB {} and {} , {} BR we may conclude that A BR as well. This proves Eq. (6.5). The proofs of the next two corollaries are left to the reader, see Exercises 6.5 and 6.6.
Page: 47 job: prob
48
6 Random Variables
let f+ (x) := max {f (x), 0} and Denition 6.31. Given a function f : X R f (x) := max (f (x), 0) = min (f (x), 0) . Notice that f = f+ f . is a Corollary 6.32. Suppose (X, M) is a measurable space and f : X R function. Then f is measurable i f are measurable. Proof. If f is measurable, then Proposition 6.30 implies f are measurable. Conversely if f are measurable then so is f = f+ f . Denition 6.33. Let (X, M) be a measurable space. A function : X F ) is a simple function if is M BF (F denotes either R, C or [0, ] R measurable and (X ) contains only nitely many elements. Any such simple functions can be written as
n
=
i=1
(6.6)
Indeed, take 1 , 2 , . . . , n to be an enumeration of the range of and Ai = 1 ({i }). Note that this argument shows that any simple function may be written intrinsically as = y 1 1 ( { y } ) . (6.7)
y F
2k+1 2k 2k then n (x) = n+1 (x) = 2n and if x if x f 1 ( 2n +1 , 2n+1 ] +1 2k 2k+1 2k+1 2k+2 1 f ( 2n+1 , 2n+1 ] then n (x) = 2n+1 < 2n+1 = n+1 (x). Similarly
(2n , ] = (2n , 2n+1 ] (2n+1 , ], and so for x f 1 ((2n+1 , ]), n (x) = 2n < 2n+1 = n+1 (x) and for x f 1 ((2n , 2n+1 ]), n+1 (x) 2n = n (x). Therefore n n+1 for all n. It is clear by construction that n (x) f (x) for all x and that 0 f (x) n (x) 2n if x X2n . Hence we have shown that n (x) f (x) for all x X and n f uniformly on bounded sets. For the second assertion, rst assume that f : X R is a measurable function and choose n to be simple functions such + that n f as n and dene n = n n . Then
+ |n | = + n + n n+1 + n+1 = |n+1 | + and clearly |n | = + n + n f+ + f = |f | and n = n n f+ f = f as n . Now suppose that f : X C is measurable. We may now choose simple function un and vn such that |un | |Re f | , |vn | |Im f | , un Re f and vn Im f as n . Let n = un + ivn , then 2 |n | = u2 n + vn |Re f | + |Im f | = |f | 2 2 2 2
The next theorem shows that simple functions are pointwise dense in the space of measurable functions. Theorem 6.34 (Approximation Theorem). Let f : X [0, ] be measurable and dene, see Figure 6.4,
n2n 1
n (x) :=
k=0 n2 1
n
k 1 1 k k+1 (x) + n1f 1 ((n2n ,]) (x) 2n f (( 2n , 2n ]) k 1 k k+1 (x) + n1{f >n2n } (x) 2n { 2n <f 2n }
=
k=0
then n f for all n, n (x) f (x) for all x X and n f uniformly on the sets XM := {x X : f (x) M } with M < . Moreover, if f : X C is a measurable function, then there exists simple functions n such that limn n (x) = f (x) for all x and |n | |f | as n . Proof. Since ( k k+1 2k 2k + 1 2k + 1 2k + 2 , n ] = ( n+1 , n+1 ] ( n+1 , n+1 ], n 2 2 2 2 2 2
job: prob
and n = un + ivn Re f + i Im f = f as n .
Page: 48
macro: svmonob.cls
date/time: 23-Feb-2007/15:20
The following is an immediate corollary of Proposition 6.16 and Lemma 6.35. Corollary 6.36. Let X and A be sets, and suppose for A we are give a measurable space (Y , F ) and a function f : X Y . Let Y := A Y , F := A F be the product algebra on Y and M := (f : A) be the smallest algebra on X such that each f is measurable. Then the function F : X Y dened by [F (x)] := f (x) for each A is (M, F ) measurable is (M, BR and a function H : X R ) measurable i there exists a (F , BR ) such that H = h F. measurable function h from Y to R
7 Independence
7.1 and Monotone Class Theorems
Denition 7.1. Let C 2X be a collection of sets. 1. C is a monotone class if it is closed under countable increasing unions and countable decreasing intersections, 2. C is a class if it is closed under nite intersections and 3. C is a class if C satises the following properties: a) X C b) If A, B C and A B , then B \ A C . (Closed under proper dierences.) c) If An C and An A, then A C . (Closed under countable increasing unions.) Remark 7.2. If C is a collection of subsets of which is both a class and a system then C is a algebra. Indeed, since Ac = X \ A, we see that any - system is closed under complementation. If C is also a system, it is closed under intersections and therefore C is an algebra. Since C is also closed under increasing unions, C is a algebra. Lemma 7.3 (Alternate Axioms for a System*). Suppose that L 2 is a collection of subsets . Then L is a class i satises the following postulates: 1. X L 2. A L implies Ac L. (Closed under complementation.) 3. If {An }n=1 L are disjoint, the n=1 An L. (Closed under disjoint unions.) Proof. Suppose that L satises a. c. above. Clearly then postulates 1. and 2. hold. Suppose that A, B L such that A B = , then A B c and Ac B c = B c \ A L. Taking compliments of this result shows A B L as well. So by induction, m Bm := n=1 An L. Since Bm n=1 An it follows from postulate c. that n=1 An L. Now suppose that L satises postulates 1. 3. above. Notice that L and by postulate 3., L is closed under nite disjoint unions. Therefore if A, B L with A B, then B c L and A B c = allows us to conclude that A B c L. Taking complements of this result shows B \ A = Ac B L as well, i.e. postulate b. holds. If An L with An A, then Bn := An \ An1 L for all n, where by convention A0 = . Hence it follows by postulate 3 that n=1 An = n=1 Bn L. Theorem 7.4 (Dynkins Theorem). If L is a class which contains a contains a class, P , then (P ) L. Proof. We start by proving the following assertion; for any element C L, the collection of sets, LC := {D L : C D L} , is a system. To prove this claim, observe that: a. X LC , b. if A B with A, B LC , then A C, B C L with A C B \ C and (B \ A) C = [B C ] \ A = [B C ] \ [A C ] L. Therefore LC is closed under proper dierences. Finally, c. if An LC with An A, then An C L and An C A C L, i.e. A LC . Hence we have veried LC is still a system. For the rest of the proof, we may assume with out loss of generality that L is the smallest class containing P if not just replace L by the intersection of all classes containing P . Then for C P we know that LC L is a - class containing P and hence LC = L. Since C P was arbitrary, we have shown, C D L for all C P and D L. We may now conclude that if C L, then P LC L and hence again LC = L. Since C L is arbitrary, we have shown C D L for all C, D L, i.e. L is a system. So by Remark 7.2, L is a algebra. Since (P ) is the smallest algebra containing P it follows that (P ) L. As an immediate corollary, we have the following uniqueness result. Proposition 7.5. Suppose that P 2 is a system. If P and Q are two probability1 measures on (P ) such that P = Q on P , then P = Q on (P ) .
1
52
7 Independence
Proof. Let L := {A (P ) : P (A) = Q (A)} . One easily shows L is a class which contains P by assumption. Indeed, P L, if A, B L with A B, then P (B \ A) = P (B ) P (A) = Q (B ) Q (A) = Q (B \ A) so that B \ A L, and if An L with An A, then P (A) = limn P (An ) = limn Q (An ) = Q (A) which shows A L. Therefore (P ) L = (P ) and the proof is complete. Example 7.6. Let := {a, b, c, d} and let and be the probability measure 1 on 2 determined by, ({x}) = 1 4 for all x and ({a}) = ({d}) = 8 and ({b}) = ({c}) = 3/8. In this example, L := A 2 : P (A) = Q (A) is system which is not an algebra. Indeed, A = {a, b} and B = {a, c} are in L but A B / L. Exercise 7.1. Suppose that and are two measure on a measure space, (, B ) such that = on a system, P . Further assume B = (P ) and there exists n P such that; i) (n ) = (n ) < for all n and ii) n as n . Show = on B . Hint: Consider the measures, n (A) := (A n ) and n (A) = (A n ) . Solution to Exercise (7.1). Let n (A) := (A n ) and n (A) = (A n ) for all A B. Then n and n are nite measure such n ( ) = n ( ) and n = n on P . Therefore by Proposition 7.5, n = n on B . So by the continuity properties of and , it follows that (A) = lim (A n ) = lim n (A) = lim n (A) = lim (A n ) = (A)
n n n n
Corollary 7.9. The joint distribution, is uniquely determined from the knowledge of P ((X1 , . . . , Xn ) A1 An ) for all Ai BR or from the knowledge of P (X1 x1 , . . . , Xn xn ) for all Ai BR for all x = (x1 , . . . , xn ) Rn . Proof. Apply Proposition 7.5 with P being the systems dened by P := {A1 An BRn : Ai BR } for the rst case and P := {(, x1 ] (, xn ] BRn : xi R} for the second case. Denition 7.10. Suppose that {Xi }i=1 and {Yi }i=1 are two nite sequences of random variables on two probability spaces, (, B , P ) and (X, F , Q) respectively. We write (X1 , . . . , Xn ) = (Y1 , . . . , Yn ) if (X1 , . . . , Xn ) and (Y1 , . . . , Yn ) have the same distribution, i.e. if P ((X1 , . . . , Xn ) B ) = Q ((Y1 , . . . , Yn ) B ) for all B BRn . More generally, if {Xi }i=1 and {Yi }i=1 are two sequences of random variables on two probability spaces, (, B , P ) and (X, F , Q) we write {Xi }i=1 = {Yi }i=1 i (X1 , . . . , Xn ) = (Y1 , . . . , Yn ) for all n N. Exercise 7.2. Let {Xi }i=1 and {Yi }i=1 be two sequences of random variables such that {Xi }i=1 = {Yi }i=1 . Let {Sn }n=1 and {Tn }n=1 be dened by, Sn := X1 + + Xn and Tn := Y1 + + Yn . Prove the following assertions. 1. Suppose that f : Rn Rk is a BRn /BRk measurable function, then f (X1 , . . . , Xn ) = f (Y1 , . . . , Yn ) . 2. Use your result in item 1. to show {Sn }n=1 = {Tn }n=1 . Hint: apply item 1. with k = n and a judiciously chosen function, f : Rn Rn . d d 3. Show lim sup Xn = lim sup Yn and similarly that lim inf n Xn =
n n d d d d d d n n
for all A B . Corollary 7.7. A probability measure, P, on (R, BR ) is uniquely determined by its distribution function, F (x) := P ((, x]) . Denition 7.8. Suppose that is a sequence of random variables on a 1 probability space, (, B , P ) . The measure, = P (X1 , . . . , Xn ) on BRn is called the joint distribution of (X1 , . . . , Xn ) . To be more explicit, (B ) := P ((X1 , . . . , Xn ) B ) := P ({ : (X1 ( ) , . . . , Xn ( )) B }) for all B BRn .
Page: 52 job: prob
n {Xi }i=1
53
lim sup Xn x
n
= {Xn x i.o.} ,
Exercise 7.3. Suppose that A 2 is an algebra, B := (A) , and P is a probability measure on B . Show, using the theorem, that for every B B there exists A A such that that P (A B ) < . Here A B := (A \ B ) (B \ A)
To use this identity you will also need to nd B BRm such that m k=n {Xk x} = {(X1 , . . . , Xm ) B } . 7.1.1 The Monotone Class Theorem This subsection may be safely skipped! Lemma 7.11 (Monotone Class Theorem*). Suppose A 2X is an algebra and C is the smallest monotone class containing A. Then C = (A). Proof. For C C let C (C ) = {B C : C B, C B c , B C c C},
c Bc then C (C ) is a monotone class. Indeed, if Bn C (C ) and Bn B, then Bn and so
= |1A 1B |
so that P (A B ) = E |1A 1B | . 2. Also observe that if B = Bi and A = i Ai , then B \ A i (Bi \ Ai ) i Ai A \ B i (Ai \ Bi ) i Ai so that A 3. We also have
c (B2 \ B1 ) \ (A2 \ A1 ) = B2 B1 (A2 \ A1 ) c c = B2 B1 (A2 Ac 1) c = B2 B1 (Ac 2 A1 ) c c = [B2 B1 Ac 2 ] [B2 B1 A1 ] (B2 \ A2 ) (A1 \ B1 ) c
Bi and Bi
B i (Ai
Bi ) .
C C C
C Bn C B c C Bn C B c and Bn C c B C c .
Since C is a monotone class, it follows that C B, C B c , B C c C , i.e. B C (C ). This shows that C (C ) is closed under increasing limits and a similar argument shows that C (C ) is closed under decreasing limits. Thus we have shown that C (C ) is a monotone class for all C C . If A A C , then A B, A B c , B Ac A C for all B A and hence it follows that A C (A) C . Since C is the smallest monotone class containing A and C (A) is a monotone class containing A, we conclude that C (A) = C for any A A. Let B C and notice that A C (B ) happens i B C (A). This observation and the fact that C (A) = C for all A A implies A C (B ) C for all B C . Again since C is the smallest monotone class containing A and C (B ) is a monotone class we conclude that C (B ) = C for all B C . That is to say, if A, B C then A C = C (B ) and hence A B, A B c , Ac B C . So C is closed under complements (since X A C ) and nite intersections and increasing unions from which it easily follows that C is a algebra.
and similarly, (A2 \ A1 ) \ (B2 \ B1 ) (A2 \ B2 ) (B1 \ A1 ) so that (A2 \ A1 ) (B2 \ B1 ) (B2 \ A2 ) (A1 \ B1 ) (A2 \ B2 ) (B1 \ A1 ) = (A1 B1 ) (A2 B2 ) .
5. Let L be the collection of sets B for which the assertion of the theorem holds. Show L is a system which contains A.
macro: svmonob.cls date/time: 23-Feb-2007/15:20
Page: 53
job: prob
54
7 Independence
Solution to Exercise (7.3). Since L contains the system, A it suces by the theorem to show L is a system. Clearly, L since A L. If B1 B2 with Bi L and > 0, there exists Ai A such that P (Bi Ai ) = E |1Ai 1Bi | < /2 and therefore, P ((B2 \ B1 ) (A2 \ A1 )) P ((A1 P ((A1 B1 ) (A2 B2 )) B1 )) + P ((A2 B2 )) < . An ) < 2n
Proof. As mentioned above, we may always assume with out loss of generality that X Ci . Fix, Aj Cj for j = 2, 3, . . . , n. We will begin by showing that P (A A2 An ) = P (A) P (A2 ) . . . P (An ) for all A (C1 ) . (7.1)
Since it is clear that this identity holds if P (Aj ) = 0 for some j = 2, . . . , n, we may assume that P (Aj ) > 0 for j 2. In this case we may dene, Q (A) = P (A A2 An ) P (A A2 An ) = P (A2 ) . . . P (An ) P (A2 An ) = P (A|A2 An ) for all A (C1 ) .
P ([n Bn ]
[n An ])
n=1
P (Bn
An ) < .
AN < .
Then equation Eq. (7.1) is equivalent to P (A) = Q (A) on (C1 ) . But this is true by Proposition 7.5 using the fact that Q = P on the system, C1 . Since (A2 , . . . , An ) C2 Cn were arbitrary we may now conclude that (C1 ) , C2 , . . . , Cn are independent. By applying the result we have just proved to the sequence, C2 , . . . , Cn , (C1 ) shows that (C2 ) , C3 , . . . , Cn , (C1 ) are independent. Similarly we show inductively that (Cj ) , Cj +1 , . . . , Cn , (C1 ) , . . . , (Cj 1 ) are independent for each j = 1, 2, . . . , n. The desired result occurs at j = n. Denition 7.14. A collection of subsets of B , {Ct }tT is said to be independent i {Ct }t are independent for all nite subsets, T. More explicitly, we are requiring P (t At ) = P (At )
t
whenever is a nite subset of T and At Ct for all t . Corollary 7.15. If {Ct }tT is a collection of independent classes such that each Ct is a system, then { (Ct )}tT are independent as well. Example 7.16. Suppose that = n where is a nite set, B = 2 , P ({ }) = n j =1 qj (j ) where qj : [0, 1] are functions such that qj () = 1. n Let Ci := i1 A ni : A . Then {Ci }i=1 are independent. Indeed, if Bi := i1 Ai ni , then Bi = A1 A2 An and we have
for all Ai Ci and J {1, 2, . . . , n} . Observe that if {Ci }i=1 , are independent classes then so are {Ci {X }}i=1 . n Moreover, if we assume that X Ci for each i, then {Ci }i=1 , are independent i
n n n
P n j =1 Aj =
j =1
Theorem 7.13. Suppose that {Ci }i=1 is a nite sequence of independent n classes. Then { (Ci )}i=1 are also independent.
Page: 54 job: prob
P (Bi ) =
A1 A2 An i=1
qi (i ) =
i=1 Ai
qi ()
macro: svmonob.cls
date/time: 23-Feb-2007/15:20
55
while P (Bi ) =
qi (i ) =
i1 Ai ni i=1 Ai
qi () .
k {Xj }j =1
with countable
Denition 7.17. A collections of random variables, {Xt : t T } are independent i { (Xt ) : t T } are independent. Theorem 7.18. Let X := {Xt : t T } be a collection of random variables. Then the following are equivalent: 1. The collection X, 2. P (t {Xt At }) =
t
P k j =1 {Xj = xj } =
j =1
P (Xj = xj )
(7.2)
for all xj R. Proof. Observe that both sides of Eq. (7.2) are zero unless xj is in the range of Xj for all j. Hence it suces to verify Eq. (7.2) for those xj Ran(Xj ) =: Rj k for all j. Now if {Xj }j =1 are independent, then {Xj = xj } (Xj ) for all xj R and therefore Eq. (7.2) holds. Conversely if Eq. (7.2) and Vj BR , then k P k j =1 {Xj Vj } = P j =1
xj Vj Rj
P (Xt At )
P (Xt xt )
{Xj = xj }
Qk
j =1
=P =
for all nite subsets, T, and all xt R for t . Proof. The equivalence of 1. and 2. follows almost immediately form the denition of independence and the fact that (Xt ) = {{Xt A} : A BR } . Clearly 2. implies 3. holds. Finally, 3. implies 2. is an application of Corollary 7.15 with Ct := {{Xt a} : a R} and making use the observations that Ct is a system for all t and that (Ct ) = (Xt ) . Example 7.19. Continue the notation of Example 7.16 and further assume that n R and let Xi : be dened by, Xi ( ) = i . Then {Xi }i=1 are independent random variables. Indeed, (Xi ) = Ci with Ci as in Example 7.16. Alternatively, from Exercise 4.1, we know that
n n
k j =1 {Xj = xj }
Vj Rj
(x1 ,...,xk )
(x1 ,...,xk )
Qk
j =1
P
Vj Rj k
k j =1 {Xj = xj }
=
k
(x1 ,...,xk )
Qk
P (Xj = xj )
k
j =1 j =1 Vj Rj
=
j =1 xj Vj Rj
P (Xj = xj ) =
j =1
P (Xj Vj ) .
EP
i=1
fi (Xi ) =
i=1
EP [fi (Xi )]
for all fi : R. Taking Ai and fi := 1Ai in the above identity shows that
n n
P (X1 A1 , . . . , Xn An ) = EP
i=1 n
1Ai (Xi ) =
i=1
EP [1Ai (Xi )]
Denition 7.21. As sequences of random variables, {Xn }n=1 , on a probability space, (, B , P ), are i.i.d. (= independent and identically distributed) if they are independent and (Xn ) P = (Xk ) P for all k, n. That is we should have P (Xn A) = P (Xk A) for all k, n N and A BR . Observe that {Xn }n=1 are i.i.d. random variables i
n n n
=
i=1
P (Xi Ai )
P (X1 A1 , . . . , Xn An ) =
j =1
P (Xi Ai ) =
j =1
P (X1 Ai ) =
j =1
(Ai ) (7.3)
as desired.
Page: 55 job: prob macro: svmonob.cls
date/time: 23-Feb-2007/15:20
56
7 Independence
where = (X1 ) P. The identity in Eq. (7.3) is to hold for all n N and all Ai BR . Theorem 7.22 (Existence of i.i.d simple R.V.s). Suppose that {qi }i=0 is a n sequence of positive numbers such that i=0 qi = 1. Then there exists a sequence {Xk }k=1 of simple random variables taking values in = {0, 1, 2 . . . , n} on ((0, 1], B , m) such that m ({X1 = i1 , . . . , Xk = ii }) = qi1 . . . qik for all i1 , i2 , . . . , ik {0, 1, 2, . . . , n} and all k N. Proof. For i = 0, 1, . . . , n, let 1 = 0 and j := interval, (a, b], let
j i=0 qi n
Ti ((a, b]) := (a + i1 (b a) , a + i (b a)]. Given i1 , i2 , . . . , ik {0, 1, 2, . . . , n}, let Ji1 ,i2 ,...,ik := Tik Tik1 (. . . Ti1 ((0, 1])) and dene {Xk }k=1 on (0, 1] by Xk :=
i1 ,i2 ,...,ik {0,1,2,...,n}
Fig. 7.1. Here we suppose that p0 = 2/3 and p1 = 1/3 and then we construct Jl and Jl,k for l, k {0, 1} .
see Figure 7.1. Repeated applications of Corollary 6.22 shows the functions, Xk : (0, 1] R are measurable. Observe that m (Ti ((a, b])) = qi (b a) = qi m ((a, b]) , and so by induction, m (Ji1 ,i2 ,...,ik ) = qik qik1 . . . qi1 . The reader should convince herself/himself that {X1 = i1 , . . . Xk = ii } = Ji1 ,i2 ,...,ik and therefore, we have m ({X1 = i1 , . . . , Xk = ii }) = m (Ji1 ,i2 ,...,ik ) = qik qik1 . . . qi1 as desired. (7.4)
Corollary 7.23 (Independent variables on product spaces). Suppose n = {0, 1, 2 . . . , n} , qi > 0 with = N , and for i=0 qi = 1, = i N, let Yi : R be dened by Yi ( ) = i for all . Further let B := (Y1 , Y2 , . . . , Yn , . . . ) . Then there exists a unique probability measure, P : B [0, 1] such that P ({Y1 = i1 , . . . , Yk = ii }) = qi1 . . . qik . Proof. Let {Xi }i=1 be as in Theorem 7.22 and dene T : (0, 1] by T (x) = (X1 (x) , X2 (x) , . . . , Xk (x) , . . . ) . Observe that T is measurable since Yi T = Xi is measurable for all i. We now dene, P := T m. Then we have P ({Y1 = i1 , . . . , Yk = ii }) = m T 1 ({Y1 = i1 , . . . , Yk = ii }) = m ({Y1 T = i1 , . . . , Yk T = ii }) = m ({X1 = i1 , . . . , Xk = ii }) = qi1 . . . qik .
n
Page: 56
job: prob
macro: svmonob.cls
date/time: 23-Feb-2007/15:20
57
Theorem 7.24. Given a nite subset, R and a function q : [0, 1] such that q () = 1, there exists a probability space, (, B , P ) and an independent sequence of random variables, {Xn }n=1 such that P (Xn = ) = q () for all . Proof. Use Corollary 7.20 to shows that random variables constructed in Example 5.28 or Theorem 7.22 t the bill. Proposition 7.25. Suppose that {Xn }n=1 is a sequence of i.i.d. random variables with distribution, P (Xn = 0) = P (Xn = 1) = 1 2 . If we let U := n 2 X , then P ( U x ) = (0 x ) 1 , i.e. U has the uniform distribution n n=1 on [0, 1] . Proof. Let us recall that P (Xn = 0 a.a.) = P (Xn = 1 a.a.) . Hence we may, by shrinking if necessary, assume that {Xn = 0 a.a.} = = {Xn = 1 a.a.} . With this simplication, we have 1 2 1 U< 4 1 3 U < 2 4 U< and hence that U< 3 4 = U< 1 2 1 3 U < 2 4 = {X1 = 0} , = {X1 = 0, X2 = 0} and = {X1 = 1, X2 = 0}
Since x U < x + 2(n+1) = n j =1 {Xj = j } {Xn+1 = 0} we see that P x U < x + 2(n+1) = 2(n+1) and hence P U < x + 2(n+1) = x + 2(n+1) which completes the induction argument. Since x P (U < x) is left continuous we may now conclude that P (U < x) = x for all x (0, 1) and since x x is continuous we may also deduce that P (U x) = x for all x (0, 1) . Hence we may conclude that P (U x) = (0 x) 1.
Lemma 7.26. Suppose that {Bt : t T } is an independent family of elds. And further assume that T = sS Ts and let BTs = tTs Bs = (tTs Bs ) . Then {BTs }sS is an independent family of elds. Proof. Let Cs = {K B : B B , K Ts } . It is now easily checked that {Cs }sS is an independent family of systems. Therefore {BTs = (Cs )}sS is an independent family of algebras. We may now show the existence of independent random variables with arbitrary distributions. U< 3 4 = 3 . 4 Theorem 7.27. Suppose that {n }n=1 are a sequence of probability measures on (R, BR ) . Then there exists a probability space, (, B , P ) and a sequence 1 {Yn }n=1 independent random variables with Law (Yn ) := P Yn = n for all n. Proof. By Theorem 7.24, there exists a sequence of i.i.d. random variables, {Zn }n=1 , such that P (Zn = 1) = P (Zn = 0) = 1 2 . These random variables may be put into a two dimensional array, {Xi,j : i, j N} , see the proof of Lemma 3.8. For each i, let Ui := j =1 2i Xi,j {Xi,j }j =1 measurable random variable. According to Proposition 7.25, Ui is uniformly distributed on [0, 1] . Moreover by the grouping Lemma 7.26, {Xi,j }j =1
= {X1 = 0} {X1 = 1, X2 = 0} . From these identities, it follows that P (U < 0) = 0, P U< 1 4 = 1 , P 4 U< 1 2 = 1 , and P 2
n j j =1 j 2
P (U < x) = x.
The proof is by induction on n. Indeed, we have already veried (7.5) when n = n 1, 2. Suppose we have veried (7.5) up to some n N and let x = j =1 j 2j and consider P U < x + 2(n+1) = P (U < x) + P x U < x + 2(n+1) =x+P xU <x+2
(n+1)
are independent
i=1
Page: 57
job: prob
macro: svmonob.cls
date/time: 23-Feb-2007/15:20
58
7 Independence
algebras and hence {Ui }i=1 is a sequence of i.i.d.. random variables with the uniform distribution. Finally, let Fi (x) := ((, x]) for all x R and let Gi (y ) = inf {x : Fi (x) y } . Then according to Theorem 6.11, Yi := Gi (Ui ) has i as its distribution. Moreover each Yi is {Xi,j }j =1 measurable and therefore the {Yi }i=1 are independent random variables. 7.2.1 An Example of Ranks Let {Xn }n=1 be i.i.d. with common continuous distribution function, F. In this case we have, for any i = j, that P (Xi = Xj ) = F F ({(x, x) : x R}) = 0. This may be proved directly with some work or will be an easy consequence of Fubinis theorem to be considered later, see Example 10.11 below. For the direct proof, let {al }l= be a sequence such that, al < al+1 for all l Z, liml al = and liml al = . Then {(x, x) : x R} lZ [(al , al+1 ] (al , al+1 ]]
must have X5 in the last slot, i.e. (, , , , X5 ) . Since R4 = 2, we know out of the remaining slots, X4 must be in the second from the far most right, i.e. (, , X4 , , X5 ) . Since R3 = 2, we know that X3 is again the second from the right of the remaining slots, i.e. we now know, (, X3 , X4 , , X5 ) . Similarly, R2 = 2 implies (X2 , X3 , X4 , , X5 ) and nally R1 = 1 gives, (X2 , X3 , X4 , X1 , X5 ) . As another example, if Ri = i for i = 1, 2, . . . , n, then Xn < Xn1 < < X1 . Theorem 7.28 (Renyi Theorem). Let {Xn }n=1 be i.i.d. and assume that F (x) := P (Xn x) is continuous. The {Rn }n=1 is an independent sequence, P (Rn = k ) = 1 for k = 1, 2, . . . , n, n
and the events, An = {Xn is a record} = {Rn = 1} are independent as n varies and 1 P (An ) = P (Rn = 1) = . n Proof. By Problem 6 on p. 110 of Resnick, (X1 , . . . , Xn ) and (X1 , . . . , Xn ) have the same distribution for any permutation . Since F is continuous, it now follows that up to a set of measure zero, = {X1 < X2 < < Xn }
[F (al+1 ) F (al )]
and therefore 1 = P ( ) =
Since F is continuous and F (+) = 1 and F () = 0, it is easily seen that l , we have F is uniformly continuous on R. Therefore, if we choose al = N P (Xi = Xj ) lim sup sup F
N lZ
l+1 N
l N
= 0.
Since P ({X1 < X2 < < Xn }) is independent of we may now conclude that 1 P ({X1 < X2 < < Xn }) = n! for all . As observed before the statement of the theorem, to each realization (1 , . . . , n ) , (here i N with i i) of (R1 , . . . , Rn ) there is a permutation, = (1 , . . . , n ) such that X1 < X2 < < Xn . From this it follows that {(R1 , . . . , Rn ) = (1 , . . . , n )} = {X1 < X2 < < Xn }
Rn :=
j =1
For example if (X1 , X2 , X3 , X4 , X5 , . . . ) = (9, 8, 3, 7, 23, . . . ) , we have R1 = 1, R2 = 2, R3 = 2, and R4 = 2, R5 = 1. Observe that rank order, from lowest to highest, of (X1 , X2 , X3 , X4 , X5 ) is (X2 , X3 , X4 , X1 , X5 ) . This can be determined by the values of Ri for i = 1, 2, . . . , 5 as follows. Since R5 = 1, we
Page: 58 job: prob
macro: svmonob.cls
date/time: 23-Feb-2007/15:20
59
P ({Rn = n }) =
(1 ,...n1 )
=
(1 ,...n1 )
Figure 7.2 below serves as motivation for the following elementary lemma on convex functions.
P (An ) < ,
n=1
(7.6)
then P ({An i.o.}) = 0. Proof. First Proof. We have P ({An i.o.}) = P ( n=1 kn Ak ) = lim P (kn Ak ) lim
n n
P (Ak ) = 0.
kn
Fig. 7.2. A convex function, , along with a cord and a tangent line. Notice that the tangent line is always below and the cord lies above between the points of intersection of the cord with the graph of .
(7.7) Second Proof. (Warning: this proof require integration theory which is developed below.) Equation (7.6) is equivalent to
E
n=1
1An <
Lemma 7.31 (Convex Functions). Suppose that P C 2 ((a, b) R)2 with (x) 0 for almost all x (a, b) . Then satises; 1. for all x0 , x (a, b) , (x0 ) + (x0 ) (x x0 ) (x)
and
2
which is equivalent to P ({An i.o.}) = 0. Example 7.30. Suppose that {Xn } are Bernoulli random variables with P (Xn = 1) = pn and P (Xn = 0) = 1 pn . If pn <
Page: 59 job: prob
P C 2 denotes the space of piecewise C 2 functions, i.e. P C 2 ((a, b) R) means the is C 1 and there are a nite number of points, {a = a0 < a1 < a2 < < an1 < an = b} , such that |[aj 1 ,aj ](a,b) is C 2 for all j = 1, 2, . . . , n.
macro: svmonob.cls
date/time: 23-Feb-2007/15:20
60
7 Independence
2. for all u v with u, v (a, b) , (u + t (v u)) (u) + t ( (v ) (u)) t [0, 1] . (This lemma applies to the functions, ex for all R, |x| for > 1, and ln x to name a few examples. See Appendix 11.7 below for much more on convex functions.) Proof. 1. Let f (x) := (x) [ (x0 ) + (x0 ) (x x0 )] . Then f (x0 ) = f (x0 ) = 0 while f (x) 0 a.e. and so by the fundamental theorem of calculus,
x
(y ) dy.
Hence it follows that f (x) 0 for x > x0 and f (x) 0 for x < x0 and therefore, f (x) 0 for all x (a, b) . 2. Let f (t) := (u) + t ( (v ) (u)) (u + t (v u)) . (t) = (v u)2 (u + t (v u)) 0 for almost Then f (0) = f (1) = 0 with f all t. By the mean value theorem, there exists, t0 (0, 1) such that f (t0 ) = 0 and then by the fundamental theorem of calculus it follows that
t
f (t) =
t0
( ) dt. f
Fig. 7.4. A graph of 1 x and e2x showing that 1 x e2x for all x [0, 1/2] .
In particular, f (t) 0 for t > t0 and f (t) 0 for t < t0 and hence f (t) f (1) = 0 for t t0 and f (t) f (0) = 0 for t t0 , i.e. f (t) 0. Example 7.32. Taking (x) := ex , we learn (see Figure 7.3), 1 x ex for all x R and taking (x) = e2x we learn that 1 x e2x for 0 x 1/2. (7.9) (7.8)
(1 an ) := lim
n=1
(1 an ) .
n=1
N n=1
(1 an ) = 0 i
n=1 n=1
an = .
(1 an )
n=1 n=1
ean = exp
n=1
an
Page: 60
job: prob
macro: svmonob.cls
date/time: 23-Feb-2007/15:20
61
(1 an ) exp
n=1 n=1
an
Making use of the independence of {Ak }k=1 and hence the independence of {Ac k }k=1 , we have P (mkn Ac k) =
mkn
Hence if n=1 an = then n=1 (1 an ) = 0. Conversely, suppose that n=1 an < . In this case an 0 as n and so there exists an m N such that an [0, 1/2] for all n m. With this notation we then have for N m that
N m N
P (Ac k) =
mkn
(1 P (Ak )) .
(7.12)
Using the simple inequality in Eq. (7.8) along with Eq. (7.12) shows
m
(1 an ) =
n=1 n=1 m
(1 an )
n=m+1 N
(1 an )
m N 2an
P (mkn Ac k)
mkn
eP (Ak ) = exp
k=n
n=1 m
(1 an )
n=m+1
=
n=1
(1 an ) exp 2
n=m+1
an
the
above
n=1
(1 an ) exp 2
n=m+1
an
as desired. Example 7.34 (Example 7.30 continued). Suppose that {Xn } are now independent Bernoulli random variables with P (Xn = 1) = pn and P (Xn = 0) = 1 pn . Then P (limn Xn = 0) = 1 i pn < . Indeed, P (limn Xn = 0) = 1 i P (Xn = 0 a.a.) = 1 i P (Xn = 1 i.o.) = 0 i pn = P (Xn = 1) < .
(1 an )
n=1 n=1
(1 an ) exp 2
n=m+1
an
> 0.
Lemma 7.33 (Second Borel-Cantelli Lemma). Suppose that {An }n=1 are independent sets. If
P (An ) = ,
n=1
Proposition 7.35 (Extremal behaviour of iid random variables). Sup pose that {Xn }n=1 is a sequence of i.i.d. random variables and cn is an increasing sequence of positive real numbers such that for all > 1 we have
then P ({An i.o.}) = 1. Combining this with the rst Borel Cantelli Lemma gives the (Borel) Zero-One law, 0 if n=1 P (An ) < . P (An i.o.) = 1 if n=1 P (An ) = Proof. We are going to prove Eq. (7.11) by showing,
c 0 = P ({An i.o.} ) = P ({Ac n a.a}) = P (n=1 kn Ak ) . c m c Since kn Ac k n=1 kn Ak as n and k=n Ak n=1 kn Ak as m , c
(7.13)
(7.14)
Xn = 1 a.s. cn
(7.15)
Proof. By the second Borel-Cantelli Lemma, Eq. (7.13) implies P Xn > 1 cn i.o. n = 1 from which it follows that
macro: svmonob.cls date/time: 23-Feb-2007/15:20
Page: 61
job: prob
62
7 Independence
lim sup
n
Xn 1 a.s.. cn
Example 7.37. Suppose now that {Xn }n=1 are i.i.d. distributed by the Poisson distribution with intensity, , i.e. P (X1 = k ) = = 1. In this case we have
Xn 1 cn
=P
1 Xn cn k
k e . k!
Similarly, by the rst Borel-Cantelli lemma, Eq. (7.14) implies P (Xn > cn i.o. n) = 0 or equivalently, P (Xn cn a.a. n) = 1. That is to say, lim sup
n
P (X1 n) = e
k=n
n k e k! n!
and
Xn a.s. cn Xn k cn
k=n
k n e = e k! n! n = e n!
k=n
n! kn k! n n! k e (k + n)! n!
k=0
k=0
n 1 k = . k! n!
lim sup
n
Xn 1 cn
=P
= 1.
Thus we have shown that n n e P (X1 n) . n! n! = 1. Thus in terms of convergence issues, we may assume that P (X1 x) x x x! 2xex xx
Xn =1 cn
=P
lim sup
n
Xn Xn 1 lim sup 1 cn n cn
Example 7.36. Let {En }n=1 be a sequence of independent random variables with exponential distributions determined by P (En > x) = e(x0) or P (En x) = 1 e(x0) . (Observe that P (En 0) = 0) so that En > 0 a.s.) Then for cn > 0 and > 0, we have
wherein we have used Stirlings formula, x! 2xex xx . Now suppose that we wish to choose cn so that P (X1 cn ) 1/n. This suggests that we need to solve the equation, xx = n. Taking logarithms of this equation implies that ln n x= ln x and upon iteration we nd, x= = ln n ln
ln n ln x
P (En > cn ) =
n=1 n=1
cn
=
n=1
cn
P (En > ln n) =
n=1 n=1
1 n
=
2
ln n = (n) 2 (x)
3
ln n ( n ) 2 2
ln n ln x
En = 1 a.s. ln n
job: prob macro: svmonob.cls
2 (n)
ln n 3 (n) +
(x)
Page: 62
date/time: 23-Feb-2007/15:20
63
where k = ln ln ln. Since, x ln (n) , it follows that 3 (x) hence that ln (n) ln (n) 3 (n) x= = 1+O . 2 (n) + O ( 3 (n)) 2 (n) 2 (n) Thus we are lead to take cn := (cn )
cn ln(n) . 2 ( n)
(n) and
Let {Xn }n=1 be a sequence of random variables on a measurable space, (, B ) . Let Bn := (X1 , . . . , Xn ) , B := (X1 , X2 , . . . ) , Tn := (Xn+1 , Xn+2 , . . . ) , and T := n=1 Tn B . We call T the tail eld and events, A T , are called tail events. Example 7.38. Let Sn := X1 + + Xn and {bn }n=1 (0, ) such that bn . Here are some example of tail events and tail measurable random variables: 1. {
n=1
= exp (cn [ln + ln cn ]) ln (n) [ln + 2 (n) 3 (n)] 2 (n) ln 3 (n) + 1 ln (n) = exp 2 (n) = exp = n(1+n ())
Xn converges} T . Indeed,
Xk converges
k=1
=
k=n+1
Xk converges
Tn
where
for all n N. n 2. both lim sup Xn and lim inf n Xn are T measurable as are lim sup S bn
n n
Hence we have
Sn bn .
T and similarly,
Sn exists in R bn
lim sup
n
Sn Sn = lim inf n bn bn
ln(/e) 2 (n)
cn
=n
T.
4. limn
= 0 Tk for all k.
P (X1 cn ) = if < 1
n=1
Denition 7.39. Let (, B , P ) be a probability space. A eld, F B is almost trivial i P (F ) = {0, 1} , i.e. P (A) {0, 1} for all A F . is a random variable which is F meaLemma 7.40. Suppose that X : R such that X = c surable, where F B is almost trivial. Then there exists c R a.s.
Page: 63
macro: svmonob.cls
date/time: 23-Feb-2007/15:20
64
7 Independence
Proof. Since {X = } and {X = } are in F , if P (X = ) > 0 or P (X = ) > 0, then P (X = ) = 1 or P (X = ) = 1 respectively. Hence, it suces to nish the proof under the added condition that P (X R) = 1. For each x R, {X x} F and therefore, P (X x) is either 0 or 1. Since the function, F (x) := P (X x) {0, 1} is right continuous, non-decreasing and F () = 0 and F (+) = 1, there is a unique point c R where F (c) = 1 and F (c) = 0. At this point, we have P (X = c) = 1. Proposition 7.41 (Kolmogorovs Zero-One Law). Suppose that P is a probability measure on (, B ) such that {Xn }n=1 are independent random variables. Then T is almost trivial, i.e. P (A) {0, 1} for all A T . Proof. Let A T B . Since A Tn for all n and Tn is independent of Bn , it follows that A is independent of n=1 Bn for all n. Since the latter set is a multiplicative set, it follows that A is independent of B = (Bn ) = n=1 Bn . But A B and hence A is independent of itself, i.e. P (A) = P (A A) = P (A) P (A) . Since the only x R, such that x = x2 is x = 0 or x = 1, the result is proved. In particular the tail events in Example 7.38 have probability either 0 or 1. Corollary 7.42. Keeping the assumptions in Proposition 7.41 and let {bn }n=1 (0, ) such that bn . Then lim sup Xn , lim inf n Xn ,
n
n lim sup S bn , and lim inf n
Proof. Let B0 := n=1 (X1 , X2 , . . . , Xn ) . Then B0 is an algebra and (B0 ) = B . By the regularity Theorem 5.10, for any B B and > 0, there exists An B0 such that An C (B0 ) , B C, and P (C \ B ) < . Since P (An B ) = P ([An \ B ] [B \ An ]) = P (An \ B ) + P (B \ An ) P (C \ B ) + P (B \ C ) < , for suciently large n, we have P (AB ) < where A = An B0 . Now suppose that B S , > 0, and A (X1 , X2 , . . . , Xn ) B0 such that P (AB ) < . Let : N N be the permutation dened by (j ) = j + n, (j + n) = j for j = 1, 2, . . . , n, and (j + 2n) = j + 2n for all j N. Since B = {(X1 , . . . , Xn ) B } = { : (1 , . . . , n ) B } for some B BRn , we have
1 T (B ) = { : ((T ( ))1 , . . . , (T ( ))n ) B } = { : (1 , . . . , n ) B } = { : (n+1 , . . . , n+n ) B } = {(Xn+1 , . . . , Xn+n ) B } (Xn+1 , . . . , Xn+n ) , 1 1 (B ) . (B ) are independent with P (B ) = P T it follows that B and T 2 1 Therefore P B T B = P (B ) . Combining this observation with the iden1 A , we nd tity, P (A) = P (A A) = P A T
Sn bn
are all constant almost surely. In particular, lim Sn n bn exists = 1 and in the
= 0 or P . for some c R
P (A) P (B )
1 1 = P A T A P B T B
1 1 E 1AT A 1B T B 1 1 = E 1A 1T A 1B 1T B
1 1 = E 1AT A 1B T B
Let us now suppose that := R = RN , Xn ( ) = n for all , and B := (X1 , X2 , . . . ) . We say a permutation (i.e. a bijective map on N), : N N is nite if (n) = n for a.a. n. Dene T : by T ( ) = (1 , 2 , . . . ) . Denition 7.43. The permutation invariant eld, S B, is the collec1 tion of sets, A B such that T (A) = A for all nite permutations . In the proof below we will use the identities, 1A
B
1 1 1 = E [1A 1B ] 1T A + 1B 1T A 1T B 1 1 E |[1A 1B ]| + E 1T A 1T B
1 1 = P (AB ) + P T AT B < 2.
= |1A 1B | and P (A
B ) = E |1A 1B | .
< .
2
Proposition 7.44 (Hewitt-Savage Zero-One Law). Let P be a probability measure on (, B ) such that {Xn }n=1 is an i.i.d. sequence. Then S is almost trivial.
Page: 64 job: prob
Since > 0 was arbitrary, we may conclude that P (A) = P (A) for all A S .
macro: svmonob.cls
date/time: 23-Feb-2007/15:20
Example 7.45 (Some Random Walk 0 1 Law Results). Continue the notation in Proposition 7.44. 1. As above, if Sn = X1 + + Xn , then P (Sn B i.o.) {0, 1} for all B BR . Indeed, if is a nite permutation,
1 T ({Sn B i.o.}) = {Sn T B i.o.} = {Sn B i.o.} .
Hence {Sn B i.o.} is in the permutation invariant eld. The same goes for {Sn B a.a.} 2. If P (X1 = 0) > 0, then lim sup Sn = a.s. or lim sup Sn = a.s. Indeed,
n n 1 T lim sup Sn x n
lim sup Sn T x
n
lim sup Sn x
n
which shows that lim sup Sn is S measurable. Therefore, lim sup Sn = c . Since, a.s., a.s. for some c R
n n n n n
c = lim sup Sn+1 = lim sup (Sn + X1 ) = lim sup Sn + X1 = c + X1 , we must have either c {} or X1 = 0 a.s. Since the latter is not allowed, lim sup Sn = or lim sup Sn = a.s.
n n
3. Now assume that P (X1 = 0) > 0 and X1 = X1 , i.e. P (X1 A) = P (X1 A) for all A BR . From item 2. we know that and from what we have already proved, we know lim sup Sn = c a.s. with c {} .
n
Since {Xn }n=1 and {Xn }n=1 are i.i.d. and Xn = Xn , it follows that {Xn }n=1 = {Xn }n=1 .The results of Exercise 7.2 then imply that lim sup Sn = lim sup (Sn ) and in particular lim sup (Sn ) = c a.s. as well.
n n n d d
Since the c = does not satisfy, c c, we must c = . Hence in this symmetric case we have shown, lim sup Sn = and lim sup (Sn ) = a.s.
n n
8 Integration Theory
In this chapter, we will greatly extend the simple integral or expectation which was developed in Section 4.3 above. Recall there that if (, B , ) was measurable space and f : [0, ] was a measurable simple function, then we let E f := (f = ) .
[0,]
f d = lim
n d n2n 1
k=0
= lim
k 2n
k k+1 <f 2n 2n
, integrable if it is measurable and We call a function, f : R . We will denote the space of integrable functions by L1 ()
Theorem 8.3 (Extension to integrable functions). The integral extends to a linear function from L1 () R. Moreover this extension is continuous under dominated convergence (see Theorem 8.34). That is if fn L1 () and there exists g L1 () such that |fn | g and f := limn fn exists pointwise, then f d = lim fn d = lim fn d.
n n
This integral has the following properties. 1. This integral is linear in the sense that (f + g ) d =
Notation 8.4 We write A f d := 1A f d for all A B where f is a measurable function such that 1A f is either non-negative or integrable. f d +
gd
Notation 8.5 If m is Lebesgue measure on BR , f is a non-negative Borel mea , we will often write b f (x) dx or surable function and a < b with a, b R a
b a
whenever f, g 0 are measurable functions and [0, ). 2. The integral is continuous under increasing limits, i.e. if 0 fn f, then f d =
n
f dm for
(a,b]R
f dm.
Example 8.6. Suppose < a < b < , f C ([a, b], R) and m be Lebesgue measure on R. Given a partition, = {a = a0 < a1 < < an = b}, let mesh( ) := max{|aj aj 1 | : j = 1, . . . , n} and f (x) :=
l=0 n1
lim fn d = lim
fn d.
See the monotone convergence Theorem 8.15 below. Remark 8.2. Given f : [0, ] measurable, we know from the approximation Theorem 6.34 n f where
n2 1
n
n :=
k=0
68
8 Integration Theory
b n1 n1
f dm =
a l=0
f (al ) (al+1 al )
is a Riemann sum. Therefore if {k }k=1 is a sequence of partitions with limk mesh(k ) = 0, we know that
b k b
and the latter expression, by the continuity of f, goes to zero as h 0 . This shows F = f on (a, b). For the converse direction, we have by assumption that G (x) = F (x) for x (a, b). Therefore by the mean value theorem, F G = C for some constant C. Hence
b
lim
fk dm =
a a
f (x) dx
(8.1)
a
where the latter integral is the Riemann integral. Using the (uniform) continuity of f on [a, b] , it easily follows that limk fk (x) = f (x) and that |fk (x)| g (x) := M 1(a,b] (x) for all x (a, b] where M := maxx[a,b] |f (x)| < . Since gdm = M (b a) < , we may apply D.C.T. to conclude, R
b k b b
We can use the above results to integrate some non-Riemann integrable functions: Example 8.8. For all > 0,
lim
fk dm =
a
a k
lim fk dm =
a
f dm. 1 dm(x) = . 1 + x2
f dm =
a a
whenever f C ([a, b], R), i.e. the Lebesgue and the Riemann integral agree on continuous functions. See Theorem 8.51 below for a more general statement along these lines. Theorem 8.7 (The Fundamental Theorem of Calculus). Suppose < x a < b < , f C ((a, b), R)L1 ((a, b), m) and F (x) := a f (y )dm(y ). Then 1. F C ([a, b], R) C 1 ((a, b), R). 2. F (x) = f (x) for all x (a, b). 3. If G C ([a, b], R) C 1 ((a, b), R) is an anti-derivative of f on (a, b) (i.e. f = G |(a,b) ) then
b
The proof of these identities are similar. By the monotone convergence theorem, Example 8.6 and the fundamental theorem of calculus for Riemann integrals (or Theorem 8.7 below),
N N
ex dm(x) = lim
0
ex dm(x) = lim
0
ex dx
0
1 x N e |0 = 1
N N
1 dm(x) = lim N 1 + x2
N N
1 dx 1 + x2
Proof. Since F (x) := R 1(a,x) (y )f (y )dm(y ), limxz 1(a,x) (y ) = 1(a,z) (y ) for m a.e. y and 1(a,x) (y )f (y ) 1(a,b) (y ) |f (y )| is an L1 function, it follows from the dominated convergence Theorem 8.34 that F is continuous on [a, b]. Simple manipulations show, x+h 1 x [f (y ) f (x)] dm(y ) if h > 0 F (x + h) F (x) f (x) = h |h| x [f (y ) f (x)] dm(y ) if h < 0
x+h
tan1 (N ) tan1 (N ) = .
(0,1]
1 dm(x) xp
1 1/n
1 |h|
if h > 0 if h < 0
1 xp+1 dx = lim n 1 p xp
if p < 1 if p > 1
Page: 68
job: prob
macro: svmonob.cls
date/time: 23-Feb-2007/15:20
69
If p = 1 we nd 1 dm(x) = lim n xp
1
1 n
(0,1]
x x n d d n ln 1 = ln 1 + ln fn (x) = dn dn n n 1 x x = ln 1 + n x = h (x/n) n 1 n where, for 0 y < 1, h (y ) := ln(1 y ) + Since h (0) = 0 and h (y ) = 1 1 y + + >0 1 y 1 y (1 y )2 y . 1y
x n
x n2
1 dm (x) = xp
if p 1 . 1 p1 if p > 1
lim
1
0 n
x n
dm(x) = 1.
it follows that h 0. Thus we have shown, fn (x) ex as n as claimed. Example 8.10 (Jordans Lemma). In this example, let us consider the limit;
n
x 1[0,n] (x). Then limn fn (x) = ex for all To verify this, let fn (x) := 1 n x 0 and by taking logarithms of Eq. (7.8),
lim
cos sin
0
en sin() d. n
en sin() .
for all x 0.
lim
cos sin
0
e
0 x
dm(x) = 1 < ,
en sin() d =
R
1{} () dm () = m ({ }) = 0.
is an integrable function on [0, ). Hence by the dominated conso that e vergence theorem,
n n
Exercise 8.2 (Folland 2.28 on p. 60.). Compute the following limits and justify your calculations: 1. lim 2.
x sin( n ) x n dx. (1+ 0 n) n 1 1+nx2 lim 2 n dx n 0 (1+x ) n sin(x/n) lim x(1+x2 ) dx n 0
lim
1
0
x n
dm(x) = lim =
0
fn (x)dm(x)
0
lim fn (x)dm(x) =
0
ex dm(x) = 1.
The limit in the above example may also be computed using the monotone convergence theorem. To do this we must show that n fn (x) is increasing in n for each x and for this it suces to consider n > x. But for n > x,
f (a) := lim
n(1 + n2 x2 )1 dx.
a
Now that we have an overview of the Lebesgue integral, let us proceed to the formal development of the facts stated above.
macro: svmonob.cls date/time: 23-Feb-2007/15:20
Page: 69
job: prob
70
8 Integration Theory
2. Since { is simple and f } { is simple and g } , Eq. (8.2) follows from the denition of the integral. 1 3. Since 1{f } 1{f } 1 f f we have 1{f } 1{f } 1 f
p
1 f
Remark 8.12. Because of item 3. of Proposition 4.16, if is a non-negative simple function, X d = E so that X is an extension of E . Lemma 8.13. Let f, g L (B ) . Then: 1. if 0, then f d =
X X +
1{f } d
1{f } f p d
X
f p d.
X
4. If (f = ) > 0, then n := n1{f =} is a simple function such that n f for all n and hence n (f = ) = E (n ) f d
X X
f d f d = . X gd.
X
(8.2)
for all n. Letting n shows X f d = . Thus if (f = ) = 0. Moreover, {f > 0} = n=1 {f > 1/n} with (f > 1/n) n
X
f d < then
f d.
X
(8.3)
Lemma 8.14 (Sums as Integrals). Let X be a set and : X [0, ] be a function, let = xX (x)x on B = 2X , i.e. (A) =
xA
The inequality in Eq. (8.3) is called Chebyshevs Inequality for p = 1 and Markovs inequality for p = 2. 4. If X f d < then (f = ) = 0 (i.e. f < a.e.) and the set {f > 0} is nite. Proof. 1. We may assume > 0 in which case, f d = sup {E : is simple and f }
X
(x).
f .
Proof. Suppose that : X [0, ) is a simple function, then = z [0,) z 1{=z } and =
X xX
(x)
z [0,)
z 1{=z} (x) =
z [0,)
z
xX
(x)1{=z} (x)
f d.
=
z [0,)
z({ = z }) =
X
d.
Page: 70
job: prob
macro: svmonob.cls
date/time: 23-Feb-2007/15:20
71
fn E [1Xn ] = E [1Xn ] . Then using the continuity of under increasing unions, lim E [1Xn ] = lim 1Xn
y>0
(8.5)
f .
Taking the sup over in this last equation then shows that f d
X X
y 1{=y}
f .
= lim =
y(Xn { = y })
y>0
For the reverse inequality, let X be a nite set and N (0, ). Set f N (x) = min {N, f (x)} and let N, be the simple function given by N, (x) := 1 (x)f N (x). Because N, (x) f (x), f =
X N
y lim (Xn { = y })
n
= N, =
X
y lim ({ = y }) = E []
y>0 n
N, d
X
f d.
f d.
This identity allows us to let n in Eq. (8.5) to conclude limn fn E [] and since (0, 1) was arbitrary we may further conclude,E [] limn fn . The latter inequality being true for all simple functions with f then implies that f lim
n
fn ,
which combined with Eq. (8.4) proves the theorem. f d. Corollary 8.16. If fn L+ is a sequence of functions then
fn = Theorem 8.15 (Monotone Convergence Theorem). Suppose fn L+ is a sequence of functions such that fn f (f is necessarily in L+ ) then fn f as n .
n=1 n=1 n=1
fn . fn < a.e.
In particular, if
n=1
fn < then
fn is increasing in n and
n
by choosing non-negative simple function n and n such that n f1 and n f2 . Then (n + n ) is simple as well and (n + n ) (f1 + f2 ) so by the monotone convergence theorem, (8.4) (f1 + f2 ) = lim = lim (n + n ) = lim n + lim n + f1 + n f2 .
lim
fn
f.
For the opposite inequality, let : X [0, ) be a simple function such that 0 f, (0, 1) and Xn := {fn } . Notice that Xn X and fn 1Xn and so by denition of fn ,
Page: 71 job: prob macro: svmonob.cls
n =
date/time: 23-Feb-2007/15:20
72
8 Integration Theory
N
fn and g =
1
fn , then gN g and so
Proof. If f = 0 a.e. and f is a simple function then = 0 a.e. This implies that (1 ({y })) = 0 for all y > 0 and hence X d = 0 and therefore f d = 0. Conversely, if f d = 0, then by (Lemma 8.13), X (f 1/n) n
fn := lim
n=1
fn = lim
n=1
fn
n=1
f d = 0 for all n.
= lim
gN =
g =:
n=1
fn .
Therefore, (f > 0) n=1 (f 1/n) = 0, i.e. f = 0 a.e. For the second assertion let E be the exceptional set where f > g, i.e. E := {x X : f (x) > g (x)}. By assumption E is a null set and 1E c f 1E c g everywhere. Because g = 1E c g + 1E g and 1E g = 0 a.e., gd = and similarly f d = 1E c gd + 1E gd = 1E c gd
Remark 8.17. It is in the proof of this corollary (i.e. the linearity of the integral) that we really make use of the assumption that all of our functions are measurable. In fact the denition f d makes sense for all functions f : X [0, ] not just measurable functions. Moreover the monotone convergence theorem holds in this generality with no change in the proof. However, in the proof of Corollary 8.16, we use the approximation Theorem 6.34 which relies heavily on the measurability of the functions to be approximated. Example 8.18. Suppose, = N, B := 2N , and (A) = # (A) for A is the counting measure on B . Then for f : N [0, ), the function
N
f d =
Corollary 8.20. Suppose that {fn } is a sequence of non-negative measurable functions and f is a measurable function such that fn f o a null set, then fn f as n .
fN () :=
n=1
f (n) 1{n}
Proof. Let E X be a null set such that fn 1E c f 1E c as n . Then by the monotone convergence theorem and Proposition 8.19, fn = fn 1E c f 1E c = f as n .
f d = lim
N
fN d = lim
N N
f (n) ({n})
n=1
= lim
f (n) =
n=1 n=1
f (n) . Lemma 8.21 (Fatous Lemma). If fn : X [0, ] is a sequence of measurable functions then lim inf fn lim inf
n n nk
Exercise 8.3. Suppose that n : B [0, ] are measures on B for n N. Also suppose that n (A) is increasing in n for all A B. Prove that : B [0, ] dened by (A) := limn n (A) is also a measure. Hint: use Example 8.18 and the monotone convergence theorem. Proposition 8.19. Suppose that f 0 is a measurable function. Then f d = 0 i f = 0 a.e. Also if f, g 0 are measurable functions such that X f g a.e. then f d gd. In particular if f = g a.e. then f d = gd.
fn
Proof. Dene gk := inf fn so that gk lim inf n fn as k . Since gk fn for all k n, gk and therefore
macro: svmonob.cls date/time: 23-Feb-2007/15:20
fn for all n k
Page: 72
job: prob
73
gk lim inf
fn for all k.
We may now use the monotone convergence theorem to let k to nd lim inf fn =
n k
1 d and n=1 An
lim gk =
MCT
lim
gk lim inf
fn .
(An ) =
n=1 X n=1
1An d
The following Lemma and the next Corollary are simple applications of Corollary 8.16. Lemma 8.22 (The First Borell Carntelli Lemma). Let (X, B , ) be a measure space, An B , and set
it suces to show
(8.6)
An .
n=1
If
n=1
x:
n=1
= i<j Ai Aj
and the latter set has measure 0 being the countable union of sets of measure zero. This proves Eq. (8.6) and hence the corollary. Example 8.24. Let {rn } n=1 be an enumeration of the points in Q [0, 1] and dene 1 f (x) = 2n | x rn | n=1 with the convention that 1 |x rn | Since, By Theorem 8.7,
1 0
xX:
n=1
1An (x) = .
>
n=1
(An ) =
n=1 X
1An d =
X n=1
1An d
implies that
n=1
= 5 if x = rn .
(Second Proof.) Of course we may give a strictly measure theoretic proof of this fact: (An i.o.) = lim
N nN
An (An )
1 |x rn |
dx =
nN
rn 1 1 dx + dx x r r x n n rn 0 rn = 2 x rn |1 1 rn rn rn 2 rn x|0 = 2 4,
(An ) < .
we nd
Corollary 8.23. Suppose that (X, B , ) is a measure space and {An }n=1 B is a collection of sets such that (Ai Aj ) = 0 for all i = j, then
f (x)dm(x) =
[0,1] n=1
2n
[0,1]
1 |x rn |
dx
n=1
2n 4 = 4 < .
( n=1 An ) =
n=1
(An ).
In particular, m(f = ) = 0, i.e. that f < for almost every x [0, 1] and this implies that
macro: svmonob.cls date/time: 23-Feb-2007/15:20
Page: 73
job: prob
74
8 Integration Theory
2n
n=1
1 |x rn |
This result is somewhat surprising since the singularities of the summands form a dense subset of [0, 1].
Proof. Let f, g L1 (; R) and a, b R. By modifying f and g on a null set, we may assume that f, g are real valued functions. We have af + bg L1 (; R) because |af + bg | |a| |f | + |b| |g | L1 (; R) . If a < 0, then (af )+ = af and (af ) = af+ so that af = a f + a f+ = a( f+ f ) = a f.
A similar calculation works for a > 0 and the case a = 0 is trivial so we have shown that af = a f. Now set h = f + g. Since h = h+ h ,
are two measurable functions, let f + g denote Convention: If f, g : X R such that h(x) = f (x) + g (x) the collection of measurable functions h : X R whenever f (x) + g (x) is well dened, i.e. is not of the form or + . We use a similar convention for f g. Notice that if f, g L1 (; R) and h1 , h2 f + g, then h1 = h2 a.e. because |f | < and |g | < a.e. Notation 8.26 (Abuse of notation) We will sometimes denote the integral f d by (f ) . With this notation we have (A) = (1A ) for all A B . X Remark 8.27. Since f |f | f+ + f , a measurable function f is integrable i L1 (; R) := |f | d < . Hence |f | d < .
X
If f, g L1 (; R) and f = g a.e. then f = g a.e. and so it follows from Proposition 8.19 that f d = gd. In particular if f, g L1 (; R) we may dene (f + g ) d = hd
X X
The monotonicity property is also a consequence of the linearity of the integral, the fact that f g a.e. implies 0 g f a.e. and Proposition 8.19. f d R f d gd for all f, g Denition 8.29. A measurable function f : X C is integrable if |f | d < . Analogously to the real case, let X L1 (; C) := f : X C : f is measurable and
X
|f | d < .
job: prob
macro: svmonob.cls
date/time: 23-Feb-2007/15:20
75
denote the complex valued integrable functions. Because, max (|Re f | , |Im f |) |f | 2 max (|Re f | , |Im f |) , |f | d < i |Re f | d + For f L (; C) dene f d = Re f d + i Im f d.
1
g
E E
(f g ) = 0
It is routine to show the integral is still linear on L1 (; C) (prove!). In the remainder of this section, let L1 () be either L1 (; C) or L1 (; R) . If A B and f L1 (; C) or f : X [0, ] is a measurable function, let f d :=
A X
for all E B . Taking E = {Re(f g ) > 0} and using 1E Re(f g ) 0, we learn that 0 = Re
E
(f g )d =
1A f d.
This implies that 1E = 0 a.e. which happens i ({Re(f g ) > 0}) = (E ) = 0. (8.7) Similar (Re(f g ) < 0) = 0 so that Re(f g ) = 0 a.e. Similarly, Im(f g ) = 0 a.e and hence f g = 0 a.e., i.e. f = g a.e. (c) = (b) is clear and so is (b) = (a) since f
E E
|f | d.
Proof. Start by writing X f d = Rei with R 0. We may assume that R = X f d > 0 since otherwise there is nothing to prove. Since R = ei
X
|f g | = 0.
f d =
X X
ei f d =
X
Re ei f d + i
X
Im ei f d,
Denition 8.32. Let (X, B , ) be a measure space and L1 () = L1 (X, B , ) denote the set of L1 () functions modulo the equivalence relation; f g i f = g a.e. We make this into a normed space using the norm f g
L1
|f g | d
L1
f d =
X X
Re e
f d
X
Re e
d
X
|f | d. and into a metric space using 1 (f, g ) = f g . Warning: in the future we will often not make much of a distinction between L1 () and L1 () . On occasion this can be dangerous and this danger will be pointed out when necessary.
Proposition 8.31. Let f, g L () , then 1. The set {f = 0} is nite, in fact {|f | for all n. 2. The following are equivalent a) E f = E g for all E B b) |f g | = 0
X 1 n}
{f = 0} and (|f |
1 n)
<
Remark 8.33. More generally we may dene Lp () = Lp (X, B , ) for p [1, ) as the set of measurable functions f such that |f | d <
X p
c) f = g a.e.
Page: 75
76
8 Integration Theory
Lp
|f | d
for f Lp ()
Proposition 8.35. Suppose that (, B , P ) is a probability space and {Zj }j =1 n are independent integrable random variables. Then j =1 Zj is also integrable and
n n
Lp )
E
j =1 n
Zj =
j =1
EZj .
n
Theorem 8.34 (Dominated Convergence Theorem). Suppose fn , gn , g L1 () , fn f a.e., |fn | gn L1 () , gn g a.e. and X gn d X gd. Then f L1 () and f d = lim
X h
Proof. By denition, {Zj }j =1 are independent i { (Zj )}j =1 are independent. Then as we have seen in a homework problem, E [1A1 . . . 1An ] = E [1A1 ] . . . E [1An ] when Ai (Zi ) for each i. By multi-linearity it follows that
fn d.
X 1
(In most typical applications of this theorem gn = g L () for all n.) Proof. Notice that |f | = limn |fn | limn |gn | g a.e. so that f L1 () . By considering the real and imaginary parts of f separately, it suces to prove the theorem in the case where f is real. By Fatous Lemma, (g f )d =
X
E [1 . . . n ] = E [1 ] . . . E [n ] whenever i are bounded (Zi ) measurable simple functions. By approximation by simple functions and the monotone and dominated convergence theorem, E [Y1 . . . Yn ] = E [Y1 ] . . . E [Yn ] whenever Yi is (Zi ) measurable and either Yi 0 or Yi is bounded. Taking Yi = |Zi | then implies that
n n
X n
(gn fn ) d
= lim =
X
gn d + lim inf
X n X
fn d
gd + lim inf
n X
fn d so that
n j =1
E
j =1
|Zj | =
j =1
E |Zj | <
gd
X X
f d
X
gd +
E
j =1
Zj 1|Zj |K =
j =1
E Zj 1|Zj |K .
fn d
X
f d lim inf
n X X
fn d. f d.
E
j =1
Zj = lim E
K j =1
Zj 1|Zj |K =
j =1 n j =1
lim E Zj 1|Zj |K =
j =1 n
EZj .
n X
Exercise 8.4. Give another proof of Proposition 8.30 by rst proving Eq. (8.7) with f being a simple function in which case the triangle inequality for complex numbers will do the trick. Then use the approximation Theorem 6.34 along with the dominated convergence Theorem 8.34 to handle the general case.
Corollary 8.36. Let {fn }n=1 L1 () be a sequence n=1 fn L1 () < , then n=1 fn is convergent a.e. and
fn
X n=1
d =
n=1 X
fn d.
Page: 76
job: prob
macro: svmonob.cls
date/time: 23-Feb-2007/15:20
77
n=1
fn is almost
N
Proof. By considering the real and imaginary parts of f separately, we may assume that f is real. Also notice that f (t, x) = lim n(f (t + n1 , x) f (t, x)) n t and therefore, for x f t (t, x) is a sequential limit of measurable functions and hence is measurable for all t J. By the mean value theorem,
|SN |
n=1
|fn |
n=1
|fn | L () .
fn
X n=1
d =
X N N
lim SN d = lim
SN d
X
(8.8)
= lim
fn d =
n=1 X n=1 X
fn d.
|f (t, x)| |f (t, x) f (t0 , x)| + |f (t0 , x)| g (x) |t t0 | + |f (t0 , x)| . This shows f (t, ) L1 () for all t J. Let G(t) :=
X
Example 8.37 (Integration of Power Series). Suppose R > 0 and is a sequence of complex numbers such that n=0 |an | rn < for all r (0, R). Then
{an }n=0
an xn
n=0
dm(x) =
n=0
an
xn dm(x) =
n=0
an
for all R < < < R. Indeed this follows from Corollary 8.36 since
tt0
| |
||
n=0
|an |
| |
n+1
+ || n+1
n+1
2r
n=0
|an | rn <
f (t, x) f (t0 , x) g (x) for all t J and x X. t t0 Therefore, we may apply the dominated convergence theorem to conclude lim G(tn ) G(t0 ) = lim n tn t0 f (tn , x) f (t0 , x) d(x) tn t0 X f (tn , x) f (t0 , x) = lim d(x) n tn t0 X f = (t0 , x)d(x) X t
where r = max(| | , ||). Corollary 8.38 (Dierentiation Under the Integral). Suppose that J R is an open interval and f : J X C is a function such that 1. x f (t, x) is measurable for each t J. 2. f (t0 , ) L1 () for some t0 J. 3. f t (t, x) exists for all (t, x). 4. There is a function g L1 () such that
f t (t, )
g for each t J.
Then f (t, ) L1 () for all t J (i.e. X |f (t, x)| d(x) < ), t f (t, x)d(x) is a dierentiable function on J and X d dt
Page: 77
(t0 ) = for all sequences tn J \ {t0 } such that tn t0 . Therefore, G G(t)G(t0 ) limtt0 exists and tt0 (t0 ) = G
X
f (t, x)d(x) =
X X
f (t, x)d(x). t
macro: svmonob.cls
f (t0 , x)d(x). t
job: prob
date/time: 23-Feb-2007/15:20
78
8 Integration Theory
f d.
(8.9)
Let > 0. For 2 > 0 and n N there exists Cn () < such that 0 d d
n
ex = xn ex C ()ex .
Hint: rst prove the relationship for characteristic functions, then for simple functions, and then for general positive measurable functions. 3. Show that a measurable function f : X C is in L1 ( ) i |f | L1 () and if f L1 ( ) then Eq. (8.9) still holds. Solution to Exercise (8.5). The fact that is a measure follows easily from Corollary 8.16. Clearly Eq. (8.9) holds when f = 1A by denition of . It then holds for positive simple functions, f, by linearity. Finally for general f L+ , choose simple functions, n , such that 0 n f. Then using MCT twice we nd f d = lim
n
d d
1 =
[0,)
d d
ex dm(x)
x e
n x
n x
dm(x). n d = lim
X X n
n d =
X
That is n! =
n [0,)
x e
X n
lim n d =
X
f d.
|f | d
(The reader should check that (t) < for all t > 0.) We have just shown that (n + 1) = n! for all n N. Remark 8.40. Corollary 8.38 may be generalized by allowing the hypothesis to hold for x X \ E where E B is a xed null set, i.e. E must be independent of t. Consider what happens if we formally apply Corollary 8.38 to g (t) := 1xt dm(x), 0 g (t) = d dt
0
f+ d
X
f d =
X
f+ d
X
f d
=
X
[f+ f ] d =
X
f d.
1xt dm(x) =
0
1xt dm(x). t
The complex case easily follows from this identity. Notation 8.41 It is customary to informally describe dened in Exercise 8.5 by writing d = d. Exercise 8.6. Let (X, M, ) be a measure space, (Y, F ) be a measurable space and f : X Y be a measurable map. Dene a function : F [0, ] by (A) := (f 1 (A)) for all A F . 1. Show is a measure. (We will write = f or = f 1 .) 2. Show gd = (g f ) d
Y X
The last integral is zero since t 1xt = 0 unless t = x in which case it is not dened. On the other hand g (t) = t so that g (t) = 1. (The reader should decide which hypothesis of Corollary 8.38 has been violated in this example.)
(8.10)
for all measurable functions g : Y [0, ]. Hint: see the hint from Exercise 8.5.
macro: svmonob.cls date/time: 23-Feb-2007/15:20
79
3. Show a measurable function g : Y C is in L1 ( ) i g f L1 () and that Eq. (8.10) holds for all g L1 ( ). Solution to Exercise (8.6). The fact that is a measure is a direct check which will be left to the reader. The key computation is to observe that if A F and g = 1A , then gd =
Y Y
Exercise 8.7. Let F : R R be a C 1 -function such that F (x) > 0 for all x R and limx F (x) = . (Notice that F is strictly increasing so that F 1 : R R exists and moreover, by the inverse function theorem that F 1 is a C 1 function.) Let m be Lebesgue measure on BR and (A) = m(F (A)) = m( F 1
1 1 (A)) = F m (A)
1A d = (A) = f 1 (A) =
X
1f 1 (A) d.
for all A BR . Show d = F dm. Use this result to prove the change of variable formula, h F F dm =
R R
Moreover, 1f 1 (A) (x) = 1 i x f 1 (A) which happens i f (x) A and hence 1f 1 (A) (x) = 1A (f (x)) = g (f (x)) for all x X. Therefore we have gd =
Y X
hdm
(8.14)
(g f ) d
whenever g is a characteristic function. This identity now extends to nonnegative simple functions by linearity and then to all non-negative measurable functions by MCT. The statements involving complex functions follows as in the solution to Exercise 8.5. Remark 8.42. If X is a random variable on a probability space, (, B , P ) , and F (x) := P (X x) . Then E [f (X )] =
R
which is valid for all Borel measurable functions h : R [0, ]. Hint: Start by showing d = F dm on sets of the form A = (a, b] with a, b R and a < b. Then use the uniqueness assertions in Exercise 5.1 to conclude d = F dm on all of BR . To prove Eq. (8.14) apply Exercise 8.6 with g = h F and f = F 1 . Solution to Exercise (8.7). Let d = F dm and A = (a, b], then ((a, b]) = m(F ((a, b])) = m((F (a), F (b)]) = F (b) F (a) while ((a, b]) =
(a,b] b
F dm =
a
f (x) dF (x)
(8.11)
It follows that both = = F where F is the measure described in Proposition 5.7. By Exercise 8.6 with g = h F and f = F 1 , we nd h F F dm =
R R
where dF (x) is shorthand for dF (x) and F is the unique probability measure on (R, BR ) such that F ((, x]) = F (x) for all x R. Moreover if F : R [0, 1] happens to be C 1 -function, then dF (x) = F (x) dm (x) and Eq. (8.11) may be written as (8.12)
h F d =
R
1 h F d F m = R
(h F ) F 1 dm
=
R
hdm.
This result is also valid for all h L1 (m). Lemma 8.43. Suppose that X is a standard normal random variable, i.e. E [f (X )] =
R
(8.13) then
1 P (X A) = 2
ex
A
/2
dx for all A BR ,
To verify Eq. (8.12) it suces to observe, by the fundamental theorem of calculus, that
b
P (X x) and1
1
1 1 x2 /2 e x 2
(8.15)
F (x) dx =
(a,b]
F dm.
F dm for all A BR .
See, Gordon, Robert D. Values of Mills ratio of area to bounding ordinate and of the normal probability integral for large values of the argument. Ann. Math. Statistics 12, (1941). 364366. (Reviewer: Z. W. Birnbaum) 62.0X
Page: 79
job: prob
macro: svmonob.cls
date/time: 23-Feb-2007/15:20
80
8 Integration Theory
P (X x)
= 1.
(8.16)
P (X x) =
x
2 1 ey /2 dy 2
1 1 y2 /2 1 y y2 /2 e dy = e |x x 2 2 x
from which Eq. (8.15) follows. To prove Eq. (8.16), let > 1, then P (X x) =
2 2 1 1 ey /2 dy ey /2 dy 2 2 x x x 1 1 y2 /2 x 1 y y2 /2 e dy = e |x 2 x 2 x x 2 2 2 1 1 = ex /2 e x /2 . x 2
for x suciently large. Example 8.44. Let {Xn }n=1 be i.i.d. standard normal random variables. Then P (Xn cn ) Now, suppose that we take cn so that ecn /2 =
2
1 2 c2 n /2 . e cn
Hence P (X x) 1 1 x2 /2 x 2 e
2 x 1 ey /2 dy x 2 1 1 x2 /2 x 2 e
C n
1 ex /2 e ex2 /2
x2 /2
2 2 1 1 e( 1)x /2 .
1 .
(We now take C = 1.) It then follows that P (Xn cn ) and therefore 1 2 ln (n) e
2
ln(n)
1 2 ln (n)
1 n2
Since > 1 was arbitrary, it follows that lim inf Since Eq. (8.15) implies that P (X x) lim sup 1 1 x2 /2 = 1 x x e 2 we are done. Additional information: Suppose that we now take = 1 + xp = Then 2 1 x2 = x2p + 2xp x2 = x22p + 2x2p .
Page: 80 job: prob
P (X x)
2 x 1 1 ex /2 x 2
= 1.
P (Xn cn ) = if < 1
n=1
and
1 + xp . xp
Xn = 1 a.s.. 2 ln n
macro: svmonob.cls
date/time: 23-Feb-2007/15:20
81
, B ) measurable simple function n 0 such assume that f 0. Choose (M that n f as n . Writing n = ak 1Ak
, we may choose Bk M such that Bk Ak and with Ak M (Ak \ Bk ) = 0. Letting n := ak 1Bk we have produced a (M, B ) measurable simple function n 0 such that En := {n = n } has zero measure. Since (n En ) n (En ) , there exists F M such that n En F and (F ) = 0. It now follows that 1F n = 1F n g := 1F f as n . This shows that g = 1F f is (M, B ) measurable and that {f = g } F has measure zero. Since f = g , a.e., X f d = X gd so to prove Eq. (8.18) it suces to prove gd =
X X
gd.
(8.18)
Since (1E g )1 (B ) E if 0 / B and (E ) = 0, it follow by completeness of B that (1E g )1 (B ) B if 0 / B. Therefore Eq. (8.17) shows that 1E g is measurable. 2. Let E = {x : lim fn (x) = f (x)} by assumption E B and
n
(E ) = 0. Since g := 1E f = limn 1E c fn , g is measurable. Because f = g on E c and (E ) = 0, f = g a.e. so by part 1. f is also measurable. The above results are in general false if (X, B , ) is not complete. For example, let X = {0, 1, 2}, B = {{0}, {1, 2}, X, } and = 0 . Take g (0) = 0, g (1) = 1, g (2) = 2, then g = 0 a.e. yet g is not measurable. is the comLemma 8.46. Suppose that (X, M, ) is a measure space and M pletion of M relative to and is the extension of to M. Then a function , B = B R ) measurable i there exists a function g : X R f : X R is (M and that is (M, B ) measurable such E = {x : f (x) = g (x)} M (E ) = 0, i.e. f (x) = g (x) for a.e. x. Moreover for such a pair f and g, f L1 ( ) i g L1 () and in which case f d =
X X
Because = on M, Eq. (8.18) is easily veried for non-negative M measurable simple functions. Then by the monotone convergence theorem and the approximation Theorem 6.34 it holds for all M measurable functions g : X [0, ]. The rest of the assertions follow in the standard way by considering (Re g ) and (Im g ) .
gd.
Proof. Suppose rst that such a function g exists so that (E ) = 0. Since , B ) measurable, we see from Proposition 8.45 that f is (M , B) g is also (M , B ) measurable, by considering f we may measurable. Conversely if f is (M
2
Recall this means that if N X is a set such that N A M and (A) = 0, then N M as well.
G = f (a)1{a} +
1
Page: 81
job: prob
macro: svmonob.cls
date/time: 23-Feb-2007/15:20
82
8 Integration Theory
S f = Notice that
Mj (tj tj 1 ) and s f =
b b
mj (tj tj 1 ).
Lemma 8.50. The functions H, h : [a, b] R satisfy: 1. h(x) f (x) H (x) for all x [a, b] and h(x) = H (x) i f is continuous at x. 2. If {k }k=1 is any increasing sequence of partitions such that mesh(k ) 0 and G and g are dened as in Eq. (8.20), then G(x) = H (x) f (x) h(x) = g (x) x / := k=1 k . (8.23)
S f =
a
G dm and s f =
a
g dm.
f (x)dx = sup s f.
b f a
Denition 8.47. The function f is Riemann integrable i and which case the Riemann integral
b b b a
b a
(Note is a countable set.) 3. H and h are Borel measurable. f R Proof. Let Gk := Gk G and gk := gk g. 1. It is clear that h(x) f (x) H (x) for all x and H (x) = h(x) i lim f (y )
y x
f (x)dx =
a a
f (x)dx =
a
f (x)dx.
The proof of the following Lemma is left to the reader as Exercise 8.18. Lemma 8.48. If and are two partitions of [a, b] and then G G f g g and S f S f s f s f. There exists an increasing sequence of partitions {k }k=1 such that mesh(k ) 0 and
b b
exists and is equal to f (x). That is H (x) = h(x) i f is continuous at x. 2. For x / , Gk (x) H (x) f (x) h(x) gk (x) k and letting k in this equation implies G(x) H (x) f (x) h(x) g (x) x / . Moreover, given > 0 and x / , sup{f (y ) : |y x| , y [a, b]} Gk (x) for all k large enough, since eventually Gk (x) is the supremum of f (y ) over some interval contained in [x , x + ]. Again letting k implies sup f (y ) G(x) and therefore, that
|y x|
(8.24)
Sk f
a
f and sk f
a
f as k .
(8.20)
gdm = lim
[a,b]
gk = lim sk f =
[a,b] k a
f (x)dx
(8.21)
and
b
for all x / . Combining this equation with Eq. (8.24) then implies H (x) = G(x) if x / . A similar argument shows that h(x) = g (x) if x / and hence Eq. (8.23) is proved. 3. The functions G and g are limits of measurable functions and hence measurable. Since H = G and h = g except possibly on the countable set , both H and h are also Borel measurable. (You justify this statement.)
Gdm = lim
[a,b]
Gk = lim Sk f =
[a,b] k a
f (x)dx.
Notation 8.49 For x [a, b], let H (x) = lim sup f (y ) := lim sup{f (y ) : |y x| , y [a, b]} and
y x y x 0
f=
a [a,b]
Hdm and
a
f=
[a,b]
hdm
(8.25)
Page: 82
8.7 Exercises
83
1. H (x) = h(x) for m -a.e. x, 2. the set E := {x [a, b] : f is discontinuous at x} is an m null set. 3. f is Riemann integrable. If f is Riemann integrable then f is Lebesgue measurable3 , i.e. f is L/B measurable where L is the Lebesgue algebra and B is the Borel algebra on [a, b]. Moreover if we let m denote the completion of m, then
b
2. Dene A B i (AB ) = 0 and notice that (A, B ) = 0 i A B. Show is an equivalence relation. 3. Let M/ denote M modulo the equivalence relation, , and let [A] := {B M : B A} . Show that ([A] , [B ]) := (A, B ) is gives a well dened metric on M/ . 4. Similarly show ([A]) = (A) is a well dened function on M/ and show : (M/ ) R+ is continuous. Exercise 8.10. Suppose that n : M [0, ] are measures on M for n N. Also suppose that n (A) is increasing in n for all A M. Prove that : M [0, ] dened by (A) := limn n (A) is also a measure. Exercise 8.11. Now suppose that is some index set and for each , : M [0, ] is a measure on M. Dene : M [0, ] by (A) = (A) for each A M. Show that is also a measure. Exercise 8.12. Let (X, M, ) be a measure space and {An }n=1 M, show ({An a.a.}) lim inf (An )
n
Hdm =
[a,b] a
f (x)dx =
[a,b]
f dm =
[a,b]
hdm.
(8.26)
Proof. Let {k }k=1 be an increasing sequence of partitions of [a, b] as described in Lemma 8.48 and let G and g be dened as in Lemma 8.50. Since m( ) = 0, H = G a.e., Eq. (8.25) is a consequence of Eqs. (8.21) and (8.22). From Eq. (8.25), f is Riemann integrable i Hdm =
[a,b] [a,b]
hdm
and if (mn Am ) < for some n, then ({An i.o.}) lim sup (An ) .
n
and because h f H this happens i h(x) = H (x) for m - a.e. x. Since E = {x : H (x) = h(x)}, this last condition is equivalent to E being a m null set. In light of these results and Eq. (8.23), the remaining assertions including Eq. (8.26) are now consequences of Lemma 8.46. Notation 8.52 In view of this theorem we will often write b f dm. a
b a
Exercise 8.13 (Folland 2.13 on p. 52.). Suppose that {fn }n=1 is a sequence of non-negative measurable functions such that fn f pointwise and
n
lim
fn =
f < .
f = lim
8.7 Exercises
Exercise 8.8. Let be a measure on an algebra A 2X , then (A) + (B ) = (A B ) + (A B ) for all A, B A. Exercise 8.9 (From problem 12 on p. 27 of Folland.). Let (X, M, ) be a nite measure space and for A, B M let (A, B ) = (AB ) where AB = (A \ B ) (B \ A) . It is clear that (A, B ) = (B, A) . Show: 1. satises the triangle inequality: (A, C ) (A, B ) + (B, C ) for all A, B, C M.
3
fn
E
for all measurable sets E M. The conclusion need not hold if limn f. Hint: Fatou times two.
fn =
Exercise 8.14. Give examples of measurable functions {fn } on R such that fn decreases to 0 uniformly yet fn dm = for all n. Also give an example of a sequence of measurable functions {gn } on [0, 1] such that gn 0 while gn dm = 1 for all n. Exercise 8.15. Suppose {an }n= C is a summable sequence (i.e. in is a continuous function for n= |an | < ), then f ( ) := n= an e R and 1 f ()ein d. an = 2
macro: svmonob.cls date/time: 23-Feb-2007/15:20
Page: 83
job: prob
84
8 Integration Theory
Exercise 8.16. For any function f L1 (m) , show x R (,x] f (t) dm (t) is continuous in x. Also nd a nite measure, , on BR such that x (,x] f (t) d (t) is not continuous. Exercise 8.17. Folland 2.31b and 2.31e on p. 60. (The answer in 2.13b is wrong by a factor of 1 and the sum is on k = 1 to . In part (e), s should be taken to be a. You may also freely use the Taylor series expansion (1 z )1/2 = (2n 1)!! n (2n)! n z = 2 z for |z | < 1. n n 2 n! n=0 n=0 4 (n!)
Exercise 8.20 (A simple form of the Strong Law of Large Numbers). 4 Suppose now that E |X1 | < . Show for all > 0 and n N that Sn n
4
= =
1 n + 3n(n 1) 4 n4 1 n1 + 3 1 n1 4 n2
Exercise 8.18. Prove Lemma 8.48. 8.7.1 Laws of Large Numbers Exercises For the rest of the problems of this section, let (, B , P ) be a probability n space, {Xn }n=1 be a sequence if i.i.d. random variables, and Sn := k=1 Xk . If E |Xn | = E |X1 | < let := EXn be the mean of Xn , if E |Xn |
2
Conclude from the last estimate and the rst Borel Cantelli Lemma 8.22 that n limn S n = a.s.
= E |X1 |
< , let
2 = E Xn 2 be the standard deviation of Xn
Exercise 8.19 (A simple form of the Weak Law of Large Numbers). 2 Assume E |X1 | < . Show E E P for all > 0 and n N.
Page: 84 job: prob macro: svmonob.cls date/time: 23-Feb-2007/15:20
Sn = , n
2
Sn n
Sn > n
2 , and n 2 2 n =
Theorem 9.3 (Dynkins Multiplicative System Theorem). Suppose that H is a vector subspace of bounded functions from to R which contains the constant functions and is closed under monotone convergence. If M is multiplicative system (i.e. M is a subset of H which is closed under pointwise multiplication), then H contains all bounded (M) measurable functions. Proof. Let L := {A : 1A H} . We then have L since 1 = 1 H, if A, B L with A B then B \ A L since 1B \A = 1B 1A H, and if An L with An A, then A L because 1An H and 1An 1A H. Therefore L is system. Let n (x) = 0 [(nx) 1] (see Figure 9.1 below) so that n (x) 1x>0 . Given f1 , f2 , . . . , fk M and a1 , . . . , ak R, let
k
Fn :=
i=1
n (fi ai )
By the Weierstrass approximation Theorem 4.23, we may nd polynomial functions, pl (x) such that pl n uniformly on [M, M ] .Since pl is a polynomial k it is easily seen that i=1 pl (fi ai ) H. Moreover,
k
pl (fi ai ) Fn uniformly as l ,
i=1
+ n n+1 . Fn
Taking n := 2n , then n n+1 = 2n (1 1/2) = 2(n+1) in which case gn+1 gn 0 for all n. By choosing M suciently large, we will also have gn 0 for all n. Since H is a vector space containing the constant functions, gn H and since gn f + M, it follows that f = f + M M H. So we have shown that H is closed under uniform convergence.
it follows that 1k H or equivalently that k i=1 {fi > ai } L. Therei=1 {fi >ai } fore L contains the system, P , consisting of nite intersections of sets of the form, {f > a} with f M and a R.
86
algebra. Using the fact that H is closed under bounded convergence, it follows that B is closed under increasing unions and hence that B is algebra. Since H is a vector space, H contains all B measurable simple functions. Since every bounded B measurable function may be written as a bounded limit of such simple functions, it follows that H contains all bounded B measurable functions. The proof is now completed by showing B contains (M) as was done in second paragraph of the proof of Theorem 9.3. Corollary 9.5. Suppose H is a real subspace of bounded functions such that 1 H and H is closed under bounded convergence. If P 2 is a multiplicative class such that 1A H for all A P , then H contains all bounded (P ) measurable functions. Proof. Let M = {1}{1A : A P} . Then M H is a multiplicative system and the proof is completed with an application of Theorem 9.3. Example 9.6. Suppose and are two probability measure on (, B ) such that f d =
As a consequence of the above paragraphs and the theorem, L contains (P ) = (M) . In particular it follows that 1A H for all A (M) . Since any positive (M) measurable function may be written as a increasing limit of simple functions, it follows that H contains all non-negative bounded (M) measurable functions. Finally, since any bounded (M) measurable functions may be written as the dierence of two such non-negative simple functions, it follows that H contains all bounded (M) measurable functions. Corollary 9.4. Suppose that H is a vector subspace of bounded functions from to R which contains the constant functions and is closed under bounded convergence. If M is a subset of H which is closed under pointwise multiplication, then H contains all bounded (M) measurable functions. Proof. This is of course a direct consequence of Theorem 9.3. Moreover, under the assumptions here, the proof of Theorem 9.3 simplies in that Proposition 9.2 is no longer needed. For fun, let us give another self-contained proof of this corollary which does not even refer to the theorem. In this proof, we will assume that H is the smallest subspace of bounded functions on which contains the constant functions, contains M, and is closed under bounded convergence. (As usual such a space exists by taking the intersection of all such spaces.) For f H, let Hf := {g H : gf H} . The reader will now easily verify that Hf is a linear subspace of H, 1 Hf , and Hf is closed under bounded convergence. Moreover if f M, then M Hf and so by the denition of H, H = Hf , i.e. f g H for all f M and g H. Having proved this it now follows for any f H that M Hf and therefore f g H whenever f, g H, i.e. H is now an algebra of functions. We will now show that B := {A : 1A H} is algebra. Using the fact that H is an algebra containing constants, the reader will easily verify that B is closed under complementation, nite intersections, and contains , i.e. B is an
Page: 86 job: prob
f d
(9.1)
for all f in a multiplicative subset, M, of bounded measurable functions on . Then = on (M) . Indeed, apply Theorem 9.3 with H being the bounded measurable functions on such that Eq. (9.1) holds. In particular if M = {1} {1A : A P} with P being a multiplicative class we learn that = on (M) = (P ) . Corollary 9.7. The smallest subspace of real valued functions, H, on R which contains Cc (R, R) (the space of continuous functions on R with compact support) is the collection of bounded Borel measurable function on R. Proof. By a homework problem, for < a < b < , 1(a,b] may be written as a bounded limit of continuous functions with compact support from which it follows that (Cc (R, R)) = BR . It is also easy to see that 1 is a bounded limit of functions in Cc (R, R) and hence 1 H. The corollary now follows by an application of The result now follows by an application of Theorem 9.3 with M := Cc (R, R). For the rest of this chapter, recall for p [1, ) that Lp () = Lp (X, B , ) is 1/p p the set of measurable functions f : R such that f Lp := |f | d < . It is easy to see that f p = || f p for all R and we will show below that f + g p f p + g p for all f, g Lp () , i.e.
p
macro: svmonob.cls
date/time: 23-Feb-2007/15:20
87
Theorem 9.8 (Density Theorem). Let p [1, ), (, B , ) be a measure space and M be an algebra of bounded R valued measurable functions such that 1. M Lp (, R) and (M) = B . 2. There exists k M such that k 1 boundedly. Then to every function f Lp (, R) , there exist n M such that limn f n Lp () = 0, i.e. M is dense in Lp (, R) . Proof. Fix k N for the moment and let H denote those bounded B measurable functions, f : R, for which there exists {n }n=1 M such that limn k f n Lp () = 0. A routine check shows H is a subspace of the bounded measurable R valued functions on , 1 H, M H and H is closed under bounded convergence. To verify the latter assertion, suppose fn H and fn f boundedly. Then, by the dominated convergence theorem, limn k (f fn ) Lp () = 0.1 (Take the dominating function to be g = p [2C |k |] where C is a constant bounding all of the {|fn |}n=1 .) We may now 1 choose n M such that n k fn Lp () n then lim sup k f n
n Lp ()
Theorem 9.10. Suppose p [1, ), A B 2 is an algebra such that (A) = B and is nite on A. Let S(A, ) denote the measurable simple functions, : R such { = y } A for all y R and ({ = 0}) < . Then S(A, ) is dense subspace of Lp (). Proof. Let M := S(A, ). By assumption there exists k A such that (k ) < and k as k . If A A, then k A A and (k A) < so that 1k A M. Therefore 1A = limk 1k A is (M) measurable for every A A. So we have shown that A (M) B and therefore B = (A) (M) B , i.e. (M) = B . The theorem now follows from Theorem 9.8 after observing k := 1k M and k 1 boundedly. Theorem 9.11 (Separability of Lp Spaces). Suppose, p [1, ), A B is a countable algebra such that (A) = B and is nite on A. Then Lp () is separable and D={ aj 1Aj : aj Q + iQ, Aj A with (Aj ) < }
is a countable dense subset. Proof. It is left to reader to check D is dense in S(A, ) relative to the Lp () norm. Once this is done, the proof is then complete since S(A, ) is a dense subspace of Lp () by Theorem 9.10. Notation 9.12 Given a collection of bounded functions, M, from a set, , to R, let M (M ) denote the the bounded monotone increasing (decreasing) limits of functions from M. More explicitly a bounded function, f : R is in M respectively M i there exists fn M such that fn f respectively fn f. Exercise 9.1. Let (, B , P ) be a probability space and X, Y : R be a pair of random variables such that E [f (X ) g (Y )] = E [f (X ) g (X )] for every pair of bounded measurable functions, f, g : R R. Show P (X = Y ) = 1. Hint: Let H denote the bounded Borel measurable functions, h : R2 R such that E [h (X, Y )] = E [h (X, X )] . Use Corollary 9.4 to show H is the vector space of all bounded Borel measurable functions. Then take h (x, y ) = 1{x=y} . Theorem 9.13 (Bounded Approximation Theorem). Let (, B , ) be a nite measure space and M be an algebra of bounded R valued measurable functions such that:
macro: svmonob.cls date/time: 23-Feb-2007/15:20
lim sup k (f fn )
n n
Lp () Lp ()
+ lim sup k fn n
=0
(9.2)
which implies f H. An application of Dynkins Multiplicative System Theorem 9.3, now shows H contains all bounded measurable functions on . Let f Lp () be given. The dominated convergence theorem implies limk k 1{|f |k} f f Lp () = 0. p (Take the dominating function to be g = [2C |f |] where C is a bound on all of the |k | .) Using this and what we have just proved, there exists k M such that 1 k 1{|f |k} f k Lp () . k The same line of reasoning used in Eq. (9.2) now implies limk f k Lp () = 0. Example 9.9. Let be a measure on (R, BR ) such that ([M, M ]) < for all M < . Then, Cc (R, R) (the space of continuous functions on R with compact support) is dense in Lp () for all 1 p < . To see this, apply Theorem 9.8 with M = Cc (R, R) and k := 1[k,k] .
1
Page: 87
job: prob
88
Since > 0 was arbitrary, if follows that g H for 0. Similarly, M h g f M and (f (h)) = (h f ) < . which shows g H as well. Because of Theorem 9.3, to complete this proof, it suces to show H is closed under monotone convergence. So suppose that gn H and gn g, where g : R is a bounded function. Since H is a vector space, it follows that 0 n := gn+1 gn H for all n N. So if > 0 is given, we can nd, M un n vn M such that (vn un ) 2n for all n. By replacing un by un 0 M (by observation 1.), we may further assume that un 0. Let
N
Then for every bounded (M) measurable function, g : R, and every niel > 0, there exists f M and h M such that f g h and (h f ) < . in Proof. Let us begin with a few simple observations. s to of 1. M is a lattice if f, g M then s. 1 f g = (f + g + |f g |) M 2 1 f g = (f + g |f g |) M. 2 If f, g M or f, g M then f + g M or f + g M respectively. If 0 and f M (f M ), then f M (f M ) . If f M then f M and visa versa. If fn M and fn f where f : R is a bounded function, then f M . Indeed, by assumption there exists fn,i M such that fn,i fn as i . By observation (1), gn := max {fij : i, j n} M. Moreover it is clear that gn max {fk : k n} = fn f and hence gn g := limn gn f. Since fij g for all i, j, it follows that fn = limj fnj g and consequently that f = limn fn g f. So we have shown that gn f M . and
v :=
n=1
vn = lim
2. 3. 4. 5.
uN :=
n=1
Then
n = lim
n=1
n = lim (gN +1 g1 ) = g g1
n=1 N
and uN g g1 v. Moreover,
N N
Now let H denote the collection of bounded measurable functions which satisfy the assertion of the theorem. Clearly, M H and in fact it is also easy to see that M and M are contained in H as well. For example, if f M , by denition, there exists fn M M such that fn f. Since M fn f f M and (f fn ) 0 by the dominated convergence theorem, it follows that f H. As similar argument shows M H. We will now show H is a vector sub-space of the bounded B = (M) measurable functions. H is closed under addition. If gi H for i = 1, 2, and > 0 is given, we may nd fi M and hi M such that fi gi hi and (hi fi ) < /2 for i = 1, 2. Since h = h1 + h2 M , f := f1 + f2 M , f g1 + g2 h, and (h f ) = (h1 f1 ) + (h2 f2 ) < , it follows that g1 + g2 H. H is closed under scalar multiplication. If g H then g H for all R. Indeed suppose that > 0 is given and f M and h M such that f g h and (h f ) < . Then for 0, M f g h M and (h f ) = (h f ) < .
Page: 88 job: prob
v uN =
n=1
(vn un ) +
n=N +1
(vn )
n=1
2n +
n=N +1
(vn )
+
n=N +1
(vn ) .
However, since
(vn )
n=1 n=1
n + 2n =
n=1
(n ) + ( )
=
n=1
(g g1 ) + ( ) < ,
it follows that for N N suciently large that n=N +1 (vn ) < . Therefore, for this N, we have v uN < 2 and since > 0 is arbitrary, if follows that g g1 H. Since g1 H and H is a vector space, we may conclude that g = (g g1 ) + g1 H.
macro: svmonob.cls date/time: 23-Feb-2007/15:20
Theorem 9.14 (Complex Multiplicative System Theorem). Suppose H is a complex linear subspace of the bounded complex functions on , 1 H, H is closed under complex conjugation, and H is closed under bounded convergence. If M H is multiplicative system which is closed under conjugation, then H contains all bounded complex valued (M)-measurable functions. Proof. Let M0 = spanC (M {1}) be the complex span of M. As the reader should verify, M0 is an algebra, M0 H, M0 is closed under complex conjugation and (M0 ) = (M) . Let HR := {f H : f is real valued} and MR 0 := {f M0 : f is real valued} . Then HR is a real linear space of bounded real valued functions 1 which is closed R R under bounded convergence and MR 0 H . Moreover, M0 is a multiplicative system (as the reader should check) and therefore by Theorem 9.3, HR contains all bounded MR 0 measurable real valued functions. Since H and M0 are complex linear spaces closed under complex conjugation, for any f H or 1 f M0 , the functions Re f = 1 2 f + f and Im f = 2i f f are in H or R R R M0 respectively. Therefore M0 = M0 + iM0 , M0 = (M0 ) = (M) , and H = HR + iHR . Hence if f : C is a bounded (M) measurable function, then f = Re f + i Im f H since Re f and Im f are in HR .
x
Y
(10.3) (10.4)
d (y )f (x, y ) :=
X Y
f (x, y )d (y ) d(x)
d(x)
X Y
d (y )f (x, y ) =
Y
d (y )
X
d(x)f (x, y ).
(10.5)
and d (y )
Y X
f (x, y )d(x) d (y ).
f (x, y ) = 1AB (x, y ) = 1A (x)1B (y ) and one sees that Eqs. (10.1) and (10.2) hold. Moreover f (x, y )d (y ) =
Y Y
Notation 10.2 Suppose that f : X C and g : Y C are functions, let f g denote the function on X Y given by f g (x, y ) = f (x)g (y ). Notice that if f, g are measurable, then f g is (M N , BC ) measurable. To prove this let F (x, y ) = f (x) and G(x, y ) = g (y ) so that f g = F G will be measurable provided that F and G are measurable. Now F = f 1 where 1 : X Y X is the projection map. This shows that F is the composition of measurable functions and hence measurable. Similarly one shows that G is measurable.
1A (x)1B (y )d (y ) = 1A (x) (B ),
d (y )f (x, y ) = (B )(A).
(10.6)
d (y )
X
Theorem 10.3. Suppose (X, M, ) and (Y, N , ) are -nite measure spaces and f is a nonnegative (M N , BR ) measurable function, then for each y Y, x f (x, y ) is M B[0,] measurable, for each x X, y f (x, y ) is N B[0,] measurable, (10.2) (10.1)
from which it follows that Eqs. (10.4) and (10.5) hold in this case as well. For the moment let us now further assume that (X ) < and (Y ) < and let H be the collection of all bounded (M N , BR ) measurable functions on X Y such that Eqs. (10.1) (10.5) hold. Using the fact that measurable functions are closed under pointwise limits and the dominated convergence theorem (the dominating function always being a constant), one easily shows that H closed under bounded convergence. Since we have just veried that 1E H for all E in the class, E , it follows by Corollary 9.5 that H is the space
92
of all bounded (M N , BR ) measurable functions on X Y. Moreover, if f : X Y [0, ] is a (M N , BR ) measurable function, let fM = M f so that fM f as M . Then Eqs. (10.1) (10.5) hold with f replaced by fM for all M N. Repeated use of the monotone convergence theorem allows us to pass to the limit M in these equations to deduce the theorem in the case and are nite measures. For the nite case, choose Xn M, Yn N such that Xn X, Yn Y, (Xn ) < and (Yn ) < for all m, n N. Then dene m (A) = (Xm A) and n (B ) = (Yn B ) for all A M and B N or equivalently dm = 1Xm d and dn = 1Yn d. By what we have just proved Eqs. (10.1) (10.5) with replaced by m and by n for all (M N , BR ) measurable functions, f : X Y [0, ]. The validity of Eqs. (10.1) (10.5) then follows by passing to the limits m and then n making use of the monotone convergence theorem in the following context. For all u L+ (X, M), udm =
X X
Theorem 10.6 (Tonellis Theorem). Suppose (X, M, ) and (Y, N , ) are nite measure spaces and = is the product measure on M N . If f L+ (X Y, M N ), then f (, y ) L+ (X, M) for all y Y, f (x, ) L+ (Y, N ) for all x X, f (, y )d (y ) L+ (X, M),
Y X
and f d =
X Y X
d(x)
Y
(10.8) (10.9)
=
Y
d (y )
u1Xm d
X
ud as m ,
Proof. By Theorem 10.3 and Corollary 10.4, the theorem holds when f = 1E with E M N . Using the linearity of all of the statements, the theorem is also true for non-negative simple functions. Then using the monotone convergence theorem repeatedly along with the approximation Theorem 6.34, one deduces the theorem for general f L+ (X Y, M N ). Example 10.7. In this example we are going to show, I := 2. To this end we observe, using Tonellis theorem, that
2 R
v 1Yn d
Y
vd as n .
ex
/2
dm (x) =
Corollary 10.4. Suppose (X, M, ) and (Y, N , ) are nite measure spaces. Then there exists a unique measure on M N such that (A B ) = (A) (B ) for all A M and B N . Moreover is given by (E ) =
X
I2 =
R
ex
/2
dm (x)
=
R
ey
/2 R
ex
/2
dm (x) dm (y )
=
R2
e(x
+y 2 )/2
dm2 (x, y )
d(x)
Y
d (y )1E (x, y ) =
Y
d (y )
X
d(x)1E (x, y )
(10.7)
where m2 = m m is Lebesgue measure on R2 , BR2 = BR BR . From the monotone convergence theorem, I 2 = lim
R
for all E M N and is nite. Proof. Notice that any measure such that (A B ) = (A) (B ) for all A M and B N is necessarily nite. Indeed, let Xn M and Yn N be chosen so that (Xn ) < , (Yn ) < , Xn X and Yn Y, then Xn Yn M N , Xn Yn X Y and (Xn Yn ) < for all n. The uniqueness assertion is a consequence of the combination of Exercises 4.5 and 5.1 Proposition 4.26 with E = M N . For the existence, it suces to observe, using the monotone convergence theorem, that dened in Eq. (10.7) is a measure on M N . Moreover this measure satises (A B ) = (A) (B ) for all A M and B N from Eq. (10.6). Notation 10.5 The measure is called the product measure of and and will be denoted by .
Page: 92 job: prob
e(x
DR
+y 2 )/2
d (x, y )
where DR = (x, y ) : x2 + y 2 < R2 . Using the change of variables theorem described in Section 10.5 below,1 we nd e(x
DR
2
+y 2 )/2
d (x, y ) =
(0,R)(0,2 ) R
er er
0
2
/2
rdrd
2
= 2
1
/2
rdr = 2 1 eR
/2
Alternatively, you can easily show that the integral D f dm2 agrees with the R multiple integral in undergraduate analysis when f is continuous. Then use the change of variables theorem from undergraduate analysis.
macro: svmonob.cls
date/time: 23-Feb-2007/15:20
93
f (x, y ) d (y ) =
/2
1E c (x) f (x, y ) d (y )
Y
= 2
=
Y
as desired. =
Noting that 1E c (x) f (x, y ) = (1E c 1Y f ) (x, y ) is a positive M N measurable function, it follows from another application of Tonellis theorem that x Y f (x, y ) d (y ) is M measurable, being the dierence of two measurable functions. Moreover f (x, y ) d (y ) d (x)
X Y X Y
which shows Y f (, y )dv (y ) L1 (). Integrating Eq. (10.14) on x and using Tonellis theorem repeatedly implies, f (x, y ) d (y ) d (x)
X Y
=
X
d (x)
Y
d (y ) 1E c (x) f+ (x, y )
X
d (x)
Y
=
Y
d (y )
X
d (y )
=
Y
d (y )
X
d (x) f+ (x, y )
Y
d (y )
X
If any one (and hence all) of these condition hold, then f (x, ) L1 ( ) for -a.e. x, f (, y ) L1 () for -a.e. y, Y f (, y )dv (y ) L1 (), X f (x, )d(x) L1 ( ) and Eqs. (10.8) and (10.9) are still valid. Proof. The equivalence of Eqs. (10.10) (10.12) is a direct consequence of Tonellis Theorem 10.6. Now suppose f L1 ( ) is a real valued function and let E := xX:
Y
=
X Y
f+ d
X Y
f d =
X Y
(f+ f ) d =
which proves Eq. (10.8) holds. Now suppose that f = u + iv is complex valued and again let E be as in Eq. (10.13). Just as above we still have E M and (E ) = 0. By our convention, f (x, y ) d (y ) =
Y Y
|f (x, y )| d (y ) = .
(10.13)
1E c (x) f (x, y ) d (y ) =
Y
Then by Tonellis theorem, x Y |f (x, y )| d (y ) is measurable and hence E M. Moreover Tonellis theorem implies |f (x, y )| d (y ) d (x) =
X Y X Y
=
Y
1E c (x) u (x, y ) d (y ) + i
|f | d <
which implies that (E ) = 0. Let f be the positive and negative parts of f, then using the above convention we have
Page: 93 job: prob
which is measurable in x by what we have just proved. Similarly one shows f (, y ) d (y ) L1 () and Eq. (10.8) still holds by a computation similar to Y that done in Eq. (10.15). The assertions pertaining to Eq. (10.9) may be proved in the same way. The previous theorems have obvious generalizations to products of any nite number of nite measure spaces. For example the following theorem holds.
macro: svmonob.cls date/time: 23-Feb-2007/15:20
94
Theorem 10.9. Suppose {(Xi , Mi , i )}i=1 are nite measure spaces and X := X1 Xn . Then there exists a unique measure, , on (X, M1 Mn ) such that (A1 An ) = 1 (A1 ) . . . n (An ) for all Ai Mi .
Ef (X1 , . . . , Xn ) = P ((X1 , . . . , Xn ) A1 An )
n n
=
j =1
P (Xj Aj ) =
j =1
j (Aj )
= (This measure and its completion will be denoted by 1 n .) If f : X [0, ] is a M1 Mn measurable function then f d =
X X(1) Rn
d(1) (x(1) ) . . .
X(n)
Therefore, H contains the multiplicative system, M := {1A1 An : Ai BR } and so by the multiplicative systems theorem, H contains all bounded (M) = BRn measurable functions. (2 = 3) Let A BRn and f = 1A in Eq. (10.17) to conclude that (A) = P ((X1 , . . . , Xn ) A) = E1A (X1 , . . . , Xn ) = 1A (x1 , . . . , xn ) d1 (x1 ) . . . dn (xn ) = (1 n ) (A) .
Rn
where is any permutation of {1, 2, . . . , n}. This equation also holds for any f L1 ( ) and moreover, f L1 ( ) i d(1) (x(1) ) . . .
X(1) X(n)
for some (and hence all) permutations, . This theorem can be proved by the same methods as in the two factor case, see Exercise 10.4. Alternatively, one can use the theorems already proved and induction on n, see Exercise 10.5 in this regard. Proposition 10.10. Suppose that {Xk }k=1 are random variables on a prob1 ability space (, B , P ) and k = P Xk is the distribution for Xk for 1 k = 1, 2, . . . , n, and := P (X1 , . . . , Xn ) is the joint distribution of (X1 , . . . , Xn ) . Then the following are equivalent, 1. {Xk }k=1 are independent, 2. for all bounded measurable functions, f : (Rn , BRn ) (R, BR ) , Ef (X1 , . . . , Xn ) =
Rn n n
P ((X1 , . . . , Xn ) A1 An ) = (A1 An ) =
j =1 n
j (Aj )
=
j =1
P (Xj Aj ) ,
which is valid for all Aj BR . Example 10.11 (No Ties). Suppose that X and Y are independent random variables on a probability space (, B , P ) . If F (x) := P (X x) is continuous, then P (X = Y ) = 0. To prove this, let (A) := P (X A) and (A) = P (Y A) . Because F is continuous, ({y }) = F (y ) F (y ) = 0, and hence P (X = Y ) = E 1{X =Y } =
R2
and 3. = 1 2 n . Proof. (1 = 2) Suppose that {Xk }k=1 are independent and let H denote the set of bounded measurable functions, f : (Rn , BRn ) (R, BR ) such that Eq. (10.17) holds. Then it is easily checked that H is a vector space which contains the constant functions and is closed under bounded convergence. Moreover, if f = 1A1 An where Ai BR , we have
n
=
R
d (y )
R
d (x) 1{x=y} =
=
R
0 d (y ) = 0.
lim
sin x dx = /2. x
(10.18)
1 x
tx e dt 0
95
sin x dx = x =
M 0 0 0 0
etx sin x dt dx
0 M
sin x x e dx = x =
dx sin x ex
0 M 0
etx dt
etx sin x dx dt
dt
0 0
dx sin x e(+t)x
1 1 teM t sin M eM t cos M dt 2 1 + t 0 1 dt = as M , 1 + t2 2 0 = wherein we have used the dominated convergence theorem (for instance, take 1 t g (t) := 1+ + et )) to pass to the limit. t2 (1 + te The next example is a renement of this result. Example 10.13. We have
0
=
0
=
0
cos M + ( + t) sin M ( + t) + 1
2
eM (+t) dt.
1 + ( + t ) ( + t) + 1
2
C,
eM (+t) dt = C
eM . M
(10.20)
where C = maxx0
1+x 1+x2
This estimate along with Eq. (10.21) proves Eq. (10.20) from which Eq. (10.18) follows by taking and Eq. (10.19) follows (using the dominated convergence theorem again) by letting M .
To verify these assertions, rst notice that by the fundamental theorem of calculus,
x x x
|sin x| =
0
cos ydy
0
|cos y | dy
0
1dy = |x|
so
sin x x
etx dt = 1/x
0 xE
:= {y Y : (x, y ) E }.
Similarly if y Y is given let Ey := {x X : (x, y ) E }. If f : X Y C is a function let fx = f (x, ) and f y := f (, y ) so that fx : Y C and f y : X C.
Page: 95
job: prob
macro: svmonob.cls
date/time: 23-Feb-2007/15:20
96
Theorem 10.15. Suppose (X, M, ) and (Y, N , ) are complete nite measure spaces. Let (X Y, L, ) be the completion of (X Y, M N , ). If f is L measurable and (a) f 0 or (b) f L1 () then fx is N measurable for a.e. x and f y is M measurable for a.e. y and in case (b) fx L1 ( ) and f y L1 () for a.e. x and a.e. y respectively. Moreover, x
Y
g (x, y )d(x).
fx d
L1 () and
y
X
f y d
L1 ( )
d (y )f (x, y ) =
X
d(x)
Y
and f d =
X Y Y
= d
X
d (y )
Y
d f =
X
d
Y
d f. =
f (x, y )d(x, y ).
X Y
(x E )d(x) =
X
This shows that ({x : (x E ) = 0}) = 0 and ({y : (Ey ) = 0}) = 0, i.e. (x E ) = 0 for a.e. x and (Ey ) = 0 for a.e. y. If h is L measurable and h = 0 for a.e., then there exists E M N such that {(x, y ) : h(x, y ) = 0} E and ( )(E ) = 0. Therefore |h(x, y )| 1E (x, y ) and ( )(E ) = 0. Since {hx = 0} = {y Y : h(x, y ) = 0} x E and {hy = 0} = {x X : h(x, y ) = 0} Ey we learn that for a.e. x and a.e. y that {hx = 0} M, {hy = 0} N , ({hx = 0}) = 0 and a.e. and ({hy = 0}) = 0. This implies Y h(x, y )d (y ) exists and equals 0 for a.e. x and similarly that X h(x, y )d(x) exists and equals 0 for a.e. y. Therefore 0=
X Y 1
d(x)f (x, y ) =
X Y
f (x, y )d(x, y ).
m := m m on BRd = BR BR be the d fold product of Lebesgue measure m on BR . We will also use md to denote its completion and let Ld be the completion of BRd relative to md . A subset A Ld is called a Lebesgue measurable set and md is called d dimensional Lebesgue measure, or just Lebesgue measure for short. Denition 10.17. A function f : Rd R is Lebesgue measurable if f 1 (BR ) Ld . Notation 10.18 I will often be sloppy in the sequel and write m for md and dx for dm(x) = dmd (x), i.e. f (x) dx = f dm =
Rd Rd
hd =
Y X
hd
d =
X 1 Y
hd
d.
For general f L (), we may choose g L (MN , ) such that f (x, y ) = g (x, y ) for a.e. (x, y ). Dene h := f g. Then h = 0, a.e. Hence by what we have just proved and Theorem 10.6 f = g + h has the following properties: 1. For a.e. x, y f (x, y ) = g (x, y ) + h(x, y ) is in L1 ( ) and f (x, y )d (y ) =
Y Y
f dmd .
g (x, y )d (y ).
Rd
Hopefully the reader will understand the meaning from the context.
macro: svmonob.cls date/time: 23-Feb-2007/15:20
Page: 96
job: prob
97
Theorem 10.19. Lebesgue measure md is translation invariant. Moreover md is the unique translation invariant measure on BRd such that md ((0, 1]d ) = 1. Proof. Let A = J1 Jd with Ji BR and x Rd . Then x + A = (x1 + J1 ) (x2 + J2 ) (xd + Jd ) and therefore by translation invariance of m on BR we nd that md (x + A) = m(x1 + J1 ) . . . m(xd + Jd ) = m(J1 ) . . . m(Jd ) = md (A) and hence md (x + A) = md (A) for all A BRd since it holds for A in a multiplicative system which generates BRd . From this fact we see that the measure md (x + ) and md () have the same null sets. Using this it is easily seen that m(x + A) = m(A) for all A Ld . The proof of the second assertion is Exercise 10.6. Exercise 10.1. In this problem you are asked to show there is no reasonable notion of Lebesgue measure on an innite dimensional Hilbert space. To be more precise, suppose H is an innite dimensional Hilbert space and m is a countably additive measure on BH which is invariant under translations and satises, m(B0 ()) > 0 for all > 0. Show m(V ) = for all non-empty open subsets V H. Theorem 10.20 (Change of Variables Theorem). Let o Rd be an open set and T : T ( ) o Rd be a C 1 dieomorphism,2 see Figure 10.1. Then for any Borel measurable function, f : T ( ) [0, ], f (T (x)) | det T (x) |dx =
T ( ) d where T (x) is the linear transformation on Rd dened by T (x)v := dt |0 T (x + d tv ). More explicitly, viewing vectors in R as columns, T (x) may be represented by the matrix 1 T1 (x) . . . d T1 (x) . . .. . . T (x) = (10.23) , . . .
Remark 10.21. Theorem 10.20 is best remembered as the statement: if we make the change of variables y = T (x) , then dy = | det T (x) |dx. As usual, you must also change the limits of integration appropriately, i.e. if x ranges through then y must range through T ( ) . Proof. The proof will be by induction on d. The case d = 1 was essentially done in Exercise 8.7. Nevertheless, for the sake of completeness let us give a proof here. Suppose d = 1, a < < < b such that [a, b] is a compact subinterval of . Then | det T | = |T | and 1T ((, ]) (T (x)) |T (x)| dx =
[a,b] [a,b]
f (y ) dy,
(10.22)
|T (x)| dx.
|T (x)| dx =
T (x) dx = T ( ) T () 1T ((, ]) (y ) dy
T ([a,b])
1 Td (x) . . . d Td (x) i.e. the i - j matrix entry of T (x) is given by T (x)ij = i Tj (x) where T (x) = (T1 (x), . . . , Td (x))tr and i = /xi .
2
That is T : T ( ) o Rd is a continuously dierentiable bijection and the inverse map T 1 : T ( ) is also continuously dierentiable.
Page: 97
job: prob
macro: svmonob.cls
date/time: 23-Feb-2007/15:20
98
|T (x)| dx =
= m (T ((, ])) = Combining the previous three equations shows f (T (x)) |T (x)| dx =
[a,b]
for some i {1, . . . , d} . For deniteness we will assume T is as in Eq. (10.25), the case of T in Eq. (10.26) may be handled similarly. For t R, let it : Rd1 Rd be the inclusion map dened by it (w) := wt := (w1 , . . . , wi1 , t, wi+1 , . . . , wd1 ) , t be the (possibly empty) open subset of Rd1 dened by
f (y ) dy
T ([a,b])
(10.24)
t := w Rd1 : (w1 , . . . , wi1 , t, wi+1 , . . . , wd1 ) and Tt : t Rd1 be dened by Tt (w) = (T2 (wt ) , . . . , Td (wt )) , see Figure 10.2. Expanding det T (wt ) along the rst row of the matrix T (wt )
whenever f is of the form f = 1T ((, ]) with a < < < b. An application of Dynkins multiplicative system Theorem 9.3 then implies that Eq. (10.24) holds for every bounded measurable function f : T ([a, b]) R. (Observe that |T (x)| is continuous and hence bounded for x in the compact interval, [a, b] .) N Recall that = n=1 (an , bn ) where an , bn R {} for n = 1, 2, < N with N = possible. Hence if f : T ( ) R + is a Borel measurable function and an < k < k < bn with k an and k bn , then by what we have already proved and the monotone convergence theorem 1(an ,bn ) (f T ) |T |dm =
= lim = lim
k T ( )
1T ([k ,k ]) f dm
=
T ( )
Summing this equality on n, then shows Eq. (10.22) holds. To carry out the induction step, we now suppose d > 1 and suppose the theorem is valid with d being replaced by d 1. For notational compactness, let us write vectors in Rd as row vectors rather than column vectors. Nevertheless, the matrix associated to the dierential, T (x) , will always be taken to be given as in Eq. (10.23). Case 1. Suppose T (x) has the form T (x) = (xi , T2 (x) , . . . , Td (x)) or T (x) = (T1 (x) , . . . , Td1 (x) , xi ) (10.26) (10.25)
shows |det T (wt )| = |det Tt (w)| . Now by the Fubini-Tonelli Theorem and the induction hypothesis,
Page: 98
job: prob
macro: svmonob.cls
date/time: 23-Feb-2007/15:20
99
f T | det T |dm =
Rd
=
R
(f T ) (wt ) | det T (wt ) |dw dt f (t, Tt (w)) | det Tt (w) |dw dt f (t, z ) dz dt = f (y ) dy
R
(T1 (x) , . . . , Td (x)) = T (x) = R (S (x)) = R ((xi , T2 (x) , . . . , Td (x))) for all x W, if (z1 , z2 , . . . , zd ) = S (x) = (xi , T2 (x) , . . . , Td (x)) then R (z ) = T1 S 1 (z ) , z2 , . . . , zd . (10.28)
=
R
=
R
Rd1
Tt (t )
1T ( ) (t, z ) f (t, z ) dz dt
=
T ( )
Observe that S is a map of the form in Eq. (10.25), R is a map of the form in Eq. (10.26), T (x) = R (S (x)) S (x) (by the chain rule) and (by the multiplicative property of the determinant) |det T (x)| = | det R (S (x)) | |det S (x)| x W. So if f : T (W ) [0, ] is a Borel measurable function, two applications of the results in Case 1. shows, f T | det T |dm =
W W
wherein the last two equalities we have used Fubini-Tonelli along with the identity; T ( ) = T (it ( )) = {(t, z ) : z Tt (t )} .
tR tR
Case 2. (Eq. (10.22) is true locally.) Suppose that T : Rd is a general map as in the statement of the theorem and x0 is an arbitrary point. We will now show there exists an open neighborhood W of x0 such that f T | det T |dm =
W T (W )
f dm
f dm
=
T (W )
f dm
holds for all Borel measurable function, f : T (W ) [0, ]. Let Mi be the 1-i minor of T (x0 ) , i.e. the determinant of T (x0 ) with the rst row and ith column removed. Since
d
and Case 2. is proved. Case 3. (General Case.) Let f : [0, ] be a general non-negative Borel measurable function and let Kn := {x : dist(x, c ) 1/n and |x| n} . Then each Kn is a compact subset of and Kn as n . Using the compactness of Kn and case 2, for each n N, there is a nite open cover Wn of Kn such that W and Eq. (10.22) holds with replaced by W for each W Wn . Let {Wi }i=1 be an enumeration of n=1 Wn and set W1 = W1 and Wi := Wi \ (W1 Wi1 ) for all i 2. Then = i=1 Wi and by repeated use of case 2.,
macro: svmonob.cls date/time: 23-Feb-2007/15:20
0 = det T (x0 ) =
i=1
(1)
i+1
i Tj (x0 ) Mi ,
there must be some i such that Mi = 0. Fix an i such that Mi = 0 and let, S (x) := (xi , T2 (x) , . . . , Td (x)) . (10.27)
Observe that |det S (x0 )| = |Mi | = 0. Hence by the inverse function Theorem, there exist an open neighborhood W of x0 such that W o and S (W ) o Rd
Page: 99 job: prob
100
f T | det T |dm =
i=1
1T (A) (y ) dy =
Rd
=
Rd
=
i=1 T (Wi )
1T (W i ) f dm =
i=1 T ( )
1T (W i ) f dm
wherein the second equality we have made the change of variables, y = T (x) . Hence we have shown d (m T ) = |det T ()| dm. In particular if T GL(d, R) = GL(Rd ) the space of d d invertible matrices, then m T = |det T | m, i.e. m (T (A)) = |det T | m (A) for allA BRd . (10.30)
=
T ( )
f dm.
Remark 10.22. When d = 1, one often learns the change of variables formula as
b T (b)
This equation also shows that m T and m have the same null sets and hence the equality in Eq. (10.30) is valid for any A Ld . Exercise 10.2. Show that f L1 T ( ) , md i |f T | | det T |dm <
f (T (x)) T (x) dx =
a T (a)
f (y ) dy
(10.29)
where f : [a, b] R is a continuous function and T is C 1 function dened in a neighborhood of [a, b] . If T > 0 on (a, b) then T ((a, b)) = (T (a) , T (b)) and Eq. (10.29) is implies Eq. (10.22) with = (a, b) . On the other hand if T < 0 on (a, b) then T ((a, b)) = (T (b) , T (a)) and Eq. (10.29) is equivalent to
T (a)
and if f L1 T ( ) , md , then Eq. (10.22) holds. Example 10.24 (Polar Coordinates). Suppose T : (0, ) (0, 2 ) R2 is dened by x = T (r, ) = (r cos , r sin ) , i.e. we are making the change of variable, x1 = r cos and x2 = r sin for 0 < r < and 0 < < 2. In this case T (r, ) = and therefore dx = |det T (r, )| drd = rdrd. Observing that R2 \ T ((0, ) (0, 2 )) = := {(x, 0) : x 0} has m2 measure zero, it follows from the change of variables Theorem 10.20 that
2
f (T (x)) ( |T (x)|) dx =
(a,b) T ( b)
f (y ) dy =
T ((a,b))
f (y ) dy
which is again implies Eq. (10.22). On the other hand Eq. (10.29) is more general than Eq. (10.22) since it does not require T to be injective. The standard proof of Eq. (10.29) is as follows. For z T ([a, b]) , let
z
F (z ) :=
T (a)
f (y ) dy.
f (T (x)) T (x) dx =
a a
F (T (x)) T (x) dx =
a T (b)
d [F (T (x))] dx dx
= F (T (x)) |b a =
T (a)
f (y ) dy.
An application of Dynkins multiplicative systems theorem now shows that Eq. (10.29) holds for all bounded measurable functions f on (a, b) . Then by the usual truncation argument, it also holds for all positive measurable functions on (a, b) .
Page: 100 job: prob
f (x)dx =
R2 0
d
0
dr r f (r (cos , sin ))
(10.31)
101
Example 10.25 (Holomorphic Change of Variables). Suppose that f : o C = R2 C is an injective holomorphic function such that f (z ) = 0 for all z . We may express f as f (x + iy ) = U (x, y ) + iV (x, y ) for all z = x + iy . Hence if we make the change of variables, w = u + iv = f (x + iy ) = U (x, y ) + iV (x, y ) then dudv = det Ux Uy Vx Vy dxdy = |Ux Vy Uy Vx | dxdy.
Recalling that U and V satisfy the Cauchy Riemann equations, Ux = Vy and Uy = Vx with f = Ux + iVx , we learn Ux Vy Uy Vx = Therefore
2 Ux
Fig. 10.3. The region consists of the two curved rectangular regions shown.
2 Vx 2
= |f | . Exercise 10.3 (Spherical Coordinates). Let T : (0, ) (0, ) (0, 2 ) R3 be dened by T (r, , ) = (r sin cos , r sin sin , r cos ) = r (sin cos , sin sin , cos ) , see Figure 10.4. By making the change of variables x = T (r, , ) , show
dudv = |f (x + iy )| dxdy. Example 10.26. In this example we will evaluate the integral I :=
x4 y 4 dxdy
where = (x, y ) : 1 < x2 y 2 < 2, 0 < xy < 1 , see Figure 10.3. We are going to do this by making the change of variables, (u, v ) := T (x, y ) = x2 y 2 , xy , in which case dudv = det Notice that 1 ududv. 2 The function T is not injective on but it is injective on each of its connected components. Let D be the connected component in the rst quadrant so that = D D and T (D) = (1, 2) (0, 1) . The change of variables theorem then implies x4 y 4 = x2 y 2 x2 + y 2 = u x2 + y 2 = I :=
D
2x 2y y x
dxdy = 2 x2 + y 2 dxdy
x4 y 4 dxdy =
1 2
ududv =
(1,2)(0,1)
1 u2 2 3 | 1= 2 2 1 4
102
f (x)dx =
R3 0
d
0
d
0
dr r2 sin f (T (r, , ))
Denition 10.28. For E BS d1 , let (E ) := d m(E1 ). We call the surface measure on S d1 . It is easy to check that is a measure. Indeed if E BS d1 , then E1 = 1 ((0, 1] E ) BRd so that m(E1 ) is well dened. Moreover if E = i=1 Ei , then E1 = i=1 (Ei )1 and
for any Borel measurable function, f : R3 [0, ]. Lemma 10.27. Let a > 0 and Id (a) :=
Rd
m ((Ei )1 ) =
i=1 i=1
(Ei ).
The intuition behind this denition is as follows. If E S d1 is a set and > 0 is a small number, then the volume of (1, 1 + ] E = {r : r (1, 1 + ] and E } (10.32) should be approximately given by m ((1, 1 + ] E ) = (E ), see Figure 10.5 below. On the other hand
ea|x| dm(x) =
R2 \{0}
I2 (a) =
0
dr r
0 M M 0
d ear = 2
0
rear dr
2
= 2 lim
re
ar 2
ear dr = 2 lim M 2a
=
0
2 = /a. 2a
Fig. 10.5. Motivating the denition of surface measure for a sphere.
This shows that I2 (a) = /a and the result now follows from Eq. (10.32).
m ((1, 1 + ]E ) = m (E1+ \ E1 ) = (1 + )d 1 m(E1 ). Therefore we expect the area of E should be given by (E ) = lim
0
d1
= {x R : |x| :=
i=1
x2 i = 1}
be the unit sphere in Rd equipped with its Borel algebra, BS d1 and : 1 Rd \ {0} (0, ) S d1 be dened by (x) := (|x| , |x| x). The inverse map, 1 d1 d 1 : (0, ) S R \ {0} , is given by (r, ) = r. Since and 1 are continuous, they are both Borel measurable. For E BS d1 and a > 0, let Ea := {r : r (0, a] and E } = 1 ((0, a] E ) BRd .
Page: 102 job: prob
(1 + )d 1 m(E1 ) = d m(E1 ).
The following theorem is motivated by Example 10.24 and Exercise 10.3. Theorem 10.29 (Polar Coordinates). If f : Rd [0, ] is a (BRd , B ) measurable function then
macro: svmonob.cls
date/time: 23-Feb-2007/15:20
103
f (x)dm(x) =
Rd (0,)S d1
f (r )rd1 drd ( ).
(10.33)
Rd
f dm =
(0,)S d1
f 1
d ( )
which combined with Tonellis Theorem 10.6 proves Eq. (10.35). Corollary 10.30. The surface area (S d1 ) of the unit sphere S d1 Rd is (10.34) (S d1 ) = where is the gamma function given by
f (|x|)dx =
Rd 0
f (r)dV (r)
2 d/2 (d/2)
(10.40)
(x) := f 1
(0,)S d1
ux1 eu du
0
(10.41)
f 1 dm =
d ( m)
(10.35)
Moreover, (1/2) =
and therefore to prove Eq. (10.33) we must work out the measure m on B(0,) BS d1 dened by m(A) := m 1 (A) A B(0,) BS d1 . If A = (a, b] E with 0 < a < b and E BS d1 , then 1 (A) = {r : r (a, b] and E } = bE1 \ aE1 wherein we have used Ea = aE1 in the last equality. Therefore by the basic scaling properties of m and the fundamental theorem of calculus, ( m) ((a, b] E ) = m (bE1 \ aE1 ) = m(bE1 ) m(aE1 )
b
Id (1) =
0
dr rd1 er
d = (S d1 )
S d1 0
rd1 er dr.
(10.36)
We simplify this last integral by making the change of variables u = r2 so that 1/2 du. The result is r = u1/2 and dr = 1 2u
0
rd1 er dr =
1 eu u1/2 du 2 0 d 1 1 = u 2 1 eu du = (d/2). 2 0 2 u
d1 2
(10.42)
rd1 dr.
(10.37)
Combing the the last two equations with Lemma 10.27 which states that Id (1) = d/2 , we conclude that 1 d/2 = Id (1) = (S d1 ) (d/2) 2 which proves Eq. (10.40). Example 8.8 implies (1) = 1 and from Eq. (10.42),
d1
dr J B(0,) ,
(10.38)
(1/2) = 2
Eq. (10.37) may be written as ( m) ((a, b] E ) = ((a, b]) (E ) = ( ) ((a, b] E ) . Since E = {(a, b] E : 0 < a < b and E BS d1 } , is a class (in fact it is an elementary class) such that (E ) = B(0,) BS d1 , it follows from the Theorem and Eq. (10.39) that m = . Using this result in Eq. (10.35) gives
Page: 103 job: prob macro: svmonob.cls
er dr = 0 = I1 (1) = .
er dr
(10.39)
The relation, (x + 1) = x (x) is the consequence of the following integration by parts argument:
(x + 1) =
0
eu ux+1
du = u
ux
0
d u e du du
=x
0
ux1 eu du = x (x).
date/time: 23-Feb-2007/15:20
104
and more generally, x1 = r sin n2 . . . sin 2 sin 1 cos x2 = r sin n2 . . . sin 2 sin 1 sin x3 = r sin n2 . . . sin 2 cos 1 . . . xn2 = r sin n2 sin n3 cos n4 xn1 = r sin n2 cos n3 xn = r cos n2 . By the change of variables formula, f (x)dm(x)
Rn
(10.43)
=
0
dr
0i ,0 2
d1 . . . dn2 d
(10.45)
x1 x2 = x3
T2 (, r sin 1 ) r cos 1
If f is a function on rS n1 the sphere of radius r centered at 0 inside of Rn , then f (x)d (x) = rn1
rS n1 S n1
f (r )d ( )
We continue to work inductively this way to dene x1 . Tn (, 1 , . . . , n2 , r sin n1 , ) . = Tn+1 (, 1 , . . . , n2 , n1 , r). . = r cos n1 xn xn+1 So for example, x1 x2 x3 x4 = r sin 2 sin 1 cos = r sin 2 sin 1 sin = r sin 2 cos 1 = r cos 2
=
0i ,0 2
Proof. We are going to compute n inductively. Letting := r sin n1 Tn n and writing T for (, 1 , . . . , n2 , ) we have n+1 (,1 , . . . , n2 , n1 , r) =
Tn Tn 1
... 0 ...
Tn Tn Tn n2 r cos n1
r sin n1
sin n1 cos n1
Page: 104
job: prob
macro: svmonob.cls
date/time: 23-Feb-2007/15:20
105
Indeed, 2(k+1)+1 = and 2k + 2 2k + 2 (2k )!! [2(k + 1)]!! 2k+1 = 2 =2 2k + 3 2k + 3 (2k + 1)!! (2(k + 1) + 1)!!
To arrive at this result we have expanded the determinant along the bottom row. Staring with 2 (, r) = r already derived in Example 10.24, Eq. (10.47) implies, 3 (, 1 , r) = r2 (, r sin 1 ) = r2 sin 1 4 (, 1 , 2 , r) = r3 (, 1 , r sin 2 ) = r3 sin2 2 sin 1 . . . n (, 1 , . . . , n2 , r) = rn1 sinn2 n2 . . . sin2 2 sin 1 which proves Eq. (10.45). Equation (10.46) now follows from Eqs. (10.33), (10.44) and (10.45). As a simple application, Eq. (10.46) implies (S n1 ) =
0i ,0 2 n2
2k + 1 2k + 1 (2k 1)!! (2k + 1)!! 2k = = . 2k + 1 2k + 2 (2k )!! (2k + 2)!! The recursion relation in Eq. (10.48) may be written as 2(k+1) = (S n ) = S n1 n1 which combined with S
1
(10.49)
= 2 implies
S 1 = 2, (S 2 ) = 2 1 = 2 2, 1 22 2 (S 3 ) = 2 2 2 = 2 2 = , 2 2!! 22 2 2 23 2 22 2 3 = 2 = (S 4 ) = 2!! 2!! 3 3!! 1 2 31 23 3 5 (S ) = 2 2 2 = , 2 3 42 4!! 1 2 31 42 24 3 (S 6 ) = 2 2 2 2= 2 3 42 53 5!! and more generally that (S 2n ) = 2 (2 ) (2 ) and (S 2n+1 ) = (2n 1)!! (2n)!!
n n n+1
= 2 where k :=
0
(10.48)
(10.50)
k =
0
sink d =
0
sink2 cos2 d
which is veried inductively using Eq. (10.49). Indeed, (S 2n+1 ) = (S 2n )2n = and (S (n+1) ) = (S 2n+2 ) = (S 2n+1 )2n+1 = Using (2n)!! = 2n (2(n 1)) . . . (2 1) = 2n n! (2 ) (2n)!! 2 (2 ) 2 = . (2n)!! (2n + 1)!! (2n + 1)!!
n+1 n+1
= 2k,1 + (k 1)
0
sin
n+1
and hence k satises 0 = , 1 = 2 and the recursion relation k = Hence we may conclude 0 = , 1 = 2, 2 = 1 2 31 42 531 , 3 = 2, 4 = , 5 = 2, 6 = 2 3 42 53 642 k1 k2 for k 2. k
and more generally by induction that 2k = (2k 1)!! (2k )!! and 2k+1 = 2 . (2k )!! (2k + 1)!!
job: prob
we may write (S 2n+1 ) = 2n! which shows that Eqs. (10.33) and (10.50 are in agreement. We may also write the formula in Eq. (10.50) as n/2 2(2) (n1)!! for n even n n+1 (S ) = (2) 2 for n odd. (n1)!!
macro: svmonob.cls date/time: 23-Feb-2007/15:20
n+1
Page: 105
106
10.8 Exercises
Exercise 10.4. Prove Theorem 10.9. Suggestion, to get started dene (A) :=
X1
Exercise 10.8. Folland Problem 2.48 on p. 69. (Counter example related to Fubini Theorem involving counting measures.) d (x1 ) . . .
Xn
d (xn ) 1A (x1 , . . . , xn )
Exercise 10.9. Folland Problem 2.50 on p. 69 pertaining to area under a curve. (Note the M BR should be M BR in this problem.) Exercise 10.10. Folland Problem 2.55 on p. 77. (Explicit integrations.) Exercise 10.11. Folland Problem 2.56 on p. 77. Let f L1 ((0, a), dm), g (x) = a f (t) 1 t dt for x (0, a), show g L ((0, a), dm) and x
a a
and then show Eq. (10.16) holds. Use the case of two factors as the model of your proof. Exercise 10.5. Let (Xj , Mj , j ) for j = 1, 2, 3 be nite measure spaces. Let F : (X1 X2 ) X3 X1 X2 X3 be dened by F ((x1 , x2 ), x3 ) = (x1 , x2 , x3 ).
g (x)dx =
0 0
f (t)dt.
1. Show F is ((M1 M2 ) M3 , M1 M2 M3 ) measurable and F 1 is (M1 M2 M3 , (M1 M2 ) M3 ) measurable. That is F : ((X1 X2 ) X3 , (M1 M2 ) M3 ) (X1 X2 X3 , M1 M2 M3 ) is a measure theoretic isomorphism. 2. Let := F [(1 2 ) 3 ] , i.e. (A) = [(1 2 ) 3 ] (F 1 (A)) for all A M1 M2 M3 . Then is the unique measure on M1 M2 M3 such that (A1 A2 A3 ) = 1 (A1 )2 (A2 )3 (A3 ) for all Ai Mi . We will write := 1 2 3 . 3. Let f : X1 X2 X3 [0, ] be a (M1 M2 M3 , BR ) measurable function. Verify the identity, f d =
X1 X2 X3 X3
x x dm(x) = . So sin / L1 ([0, ), m) and Exercise 10.12. Show 0 sin x x sin x x dm(x) is not dened as a Lebesgue integral. 0
Exercise 10.13. Folland Problem 2.57 on p. 77. Exercise 10.14. Folland Problem 2.58 on p. 77. Exercise 10.15. Folland Problem 2.60 on p. 77. Properties of the function. Exercise 10.16. Folland Problem 2.61 on p. 77. Fractional integration. Exercise 10.17. Folland Problem 2.62 on p. 80. Rotation invariance of surface measure on S n1 . Exercise 10.18. Folland Problem 2.64 on p. 80. On the integrability of a b |x| |log |x|| for x near 0 and x near in Rn . Exercise 10.19. Show, using Problem 10.17 that i j d ( ) =
S d1
d3 (x3 )
X2
d2 (x2 )
X1
d1 (x1 )f (x1 , x2 , x3 ),
makes sense and is correct. 4. (Optional.) Also show the above identity holds for any one of the six possible orderings of the iterated integrals. Exercise 10.6. Prove the second assertion of Theorem 10.19. That is show md is the unique translation invariant measure on BRd such that md ((0, 1]d ) = 1. Hint: Look at the proof of Theorem 5.22. Exercise 10.7. (Part of Folland Problem 2.46 on p. 69.) Let X = [0, 1], M = B[0,1] be the Borel eld on X, m be Lebesgue measure on [0, 1] and be counting measure, (A) = #(A). Finally let D = {(x, x) X 2 : x X } be the diagonal in X 2 . Show 1D (x, y )d (y ) dm(x) =
X X X X
1 ij S d1 . d
Hint: show
S d1
S d1
2 i d ( ) =
1 d
d S d1 2 j d ( ) .
j =1
1D (x, y )dm(x) d (y )
11 Lp spaces
Let (, B , ) be a measure space and for 0 < p < and a measurable function f : C let
1/p
:=
|f | d
(11.1)
Denition 11.3. 1. {fn } is a.e. Cauchy if there is a set E B such that (E ) = 0 and{1E c fn } is a pointwise Cauchy sequences. 2. {fn } is Cauchy in measure (or L0 Cauchy) if limm,n (|fn fm | > ) = 0 for all > 0. 3. {fn } is Cauchy in Lp if limm,n fn fm p = 0. When is a probability measure, we describe, fn f as fn converging to f in probability. If a sequence {fn }n=1 is Lp convergent, then it is Lp Cauchy. For example, when p [1, ] and fn f in Lp , we have fn fm
p
(11.2)
fn f
+ f fm
0 as m, n .
Lp (, B , ) = {f : C : f is measurable and f
< }/
The case where p = 0 will be handled in Theorem 11.7 below. Lemma 11.4 (Lp convergence implies convergence in probability). Let p [1, ). If {fn } Lp is Lp convergent (Cauchy) then {fn } is also convergent (Cauchy) in measure. Proof. By Chebyshevs inequality (8.3), (|f | ) = (|f | p ) and therefore if {fn } is Lp Cauchy, then 1 fn fm p p 0 as m, n p showing {fn } is L0 Cauchy. A similar argument holds for the Lp convergent case. (|fn fm | )
p
where f g i f = g a.e. Notice that f g p = 0 i f g and if f g then f p = g p . In general we will (by abuse of notation) use f to denote both the function f and the equivalence class containing f. Remark 11.1. Suppose that f M, then for all a > M, (|f | > a) = 0 and therefore (|f | > M ) = limn (|f | > M + 1/n) = 0, i.e. |f ( )| M for a.e. . Conversely, if |f | M a.e. and a > M then (|f | > a) = 0 and hence f M. This leads to the identity: f
1 p
|f | d =
1 f p
p p
0 in L1 , fn 0.
108
11 Lp spaces
0 a.e., and
0 in L . or in
Theorem 11.5 (Egoro s Theorem: almost sure convergence implies convergence in probability). Suppose ( ) = 1 and fn f a.s. Then for all > 0 there exists E = E B such that (E ) < and fn f uniformly on E c . In particular fn f as n . Proof. Let fn f a.e. Then for all > 0, 0 = ({|fn f | > i.o. n}) = lim
N nN
(11.3)
{|fn f | > }
0 in L1 .
from which it follows that fn f as n . To get the uniform convergence o a small exceptional set, the equality in Eq. (11.3) allows us to choose an increasing sequence {Nk }k=1 , such that, if Ek :=
nNk
|fn f | >
1 k
k The set, E := = . k=1 Ek , then satises the estimate, (E ) < k 2 1 Moreover, for / E, we have |fn ( ) f ( )| k for all n Nk and all k. That is fn f uniformly on E c .
Page: 108
job: prob
macro: svmonob.cls
date/time: 23-Feb-2007/15:20
109
n < . Then
2. Suppose fn f, > 0 and m, n N and are such that |fn ( ) fm ( )| > . Then < |fn ( ) fm ( )| |fn ( ) f ( )| + |f ( ) fm ( )| from which it follows that either |fn ( ) f ( )| > /2 or |f ( ) fm ( )| > /2. Therefore we have shown,
k .
|am an | =
k=n
(ak+1 ak )
k=n
|ak+1 ak |
k=n
k := n .
(11.4)
{|fn fm | > } {|fn f | > /2} {|fm f | > /2} and hence (|fn fm | > ) (|fn f | > /2)+ (|fm f | > /2) 0 as m, n . 3. Suppose {fn } is L0 () Cauchy and let n > 0 such that (n = 2n would do) and set n = subsequence of N such that ({|gj +1 gj | > j }) j . Let FN := j N {|gj +1 gj | > j } and E := N =1 FN = {|gj +1 gj | > j i.o.} and observe that (FN ) N < . Since
k=n
So |am an | min(m,n) 0 as , m, n , i.e. {an } is Cauchy. Let m in (11.4) to nd |a an | n . Theorem 11.7. Let (, B , ) be a measure space and {fn }n=1 be a sequence of measurable functions on . 1. If f and g are measurable functions and fn f and fn g then f = g a.e. 2. If fn f then {fn }n=1 is Cauchy in measure. 3. If {fn }n=1 is Cauchy in measure, there exists a measurable function, f, and a subsequence gj = fnj of {fn } such that limj gj := f exists a.e. 4. If {fn }n=1 is Cauchy in measure and f is as in item 3. then fn f. 5. Let us now further assume that ( ) < . In this case, a sequence of func tions, {fn }n=1 converges to f in probability i every subsequence, {fn }n=1 of {fn }n=1 has a further subsequence, {fn }n=1 , which is almost surely convergent to f. Proof. 1. Suppose that f and g are measurable functions such that fn g and fn f as n and > 0 is given. Since {|f g | > } = {|f fn + fn g | > } {|f fn | + |fn g | > } {|f fn | > /2} {|g fn | > /2} , (|f g | > ) (|f fn | > /2) + (|g fn | > /2) 0 as n . Hence (|f g | > 0) = n=1 |f g | > i.e. f = g a.e. 1 n
n <
n=1
({|gj +1 gj | > j })
j =1 j =1
j < ,
For / E, |gj +1 ( ) gj ( )| j for a.a. j and so by Lemma 11.6, f ( ) := lim gj ( ) exists. For E we may dene f ( ) 0.
|f g | >
1 n
= 0, then
c FN = j N {|gj +1 gj | j } ,
|gj +1 ( ) gj ( )| j for all j N. Another application of Lemma 11.6 shows |f ( ) gj ( )| j for all j N, i.e.
Page: 109
job: prob
macro: svmonob.cls
date/time: 23-Feb-2007/15:20
110
11 Lp spaces
c FN j N { : |f ( ) gj ( )| j } .
Taking complements of this equation shows {|f gN | > N } j N {|f gj | > j } FN . and therefore, (|f gN | > N ) (FN ) N 0 as N and in particular, gN f as N . With this in hand, it is straightforward to show fn f. Indeed, since {|fn f | > } = {|f gj + gj fn | > } {|f gj | + |gj fn | > } {|f gj | > /2} {|gj fn | > /2}, we have ({|fn f | > }) ({|f gj | > /2}) + (|gj fn | > /2). Therefore, letting j in this inequality gives, ({|fn f | > }) lim sup (|gj fn | > /2) 0 as n
j
Proof. First notice that |f | g a.e. and hence f L1 since g L1 . To see that |f | g, use Theorem 11.7 to nd subsequences {fnk } and {gnk } of {fn } and {gn } respectively which are almost everywhere convergent. Then |f | = lim |fnk | lim gnk = g a.e.
k k
(11.5)
Using Theorem 11.7 again, we may assume (by passing to a further subsequences if necessary) that fnk f and gnk g almost everywhere. Noting, |f fnk | g + gnk 2g and (g + gnk ) 2g, an application of the dominated convergence Theorem 8.34 implies limk |f fnk | = 0 which contradicts Eq. (11.5). Exercise 11.1 (Fatous Lemma). Let (, B , ) be a measure space. If fn 0 and fn f in measure, then f d lim inf n fn d. Exercise 11.2. Let (, B , ) be a measure space, p [1, ), {fn } Lp () p p and f Lp () . Then fn f in Lp () i fn f and |fn | |f | . Solution to Exercise (11.2). By the triangle inequality, f fn
p
because {fn }n=1 was Cauchy in measure. 5. If {fn }n=1 is convergent and hence Cauchy in probability then any subse quence, {fn }n=1 is also Cauchy in probability. Hence by item 3. there is a further subsequence, {fn }n=1 of {fn }n=1 which is convergent almost surely. Conversely if {fn }n=1 does not converge to f in probability, then there exists an > 0 and a subsequence, {nk } such that inf k (|f fnk | ) > 0. Any subsequence of {fnk } would have the same property and hence can not be almost surely convergent because of Theorem 11.5.
fn
which shows
|fn |
|f | if fn f in L . Moreover Chebyschevs
inequality implies fn f if fn f in Lp . p p p For the converse, let Fn := |f fn | and Gn := 2p1 [|f | + |fn | ] . Then p Fn 0, Fn Gn L1 , and Gn G where G := 2p |f | L1 . Therefore, p by Corollary 11.8, |f fn | = Fn 0 = 0. Corollary 11.9. Suppose (, B , ) is a probability space, fn f and gn g and : R R and : R2 R are continuous functions. Then 1. (fn ) (f ) , 2. (fn , gn ) (f, g ) , 3. fn + gn f + g, and 4. fn gn f g. Proof. Item 1., 3. and 4. all follow from item 2. by taking (x, y ) = (x) , (x, y ) = x + y, and (x, y ) = x y respectively. So it suces to prove item 2. To do this we will make repeated use of Theorem 11.7.
Corollary 11.8 (Dominated Convergence Theorem). Let (, B , ) be a measure space. Suppose {fn } , {gn } , and g are in L1 and f L0 are functions such that |fn | gn a.e., fn f, gn g, and Then f L1 and limn f fn limn fn = f.
Page: 110
1
gn
g as n .
= 0, i.e. fn f in L1 . In particular
job: prob
macro: svmonob.cls
date/time: 23-Feb-2007/15:20
111
Given a subsequence, {nk } , of N there is a subsequence, {nk } of {nk } such that fnk f a.s. and yet a further subsequence {nk } of {nk } such that gnk g a.s. Hence, by the continuity of , it now follows that
k
f d
|f | d
|f | d.
n 1 i=1 pi
= (f, g ) a.s.
= 1,
ln si i
i=1
1 ln sp i e i = pi
i=1
i sp i . pi
(11.7)
1 Indeed, we have applied Eq. (11.6) with = {1, 2, . . . , n} , = i=1 p i and i pi f (i) := ln si . As a special case of Eq. (11.7), suppose that s, t, p, q (1, ) p 1 1 with q = p 1 (i.e. p + q = 1) then
st
1 p 1 q s + t . p q
(11.8)
f d
(f )d
where if f / L1 (), then f is integrable in the extended sense and ( f ) d = . Proof. Let t = f d (a, b) and let R ( = (t) when (t) exists), be such that (s) (t) (s t) for all s (a, b). (See Lemma 7.31) and Figure 7.2 when is C 1 and Theorem 11.38 below for the existence of such a in the general case.) Then integrating the inequality, (f ) (t) (f t), implies that 0
(When p = q = 1/2, the inequality in Eq. (11.8) follows from the inequality, 2 0 (s t) .) 1/n As another special case of Eq. (11.7), take pi = n and si = ai with ai > 0, then we get the arithmetic geometric mean inequality, n a1 . . . an 1 n
n
ai .
i=1
(11.9)
Theorem 11.12 (H olders inequality). Suppose that 1 p and q := p 1 , or equivalently p + q 1 = 1. If f and g are measurable functions then p1 fg
1
g q.
(11.10)
p
(f )d (t) =
(f )d (
f d).
Assuming p (1, ) and f p g q < , equality holds in Eq. (11.10) i |f | q and |g | are linearly dependent as elements of L1 which happens i |g |q f
p p
Moreover, if (f ) is not integrable, then (f ) (t) + (f t) which shows that negative part of (f ) is integrable. Therefore, (f )d = in this case. Example 11.11. Since ex for x R, ln x for x > 0, and xp for x 0 and p 1 are all convex functions, we have the following inequalities exp
= g
q q
|f | a.e.
(11.11)
f d
ef d, |f | d
(11.6)
Proof. The cases p = 1 and q = or p = and q = 1 are easy to deal with and will be left to the reader. So we now assume that p, q (1, ) . If f q = 0 or or g p = 0 or , Eq. (11.10) is again easily veried. So we will now assume that 0 < f q , g p < . Taking s = |f | / f p and t = |g |/ g q in Eq. (11.8) gives, p |f g | 1 |f | 1 |g |q + (11.12) f p g q p f p q g q with equality i |g/ g q | = |f | / f p p g q q |f | . Integrating Eq. (11.12) implies
macro: svmonob.cls
p1 (p1)
= |f |
p/q
/ f
p/q p ,
i.e. |g |q f
p p
and for p 1,
Page: 111 job: prob
date/time: 23-Feb-2007/15:20
112
11 Lp spaces
fg f
p
1 1 + =1 p q
with equality i Eq. (11.11) holds. The proof is nished since it is easily checked p q q p that equality holds in Eq. (11.10) when |f | = c |g | of |g | = c |f | for some constant c. Example 11.13. Suppose that ak C for k = 1, 2, . . . , n and p [1, ), then
n p n
Theorem 11.15 (Minkowskis Inequality). If 1 p and f, g Lp then f + g p f p + g p. (11.15) Proof. When p = , |f | f a.e. and |g | g |f | + |g | f + g a.e. and therefore f +g When p < , |f + g | (2 max (|f | , |g |)) = 2p max (|f | , |g | ) 2p (|f | + |g | ) , which implies1 f + g Lp since f +g
p p p p p p p p
a.e. so that |f + g |
+ g
ak
k=1
np1
k=1
|ak | .
(11.13)
Indeed, by H olders inequality applied using the measure space, {1, 2, . . . , n} equipped with counting measure, we have
n n n 1/p n 1/q n 1/p
ak =
k=1 k=1 p p1 .
ak 1
k=1
|ak |
p k=1
=n
1/q k=1
|ak |
2p
p p
+ g
p p
< .
where q =
Taking the pth power of this inequality then gives, Eq. (11.14).
|f + g |d
|f | d +
|g |d = f
+ g 1.
p
Theorem 11.14 (Generalized H olders inequality). Suppose that fi : C are measurable functions for i = 1, . . . , n and p1 , . . . , pn and r are positive n 1 numbers such that i=1 p = r1 , then i
n n
We now consider p (1, ) . We may assume f + g p , f p and g all positive since otherwise the theorem is easily veried. Integrating |f + g |p = |f + g ||f + g |p1 (|f | + |g |)|f + g |p1 and then applying Holders inequality with q = p/(p 1) gives |f + g |p d
are
fi
i=1 r
i=1
fi
pi
(11.14)
Proof. One may prove this theorem by induction based on H olders Theorem 11.12 above. Alternatively we may give a proof along the lines of the proof of Theorem 11.12 which is what we will do here. Since Eq. (11.14) is easily seen to hold if fi pi = 0 for some i, we will n ri = 1, hence we may assume that fi pi > 0 for all i. By assumption, i=1 p i replace si by sr and p by p /r for each i in Eq. (11.7) to nd i i i
n
|f | |f + g |p1 d +
p
|g | |f + g |p1 d
p1 q,
( f where |f + g |p1
q q
+ g p ) |f + g |
(11.16)
sr 1
. . . sr n
pi r
i=1
i (sr i) pi /r
p /r
=r
i=1
i sp i . pi
(|f + g |p1 )q d =
|f + g |p d = f + g p p.
(11.17)
f +g
p
p/q p
+ g
f +g
p/q p
(11.18)
fi
fi
pi i=1 r
r
i=1
1 1 i pi fi p pi
|fi | i d =
i=1
r = 1. pi
In light of Example 11.13, the last 2p in the above inequality may be replaced by 2p1 .
Page: 112
job: prob
macro: svmonob.cls
date/time: 23-Feb-2007/15:20
113
gj f
p p
k k
|gj gk |p d
+ gj
fn gj
+ gj f
See Proposition 12.5 for an important example of the use of this theorem.
= 0 for all n Nk .
Then (E ) = 0 and for x E , |f (x) fn (x)| k for all n Nk . This shows that fn f uniformly on E c . Conversely, if there exists E B such that (E ) = 0 and fn f uniformly on E c , then for any > 0, (|f fn | ) = ({|f fn | } E c ) = 0 for all n suciently large. That is to say lim sup f fn
j
The density of simple functions follows from the approximation Theorem 6.34. So the last item to prove is the completeness of L . Suppose m,n := fm fn 0 as m, n . Let Em,n = {|fn fm | > m,n } and E := Em,n , then (E ) = 0 and sup |fm (x) fn (x)| m,n 0 as m, n .
xE c
Therefore, f := limn fn exists on E c and the limit is uniform on E c . Letting f = limn 1E c fn , it then follows that limn fn f = 0. Theorem 11.17 (Completeness of L ()). For 1 p , L () equipped with the Lp norm, p (see Eq. (11.1)), is a Banach space. Proof. By Minkowskis Theorem 11.15, p satises the triangle inequality. As above the reader may easily check the remaining conditions that ensure p is a norm. So we are left to prove the completeness of Lp () for 1 p < , the case p = being done in Theorem 11.16. Let {fn }n=1 Lp () be a Cauchy sequence. By Chebyshevs inequality (Lemma 11.4), {fn } is L0 -Cauchy (i.e. Cauchy in measure) and by Theorem 11.7 there exists a subsequence {gj } of {fn } such that gj f a.e. By Fatous Lemma,
Page: 113 job: prob
p p
= f 1
= ( )1/a f
= ( )( p q ) f
q.
The reader may easily check this nal formula is correct even when q = provided we interpret 1/p 1/ to be 1/p.
macro: svmonob.cls date/time: 23-Feb-2007/15:20
114
11 Lp spaces
1 Since n we have
1/p
= M. For p = q < 0,
lim
= lim
ap i
i=1
Conclusion. If we extend the denition of a p to p = and p = p a (0, ) is a by a = maxi ai and a = mini ai , then R p continuous non-decreasing function of p. Proposition 11.20. Suppose that 0 < p0 < p1 , (0, 1) and p (p0 , p1 ) be dened by 1 1 = + (11.19) p p0 p1 with the interpretation that /p1 = 0 if p1 = .2 Then Lp Lp0 + Lp1 , i.e. every function f Lp may be written as f = g + h with g Lp0 and h Lp1 . For 1 p0 < p1 and f Lp0 + Lp1 let f := inf g
p0
1/q
1/q = 1 a
1 q
:=
1 := (1/a1 , . . . , 1/an ) . So for p < 0, as p increases, q = p decreases, so where a 1 1 1 that a q is decreasing and hence a is increasing. Hence we have shown q that p a p is increasing for p R \ {0} . We now claim that limp0 a p = n a1 . . . an . To prove this, write ap i = 2 p ln ai = 1 + p ln ai + O p for p near zero. Therefore, e
+ h
p1
:f =g+h .
ap i =1+p
i=1
1 n
Then (Lp0 + Lp1 , ) is a Banach space and the inclusion map from Lp to Lp0 + Lp1 is bounded; in fact f 2 f p for all f Lp . Proof. Let M > 0, then the local singularities of f are contained in the set E := {|f | > M } and the behavior of f at innity is solely determined by f on E c . Hence let g = f 1E and h = f 1E c so that f = g + h. By our earlier discussion we expect that g Lp0 and h Lp1 and this is the case since,
ln ai + O p2 .
i=1
p0
p0
Pn
1 n
1/p
ap i
i=1 ln ai
= lim n n
p0
1 1+p n
1/p
ln ai + O p
i=1
p0 p0
|f |
p0
1|f |>M = M p0 f M
p
f M
p0
1|f |>M
p p
i=1
a1 . . . an . and h
p1 p1
M p0
1|f |>M M p0 p f
<
So if we now dene a 0 := a1 . . . an , the map p R a p (0, ) is continuous and increasing in p. We will now show that limp a p = maxi ai =: M and limp a p = mini ai =: m. Indeed, for p > 0, 1 1 p M n n and therefore, 1 n
Page: 114
1/p n
= f 1|f |M M p1
p1 p1
|f |
p1
1|f |M = M p1
p p
f M < .
p1
1|f |M
ap i
i=1
p
2
f M
1|f |M M p1 p f
M a
M.
job: prob
macro: svmonob.cls
date/time: 23-Feb-2007/15:20
115
+ M 1p /p1 f
p /p1 p
Corollary 11.23. Suppose now that is counting measure on . Then Lp () Lq () for all 0 < p < q and f q f p . Proof. Suppose that 0 < p < q = , then f
p p p p p
then gives f
1p /p0
|f (x)| = f
1p /p1
and then taking = 1 shows f 2 f p . The proof that (Lp0 + Lp1 , ) is a Banach space is left as Exercise 11.6 to the reader. Corollary 11.21 (Interpolation of L norms). Suppose that 0 < p0 < p1 , (0, 1) and p (p0 , p1 ) be dened as in Eq. (11.19), then Lp0 Lp1 Lp and 1 (11.20) f p f p0 f p1 . Further assume 1 p0 < p < p1 , and for f Lp0 Lp1 let f := f
p0 p
i.e. f f p for all 0 < p < . For 0 < p q , apply Corollary 11.21 with p0 = p and p1 = to nd f
q
p/q p
1p/q
p/q p
1p/q p
= f
11.4.1 Summary: Lp0 Lp1 Lq Lp0 + Lp1 for any q (p0 , p1 ). If p q, then p q and f q f p . p Since (|f | > ) p f p , Lp convergence implies L0 convergence. L0 convergence implies almost everywhere convergence for some subsequence. 5. If ( ) < then almost everywhere convergence implies uniform convergence o certain sets of small measure and in particular we have L0 convergence. 6. If ( ) < , then Lq Lp for all p q and Lq convergence implies Lp convergence. 1. 2. 3. 4.
+ f
p1
Then (Lp0 Lp1 , ) is a Banach space and the inclusion map of Lp0 Lp1 into Lp is bounded, in fact f
p
max 1 , (1 )1
p0
+ f
p1
(11.21)
The heuristic explanation of this corollary is that if f Lp0 Lp1 , then f has local singularities no worse than an Lp1 function and behavior at innity no worse than an Lp0 function. Hence f Lp for any p between p0 and p1 . Proof. Let be determined as above, a = p0 / and b = p1 /(1 ), then by Theorem 11.14, f
p
= |f | |f |
1 p
|f |
|f |
1 b
= f
p0
1 p1
It is easily checked that is a norm on Lp0 Lp1 . To show this space is complete, suppose that {fn } Lp0 Lp1 is a Cauchy sequence. Then {fn } is both Lp0 and Lp1 Cauchy. Hence there exist f Lp0 and g Lp1 such that limn f fn p0 = 0 and limn g fn p = 0. By Chebyshevs inequality (Lemma 11.4) fn f and fn g in measure and therefore by Theorem 11.7, f = g a.e. It now is clear that limn f fn = 0. The estimate in Eq. (11.21) is left as Exercise 11.5 to the reader. Remark 11.22. Combining Proposition 11.20 and Corollary 11.21 gives L
p0
f d.
f d.
p1
p0
+L
p1
116
11 Lp spaces
(11.22)
i=1
|ci | (E Bi ) + g
i=1
|ci | (E ) + /2.
n i=1
The condition in Eq. (11.22) implies supf f 1 < .3 Indeed, choose a suciently large so that supf (|f | : |f | a) 1, then for f f
1
|ci |)
Proposition 11.29. A subset L1 () is uniformly integrable i L1 () is bounded is uniformly absolutely continuous. Proof. ( = ) We have already seen that uniformly integrable subsets, , are bounded in L1 () . Moreover, for f , and E B , (|f | : E ) = (|f | : |f | M, E ) + (|f | : |f | < M, E ) sup (|f | : |f | M ) + M (E ).
n
Let us also note that if = {f } with f L1 () , then is uniformly integrable. Indeed, lima (|f | : |f | a) = 0 by the dominated convergence theorem. Denition 11.26. A collection of functions, L1 () is said to be uniformly absolutely continuous if for all > 0 there exists > 0 such that sup (|f | : E ) < whenever (E ) < .
f
(11.23)
Remark 11.27. It is not in general true that if {fn } L1 () is uniformly absolutely continuous implies supn fn 1 < . For example take = {} and ({}) = 1. Let fn () = n. Since for < 1 a set E such that (E ) < is in fact the empty set and hence {fn }n=1 is uniformly absolutely continuous. However, for nite measure spaces without atoms, for every > 0 we may k nd a nite partition of by sets {E } =1 with (E ) < . If Eq. (11.23) holds with = 1, then
k
So given > 0 choose M so large that supf (|f | : |f | M ) < /2 and then take = 2M to verify that is uniformly absolutely continuous. (=) Let K := supf f 1 < . Then for f , we have (|f | a) f
1
Hence given > 0 and > 0 as in the denition of uniform absolute continuity, we may choose a = K/ in which case sup (|f | : |f | a) < .
f
(|fn |) =
=1
(|fn | : E ) k
showing that (|fn |) k for all n. Lemma 11.28 (This lemma may be skipped.). For any g L1 (), = {g } is uniformly absolutely continuous. Proof. First Proof. If the Lemma is false, there would exist > 0 and sets En such that (En ) 0 while (|g | : En ) for all n. Since |1En g | |g | L1 and for any > 0, (1En |g | > ) (En ) 0 as n , the dominated convergence theorem of Corollary 11.8 implies limn (|g | : En ) = 0. This contradicts (|g | : En ) for all n and the proof is complete. n Second Proof. Let = i=1 ci 1Bi be a simple function such that g 1 < /2. Then
3
Since > 0 was arbitrary, it follows that lima supf (|f | : |f | a) = 0 as desired. Corollary 11.30. Suppose {fn }n=1 and {gn }n=1 are two uniformly integrable sequences, then {fn + gn }n=1 is also uniformly integrable. Proof. By Proposition 11.29, {fn }n=1 and {gn }n=1 are both bounded in L1 () and are both uniformly absolutely continuous. Since fn + gn 1 fn 1 + gn 1 it follows that {fn + gn }n=1 is bounded in L1 () as well. Moreover, for > 0 we may choose > 0 such that (|fn | : E ) < and (|gn | : E ) < whenever (E ) < . For this choice of and , we then have (|fn + gn | : E ) (|fn | + |gn | : E ) < 2 whenever (E ) < , showing {fn + gn }n=1 uniformly absolutely continuous. Another application of Proposition 11.29 completes the proof.
This is not necessarily the case if ( ) = . Indeed, if = R and = m is Lebesgue measure, the sequences of functions, fn := 1[n,n] n=1 are uniformly integrable but not bounded in L1 (m) .
Page: 116
job: prob
macro: svmonob.cls
date/time: 23-Feb-2007/15:20
117
Exercise 11.3 (Problem 5 on p. 196 of Resnick.). Suppose that n is a sequence of integrable and i.i.d random variables. Then S n n=1 is uniformly integrable. Theorem 11.31 (Vitali Convergence Theorem). Let (, B , ) be a nite measure space, := {fn }n=1 be a sequence of functions in L1 () , and f : C be a measurable function. Then f L1 () and f fn 1 0 as n i fn f in measure and is uniformly integrable. Proof. (=) If fn f in measure and = is uniformly integrable then we know M := supn fn 1 < . Hence and application of Fatous lemma, see Exercise 11.1, |f | d lim inf
n {fn }n=1
{Xn }n=1
(11.25) where gN = |f | + |fn | L . Given > 0 x N large so that N < /2 and then choose > 0 (by Lemma 11.28) such that (gN : E ) < if (E ) < . It then follows from Eq. (11.25) that sup (|fn | : E ) < /2 + /2 = when (E ) < .
n
|fn | d M < ,
Example 11.32. Let = [0, 1] , B = B[0,1] and P = m be Lebesgue measure on B . Then the collection of functions, f (x) := 2 (1 x/) 0 for (0, 1) is bounded in L1 (P ) , f 0 a.e. as 0 but 0=
0
i.e. f L1 (). One now easily checks that 0 := {f fn }n=1 is bounded in L1 () and (using Lemma 11.28 and Proposition 11.29) 0 is uniformly absolutely continuous and hence 0 is uniformly integrable. Therefore, f fn
1
lim f dP = lim
0
f dP = 1.
This is a typical example of a bounded and pointwise convergent sequence in L1 which is not uniformly integrable. Example 11.33. Let = [0, 1] , P be Lebesgue measure on B = B[0,1] , and for (0, 1) let a > 0 with lim0 a = and let f := a 1[0,] . Then Ef = a and so sup>0 f 1 =: K < i a K for all . Since sup E [f : f M ] = sup [a 1a M ] ,
1|f fn |<a |f fn | d
(11.24)
Since 1|f fn |<a |f fn | a L1 () and 1|f fn |<a |f fn | > (|f fn | > ) 0 as n , we may pass to the limit in Eq. (11.24), with the aid of the dominated convergence theorem (see Corollary 11.8), to nd lim sup f fn
n 1
if {f } is uniformly integrable and > 0 is given, for large M we have a for small enough so that a M. From this we conclude that lim sup0 (a ) and since > 0 was arbitrary, lim0 a = 0 if {f } is uniformly integrable. By reversing these steps one sees the converse is also true. Alternatively. No matter how a > 0 is chosen, lim0 f = 0 a.s.. So from Theorem 11.31, if {f } is uniformly integrable we would have to have lim (a ) = lim Ef = E0 = 0.
0 0
(a) 0 as a .
( = ) If fn f in L1 () , then by Chebyschevs inequality it follows that fn f in measure. Since convergent sequences are bounded, to show is uniformly integrable it suces to shows is uniformly absolutely continuous. Now for E B and n N, (|fn | : E ) (|f fn | : E ) + (|f | : E ) f fn Let N := supn>N f fn
Page: 117
1 1
Corollary 11.34. Let (, B , ) be a nite measure space, p [1, ), {fn }n=1 be a sequence of functions in Lp () , and f : C be a measurable function. Then f Lp () and f fn p 0 as n i fn f in measure and p := {|fn | }n=1 is uniformly integrable. Proof. ( = ) Suppose that fn f in measure and := {|fn | }n=1 p p is uniformly integrable. By Corollary 11.9, |fn | |f | in measure, and p p p p hn := |f fn | 0, and by Theorem 11.31, |f | L1 () and |fn | |f | in 1 L () . Since
macro: svmonob.cls date/time: 23-Feb-2007/15:20
p
+ (|f | : E ).
, then N 0 as N and
job: prob
118
11 Lp spaces
hn := |f fn | (|f | + |fn |) 2p1 (|f | + |fn | ) =: gn L1 () with gn g := 2p1 |f | in L1 () , the dominated convergence theorem in Corollary 11.8, implies f fn
p p p
Proof. 1. Let be as in item 1. above and set a := supxa a by assumption. Then for f (|f | : |f | a) = |f | (|f |) : |f | a (|f |)
x (x)
0 as
( (|f |) : |f | a)a
|f fn | d =
hn d 0 as n . and hence
(=) Suppose f Lp and fn f in Lp . Again fn f in measure by Lemma 11.4. Let hn := ||fn |p |f |p | |fn |p + |f |p =: gn L1 and g := 2|f |p L1 . Then gn g, hn 0 and gn d gd. Therefore by the dominated convergence theorem in Corollary 11.8, lim hn d = 0,
n
a f
(n + 1) an <
n=0
i.e. |fn | |f | in L () . Hence it follows from Theorem 11.31 that is uniformly integrable. The following Lemma gives a concrete necessary and sucient conditions for verifying a sequence of functions is uniformly integrable. Lemma 11.35. Suppose that ( ) < , and L ( ) is a collection of functions. 1. If there exists a non decreasing function : R+ R+ such that limx (x)/x = and K := sup ((|f |)) <
f 0
(x) =
n=0
i.e. (x) =
0
(y )dy =
n=0
(n + 1) (x an+1 x an ) .
(11.26)
By construction is continuous, (0) = 0, (x) is increasing (so is convex) and (x) (n + 1) for x an . In particular (x) (an ) + (n + 1)x n + 1 for x an x x from which we conclude limx (x)/x = . We also have (x) (n + 1) on [0, an+1 ] and therefore (x) (n + 1)x for x an+1 . So for f ,
then is uniformly integrable. 2. Conversely if is uniformly integrable, there exists a non-decreasing continuous function : R+ R+ such that (0) = 0, limx (x)/x = and Eq. (11.26) is valid.
4
Here is an alternative proof. By the mean value theorem, ||f |p |fn |p | p(max(|f | , |fn |))p1 ||f | |fn || p(|f | + |fn |)p1 ||f | |fn || and therefore by H olders inequality,
||f |p |fn |p | d p
((|f |)) =
(|f | + |fn |)p1 |f fn | d
p/q p
p f fn p( f
p
q p
= p |f | + |fn |
f fn
+ fn
f fn
||f |p |fn |p | d 0 as n .
n=0
(n + 1) |f | 1|f |an
n=0
(n + 1) an
Page: 118
job: prob
macro: svmonob.cls
date/time: 23-Feb-2007/15:20
119
(n + 1) an < .
n=0
11.6 Exercises
Exercise 11.4. Let f Lp L for some p < . Show f = limq f q . If we further assume (X ) < , show f = limq f q for all measurable functions f : X C. In particular, f L i limq f q < . Hints: Use Corollary 11.21 to show lim supq f q f and to show lim inf q f q f , let M < f and make use of Chebyshevs inequality. Exercise 11.5. Prove Eq. (11.21) in Corollary 11.21. (Part of Folland 6.3 on p. 186.) Hint: Use the inequality, with a, b 1 with a1 + b1 = 1 chosen appropriately, sa tb st + a b applied to the right side of Eq. (11.20). Exercise 11.6. Complete the proof of Proposition 11.20 by showing (Lp + Lr , ) is a Banach space.
Fig. 11.1. A convex function with three cords. Notice the slope relationships; m1 m3 m2 .
1. F (x, y ) is increasing in each of its arguments. 2. The following limits exist, + (x) := F (x, x+) := lim F (x, y ) < and
y x
(11.27) (11.28)
3. The functions, are both increasing functions and further satisfy, < (x) + (x) (y ) < a < x < y < b. 4. For any t (x) , + (x) , (y ) (x) + t (y x) for all x, y (a, b) . 5. For a < < < b, let K := max + () , ( ) . Then (11.30) (11.29)
| (y ) (x)| K |y x| for all x, y [, ] . That is is Lipschitz continuous on [, ] . 6. The function + is right continuous and is left continuous. 7. The set of discontinuity points for + and for are the same as the set of points of non-dierentiability of . Moreover this set is at most countable.
macro: svmonob.cls date/time: 23-Feb-2007/15:20
The same formula would dene F (x, y ) for x = y. However, since F (x, y ) = F (y, x) , we would gain no new information by this extension.
Page: 119
job: prob
120
11 Lp spaces
Proof. 1. and 2. If we let ht = t(x1 ) + (1 t)(x0 ), then (xt , ht ) is on the line segment joining (x0 , (x0 )) to (x1 , (x1 )) and the statement that is convex is then equivalent of (xt ) ht for all 0 t 1. Since (x1 ) (x0 ) (x1 ) ht ht (x0 ) = = , xt x0 x1 x0 x1 xt the convexity of is equivalent to ht (x0 ) (x1 ) (x0 ) (xt ) (x0 ) = for all x0 xt x1 xt x0 xt x0 x1 x0 and to (x1 ) ht (x1 ) (xt ) (x1 ) (x0 ) = for all x0 xt x1 x1 x0 x1 xt x1 xt and convexity also implies (xt ) (x0 ) ht (x0 ) (x1 ) ht (x1 ) (xt ) = = . xt x0 xt x0 x1 xt x1 xt These inequalities may be written more compactly as, (v ) (u) (w) (u) (w) (v ) , vu wu wv (11.31)
(x) (y ) xy
(y ) (x) t (x y ) = (x) + t (y x) for y x. Hence we have proved Eq. (11.30) for all x, y (a, b) . 5. For a < x < y < b, we have + () + (x) = F (x, x+) F (x, y ) F (y , y ) = (y ) ( ) (11.32) and in particular, K + () (y ) (x) ( ) K. yx
valid for all a < u < v < w < b, again see Figure 11.1. The rst (second) inequality in Eq. (11.31) shows F (x, y ) is increasing y (x). This then implies the limits in item 2. are monotone and hence exist as claimed. 3. Let a < x < y < b. Using the increasing nature of F, < (x) = F (x, x) F (x, x+) = + (x) < and + (x) = F (x, x+) F (y , y ) = (y ) as desired. 4. Let t (x) , + (x) . Then t + (x) = F (x, x+) F (x, y ) = or equivalently, (y ) (x) + t (y x) for y x. Therefore Eq. (11.30) holds for y x. Similarly, for y < x, (y ) (x) yx
This last inequality implies, | (y ) (x)| K (y x) which is the desired Lipschitz bound. 6. For a < c < x < y < b, we have + (x) = F (x, x+) F (x, y ) and letting x c (using the continuity of F ) we learn + (c+) F (c, y ) . We may now let y c to conclude + (c+) + (c) . Since + (c) + (c+) , it follows that + (c) = + (c+) and hence that + is right continuous. Similarly, for a < x < y < c < b, we have (y ) F (x, y ) and letting y c (using the continuity of F ) we learn (c) F (x, c) . Now let x c to conclude (c) (c) . Since (c) (c) , it follows that (c) = (c) , i.e. is left continuous. 7. Since are increasing functions, they have at most countably many points of discontinuity. Letting x y in Eq. (11.29), using the left continuity of , shows (y ) = + (y ) . Hence if is continuous at y, (y ) = (y +) = + (y ) and is dierentiable at y. Conversely if is dierentiable at y, then + (y ) = (y ) = (y ) = + (y ) which shows + is continuous at y. Thus we have shown that set of discontinuity points of + is the same as the set of points of non-dierentiability of . That the discontinuity set of is the same as the non-dierentiability set of is proved similarly. Corollary 11.39. If : (a, b) R is a convex function and D (a, b) is a dense set, then (y ) = sup [ (x) + (x) (y x)] for all x, y (a, b) .
xD
Page: 120
job: prob
macro: svmonob.cls
date/time: 23-Feb-2007/15:20
Proof. Let (y ) := supxD [ (x) + (x) (y x)] . According to Eq. (11.30) above, we know that (y ) (y ) for all y (a, b) . Now suppose that x (a, b) and xn with xn x. Then passing to the limit in the estimate, (y ) (xn ) + (xn ) (y xn ) , shows (y ) (x) + (x) (y x) . Since x (a, b) is arbitrary we may take x = y to discover (y ) (y ) and hence (y ) = (y ) . The proof that (y ) = + (y ) is similar.
Part III
Convergence Results
Exercise 12.1 (A correlation inequality). Suppose that X is a random variable and f, g : R R are two increasing functions such that both f (X ) and g (X ) are square integrable. Show Cov (f (X ) , g (X )) 0. Hint: let Y be another random variable which has the same law as X and is independent of X. Then consider E [(f (Y ) f (X )) (g (Y ) g (X ))] .
(EX )
(12.1)
We say that X and Y are uncorrelated if Cov (X, Y ) = 0, i.e. E [XY ] = n EX EY. More generally we say {Xk }k=1 L2 (P ) are uncorrelated i Cov (Xi , Xj ) = 0 for all i = j. Notice that if X and Y are independent random variables, then f (X ) , g (Y ) are independent and hence uncorrelated for any choice of Borel measurable functions, f, g : R R such that f (X ) and g (X ) are square integrable. It also follows from Eq. (12.1) that Var (X ) E X 2 for all X L2 (P ) . (12.2)
Theorem 12.3 (An L2 Weak Law of Large Numbers). Let {Xn }n=1 be a sequence of uncorrelated square integrable random variables, n = EXn and 2 = Var (Xn ) . If there exists an increasing positive sequence, {an } and R n such that 1 an 1 a2 n then
Sn an n
j as n and
j =1 n 2 j 0 as n , j =1
Lemma 12.2. The covariance function, Cov (X, Y ) is bilinear in X and Y and Cov (X, Y ) = 0 if either X or Y is constant. For any constant k, Var (X + k ) = n Var (X ) and Var (kX ) = k 2 Var (X ) . If {Xk }k=1 are uncorrelated L2 (P ) random variables, then
n
j and
n
2 j = Var (Sn ) =
Var (Xj ) =
j =1 j =1
2 j .
Var (Sn ) =
k=1
Proof. We leave most of this simple proof to the reader. As an example of the type of argument involved, let us prove Var (X + k ) = Var (X ) ; Var (X + k ) = Cov (X + k, X + k ) = Cov (X + k, X ) + Cov (X + k, k ) = Cov (X + k, X ) = Cov (X, X ) + Cov (k, X ) = Cov (X, X ) = Var (X ) .
1 an j
j
j =1 2
an
1 a2 n
n 2 j 0. j =1
126
Sn an
=
L2 (P )
Sn
n j =1
an Sn
n j =1
n j =1
an +
L2 (P ) n j =1
Theorem 12.6 (Khintchins WLLN). If {Xn }n=1 are i.i.d. L1 (P ) random variables, then Proof. Letting 0. Sn :=
i=1 n P 1 n Sn
= EX1 .
j
L2 (P )
an
an
Xi 1|Xi |n ,
Example 12.4. Suppose that {Xk }k=1 L2 (P ) are uncorrelated identically distributed random variables. Then Sn n
L2 (P )
we have {Sn = Sn } n i=1 {|Xi | > n} . Therefore, using Chebyschevs inequality along with the dominated convergence theorem, we have
n
= EX1 as n .
P (Sn = Sn )
i=1
To see this, simply apply Theorem 12.3 with an = n. Proposition 12.5 (L2 - Convergence of Random Sums). Suppose that {Xk }k=1 L2 (P ) are uncorrelated. If k=1 Var (Xk ) < then
S Sn n > n n
P (Sn = Sn ) 0 as n ,
Sn P n
(Xk k ) converges in L2 (P ) .
k=1
Sn P n
0. So it suces to prove
.
2 Sn L (P ) n
where k := EXk . Proof. Letting Sn := k=1 (Xk k ) , it suces by the completeness of L (P ) (see Theorem 11.17) to show Sn Sm 2 0 as m, n . Supposing n > m, we have
2 n 2 n
We will now complete the proof by showing that, in fact, this end, let n := 1 1 ESn = n n
n
. To
E Xi 1|Xi |n = E X1 1|X1 |n
i=1
Sn
2 Sm 2
=E
k=m+1 n
(Xk k )
n 2 k 0 as m, n . k=m+1
= Var = = 1 n2
n
=
k=m+1
Var (Xk ) =
Sn n
1 Var (Sn ) n2
Var Xi 1|Xi |n
i=1
Note well: since L2 (P ) convergence implies Lp (P ) convergence for 0 p 2, where by L0 (P ) convergence we mean convergence in probability. The remainder of this chapter is mostly devoted to proving a.s. convergence for the quantities in Theorem 11.17 and Proposition 12.5 under various assumptions. These results will be described in the next section.
L2 (P )
Sn n n
L2 (P )
127
Theorem 12.7 (Kolmogorovs Strong Law of Large Numbers). Suppose that {Xn }n=1 are i.i.d. random variables and let Sn := X1 + + Xn . Then 1 there exists R such that n Sn a.s. i Xn is integrable and in which case EXn = .
1 Remark 12.8. If E |X1 | = but EX1 < , then n Sn a.s. To prove this, n M M M for M > 0 let Xn := Xn M and Sn := i=1 Xi . It follows from Theorem 1 M M M Sn M := EX1 a.s.. Since Sn Sn , we may conclude that 12.7 that n
Theorem 12.11 (Kolmogorovs Convergence Criteria). Suppose that {Yn }n=1 are independent square integrable random variables. If j =1 Var (Yj ) < , then j =1 (Yj EYj ) converges a.s. Proof. One way to prove this is to appeal Proposition 12.5 above and L evys Theorem 12.31 below. As second method is to make use of Kolmogorovs inequality. We will give this second proof below. The next theorem generalizes the previous theorem by giving necessary and sucient conditions for a random series of independent random variables to converge. Theorem 12.12 (Kolmogorovs Three Series Theorem). Suppose that {Xn }n=1 are independent random variables. Then the random series, j =1 Xj , is almost surely convergent i there exists c > 0 such that 1. 2. 3.
n=1 n=1 n=1
lim inf
n
One proof of Theorem 12.7 is based on the study of random series. Theorem 12.11 and 12.12 are standard convergence criteria for random series. Denition 12.9. Two sequences, {Xn } and {Xn } , of random variables are tail equivalent if
Moreover, if the three series above converge for some c > 0 then they converge for all values of c > 0. Proof. Proof of suciency. Suppose the three series converge for some c > 0. If we let Xn := Xn 1|Xn |c , then
E
n=1
1Xn =Xn =
n=1
P (Xn = Xn ) < .
Proposition 12.10. Suppose {Xn } and {Xn } are tail equivalent. Then 1. (Xn Xn ) converges a.s. 2. The sum Xn is convergent a.s. i the sum generally we have P Xn is convergent Xn is convergent a.s. More
P (Xn = Xn ) =
n=1 n=1
Xn is convergent
=1
Hence {Xn } and {Xn } are tail equivalent and so it suces to show n=1 Xn is almost surely convergent. However, by the convergence of the second series we learn
3. If there exists a random variable, X , and a sequence an such that 1 n an lim then 1 n an lim
n
Var (Xn ) =
n=1 n=1
Xk = X a.s
k=1
Xk = X a.s
k=1
Proof. If {Xn } and {Xn } are tail equivalent, we know; for a.e. , Xn ( ) = Xn ( ) for a.a n. The proposition is an easy consequence of this observation.
Finally, the third series guarantees that n=1 EXn = n=1 E Xn 1|Xn |c is convergent, therefore we may conclude n=1 Xn is convergent. The proof of the reverse direction will be given in Section 12.8 below.
Page: 127
job: prob
macro: svmonob.cls
date/time: 23-Feb-2007/15:20
128
12.2 Examples
12.2.1 Random Series Examples Example 12.13 (Kolmogorovs Convergence Criteria Example). Suppose that {Yn }n=1 are independent square integrable random variables, such that j =1 Var (Yj ) < and j =1 EYj converges a.s., then j =1 Yj converges a.s.. Denition 12.14. A random variable, Y, is normal with mean standard deviation 2 i P (Y B ) = 1 2 2
B d
>
n=1
P (|n N + n | > c) =
1 2 n=1
e 2 x dx
Bn
(12.4)
where Bn = (,
c + n ) n
c n , . n
(12.3)
If limn n = 0 then there is a c > 0 such that either n c i.o. or n c i.o. In the rst case in which case (0, ) Bn and in the second (, 0) 1 2 e 2 x dx 1/2 i.o. which would Bn and in either case we will have 1 2 Bn contradict Eq. (12.4). Hence we may concluded that limn n = 0. Similarly if limn n = 0, then we may conclude that Bn contains a set of the form [, ) i.o. for some < and so 1 2
1 2 1 e 2 x dx 2 Bn
We will abbreviate this by writing Y = N , 2 . When = 0 and 2 = 1 we will simply write N for N (0, 1) and if Y = N, we will say Y is a standard normal random variable. Observe that Eq. (12.3) is equivalent to writing E [f (Y )] = 1 2 2
R d d
e 2 x dx i.o.
which would again contradict Eq. (12.4). Therefore we may conclude that limn n = limn n = 0. 2. The convergence of the second series for all c > 0 implies
f (y ) e 22 (y) dy >
Var Yn 1|Yn |c =
n=1 n=1
for all bounded measurable functions, f : R R. Also observe that Y = d N , 2 is equivalent to Y = N +. Indeed, by making the change of variable, y = x + , we nd 1 E [f (N + )] = 2 1 = 2 f (x + ) e 2 x dx
R
1 2
>
n=1
n=1
2 n n .
f (y ) e 22 (y)
R
dy 1 = 2 2
j =1
f (y ) e 22 (y) dy.
R
where n := Var N 1|n N +n |c . As the reader should check, n 1 as 2 n and therefore we may conclude n=1 n < . It now follows by Kol mogorovs convergence criteria that n=1 (Yn n ) is almost surely convergent and therefore
Lemma 12.15. Suppose that {Yn }n=1 are independent square integrable ran2 dom variables such that Yn = N n , n . Then 2 converges. < and j j =1 j j =1 d
n =
n=1 n=1
Yn
n=1
(Yn n )
Yj converges a.s. i
Proof. The implication = is true without the assumption that the Yn are normal random variables as pointed out in Example 12.13. To prove the converse directions we will make use of the Kolmogorovs three series theo rem. Namely, if j =1 Yj converges a.s. then the three series in Theorem 12.12 converge for all c > 0. d 1. Since Yn = n N + n , we have for any c > 0 that
converges as well. Alternatively: we may also deduce the convergence of third series as well. Indeed, for all c > 0 implies
n=1
n by the
E [n N + n ] 1|n N +n |c
n=1
is convergent, i.e.
[n n + n n ] is convergent.
n=1
Page: 128
job: prob
macro: svmonob.cls
date/time: 23-Feb-2007/15:20
12.2 Examples
129
where n := E N 1|n N +n |c and n := E 1|n N +n |c . With a little eort one can show, n ek/n and 1 n ek/n for large n.
2 Since ek/n Cn for large n, it follows that so that n=1 n n is convergent. Moreover,
2 2 2
1 0
d sin dt
k t 2
dt = =
k2 2 22
cos
0
k t 2
dt
1
k 2 2 2 k 1 t + sin kt 22 k 4 4
=
0
k2 2 . 23
n=1
|n n | C
n=1
3 n <
|n (n 1)| C
n=1 n=1
2 |n | n <
Fact: Wiener in 1923 showed the series in Eq. (12.5) is in fact almost surely uniformly convergent. Given this, the process, t Bt is almost surely continuous. The process {Bt : 0 t 1} is Brownian Motion. Example 12.17. As a simple application of Theorem 12.12, we will now use Theorem 12.12 to give a proof of Theorem 12.11. We will apply Theorem 12.12 with Xn := Yn EYn . We need to then check the three series in the statement of Theorem 12.12 converge. For the rst series we have by the Markov inequality,
and hence
n =
n=1 n=1
n n
n=1
n (n 1)
must also be convergent. Example 12.16 (Brownian Motion). Let dom variable, i.e. P (Nn A) =
A {Nn }n=1
1 1 2 E |Xn | = 2 2 c c n=1
2 1 ex /2 dx for all A BR . 2
Var Xn 1|Xn |c
n=1 n=1
Xn 1|Xn |c
n=1
2 Xn
=
n=1
E Xn 1|Xn |c
n=1
E |Xn | 1|Xn |c
n=1
provided n=1 a2 n < . This is a simple consequence of Kolmogorovs convergence criteria, Theorem 12.11, and the facts that E [an Nn sin n t] = 0 and
2 2 Var (an Nn sin n t) = a2 n sin n t an .
12.2.2 A WLLN Example Let {Xn }n=1 be i.i.d. random variables with common distribution function, F (x) := P (Xn x) . For x R let Fn (x) be the empirical distribution function dened by, n n 1 1 Fn (x) := 1X x = X ((, x]) . n j =1 j n j =1 j Since E1Xj x = F (x) and 1Xj x
j =1
As a special case, if we take n = (2n 1) 2 and an = that 2 2 Nk Bt := sin k t k 2 is a.s. convergent for all t R.
1 0 k=1,3,5,... The factor 2k2
2 (2n1) ,
d 2 2 sin (kt) dt k
dt = 1
weak law of large numbers implies Fn (x) F (x) as n . As usual, for p (0, 1) let F (p) := inf {x : F (x) p} and recall that F (p) x i F (x) p. Let us notice that
as seen by,
Page: 129 job: prob
macro: svmonob.cls
date/time: 23-Feb-2007/15:20
130
1Xj x np
j =1
and hence,
P (F (p) Fn (p) ) = P (Fn (F (p) ) F (F (p) ) p F (F (p) )) = P (Fn (F (p) ) F (F (p) ) ) 0 as n .
= inf {x : # {j n : Xj x} np} . The order statistic of ( n) ( n) ( n) X1 , X 2 , . . . , Xn , where (X1 , . . . , Xn ) is the nite sequence, ( n) ( n) ( n) X1 , X 2 , . . . , Xn denotes (X1 , . . . , Xn )
( n)
( n) np
F (p) as n .
arranged in increasing order with possible repetitions. Let us observe that Xk ( n) are all random variables for k n. Indeed, Xk x i # {j n : Xj x} k n i j =1 1Xj x k, i.e. Xk
( n)
x =
1Xj x k
j =1
B.
Moreover, if we let x = min {n Z : n x} , the reader may easily check that ( n) (p) = X np . Fn Proposition 12.18. Keeping the notation above. Suppose that p (0, 1) is a point where F (F (p) ) < p < F (F (p) + ) for all > 0
(p) F (p) as n . Thus we can recover, with high then X np = Fn n th probability, the p quantile of the distribution F by observing {Xi }i=1 . ( n) P
t TNt +1 TNt < . Nt Nt Nt Since Xi > 0 a.s., 1 := {Nt as t } also has full measure and for 0 1 we have = lim TNt () ( ) TNt ()+1 ( ) Nt ( ) + 1 t lim lim = . t t Nt ( ) Nt ( ) Nt ( ) + 1 Nt ( )
so that
{Fn (p) F (p) > } = {Fn (F (p) + ) < p} = {Fn ( + F (p)) F ( + F (p)) < p F (F (p) + )} .
Example 12.20 (Renewal Theory II). Let {Xi }i=1 be i.i.d. and {Yi }i=1 be i.i.d. with {Xi }i=1 being independent of the {Yi }i=1 . Also again assume that 0 < Xi < and 0 < Yi < a.s. We will interpret Yi to be the amount of time the ith bulb remains out after burning out before it is replaced by bulb number i + 1. Let Rt be the amount of time that we have a working bulb in the time interval [0, t] . We are now going to show 1 EX1 lim Rt = . t EX1 + EY1
Page: 130
job: prob
macro: svmonob.cls
date/time: 23-Feb-2007/15:20
131
To prove this, now let Tn := (Xi + Yi ) be the time that the nth bulb is replaced and Nt := sup {n 0 : Tn t} denote the number of bulbs which have burned out up to time n. Then Rt = Nt 1 1 i=1 Xi . Setting = EX1 and = EY1 , we now have t Nt + a.s. so that 1 Nt = + t + o (t) a.s. Therefore, by the strong law of large numbers, 1 1 Rt = t t Nt 1 Xi = t Nt i=1
Nt Nt
n i=1
and x0 = . Observe that it is possible that xi = xi+1 for some of the i. This can occur when F has jumps of size greater than 1/k.
Xi
i=1
1 a.s. +
Theorem 12.21 (Glivenko-Cantelli Theorem). Suppose that {Xn }n=1 are n 1 i.i.d. random variables and F (x) := P (Xi x) . Further let n := n i=1 Xi be the empirical distribution with empirical distribution function, Fn (x) := n ((, x]) = Then
n xR
1 n
1Xi x .
i=1
lim sup |Fn (x) F (x)| = 0 a.s. Now suppose i has been chosen so that xi < xi+1 and let x (xi , xi+1 ) . Further let N ( ) N be chosen so that |Fn (xi ) F (xi )| < 1/k and |Fn (xi ) F (xi )| < 1/k . for n N ( ) and i = 1, 2, . . . , k 1 and k with P (k ) = 1. We then have Fn (x) Fn (xi+1 ) F (xi+1 ) + 1/k F (x) + 2/k and Fn (x) Fn (xi ) F (xi ) 1/k F (xi+1 ) 2/k F (x) 2/k. From this it follows that |F (x) Fn (x)| 2/k and we have shown for k and n N ( ) that sup |F (x) Fn (x)| 2/k.
xR
Proof. Since {1Xi x }i=1 are i.i.d random variables with E1Xi x = P (Xi x) = F (x) , it follows by the strong law of large numbers the limn Fn (x) = F (x) a.s. for each x R. Our goal is to now show that this convergence is uniform.1 To do this we will use one more application of the strong law of large numbers applied to {1Xi <x } which allows us to conclude, for each x R, that
n
i Given k N, let k := and let xi := k : i = 1, 2, . . . , k 1 inf {x : F (x) i/k } for i = 1, 1, 2, . . . , k 1. Let us further set xk =
1
Observation. If F is continouous then, by what we have just shown, there is a set 0 such that P (0 ) = 1 and on 0 , Fn (r) F (r) for all r Q. Moreover on 0 , if x R and r x s with r, s Q, we have F (r) = lim Fn (r) lim inf Fn (x) lim sup Fn (x) lim Fn (s) = F (s) .
n n n n
We may now let s x and r x to conclude, on 0 , on F (x) lim inf Fn (x) lim sup Fn (x) F (x) for all x R,
n n
i.e. on 0 , limn Fn (x) = F (x) . Thus, in this special case we have shown o a xed null set independent of x that limn Fn (x) = F (x) for all x R. Page: 131 job: prob macro: svmonob.cls
date/time: 23-Feb-2007/15:20
132
Example 12.22 (Shannons Theorem). Let {Xi }i=1 be a sequence of i.i.d. random variables with values in {1, 2, . . . , r} N. Let p (k ) := P (Xi = k ) > 0 for 1 k r. Further, let n ( ) = p (X1 ( )) . . . p (Xn ( )) be the probability of the realization, (X1 ( ) , . . . , Xn ( )) . Since {ln p (Xi )}i=1 are i.i.d., 1 1 ln n = n n
n r
P (|Xk | > n) 0
k=1
(12.7)
and 1 n2 then
n 2 E Xk : |Xk | n 0, k=1
(12.8)
p (k ) ln p (k ) =: H (p) .
H = =
1 n
ln n > 0 as n . Since
Sn an P 0. n
H+
Proof. A key ingredient in this proof and proofs of other versions of the law of large numbers is to introduce truncations of the {Xk } . In this case we consider
n
Sn :=
k=1
Xk 1|Xk |n .
= n > en(H +)
n < en(H )
= n en(H +) n en(H ) = en(H +) n en(H ) , it follows that P en(H +) n en(H ) 1 as n . Thus the probability, n , that the random sample {X1 , . . . , Xn } should occur is approximately enH with high probability. The number H is called the entropy r of the distribution, {p (k )}k=1 . E Hence it suces to show
Sn an n L (P )
2
P (Sn = Sn )
k=1 Sn an P n
P (|Xk | > n) 0 as n .
1 1 Var (Sn ) = 2 n2 n 1 n2
n
Var Xk 1|Xk |n
k=1
2 E Xk 1|Xk |n 0 as n . k=1
We now verify the hypothesis of Theorem 12.23 in three situations. Corollary 12.24. If {Xn }n=1 are i.i.d. L2 (P ) random variables, then
1 n Sn P
= EX1 .
an :=
k=1
If
Page: 132 job: prob
(12.9)
macro: svmonob.cls
date/time: 23-Feb-2007/15:20
133
Moreover, 1 n2
n 2 E Xk : |Xk | n = k=1
10x|X |n xdx = 2
n
2
0
xP (|X | x) dx = 2
0
1 2 E |X1 | 0 as n . n2
With these observations we may now apply Theorem 12.23 to complete the proof. Corollary 12.25 (Khintchins WLLN). If dom variables, then
P 1 n Sn {Xn }n=1
(x) dx + 2
M
(x) dx 2KM + 2 (n M )
= EX1 .
where K = sup { (x) : x 0} . Dividing this estimate by n and then letting n shows 1 2 lim sup E |X | : |X | n 2. n n Since > 0 was arbitrary, the proof is complete. Corollary 12.27 (Fellers WLLN). If {Xn }n=1 are i.i.d. and (x) := xP (|X1 | > x) 0 as x , then the hypothesis of Theorem 12.23 are satised. Proof. Since
n
Proof. Again we have by Eq. (12.9), Chebyschevs inequality, and the dominated convergence theorem, that
n
k=1
Also 1 n2
n
k=1
and the latter expression goes to zero as n by the dominated convergence theorem, since |X1 | |X1 | 1|X1 |n |X1 | L1 (P ) n
1| and limn |X1 | |X n 1|X1 |n = 0. Hence again the hypothesis of Theorem 12.23 have been veried.
Eq. (12.7) is satised. Eq. (12.8), follows from Lemma 12.26 and the identity, 1 n2
n 2 E Xk : |Xk | n = k=1
1 2 E |X1 | : |X1 | n . n
Lemma 12.26. Let X be a random variable such that (x) := xP (|X | x) 0 as x , then 1 2 lim E |X | : |X | n = 0. (12.10) n n Note: If X L1 (P ) , then by Chebyschevs inequality and the dominated convergence theorem, (x) E [|X | : |X | x] 0 as x .
1 2 : |SN | . E SN 2
Page: 133
job: prob
macro: svmonob.cls
date/time: 23-Feb-2007/15:20
134
Proof. Let J = inf {j : |Sj | } with the inmum of the empty set being taken to be equal to . Observe that {J = j } = {|S1 | < , . . . , |Sj 1 | < , |Sj | } (X1 , . . . , Xj ) . Now
N
Proof. (The proof of this Corollary may be skipped. We will give another proof in Corollary 12.36 below.) From Theorem 12.28, we have for every > 0 that P
SN Np = P (SN N p )
1 C 1 2 E SN = 2 2p CN = 2 (2p1) . 2 N 2p N N
2 SN
|SN |
> =E
N
2 SN
:J N =
j =1
E
2
2 SN
P
n=1
SN n p Nn
n=1
C 2 n(2p1)
<
=
j =1 N
E (Sj + SN Sj ) : J = j
2 E Sj + (SN Sj ) + 2Sj (SN Sj ) : J = j j =1 N 2
=
( )
= 0.
E
j =1 N
2 Sj
+ (SN Sj ) : J = j
N 2 j =1 2 (|SN |
Nn From this it follows that limn N p = 0 a.s. n To nish the proof, for m N, we may choose n = n (m) such that
j =1
2 Sj
:J =j
P [J = j ] = P
The equality, () , is a consequence of the observations: 1) 1J =j Sj is (X1 , . . . , Xj ) measurable, 2) (Sn Sj ) is (Xj +1 , . . . , Xn ) measurable and hence 1J =j Sj and (Sn Sj ) are independent, and so 3) E [Sj (SN Sj ) : J = j ] = E [Sj 1J =j (SN Sj )] = E [Sj 1J =j ] E [SN Sj ] = E [Sj 1J =j ] 0 = 0.
and Nn+1 /Nn 1 as n , it follows that 0 = lim S lim m p p m N m N m mp n(m) n(m)+1 SN SN n(m)+1 n(m)+1 lim = lim = 0 a.s. p p m N m N n(m) n(m)+1 = lim = 0 a.s.
SN n(m) SN n(m)
Corollary 12.29 (L2 SSLN). Let {Xn } be a sequence of independent rann 2 dom variables with mean zero, and 2 = EXn < . Letting Sn = k=1 Xk and p > 1/2, we have 1 Sn 0 a.s. np If {Yn } is a sequence of independent random variables EYn = and 2 = Var (Xn ) < , then for any (0, 1/2) , 1 n
n
That is limm
Sm mp
Theorem 12.30 (Skorohods Inequality). Let {Xn } be a sequence of independent random variables and let > 0. Let Sn := X1 + + Xn . Then for all > 0, P (|SN | > ) (1 cN ()) P max |Sj | > 2 ,
j N
Yk = O
k=1
1 n
. where
j N
Page: 134
job: prob
macro: svmonob.cls
date/time: 23-Feb-2007/15:20
135
To this end, let J = inf {j : |Sj | > 2} with the inmum of the empty set being taken to be equal to . Observe that {J = j } = {|S1 | 2, . . . , |Sj 1 | 2, |Sj | > 2} and therefore max |Sj | > 2
j N N
Proof. Let Sn := Xk . Since almost sure convergence implies convergence in probability, it suces to show; if Sn is convergent in probability then Sn is almost surely convergent. Given M M, let QM := supnM |Sn SM | and for M < N, let QM,N := supM nN |Sn SM | . Given (0, 1) , by assumption, there exists M = M () N such that maxM j N P (|SN Sj | > ) < for all N M. An application of Skorohods inequality, then shows P (QM,N 2) P (|SN SM | > ) . (1 maxM j N P (|SN Sj | > )) 1
n k=1
Since QM,N QM as N , we may conclude {J = j } . P (QM 2) Since, M := sup |Sn Sm | sup [|Sn SM | + |SM Sm |] = 2QM
m,nM m,nM 1
=
j =1
. 1
Also observe that on {J = j } , |SN | = |SN Sj + Sj | |Sj | |SN Sj | > 2 |SN Sj | . Hence on the {J = j, |SN Sj | } we have |SN | > , i.e. {J = j, |SN Sj | } {|SN | > } for all j N. Hence ti follows from this identity and the independence of {Xn } that
N
follows that M 0 as M . Moreover, since M is decreasing in M, it P follows that limM M =: exists and because M 0 we may concluded that = 0 a.s. Thus we have shown
m,n
P (|SN | > )
j =1 N
P (J = j, |SN Sj | )
=
j =1
P (J = j ) P (|SN Sj | ) .
and therefore {Sn }n=1 is almost surely Cauchy and hence almost surely convergent. Proposition 12.32 (Reection Principle). Let X be a separable Banach
N d
Under the assumption that P (|SN Sj | > ) c for all j N, we nd P (|SN Sj | ) 1 c and therefore,
N
space and {i }i=1 be independent symmetric (i.e. i = i ) random variables k with values in X. Let Sk := i=1 i and Sk := supj k Sj with the convention that S0 = 0. Then P (SN r) 2P ( SN r) . (12.11) Proof. Since
P (|SN | > )
j =1
P (J = j ) (1 c) = (1 c) P
{SN r} =
N j =1
Sj r, Sj 1 < r ,
As an application of Theorem 12.30 we have the following convergence result. Theorem 12.31 (L evys Theorem). Suppose that {Xn }n=1 are i.i.d. random variables then n=1 Xn converges in probability i n=1 Xn converges a.s.
Page: 135 job: prob
SN < r ) (12.12)
where
macro: svmonob.cls date/time: 23-Feb-2007/15:20
136
r,
SN < r ) =
j =1
P ( Sj r,
Sj 1
< r,
SN < r).
(12.13)
Proof. First proof. By Proposition 12.5, the sum, j =1 (Yj EYj ) , is L2 (P ) convergent and hence convergent in probability. An application of L evys Theorem 12.31 then shows j =1 (Yj EYj ) is almost surely convergent. n Second proof. Let Sn := j =1 Xj where Xj := Yj EYj . According to Kolmogorovs inequality, Theorem 12.28, for all M < N, P max |Sj SM | 1 1 2 E (SN SM ) = 2 2 1 2
N N 2 E Xj j =M +1
Sj +
k>j
k < r )
M j N
= P ( Sj r, Sj 1 < r, = P ( Sj r, Sj 1 < r,
Sj
k>j
k < r )
Var (Xj ) .
j =M +1
If Sj r and 2Sj SN < r, then r > 2Sj SN 2 Sj SN 2r SN and hence SN > r. This shows, Sj r,
Sj 1
Var (Xj ) .
j =M +1
< r,
2S j S N < r
Sj r,
Sj 1
< r,
SN > r
and therefore,
P ( Sj r, Sj 1 < r, SN < r) P ( Sj r, Sj 1 < r,
SN > r).
Var (Xj ) 0 as M ,
j =M +1
SN < r )
j =1
P ( Sj r, Sj 1 < r,
SN > r )
= P (SN r,
SN > r) P ( SN r).
i.e. M 0 as M . Since M is decreasing in M, it follows that P limM M =: exists and because M 0 we may concluded that = 0 a.s. Thus we have shown
m,n
This estimate along with the estimate in Eq. (12.12) completes the proof of the theorem.
and therefore {Sn }n=1 is almost surely Cauchy and hence almost surely convergent. Lemma 12.34 (Kroneckers Lemma). Suppose that {xk } R and {ak } k (0, ) are sequences such that ak and k=1 x ak exists. Then 1 n an lim
n
xk = 0.
k=1
Page: 136
job: prob
macro: svmonob.cls
date/time: 23-Feb-2007/15:20
137
Proof. Before going to the proof, let us warm-up by proving the following continuous version of the lemma. Let a (s) (0, ) and x (s) R be continuous (s) functions such that a (s) as s and 1 x a(s) ds exists. We are going to show n 1 lim x (s) ds = 0. n a (n) 1 Let X (s) :=
s 0
x (u) du and
1 an
r (s) :=
s
X (u) du = a (u)
1 an
(ak ak1 )
k=m
Then by assumption, r (s) 0 as s 0 and X (s) = a (s) r (s) . Integrating this equation shows
s s
This completes the proof since supkm |rk | 0 as m . r (u) a (u) du. Corollary 12.35. Let {Xn } be a sequence of independent square integrable random variables and bn be a sequence such that bn . If
X (s) X (s0 ) =
s0
a (u) r (u) du =
+
s0
Dividing this equation by a (s) and then letting s gives 1 a (s0 ) r (s0 ) a (s) r (s) |X (s)| = lim sup + r (u) a (u) du lim sup a (s) a (s) a (s) s0 s s s 1 lim sup r (s) + |r (u)| a (u) du a (s) s0 s a (s) a (s0 ) sup |r (u)| = sup |r (u)| 0 as s0 . lim sup a (s) s us0 us0 With this as warm-up, we go to the discrete case. Let
k s
k=1
then
Sn ESn 0 a.s. bn
k=1
Sk :=
j =1
xj and rk :=
j =k
xj . aj
Sn ESn . bn
ak (rk rk+1 ) =
k=1 n
1 an
n+1
ak rk
k=1 k=2
ak1 rk Corollary 12.36 (L2 SSLN). Let {Xn } be a sequence of independent rann 2 dom variables such that 2 = EXn < . Letting Sn = k=1 Xk and := EXn , we have 1 (Sn n) 0 a.s. (12.14) bn provided bn and
p 1 n=1 b2 n
1 a1 r1 an rn+1 + an
job: prob
date/time: 23-Feb-2007/15:20
138
Sn n = o (1) bn or equivalently, Sn bn = o (1) . n n Proof. This corollary is a special case of Corollary 12.35. Let us simply observe here that
1ny y
n=1 n=1
1ny + 1 =
n=0
1ny .
(12.16)
Taking y = |X | / in Eq. (12.16) and then take expectations gives the estimate in Eq. (12.15). Proposition 12.40. Suppose that {Xn }n=1 are i.i.d. random variables, then the following are equivalent: 1. E |X1 | < . 2. There exists > 0 such that n=1 P (|X1 | n) < . 3. For all > 0, n=1 P (|X1 | n) < . n| 4. limn |X n = 0 a.s. Proof. The equivalence of items 1., 2., and 3. easily follows from Lemma 12.39. So to nish the proof it suces to show 3. is equivalent to 4. To this end n| we start by noting that limn |X n = 0 a.s. i 0=P |Xn | i.o. n
1 n1/2 (ln n)
1/2+ 2
1
1+2
n=2
n=2 n (ln n)
1 x ln
1+2
dx =
ln 2
1 ey y 1+2
ey dy =
ln 2
1 dy < , y 1+2
wherein we have made the change of variables, y = ln x. Fact 12.37 Under the hypothesis in Corollary 12.36,
n
lim
Sn n n1/2 (ln ln n)
1/2
2 a.s.
(12.17)
Our next goal is to prove the Strong Law of Large numbers (in Theorem 12.7) under the assumption that E |X1 | < .
However, since {|Xn | n}n=1 are independent sets, Borel zero-one law shows the statement in Eq. (12.17) is equivalent to n=1 P (|Xn | n) < for all > 0. Corollary 12.41. Suppose that {Xn }n=1 are i.i.d. random variables such that 1 1 n Sn c R a.s., then Xn L (P ) and := EXn = c. Proof. If
1 n Sn
psp1 P (|X | s) ds =
0
c a.s. then n :=
Sn+1 n+1
Sn n
psp1 ds = p
0
1s|X | sp1 ds = p
0
Sn+1 Sn 1 1 Xn+1 = = n + Sn n+1 n+1 n+1 n n+1 Sn 1 = n + 0 + 0 c = 0. (n + 1) n Hence an application of Proposition 12.40 shows Xn L1 (P ) . Moreover by 1 Exercise 11.3, n Sn n=1 is a uniformly integrable sequenced and therefore, =E 1 Sn E n
n
Taking expectations of this identity along with an application of Tonellis theorem completes the proof. Lemma 12.39. If X is a random variable and > 0, then
P (|X | n)
n=1
1 E |X | P (|X | n) . n=0
job: prob
lim
1 Sn = E [c] = c. n
(12.15)
Page: 138
macro: svmonob.cls
date/time: 23-Feb-2007/15:20
139
nx
Proof. The proof will be by comparison with the integral, For example, 1 1 1 + dt = 1 + 1 = 2 2 2 n t 1 n=1 and so
nx
1 1 |X |
2E |X | .
With this as preparation we are now in a position to prove Theorem 12.7 which we restate here. Theorem 12.44 (Kolmogorovs Strong Law of Large Numbers). Sup pose that {Xn }n=1 are i.i.d. random variables and let Sn := X1 + + Xn . 1 Then there exists R such that n Sn a.s. i Xn is integrable and in which case EXn = .
nx
1 1 1 1 dt = 2 + = 2 t x x x
1+
1 x
2 , x
1 Proof. The implication, n Sn a.s. implies Xn L1 (P ) and EXn = has already been proved in Corollary 12.41. So let us now assume Xn L1 (P ) and let := EXn . Let Xn := Xn 1|Xn |n . By Proposition 12.40,
P (Xn = Xn ) =
n=1 n=1
P (|Xn | > n) =
n=1
and hence {Xn } and {Xn } are tail equivalent. Therefore it suces to show 1 limn n Sn = a.s. where Sn := X1 + + Xn . But by Lemma 12.43,
E |X | 1 2 n |Xn |n Var (Xn ) E |Xn | = 2 2 2 n n n n=1 n=1 n=1 2
=
n=1
E |X1 | 1|X1 |n n2
2E |X1 | < .
Therefore by Kolmogorovs convergence criteria, Lemma 12.43. Suppose that X : R is a random variable, then 1 2 E |X | : 1|X |n 2E |X | . 2 n n=1
Xn EXn is almost surely convergent. n n=1 Kroneckers lemma then implies 1 n n lim
n
Page: 139
job: prob
macro: svmonob.cls
date/time: 23-Feb-2007/15:20
140
k=1
1 EXk = lim n n
n
E Xn 1|Xn |n
k=1
1 = lim n n
lim
N j =1
1 Aj ln N
1 j
= 0 a.s.
E X1 1|X1 |n
k=1
= lim E X1 1|X1 |n = . Here we have used the dominated convergence theorem to see that an := E X1 1|X1 |n as n . It is now easy (and standard) to check that n 1 limn n k=1 an = limn an = as well. We end this section with another example of using Kolmogorovs convergence criteria in conjunction with Kroneckers lemma. We now assume that {Xn }n=1 are i.i.d. random variables with a continuous distribution function and let Aj denote the event when Xj is a record, i.e. Aj := {Xj > max {X1 , X2 , . . . , Xk1 }} . Recall from Renyi Theorem 7.28 that {Aj }j =1 are independent and P (Aj ) = for all j. Proposition 12.45. Keeping the preceding notation and let N := denote the number of records in the rst N observations. Then limN a.s. Proof. Since 1Aj are Bernoulli random variables, E1Aj = Var 1Aj = E12 Aj E1Aj Observing that
n n 2 1 j 1 j n
ln N
= 1.
(12.18)
ln (N + 1) =
1 N
1 dx = x
N j
j +1
j =1
1 dx x
N
j +1 j N
=
j =1
1 1 x j 1 j
dx +
j =1
1 j (12.19)
= N +
j =1
N j =1 1Aj N ln N = 1
where
N
|N | =
j =1
ln
and
j+1 1 = j j
ln (1 + 1/j )
j =1
1 j
j =1
1 j2
1 j1 1 2 = . j j j2
and hence we conclude that limN N < . So dividing Eq. (12.19) by ln N and letting N gives the desired limit in Eq. (12.18).
E1Aj =
j =1 j =1
1 j
N 1 N
1 dx = ln N x
we are lead to try to normalize the sum j =1 1Aj by ln N. So in the spirit of the proof of the strong law of large numbers let us compute;
Var
j =2
1Aj ln j
=
j =2
1 j1 ln2 j j 2
1 1 dx = ln2 x x
ln 2
1 dy < . y2
1 Aj ln j
1 j
=
j =2
j =2
1 Aj 1Aj E ln j ln j
EYj2
j =1
(12.20)
where as usual, Sn :=
macro: svmonob.cls
n j =1
Yj .
date/time: 23-Feb-2007/15:20
141
Remark 12.47. It follows from Eq. (12.20) that if P (supn |Sn | < ) > 0, then 2 j =1 Yj = limn Sn j =1 EYj < and hence by Kolmogorovs Theorem, exists a.s. and in particular, P (supn |Sn | < ) . Proof. Let > 0 and be the rst time |Sn | > , i.e. let be the stopping time dened by, = := inf {n 1 : |Sn | > } . As usual, = if {n 1 : |Sn | > } = . Then for N N,
2 2 2 E SN = E SN : N + E SN : >N 2 E SN : N + 2 P [ > N ] .
Since Sn is convergent a.s., it follows that P (supn |Sn | < ) = 1 and therefore,
lim P
= 1.
Hence for suciently large, P (supn |Sn | < ) > 0 ad we learn that
2 EYj2 = lim E SN j =1 N
Moreover,
N 2 E SN : N = j =1 N 2 E SN : =j = j =1 2 E Sj + 2Sj (SN Sj ) + (SN Sj ) : = j j =1 N N 2 N
E |Sj + SN Sj | : = j
Lemma 12.48. Suppose that {Yn }n=1 are independent random variables such that there exists c < such that |Yn | c a.s. for all n. If n=1 Yn converges in R a.s. then n=1 EYn converges as well. Proof. Let (0 , B0 , P0 ) be the probability space that {Yn }n=1 is dened on and let := 0 0 , B := B0 B0 , and P := P0 P0 . Further let Yn (1 , 2 ) := Yn (1 ) and Yn (1 , 2 ) := Yn (2 ) and
=
j =1 N
2 Sj
: =j +
j =1 2
E (SN Sj )
P [ = j ]
N
j =1 N
2 E (Sj 1 + Yj ) : = j + E SN j =1
P [ = j ]
Zn (1 , 2 ) =
n=1 n=1
Yn (1 )
n=1
Yn (2 ) exists
j =1
2 E ( + c) : = j + E SN P [ N ] 2
2 = ( + c) + E SN
P [ N ] . > =
Var (Zn ) =
n=1
Var (Yn Yn )
P [ N ] + 2 P [ > N ] P [ N ] + ( + c) P [ > N ]
2 SN 2
Var (Yn ) .
Thus by Kolmogorovs convergence theorem, it follows that n=1 (Yn EYn ) is convergent. Since n=1 Yn is a.s. convergent, we may conclude that n=1 EYn is also convergent. We are now ready to complete the proof of Theorem 12.12. Proof. Our goal is to show if {Xn }n=1 are independent random variables, then the random series, n=1 Xn , is almost surely convergent i for all c > 0 the following three series converge; 1.
n=1
Page: 141
macro: svmonob.cls
2. 3.
n=1 n=1
Since n=1 Xn is almost surely convergent, it follows that limn Xn = 0 a.s. and hence for every c > 0, P ({|Xn | c i.o.}) = 0. According the Borel zero one law this implies for every c > 0 that n=1 P (|Xn | > c) < . Given c this, we now know that {Xn } and Xn := Xn 1|Xn |c are tail equivalent for c all c > 0 and in particular n=1 Xn is almost surely convergent for all c > 0. c ), So according to Lemma 12.48 (with Yn = Xn
c EXn = n=1 n=1
E Xn 1|Xn |c
converges.
c c , we may now conclude that n=1 Yn is almost surely EXn Letting Yn := Xn convergent. Since {Yn } is uniformly bounded and EYn = 0 for all n, an application of Lemma 12.46 allows us to conclude 2 EYn < . n=1
Var Xn 1|Xn |c =
n=1
Proof. Let = and h := f g : R so that d = hdm. Since ( ) = ( ) ( ) = 1 1 = 0, if A B we have (A) + (Ac ) = ( ) = 0. In particular this shows | (A)| = | (Ac )| and therefore,
| (A)| =
1 1 hdm [| (A)| + | (Ac )|] = hdm + 2 2 Ac A 1 1 |h| dm + |h| dm = |h| dm. 2 A 2 Ac dT V (, ) = sup | (A)|
AB
(13.1)
This shows
Remark 13.2. The function, : B R dened by, (A) := (A) (A) for all A B , is an example of a signed measure. For signed measures, one usually denes
n
1 2
|h| dm.
To prove the converse inequality, simply take A = {h > 0} (note Ac = {h 0}) in Eq. (13.1) to nd | (A)| = 1 2 1 = 2 hdm
A Ac
TV
:= sup
i=1
hdm |h| dm =
Ac
You are asked to show in Exercise 13.1 below, that when = , dT V (, ) = 1 2 TV . Lemma 13.3 (Sche es Lemma). Suppose that m is another positive measure on (, B ) such that there exists measurable functions, f, g : [0, ), such that d = f dm and d = gdm.1 Then dT V (, ) =
|h| dm +
A
1 2
|h| dm.
For the second assertion, let Gn := fn + g and observe that |fn g | 0 m a.e., |fn g | Gn L1 (m) , Gn G := 2g a.e. and Gn dm = 2 2 = Gdm and n . Therefore, by the dominated convergence theorem 8.34,
n
1 2
|f g | dm.
lim dT V (n , ) =
1 lim 2 n
|fn g | dm = 0.
Moreover, if {n }n=1 is a sequence of probability measure of the form, dn = fn dm with fn : [0, ), and fn g, m - a.e., then dT V (n , ) 0 as n .
1
For a concrete application of Sche es Lemma, see Proposition 13.35 below. Corollary 13.4. Let h := sup |h ( )| when h : R is a bounded random variable. Continuing the notation in Sche es lemma above, we have
144
dT V (, ) = Consequently,
1 sup 2
hd
hd : h
1 .
(13.2)
hd
hd 2dT V (, ) h
(13.3)
hd if dT V (n , ) 0.
(13.4)
Nevertheless we would like Xn to be close to U in distribution. Let us observe that if we let Fn (y ) := P (Xn y ) and F (y ) := P (U y ) , then Fn (y ) = P (Xn y ) = 1 i # i {1, 2, . . . , n} : y n n
hd =
h (f g ) dm
|h| |f g | dm
and F (y ) := P (U y ) = (y 1) 0. . From these formula, it easily follows that F (y ) = limn Fn (y ) for all y R. This suggest that we should say that Xn converges in distribution to X i P (Xn y ) P (X y ) for all y R. However, the next simple example shows this denition is also too restrictive. Example 13.7. Suppose that P (Xn = 1/n) = 1 for all n and P (X0 = 0) = 1. Then it is reasonable to insist that Xn converges of X0 in distribution. However, Fn (y ) = 1y1/n 1y0 = F0 (y ) for all y R except for y = 0. Observe that y is the only point of discontinuity of F0 . Notation 13.8 Let (X, d) be a metric space, f : X R be a function. The set of x X where f is continuous (discontinuous) at x will be denoted by C (f ) (D (f )). Observe that if F : R [0, 1] is a non-decreasing function, then C (F ) is at most countable. To see this, suppose that > 0 is given and let C := {y R : F (y +) F (y ) } . If y < y with y, y C , then F (y +) < F (y ) and (F (y ) , F (y +)) and (F (y ) , F (y +)) are disjoint intervals of length greater that . Hence it follows that 1 = m ([0, 1])
y C
|f g | dm = 2dT V (, ) h
hd
when h := 1f >g 1f g . These two equations prove Eqs. (13.2) and (13.3) and the latter implies Eq. (13.4). Exercise 13.1. Under the hypothesis of Sche es Lemma 13.3, show
TV
|f g | dm = 2dT V (, ) .
Exercise 13.2. Suppose that is a (at most) countable set, B := 2 , and {n }n=0 are probability measures on (, B ) . Let fn ( ) := n ({ }) for . Show 1 dT V (n , 0 ) = |fn ( ) f0 ( )| 2
and limn dT V (n , 0 ) = 0 i limn n ({ }) = 0 ({ }) for all . Notation 13.5 Suppose that X and Y are random variables, let dT V (X, Y ) := dT V (X , Y ) = sup |P (X A) P (Y A)| ,
ABR
m ((F (y ) , F (y +))) # (C )
where X = P X
and Y = P Y
More generally, if and are two probability measure on (R, BR ) such that ({x}) = 0 for all x R while concentrates on a countable set, then dT F (, ) = 1.
Page: 144
job: prob
macro: svmonob.cls
date/time: 23-Feb-2007/15:20
145
Denition 13.9. Let {F, Fn : n = 1, 2, . . . } be a collection of right continuous non-increasing functions from R to [0, 1] and by abuse of notation let us also denote the associated measures, F and Fn by F and Fn respectively. Then 1. Fn converges to F vaguely and write, Fn F, i Fn ((a, b]) F ((a, b]) for all a, b C (F ) . w 2. Fn converges to F weakly and write, Fn F, i Fn (x) F (x) for all x C (F ) . 3. We say F is proper, if F is a distribution function of a probability measure, i.e. if F () = 1 and F () = 0. Example 13.10. If Xn and U are as in Example 13.6 and Fn (y ) := P (Xn y ) v w and F (y ) := P (Y y ) , then Fn F and Fn F. Lemma 13.11. Let {F, Fn : n = 1, 2, . . . } be a collection of proper distribution v w functions. Then Fn F i Fn F. In the case where Fn and F are proper w and Fn F, we will write Fn = F. Proof. If Fn F, then Fn ((a, b]) = Fn (b) Fn (a) F (b) F (a) = v v F ((a, b]) for all a, b C (F ) and therefore Fn F. So now suppose Fn F and let a < x with a, x C (F ) . Then F (x) = F (a) + lim [Fn (x) Fn (a)] F (a) + lim inf Fn (x) .
n n w v
Example 13.13 (Central Limit Theorem). The central limit theorem (see the next chapter) states; if {Xn }n=1 are i.i.d. L2 (P ) random variables with := EX1 and 2 = Var (X1 ) , then Sn n d = N (0, ) = N (0, 1) . n Written out explicitly we nd lim P a< Sn n b n = P (a < N (0, 1) b) 1 = 2 or equivalently put 1 lim P n + na < Sn n + nb = n 2 More intuitively, we have Sn = n +
d b a b a
e 2 x dx
e 2 x dx.
nN (0, 1) = N n, n 2 .
Letting a , using the fact that F is proper, implies F (x) lim inf Fn (x) .
n
Lemma 13.14. Suppose X is a random variable, {cn }n=1 R, and Xn = X + cn . If c := limn cn exists, then Xn = X + c. Proof. Let F (x) := P (X x) and Fn (x) := P (Xn x) = P (X + cn x) = F (x cn ) . Clearly, if cn c as n , then for all x C (F ( c)) we have Fn (x) F (x c) . Since F (x c) = P (X + c x) , we see that Xn = X + c. Observe that Fn (x) F (x c) only for x C (F ( c)) but this is sucient to assert Xn = X + c. Example 13.15. Suppose that P (Xn = n) = 1 for all n, then Fn (y ) = 1yn 0 = F (y ) as n . Notice that F is not a distribution function because all 1 for all of the mass went o to +. Similarly, if we suppose, P (Xn = n) = 2 1 1 n, then Fn = 2 1[n,n) + 1[n,) 2 = F (y ) as n . Again, F is not a distribution function on R since half the mass went to while the other half went to +. Example 13.16. Suppose X is a non-zero random variables such that X = X, d n then Xn := (1) X = X for all n and therefore, Xn = X as n . On the other hand, Xn does not converge to X almost surely or in probability.
macro: svmonob.cls date/time: 23-Feb-2007/15:20
d
Likewise, F (x) F (a) = lim [Fn (x) Fn (a)] lim sup [Fn (x) 1] = lim sup Fn (x) 1
n n n
which upon letting a , (so F (a) 1) allows us to conclude, F (x) lim sup Fn (x) .
n
Denition 13.12. A sequence of random variables, {Xn }n=1 is said to converge weakly or to converge in distribution to a random variable X (written Xn = X ) i Fn (y ) := P (Xn y ) = F (y ) := P (X y ) .
Page: 145
job: prob
146
The next theorem summarizes a number of useful equivalent characterizations of weak convergence. (The reader should compare Theorem 13.17 with Corollary 13.4.) In this theorem we will write BC (R) for the bounded continuous functions, f : R R (or f : R C) and Cc (R) for those f C (R) which have compact support, i.e. f (x) 0 if |x| is suciently large. Theorem 13.17. Suppose that {n }n=0 is a sequence of probability measures on (R, BR ) and for each n, let Fn (y ) := n ((, y ]) be the (proper) distribution function associated to n . Then the following are equivalent. 1. For all f BC (R) , f dn
R R
where the second equality in each of the equations holds because a and b are points of continuity of F0 . Hence we have shown that limn n ((a, b]) exists and is equal to 0 ((a, b]) .
f d0 as n .
(13.5)
2. Eq. (13.5) holds for all f BC (R) which are uniformly continuous. 3. Eq. (13.5) holds for all f Cc (R) . 4. Fn = F. 5. There exists a probability space (, B , P ) and random variables, Yn , on this 1 = n for all n and Yn Y0 a.s. space such that P Yn Proof. Clearly 1. = 2. = 3. and 5. = 1. by the dominated convergence theorem. Indeed, we have f dn = E [f (Yn )] E [f (Y )] =
R R D.C.T.
f d0
for all f BC (R) . Therefore it suces to prove 3. = 4. and 4. = 5. The proof of 4. = 5. will be the content of Skorohods Theorem 13.28 below. Given Skorohods Theorem, we will now complete the proof. (3. = 4.) Let < a < b < with a, b C (F0 ) and for > 0, let f (x) 1(a,b] and g (x) 1(a,b] be the functions in Cc (R) pictured in Figure 13.1. Then lim sup n ((a, b]) lim sup
n n R
Corollary 13.18. Suppose that {Xn }n=0 is a sequence of random variables, such that Xn X0 , then Xn = X0 . (Recall that example 13.16 shows the converse is in general false.) Proof. Let g BC (R) , then by Corollary 11.9, g (Xn ) g (X0 ) and since g is bounded, we may apply the dominated convergence theorem (see Corollary 11.8) to conclude that E [g (Xn )] E [g (X0 )] . Lemma 13.19. Suppose {Xn }n=1 is a sequence of random variables on a com P P
f dn =
R
f d0
(13.6)
g dn =
R
g d0 .
(13.7)
mon probability space and c R. Then Xn = c i Xn c. Proof. Recall that Xn c i for all > 0, P (|Xn c| > ) 0. Since {|Xn c| > } = {Xn > c + } {Xn < c } it follows Xn c i P (Xn > x) 0 for all x > c and P (Xn < x) 0 for all x < c. These conditions are also equivalent to P (Xn x) 1 for all x > c and P P (Xn x) P (Xn x ) 0 for all x < c (where x < x < c). So Xn c i
macro: svmonob.cls date/time: 23-Feb-2007/15:20
P P
Since f 1[a,b] and g 1(a,b) as 0, we may use the dominated convergence theorem to pass to the limit as 0 in Eqs. (13.6) and (13.7) to conclude, lim sup n ((a, b]) 0 ([a, b]) = 0 ((a, b])
n
and
Page: 146 job: prob
147
lim P (Xn x) =
{Fn }n=1
where F (x) = P (c x) = 1xc . Since C (F ) = R \ {c} , we have shown Xn c i Xn = c. We end this section with a few more equivalent characterizations of weak convergence. The combination of Theorem 13.17 and 13.20 is often called the Portmanteau Theorem. Theorem 13.20 (The Portmanteau Theorem). Suppose {Fn }n=0 are proper distribution functions. By abuse of notation, we will denote Fn (A) simply by Fn (A) for all A BR . Then the following are equivalent. 1. Fn = F0 . 2. lim inf n Fn (U ) F0 (U ) for open subsets, U R. 3. lim supn Fn (C ) F0 (C ) for all closed subsets, C R. 4. limn Fn (A) = F0 (A) for all A BR such that F0 (A) = 0. Proof. (1. = 2.) By Theorem 13.28 we may choose random variables, Yn , such that P (Yn y ) = Fn (y ) for all y R and n N and Yn Y0 a.s. as n . Since U is open, it follows that 1U (Y ) lim inf 1U (Yn ) a.s.
n
In particular, it follows that sup |F ((a, b]) Fn ((a, b])| = sup |F (b) F (a) (Fn (b) Fn (a))|
a<b a<b
0 as n . Hints for part 2. Given > 0, show that there exists, = 0 < 1 < < n = , such that |F (i+1 ) F (i )| for all i. Now show, for x [i , i+1 ), that |F (x) Fn (x)| (F (i+1 ) F (i ))+|F (i ) Fn (i )|+(Fn (i+1 ) Fn (i )) .
and so by Fatous lemma, F (U ) = P (Y U ) = E [1U (Y )] lim inf E [1U (Yn )] = lim inf P (Yn U ) = lim inf Fn (U ) .
n n n
(2. 3.) This follows from the observations: 1) C R is closed i U := C c is open, 2) F (U ) = 1 F (C ) , and 3) lim inf n (Fn (C )) = lim supn Fn (C ) . with F0 A \ Ao = 0. (2. and 3. 4.) If F0 (A) = 0, then Ao A A Therefore F0 A = F0 (A) . F0 (A) = F0 (Ao ) lim inf Fn (Ao ) lim sup Fn A
n n
(4. = 1.) Let a, b C (F0 ) and take A := (a, b]. Then F0 (A) = F0 ({a, b}) = 0 and therefore, limn Fn ((a, b]) = F0 ((a, b]) , i.e. Fn = F0 . Exercise 13.3. Suppose that F is a continuous proper distribution function. Show, 1. F : R [0, 1] is uniformly continuous.
Page: 147 job: prob
We will begin by showing f is lower semi-continuous, i.e. f a is closed (or equivalently f > a is open) for all a R. Indeed, if f (x) > a, then there exists y Bx ( ) such that f (y ) > a. Since this y is in Bx ( ) whenever d (x, x ) < d (x, y ) (because then, d (x , y ) d (x, y ) + d (x, x ) < ) it follows that f (x ) > a for all x Bx ( d (x, y )) . This shows f > a is open in X. We similarly dene f : X R {} by f (x) := Since f = (f ) , it follows that {f a} = (f ) a
macro: svmonob.cls date/time: 23-Feb-2007/15:20
y Bx ( )
inf
f (y ) .
148
is closed for all a R, i.e. f is upper semi-continuous. Moreover, f f f for all > 0 and f f 0 and f f0 as 0, where f0 f f 0 and f0 : X R {} and f 0 : X R {} are measurable functions. The proof is now complete since it is easy to see that D (f ) = f > f0 = f f0 = 0 BX .
0 0
where M = sup |f | . Since, Xn = X, we know E [f (Xn , c)] E [f (X, c)] and hence we have shown, lim sup |E [f (Xn , Yn ) f (X, c)]|
n
lim sup |E [f (Xn , Yn ) f (Xn , c)]| + lim sup |E [f (Xn , c) f (X, c)]| .
n n
Remark 13.22. Suppose that xn x with x C (f ) := D (f ) . Then f (xn ) f (x) as n . Theorem 13.23 (Continuous Mapping Theorem). Let f : R R be a Borel measurable functions. If Xn = X0 and P (X0 D (f )) = 0, then f (Xn ) = f (X0 ) . If in addition, f is bounded, Ef (Xn ) Ef (X0 ) . Proof. Let {Yn }n=0 be random variables on some probability space as in Theorem 13.28. For g BC (R) we observe that D (g f ) D (f ) and therefore, P (Y0 D (g f )) P (Y0 D (f )) = P (X0 D (f )) = 0. Hence it follows that g f Yn g f Y0 a.s. So an application of the dominated convergence theorem (see Corollary 11.8) implies E [g (f (Xn ))] = E [g (f (Yn ))] E [g (f (Y0 ))] = E [g (f (X0 ))] . (13.8)
Since > 0 was arbitrary, we learn that limn Ef (Xn , Yn ) = Ef (X, c) . Now suppose f BC R2 with f 0 and let k (x, y ) [0, 1] be continuous functions with compact support such that k (x, y ) = 1 if |x| |y | k and k (x, y ) 1 as k . Then applying what we have just proved to fk := k f, we nd E [fk (X, c)] = lim E [fk (Xn , Yn )] lim inf E [f (Xn , Yn )] .
n n
Letting k in this inequality then implies that E [f (X, c)] lim inf E [f (Xn , Yn )] .
n
This inequality with f replaced by M f 0 then shows, M E [f (X, c)] lim inf E [M f (Xn , Yn )] = M lim sup E [f (Xn , Yn )] .
n n
Hence we have shown, lim sup E [f (Xn , Yn )] E [f (X, c)] lim inf E [f (Xn , Yn )]
n n
This proves the rst assertion. For the second assertion we take g (x) = (x M ) (M ) in Eq. (13.8) where M is a bound on |f | . Theorem 13.24 (Slutzkys Theorem). Suppose that Xn = X and P Yn c where c is a constant. Then (Xn , Yn ) = (X, c) in the sense that E [f (Xn , Yn )] E [f (X, c)] for all f BC R2 . In particular, by taking f (x, y ) = g (x + y ) and f (x, y ) = g (x y ) with g BC (R) , we learn Xn + Yn = X + c and Xn Yn = X c respectively. Proof. First suppose that f Cc R2 , and for > 0, let := () be chosen so that |f (x, y ) f (x , y )| if (x, y ) (x , y ) . Then |E [f (Xn , Yn ) f (Xn , c)]| E [|f (Xn , Yn ) f (Xn , c)| : |Yn c| ] + E [|f (Xn , Yn ) f (Xn , c)| : |Yn c| > ] + 2M P (|Yn c| > ) as n ,
and therefore limn E [f (Xn , Yn )] = E [f (X, c)] for all f BC R2 with f 0. This completes the proof since any f BC R2 may be written as a dierence of its positive and negative parts. Theorem 13.25 ( method). Suppose that {Xn }n=1 are random variables, b R, an R\ {0} with limn an = 0, and Xn b = Z. an If g : R R be a measurable function which is dierentiable at b, then g (Xn ) g (b) = g (b) Z. an Proof. Observe that Xn b = an Xn b = 0 Z = 0 an
date/time: 23-Feb-2007/15:20
Page: 148
job: prob
macro: svmonob.cls
149
so that Xn = b and hence Xn b. By denition of the derivative of g at b, we have g (x + ) = g (b) + g (b) + () where () 0 as 0. Let Yn and Y be random variables on a xed probability space such that Yn = Xn = an Yn + b, so that g (Xn ) g (b) d g (an Yn + b) g (b) an Yn (an Yn ) = = g (b) Yn + an an an = g (b) Yn + Yn (an Yn ) g (b) Y a.s. This completes the proof since g (b) Y = g (b) Z. Example 13.26. Suppose that {Un }n=1 are i.i.d. random variables which are uniformly distributed on [0, 1] and let Yn := j =1 Ujn . Our goal is to nd an bn is weakly convergent to a non-constant random variable. and bn such that Yna n To this end, let n 1 ln Uj . Xn := ln Yn = n j =1 By the strong law of large numbers, lim Xn = E [ln U1 ] =
0 a.s. 1 a.s. 1 n
1
Yn e
1 n
= e1 N (0, 1) = N 0, e2 .
Xn b an
Uj e1 = N 0, e2 .
1 n
Exercise 13.4. Given a function, f : X R and a point x X, let lim inf f (y ) := lim
y x y x 0 y Bx ( ) 0 y B ( ) x
inf
f (y ) and
(13.9) (13.10)
lim sup f (y ) := lim sup f (y ) , where Bx ( ) := {y X : 0 < d (x, y ) < } . Show f is lower (upper) semi-continuous i lim inf yx f (y ) lim supyx f (y ) f (x) for all x X.
f (x)
Solution to Exercise (13.4). Suppose Eq. (13.9) holds, a R, and x X such that f (x) > a. Since, lim
0 y Bx ( )
inf
ln xdx = [x ln x x]0 = 1
.
1
E ln2 U1 =
0 2
ln2 xdx = 2
it follows that inf yBx () f (y ) > a for some > 0. Hence we may conclude that Bx ( ) {f > a} which shows {f > a} is open. Conversely, suppose now that {f > a} is open for all a R. Given x X and a < f (x) , there exists > 0 such that Bx ( ) {f > a} . Hence it follows that lim inf yx f (y ) a and then letting a f (x) then implies lim inf yx f (y ) f (x) .
so that Var (ln U1 ) = 2 (1) = 1. Hence by the central limit theorem, Xn (1)
1 n
150
order to nish the proof it suces to show, Yn (x) Y (x) for all x / E, where E is the countable null set dened as above, E := {x (0, 1) : Y (x) < Y + (x)} . We now suppose x / E. If y C (F0 ) with y < Y (x) , we have limn Fn (y ) = F0 (y ) < x and in particular, Fn (y ) < x for almost all n. This implies that Yn (x) y for a.a. n and hence that lim inf n Yn (x) y. Letting y Y (x) with y C (F0 ) then implies lim inf Yn (x) Y (x) .
n
Similarly, for x / E and y C (F0 ) with Y (x) = Y + (x) < y, we have limn Fn (y ) = F0 (y ) > x and in particular, Fn (y ) > x for almost all n. This implies that Yn (x) y for a.a. n and hence that lim supn Yn (x) y. Letting y Y (x) with y C (F0 ) then implies We will need the following simple observations about Y and Y + which are easily understood from Figure 13.4. 1. Y (x) Y + (x) and Y (x) < Y + (x) i x is the height of a at spot of F. 2. The set, E := {x (0, 1) : Y (x) < Y + (x)} , of at spot heights is at most countable. This is because, {(Y (x) , Y + (x))}xE is a collection of pairwise disjoint intervals which is necessarily countable. (Each such interval contains a rational number.) 3. The following inequality holds, F (Y (x) ) x F (Y (x)) for all x (0, 1) . (13.11) lim sup Yn (x) Y (x) .
n
Hence we have shown, for x / E, that lim sup Yn (x) Y (x) lim inf Yn (x)
n n
which shows
n lim Fn (x) = lim Yn (x) = Y (x) = F (x) for all x / E. n
(13.12)
Indeed, if y > Y (x) , then F (y ) x and by right continuity of F it follows that F (Y (x)) x. Similarly, if y < Y (x) , then F (y ) < x and hence F (Y (x) ) x. 4. {x (0, 1) : Y (x) y0 } = (0, F (y0 )] (0, 1) . To prove this assertion rst suppose that Y (x) y0 , then according to Eq. (13.11) we have x F (Y (x)) F (y0 ) , i.e. x (0, F (y0 )] (0, 1) . Conversely, if x (0, 1) and x F (y0 ) , then Y (x) y0 by denition of Y. 5. As a consequence of item 4. we see that Y is B(0,1) /BR measurable and m Y 1 = F, where m is Lebesgue measure on (0, 1) , B(0,1) . Theorem 13.28 (Baby Skorohod Theorem). Suppose that {Fn }n=0 is a collection of distribution functions such that Fn = F0 . Then there ex ists a probability space, (, B , P ) and random variables, {Yn }n=1 such that P (Yn y ) = Fn (y ) for all n N {} and limn Fn = limn Yn = Y = F a.s. Proof. We will take := (0, 1) , B = B(0,1) , and P = m Lebesgue measure on and let Yn := Fn and Y := F0 as in Notation 13.27. Because of the above comments, P (Yn y ) = Fn (y ) and P (Y y ) = F0 (y ) for all y R. So in
Page: 150 job: prob
Denition 13.29. Two random variables, Y and Z, are said to be of the same type if there exists constants, A > 0 and B R such that Z = AY + B.
d
(13.13)
Alternatively put, if U (y ) := P (Y y ) and V (y ) := P (Z y ) , then U and V should satisfy, U (y ) = P (Y y ) = P (Z Ay + B ) = V (Ay + B ) . For the next theorem we will need the following elementary observation. Lemma 13.30. If Y is non-constant (a.s.) random variable and U (y ) := P (Y y ) , then U (1 ) < U (2 ) for all 1 suciently close to 0 and 2 suciently close to 1. Proof. Observe that Y is constant i U (y ) = 1yc for some c R, i.e. i U only takes on the values, {0, 1} . So since Y is not constant, there exists y R such that 0 < U (y ) < 1. Hence if 2 > U (y ) then U (2 ) y and if 1 < U (y ) then U (1 ) y. Moreover, if we suppose that 1 is not the height of a at spot of U, then in fact, U (1 ) < U (2 ) . This inequality then remains valid as 1 decreases and 2 increases.
macro: svmonob.cls date/time: 23-Feb-2007/15:20
151
Theorem 13.31 (Convergence of Types). Suppose is a sequence of random variables and an , n (0, ) , bn , n R are constants and Y and Z are non-constant random variables. Then 1. if X n bn = Y an and Xn n = Z, n n n bn A = lim (0, ) and B := lim n an n an
d
{Xn }n=1
X n bn y an
= Fn (an y + bn ) and P
Xn n y n
= Fn (n y + n ) .
(13.14)
Fn (x) bn . an
exists and Y = AZ + B. 2. If the relations in Eq. (13.16) hold then either of the convergences in Eqs. (13.14) or (13.15) implies the others with Z and Y related by Eq. (13.13). 3. If there are some constants, an > 0 and bn R and a non-constant random variable Y, such that Eq. (13.14) holds, then Eq. (13.15) holds using n and n of the form,
n := Fn (2 ) Fn (1 ) and n := Fn (1 )
Fn (x) n . n With these identities, it now follows from the proof of Skorohods Theorem 13.28 (see Eq. (13.12)) that there exists an at most countable subset, , of (0, 1) such that,
sup {y : Fn (n y + n ) < x} =
(x) bn Fn = sup {y : Fn (an y + bn ) < x} U (x) and an (x) n Fn = sup {y : Fn (n y + n ) < x} V (x) n
(13.17)
for some 0 < 1 < 2 < 1. If the Fn are invertible functions, Eq. (13.17) may be written as Fn (n ) = 1 and Fn (n + n ) = 2 . (13.18)
for all x / . Since Y and Z are not constants a.s., we can choose, by Lemma 13.30, 1 < 2 not in such that U (1 ) < U (2 ) and V (1 ) < V (2 ) . In particular it follows that
F (2 ) bn F (1 ) bn Fn (2 ) Fn (1 ) = n n an an an U (2 ) U (1 ) > 0
(13.19)
Proof. (2) Assume the limits in Eq. (13.16) hold. If Eq. (13.14) is satised, then by Slutskys Theorem 13.20, Xn bn + bn n an Xn n = n an n Xn bn an n bn an = an n an n 1 = A (Y B ) =: Z Similarly, if Eq. (13.15) is satised, then Xn n n n bn X n bn = + = AZ + B =: Y. an n an an (1) If Fn (y ) := P (Xn y ) , then
and similarly
Fn (2 ) Fn (1 ) V (2 ) V (1 ) > 0. n
Taking ratios of the last two displayed equations shows, U (2 ) U (1 ) n A := (0, ) . an V (2 ) V (1 ) Moreover,
Fn (1 ) bn U (1 ) and an Fn (1 ) n F (1 ) n n = n AV (1 ) an n an
(13.20)
Page: 151
job: prob
macro: svmonob.cls
date/time: 23-Feb-2007/15:20
152
and therefore, n bn F (1 ) n F (1 ) bn = n n AV (1 ) U (1 ) := B. an an an
Fn (1 ) and n := (2 ) Fn (3) Now suppose that we dene n := Fn (1 ) , then according to Eqs. (13.19) and (13.20)we have
From this it follows that bn 1 ln n. Given this, we now try to nd an by requiring, P Mn b n 1 an = Fn (an + bn ) = [F (an + bn )] 2 (0, 1) .
n
n /an U (2 ) U (1 ) (0, 1) and n bn U (1 ) as n . an Thus we may always center and scale the {Xn } using n and n of the form described in Eq. (13.17).
However, by what we have done above, this requires an + bn 1 ln n. Hence we may as well take an to be constant and for simplicity we take an = 1. 2. We now compute
n
1 e(x+ 1 ex n
ln n)
= exp ex .
Notice that F (x) is a distribution function for some random variable, Y, and therefore we have shown Mn 1 ln n = Y as n
where P (Y x) = exp ex . Example 13.33. For p (0, 1) , let Xp denote the number of trials to get success in a sequence of independent trials with success probability p. Then n P (Xp > n) = (1 p) and therefore for x > 0, P (pXp > x) = P Xp >
x
=
j =1
P (Xj x) = [F (x)] = 1 ex
Mn bn an
x p
x x = (1 p)[ p ] = e[ p ] ln(1p)
ep[ p ] ex as p 0. = Y. Therefore pXp = T where T = exp (1) , i.e. P (T > x) = ex for x 0 or alternatively, P (T y ) = 1 ey0 . Remarks on this example. Let us see in a couple of ways where the appropriate centering and scaling of the Xp come from in this example. For n1 this let q = 1 p, then P (Xp = n) = (1 p) p = q n1 p for n N. Also let Fp (x) = P (Xp x) = P (Xp [x]) = 1 q [x]
n d
We now wish to nd an > 0 and bn R such that 1. To this end we note that P Mn b n x an = P (Mn an x + bn )
= Fn (an x + bn ) = [F (an x + bn )] . If we demand (c.f. Eq. (13.18) above) P Mn bn 0 an = Fn (bn ) = [F (bn )] 1 (0, 1) ,
where [x] := n=1 n 1[n,n+1) . Method 1. Our goal is to choose ap > 0 and bp R such that limp 0 Fp (ap x + bp ) exists. As above, we rst demand (taking x = 0) that
p 0
macro: svmonob.cls
date/time: 23-Feb-2007/15:20
153
Since, 1 Fp (bp ) 1 q bp we require, q bp 1 1 and hence, c bp ln q = bp ln (1 p) bp p. This suggests that we take bp = 1/p say. Having done this, we would like to choose ap such that F0 (x) := lim Fp (ap x + bp ) exists.
p 0
|z=1
pz . 1 qz
Since, F0 (x) Fp (ap x + bp ) 1 q this requires that (1 p) and hence that ln (1 F0 (x)) = (ap x + bp ) ln q (ap x + bp ) (p) = pap x 1. From this (setting x = 1) we see that pap c > 0. Hence we might take ap = 1/p as well. We then have
p1 (x+1)] Fp (ap x + bp ) = Fp p1 x + p1 = 1 (1 p)[ ap x+bp ap x+bp
d pz p (1 qz ) + qpz p = = 2 2 dz 1 qz (1 qz ) (1 qz ) d2 pz pq =2 3 dz 2 1 qz (1 qz ) p
2
and =q
ap x+bp
p := EXp =
1 and p
p1 (x + 1) ln (1 p) exp ( (x + 1)) .
2q 1 + p2 p
1 p
q 1p 2q + p 1 = 2 = . 2 p p p2
Thus, if we had used p and p to center and scale Xp we would have considered, Xp
1p p 1 p
pXp 1 = T 1 = 1p
or again that pXp = T. Method 2. (Center and scale using the rst moment and the variance of Xp .) The generating function is given by
instead. Theorem 13.34. Let {Xn }n=1 be i.i.d. random variables such that P (Xn = 1) = 1/2 and let Sn := X1 + + Xn the position of a drunk after n steps. Observe that |Sn | is an odd integer if n is odd and an even Sm integer if n is even. Then = N (0, 1) as m . m Proof. (Sketch of the proof.) We start by observing that S2n = 2k i # {i 2n : Xi = 1} = n + k while # {i 2n : Xi = 1} = 2n (n + k ) = n k and therefore,
macro: svmonob.cls date/time: 23-Feb-2007/15:20
f (z ) := E z Xp =
n=1
z n q n1 p =
pz . 1 qz
Observe that f (z ) is well dened for |z | < 1 q and that f (1) = 1, reecting the fact that P (Xp N) = 1, i.e. a success must occur almost surely. Moreover, we have f (z ) = E Xp z f
(k ) Xp 1
, f (z ) = E Xp (Xp 1) z
Xp k
Xp 2
,...
(z ) = E Xp (Xp 1) . . . (Xp k + 1) z
Page: 153
job: prob
154
P (S2n = 2k ) =
2n n+k
1 2
2n
(2n)! (n + k )! (n k )!
1 2
2n
2k with k {0, 1, . . . , n} . Since where the sum is over x of the form, x = 2n 2 is the increment of x as k increases by 1 , we see the latter expression in 2n Eq. (13.21) is the Riemann sum approximation to
1 2 This proves
S2n 2n
b a
ex
/2
dx.
= N (0, 1) . Since 1 1+
1 2n
(2n) n (n + k ) (n k ) 1 1+ 1
k n
2n 2n
4n
nk (nk) k) e
2 (n + k ) (n 1+ k n
(n+k)
2 (n k )
1 2
2n
S S n + X2n+1 S2 n 2n+1 = 2 = 2n + 1 2n + 1 2n
X2n+1 + , 2n + 1
S 2n+1 2n+1
1 k2 n2
n
k n k n
(nk)
1 = n 1 = n
1
n
k n
1 k n
1+ 1
1
k1/2
k n
= N (0, 1)
k2 n2
k1/2
1+
k n
.
x , 2n
Proposition 13.35. Suppose that {Un }n=1 are i.i.d. random variables which are uniformly distributed in (0, 1) . Let U(k,n) denote the position of the k th largest number from the list, {U1 , U2 , . . . , Un } . Further let k (n) be chosen so n) that limn k (n) = while limn k( n = 0 and let Xn := U(k(n),n) k (n) /n .
k ( n) n
we have
n/21/2
Then dT V (Xn , N (0, 1)) 0 as n . Proof. (Sketch only. See Resnick, Proposition 8.2.1 for more details.) Observe that, for x (0, 1) , that
n n
P U(k,n) x = P
i=1
Xi k
=
l=k
n l nl x (1 x) . l
d From this it follows that n (x) := 1(0,1) (x) dx P U(k,n) x is the probability density for U(k,n) . It now turns out that n (x) is a Beta distribution,
n (x) = P S 2n = x 2n ex
axb
2
n nk k xk1 (1 x) . k
1 = 2
/2
2 2n
(13.21)
Giving a direct computation of this result is not so illuminating. So let us go another route. To do this we are going to estimate, P U(k,n) (x, x + ] , for (0, 1) . Observe that if U(k,n) (x, x + ], then there must be at least one Ui (x, x + ], for otherwise, U(k,n) x + would imply U(k,n) x as well and hence U(k,n) / (x, x + ]. Let
macro: svmonob.cls date/time: 23-Feb-2007/15:20
Page: 154
job: prob
155
i := {Ui (x, x + ] and Uj / (x, x + ] for j = i} . Since P (Ui , Uj (x, x + ] for some i = j with i, j n)
i<j n
we arrive at n n! (k 1)! (n k )! 2 1
k (k1/2) n k (nk+1/2) n
P (Ui , Uj (x, x + ]) n2 n 2 , 2
we see that
n
P U(k,n) (x, x + ] =
i=1
= nP U(k,n) (x, x + ], 1 + O 2 . Now on the set, 1 ; U(k,n) (x, x + ] i there are exactly k 1 of U2 , . . . , Un in [0, x] and n k of these in [x + , 1] . This leads to the conclusion that P U(k,n) n 1 k1 nk (x, x + ] = n x (1 (x + )) + O 2 k1
k ( n) dx, n
x = k (n) at u = 0, and
n k (n)
k (n) n
n k (n)
k (n) n
=: bn ,
and therefore, P U(k,n) (x, x + ] n! nk = xk1 (1 x) . n (x) = lim 0 (k 1)! (n k )! By Stirlings formula, n! (k 1)! (n k )! (k 1) 1 ne = 2 1 ne = 2 Since k1 n
(k1/2) 1
E [F (Xn )] =
0
u k ( n ) /n du n (u) F
k ( n) n
bn
= nn en 2n 2 (k 1) (n k ) 1
k1 n nk (nk) n (nk) (nk) e
k ( n)
k (n) n n
Using this information, it is then shown in Resnick that 2 (n k ) k (n) n n k (n) x + k (n) /n n ex /2 2
2
(k1) (k1) e
k1 (k1) n
nk n
1
k1 (k1/2) n
. Remark 13.36. It is possible to understand the normalization constants in the denition of Xn by computing the mean and the variance of U(n,k) . After some computations (see Chapter ??), one arrives at
= =
k n k n
(k1/2)
(k1/2)
k1 k 1 k
(k1/2)
(k1/2)
1 k n
(k1/2)
e1
Page: 155
job: prob
macro: svmonob.cls
date/time: 23-Feb-2007/15:20
156
EU(k,n) = =
2 EU( k,n) =
= Var U(k,n) = =
for all . If, for all x R, we dene F = G+ as in Eq. (13.22), then Fn (x) F (x) for all x C (F ) . (Note well; as we have already seen, it is possible that F () < 1 and F () > 0 so that F need not be a distribution function for a measure on (R, BR ) .) Proof. Suppose that x, y R with x < y and and s, t are chosen so that x < s < y < t. Then passing to the limit in the inequality, Fn (s) Fn (y ) Fn (t) implies F (x) = G+ (x) G (s) lim inf Fn (y ) lim sup Fn (y ) G (t) .
n n
Taking the innum over t (y, ) and then letting x R tend up to y, we may conclude F (y ) lim inf Fn (y ) lim sup Fn (y ) F (y ) for all y R.
n n
This completes the proof, since F (y ) = F (y ) for y C (F ) . , BR The next theorem deals with weak convergence of measures on R . So with as not have to introduce any new machinery, the reader should identify R [1, 1] R via the map, [1, 1] x tan . x R 2
Lemma 13.37. If G : R is a non-decreasing function, then F (x) := G+ (x) := inf {G () : x < } is a non-decreasing right continuous function. Proof. To show F is right continuous, let x R and such that > x. Then for any y (x, ) , F (x) F (y ) = G+ (y ) G () and therefore, F (x) F (x+) := lim F (y ) G () .
y x
(13.22)
, BR Hence a probability measure on R may be identied with a probability measure on (R, BR ) which is supported on [1, 1] . Using this identication, we see that a should only be considered a point of continuity of a distribution [0, 1] i and only if F () = 0. On the other hand, is function, F : R always a point of continuity. Theorem 13.39 (Hellys Selection Theorem). Every sequence of probabil , BR ity measures, {n }n=1 , on R has a sub-sequence which is weakly conver , BR gent to a probability measure, 0 on R . Proof. Using the identication described above, rather than viewing n as , BR probability measures on R , we may view them as probability measures on (R, BR ) which are supported on [1, 1] , i.e. n ([1, 1]) = 1. As usual, let Fn (x) := n ((, x]) = n ((, x] [1, 1]) . Since {Fn (x)}n=1 [0, 1] and [0, 1] is compact, for each x R we may nd a convergence subsequence of {Fn (x)}n=1 . Hence by Cantors diagonalization
macro: svmonob.cls date/time: 23-Feb-2007/15:20
Since > x with is arbitrary, we may conclude, F (x) F (x+) G+ (x) = F (x) , i.e. F (x+) = F (x) . Proposition 13.38. Suppose that {Fn }n=1 is a sequence of distribution functions and R is a dense set such that G () := limn Fn () [0, 1] exists
Page: 156 job: prob
157
argument we may nd a subsequence, {Gk := of the such that G (x) := limk Gk (x) exists for all x := Q. Letting F (x) := G (x+) as in Eq. (13.22), it follows from Lemma 13.37 and Proposition 13.38 that Gk = Fnk = F0 . Moreover, since Gk (x) = 0 for all x Q (, 1) and Gk (x) = 1 for all x Q [1, ). Therefore, F0 (x) = 1 for all x 1 and F0 (x) = 0 for all x < 1 and the corresponding measure, 0 is supported on [1, 1] . Hence 0 may now be transferred back to a measure , BR on R . Example 13.40. Suppose n = and n = and 1 2 (n + n ) = 1 ( + ) . This shows that probability may indeed transfer to the points 2 at . The next question we would like to address is when is the limiting measure, , BR 0 on R concentrated on R. The following notion of tightness is the key to answering this question. Denition 13.41. A collection of probability measures, , on (R, BR ) is tight i for every > 0 there exists M < such that
Fnk }k=1
{Fn }n=1
we may nd M < such that M , M C (F0 ) and n ([M , M ]) 1 for all n. Hence it follows that 0 ([M , M ]) = lim nk ([M , M ]) 1
k
and by letting 0 we conclude that 0 (R) = lim0 0 ([M , M ]) = 1. Conversely, suppose there is a subsequence {nk }k=1 such that nk = 0 , BR with 0 being a probability measure on R such that 0 (R) < 1. In this case 0 := 0 ({, }) > 0 and hence for all M < we have 0 ({, }) = 1 0 . 0 ([M, M ]) 0 R By choosing M so that M and M are points of continuity of F0 , it then follows that lim nk ([M, M ]) = 0 ([M, M ]) 1 0 .
k
Therefore,
nN
inf ([M , M ]) 1 .
(13.23)
We further say that a collection of random variables, {X : } is tight 1 i the collection probability measures, P X : is tight. Equivalently put, {X : } is tight i
M
(13.24)
Observe that the denition of uniform integrability (see Denition 11.25) is considerably stronger than the notion of tightness. It is also worth observing that if > 0 and C := sup E |X | < , then by Chebyschevs inequality, 1 C sup P (|X | M ) sup E |X | 0 as M M M and therefore {X : } is tight. Theorem 13.42. Let := {n }n=1 be a sequence of probability measures on , BR (R, BR ) . Then is tight, i every subsequently limit measure, 0 , on R is supported on R. In particular if is tight, there is a weakly convergent subsequence of converging to a probability measure on (R, BR ) . Proof. Suppose that nk = 0 with 0 being a probability measure on , BR R . As usual, let F0 (x) := 0 ([, x]) . If is tight and > 0 is given,
Page: 157 job: prob
4. lim inf n Pn (G) P (G) for all G o X. 5. limn Pn (A) = P (A) for all A B such that P (bd(A)) = 0.
macro: svmonob.cls
date/time: 23-Feb-2007/15:20
158
i=1
(i 1) 1{ (i1) f < i } f k k k
i=1
i 1 (i1) i . k { k f < k }
(13.26)
(13.25)
Let Fi :=
k
i k
and let fn (x) := (nd(x, F )). Then fn BC (X, [0, 1]) is uniformly continuous, 0 1F fn for all n and fn 1F as n . Passing to the limit n in the equation 0 Pn (F ) Pn (fm ) gives 0 lim sup Pn (F ) P (fm )
n
i=1
(i 1) [P (Fi1 ) P (Fi )] P (f ) k
i=1
i [P (Fi1 ) P (Fi )] . k
(13.27)
and then letting m in this inequality implies item 3. 3. 4. Assuming item 3., let F = Gc , then 1 lim inf Pn (G) = lim sup(1 Pn (G)) = lim sup Pn (Gc )
n n c n
=
i=1 k1
P (G ) = 1 P (G) \ Ao , which implies 4. Similarly 4. = 3. 3. 5. Recall that bd(A) = A so if P (bd(A)) = 0 and 3. (and hence also 4. holds) we have ) P (A ) = P (A) and lim sup Pn (A) lim sup Pn (A
n n
=
i=1
P (Fi )
i=1
= X and =
i=1 i=1 k1
from which it follows that limn Pn (A) = P (A). Conversely, let F set F := {x X : (x, F ) } . Then bd(F ) F \ {x X : (x, F ) < } = A
P (Fi ) P (f )
i=1
1 k
k1
P (Fi ) + 1/k.
i=1
and in particular the set := { > 0 : P (A ) > 0} is at most countable. Let n / be chosen so that n 0 as n , then P (Fm ) = lim Pn (Fm ) lim sup Pn (F ).
n n
Using this equation with P = Pn and then with P = P we nd lim sup Pn (f ) lim sup
n n k1
1 k
k1
Pn (Fi ) + 1/k
i=1
Let m in this equation to conclude P (F ) lim supn Pn (F ) as desired. To nish the proof we will now show 3. = 1. By an ane change of variables it suces to consider f C (X, (0, 1)) in which case we have
Page: 158 job: prob macro: svmonob.cls
1 k
date/time: 23-Feb-2007/15:20
159
Since k is arbitrary, lim supn Pn (f ) P (f ). Replacing f by 1 f in this inequality also gives lim inf n Pn (f ) P (f ) and hence we have shown limn Pn (f ) = P (f ) as claimed. Theorem 13.45 (Skorohod Theorem). Let (X, d) be a separable metric space and {n }n=0 be probability measures on (X, BX ) such that n = 0 as n . Then there exists a probability space, (, B , P ) and measurable func1 tions, Yn : X, such that n = P Yn for all n N0 := N {0} and limn Yn = Y a.s. Proof. See Theorem 4.30 on page 79 of Kallenberg [3]. Denition 13.46. Let X be a topological space. A collection of probability measures on (X, BX ) is said to be tight if for every > 0 there exists a compact set K BX such that P (K ) 1 for all P . Theorem 13.47. Suppose X is a separable metrizable space and = {Pn }n=1 is a tight sequence of probability measures on BX . Then there exists a subse quence {Pnk }k=1 which is weakly convergent to a probability measure P on BX . Proof. First suppose that X is compact. In this case C (X ) is a Banach space which is separable by the Stone Weirstrass theorem, see Exercise ??. By the Riesz theorem, Corollary ??, we know that C (X ) is in one to one correspondence with the complex measures on (X, BX ). We have also seen that C (X ) is metrizable and the unit ball in C (X ) is weak - * compact, see Theo rem ??. Hence there exists a subsequence {Pnk }k=1 which is weak -* convergent to a probability measure P on X. Alternatively, use the cantors diagonaliza tion procedure on a countable dense set C (X ) so nd {Pnk }k=1 such that (f ) := limk Pnk (f ) exists for all f . Then for g C (X ) and f , we have |Pnk (g ) Pnl (g )| |Pnk (g ) Pnk (f )| + |Pnk (f ) Pnl (f )| + |Pnl (f ) Pnl (g )| 2 g f + |Pnk (f ) Pnl (f )| which shows lim sup |Pnk (g ) Pnl (g )| 2 g f
n
n (A) := P n (A X ) for all A BX by setting P . By what we have just proved, := P n such that P converges weakly to a there is a subsequence P k k =1 k k probability measure P on X. The main thing we now have to prove is that (X ) = 1, this is where the tightness assumption is going to be used. Given P n (K ) 1 for all n. Since > 0, let K X be a compact set such that P K is compact in X it is compact in X as well and in particular a closed subset Therefore by Proposition 13.44 of X. (K ) lim sup P (K ) = 1 . P k
k
Since > 0 is arbitrary, this shows with X0 := n=1 K1/n satises P (X0 ) = 1. , we may view P as a measure on B by letting P (A) := Because X0 BX BX X (A X0 ) for all A BX . Given a closed subset F X, choose F X such P X. Then that F = F (F ) P (F ) = P (F X0 ) = P (F ), lim sup Pk (F ) = lim sup P k
k k
which shows Pk P.
Letting f tend to g in C (X ) shows lim supn |Pnk (g ) Pnl (g )| = 0 and hence (g ) := limk Pnk (g ) for all g C (X ). It is now clear that (g ) 0 for all g 0 so that is a positive linear functional on X and thus there is a probability measure P such that (g ) = P (g ). General case. By Theorem 18.34 we may assume that X is a subset of We now extend Pn to X a compact metric space which we will denote by X.
Page: 159 job: prob macro: svmonob.cls date/time: 23-Feb-2007/15:20
d ( ) (x) =
Rn
u (x y ) d (y ) dx.
eix d (x)
be the Fourier transform or characteristic function of . If X = (X1 , . . . , Xn ) : Rn is a random vector on some probability space (, B , P ) , then we let f () := fX () := E eiX . Of course, if := P X 1 , then fX () = () . Notation 14.2 Given a measure on a measurable space, (, B ) and a function, f L1 () , we will often write (f ) for f d. Denition 14.3. Let and be two probability measure on (Rn , BRn ) . The 1 convolution of and , denoted , is the measure, P (X + Y ) where 1 {X, Y } are two independent random vectors such that P X = and P Y 1 = . Of course we may give a more direct denition of the convolution of and by observing for A BRn that (A) = P (X + Y A) =
Rn
u (x y ) v (y ) dy dx.
u (x y ) v (y ) dy =
Rn
v (x y ) u (y ) dy.
Example 14.5. Suppose that n = 1, d (x) = 1[0,1] (x) dx and d (x) = 1[1,0] (x) dx so that (A) = (A) . In this case d ( ) (x) = 1[0,1] 1[1,0] (x) dx where 1[0,1] 1[1,0] (x) =
R
d (x)
Rn
d (y ) 1A (x + y )
1[0,1]+x (y ) 1[0,1] (y ) dy
=
Rn
(A x) d (x) (A x) d (x) .
Rn
Remark 14.4. Suppose that d (x) = u (x) dx where u (x) 0 and u (x) dx = 1. Then using the translation invariance of Lebesgue meaRn sure and Tonellis theorem, we have (f ) =
Rn Rn
f (x + y ) u (x) dxd (y ) =
Rn Rn
f (x) u (x y ) dxd (y )
k 0 for all (1 , . . . , m ) Cm . f (j k ) j
j,k=1
162
Notation 14.7 For l N {0} , let C l (Rn , C) denote the vector space of functions, f : Rn C which are l - time continuously dierentiable. More explicitly, , then f C l (Rn , C) i the partial derivatives, j1 . . . jk f, exist if j := x j and are continuous for k = 1, 2, . . . , l and all j1 , . . . , jk {1, 2, . . . , n} . Proposition 14.8 (Basic Properties of ). Let and be two probability measures on (Rn , BRn ) , then; 1. (0) = 1, and | ()| 1 for all . 2. () is continuous. 3. () = () for all Rn and in particular, is real valued i is symmetric, i.e. i (A) = (A) for all A BRn . (If = P X 1 for some random vector X, then is symmetric i X = X.) 4. is a positive denite function. (For the converse of this result, see Bochners Theorem 14.41 below. l 5. If Rn x d (x) < , then C l (Rn , C) and j1 . . . jm () =
Rn d
k = (j k ) j
j,k=1 Rn j,k=1 m
=
Rn j =1
ij x
d (x) 0.
Example 14.9 (Example 14.5 continued.). Let d (x) = 1[0,1] (x) dx and (A) = (A) . Then () = ei 1 , i 0 ei 1 () = () = () = , and i eix dx =
2 1
6. If X and Y are independent random vectors then fX +Y () = fX () fY () for all Rn . This may be alternatively expressed as () = () () for all Rn . 7. If a R, b Rn , and X : Rn is a random vector, then faX +b () = eib fX (a) . Proof. The proof of items 1., 2., 6., and 7. are elementary and will be left to the reader. It also easy to see that () = () and () = () if is symmetric. Therefore if is symmetric, then () is real. Conversely if () is real then () = () = eix d (x) = ()
Rn
() = () () = | ()| =
ei 1 i
2 [1 cos ] . 2
eix (1 |x|)+ dx =
R
=2
0
(1 x) cos x dx = 2
0 1 1 0
(1 x) d
sin x
sin x = 2 d (1 x) =2 0 1 cos . =2 2
Proposition 14.10 (Injectivity of the Fourier Transform). If and are two probability measure on (Rn , BRn ) such that = , then = . Proof. Let H be the subspace of bounded measurable complex functions, f : Rn C, such that (f ) = (f ) . Then H is closed under bounded convergence and complex conjugation. Suppose that Zd is a nite set, L > 0 and p (x) =
where (A) := (A) . The uniqueness Proposition 14.10 below then implies = , i.e. is symmetric. This proves item 3. Item 5. follows by induction using Corollary 8.38. For item 4. let m N, m {j }j =1 Rn and (1 , . . . , m ) Cm . Then
a eix/(2L)
(14.4)
Page: 162
job: prob
macro: svmonob.cls
date/time: 23-Feb-2007/15:20
163
2L
2L
= (p)
Proof. This will be proved by induction on m. We start with m = 0 in which case we automatically we know by Proposition 14.8 or Lemma 14.11 that f C (R, C)). Since u () := Re f () = E [cos (X )] , it follows that u is an even function of and hence u = Re f is an odd function of and in particular, u (0) = 0. By the mean value theorem, to each > 0 with near 0, there exists 0 < c < such that u () u (0) = u (c ) = u (c ) u (0) . Therefore, u (0) u () u (c ) u (0) = u (0) as 0. c c Since E and lim0 1 cos (X ) u (0) u () 1 cos (X ) E = 2 c c
2 =1 2 X , we may apply Fatous lemma to conclude,
so that p H. From the Stone-Weirstrass theorem (see Exercise 14.7 below) or the theory of the Fourier series, any f C (Rn , C) which is L periodic, (i.e. f (x + Lei ) = f (x) for all x Rd and i = 1, 2, . . . , n) may be uniformly approximated by a trigonometric polynomial of the form in Eq. (14.4), see Exercise 14.8 below. Hence it follows from the bounded convergence theorem that f H for all f C (Rn , C) which are L periodic. Now suppose f Cc (Rn , C) . Then for L > 0 suciently large the function, fL (x) :=
Zn
f (x + L) ,
is continuous and L periodic and hence fL H. Since fL f boundedly as L , we may further conclude that f H as well, i.e. Cc (Rn , C) H. An application of the multiplicative system Theorem (see either Theorem 9.3 or Theorem 9.14) implies H contains all bounded (Cc (Rn , R)) = BRn measurable functions and this certainly implies = . For the most part we are now going to stick to the one dimensional case, i.e. X will be a random variable and will be a probability measure on (R, BR ) . The following Lemma is a special case of item 4. of Proposition 14.8. Lemma 14.11. Suppose n N and X is random variables such that E [|X | ] < () := E eiX is C n . If = P X 1 is the distribution of X, then dierentiable and (l) () = E (iX ) eiX =
R l n
1cos(X ) 2
1 1 cos (X ) E X 2 lim inf E u (0) < . 0 2 2 An application of Lemma 14.11 then implies that f C 2 (R, C) . For the general induction step we assume the truth of the theorem at level m in which case we know by Lemma 14.11 that f (2m) () = (1) E X 2m eiX =: (1) g () . By assumption we know that g is dierentiable in a neighborhood of 0 and that g (0) exists. We now proceed exactly as before but now with u := Re g. So for each > 0 near 0, there exists c (0, ) such that u (0) u () u (0) as 0 c and E X 2m 1 cos (X ) 1 cos (X ) u (0) u () E X 2m = . 2 c c
m m
The following theorem is a partial converse to this lemma. Hence the combination of Lemma 14.11 and Theorem 14.12 (see also Corollary 14.34 below) shows that there is a correspondence between the number of moments of X and the dierentiability of fX . Theorem 14.12. Let X be a random variable, m {0, 1, 2, . . . } , f () = E eiX . If f C 2m (R, C) such that g := f (2m) is dierentiable in a neighborhood of 0 and g (0) = f (2m+2) (0) exists. Then E X 2m+2 < and f C 2m+2 (R, C) .
Page: 163 job: prob
Another use of Fatous lemma gives, 1 1 cos (X ) E X 2m+2 = lim inf E X 2m u (0) < 0 2 2 from which Lemma 14.11 may be used to show f C 2m+2 (R, C) . This completes the induction argument.
macro: svmonob.cls date/time: 23-Feb-2007/15:20
164
14.2 Examples
Example 14.13. If < a < b < and d (x) = () = 1 ba
b 1 ba 1[a,b]
FT (t) := P (T t) = 1 ea(t0) . Since FT (t) is piecewise dierentiable, the law of T, := P T 1 , has a density, (x) dx then Therefore, E eiaT =
0
eix dx =
a
eib eia . i (b a)
aeat eit dt = a
a = () . a i a (a i)
3
(a i)
and () = 2
1 () = 1 2 c2 + . . . 3!
xd (x) = 0 and
R
1 x d (x) = c2 . 3 R
2
= a2 .
2
Example 14.14. Suppose Z is a Poisson random variable with mean a > 0, i.e. n P (Z = n) = ea a n! . Then
/2
/2
. In partic-
fZ () = E eiZ = ea
n=0
ein
aei an = ea n! n! n=0
= exp a ei 1
/2 ix
dx,
fZ () = a2 ei2 aei exp a ei 1 from which we conclude, 1 EZ = fZ (0) = a and EZ 2 = fZ (0) = a2 + a. i Therefore, EZ = a = Var (Z ) . Example 14.15. Suppose T is a positive random variable such that P (T t + s|T s) = P (T t) for all s, t 0, or equivalently P (T t + s) = P (T t) P (T s) for all s, t 0, then P (T t) = eat for some a > 0. (Such exponential random variables are often used to model waiting times.) The distribution function for T is
/2 ix
dx
d x2 /2 ix e e dx dx R 2 d ex /2 eix dx = () . dx R
/2
(0) = e
/2
(R) = e
/2
Page: 164
job: prob
macro: svmonob.cls
date/time: 23-Feb-2007/15:20
165
Example 14.17. If is a probability measure on (R, BR ) and n N, then n is the characteristic function of the probability measure, namely the measure
n times
:= .
(14.5)
1 Alternatively put, if {Xk }k=1 are i.i.d. random variables with = P Xk , then n fX1 ++Xn () = fX () . 1
(1 fX ()) d =
2/a
a 2
2/a
(1 Re fX ()) d
2/a
(14.6)
Example 14.18. Suppose that {n }n=0 are probability measure on (R, BR ) and {pn }n=0 [0, 1] such that n=0 pn = 1. Then n=0 pn n is the characteristic function of the probability measure,
Proof. Recall that the Fourier transform of the uniform distribution on c [c, c] is sin c and hence 1 2c Therefore, 1 2c where
c
fX () d =
c
1 2c
E eiX d = E
c
sin cX . cX
:=
n=0
pn n .
. Let {Xn }n=0 {T } be independent P (T = n) = pn for all n N0 . Then
Here is a more interesting interpretation of 1 = n and random variables with P Xn (A) = P (XT A) , where XT ( ) := XT () ( ) . Indeed,
(1 fX ()) d = 1 E
c
sin cX cX
= E [Yc ]
(14.7)
(A) = P (XT A) =
n=0
P (XT A, T = n) =
n=0
P (Xn A, T = n)
sin cX . cX Notice that Yc 0 (see Eq. (14.47)) and moreover, Yc 1/2 if |cX | 2. Hence we may conclude Yc := 1 E [Yc ] E [Yc : |cX | 2] E 1 1 : |cX | 2 = P (|X | 2/c) . 2 2
=
n=0
P (Xn A, T = n) =
n=0
pn n (A) .
() = E eiXT =
n=0
E eiXT : T = n =
n=0
(1 fX ()) d
c
1 P (|X | 2/c) . 2
=
n=0
E eiXn P (T = n) =
n=0
pn n () .
Taking a = 2/c in this estimate proves Eq. (14.6). Theorem 14.21 (Continuity Theorem). Suppose that {n }n=1 is a sequence of probability measure on (R, BR ) and suppose that f () := limn n () exists for all R. If f is continuous at = 0, then f is the characteristic function of a unique probability measure, , on BR and n = as n . Proof. By the continuity of f at = 0, for ever > 0 we may choose a suciently large so that 1 a 2
macro: svmonob.cls
2/a
n is the Example 14.19. If is a probability measure on (R, BR ) then n=0 pn characteristic function of a probability measure, , on (R, BR ) . In this case, = n=0 pn n where n is dened in Eq. (14.5). As an explicit example, if n a , then a > 0 and pn = a n! e
pn n =
n=0
an a n 1) e = ea ea = ea( n ! n=0
is the characteristic function of a probability measure. In other words, fXT () = E eiXT = exp (a (fX1 () 1)) .
Page: 165 job: prob
(1 Re f ()) d /2.
2/a
date/time: 23-Feb-2007/15:20
166
2/a 2/a
1 eiX d = 1
[c,c]
d
(14.9)
(1 Re f ()) d /2.
2/a
where as before, Yc 0 and Yc 1/2 if c |Xj | 2 for some j, i.e. if c |X | 2. Therefore taking expectations of Eq. (14.9) implies, 1 2c
d [c,c]d
Hence n ({x : |x| a }) for all suciently large n, say n N. By increasing a if necessary we can assure that n ({x : |x| a }) for all n and hence := {n }n=1 is tight. By Theorem 13.42, we may nd a subsequence, {nk }k=1 and a probability measure on BR such that nk = as k . Since x eix is a bounded and continuous function, it follows that () = lim nk () = f () for all R,
k
Taking c = 2/a in this expression implies Eq. (14.8). The following lemma will be needed before giving our rst applications of the continuity theorem. Lemma 14.23. Suppose that {zn }n=1 C satises, limn nzn = C, then n lim (1 + zn ) = e .
n Proof. Since nzn , it follows that zn n 0 as n and therefore ln(1+zn ) by Lemma 14.45 below, (1 + zn ) = e and 2 ln (1 + zn ) = zn + O zn = zn + O
that is f is the characteristic function of a probability measure, . We now claim that n = as n . If not, we could nd a bounded continuous function, g, such that limn n (g ) = (g ) or equivalently, there would exists > 0 and a subsequence {k := nk } such that | (g ) k (g )| for all k N. However by Theorem 13.42 again, there is a further subsequence, l = kl of k such that l = for some probability measure . Since () = liml l () = f () = () , it follows that = . This leads to a contradiction since, lim | (g ) l (g )| = | (g ) (g )| = 0.
l
1 n2
Therefore, (1 + zn ) = eln(1+zn )
n n
Remark 14.22. One could also use Bochners Theorem 14.41 to conclude; if f () := limn n () is continuous then f is the characteristic function of a probability measure. Indeed, the condition of a function being positive denite is preserved under taking pointwise limits. Exercise 14.1. Suppose now X : (, B , P ) Rd is a random vector and fX () := E eiX is its characteristic function. Show for a > 0, P (|X | a) 2 a 4
d
Proposition 14.24 (Weak Law of Large Numbers revisited). Suppose P n that {Xn }n=1 are i.i.d. integrable random variables. Then S n EX1 =: . Proof. Let f () := fX1 () = E eiX1 . Then by Taylors theorem, f () = 1 + i + o () . Since, f Sn () = f
n
(1 fX ()) d = 2
[2/a,2/a]d
a 4
d [2/a,2/a]d
(1 Re fX ()) d (14.8)
= 1 + i
+o n
1 n
Page: 166
job: prob
167
lim f Sn () = ei
n
Corollary 14.26. If {Xn }n=1 are 2 that EX1 = 0 and EX1 = 1, then
which is the characteristic function of the constant random variable, . By the n continuity Theorem 14.21, it follows that S and since is constant we n = may apply Lemma 13.19 to conclude
Sn n
sup P
R
S n y P (N (0, 1) y ) 0 as n . n
(14.11)
Theorem 14.25 (The Basic Central Limit Theorem). Suppose that {Xn }n=1 are i.i.d. square integrable random variables such that EX1 = 0 and Sn 2 EX1 = 1. Then = N (0, 1) . n Proof. By Theorem 14.21 and Proposition 14.16, it suces to show
n
Proof. This is a direct consequence of Theorem 14.25 and Exercise 13.3. Berry (1941) and Esse n (1942) showed there exists a constant, C < , such 3 3 that; if := E |X1 | < , then sup P
R
S n y P (N (0, 1) y ) C n
/ n.
lim E e
Sn i n
=e
2 /2
for all R.
Letting f () := E eiX1 , we have by Taylors theorem (see Eq. (14.43) and (14.46)) that 1 (14.10) f () = 1 (1 + ()) 2 2 where () 0 as 0. Therefore,
Sn () = E e f n Sn i n
In particular the rate of convergence is n1/2 . The exact value of the best constant C is still unknown but it is known to be less than 1. We will not prove this theorem here. However we will give a related result in Theorem 14.28 below. Remark 14.27. It is now a reasonable question to ask why is the limiting random variable normal in Theorem 14.25. One way to understand this is, if Sn under the assumptions of Theorem 14.25, we know = L where L is some n 2 random variable with EL = 0 and EL = 1, then S 1 2n = 2n 2
2n k=1, k odd
= f 1+
n n
1 = 1 2
2 n
Xj
/2
2n k=1, k even
Xj
(14.12)
1 = (L1 + L2 ) 2 where L1 = L = L2 and L1 and L2 are independent. To rigorously understand this, using characteristic functions we would conclude from Eq. (14.12) that
S2n () = f Sn f 2n
/2
n 2 e /2n = f n 2 n f e /2n n 1 2 2 =n 1 1+ 1 +O 2 n 2n n
Sn f
0 as n . = 1 1 2 2
n 2
2n 1+ 2
n
f () = f
Page: 167
job: prob
macro: svmonob.cls
date/time: 23-Feb-2007/15:20
168
r (x, ) := 2n 1+ 2
n
1 2
f
0
(x + t) (1 t) dt.
2 n
Taking Eq. (14.15) with replaced by and subtracting the results then implies 1 f (x + ) f (x + ) = f (x) ( ) + f (x) 2 2 + (x, ) , (14.16) 2 where | (x, )| = r (x, ) 3 r (x, ) 3 M 3 3 || + | | , 3! (14.17)
= fN (0,1) () .
That is we must have L = N (0, 1) . It is interesting to give another proof of the central limit theorem. For this proof we will assume {Xn }n=1 has third moments. The only property about normal random variables that we shall use the proof is that if {Nn }n=1 are i.i.d. standard normal random variables, then T N + + Nn d n := 1 = N (0, 1) . n n Theorem 14.28 (A Non-Characteristic Proof of the CLT). Suppose that 3 {Xn }n=1 are mean zero variance one i.i.d random variables such that E |X1 | < 3 (3) . Then for f C (R) with M := supxR f (x) < , Ef S n n 1 M 3 3 E |N | + |X1 | Ef (N ) n 3!
d
wherein we have used the simple estimate, |r (x, )| M/3!. If we dene Uk := (N1 + + Nk1 + Xk+1 + + Xn ) / n, then Vk = Uk + Nk / n and V = U + X / n. Hence, using Eq. (14.16) with x = Uk , k 1 k k = Nk / n and = Xk / n, it follows that f (Vk ) f (Vk1 ) = f Uk + Nk / n f Uk + Xk / n 1 1 2 2 = f (Uk ) (Nk Xk ) + f (Uk ) Nk Xk + Rk 2n n (14.18) where M 3 3 |Nk | + |Xk | . (14.19) 3! n3/2 2 Taking expectations of Eq. (14.18) using; Eq. (14.19), ENk = 1 = EXk , ENk = 2 1 = EXk and the fact that Uk is independent of both Xk and Nk , we nd |Rk | = |E [f (Vk ) f (Vk1 )]| = |ERk | M 3 3 E |Nk | + |Xk | 3! n3/2
(14.13)
where Sn := X1 + + Xn and N = N (0, 1) . n , Nn Proof. Let X be independent random variables such that Nn = n=1 d n by Xn . Let N (0, 1) and Xn = X1 . To simplify notation, we will denote X Tn := N1 + + Nn and for 0 k n, let Vk := (N1 + + Nk + Xk+1 + + Xn ) / n with the convention that Vn = Sn / n and V0 = Tn / n. Then by a telescoping series argument, it follows that f Sn / n f Tn / n = f (Vn ) f (V0 ) =
n d
ERk
k=1
E |Rk |
Tn d = n
N because,
We now make use of Taylors theorem with integral remainder the form, 1 f (x + ) f (x) = f (x) + f (x) 2 + r (x, ) 3 2 where
Page: 168 job: prob
(14.15)
1 2 = exp n 2 n
= exp 2 /2 = fN () .
For more in this direction the reader is advised to look up Steins method.
macro: svmonob.cls date/time: 23-Feb-2007/15:20
169
([a, b]) =
a
f (x) dx =
a b
1 2
() eix d dx
= =
1 2 1 2
()
a
()
c
1 c 2
()
c
eia eib i
for f, g L2 ([L, L] , dx) . Then it is well known (and fairly elementary 2 to prove) that eL n : n Z is an orthonormal basis for L ([L, L] , dx) . In particular, if f Cc (R) with supp(f ) [L, L] , then for x [L, L] , f (x) =
nZ
This should provide some motivation for Theorem 14.30 below. The following lemma is needed in the proof of the inversion Theorem 14.30 below. Lemma 14.29. For c > 0, let S (c) := 1 2
c c
f, eL n
eL L n
(x) =
1 2L
L nZ L
sin d.
(14.21)
f (y ) ei L y dy ei L x (14.20)
1 = 2L where
nZ
n x n ei L f L
(14.22)
where
() = f
f (y ) eiy dy.
() eix d f
Proof. The rst assertion has already been dealt with in Example 10.12. We will repeat the argument here for the readers convenience. By symmetry and Fubinis theorem, S (c) =
1 c 1 c sin d = sin et dt d 0 0 0 c 1 = dt d sin et 0 0 1 1 1 = + etc [ cos c t sin c] dt, 2 0 1 + t2 c c
nZ
() eix d. f
(14.23)
Hence if we now think that f (x) is a probability density and let d (x) := () , we should expect f (x) dx so that () = f
d sin et = Im
0 0
dei et = Im
0
de(it) e(it)c 1 (i t)
= Im =
e(it)c 1 (i t)
1 Im 1 + t2
Page: 169
job: prob
macro: svmonob.cls
170
1 1 1 dt = . 0 1 + t2 2 The the integral in Eq. (14.23) tends to as c by the dominated convergence theorem. The second assertion in Eq. (14.22) is a consequence of the change of variables, z = y. Theorem 14.30 (Fourier Inversion Formula). If is a probability measure on (R, BR ) and < a < b < , then 1 c 2 lim
c
and
Corollary 14.31. Suppose that is a probability measure on (R, BR ) such that L1 (m) , then d = dm where is a continuous density on R. Proof. The function, (x) := 1 2 () eix d,
R
()
c
eia eib i
c
d = ((a, b)) +
1 ( ({a}) + ({b})) . 2
(x) dx =
a
1 2
dx
a R
d () eix
b
()
eia eib i
d d
=
c R
eix d (x)
c
eia eib i e
ia
1 2 1 = 2 =
d ()
R a
d ()
R c
=
R
d (x)
c c
deix d
c
e i e i
ib
1 = lim 2 c .
()
c
=
R
d (x)
i(ax)
i(bx)
= ((a, b)) +
1 [ ({a}) + ({b})] . 2
Letting a b over a R such that ({a}) = 0 in this identity shows ({b}) = 0 for all b R. Therefore we have shown
b
((a, b]) =
a
d (x)
c c
d Re d
c
=
R
d (x)
Using one of the multiplicative systems theorems, it is now easy to verify that (A) = A (x) dx for all A BR or R hd = R hd for all bounded measurable functions h : R R. This then implies that 0, m a.e., and the d = dm. Example 14.32. Recall from Example 14.9 that eix (1 |x|)+ dx = 2
R
= 2
R
Now letting c in this expression (using the DCT) shows 1 1 lim I (c) = c 2 2 1 = 2 d (x) [sgn(x a) sgn(x b)]
R
1 cos . 2
1 cos ix e d. 2
(14.24)
= ((a, b)) +
1 [ ({a}) + ({b})] . 2
job: prob
This identity could also be veried directly using residue calculus techniques from complex variables.
Page: 170
macro: svmonob.cls
date/time: 23-Feb-2007/15:20
14.5 Exercises
171
Proof. Let u () := Re fX () = E [cos X ] and assume that u C 1 ((2, 2) , C) . Then according to Eq. (14.25) E |X | =
R
1 u () d = 2
||
1 u () d + 2
||>
1 u () d. 2
1=
Making the change of variables, M , in the above integral then shows M= 1 cos (M ) d. 2 1 Re fX () d. 2
1cos d 2
1 u () d = lim 0 2
||
1 u () d. 2
(1 u ()) d 1
||
u () 1 u () 1 | + | 1 u () d +
1 u () d
||
Suppose that we did not know the value of c := still proceed as above to learn E |X | = 1 c
1 Re fX () d. 2
u () 1 u () 1
u ( ) 1 u ( ) 1 . + lim
0 ||
We could then evaluate c by making a judicious choice of X. For example if d X = N (0, 1) , we would have on one hand 1 E |X | = 2 |x| ex
R
2
1 u () d +
u () + u () + u (0) u (0)
/2
2 dx = 2 and so
1
xex
/2
dx =
2 .
||
u () + u () |u ()| d + ||
2 /2
=2
0
|u ()| u () + u () d + < .
2 /2
d 2 c
1 = c
d 1e
R
2 /2
Passing the limit as 0 using the fact that u () is an odd function, we learn 1 u () d = lim 0 2
e
R
/2
d =
1 u () d +
||
||
u () + u ()
from which it follows, again, that c = . Corollary 14.34. Suppose X is a random variable such that u () := fX () continuously dierentiable for (2, 2) for some > 0. We further assume
0
2
0
|u ()| u () + u () d + < .
|u ()| d < .
(14.26)
Then E |X | < and fX C 1 (R, C) . (Since u is even, u is odd and u (0) = 0. Hence if u () were H older continuous for some > 0, then Eq. (14.26) would hold.)
Page: 171 job: prob
14.5 Exercises
Exercise 14.2. For x, R, let
macro: svmonob.cls
date/time: 23-Feb-2007/15:20
172
(, x) :=
ix 1ix e if x = 0 x2
2 1 2
if x = 0.
(It is easy to see that (, 0) = limx0 (, x) and in fact that (, x) is n n smooth in (, x) .) Let {xk }k=1 R \ {0} , {Zk }k=1 {N } be independent random variables with N = N (0, 1) and Zk being Poisson random variables an with mean ak > 0, i.e. P (Zk = n) = eak nk ! for n = 0, 1, 2 . . . . With Y := n x ( Z a ) + N, show k k k=1 k fY () := E eiY = exp
R d
Exercise 14.5 (Exercise 2.3 in [5]). Let be the probability measure on 1 (R, BR ) , such that ({n}) = p (n) = c n2 ln |n| 1|n|2 with c chosen so that 1 p ( n ) = 1 . Show that C ( R , C ) even though R |x| d (x) = . To do nZ this show, 1 cos nt g (t) : n2 ln n
n2
is continuously dierentiable. Exercise 14.6 (Polyas Criterioin [1, Problem 26.3 on p. 305.] and [2, p. 104-107.]). Suppose () is a non-negative symmetric continuous function such that (0) = 1, () is non-increasing and convex for 0. Show () = () for some probability measure, , on (R, BR ) . Solution to Exercise (14.6). Because of the continuity theorem and some simple limiting arguments, it suces to prove the result for a function as pictured in Figure 14.1. From Example 14.32, we know that (1 ||)+ = ()
(, x) d (x)
= 2 0 +
k=1
ak x2 k xk .
(14.27)
Exercise 14.3. To each nite and compactly supported measure, , on (R, BR ) show there exists a sequence {n }n=1 of nitely supported nite measures on (R, BR ) such that n = . Here we say is compactly supported if there exists M < such that ({x : |x| M }) = 0 and we say is nitely supported if there exists a nite subset, R such that (R \ ) = 0. Please interpret n = to mean, f dn
R R
(, x) d (x)
(14.28)
is the characteristic function of a probability measure on (R, BR ) . Here is an outline to follow. (You may nd the calculus estimates in Section 14.8 to be of help.) 1. Show f () is continuous. 2. Now suppose that is compactly supported. Show, using Exercises 14.2, 14.3, and the continuity Theorem 14.21 that exp R (, x) d (x) is the characteristic function of a probability measure on (R, BR ) . 3. For the general case, approximate by a sequence of nite measures with compact support as in item 2.
Fig. 14.1. Here is a piecewise linear convex function. We will assume that dn > 0 for all n and that () = 0 for suciently large. This last restriction may be removed later by a limiting argument.
For a > 0, let a (A) = (aA) in which case a (f ) = f a1 for all bounded measurable f and in particular, a () = a1 . To nish the proof it suces to show that () may be expressed as
macro: svmonob.cls date/time: 23-Feb-2007/15:20
Page: 172
job: prob
173
() =
n=1
pn an () =
n=1
pn 1
an
(14.29)
+
for some an > 0 and pn 0 such that n=1 pn . Indeed, if this is the case we may take, := n=1 pn an . It is pretty clear that we should take an = d1 + + dn for all n N. Since we are assuming () = 0 for large , there is a rst index, N N, such that
N
0 = (aN ) = 1
n=1
dn sn .
(14.30)
Equivalently, for each N N there exists constants CN < such that |f (x)| CN (1 + |x|)N for all x Rn . A function f C (Rn , C) is said to have (at most) polynomial growth if there exists N < such sup (1 + |x|)
N
|f (x)| < ,
() =
n=k
i.e. there exists N N and C < such that |f (x)| C (1 + |x|)N for all x Rn . Denition 14.36 (Schwartz Test Functions). Let S denote the space of functions f C (Rn ) such that f and all of its partial derivatives have rapid decay and let f N, = sup (1 + |x|)N f (x)
xRn
we must require, sk =
pn
n=k
1 for all k an
pk a1 k
so that S = f C (Rn ) : f
N,
Since is convex, we know that sk sk+1 or sk sk+1 for all k and therefore pk 0 and pk = 0 for all k > N. Moreover,
Also let P denote those functions g C (Rn ) such that g and all of its derivatives have at most polynomial growth, i.e. g C (Rn ) is in P i for all multiindices , there exists N < such sup (1 + |x|)
N
pk =
k=1 k=1
ak (sk sk+1 ) =
k=1
ak sk
k=2
ak1 sk
| g (x)| < .
(Notice that any polynomial function on Rn is in P .) sk dk Denition 14.37. A function : Rn C is said to be positive (semi) m denite i the matrices A := {(k j )}k,j =1 are positive denite for all m m N and {j }j =1 Rn . Proposition 14.38. Suppose that : Rn C is said to be positive denite with (0) = 1. If is continuous at 0 then in fact is uniformly continuous on all of Rn . Proof. Taking 1 = x, 2 = y and 3 = 0 in Denition 14.37 we conclude that 1 (x y ) (x) 1 (x y ) (x) 1 (y ) = (x y ) 1 (y ) A := (y x) (x) (y ) 1 (x) (y ) 1
macro: svmonob.cls date/time: 23-Feb-2007/15:20
= a1 s1 +
k=2
sk (ak ak1 ) = d1 s1 +
k=2
=
k=1
sk dk = 1
where the last equality follows from Eq. (14.30). Working backwards with pk d = dened as in Eq. (14.31) it is now easily shown that d n=1 pn 1 an
+
() for / {a1 , a2 , . . . } and since both functions are equal to 1 at = 0 we may conclude that Eq. (14.29) is indeed valid.
Page: 173
job: prob
174
is positive denite. In particular, 0 det A = 1 + (x y ) (y ) (x) + (x) (x y ) (y ) | (x)| | (y )| | (x y )| . Combining this inequality with the identity, | (x) (y )| = | (x)| + | (y )| (x) (y ) (y ) (x) , gives 0 1 | (x y )| + (x y ) (y ) (x) + (x) (x y ) (y ) | (x) (y )| + (x) (y ) + (y ) (x) = 1 | (x y )| | (x) (y )| + (x y ) (y ) (x) (y ) (x) + (x) (x y ) (y ) (x) (y ) = 1 | (x y )| | (x) (y )| + 2 Re (( (x y ) 1) (y ) (x)) 1 | (x y )| | (x) (y )| + 2 | (x y ) 1| . Hence we have | (x) (y )| 1 | (x y )| + 2 | (x y ) 1| = (1 | (x y )|) (1 + | (x y )|) + 2 | (x y ) 1| 4 |1 (x y )| which completes the proof. Lemma 14.39. If C (Rn , C) is a positive denite function, then 1. (0) 0. 2. ( ) = ( ) for all Rn . 3. |( )| (0) for all Rn . 4. For all f S(Rd ), ( )f ( )f ( )dd 0.
Rn Rn 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
0 det
and hence |( )| (0) for all . This proves items 2. and 3. Item 4. follows by approximating the integral in Eq. (14.32) by Riemann sums, ( )f ( )f ( )dd
Rn Rn
= lim 2n
0 , (Zn )[1 ,1 ]n
( )f ( )f ( ) 0.
The details are left to the reader. Lemma 14.40. If is a nite positive measure on BRn , then := C (Rn , C) is a positive denite function. Proof. As has already been observed after Denition ??, the dominated convergence theorem implies C (Rn , C). Since is a positive measure (and hence real), ( ) =
Rn
eix d(x) =
Rn
( ). eix d(x) =
m
From this it follows that for any m N and {j }j =1 Rn , the matrix A := m { (k j )}k,j =1 is self-adjoint. Moreover if Cm ,
m m
j = (k j )k
k,j =1 Rn k,j =1 m
d(x) 0
Proof. Taking m = 1 and 1 = 0 we learn (0) || 0 for all C which proves item 1. Taking m = 2, 1 = and 2 = , the matrix A := (0) ( ) ( ) (0)
Theorem 14.41 (Bochners Theorem). Suppose C (Rn , C) is positive denite function, then there exists a unique positive measure on BRn such that = . Proof. If ( ) = ( ), then for f S we would have f d =
Rn Rn
(f ) d =
Rn
f ( ) ( )d.
date/time: 23-Feb-2007/15:20
175
0 I, f ( )f ( )d for all f S .
f = f
I, I, f
We will now show I is positive in the sense if f S and f 0 then I (f ) 0. For general f S we have I (|f | ) =
Rn 2
(14.34)
( ) |f |
( )d =
Rn
( ) f
( )d f
=
Rn
( )d d = ( )f ( )f
Rn
( )f ( )f ( )d d (14.33)
=
Rn
( )f ( )f ( )d d 0.
2
(Rn , R) where C (K ) is a nite constant for each compact for all f DRn := Cc n subset of R . Because of the estimate in Eq. (14.34), it follows that I |DRn has a unique extension I to Cc (Rn , R) still satisfying the estimates in Eq. (14.34) and moreover this extension is still positive. So by the Riesz Markov Theorem ??, there exists a unique Radon measure on Rn such that such that I, f = (f ) for all f Cc (Rn , R). To nish the proof we must show ( ) = ( ) for all Rn given
/2 t
S and dene
2
(f ) =
Rn
(14.35)
pt (x) := I (pt (x )) = I (
pt (x ) ) pt (x ) S . Using
(Rn , R+ ) be a radial function such f (0) = 1 and f (x) is decreasing Let f Cc as |x| increases. Let f (x) := f (x), then by Theorem ??,
pt (x y )eiy dy =
Rn ix e pt ( )
pt (y )ei(y+x) dy ,
=e
Rn ix t| |2 /2
( )n f (
Rn
It , =
Rn
Rn
)d.
(14.36)
= =
Rn Rn
Because Rn f ( )d = F f (0) = f (0) = 1, we may apply the approximate function Theorem ?? to Eq. (14.36) to nd eix f (x)d(x) ( ) as 0.
Rn
/2
(x)d dx
(14.37)
/2
( ) ( )d = I ( ) as t 0.
On the the other hand, when = 0, the monotone convergence theorem implies (f ) (1) = (Rn ) and therefore (Rn ) = (1) = (0) < . Now knowing the is a nite measure we may use the dominated convergence theorem to concluded (eix f (x)) (eix ) = ( ) as 0 for all . Combining this equation with Eq. (14.37) shows ( ) = ( ) for all Rn .
Hence if 0, then I ( ) = limt0 It , 0. Let K R be a compact set and Cc (R, [0, )) be a function such that = 1 on K. If f Cc (R, R) is a smooth function with supp(f ) K, then 0 f f S and hence
Page: 175 job: prob
macro: svmonob.cls
date/time: 23-Feb-2007/15:20
176
where pn (x) =
(){0,1}d
=
(){0,1}d
(1 xi )
k=1 i=1
1i (k)
xi i
(k )
is a polynomial of degree nd. In fact more is true. Suppose > 0 is given, M = sup {|f (x)| : x K } , and = sup {|f (y ) f (x)| : x, y K and y x } . By uniform continuity of f on K, lim0 = 0. Therefore, |f (x) pn (x)| = E f (x) f ( Sn Sn ) E f (x) f ( ) n n Sn E f (x) f ( ) : Sn x > n Sn + E f (x) f ( ) : Sn x n (14.39)
P (Xn = ) =
i=1 d
(1 xi )
1i
i x i
j is a Bernoulli random variable for all = (1 , . . . , d ) {0, 1} . Since each Xn j with P Xn = 1 = xj , we know that j EXn = x and Var Xn = xj x2 j = xj (1 xj ).
d , 4n2
Sn x n
=
j =1 d
j Sn xj n j Sn n
=
j =1 d
Var
n
j Sn xj n
and therefore, Eq. (14.39) yields the estimate sup |f (x) pn (x)|
xK
=
j =1
Var
d
1 = 2 n j =1
Var
k=1
j Xk
2dM + n2
1 = n
d xj (1 xj ) . 4 n j =1
P
This shows Sn /n x in L2 (P ) and hence by Chebyshevs inequality, Sn /n x P n in and by a continuity theorem, f S f (x) as n . This along with the n dominated convergence theorem shows pn (x) := E f Sn n f (x) as n , (14.38)
Here is a version of the complex Weirstrass approximation theorem. Theorem 14.43 (Complex Weierstrass Approximation Theorem). Suppose that K Cd = Rd Rd is a compact rectangle. Then there exists polynomials in (z = x + iy, z = x iy ) , pn (z, z ) for z Cd , such that supzK |qn (z, z ) f (z )| 0 as n for every f C (K, C) .
Page: 176
job: prob
macro: svmonob.cls
date/time: 23-Feb-2007/15:20
177
Proof. The mapping (x, y ) Rd Rd z = x + iy Cd is an isomorphism z z z of vector spaces. Letting z = x iy as usual, we have x = z+ 2 and y = 2i . d d Therefore under this identication any polynomial p(x, y ) on R R may be written as a polynomial q in (z, z ), namely z+z zz , ). q (z, z ) = p( 2 2i Conversely a polynomial q in (z, z ) may be thought of as a polynomial p in (x, y ), namely p(x, y ) = q (x + iy, x iy ). Hence the result now follows from Theorem 14.42. Example 14.44. Let K = S = {z C : |z | = 1} and A be the set of polynomials in (z, z ) restricted to S 1 . Then A is dense in C (S 1 ). To prove this rst observe if f C S 1 then F (z ) = |z | f ( |z z | ) for z = 0 and F (0) = 0 denes F C (C) 1 such that F |S = f. By applying Theorem 14.43 to F restricted to a compact rectangle containing S 1 we may nd qn (z, z ) converging uniformly to F on K and hence on S 1 . Since z = z 1 on S 1 , we have shown polynomials in z and z 1 are dense in C (S 1 ). This example generalizes in an obvious way to d K = S 1 Cd . Exercise 14.7. Use Example 14.44 to show that any 2 periodic continuous function, g : Rd C, may be uniformly approximated by a trigonometric polynomial of the form p (x) = a eix
1
p () :=
n=N
bn ein
(14.40)
satises () p () . sup f
Exercise 14.8. Suppose f C (R, C) is a 2 periodic function (i.e. f (x + 2 ) = f (x) for all x R) and
2
show again that f 0. Hint: Use Exercise 14.7. Solution to Exercise (14.8). By assumption, and so by the linearity of the Riemann integral,
2 2 0
0=
0
f () p () d.
(14.41)
() Choose trigonometric polynomials, p , as in Eq. (14.40) such that p () f uniformly in as 0. Passing to the limit in Eq. (14.41) implies
2 2 2
0 = lim
0 0
f () p () d =
0
() d = f () f
0
|f ()| d.
where is a nite subset of Z and a C for all . Hint: start by d showing there exists a unique continuous function, f : S 1 C such that f eix1 , . . . , eixd = F (x) for all x = (x1 , . . . , xd ) Rd . Solution to Exercise (14.7). I will write out the solution when d = 1. For z S 1 , dene F (z ) := f (ei ) where R is chosen so that z = ei . Since f is 2 periodic, F is well dened since if solves ei = z then all other solutions are of the form { + 2n : n Z} . Since the map ei is a local homeomorphism, := ei : J S 1 i.e. for any J = (a, b) with b a < 2, the map J J This shows is a homeomorphism, it follows that F (z ) = f 1 (z ) for z J. 1 F is continuous when restricted to J. Since such sets cover S , it follows that F is continuous. It now follows from Example 14.44 that polynomials in z and z 1 are dense in C (S 1 ). Hence for any > 0 there exists p(z, z ) = am,n z m z n = am,n z m z n = am,n z mn
From this it follows that f 0, for if |f (0 )| > 0 for some 0 then |f ()| > 0 for in a neighborhood of 0 by continuity of f. It would then follow that 2 2 |f ()| d > 0. 0
f (z + ) =
n=0 k1
f (n) (z ) f (n) (z )
n=0
n + k rk (z, ) n! n 1 (k ) + k f (z ) + (z, ) n! k!
(14.42)
such that |F (z ) p(z, z )| for all z. Taking z = ei then implies there exists bn C and N N such that
Page: 177 job: prob macro: svmonob.cls
(14.43)
date/time: 23-Feb-2007/15:20
178
(14.48)
|ez 1 z | = z 2
0
etz (1 t) dt |z |
2 0
et Re z (1 t) dt,
if Re z 0, then (z + t) f
(k )
(z ) (1 t)
k1
|ez 1 z | |z | /2 dt 0 as 0. (14.46) and if Re z > 0 then |ez 1 z | eRe z |z | /2. Combining these into one estimate gives,
2
(14.49)
f (k) (z + t)
0
d dt
(1 t) dt
t=1
+ k! t=0
1 0
|ez 1 z | e0Re z
1 ity e dt, 0
|z | . 2
(14.50)
1 (k ) f (z ) k + k+1 rk+1 (z, ) . k! The result now follows by induction. 1 2. For y R, sin y = y 0 cos (ty ) dt and hence |sin y | |y | . 3. For y R we have
1 1
Lemma 14.45. For z = rei with < < and r > 0, let ln z = ln r + i. Then ln : C \ (, 0] C is a holomorphic function such that eln z = z 3 and if |z | < 1 then (14.47) |ln (1 + z ) z | |z |
2
1 2 (1 |z |)
2
for |z | < 1.
(14.52)
cos y = 1 + y 2
0
cos (ty ) (1 t) dt 1 + y 2
0 2
(1 t) dt = 1
y . 2
Proof. Clearly eln z = z and ln z is continuous. Therefore by the inverse function theorem for holomorphic functions, ln z is holomorphic and z
3
Equivalently put ,
2
d d ln z = eln z ln z = 1. dz dz
Alternatively,
For the purposes of this lemma it suces to dene ln (1 + z ) = and to then observe: 1)
P
n=1
(z )n /n
Z
sin xdx
0
X d 1 ln (1 + z ) = (z )n = , dz 1 + z n=0
xdx = y 2 /2.
and 2) the functions 1 + z and eln(1+z) both solve f (z ) = and therefore eln(1+z) = 1 + z. 1 f (z ) with f (0) = 1 1+z
This last inequality may also be proved as a simple calculus exercise following from; g () = and g (y ) = 0 i sin y = y which happens i y = 0.
Page: 178
job: prob
macro: svmonob.cls
date/time: 23-Feb-2007/15:20
179
Therefore,
d dz
ln z =
1 z
and
d dz 2
ln (1 + z ) = z z 2
0
1 (1 + tz )
2
(1 t) dt.
Combining the last two inequalities completes the proof of Eq. (14.56). Equation (14.57) is proved similarly and hence will be omitted. Lemma 14.47. If X is a square integrable random variable, then f () := E eiX = 1 + iEX 2 E X 2 + r () 2!
1 (1 + tz )
2
(1 t) dt
2 (1 |z |)
2.
Eq. (14.52) is now a consequence of Eq. (14.53) and Eq. (14.54). Lemma 14.46. For all y R and n N {0} ,
n
= 2 ()
eiy
k=0
(iy ) k!
|y | (n + 1)!
n+1
|| |X | 3!
0 as 0.
(14.58)
(14.56)
f () 1 + iEX
2 E X2 2!
E eiX 1 + iX 2 2 E X 2 || |X | 3!
3
X2 2!
eiy
k=0
(iy ) k!
|y | 2 |y | . (n + 1)! n!
iy
n+1
=: 2 () .
(14.57)
The DCT, with X 2 L1 (P ) being the dominating function, allows us to conclude that lim0 () = 0.
eiy
k=0
(iy ) k!
y n+1 n! |y | n!
n+1
1 0 1 0
in+1 eity (1 t) dt (1 t) dt =
n
|y | (n + 1)!
n+1
which is Eq. (14.55). Using Eq. (14.55) with n = 1 implies eiy 1 + iy y2 2! eiy (1 + iy ) + y2 y2 + = y2 2 2 y2 2
Page: 179
job: prob
macro: svmonob.cls
date/time: 23-Feb-2007/15:20
Remark 15.3. The reader should observe that in order for condition (M ) to hold in the setup in Example 15.1 it is necessary that limn s2 n = . Lemma 15.4. Let us continue the notation in Example 15.1. Then {Xn,k := Xk /sn } satises (LC ) if either of two conditions hold; 1. {Xn }n=1 are i.i.d. 2. The {Xn }n=1 satisfy Liapunov condition; there exists some > 2 such that n k=1 E |Xk | = 0. (15.6) lim n s n More generally, if {Xn,k } satises the Liapunov condition,
n n
Sn :=
k=1
Xn,k .
2 n,k
(15.1) =E
2 Xn,k
Until further notice we are going to assume E [Xn,k ] = 0, and Var (Sn ) =
n k=1 2 n,k
< ,
denote the characteristic function of Xn,k . Example 15.1. Suppose are mean zero square integrable random varin n 2 2 ables with k = Var (Xn ) . If we let s2 n := k=1 Var (Xk ) = k=1 k , n 2 2 2 n,k := k /sn , and Xn,k := Xk /sn , then {Xn,k }k=1 satisfy the above hypothesis n and Sn = s1 k=1 Xk . n Our main interest in this chapter is to consider the limiting behavior of Sn as n . In order to do this, it will be useful to put conditions on the {Xn,k } such that no one term dominates sum dening the sum dening Sn in Eq. (15.1) in the limit as n . Denition 15.2. We say that {Xn,k } satises the Lindeberg Condition (LC) i
n n {Xn }n=1
lim
where : [0, ) [0, ) is a non-decreasing function such that (t) > 0 for all t > 0, then {Xn,k } satises (LC ) . 2 and Proof. 1. If {Xn }n=1 are i.i.d., then sn = n where 2 = EX1
n
E
k=1
2 Xn,k
1 : |Xn,k | > t = 2 sn = =
(15.7) nt
1 n 2
lim
(15.3)
1 2 E X1 : |X1 | > nt 2
which, by DCT, tends to zero as n . 2. Assuming Eq. (15.6), then for any t > 0, (15.4)
n 2 E Xn,k : |Xn,k | > t k=1 k=1 n 2 E Xn,k
and we say {Xn,k } is uniformly asymptotic negligibility (UAN) if for all > 0, lim max P (|Xn,k | > ) = 0. (15.5)
n kn
Xn,k t
: |Xn,k | > t
1 t2
E [|Xn,k | ] =
k=1
1 t2 s n
E |Xk | 0.
k=1
182
n1 i=1
ai and b :=
n1 i=1 bi
as n .
1 (t)
|an a bn b| |an a an b| + |an b bn b| = |an | |a b| + |an bn | |b| |a b| + |an bn | . The proof is now easily completed by induction on n. Theorem 15.7 (Lindeberg-Feller CLT (I)). Suppose {Xn,k } satises (LC ) , then Sn = N (0, 1) . (15.8) (See Theorem 15.11 for a converse to this theorem.)
Lemma 15.5. Let {Xn,k }n=1 be as above, then (LC ) = (M ) = (U AN ) . Proof. For k n,
2 2 2 2 1|Xn,k |>t n,k = E Xn,k = E Xn,k 1|Xn,k |t + E Xn,k n
t +E
t +
m=1
/2
as n .
(15.9)
This clearly implies (M ) holds. The assertion that (M ) implies (U AN ) follows by Chebyschevs inequality, max P (|Xn,k | > ) max
kn kn
Before starting the formal proof, let me give an informal explanation for Eq. (15.9). Using 2 2 fnk () 1 nk , 2 we might expect
n
E eiSn =
k=1
1 2
k=1
=
k=1
e(fnk ()1)
(B )
Pn 2 2 2 e k=1 2 nk = e 2 .
P (|Xn,k | > )
k=1
1 2
The question then becomes under what conditions are these approximations valid. It turns out that approximation (A), namely that
n n
We will need the following lemma for our subsequent applications of the continuity theorem. Lemma 15.6. Suppose that ai , bi C with |ai | , |bi | 1 for i = 1, 2, . . . , n. Then
n n n
lim
fnk () exp
k=1 k=1
(fnk () 1)
= 0,
(15.10)
is valid if condition (M ) holds, see Lemma 15.9 below. It is shown in the estimate Eq. (15.11) below that the approximation (B ) is valid, i.e.
n n
ai
i=1 i=1
bi
i=1
|ai bi | .
lim
k=1
1 (fnk () 1) = 2 , 2
date/time: 23-Feb-2007/15:20
Page: 182
job: prob
macro: svmonob.cls
183
if (LC ) is satised. These observations would then constitute a proof of Theorem 15.7. The proof given below of Theorem 15.7 will not quite follow this route and will not use Lemma 15.9 directly. However, this lemma will be used in the proofs of Theorems 15.11 and 15.14. Proof. Now on to the formal proof of Theorem 15.7. Since
n n
and since > 0 is arbitrary, we may conclude that lim supn An,k = 0. n To estimate k=1 Bn,k , we use the estimate, |eu 1 + u| u /2 valid for u 0 (see Eq. 14.49 with z = u). With this estimate we nd,
n n
n k=1 2
Bn,k =
k=1 k=1 n
2 2 n,k 2 2 e n,k /2 2 2
E eiSn =
k=1
fnk () and e
/2
=
k=1
2 n,k /2
k=1 4
2 2 n,k /2
2 1 2 n,k 2 2 n
4 8
n 4 n,k k=1
E e where
iSn
2 /2
k=1
fnk () e
=
k=1
(An,k + Bn,k )
max 2 8 kn n,k
2 n,k = k=1
4 max 2 0, 8 kn n,k
2 2 n,k 2
and
wherein we have used (M ) (which is implied by (LC )) in taking the limit as n . As an application of Theorem 15.7 we can give half of the proof of Theorem 12.12. Theorem 15.8 (Converse assertion in Theorem 12.12). If {Xn }n=1 are independent random variables and the random series, n=1 Xn , is almost surely convergent, then for all c > 0 the following three series converge; 1. 2. 3.
n=1 n=1 n=1
2 2 n,k 2 2 e n,k /2 . 2
2 2 1 + Xn,k 2 || |Xn,k | 3!
3
E e
iXn,k
2 2 1 + Xn,k 2
2 2 E Xn,k
2 2 E Xn,k
2 E
Proof. Since n=1 Xn is almost surely convergent, it follows that limn Xn = 0 a.s. and hence for every c > 0, P ({|Xn | c i.o.}) = 0. Accord ing the Borel zero one law this implies for every c > 0 that n=1 P (|Xn | > c) < c . Since Xn 0 a.s., {Xn } and Xn := Xn 1|Xn |c are tail equivalent for all c c > 0. In particular n=1 Xn is almost surely convergent for all c > 0. c c Fix c > 0, let Yn := Xn E [Xn ] and let
n n n c Var (Xk )= k=1 k=1
2 2 2 || E |Xn,k | : |Xn,k | + 2 E Xn,k : |Xn,k | > 3! 3 || 2 2 n,k + 2 E Xn,k : |Xn,k | > . = 6 From this estimate and (LC ) it follows that
n
s2 n = Var (Y1 + + Yn ) =
k=1
Var (Yk ) =
Var Xk 1|Xk |c .
For the sake of contradictions, suppose s2 n as n . Since |Yk | 2c, it n 2 follows that k=1 E Yk 1|Yk |>sn t = 0 for all suciently large n and hence 1 n s2 n lim
n 2 E Yk 1|Yk |>sn t = 0, k=1
lim sup
n k=1
3 + 2 6
3 6 (15.11)
i.e. {Yn,k := Yk /sn }n=1 satises (LC ) see Examples 15.1 and Remark 15.3. So by the central limit Theorem 15.7, it follows that
macro: svmonob.cls date/time: 23-Feb-2007/15:20
Page: 183
job: prob
184
1 s2 n
1 s2 n
Yk = N (0, 1) .
k=1
Proof. For the rst item we estimate, EeiX 1 E eiX 1 E [2 |X |] = E [2 |X | : |X | ] + E [2 |X | : |X | < ] 2 2 2P [|X | ] + || 2 E |X | + || Replacing X by Xn,k and in the above inequality shows |n,k ()| = |fn,k () 1|
2 2n,k 2 2 E | X | + | | = + || . n,k 2 2
1 s2 n
n c Xn k=1
1 s2 n
Yk = N (0, 1) .
k=1
But it is not possible for constant (i.e. non-random) variables, cn := n 1 c k=1 E [Xn ] , to converge to a non-degenerate limit. (Think about this eis2 n ther in terms of characteristic functions or in terms of distribution functions.) Thus we must conclude that
2Dn + || = || 0 as 0. 2
Var Xn 1|Xn |c =
n=1 n=1
For the second item, observe that Re n,k () = Re fn,k () 1 0 and hence en,k () = eRe n,k () e0 = 1 and hence we have from Lemma 15.6 and the estimate (14.49),
n n n
fn,k ()
k=1 k=1
n,k ()
k=1 n
= 1 2
|n,k ()|
k=1
Lemma 15.9. Suppose that {Xn,k } satises property (M ) , i.e. Dn 2 maxkn n,k 0. If we dene, n,k () := fn,k () 1 = E eiXn,k 1 , then; 1. limn maxkn |n,k ()| = 0 and n 2. fSn () k=1 en,k () 0 as n , where
n
:=
|n,k ()| .
k=1
|n,k ()| =
k=1 k=1 n
fn,k () .
k=1
fSn () = E eiSn =
k=1
2 . 2
Page: 184
job: prob
macro: svmonob.cls
date/time: 23-Feb-2007/15:20
185
fn,k ()
k=1 k=1
en,k ()
and the latter expression tends to zero by item 1. Lemma 15.10. Let X be a random variable such that EX 2 < and EX = 0. Further let f () := E eiX and u () := Re (f () 1) . Then for all c > 0, u () + or equivalently 2 E cos X 1 + X 2 E X 2 2 : |X | > c . 2 2 c In particular if we choose || 6/ |c| , then 2 1 E cos X 1 + X 2 2 E X 2 : |X | > c . 2 c
2
en,k () = e
k=1
/2
2 E X2 E X2 2 : |X | > c 2 2 c
(15.12)
(15.13)
lim
Re n,k () = 2 /2.
k=1
(15.14)
lim
E cos (Xn,k ) 1 +
k=1
2 2 X =0 2 n,k
2 Proof. For all R, we have (see Eq. (14.48)) cos X 1 + 2 X 0 and cos X 1 2. Therefore,
2 2 u () + E X 2 = E cos X 1 + X 2 2 2 2 E cos X 1 + X 2 : |X | > c 2 2 2 E 2 + X : |X | > c 2 E 2 which gives Eq. (15.12). Theorem 15.11 (Lindeberg-Feller CLT (II)). Suppose {Xn,k } satises (M ) and also the central limit theorem in Eq. (15.8) holds, then {Xn,k } satises (LC ) . So under condition (M ) , Sn converges to a normal random variable i (LC ) holds. Proof. By assumption we have
n n kn 2 lim max n,k = 0 and lim n
lim
|X | 2 2 + X : |X | > c c2 2
fn,k () = e
k=1
/2
Page: 185
job: prob
macro: svmonob.cls
date/time: 23-Feb-2007/15:20
186
Proof. Recall from Example 14.14 that for any a > 0, E eiZ = exp a ei 1 Since E e it follows that E e
iSn iXn,k i
exp pn,k ei 1
k=1
k=1
1 + pn,k ei 1
n
.
i
1 ,
1 2
|zn,k |
k=1 n
|zn,k |
k=1
2 max pn,k
1kn
=
k=1
1 + pn,k e
k=1
Since 1 + pn,k ei 1 lies on the line segment joining 1 to ei , it follows that 1 + pn,k ei 1 1. Since
n
exp pn,k ei 1
k=1
k=1
1 + pn,k ei 1
0 as n .
exp pn,k ei 1
k=1
= exp
k=1
pn,k ei 1
exp a ei 1
we have shown
n n
1 + pn,k ei 1
k=1 n
exp pn,k ei 1
k=1
= exp a ei 1
The result now follows by an application of the continuity Theorem 14.21. Hence we may apply Lemma 15.6 to nd
n n
Remark 15.13. Keeping the notation in Theorem 15.12, we have 1 + pn,k ei 1 E [Xn,k ] = pn,k and Var (Xn,k ) = pn,k (1 pn,k ) and
n n
exp pn,k ei 1
k=1 n
k=1
k=1 n
exp pn,k ei 1
1 + pn,k ei 1
s2 n :=
k=1
Var (Xn,k ) =
k=1
pn,k (1 pn,k ) .
=
k=1
where zn,k = pn,k ei 1 . Since Re zn,k = pn,k (cos 1) 0, we may use the calculus estimate in Eq. (14.49) to conclude,
Page: 186 job: prob
Under the assumptions of Theorem 15.12, we see that s2 n a as n . Let Xn,k pn,k 2 so that E [Yn,k ] = 0 and n,k := Var (Yn,k ) = s1 Yn,k := 2 Var (Xn,k ) = sn n 1 p (1 p ) which satises condition ( M ) . Let us observe that, for large n,k n,k s2 n n,
macro: svmonob.cls
date/time: 23-Feb-2007/15:20
187
n kn
2 Yn,k
fn,k () = eln fn,k () = eln[1+(fn,k ()1)] = e(fn,k ()1) and hence that
n n n
E eiSn = pn,k
k=1
fn,k () =
k=1 k=1
(fn,k () 1) . (15.15)
lim
1 pn,k sn
lim
E eiSn exp
k=1
(fn,k () 1)
= 0.
(15.16)
Yn,k =
k=1
n k=1
Xn,k sn
n k=1
pn,k
Z a a
where Z is a Poisson random variable with mean a. Notice that the limit is not a normal random variable. We wish to characterize the possible limiting distributions of sequences {Sn }n=1 when we relax the Lindeberg condition (LC ) to condition (M ) . We have the following theorem. Theorem 15.14. Suppose {Xn,k }k=1 satisfy property (M ) and Sn := n = L for some random variable L. Then the characteristic k=1 Xn,k function fL () := E eiL must be of the form, fL () = exp
R n
exp
k=1
(fn,k () 1)
= exp
k=1 R
ix
1 ix d (x) x2
= exp
R
eix 1 ix
k=1
= exp
R := where n n k=1
where is a nite positive measure on (R, BR ) such that (R) 1. (Recall ix 1ix that you proved in Exercise 14.4 that exp R e d (x) is always the x2 characteristic function of a probability measure.) Proof. As before, let fn,k () = E e the continuity theorem we are assuming lim fSn () = lim
iXn,k
eix 1 ix dn (x)
x2 dn (x) = k=1 R
x2 dn,k (x) =
k=1
2 n,k = 1.
n n n
fn,k () = f ()
k=1
Hence if we dene d (x) := x2 dn (x) , then n is a probability measure and we have from Eqs. (15.16) and Eq. (15.17) that
fSn () exp
R
eix 1 ix dn (x) x2
0.
(15.18)
Page: 187
job: prob
macro: svmonob.cls
date/time: 23-Feb-2007/15:20
188
ix 1ix with h () = 0, there Since h (x) := e is a continuous function of R x2 is a subsequence, {nl } of {n} such that nl (h) (h) for some probability , BR measure on R . Combining this with Eq. (15.18) allows us to conclude,
where s is the nite measure on (R, BR ) dened by s (A) := s2 s1 A for all A BR . The reader should observe that eix 1 ix 1 = 2 2 x x and hence (, x)
eix 1ix x2
eix 1 ix dn (x) l
= exp
R
eix 1 ix d (x) . x2
k=2
(ix) 1 = 2 k! x
k=2
ik k k2 x k!
n {Xn,k }k=1
is smooth. Moreover,
(15.19) and
d eix 1 ix ixeix ix eix 1 = = i d x2 x2 x d2 eix 1 ix ixeix = i = eix . d2 x2 x Using these remarks and the fact that (R) < , it is easy to see that fL () =
R
Corollary 15.16. Suppose satisfy properties (M ) and (BV ) . If n Sn := k=1 Xn,k = L for some random variable L, then fL () = exp
R
eix 1 ix d (x) x2
(15.20) and fL () =
eix 1 ds (x) fL () x
eix ds (x) +
R R
eix 1 ds (x) x
fL ()
and in particular, fL (0) = 0 and fL (0) = s (R) . Therefore the probability measure, , on (R, BR ) such that () = fL () has mean zero and variance, s (R) < . Denition 15.17. A probability distribution, , on (R, BR ) is innitely divisible i for all n N there exists i.i.d. nondegenerate random variables, d n {Xn,k }k=1 , such that Xn,1 + +Xn,n = . This can be formulated in the following two equivalent ways. For all n N there should exists a non-degenerate probn n ability measure, n , on (R, BR ) such that () = [g ()] n = . For all n N, for some non-constant characteristic function, g. Theorem 15.18. The following class of symmetric distributions on (R, BR ) are equal; 1. C1 all possible limiting distributions under properties (M ) and (BV ) . 2. C2 all distributions with characteristic functions of the form given in Corollary 15.16.
macro: svmonob.cls date/time: 23-Feb-2007/15:20
eix 1 ix d (x) x2
where is a nite positive measure on (R, BR ) such that (R) 1. Letting s in this expression then implies fL () = exp
R
= exp
R
s d (x)
= exp
R
eix 1 ix ds (x) x2
Page: 188
job: prob
3. C3 all innitely divisible distributions with mean zero and nite variance. Proof. The inclusion, C1 C2 , is the content of Corollary 15.16. For C2 C3 , observe that if () = exp
R n
eix 1 ix d (x) x2
then () = [ n ()] where n is the unique probability measure on (R, BR ) such that eix 1 ix 1 d (x) . n () = exp x2 n R For C3 C1 , simply dene {Xn,k }k=1 to be i.i.d with E eiXn,k = n () . In this case Sn =
n k=1 n
Xn,k = .
15.1.1 Stable Laws See the le, dynkin-stable-innitely-divs.pdf, and Durrett [2, Example 3.10 on p. 106 and Section 2.7.].
Part IV
Fig. 16.1. The picture behind the proof of the Schwarz inequality.
= x + y |x + y = x = x
2
+ y
+ x|y + y |x (16.1)
+ y
+ 2Re x|y .
Theorem 16.2 (Schwarz Inequality). Let (H, | ) be an inner product space, then for all x, y H | x|y | x y and equality holds i x and y are linearly dependent. Proof. If y = 0, the result holds trivially. So assume that y = 0 and observe; 2 if x = y for some C, then x|y = y and hence | x|y | = || y
2
Corollary 16.3. Let (H, | ) be an inner product space and x := x|x . Then the Hilbertian norm, , is a norm on H. Moreover | is continuous on H H, where H is viewed as the normed space (H, ). Proof. If x, y H, then, using Schwarzs inequality, x+y
2
= x x
2 2
+ y + y
2 2
+ 2Re x|y + 2 x y = ( x + y )2 .
= x y .
Now suppose that x H is arbitrary, let z := x y 2 x|y y. (So z is the orthogonal projection of x onto y, see Figure 16.1.) Then 0 z
2
Taking the square root of this inequality shows satises the triangle inequality. Checking that satises the remaining axioms of a norm is now routine and will be left to the reader. If x, x , y, y H, then | x + x|y + y x|y | = | x|y + x|y + x|y | x y + y x + x y 0 as x, y 0, from which it follows that | is continuous.
= x
x|y y = x y 2 | x|y |2 = x 2 y 2
2
| x|y |2 y y 4
2Re x|
x|y y y 2
Denition 16.4. Let (H, | ) be an inner product space, we say x, y H are orthogonal and write x y i x|y = 0. More generally if A H is a set, x H is orthogonal to A (write x A) i x|y = 0 for all y A. Let
194
A = {x H : x A} be the set of vectors orthogonal to A. A subset S H is an orthogonal set if x y for all distinct elements x, y S. If S further satises, x = 1 for all x S, then S is said to be an orthonormal set. Proposition 16.5. Let (H, | ) be an inner product space then 1. (Parallelogram Law) x+y
2
Denition 16.8. A subset C of a vector space X is said to be convex if for all x, y C the line segment [x, y ] := {tx + (1 t)y : 0 t 1} joining x to y is contained in C as well. (Notice that any vector subspace of X is convex.) Theorem 16.9 (Best Approximation Theorem). Suppose that H is a Hilbert space and M H is a closed convex subset of H. Then for any x H there exists a unique y M such that
+ xy
=2 x
+2 y
(16.2)
x y = d(x, M ) = inf
z M
xz .
Moreover, if M is a vector subspace of H, then the point y may also be characterized as the unique point in M such that (x y ) M. Proof. Uniqueness. By replacing M by M x := {m x : m M } we may assume x = 0. Let := d(0, M ) = inf mM m and y, z M, see Figure 16.2.
x
xS
=
xS
x 2.
(16.3)
3. If A H is a set, then A is a closed linear subspace of H. Proof. I will assume that H is a complex Hilbert space, the real case being easier. Items 1. and 2. are proved by the following elementary computations; x+y
2
+ xy = x
2
2 2
+ y
2
+ 2Re x|y + x
2
+ y
2Re x|y
=2 x and
+2 y ,
x
xS
=
xS
x|
y S
y =
x,y S
x|y x 2.
Fig. 16.2. The geometry of convex sets.
=
xS
x|x =
xS
+2 z
= y+z =4
+ yz
2
where Nul( |x ) = {y H : y |x = 0} a closed subspace of H. Denition 16.6. A Hilbert space is an inner product space (H, | ) such that the induced Hilbertian norm is complete. Example 16.7. For any measure space, (, B , ) , H := L2 () with inner product, f |g =
y+z 2
+ yz
4 2 + y z 2 .
(16.4)
f ( ) g ( ) d ( )
Hence if y = z = , then 2 2 + 2 2 4 2 + y z 2 , so that y z 2 = 0. Therefore, if a minimizer for d(0, )|M exists, it is unique. Existence. Let yn M be chosen such that yn = n d(0, M ). Taking y = ym and z = yn in Eq. (16.4) shows
2 2 2m + 2n 4 2 + yn ym 2 .
date/time: 23-Feb-2007/15:20
195
i.e. lim supm,n yn ym 2 = 0. Therefore, by completeness of H, {yn }n=1 is convergent. Because M is closed, y := lim yn M and because the norm is n continuous, y = lim yn = = d(0, M ).
n
Proof. 1. Let x1 , x2 H and C, then PM x1 + PM x2 M and PM x1 + PM x2 (x1 + x2 ) = [PM x1 x1 + (PM x2 x2 )] M showing PM x1 + PM x2 = PM (x1 + x2 ), i.e. PM is linear. 2 2. Obviously Ran(PM ) = M and PM x = x for all x M . Therefore PM = PM . 3. Let x, y H, then since (x PM x) and (y PM y ) are in M , PM x|y = PM x|PM y + y PM y = PM x|PM y = PM x + (x PM x)|PM y = x|PM y . 4. We have already seen, Ran(PM ) = M and PM x = 0 i x = x 0 M , i.e. Nul(PM ) = M . 5. If N M H it is clear that PM PN = PN since PM = Id on N = Ran(PN ) M. Taking adjoints gives the other identity, namely that PN PM = PN . More directly, if x H and n N, we have PN PM x|n = PM x|PN n = PM x|n = x|PM n = x|n . Since this holds for all n we may conclude that PN PM x = PN x. Corollary 16.13. If M H is a proper closed subspace of a Hilbert space H, then H = M M . Proof. Given x H, let y = PM x so that xy M . Then x = y +(xy ) 2 M + M . If x M M , then x x, i.e. x = x|x = 0. So M M = {0} . Exercise 16.1. Suppose M is a subset of H, then M = span(M ). Theorem 16.14 (Riesz Theorem). Let H be the dual space of H (i.e. that linear space of continuous linear functionals on H ). The map z H |z H is a conjugate linear1 isometric isomorphism.
1
So y is the desired point in M which is closest to 0. Now suppose M is a closed subspace of H and x H. Let y M be the closest point in M to x. Then for w M, the function g (t) := x (y + tw)
2
= xy
2tRe x y |w + t2 w
has a minimum at t = 0 and therefore 0 = g (0) = 2Re x y |w . Since w M is arbitrary, this implies that (x y ) M. Finally suppose y M is any point such that (x y ) M. Then for z M, by Pythagoreans theorem, xz
2
= xy+yz
= xy
+ yz
xy
which shows d(x, M )2 x y 2 . That is to say y is the point in M closest to x. Denition 16.10. Suppose that A : H H is a bounded operator, i.e. A := sup { Ax : x H with x = 1} < .
The adjoint of A, denoted A , is the unique operator A : H H such that Ax|y = x|A y . (The proof that A exists and is unique will be given in Proposition 16.15 below.) A bounded operator A : H H is self - adjoint or Hermitian if A = A . Denition 16.11. Let H be a Hilbert space and M H be a closed subspace. The orthogonal projection of H onto M is the function PM : H H such that for x H, PM (x) is the unique element in M such that (x PM (x)) M, i.e. PM (x) is the unique element in M such that x|m = PM (x)|m for all m M. (16.5)
(16.6)
Theorem 16.12 (Projection Theorem). Let H be a Hilbert space and M H be a closed subspace. The orthogonal projection PM satises: 1. PM is linear and hence we will write PM x rather than PM (x).
Page: 195 job: prob
macro: svmonob.cls
date/time: 23-Feb-2007/15:20
196
Proof. The map j is conjugate linear by the axioms of the inner products. Moreover, for x, z H, | x|z | x z for all x H
Ax|y1 + y2
with equality when x = z. This implies that jz H = |z H = z . Therefore j is isometric and this implies j is injective. To nish the proof we must show that j is surjective. So let f H which we assume, with out loss of generality, is non-zero. Then M =Nul(f ) a closed proper subspace of H. Since, by Corollary 16.13, H = M M , f : H/M = M F is a linear isomorphism. This shows that dim(M ) = 1 and hence H = M Fx0 where x0 M \ {0} .2 (x0 )/ x0 2 . Then Choose z = x0 M such that f (x0 ) = x0 |z , i.e. = f for x = m + x0 with m M and F, f (x) = f (x0 ) = x0 |z = x0 |z = m + x0 |z = x|z which shows that f = jz. Proposition 16.15 (Adjoints). Let H and K be Hilbert spaces and A : H K be a bounded operator. Then there exists a unique bounded operator A : K H such that Ax|y
K
and by the uniqueness of A (y1 + y2 ) we nd A (y1 + y2 ) = A (y1 ) + A (y2 ). This shows A is linear and so we will now write A y instead of A (y ). Since A y |x H = x|A y H = Ax|y K = y |Ax K is Exercise it follows that A = A. The assertion that (A + B ) = A + B 16.2. Items 3. and 4. Making use of Schwarzs inequality (Theorem 16.2), we have A = = = sup
kK : k =1
sup sup
kK : k =1 hH : h =1
Ah = A
= x|A y
(16.7)
hH : h =1 kK : k =1
A = A
= =
sup
hH : h =1
Ah
sup
hH : h =1
| Ah|Ah | sup A Ah = A A
2
sup
hH : h =1
| h|A Ah |
2
(16.8)
Proof. For each y K, the map x Ax|y K is in H and therefore there exists, by Theorem 16.14, a unique vector z H (we will denote this z by A (y )) such that Ax|y K = x|z H for all x H. This shows there is a unique map A : K H such that Ax|y K = x|A (y ) for all x H and y K. To see A is linear, let y1 , y2 K and C, then for any x H,
2
hH : h =1
= A A .
A A A
(16.9)
Alternatively, choose x0 M \ {0} such that f (x0 ) = 1. For x M we have f (x x0 ) = 0 provided that := f (x). Therefore x x0 M M = {0} , i.e. x = x0 . This again shows that M is spanned by x0 .
which then implies A A . Replacing A by A in this last inequality shows A A and hence that A = A . Using this identity back in 2 Eq. (16.9) proves A = A A . Now suppose that K = H. Then ABh|k = Bh|A k = h|B A k
macro: svmonob.cls date/time: 23-Feb-2007/15:20
Page: 196
job: prob
197
Proof. Let z Z and choose zn S such that zn z. Since T zm T zn C zm zn 0 as m, n , z exists. Moreover, it follows by the completeness of X that limn T zn =: T if wn S is another sequence converging to z, then T zn T wn C zn wn C z z = 0 z is well dened. It is now a simple matter to check that T : and therefore T Z X is still linear and that z = lim T
n
A = AA1 = A
1
= I = I and = I = I.
1
A1
. Similarly if A is
Exercise 16.2. Let H, K, M be Hilbert spaces, A, B L(H, K ), C L(K, M ) and (CA) = A C L(M, H ). and C. Show (A + B ) = A + B Exercise 16.3. Let H = Cn and K = Cm equipped with the usual inner products, i.e. z |w H = z w for z, w H. Let A be an m n matrix thought of as a linear operator from H to K. Show the matrix associated to A : K H is the conjugate transpose of A. Lemma 16.16. Suppose A : H K is a bounded operator, then: 1. Nul(A ) = Ran(A) . 2. Ran(A) = Nul(A ) . 3. if K = H and V H is an A invariant subspace (i.e. A(V ) V ), then V is A invariant. Proof. An element y K is in Nul(A ) i 0 = A y |x = y |Ax for all x H which happens i y Ran(A) . Because, by Exercise 16.1, Ran(A) = Ran(A) , and so by the rst item, Ran(A) = Nul(A ) . Now suppose A(V ) V and y V , then A y |x = y |Ax = 0 for all x V which shows A y V . The next elementary theorem (referred to as the bounded linear transformation theorem, or B.L.T. theorem for short) is often useful. Theorem 16.17 (B. L. T. Theorem). Suppose that Z is a normed space, X is a Banach3 space, and S Z is a dense linear subspace of Z. If T : S X is a bounded linear transformation (i.e. there exists C < such that T z C z L(Z, X ) and for all z S ), then T has a unique extension to an element T this extension still satises z C z T
3
T zn lim C zn = C z
n
for all x Z.
is an extension of T to all of the Z. The uniqueness of this extension is Thus T easy to prove and will be left to the reader.
. for all z S
A Banach space is a complete normed space. The main examples for us are Hilbert spaces.
Page: 197
job: prob
macro: svmonob.cls
date/time: 23-Feb-2007/15:20
Denition 17.2. Let and be two positive measure on a measurable space, (X, M). Then: 1. and are mutually singular (written as ) if there exists A M such that (A) = 0 and (Ac ) = 0. We say that lives on A and lives on Ac . 2. The measure is absolutely continuous relative to (written as ) provided (A) = 0 whenever (A) = 0. As an example, suppose that is a positive measure and 0 is a measurable function. Then the measure, := is absolutely continuous relative to . Indeed, if (A) = 0 then (A) =
A
a (f = a)
a0
a (f = a) = (f ) .
In light of Theorem 6.34 and the MCT, this inequality continues to hold for all non-negative measurable functions. Furthermore if f L1 () , then (|f |) (|f |) < and hence f L1 ( ) and | (f )| (|f |) (|f |) (X )
1 /2
L2 ()
d = 0. , then d = d
Therefore, L2 () f (f ) C is a continuous linear functional on L2 (). By the Riesz representation Theorem 16.14, there exists a unique L2 () such that (f ) = f d for all f L2 ().
X
We will eventually show that if and are nite and for some measurable function, 0.
In particular this equation holds for all bounded measurable functions, f : X R and for such a function we have (f ) = Re (f ) = Re
X
Denition 17.3 (Lebesgue Decomposition). Let and be two positive measure on a measurable space, (X, M). Two positive measures a and s form a Lebesgue decomposition of relative to if = a + s , a , and s . Lemma 17.4. If 1 , 2 and are positive measures on (X, M) such that 1 and 2 , then (1 + 2 ) . More generally if {i }i=1 is a sequence of positive measures such that i for all i then = i=1 i is singular relative to . Proof. It suces to prove the second assertion since we can then take j 0 for all j 3. Choose Ai M such that (Ai ) = 0 and i (Ac i ) = 0 for all i. c Letting A := i Ai we have (A) = 0. Moreover, since Ac = i Ac i Am for c c all m, we have i (A ) = 0 for all i and therefore, (A ) = 0. This shows that . Lemma 17.5. Let and be positive measures on (X, M). If there exists a Lebesgue decomposition, = s + a , of the measure relative to then this decomposition is unique. Moreover: if is a nite measure then so are s and a .
f d =
X
f Re d.
(17.1)
Thus by replacing by Re if necessary we may assume is real. Taking f = 1<0 in Eq. (17.1) shows 0 ( < 0) =
X
1<0 d 0,
from which we conclude that 1<0 = 0, a.e., i.e. ( < 0) = 0. Therefore 0, a.e. Similarly for > 1, ( > ) ( > ) =
X
1> d ( > )
which is possible i ( > ) = 0. Letting 1, it follows that ( > 1) = 0 and hence 0 1, - a.e.
200
Proof. Since s , there exists A M such that (A) = 0 and s (Ac ) = 0 and because a , we also know that a (A) = 0. So for C M, (C A) = s (C A) + a (C A) = s (C A) = s (C ) and (C Ac ) = s (C Ac ) + a (C Ac ) = a (C Ac ) = a (C ) . (17.3) (17.2)
Theorem 17.8 (Radon Nikodym Theorem for Positive Measures). Suppose that and are nite positive measures on (X, M). Then has a unique Lebesgue decomposition = a + s relative to and there exists a unique (modulo sets of measure 0) function : X [0, ) such that da = d. Moreover, s = 0 i . Proof. The uniqueness assertions follow directly from Lemmas 17.5 and 17.6. Existence when and are both nite measures. (Von-Neumanns Proof. See Remark 17.9 for the motivation for this proof.) First suppose that and are nite measures and let = + . By Theorem 17.1, d = hd with 0 h 1 and this implies, for all non-negative measurable functions f, that (f ) = (f h) = (f h) + (f h) or equivalently (f (1 h)) = (f h). Taking f = 1{h=1} in Eq. (17.6) shows that ({h = 1}) = (1{h=1} (1 h)) = 0, i.e. 0 h (x) < 1 for - a.e. x. Let := 1{h<1} h 1h (17.6) (17.5)
Now suppose we have another Lebesgue decomposition, = a + s with M such that s and a . Working as above, we may choose A ) = 0 and A c is is still a null set and and (A s null. Then B = A A c is a null set for both s and B c = Ac A s . Therefore we may use Eqs. (17.2) and (17.3) with A being replaced by B to conclude, s (C ) = (C B ) = s (C ) and c a (C ) = (C B ) = a (C ) for all C M. Lastly if is a nite measure then there exists Xn M such that X = n=1 Xn and (Xn ) < for all n. Since > (Xn ) = a (Xn ) + s (Xn ), we must have a (Xn ) < and s (Xn ) < , showing a and s are nite as well. Lemma 17.6. Suppose is a positive measure on (X, M) and f, g : X [0, ] are functions such that the measures, f d and gd are nite and further satisfy, f d =
A A
gd for all A M.
(17.4)
and then take f = g 1{h<1} (1 h)1 with g 0 in Eq. (17.6) to learn (g 1{h<1} ) = (g 1{h<1} (1 h)1 h) = (g ). Hence if we dene a := 1{h<1} and s := 1{h=1} , we then have s (since s lives on {h = 1} while (h = 1) = 0) and a = and in particular a . Hence = a + s is the desired Lebesgue decomposition of . If we further assume that , then (h = 1) = 0 implies (h = 1) = 0 and hence that s = 0 and we conclude that = a = . Existence when and are -nite measures. Write X = n=1 Xn where Xn M are chosen so that (Xn ) < and (Xn ) < for all n. Let dn = 1Xn d and dn = 1Xn d. Then by what we have just proved there exists s s n L1 (X, n ) L1 (X, ) and measure n such that dn = n dn + dn with s s n n . Since n and n live on Xn there exists An MXn such that (An ) = n (An ) = 0 and
Then f (x) = g (x) for a.e. x. Proof. By assumption there exists Xn M such that Xn X and f d < and Xn gd < for all n. Replacing A by A Xn in Eq. Xn (17.4) implies 1Xn f d =
A AXn
f d =
AXn
gd =
A
1Xn gd
for all A M. Since 1Xn f and 1Xn g are in L1 () for all n, this equation implies 1Xn f = 1Xn g, a.e. Letting n then shows that f = g, a.e. Remark 17.7. Lemma 17.6 is in general false without the niteness assumption. A trivial counterexample is to take M = 2X , (A) = for all non-empty A M, f = 1X and g = 2 1X . Then Eq. (17.4) holds yet f = g.
Page: 200
job: prob
macro: svmonob.cls
date/time: 23-Feb-2007/15:20
s s n (X \ An ) = n (Xn \ An ) = 0. s This shows that n for all n and so by Lemma 17.4, s := singular relative to . Since s (n n + n )= n=1 n=1 s (n 1Xn + n ) = + s , n=1 s n is
=
n=1
n =
(17.7)
where := n=1 1Xn n , it follows that = a + s with a = . Hence this is the desired Lebesgue decomposition of relative to . Remark 17.9. Here is the motivation for the above construction. Suppose that d = ds + d is the Radon-Nikodym decomposition and X = A B such that s (B ) = 0 and (A) = 0. Then we nd s (f ) + (f ) = (f ) = (hf ) = (hf ) + (hf ). Letting f 1A f then implies that (1A f ) = s (1A f ) = (1A hf ) which show that h = 1, a.e. on A. Also letting f 1B f implies that (1B f ) = (h1B f ) + (h1B f ) = (h1B f ) + (h1B f ) which implies, = h + h, a.e. on B, i.e. (1 h) = h, a.e. on B.
h In particular it follows that h < 1, = a.e. on B and that = 1 h 1h<1 , a.e. So up to sets of measure zero, A = {h = 1} and B = {h < 1} and therefore,
h 1h<1 d. 1h
18 Conditional Expectation
In this section let (, B , P ) be a probability space, i.e. (, B , P ) is a measure space and P ( ) = 1. Let G B be a sub sigma algebra of B and write f Gb if f : C is bounded and f is (G , BC ) measurable. If A B and P (A) > 0, we will let E [X |A] := P (A B ) E [X : A] and P (B |A) := E [1B |A] := P (A) P (A) 0 = E [F : F = 0] E [G : F = 0] and hence that G1F =0 = 0 a.s. Similarly if A := {G > F } with > 1 in Eq. (18.1), then E [F : G > F ] E [G : G > F ] E [F : G > F ] = E [F : G > F ] . Since > 1, the only way this can happen is if E [F : G > F ] = 0. By the MCT we may now let 1 to conclude, 0 = E [F : G > F ] , i.e. F 1G>F = 0 a.s. Therefore, we have shown, almost surely, either F = 0 then G = 0 and F = 0 then G F and hence G F a.s. If F L1 (, B , P ) and E [F : A] = 0 for all A B, we may conclude by a simple limiting argument that E [F h] = 0 for all h Bb . Taking h := sgn(F ) := F |F | 1|F |>0 in this identity then implies 0 = E [F h] = E F which implies that F = 0 a.s. Denition 18.3 (Conditional Expectation). Let EG : L2 (, B , P ) L2 (, G , P ) denote orthogonal projection of L2 (, B , P ) onto the closed subspace L2 (, G , P ). For f L2 (, B , P ), we say that EG f L2 (, G , P ) is the conditional expectation of f. Theorem 18.4. Let (, B , P ) and G B be as above and let f, g L1 (, B , P ). The operator EG : L2 (, B , P ) L2 (, G , P ) extends uniquely to a linear contraction from L1 (, B , P ) to L1 (, G , P ). This extension enjoys the following properties; 1. If f 0, P a.e. then EG f 0, P a.e. 2. Monotinicity. If f g, P a.e. there EG f EG g, P a.e. 3. |EG f | EG |f | , P a.e. 4. If f L1 (, B , P ) then F = EG f L1 (, G , P ) i E(F h) = E(f h) for all h Gb . (18.2) F 1|F |>0 = E |F | 1|F |>0 = E [|F |] |F |
for all integrable random variables, X, and B B . We will often use the factorization Lemma 6.35 in this section. Because of this let us repeat it here. Lemma 18.1. Suppose that (Y, F ) is a measurable space and F : Y is a , there is a map. Then to every ( (F ), BR ) measurable function, H : R such that H = h F. (F , BR ) measurable function h : Y R Proof. First suppose that H = 1A where A (F ) = F 1 (F ). Let B F such that A = F 1 (B ) then 1A = 1F 1 (B ) = 1B F and hence the Lemma is valid in this case with h = 1B . More generally if H = ai 1Ai is a simple function, then there exists Bi F such that 1Ai = 1Bi F and hence H = h F . For general ( (F ), F ) measurable with h := ai 1Bi a simple function on R , choose simple functions Hn converging to H. Let hn function, H, from R such that Hn = hn F. Then it follows that be simple functions on R H = lim Hn = lim sup Hn = lim sup hn F = h F
n n n
Lemma 18.2. Suppose that F, G : [0, ] are B measurable functions. Then F G a.s. i E [F : A] E [G : A] for all A B . (18.1)
In particular F = G a.s. i equality holds in Eq. (18.1). Moreover, for F L1 (, B , P ) , F = 0 a.s. i E [F : A] = 0 for all A B . Proof. Hopefully it is clear to the reader that it suces to prove the rst assertion. Also it is clear that F G a.s. implies Eq. (18.1). For the converse assertion, if we take A = {F = 0} in Eq. (18.1) we learn that
204
18 Conditional Expectation
6. Tower or smoothing property. If G0 G1 B. Then EG0 EG1 f = EG1 EG0 f = EG0 f a.s. for all f L1 (, B , P ) . (18.3)
E [|EG f | h] = E EG f sgn (EG f )h = E f sgn (EG f )h E [|f | h] = E [EG |f | h] . Since h is arbitrary, it follows that |EG f | EG |f | , P a.e. Item 6. Now suppose 0 fn f L1 (, B , P ) and fn f a.s. Then by the MCT (or DCT) it follows that fn f in L1 (, B , P ) and therefore EG fn EG f in L1 (, B , P ) . On the other hand, by item 2. g := limn EG fn exists a.s. and we may identify g with EG f a.s. Thus we have shown EG fn EG f almost surely and in L1 (, B , P ) . Item 6., by the item 5. of the projection Theorem 16.12, Eq. (18.3) holds on L2 (, B , P ). By continuity of conditional expectation on L1 (, B , P ) and the density of L1 probability spaces in L2 probability spaces shows that Eq. (18.3) continues to hold on L1 (, B , P ). Remark 18.5. There is another standard construction of EG f based on the characterization in Eq. (18.2) and the Radon Nikodym Theorem 17.8. It goes as follows, for 0 f L1 (P ) , let Q := f P and observe that Q|G P |G and hence there exists 0 g L1 (, G , P ) such that dQ|G = gdP |G . This then implies that f dP = Q (A) =
A A
Proof. By the denition of orthogonal projection, f L2 (, B , P ) and h Gb , E(f h) = E(f EG h) = E(EG f h). (18.4) Taking h = sgn (EG f ) := in Eq. (18.4) shows E(|EG f |) = E(EG f h) = E(f h) E(|f h|) E(|f |). (18.6) EG f 1|E f |>0 EG f G (18.5)
It follows from this equation and the BLT (Theorem 16.17) that EG extends uniquely to a contraction form L1 (, B , P ) to L1 (, G , P ). Moreover, by a simple limiting argument, Eq. (18.4) remains valid for all f L1 (, B , P ) and h Gb . Indeed, if fn := f 1|f |n L1 (, B , P ) , then fn f in L1 (, B , P ) and hence E(EG f h) = E( lim EG fn h) = lim E(EG fn h)
n n
(18.7)
i.e. g = EG f. For general real valued, f L1 (P ) , dene EG f = EG f+ EG f and then for complex f L1 (P ) let EG f = EG Re f + iEG Im f. Notation 18.6 In the future, we will often write EG f as E [f |G ] . Moreover, if (X, M) is a measurable space and X : X is a measurable map. We will often simply denote E [f | (X )] simply by E [f |X ] . We will further let P (A|G ) := E [1A |G ] be the conditional probability of A given G , and P (A|X ) := P (A| (X )) be conditional probability of A given X. Exercise 18.1. Suppose f L1 (, B , P ) and f > 0 a.s. Show E [f |G ] > 0 a.s. Use this result to conclude if f (a, b) a.s. for some a, b such that a < b , then E [f |G ] (a, b) a.s. More precisely you are to show that any version, g, of E [f |G ] satises, g (a, b) a.s.
Conversely if F L1 (, G , P ) satises Eq. (18.2), then E(F h) = E(f h) = E(EG f h) for all h Gb , or equivalently E((F EG f ) h) = 0 for all h Gb . Taking h = sgn (F EG f ) in this identity then shows E [|F EG f |] = 0, i.e. F = EG f a.s. This proves item 4. Item 5. is now an easy consequence of the characterization in item 4., since if h Gb , E [(g EG f ) h] = E [EG f hg ] = E [f hg ] = E [gf h] = E [EG (gf ) h] . Thus EG (gf ) = g EG f, P a.e. Items 1 and 2. If f, h 0 then 0 E(f h) = E(EG f h) and since this holds for all h 0 in Gb , EG f 0, P a.e. If f g a.s., we may apply this result with f replaced by f g to complete the proof of both items. Item 3. If f is real, f |f | and so by Item 2., EG f EG |f | , i.e. |EG f | EG |f | , P a.e. For complex f, let h 0 be a bounded and G measurable function. Then
18.1 Examples
Example 18.7. Suppose G is the trivial algebra, i.e. G = {, } . In this case EG f = Ef a.s. Example 18.8. On the opposite extreme, if G = B , then EG f = Ef a.s.
Page: 204
job: prob
macro: svmonob.cls
date/time: 23-Feb-2007/15:20
18.1 Examples
205
Lemma 18.9. Suppose (X, M) is a measurable space, X : X is a measurable function, and G is a sub- -algebra of B . If X is independent of G and f : X R is a measurable function such that f (X ) L (, B , P ) , then EG [f (X )] = E [f (X )] a.s.. Conversely if EG [f (X )] = E [f (X )] a.s. for all bounded measurable functions, f : X R, then X is independent of G . Proof. Suppose that X is independent of G , f : X R is a measurable function such that f (X ) L (, B , P ) , := E [f (X )] , and A G . Then, by independence, E [f (X ) : A] = E [f (X ) 1A ] = E [f (X )] E [1A ] = E [1A ] = E [ : A] . Therefore EG [f (X )] = = E [f (X )] a.s. Conversely if EG [f (X )] = E [f (X )] = and A G , then E [f (X ) 1A ] = E [f (X ) : A] = E [ : A] = E [1A ] = E [f (X )] E [1A ] . Since this last equation is assumed to hold true for all A G and all bounded measurable functions, f : X R, X is independent of G . The following remark is often useful in computing conditional expectations. The following Exercise should help you gain some more intuition about conditional expectations. (X ) a.s. Remark 18.10 (Note well.). According to Lemma 18.1, E (f |X ) = f (X ) is for some measurable function, f : X R. So computing E (f |X ) = f equivalent to nding a function, f : X R, such that (X ) h (X ) E [f h (X )] = E f for all bounded and measurable functions, h : X R. Exercise 18.2. Suppose (, B , P ) is a probability space and P := B is a partition of . (Recall this means = i=1 Ai .) Let G be the algebra generated by P . Show: 1. B G i B = i Ai for some N. 2. g : R is G measurable i g = i=1 i 1Ai for some i R. 1 3. For f L (, B , P ), let E [f |Ai ] := E [1Ai f ] /P (Ai ) if P (Ai ) = 0 and E [f |Ai ] = 0 otherwise. Show
{Ai }i=1
Proposition 18.11. Suppose that (, B , P ) is a probability space, (X, M, ) and (Y, N , ) are two nite measure spaces, X : X and Y : Y are measurable functions, and there exists 0 L1 (, B , ) such that P ((X, Y ) U ) = U (x, y ) d (x) d (y ) for all U M N . Let (x) :=
Y
(x, y ) d (y )
(18.9)
(18.10)
where y0 is some arbitrary but xed point in Y. Then for any bounded (or nonnegative) measurable function, f : Y R, we have E [f (Y ) |X ] = Q (X, f ) a.s. where Q (X, f ) :=
1 (X )
Proof. Our goal is to compute E [f (Y ) |X ] . According to Remark 18.10, we are searching for a bounded measurable function, g : X R, such that E [f (Y ) h (X )] = E [g (X ) h (X )] for all h Mb . (18.11)
(18.8)
(Throughout this argument we are going to repeatedly use the Tonelli - Fubini theorems.) We now compare both sides of this equality, E [f (Y ) h (X )] =
XY
f (y ) (x, y ) d (y ) d (x)
(18.12)
E [g (X ) h (X )] =
XY
(18.13)
EG f =
i=1
where Comparing Eqs. (18.12) and (18.13), which are to be equal for all h Mb , requires us to demand, f (y ) (x, y ) d (y ) = g (x) (x) for a.e. x.
Y
(18.14)
Page: 205
job: prob
macro: svmonob.cls
date/time: 23-Feb-2007/15:20
206
18 Conditional Expectation
There are two possible problems in solving Eq. (18.14) for g(x) at a particular point x: the first is when ρ̄(x) = 0 and the second is when ρ̄(x) = ∞. Since

∫_X ρ̄(x) dμ(x) = ∫_X ∫_Y ρ(x, y) dν(y) dμ(x) = 1,

we know that ρ̄(x) < ∞ for μ-a.e. x, and so the second problem is not an issue. Moreover, if ρ̄(x) = 0 then ρ(x, y) = 0 for ν-a.e. y and therefore

∫_Y f(y) ρ(x, y) dν(y) = 0,   (18.15)

so Eq. (18.14) will be valid no matter how we choose g(x) at points where ρ̄(x) = 0. Therefore, if we let y₀ ∈ Y be an arbitrary but fixed point and then define

g(x) := (1/ρ̄(x)) ∫_Y f(y) ρ(x, y) dν(y) if ρ̄(x) ∈ (0, ∞) and g(x) := f(y₀) if ρ̄(x) ∈ {0, ∞},

then we have shown E[f(Y)|X] = g(X) = Q(X, f) a.s. as desired. (Observe that when ρ̄(x) < ∞, ρ(x, ·) ∈ L¹(ν) and hence the integral in the definition of g is well defined.)

Just for added security, let us check directly that g(X) = E[f(Y)|X] a.s. According to Eq. (18.13) we have

E[g(X) h(X)] = ∫_X h(x) g(x) ρ̄(x) dμ(x)
= ∫_{{0<ρ̄<∞}} h(x) (1/ρ̄(x)) (∫_Y f(y) ρ(x, y) dν(y)) ρ̄(x) dμ(x)
= ∫_{{0<ρ̄<∞}} h(x) ∫_Y f(y) ρ(x, y) dν(y) dμ(x)
= ∫_X h(x) ∫_Y f(y) ρ(x, y) dν(y) dμ(x)
= E[f(Y) h(X)],

wherein we have repeatedly used μ(ρ̄ = ∞) = 0 and the fact that Eq. (18.15) holds when ρ̄(x) = 0. This completes the verification that g(X) = E[f(Y)|X] a.s.

This proposition shows that conditional expectation generalizes the operation of integrating out only some of the variables in the integrand, whereas a full expectation integrates out all of them. It also gives an example of a regular conditional probability, a notion we now define.

Definition 18.12. Let (X, M) and (Y, N) be measurable spaces. A function Q: X × N → [0, 1] is a probability kernel on X × Y iff

1. Q(x, ·): N → [0, 1] is a probability measure on (Y, N) for each x ∈ X, and
2. Q(·, B): X → [0, 1] is M/B_ℝ measurable for all B ∈ N.

If Q is a probability kernel on X × Y and f: Y → ℝ is a bounded measurable function or a positive measurable function, then x ↦ Q(x, f) := ∫_Y f(y) Q(x, dy) is M/B_ℝ measurable. This is clear for simple functions and then for general functions via simple limiting arguments.

Definition 18.13. Let (X, M) and (Y, N) be measurable spaces and X: Ω → X and Y: Ω → Y be measurable functions. A probability kernel, Q, on X × Y is said to be a regular conditional distribution of Y given X iff Q(X, B) is a version of P(Y ∈ B|X) for each B ∈ N. Equivalently, we should have Q(X, f) = E[f(Y)|X] a.s. for all f ∈ N_b. When X = Ω and M = G is a sub-σ-algebra of B, we say that Q is the regular conditional distribution of Y given G.

The probability kernel, Q, defined in Eq. (18.10) is an example of a regular conditional distribution of Y given X.

Remark 18.14. Unfortunately, regular conditional distributions do not always exist. However, if we require Y to be a standard Borel space (i.e. Y is isomorphic to a Borel subset of ℝ), then a regular conditional distribution of Y given X will always exist; see Theorem 18.21. Moreover, it is known that all reasonable measure spaces are standard Borel spaces; see Section 18.4 below for more details. So in most instances of interest a regular conditional distribution of Y given X will exist.

Exercise 18.3. Suppose that (X, M) and (Y, N) are measurable spaces, X: Ω → X and Y: Ω → Y are measurable functions, and there exists a regular conditional distribution, Q, of Y given X. Show:

1. For all bounded measurable functions, f: (X × Y, M ⊗ N) → ℝ, the function X ∋ x ↦ Q(x, f(x, ·)) is measurable and

Q(X, f(X, ·)) = E[f(X, Y)|X] a.s.   (18.16)

Hint: let H denote the set of bounded measurable functions, f, on X × Y such that the two assertions are valid.

2. If A ∈ M ⊗ N and μ := P ∘ X⁻¹ is the law of X, then

P((X, Y) ∈ A) = ∫_X dμ(x) ∫_Y 1_A(x, y) Q(x, dy).   (18.17)
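As a concrete illustration of Proposition 18.11 (a numerical sketch under assumptions of my own choosing: μ = ν = Lebesgue measure on [0, 1] and the joint density ρ(x, y) = x + y, which is not an example from the text), the following Python snippet computes Q(x, f) of Eq. (18.10) by quadrature and checks the defining identity of Eq. (18.11):

import numpy as np

# Joint density rho(x, y) = x + y on [0,1]^2; it integrates to 1.
rho = lambda x, y: x + y
f = lambda y: y**2           # the function of Y being conditioned
h = lambda x: np.cos(x)      # an arbitrary bounded test function of X

# Midpoint-rule grid for all of the quadratures below.
n = 1000
t = (np.arange(n) + 0.5) / n
dx = dy = 1.0 / n
X, Y = np.meshgrid(t, t, indexing="ij")

rho_bar = rho(X, Y).sum(axis=1) * dy                  # Eq. (18.9)
Q_f = (f(Y) * rho(X, Y)).sum(axis=1) * dy / rho_bar   # Eq. (18.10); rho_bar > 0 here

# Both sides of E[f(Y) h(X)] = E[Q(X, f) h(X)]  (Eq. (18.11)),
# using that rho_bar is the marginal density of X.
lhs = (f(Y) * h(X) * rho(X, Y)).sum() * dx * dy
rhs = (Q_f * h(t) * rho_bar).sum() * dx
print(lhs, rhs)   # agree to quadrature accuracy

# Closed form for this density: E[Y^2 | X = x] = (x/3 + 1/4)/(x + 1/2).
print(Q_f[0], (t[0] / 3 + 0.25) / (t[0] + 0.5))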
Exercise 18.4. Keeping the same notation as in Exercise 18.3, further assume that X and Y are independent. Find a regular conditional distribution of Y given X and prove

E[f(X, Y)|X] = h_f(X) a.s.

for all bounded measurable f: X × Y → ℝ, where h_f(x) := E[f(x, Y)] for all x ∈ X, i.e.

E[f(X, Y)|X] = E[f(x, Y)]|_{x=X} a.s.
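A quick Monte Carlo sanity check of this independence principle (a sketch only; the particular distributions and all names are my own illustrative choices): for independent X and Y, conditioning f(X, Y) on X amounts to integrating out Y with x frozen.

import numpy as np

rng = np.random.default_rng(1)
N = 10**6
X = rng.normal(size=N)            # X and Y independent
Y = rng.uniform(size=N)
f = lambda x, y: x * y + y**2

# h_f(x) := E[f(x, Y)], computed by freezing x and resampling Y.
def h_f(x):
    return f(x, rng.uniform(size=N)).mean()

# Compare E[f(X, Y) | X near x0] against h_f(x0).
x0 = 0.5
near = np.abs(X - x0) < 0.01
print(f(X[near], Y[near]).mean(), h_f(x0))   # agree up to Monte Carlo error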
Theorem 18.15. Let (Ω, B, P) be a probability space and G be a sub-σ-algebra of B. If f: Ω → [0, ∞] is B-measurable, then F := lim_{n→∞} E_G[f ∧ n] exists a.s., is G-measurable, and satisfies E[F: A] = E[f: A] for all A ∈ G; we denote any version of F by E_G f. The extended operator E_G has the following properties.

1. E_G is order preserving, E_G[gf] = g E_G f a.s. for all G-measurable g ≥ 0, and for nested sub-σ-algebras G₀ ⊂ G₁ ⊂ B,

E_{G₀} E_{G₁} f = E_{G₀} f = E_{G₁} E_{G₀} f a.s.   (18.19)

2. Conditional Monotone Convergence (cMCT). If 0 ≤ f_n ↑ f a.s., then E_G f_n ↑ E_G f a.s.
3. Conditional Fatou (cFatou). If 0 ≤ f_n a.s. for all n, then E_G[liminf_{n→∞} f_n] ≤ liminf_{n→∞} E_G f_n a.s.
4. Conditional Dominated Convergence (cDCT). If f_n → f a.s. and |f_n| ≤ g ∈ L¹(Ω, B, P), then E_G f_n → E_G f a.s.

Remark 18.16. Regarding item 4. above: suppose that f_n → f in probability, |f_n| ≤ g_n ∈ L¹(Ω, B, P), g_n → g ∈ L¹(Ω, B, P) in probability, and E g_n → E g. Then by the DCT in Corollary 11.8, we know that f_n → f in L¹(Ω, B, P). Since E_G is a contraction, it follows that E_G f_n → E_G f in L¹(Ω, B, P) and hence in probability.

Proof. Since f ∧ n ∈ L¹(Ω, B, P) and f ∧ n is increasing, it follows that F := lim_{n→∞} E_G[f ∧ n] exists a.s. Moreover, by two applications of the standard MCT, we have for any A ∈ G that

E[F: A] = lim_{n→∞} E[E_G[f ∧ n]: A] = lim_{n→∞} E[f ∧ n: A] = E[f: A].   (18.18)

Thus Eq. (18.18) holds, and that it uniquely determines F follows from Lemma 18.2. If 0 ≤ f ≤ g, then

E_G f = lim_{n→∞} E_G[f ∧ n] ≤ lim_{n→∞} E_G[g ∧ n] = E_G g a.s.,

and so E_G still preserves order.

Item 2. Suppose that, almost surely, 0 ≤ f_n ≤ f_{n+1} for all n; then E_G f_n is a.s. increasing in n. Hence, again by two applications of the MCT, for any A ∈ G we have

E[lim_{n→∞} E_G f_n: A] = lim_{n→∞} E[E_G f_n: A] = lim_{n→∞} E[f_n: A] = E[lim_{n→∞} f_n: A] = E[E_G(lim_{n→∞} f_n): A],

from which it follows that lim_{n→∞} E_G f_n = E_G[lim_{n→∞} f_n] a.s.

Item 1. Order preservation has already been proved; the other properties are also easily proved by simple limiting arguments. Indeed, if 0 ≤ g ∈ G_b and f ≥ 0, then by cMCT,

E_G[gf] = lim_{n→∞} E_G[g(f ∧ n)] = lim_{n→∞} g E_G[f ∧ n] = g E_G f a.s.

Similarly by cMCT,

E_{G₀} E_{G₁} f = E_{G₀}[lim_{n→∞} E_{G₁}(f ∧ n)] = lim_{n→∞} E_{G₀} E_{G₁}(f ∧ n) = lim_{n→∞} E_{G₀}(f ∧ n) = E_{G₀} f

and

E_{G₁} E_{G₀} f = E_{G₁}[lim_{n→∞} E_{G₀}(f ∧ n)] = lim_{n→∞} E_{G₁} E_{G₀}[f ∧ n] = lim_{n→∞} E_{G₀}(f ∧ n) = E_{G₀} f,

which verifies Eq. (18.19).

Item 3. For 0 ≤ f_n, let g_k := inf_{n≥k} f_n. Then g_k ≤ f_k for all k and g_k ↑ liminf_{n→∞} f_n, and hence by cMCT and item 1.,

E_G[liminf_{n→∞} f_n] = lim_{k→∞} E_G g_k ≤ liminf_{k→∞} E_G f_k a.s.

Item 4. As usual it suffices to consider the real case. Let f_n → f a.s. and |f_n| ≤ g a.s. with g ∈ L¹(Ω, B, P). Then, following the proof of the dominated convergence theorem, we start with the fact that 0 ≤ g ± f_n a.s. for all n. Hence by cFatou,

E_G(g + f) = E_G[liminf_{n→∞}(g + f_n)] ≤ liminf_{n→∞} E_G(g + f_n) = E_G g + liminf_{n→∞} E_G f_n

and

E_G(g − f) = E_G[liminf_{n→∞}(g − f_n)] ≤ liminf_{n→∞} E_G(g − f_n) = E_G g − limsup_{n→∞} E_G f_n,

where the above equations hold a.s. Cancelling E_G g from both sides of these inequalities then implies

limsup_{n→∞} E_G f_n ≤ E_G f ≤ liminf_{n→∞} E_G f_n a.s.,

i.e. lim_{n→∞} E_G f_n = E_G f a.s.
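Items 2.–4. are easy to visualize when G is generated by a finite partition, in which case E_G is the block-averaging operator of Exercise 18.2 (a numerical sketch of cDCT with toy data of my own choosing, not an example from the text):

import numpy as np

rng = np.random.default_rng(3)
N, k = 10**5, 8
omega = rng.uniform(size=N)
idx = np.minimum((omega * k).astype(int), k - 1)   # partition block of omega

def E_G(f):
    """Conditional expectation for the partition sigma-algebra: block averages."""
    out = np.empty_like(f)
    for i in range(k):
        out[idx == i] = f[idx == i].mean()
    return out

# f_n -> f pointwise with |f_n| <= 1 =: g in L^1, so cDCT gives E_G f_n -> E_G f.
f = np.sin(2 * np.pi * omega)
for n in [1, 10, 100, 1000]:
    f_n = np.clip(n * omega, 0.0, 1.0) * f         # f_n -> f as n -> infinity
    print(n, np.abs(E_G(f_n) - E_G(f)).max())      # decreases to 0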
Theorem 18.17 (Conditional Jensen's inequality). Let (Ω, B, P) be a probability space, −∞ ≤ a < b ≤ ∞, and φ: (a, b) → ℝ be a convex function. Assume f ∈ L¹(Ω, B, P; ℝ) is a random variable satisfying f ∈ (a, b) a.s. and φ(f) ∈ L¹(Ω, B, P; ℝ). Then

φ(E_G f) ≤ E_G[φ(f)] a.s.   (18.20)

Proof. Let Λ := ℚ ∩ (a, b), a countable dense subset of (a, b). By Theorem 11.38 (also see Lemma 7.31 and Figure 7.2 when φ is C¹),

φ(y) ≥ φ(x) + φ′₋(x)(y − x) for all x, y ∈ (a, b),

where φ′₋(x) is the left hand derivative of φ at x. Taking y = f and then taking conditional expectations imply

E_G[φ(f)] ≥ E_G[φ(x) + φ′₋(x)(f − x)] = φ(x) + φ′₋(x)(E_G f − x) a.s.

Since this is true for all x ∈ (a, b) (and hence all x in the countable set Λ), we may conclude that

E_G[φ(f)] ≥ sup_{x∈Λ} [φ(x) + φ′₋(x)(E_G f − x)] a.s.

By Exercise 18.1, E_G f ∈ (a, b) a.s., and hence it follows from Corollary 11.39 that

sup_{x∈Λ} [φ(x) + φ′₋(x)(E_G f − x)] = φ(E_G f) a.s.

Combining the last two estimates proves Eq. (18.20).

Corollary 18.18. The conditional expectation operator E_G maps Lᵖ(Ω, B, P) into Lᵖ(Ω, B, P) and the map remains a contraction for all 1 ≤ p ≤ ∞.

Proof. The cases p = ∞ and p = 1 have already been covered in Theorem 18.4. So now suppose 1 < p < ∞ and apply Jensen's inequality with φ(x) = |x|ᵖ to find |E_G f|ᵖ ≤ E_G|f|ᵖ a.s. Taking expectations of this inequality gives the desired result.

Lemma 18.19. Suppose that (X, M) is a measurable space and F: X × ℝ → ℝ is a function such that F(·, t): X → ℝ is M/B_ℝ measurable for each t ∈ ℝ and F(x, ·): ℝ → ℝ is right continuous for each x ∈ X. Then F is M ⊗ B_ℝ/B_ℝ measurable.

Proof. For each n ∈ ℕ, the function

F_n(x, t) := Σ_{k=−∞}^∞ F(x, (k+1)/n) 1_{(k/n, (k+1)/n]}(t)

is M ⊗ B_ℝ/B_ℝ measurable. Using the right continuity assumption, it follows that F(x, t) = lim_{n→∞} F_n(x, t) for all (x, t) ∈ X × ℝ, and therefore F is also M ⊗ B_ℝ/B_ℝ measurable.

Theorem 18.20. Suppose that (X, M) is a measurable space, X: Ω → X is a measurable function, and Y: Ω → ℝ is a random variable. Then there exists a probability kernel, Q, on X × ℝ such that E[f(Y)|X] = Q(X, f), P-a.s., for all bounded measurable functions f: ℝ → ℝ.

Proof. For each r ∈ ℚ, let q_r: X → [0, 1] be a measurable function such that E[1_{Y≤r}|X] = q_r(X) a.s., and let μ := P ∘ X⁻¹. Then, using the basic properties of conditional expectation, q_r ≤ q_s a.s. for all r ≤ s, lim_{r→∞} q_r = 1, and lim_{r→−∞} q_r = 0, a.s. Hence the set X₀ ⊂ X, where q_r(x) ≤ q_s(x) for all r ≤ s, lim_{r→∞} q_r(x) = 1, and lim_{r→−∞} q_r(x) = 0, satisfies μ(X₀) = P(X ∈ X₀) = 1. For t ∈ ℝ, let

F(x, t) := 1_{X₀}(x) inf{q_r(x): r > t} + 1_{X∖X₀}(x) 1_{t≥0}.

Then F(·, t): X → ℝ is measurable for each t ∈ ℝ and F(x, ·) is a distribution function on ℝ for each x ∈ X. Hence an application of Lemma 18.19 shows F: X × ℝ → [0, 1] is measurable.

For each x ∈ X and B ∈ B_ℝ, let Q(x, B) := μ_{F(x,·)}(B), where μ_F denotes the probability measure on ℝ determined by a distribution function F: ℝ → [0, 1]. We claim that Q is the desired probability kernel. To prove this, let H be the collection of bounded measurable functions f: ℝ → ℝ such that X ∋ x ↦ Q(x, f) ∈ ℝ is measurable and E[f(Y)|X] = Q(X, f), P-a.s. It is easily seen that H is a linear subspace which is closed under bounded convergence. We will finish the proof by showing that H contains the multiplicative class M = {1_{(−∞,t]}: t ∈ ℝ}. Notice that Q(x, 1_{(−∞,t]}) = F(x, t) is measurable. Now let r ∈ ℚ and g: X → ℝ be a bounded measurable function. Then
E[1_{Y≤r} g(X)] = E[E[1_{Y≤r}|X] g(X)] = E[q_r(X) g(X)] = E[q_r(X) 1_{X₀}(X) g(X)].

For t ∈ ℝ, we may let r ↓ t in the above equality (use the DCT) to learn

E[1_{Y≤t} g(X)] = E[F(X, t) 1_{X₀}(X) g(X)] = E[F(X, t) g(X)].

Since g was arbitrary, we may conclude that

Q(X, 1_{(−∞,t]}) = F(X, t) = E[1_{Y≤t}|X] a.s.

This completes the proof.

This result leads fairly immediately to the following far reaching generalization.

Theorem 18.21. Suppose that (X, M) is a measurable space and (Y, N) is a standard Borel space; see Appendix 18.4 below. Suppose that X: Ω → X and Y: Ω → Y are measurable functions. Then there exists a probability kernel, Q, on X × Y such that E[f(Y)|X] = Q(X, f), P-a.s., for all bounded measurable functions f: Y → ℝ.

Proof. By definition of a standard Borel space, we may assume that Y ∈ B_ℝ and N = B_Y. In this case Y may also be viewed as a measurable map from Ω to ℝ such that Y(Ω) ⊂ Y. By Theorem 18.20, we may find a probability kernel, Q₀, on X × ℝ such that

E[f(Y)|X] = Q₀(X, f), P-a.s.,   (18.21)

for all bounded measurable functions f: ℝ → ℝ. Taking f = 1_Y in Eq. (18.21) shows

1 = E[1_Y(Y)|X] = Q₀(X, Y) a.s.

Thus if we let X₀ := {x ∈ X: Q₀(x, Y) = 1}, we know that P(X ∈ X₀) = 1. Let us now define

Q(x, B) := 1_{X₀}(x) Q₀(x, B) + 1_{X∖X₀}(x) δ_y(B) for (x, B) ∈ X × B_Y,

where δ_y is the point mass at an arbitrary but fixed point y ∈ Y. Then Q(x, ·) is a probability measure on (Y, B_Y) for each x ∈ X, Q(·, B) is measurable for each B ∈ B_Y, and hence Q is a probability kernel on X × Y. Moreover, if B ∈ B_Y ⊂ B_ℝ, then

Q(X, B) = 1_{X₀}(X) Q₀(X, B) = 1_{X₀}(X) E[1_B(Y)|X] = E[1_B(Y)|X] a.s.

This shows that Q is the desired regular conditional probability.

Corollary 18.22. Suppose G is a sub-σ-algebra of B, (Y, N) is a standard Borel space, and Y: Ω → Y is a measurable function. Then there exists a probability kernel, Q, on (Ω, G) × (Y, N) such that E[f(Y)|G] = Q(·, f), P-a.s., for all bounded measurable functions f: Y → ℝ.

Proof. This is a special case of Theorem 18.21 applied with (X, M) = (Ω, G) and X: Ω → Ω being the identity map, which is B/G measurable.
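Before moving on, here is a numerical caricature of the construction in the proof of Theorem 18.20 (a sketch only; the discrete example, the finite grid standing in for ℚ, and all names are assumptions of this aside): estimate q_r(x) ≈ P(Y ≤ r | X = x) on a grid, then take the right-continuous infimum F(x, t) := inf{q_r(x): r > t} to obtain a conditional distribution function.

import numpy as np

rng = np.random.default_rng(2)
N = 10**5
X = rng.integers(0, 3, size=N)                # X takes values in {0, 1, 2}
Y = rng.normal(loc=X.astype(float), size=N)   # Y | X = x  ~  N(x, 1)

rs = np.linspace(-5.0, 8.0, 400)              # finite stand-in for r in Q

# q_r(x) := E[1_{Y <= r} | X = x], estimated by averaging over {X = x}.
q = np.array([[(Y[X == x] <= r).mean() for r in rs] for x in range(3)])

def F(x, t):
    """F(x, t) := inf{ q_r(x) : r > t }, right continuous in t by construction."""
    above = rs > t
    return q[x, above].min() if above.any() else 1.0

# Q(x, (-inf, t]) = F(x, t) should recover the conditional law of Y given X = x.
for x, t in [(0, 0.0), (1, 1.0), (2, 2.5)]:
    print(F(x, t), (Y[X == x] <= t).mean())   # the two columns agree closely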
18.4 Appendix: Standard Borel Spaces

On first reading, you may wish to skip the rest of this section. Recall from Remark 18.14 that a standard Borel space is a measurable space which is isomorphic to a Borel subset of ℝ. Here two measurable spaces, (X, M) and (Y, N), are said to be isomorphic (written X ≅ Y) if there exists a bijection f: X → Y such that both f and f⁻¹ are measurable; such an f is called a measure theoretic isomorphism.
Lemma 18.26. Suppose (X, M) and (Y, N) are measurable spaces such that X = Σ_{n=1}^∞ X_n and Y = Σ_{n=1}^∞ Y_n (disjoint unions) with X_n ∈ M and Y_n ∈ N. If (X_n, M_{X_n}) is isomorphic to (Y_n, N_{Y_n}) for all n, then X ≅ Y. Moreover, if (X_n, M_n) and (Y_n, N_n) are isomorphic measurable spaces, then (X := Π_{n=1}^∞ X_n, ⊗_{n=1}^∞ M_n) and (Y := Π_{n=1}^∞ Y_n, ⊗_{n=1}^∞ N_n) are isomorphic.

Proof. For each n ∈ ℕ, let f_n: X_n → Y_n be a measure theoretic isomorphism. Then define f: X → Y by f := f_n on X_n. Clearly f: X → Y is a bijection, and if B ∈ N, then

f⁻¹(B) = ∪_{n=1}^∞ f⁻¹(B ∩ Y_n) = ∪_{n=1}^∞ f_n⁻¹(B ∩ Y_n) ∈ M.

This shows f is measurable, and by similar considerations f⁻¹ is measurable as well. Therefore f: X → Y is the desired measure theoretic isomorphism.

For the second assertion, let f_n: X_n → Y_n be a measure theoretic isomorphism for all n ∈ ℕ and then define f(x) := (f₁(x₁), f₂(x₂), ...) for x = (x₁, x₂, ...) ∈ X. Again it is clear that f is bijective and measurable, since

f⁻¹(Π_{n=1}^∞ B_n) = Π_{n=1}^∞ f_n⁻¹(B_n) ∈ ⊗_{n=1}^∞ M_n

for every cylinder set Π_{n=1}^∞ B_n with B_n ∈ N_n. By similar considerations, f⁻¹ is measurable as well, and therefore f is the desired measure theoretic isomorphism.
Proposition 18.27. Let −∞ < a < b < ∞. The following measurable spaces, equipped with their Borel σ-algebras, are all isomorphic: (0,1), [0,1], (0,1], [0,1), (a,b), [a,b], (a,b], [a,b), ℝ, and (0,1) ∪ Λ where Λ is a finite or countable subset of ℝ ∖ (0,1).

Proof. It is easy to see that any bounded open, closed, or half open interval is isomorphic to any other such interval of the same type using an affine transformation. Let us now show (−1, 1) ≅ [−1, 1]. To prove this it suffices, by Lemma 18.26, to observe that

(−1, 1) = {0} ∪ ⋃_{n=0}^∞ [2^{−n−1}, 2^{−n}) ∪ ⋃_{n=0}^∞ (−2^{−n}, −2^{−n−1}]

while

[−1, 1] = {0} ∪ ⋃_{n=0}^∞ (2^{−n−1}, 2^{−n}] ∪ ⋃_{n=0}^∞ [−2^{−n}, −2^{−n−1}),

and each half open interval in the first decomposition is isomorphic, via an orientation reversing affine map, to the corresponding half open interval in the second. The assertion involving ℝ can be proved using the bijection, tan: (−π/2, π/2) → ℝ.

If Λ = {1}, then by Lemma 18.26 and what we have already proved, (0,1) ∪ {1} = (0,1] ≅ (0,1); indeed

(0, 1] = ⋃_{n=0}^∞ (2^{−n−1}, 2^{−n}] while (0, 1) = ⋃_{n=0}^∞ [2^{−n−1}, 2^{−n}),

and corresponding half open intervals are isomorphic as above. In particular, half open intervals are isomorphic to open intervals as well. Similarly, if N ∈ ℕ with N ≥ 2 and Λ = {2, ..., N+1}, then

(0,1) ∪ Λ = Λ ∪ (0, 2^{−N}) ∪ ⋃_{n=1}^N [2^{−n}, 2^{−n+1})

while

(0,1) = {2^{−n}: n = 1, 2, ..., N} ∪ (0, 2^{−N}) ∪ ⋃_{n=1}^N (2^{−n}, 2^{−n+1}),

and so again it follows from what we have proved and Lemma 18.26 that (0,1) ∪ Λ ≅ (0,1). Finally, if Λ = {2, 3, 4, ...} is a countable set, we can show (0,1) ∪ Λ ≅ (0,1) with the aid of the identities

(0,1) ∪ Λ = Λ ∪ ⋃_{n=1}^∞ [2^{−n}, 2^{−n+1}) while (0,1) = {2^{−n}: n ∈ ℕ} ∪ ⋃_{n=1}^∞ (2^{−n}, 2^{−n+1}).

Since only the cardinality of Λ matters here, this covers all finite or countable Λ ⊂ ℝ ∖ (0,1).

Notation 18.28. Suppose (X, M) is a measurable space and A is a set. Let π_a: X^A → X denote the projection operator onto the a-th component of X^A (i.e. π_a(ω) := ω(a) for all a ∈ A) and let M^A := σ(π_a: a ∈ A) be the product σ-algebra on X^A.

Lemma 18.29. If α: A → B is a bijection of sets and (X, M) is a measurable space, then (X^A, M^A) ≅ (X^B, M^B).

Proof. The map f: X^B → X^A defined by f(ω) := ω ∘ α for all ω ∈ X^B is a bijection with f⁻¹(λ) = λ ∘ α⁻¹. If a ∈ A and ω ∈ X^B, we have

π^{X^A}_a ∘ f(ω) = f(ω)(a) = ω(α(a)) = π^{X^B}_{α(a)}(ω),

where π^{X^A}_a and π^{X^B}_b are the projection operators on X^A and X^B respectively. Thus π^{X^A}_a ∘ f = π^{X^B}_{α(a)} for all a ∈ A, which shows f is measurable. Similarly, π^{X^B}_b ∘ f⁻¹ = π^{X^A}_{α⁻¹(b)}, showing f⁻¹ is measurable as well.

Proposition 18.30. Let Ω := {0,1}^ℕ, let π_i: Ω → {0,1} be projection onto the i-th component, and let B := σ(π₁, π₂, ...) be the product σ-algebra on Ω. Then (Ω, B) ≅ ((0,1), B_{(0,1)}).

Proof. We will begin by using a specific binary digit expansion of a point x ∈ [0,1) to construct a map from [0,1) to Ω. To this end, let r₁(x) := x and γ₁(x) := 1_{x ≥ 2^{−1}}, and let r₂(x) := x − 2^{−1}γ₁(x) ∈ [0, 2^{−1}); then let γ₂(x) := 1_{r₂(x) ≥ 2^{−2}} and r₃(x) := r₂(x) − 2^{−2}γ₂(x) ∈ [0, 2^{−2}). Working inductively, we construct {γ_k(x), r_k(x)}_{k=1}^∞ such that γ_k(x) ∈ {0,1} and

r_{k+1}(x) = r_k(x) − 2^{−k}γ_k(x) = x − Σ_{j=1}^k 2^{−j}γ_j(x) ∈ [0, 2^{−k})   (18.22)

for all k. Let us now define g: [0,1) → Ω by g(x) := (γ₁(x), γ₂(x), ...). Since each component function, π_j ∘ g = γ_j: [0,1) → {0,1}, is measurable, it follows that g is measurable. By construction,

x = Σ_{j=1}^∞ 2^{−j}γ_j(x).   (18.23)
Hence if we define f: Ω → [0,1] by f(ω) := Σ_{j=1}^∞ 2^{−j}ω_j, then f(g(x)) = x for all x ∈ [0,1). This shows g is injective, f is surjective, and f is injective on the range of g. We now claim that Ω₀ := g([0,1)), the range of g, consists of those ω ∈ Ω such that ω_i = 0 for infinitely many i. Indeed, if there exists a k ∈ ℕ such that γ_j(x) = 1 for all j ≥ k, then (by Eq. (18.23)) r_{k+1}(x) = 2^{−k}, which would contradict Eq. (18.22). Hence g([0,1)) ⊂ Ω₀. Conversely, if ω ∈ Ω₀ and x := f(ω) ∈ [0,1), it is not hard to show inductively that γ_j(x) = ω_j for all j, i.e. g(x) = ω. For example, if ω₁ = 1 then x ≥ 2^{−1} and hence γ₁(x) = 1. Alternatively, if ω₁ = 0, then

x = Σ_{j=2}^∞ 2^{−j}ω_j < Σ_{j=2}^∞ 2^{−j} = 2^{−1},

so that γ₁(x) = 0. Hence it follows that r₂(x) = Σ_{j=2}^∞ 2^{−j}ω_j, and by similar reasoning we learn r₂(x) ≥ 2^{−2} iff ω₂ = 1, i.e. γ₂(x) = 1 iff ω₂ = 1. The full induction argument is now left to the reader.

Since single point sets are in B and Λ := Ω ∖ Ω₀ = ∪_{n=1}^∞ {ω ∈ Ω: ω_j = 1 for j ≥ n} is a countable set, it follows that Λ ∈ B and therefore Ω₀ = Ω ∖ Λ ∈ B. Hence we may now conclude that g: ([0,1), B_{[0,1)}) → (Ω₀, B_{Ω₀}) is a measurable bijection with measurable inverse given by f|_{Ω₀}, i.e. ([0,1), B_{[0,1)}) ≅ (Ω₀, B_{Ω₀}). An application of Lemma 18.26 and Proposition 18.27 now implies

Ω = Ω₀ ∪ Λ ≅ [0,1) ∪ ℕ ≅ [0,1) ≅ (0,1).
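The digit functions γ_k and remainders r_k are easy to compute, and a few lines of Python (an illustrative sketch whose names mirror the proof) confirm Eqs. (18.22) and (18.23) numerically:

def digits(x, n):
    """Return (gamma_1, ..., gamma_n) and r_{n+1} for x in [0, 1), as in the proof."""
    assert 0.0 <= x < 1.0
    gammas, r = [], x                  # r = r_1(x) = x
    for k in range(1, n + 1):
        g = 1 if r >= 2.0**-k else 0   # gamma_k(x) = 1_{r_k(x) >= 2^{-k}}
        r -= g * 2.0**-k               # r_{k+1}(x) = r_k(x) - 2^{-k} gamma_k(x)
        assert 0.0 <= r < 2.0**-k      # Eq. (18.22)
        gammas.append(g)
    return gammas, r

x = 0.6640625                          # a dyadic rational, so recovery is exact
g, _ = digits(x, 20)
print(g)                               # g(x) = (1, 0, 1, 0, 1, 0, 1, 0, 0, ...)
print(sum(gj * 2.0**-(j + 1) for j, gj in enumerate(g)))   # = x, Eq. (18.23)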
Corollary 18.31. The following spaces are all isomorphic to ((0,1), B_{(0,1)}): (0,1)^d and ℝ^d for any d ∈ ℕ, and [0,1]^ℕ and ℝ^ℕ, where the latter two spaces are equipped with their natural product σ-algebras.

Proof. In light of Lemma 18.26 and Proposition 18.27 we know that (0,1)^d ≅ ℝ^d and (0,1)^ℕ ≅ [0,1]^ℕ ≅ ℝ^ℕ. So, using Proposition 18.30, it suffices to show (0,1)^d ≅ Ω ≅ (0,1)^ℕ, and for this it suffices to show Ω^d ≅ Ω and Ω^ℕ ≅ Ω.

To reduce the problem further, let us observe that Ω^d = ({0,1}^ℕ)^{{1,...,d}} ≅ {0,1}^{ℕ×{1,...,d}} and Ω^ℕ = ({0,1}^ℕ)^ℕ ≅ {0,1}^{ℕ×ℕ}. For example, let g: Ω^ℕ → {0,1}^{ℕ×ℕ} be defined by g(ω)(i,j) := ω(i)(j) for all ω ∈ Ω^ℕ. Since, for each (i,j) ∈ ℕ × ℕ,

π_{(i,j)} ∘ g(ω) = g(ω)(i,j) = ω(i)(j) = π_j(π_i(ω)),

π_{(i,j)} ∘ g = π_j ∘ π_i is measurable for all (i,j) ∈ ℕ × ℕ, which shows g is measurable. The inverse, g⁻¹: {0,1}^{ℕ×ℕ} → Ω^ℕ, is given by g⁻¹(λ)(i)(j) := λ(i,j); since

π_j ∘ π_i ∘ g⁻¹(λ) = g⁻¹(λ)(i)(j) = λ(i,j) = π_{(i,j)}(λ),

π_j ∘ π_i ∘ g⁻¹ = π_{(i,j)} is measurable for all i, j ∈ ℕ, and hence g⁻¹ is measurable. This shows Ω^ℕ ≅ {0,1}^{ℕ×ℕ}; the proof that Ω^d ≅ {0,1}^{ℕ×{1,...,d}} is analogous.

We may now complete the proof with a couple of applications of Lemma 18.29. Indeed, ℕ, ℕ × {1,...,d}, and ℕ × ℕ all have the same cardinality, and therefore

Ω = {0,1}^ℕ ≅ {0,1}^{ℕ×{1,...,d}} ≅ Ω^d and Ω = {0,1}^ℕ ≅ {0,1}^{ℕ×ℕ} ≅ Ω^ℕ.

Corollary 18.32. Suppose that (X_n, M_n) for n ∈ ℕ are standard Borel spaces. Then X := Π_{n=1}^∞ X_n equipped with the product σ-algebra, M := ⊗_{n=1}^∞ M_n, is again a standard Borel space.

Proof. Let A_n ∈ B_{[0,1]} be Borel sets on [0,1] such that there exists a measurable isomorphism, f_n: X_n → A_n. Then f: X → A := Π_{n=1}^∞ A_n defined by f(x₁, x₂, ...) := (f₁(x₁), f₂(x₂), ...) is easily seen to be a measure theoretic isomorphism when A is equipped with the product σ-algebra, ⊗_{n=1}^∞ B_{A_n}. So according to Corollary 18.31, to finish the proof it suffices to show ⊗_{n=1}^∞ B_{A_n} = M_A, where M := ⊗_{n=1}^∞ B_{[0,1]} is the product σ-algebra on [0,1]^ℕ and M_A is its trace on A.

The σ-algebra ⊗_{n=1}^∞ B_{A_n} is generated by sets of the form B := Π_{n=1}^∞ B_n where B_n ∈ B_{A_n} ⊂ B_{[0,1]}. On the other hand, the σ-algebra M_A is generated by sets of the form A ∩ B̃ where B̃ := Π_{n=1}^∞ B̃_n with B̃_n ∈ B_{[0,1]}. Since

A ∩ B̃ = Π_{n=1}^∞ (B̃_n ∩ A_n) = Π_{n=1}^∞ B_n,

where B_n := B̃_n ∩ A_n is the generic element in B_{A_n}, we see that ⊗_{n=1}^∞ B_{A_n} and M_A can both be generated by the same collections of sets, and we may conclude that ⊗_{n=1}^∞ B_{A_n} = M_A.

Our next goal is to show that any Polish space with its Borel σ-algebra is a standard Borel space.

Notation 18.33. Let Q := [0,1]^ℕ denote the (infinite dimensional) unit cube in ℝ^ℕ. For a, b ∈ Q let

d(a, b) := Σ_{n=1}^∞ 2^{−n}|a_n − b_n| = Σ_{n=1}^∞ 2^{−n}|π_n(a) − π_n(b)|.   (18.24)
Exercise 18.5. Show d is a metric and that the Borel σ-algebra on (Q, d) is the same as the product σ-algebra.

Solution to Exercise (18.5). It is easily seen that d is a metric on Q which, by Eq. (18.24), is measurable relative to the product σ-algebra, M. Therefore M contains all open balls and hence contains the Borel σ-algebra, B_{(Q,d)}. Conversely, since |π_n(a) − π_n(b)| ≤ 2ⁿ d(a, b), each of the projection operators, π_n: Q → [0,1], is continuous. Therefore each π_n is B_{(Q,d)}-measurable and hence M = σ({π_n}_{n=1}^∞) ⊂ B_{(Q,d)}.

Theorem 18.34. To every separable metric space (X, ρ) there exists a continuous injective map G: X → Q such that G: X → G(X) ⊂ Q is a homeomorphism. Moreover, if the metric ρ is also complete, then G(X) is a G_δ set, i.e. G(X) is the countable intersection of open subsets of (Q, d). In short, any separable metrizable space X is homeomorphic to a subset of (Q, d), and if X is a Polish space then X is homeomorphic to a G_δ subset of (Q, d).

Proof. (This proof follows that in Rogers and Williams [4, Theorem 82.5 on p. 106].) By replacing ρ by ρ/(1 + ρ) if necessary, we may assume that 0 ≤ ρ < 1. Let D = {a_n}_{n=1}^∞ be a countable dense subset of X and define

G(x) := (ρ(x, a₁), ρ(x, a₂), ρ(x, a₃), ...) ∈ Q

and

δ(x, y) := d(G(x), G(y)) = Σ_{n=1}^∞ 2^{−n}|ρ(x, a_n) − ρ(y, a_n)|

for x, y ∈ X. To prove the first assertion, we must show G is injective and that δ is a metric on X compatible with the topology determined by ρ. If G(x) = G(y), then ρ(x, a) = ρ(y, a) for all a ∈ D. Since D is a dense subset of X, we may choose α_k ∈ D such that

0 = lim_{k→∞} ρ(x, α_k) = lim_{k→∞} ρ(y, α_k) = ρ(y, x),

and therefore x = y. A simple argument using the dominated convergence theorem shows y ↦ δ(x, y) is ρ-continuous, i.e. δ(x, y) is small if ρ(x, y) is small. Conversely,

ρ(x, y) ≤ ρ(x, a_n) + ρ(y, a_n) = 2ρ(x, a_n) + ρ(y, a_n) − ρ(x, a_n) ≤ 2ρ(x, a_n) + |ρ(x, a_n) − ρ(y, a_n)| ≤ 2ρ(x, a_n) + 2ⁿ δ(x, y).

Hence if ε > 0 is given, we may choose n so that 2ρ(x, a_n) < ε/2, and so if δ(x, y) < 2^{−(n+1)}ε, it will follow that ρ(x, y) < ε. This shows τ_ρ = τ_δ. Since G: (X, δ) → (Q, d) is isometric, G is a homeomorphism onto its image.

Now suppose that (X, ρ) is a complete metric space. Let S := G(X) and let ρ̄ be the metric on S defined by ρ̄(G(x), G(y)) := ρ(x, y) for all x, y ∈ X. Then (S, ρ̄) is a complete metric space (being the isometric image of a complete metric space) and, by what we have just proved, τ_ρ̄ = τ_{d_S}. Consequently, if u ∈ S and ε > 0 is given, we may find δ(ε) > 0 such that B_d(u, δ(ε)) ∩ S ⊂ B_ρ̄(u, ε). Taking γ(ε) := min(δ(ε), ε), we have diam_d(B_d(u, γ(ε))) ≤ 2ε and diam_ρ̄(B_d(u, γ(ε)) ∩ S) ≤ 2ε, where diam_ρ̄(A) := sup{ρ̄(u, v): u, v ∈ A} and diam_d(A) := sup{d(u, v): u, v ∈ A}.

Let S̄ denote the closure of S inside of (Q, d) and for each n ∈ ℕ let

N_n := {N: N an open subset of (Q, d) with diam_d(N) ∨ diam_ρ̄(N ∩ S) < 1/n} and U_n := ∪_{N∈N_n} N,

an open subset of (Q, d). From the previous paragraph it follows that S ⊂ U_n for every n, and therefore S ⊂ S̄ ∩ (∩_{n=1}^∞ U_n). Conversely, if u ∈ S̄ ∩ (∩_{n=1}^∞ U_n) and n ∈ ℕ, there exists N_n ∈ N_n such that u ∈ N_n. Moreover, since N₁ ∩ ⋯ ∩ N_n is an open neighborhood of u ∈ S̄, there exists u_n ∈ N₁ ∩ ⋯ ∩ N_n ∩ S for each n ∈ ℕ. From the definition of N_n, we have lim_{n→∞} d(u, u_n) = 0 and ρ̄(u_n, u_m) ≤ max(n⁻¹, m⁻¹) → 0 as m, n → ∞. Since (S, ρ̄) is complete, it follows that {u_n}_{n=1}^∞ is convergent in (S, ρ̄) to some element u₀ ∈ S. Since (S, d_S) has the same topology as (S, ρ̄), it follows that d(u_n, u₀) → 0 as well, and thus u = u₀ ∈ S. We have now shown S = S̄ ∩ (∩_{n=1}^∞ U_n). This completes the proof, because we may write S̄ = ∩_{n=1}^∞ S_{1/n} where S_{1/n} := {u ∈ Q: d(u, S̄) < 1/n}, and therefore S = (∩_{n=1}^∞ U_n) ∩ (∩_{n=1}^∞ S_{1/n}) is a G_δ set.

Corollary 18.35. Every Polish space, X, with its Borel σ-algebra is a standard Borel space. Consequently, any Borel subset of X is also a standard Borel space.

Proof. Theorem 18.34 shows that X is homeomorphic to a measurable (in fact a G_δ) subset Q₀ of (Q, d), and hence X ≅ Q₀. Since Q is a standard Borel space (Corollary 18.31), so is Q₀ and hence so is X.
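Theorem 18.34's embedding is completely explicit, so it can be sketched numerically (a Python sketch with assumptions of my own: the space X = ℝ with a bounded metric, a particular finite enumeration standing in for D, and a truncation of the cube to finitely many coordinates):

import numpy as np

# The space: X = R with the bounded metric rho = |x - y| / (1 + |x - y|).
rho = lambda x, y: abs(x - y) / (1.0 + abs(x - y))

# A finite stand-in for the countable dense set D = {a_n}: rationals p/q.
a = np.array([p / q for q in range(1, 30) for p in range(-60, 61)])

def G(x, n=500):
    """First n coordinates of G(x) = (rho(x, a_1), rho(x, a_2), ...) in Q."""
    return np.array([rho(x, an) for an in a[:n]])

def d(u, v):
    """The cube metric of Eq. (18.24), truncated to the available coordinates."""
    n = np.arange(1, len(u) + 1)
    return float(np.sum(2.0**-n * np.abs(u - v)))

# d(G(x), G(y)) is small exactly when rho(x, y) is small, reflecting that G is
# a homeomorphism onto its image; the sampled pairs below illustrate this.
for x, y in [(0.0, 0.001), (0.0, 1.0), (3.0, -2.0)]:
    print(rho(x, y), d(G(x), G(y)))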
References
1. Patrick Billingsley, Probability and measure, third ed., Wiley Series in Probability and Mathematical Statistics, John Wiley & Sons Inc., New York, 1995, A Wiley-Interscience Publication. MR MR1324786 (95k:60001)
2. Richard Durrett, Probability: theory and examples, second ed., Duxbury Press, Belmont, CA, 1996. MR MR1609153 (98m:60001)
3. Olav Kallenberg, Foundations of modern probability, second ed., Probability and its Applications (New York), Springer-Verlag, New York, 2002. MR MR1876169 (2002m:60002)
4. L. C. G. Rogers and David Williams, Diffusions, Markov processes, and martingales. Vol. 1, Cambridge Mathematical Library, Cambridge University Press, Cambridge, 2000, Foundations, Reprint of the second (1994) edition. MR 2001g:60188
5. S. R. S. Varadhan, Probability theory, Courant Lecture Notes in Mathematics, vol. 7, New York University Courant Institute of Mathematical Sciences, New York, 2001. MR MR1852999 (2003a:60001)