4. Sampling Distribution
Chiu Yu KO
2024
Chiu Yu KO 1 / 63
Population and sample
Population:
▶ objects we would like to know about
▶ e.g., ages and incomes of individuals in a city, satisfaction levels of consumers
Sample:
▶ a subset of the population
Goal of inference: use a representative sample (small picture) to make an educated guess about the population (big picture)
Population and sample
Population, sample and random sample
Population
• represented by a bar chart/histogram
• summarized by a (relative) frequency table f(x)
• mean: µ; variance: σ²
Sample
• an observation from the population
Random sample
• a random draw from the population
• a random variable whose probability function is the same as the frequency table f(x)
• for a sample of size n, we write X1, X2, . . . , Xn
Simple Random Sample
Technical. Simple Random Sample
X1, . . . , Xn is a simple random sample if
• X1, . . . , Xn are independent random variables, and
• X1, . . . , Xn follow the same probability function P(x) or f(x)
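As a concrete illustration, a simple random sample can be simulated by making n independent draws from the population's frequency table f(x). A minimal Python sketch, using the running population {1, 2, 3}; the helper name `simple_random_sample` is ours, not from the slides:

```python
import random
from collections import Counter

random.seed(42)  # reproducible draws

# Population values and their relative frequencies f(x)
values = [1, 2, 3]
freqs = [1/3, 1/3, 1/3]

def simple_random_sample(n):
    """n independent draws, each following the same probability function f(x)."""
    return random.choices(values, weights=freqs, k=n)

sample = simple_random_sample(10)
print(sample)
print(Counter(sample))  # empirical frequencies approach f(x) as n grows
```

Because every draw uses the same `f(x)` and draws do not affect each other, the two defining conditions (identical distribution, independence) hold by construction.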
Property of Simple Random Sample
Technical. Property of Simple Random Sample
• A simple random sample in fact has an even stronger property (beyond mean, variance and covariance):
• Each observation follows the same distribution f(x) as the population.
• Hence all summary statistics match the population's: same median, mode, range, IQR, skewness, kurtosis...
Technical. Other sampling methods
A simple random sample is simple but difficult to achieve in practice:
• Online surveys likely exclude seniors who do not use the internet often
• Samples from offline surveys are unlikely to be independent due to geographical correlation (e.g., economic conditions, location preferences)
Advanced sampling methods reduce sampling error:
• Stratified sampling: divide the population into subgroups (strata), take a simple random sample within each stratum, and form a weighted average across strata
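The stratified estimator can be sketched as follows. The strata and numbers below are made up purely for illustration; the idea is only the two-step structure (simple random sample within each stratum, then a weighted average by stratum size):

```python
import random

random.seed(0)

# Hypothetical population split into strata (e.g., age groups); values are incomes
strata = {
    "young":  [30, 32, 35, 31, 33, 34],
    "middle": [50, 55, 52, 58, 54, 51],
    "senior": [40, 42, 41, 43, 45, 44],
}

def stratified_mean(strata, n_per_stratum):
    """SRS within each stratum, then a weighted average across strata."""
    total = sum(len(v) for v in strata.values())
    estimate = 0.0
    for values in strata.values():
        sub = random.sample(values, n_per_stratum)  # simple random sample in stratum
        estimate += (len(values) / total) * (sum(sub) / len(sub))
    return estimate

print(stratified_mean(strata, 3))
```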
Statistics
Statistics
A statistic is a function of a sample X1, ..., Xn
▶ Data summary
▶ Data reduction (simplification)
▶ Examples: sample mean, sample variance
Sample mean
Sample mean
The sample mean is the mean of a sample:
X̄ = (Σ_{i=1}^n Xi)/n = (X1 + . . . + Xn)/n
Properties of sample mean
Expectation of sample mean
µX̄ = E (X̄ ) = µ
Average sample mean from m samples
Draw m samples from the population and compute the sample means X̄1, . . . , X̄m. The expectation of their average is still the population mean:
E((X̄1 + · · · + X̄m)/m) = µ
Example
Consider a population with three numbers: 1, 2, and 3.
Population mean: µ = (1 + 2 + 3)/3 = 2.
First consider a sample of size 1.
Simple random sampling implies that the sample mean is one of three possibilities with equal probability:

X̄     1    2    3
P(X̄)  1/3  1/3  1/3

Expectation of the sample mean is
µ_X̄ = E(X̄) = (1/3)(1) + (1/3)(2) + (1/3)(3) = 2
Example
Now consider a sample of size 2.
Simple random sampling implies that the sample mean comes from one of nine equally likely pairs:

X1\X2  1    2    3
1      1    1.5  2
2      1.5  2    2.5
3      2    2.5  3

Sample mean is X̄ = (X1 + X2)/2:

X̄     1    1.5  2    2.5  3
P(X̄)  1/9  2/9  3/9  2/9  1/9

Expectation of the sample mean is
µ_X̄ = E(X̄) = (1/9)(1) + (2/9)(1.5) + (3/9)(2) + (2/9)(2.5) + (1/9)(3) = 2
Example
Now consider a sample of size 3.
Sample mean is X̄ = (X1 + X2 + X3)/3:

X̄     1     4/3   5/3   2     7/3   8/3   3
P(X̄)  1/27  3/27  6/27  7/27  6/27  3/27  1/27

Expectation of the sample mean is
µ_X̄ = E(X̄) = (1/27)(1) + (3/27)(4/3) + (6/27)(5/3) + (7/27)(2) + (6/27)(7/3) + (3/27)(8/3) + (1/27)(3) = 2
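The size-3 table above can be verified by brute-force enumeration of all 27 equally likely samples; a small Python check using exact fractions:

```python
from itertools import product
from fractions import Fraction
from collections import defaultdict

population = [1, 2, 3]
n = 3

# Tally each possible sample mean over all 3^3 = 27 equally likely samples
dist = defaultdict(Fraction)
for sample in product(population, repeat=n):
    dist[Fraction(sum(sample), n)] += Fraction(1, len(population) ** n)

for mean, prob in sorted(dist.items()):
    print(mean, prob)  # e.g. P(X̄ = 2) = 7/27

expectation = sum(m * p for m, p in dist.items())
print(expectation)  # 2
```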
Technical: Mean of sample mean is population mean
Note that E(aX) = aE(X) and E(X + Y) = E(X) + E(Y).
µ_X̄ = E(X̄)
     = E((Σ_{i=1}^n Xi)/n)
     = (1/n) E(Σ_{i=1}^n Xi)
     = (1/n) Σ_{i=1}^n E(Xi)
     = (1/n)(nµ) = µ.
Dispersion of sample mean
σ²_X̄ = Var(X̄) = σ²/n
Standard error of sample mean
The standard error of the sample mean is its standard deviation:
σ_X̄ = √(σ²/n) = σ/√n
Example
Consider a population with three numbers: 1, 2, and 3.
Population mean: µ = (1 + 2 + 3)/3 = 2.
Population variance: σ² = ((1 − 2)² + (2 − 2)² + (3 − 2)²)/3 = 2/3.
For a sample of size 1:

X̄            1    2    3
(X̄ − E(X̄))²  1    0    1
P(X̄)         1/3  1/3  1/3

Variance of the sample mean is
σ²_X̄ = Var(X̄) = (1/3)(1) + (1/3)(0) + (1/3)(1) = 2/3
Example
Now consider a sample of size 2. We know that E(X̄) = 2. Hence,

X̄            1    1.5   2    2.5   3
(X̄ − E(X̄))²  1    0.25  0    0.25  1
P(X̄)         1/9  2/9   3/9  2/9   1/9

Variance of the sample mean is
σ²_X̄ = Var(X̄) = (1/9)(1) + (2/9)(0.25) + (3/9)(0) + (2/9)(0.25) + (1/9)(1) = 1/3 = (2/3)/2
Example
Now consider a sample of size 3.
Sample mean is X̄ = (X1 + X2 + X3)/3:

X̄            1     4/3   5/3   2     7/3   8/3   3
(X̄ − E(X̄))²  1     4/9   1/9   0     1/9   4/9   1
P(X̄)         1/27  3/27  6/27  7/27  6/27  3/27  1/27

Variance of the sample mean is
σ²_X̄ = (1/27)(1) + (3/27)(4/9) + (6/27)(1/9) + (7/27)(0) + (6/27)(1/9) + (3/27)(4/9) + (1/27)(1) = 2/9 = (2/3)/3
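The pattern Var(X̄) = σ²/n in these examples can be confirmed by exact enumeration; a short Python check (the helper name `var_of_sample_mean` is ours):

```python
from itertools import product
from fractions import Fraction

population = [1, 2, 3]
mu = Fraction(sum(population), len(population))  # 2
sigma2 = sum((Fraction(x) - mu) ** 2 for x in population) / len(population)  # 2/3

def var_of_sample_mean(n):
    """Exact Var(X̄) over all |population|^n equally likely samples of size n."""
    total = Fraction(0)
    for sample in product(population, repeat=n):
        total += (Fraction(sum(sample), n) - mu) ** 2
    return total / len(population) ** n

for n in (1, 2, 3):
    print(n, var_of_sample_mean(n), sigma2 / n)  # the two values agree
```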
Technical: Variance of sample mean
Note that Var(aX) = a²Var(X).
Note also that Var(X + Y) = Var(X) + Var(Y) when X and Y are independent.
Hence, for a simple random sample,
Var(X̄) = Var((1/n) Σ_{i=1}^n Xi) = (1/n²) Σ_{i=1}^n Var(Xi) = (1/n²)(nσ²) = σ²/n.
When sample size gets larger
Law of large numbers
Let X1, . . . , Xn be a random sample from a distribution with mean µ and variance σ². Denote X̄n = (X1 + . . . + Xn)/n.
Law of large numbers: for any t > 0,
Pr(|X̄n − µ| ≥ t) → 0 as n → ∞.
Technical: Markov inequality
Consider X with Pr(X ≥ 0) = 1. Then for all t > 0,
E(X) = Σ_{x<t} x Pr(x) + Σ_{x≥t} x Pr(x)
     ≥ Σ_{x≥t} x Pr(x)
     ≥ Σ_{x≥t} t Pr(x) = t Pr(X ≥ t)
Hence the Markov inequality:
Pr(X ≥ t) ≤ E(X)/t
Technical: Chebyshev inequality
Consider Y = (X − E(X))². Then
Pr(|X − E(X)| ≥ t) = Pr(Y ≥ t²)
                   ≤ E(Y)/t²
                   = Var(X)/t²
where the inequality follows from the Markov inequality.
Hence, we have the Chebyshev inequality: for all t > 0,
Pr(|X − E(X)| ≥ t) ≤ Var(X)/t²
Technical: Law of Large Numbers
Recall the Chebyshev inequality: for all t > 0,
Pr(|X − E(X)| ≥ t) ≤ Var(X)/t².
Applying it to X̄n, with E(X̄n) = µ and Var(X̄n) = σ²/n,
Pr(|X̄n − µ| ≥ t) ≤ σ²/(nt²).
The LLN follows from taking the limit n → ∞ and the complement rule.
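The Chebyshev bound and the LLN can be seen numerically: simulate many samples of size n from the population {1, 2, 3} and estimate Pr(|X̄n − µ| ≥ t). A rough Monte Carlo sketch (the trial count and tolerance t are arbitrary choices):

```python
import random

random.seed(1)

population = [1, 2, 3]
mu, sigma2 = 2.0, 2.0 / 3.0
t = 0.1        # tolerance band around mu
trials = 2000  # samples drawn per n

# Compare simulated Pr(|X̄_n - mu| >= t) with the Chebyshev bound sigma^2/(n t^2)
for n in (10, 100, 1000):
    hits = sum(
        abs(sum(random.choices(population, k=n)) / n - mu) >= t
        for _ in range(trials)
    )
    print(n, hits / trials, sigma2 / (n * t * t))
```

Both columns shrink as n grows; the simulated frequency stays below the bound whenever the bound is below 1 (for small n the bound exceeds 1 and is vacuous).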
When sample size gets larger
Example: distribution of sample mean
Consider a population with three numbers: 1, 2, and 3.
Sample of size 1:

X̄     1    2    3
P(X̄)  1/3  1/3  1/3

Sample of size 2:

X̄     1    1.5  2    2.5  3
P(X̄)  1/9  2/9  3/9  2/9  1/9

Sample of size 3:

X̄     1     4/3   5/3   2     7/3   8/3   3
P(X̄)  1/27  3/27  6/27  7/27  6/27  3/27  1/27

Sample of size 4:

X̄     1     1.25  1.5    1.75   2      2.25   2.5    2.75  3
P(X̄)  1/81  4/81  10/81  16/81  19/81  16/81  10/81  4/81  1/81

As n grows, the distribution of X̄ concentrates around µ = 2 and takes a bell shape.
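The tables above can be generated mechanically; this sketch enumerates the size-4 case and reproduces the counts 1, 4, 10, 16, 19, 16, 10, 4, 1 out of 81:

```python
from itertools import product
from fractions import Fraction
from collections import Counter

population = [1, 2, 3]
n = 4

# Tally the 3^4 = 81 equally likely samples by their sample mean
counts = Counter(Fraction(sum(s), n) for s in product(population, repeat=n))
for mean, count in sorted(counts.items()):
    print(f"{float(mean):.2f}  {count}/81")
```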
Central Limit Theorem
▶ the sample mean approximately follows a normal distribution when the sample is large enough.
Central Limit Theorem
When n gets large, we have
X̄ ∼ N(µ, σ²/n)
or
Z = (X̄ − µ)/σ_X̄ = (X̄ − µ)/(σ/√n) ∼ N(0, 1)

Example
Consider a population with mean 5 and standard deviation 8, and a simple random sample of size 100. What is the probability that the sample mean is less than 4?
Pr(X̄ ≤ 4) = Pr((X̄ − 5)/(8/√100) ≤ (4 − 5)/(8/√100))
          = Pr(Z ≤ (4 − 5)/(8/√100))
          = Pr(Z ≤ −1.25)
          ≈ 0.1057
Graphics for CLT left-tail
[Figure: standard normal density with the left tail below −z = −1.25 shaded; tail area α = 0.1057]
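The tail probability can be computed without a z-table via the error function; a minimal sketch of the CLT example above (µ = 5, σ = 8, n = 100):

```python
import math

def phi(z):
    """Standard normal CDF, expressed through the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Pr(X̄ <= 4) for mu = 5, sigma = 8, n = 100
z = (4 - 5) / (8 / math.sqrt(100))
print(round(z, 2), round(phi(z), 4))
```

This reproduces z = −1.25 and a tail area of about 0.106, matching the slide's ≈ 0.1057 up to table rounding.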
Special case: binary data
Technical. Mean and variance of binary data
Suppose Xi is either 0 or 1. Consider a fraction π of them equal to 1 and a fraction 1 − π equal to 0.
That is, we have X1 = . . . = X_{πn} = 1 and X_{πn+1} = . . . = Xn = 0.
Population mean:
µ = (X1 + . . . + X_{πn} + X_{πn+1} + . . . + Xn)/n = πn/n = π
Population variance:
σ² = ((X1 − π)² + . . . + (X_{πn} − π)² + (X_{πn+1} − π)² + . . . + (Xn − π)²)/n
   = ((1 − π)²πn + (0 − π)²(1 − π)n)/n = π(1 − π)
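A quick exact check of µ = π and σ² = π(1 − π), using a tiny binary population with π = 3/10 (an arbitrary illustration):

```python
from fractions import Fraction

n = 10
pi = Fraction(3, 10)
data = [1] * 3 + [0] * 7  # fraction pi of ones, fraction 1 - pi of zeros

mu = Fraction(sum(data), n)
sigma2 = sum((x - mu) ** 2 for x in data) / n

print(mu, sigma2, pi * (1 - pi))  # mu equals pi, and sigma2 equals pi(1 - pi)
```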
Sample proportion
The sample proportion is the fraction of 1s in the sample:
p = (X1 + . . . + Xn)/n = X̄
Expectation of sample proportion
The expectation of the sample proportion is the population proportion:
Expectation of sample proportion
µ_p = E(p) = π
Dispersion of sample proportion
The variance of the sample proportion is
σ²_p = Var(p) = π(1 − π)/n
Central limit theorem
Central limit theorem for binary data
When n gets large, we have
p = X̄ = (Σ_{i=1}^n Xi)/n ∼ N(π, π(1 − π)/n)
or
Z = (p − π)/σ_p = (p − π)/√(π(1 − π)/n) ∼ N(0, 1)
Equivalently, in terms of the count X = Σ_{i=1}^n Xi:
X ∼ N(nπ, nπ(1 − π))
Proof. Binomial approximation
Let Xi be either 1 or 0 with probabilities p and 1 − p. If np and n(1 − p) are at least 5, the central limit theorem applies:
p = X̄ = (Σ_i Xi)/n ∼ N(p, p(1 − p)/n)
which implies that
X = Σ_{i=1}^n Xi ∼ N(np, np(1 − p)).

Example
Consider n = 100 and p = 0.6. What is the probability that X is less than 55?
Pr(X < 55) = Pr(Z < (55 − 100(0.6))/√(100(0.6)(0.4)))
           = Pr(Z < (55 − 60)/√24)
           = Pr(Z < −1.02) ≈ 0.1539
Graphics for normal approximation
[Figure: standard normal density with the left tail below −z = −1.02 shaded; tail area α = 0.1539]
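The example can be cross-checked against the exact binomial probability; a sketch with no continuity correction, to mirror the slide's calculation:

```python
import math

n, p = 100, 0.6

# Exact Pr(X < 55) = Pr(X <= 54) under the binomial distribution
exact = sum(math.comb(n, k) * p**k * (1 - p)**(n - k) for k in range(55))

# Normal approximation from the CLT: X ~ N(np, np(1-p))
z = (55 - n * p) / math.sqrt(n * p * (1 - p))
approx = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

print(round(exact, 4), round(z, 2), round(approx, 4))
```

The plain approximation is close to the table value 0.1539; replacing 55 with 54.5 (a continuity correction) would move it closer to the exact binomial answer.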
Unknown population variance
The standardization
Z = (X̄ − µ)/σ_X̄ = (X̄ − µ)/(σ/√n) ∼ N(0, 1)
requires the population variance σ², which is typically unknown in practice.
Use sample variance
▶ Use the sample variance:
s² = (1/(n − 1)) Σ_{i=1}^n (Xi − X̄)²
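The n − 1 divisor is what Python's `statistics.variance` uses (the population version `statistics.pvariance` divides by n); a small check with made-up data:

```python
import statistics

data = [1, 3, 2, 2, 3, 1, 2, 3, 3, 1]
n = len(data)

xbar = sum(data) / n
s2 = sum((x - xbar) ** 2 for x in data) / (n - 1)  # divide by n - 1, not n

print(xbar, s2)
print(statistics.variance(data))   # sample variance: same n - 1 divisor
print(statistics.pvariance(data))  # population variance: divides by n
```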
Normality assumption
In general, we do not know the distribution of (X̄ − µ)/(s/√n).
We need an extra assumption on the population:
Normality assumption
The population follows a normal distribution with mean µ and variance σ²:
X ∼ N(µ, σ²)
Sample mean under normality assumption
X̄ = (Σ_{i=1}^n Xi)/n ∼ N(µ, σ²/n)
Technical. Normality assumption
• Assume the population follows the normal distribution with mean µ and variance σ²
• Random sampling implies each observation i follows the same distribution: Xi ∼ N(µ, σ²)
• Hence, we have X̄ = (Σ_{i=1}^n Xi)/n ∼ N(µ, σ²/n).
t-distribution
(X̄ − µ)/s_X̄ = (X̄ − µ)/(s/√n) ∼ t_{n−1}
that is, the t-distribution with n − 1 degrees of freedom
Example
Consider a population that follows a normal distribution with mean 5.
We have a simple random sample of size 100.
We know that the sample variance is 8.
What is the probability that the sample mean is less than 4.5?
Note that (X̄ − 5)/√(8/100) ∼ t99. Then
Pr(X̄ ≤ 4.5) = Pr(t99 ≤ (4.5 − 5)/√(8/100)) = Pr(t99 ≤ −1.768)
Example: t-table
Recall that we need to find Pr(t99 ≤ −1.768). From the t-table, −t99,α = −1.768 corresponds to α = 0.0401, so
Pr(X̄ ≤ 4.5) ≈ 0.0401
[Figure: t99 density with the left tail below −1.768 shaded; tail area α = 0.0401]
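Lacking a t-distribution CDF in the standard library, the value ≈ 0.0401 can be checked by Monte Carlo: simulate many normal samples of size 100, form the t-statistic, and count how often it falls below −1.768. A rough sketch (the σ used is arbitrary, since the t-statistic is scale-free):

```python
import math
import random

random.seed(7)

mu, n, trials = 5.0, 100, 10000
hits = 0
for _ in range(trials):
    xs = [random.gauss(mu, 3.0) for _ in range(n)]  # any sigma gives the same t law
    xbar = sum(xs) / n
    s2 = sum((x - xbar) ** 2 for x in xs) / (n - 1)
    t_stat = (xbar - mu) / math.sqrt(s2 / n)
    hits += t_stat <= -1.768

print(hits / trials)  # roughly 0.04
```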
Graph for t-table
[Figure: t_n density; central area 1 − α between −t_{α/2} and t_{α/2}]
Two tail t-table for small n
This is a typical two-tail t-table showing Pr(−t < tn < t ) = 1 − α:
n/α 0.5 0.4 0.3 0.2 0.1 0.05 0.02 0.01 0.001
1 1.0000 1.3764 1.9626 3.0777 6.3138 12.7062 31.8205 63.6567 636.6192
2 0.8165 1.0607 1.3862 1.8856 2.9200 4.3027 6.9646 9.9248 31.5991
3 0.7649 0.9785 1.2498 1.6377 2.3534 3.1824 4.5407 5.8409 12.9240
4 0.7407 0.9410 1.1896 1.5332 2.1318 2.7764 3.7469 4.6041 8.6103
5 0.7267 0.9195 1.1558 1.4759 2.0150 2.5706 3.3649 4.0321 6.8688
6 0.7176 0.9057 1.1342 1.4398 1.9432 2.4469 3.1427 3.7074 5.9588
7 0.7111 0.8960 1.1192 1.4149 1.8946 2.3646 2.9980 3.4995 5.4079
8 0.7064 0.8889 1.1081 1.3968 1.8595 2.3060 2.8965 3.3554 5.0413
9 0.7027 0.8834 1.0997 1.3830 1.8331 2.2622 2.8214 3.2498 4.7809
10 0.6998 0.8791 1.0931 1.3722 1.8125 2.2281 2.7638 3.1693 4.5869
11 0.6974 0.8755 1.0877 1.3634 1.7959 2.2010 2.7181 3.1058 4.4370
12 0.6955 0.8726 1.0832 1.3562 1.7823 2.1788 2.6810 3.0545 4.3178
13 0.6938 0.8702 1.0795 1.3502 1.7709 2.1604 2.6503 3.0123 4.2208
14 0.6924 0.8681 1.0763 1.3450 1.7613 2.1448 2.6245 2.9768 4.1405
15 0.6912 0.8662 1.0735 1.3406 1.7531 2.1314 2.6025 2.9467 4.0728
16 0.6901 0.8647 1.0711 1.3368 1.7459 2.1199 2.5835 2.9208 4.0150
17 0.6892 0.8633 1.0690 1.3334 1.7396 2.1098 2.5669 2.8982 3.9651
18 0.6884 0.8620 1.0672 1.3304 1.7341 2.1009 2.5524 2.8784 3.9216
19 0.6876 0.8610 1.0655 1.3277 1.7291 2.0930 2.5395 2.8609 3.8834
20 0.6870 0.8600 1.0640 1.3253 1.7247 2.0860 2.5280 2.8453 3.8495
Two tail t-table for middle n
This is a typical two-tail t-table showing Pr(−t < tn < t ) = 1 − α:
n/α 0.5 0.4 0.3 0.2 0.1 0.05 0.02 0.01 0.001
21 0.6864 0.8591 1.0627 1.3232 1.7207 2.0796 2.5176 2.8314 3.8193
22 0.6858 0.8583 1.0614 1.3212 1.7171 2.0739 2.5083 2.8188 3.7921
23 0.6853 0.8575 1.0603 1.3195 1.7139 2.0687 2.4999 2.8073 3.7676
24 0.6848 0.8569 1.0593 1.3178 1.7109 2.0639 2.4922 2.7969 3.7454
25 0.6844 0.8562 1.0584 1.3163 1.7081 2.0595 2.4851 2.7874 3.7251
26 0.6840 0.8557 1.0575 1.3150 1.7056 2.0555 2.4786 2.7787 3.7066
27 0.6837 0.8551 1.0567 1.3137 1.7033 2.0518 2.4727 2.7707 3.6896
28 0.6834 0.8546 1.0560 1.3125 1.7011 2.0484 2.4671 2.7633 3.6739
29 0.6830 0.8542 1.0553 1.3114 1.6991 2.0452 2.4620 2.7564 3.6594
30 0.6828 0.8538 1.0547 1.3104 1.6973 2.0423 2.4573 2.7500 3.6460
40 0.6807 0.8507 1.0500 1.3031 1.6839 2.0211 2.4233 2.7045 3.5510
50 0.6794 0.8489 1.0473 1.2987 1.6759 2.0086 2.4033 2.6778 3.4960
60 0.6786 0.8477 1.0455 1.2958 1.6706 2.0003 2.3901 2.6603 3.4602
70 0.6780 0.8468 1.0442 1.2938 1.6669 1.9944 2.3808 2.6479 3.4350
80 0.6776 0.8461 1.0432 1.2922 1.6641 1.9901 2.3739 2.6387 3.4163
90 0.6772 0.8456 1.0424 1.2910 1.6620 1.9867 2.3685 2.6316 3.4019
100 0.6770 0.8452 1.0418 1.2901 1.6602 1.9840 2.3642 2.6259 3.3905
120 0.6765 0.8446 1.0409 1.2886 1.6577 1.9799 2.3578 2.6174 3.3735
140 0.6762 0.8442 1.0403 1.2876 1.6558 1.9771 2.3533 2.6114 3.3614
160 0.6760 0.8439 1.0398 1.2869 1.6544 1.9749 2.3499 2.6069 3.3524
Two tail t-table for large n
This is a typical two-tail t-table showing Pr(−t < tn < t ) = 1 − α:
n/α 0.5 0.4 0.3 0.2 0.1 0.05 0.02 0.01 0.001
180 0.6759 0.8436 1.0394 1.2863 1.6534 1.9732 2.3472 2.6034 3.3454
190 0.6758 0.8435 1.0393 1.2860 1.6529 1.9725 2.3461 2.6020 3.3425
200 0.6757 0.8434 1.0391 1.2858 1.6525 1.9719 2.3451 2.6006 3.3398
250 0.6755 0.8431 1.0386 1.2849 1.6510 1.9695 2.3414 2.5956 3.3299
300 0.6753 0.8428 1.0382 1.2844 1.6499 1.9679 2.3388 2.5923 3.3233
400 0.6751 0.8425 1.0378 1.2837 1.6487 1.9659 2.3357 2.5882 3.3150
500 0.6750 0.8423 1.0375 1.2832 1.6479 1.9647 2.3338 2.5857 3.3101
600 0.6749 0.8422 1.0373 1.2830 1.6474 1.9639 2.3326 2.5840 3.3068
700 0.6748 0.8421 1.0372 1.2828 1.6470 1.9634 2.3317 2.5829 3.3045
800 0.6748 0.8421 1.0371 1.2826 1.6468 1.9629 2.3310 2.5820 3.3027
900 0.6748 0.8420 1.0370 1.2825 1.6465 1.9626 2.3305 2.5813 3.3014
1000 0.6747 0.8420 1.0370 1.2824 1.6464 1.9623 2.3301 2.5808 3.3003
1200 0.6747 0.8419 1.0369 1.2823 1.6461 1.9619 2.3295 2.5799 3.2987
1400 0.6747 0.8419 1.0368 1.2822 1.6459 1.9617 2.3290 2.5793 3.2975
1600 0.6746 0.8418 1.0368 1.2821 1.6458 1.9614 2.3287 2.5789 3.2966
1800 0.6746 0.8418 1.0367 1.2820 1.6457 1.9613 2.3284 2.5786 3.2959
2000 0.6746 0.8418 1.0367 1.2820 1.6456 1.9612 2.3282 2.5783 3.2954
3000 0.6746 0.8417 1.0366 1.2818 1.6454 1.9608 2.3276 2.5775 3.2938
4000 0.6746 0.8417 1.0366 1.2818 1.6452 1.9606 2.3273 2.5771 3.2930
5000 0.6745 0.8417 1.0365 1.2817 1.6452 1.9604 2.3271 2.5768 3.2925
Technical: Two important properties
Here we provide two important properties without proof.
• Consider X1, . . . , Xn a random sample from N(µ, σ²)
• Denote X̄ = (1/n) Σ_{i=1}^n Xi and s² = (1/(n − 1)) Σ_{i=1}^n (Xi − X̄)²
Then:
• Σ_{i=1}^n (Xi − X̄)²/σ² ∼ χ²_{n−1}
• X̄ and s² are independent random variables.
Technical: Sample mean and t-distribution
• From the first property, we have
s²/σ² = (Σ (Xi − X̄)²/σ²)/(n − 1) ∼ χ²_{n−1}/(n − 1)
• Combining with X̄ ∼ N(µ, σ²/n) and the independence of X̄ and s²,
(X̄ − µ)/(s/√n) = [(X̄ − µ)/(σ/√n)] / √(s²/σ²) ∼ Z/√(χ²_{n−1}/(n − 1)) = t_{n−1}
Technical: Normal distribution for large sample
When the sample size is very large, the t-distribution is close to the standard normal, so approximately
(X̄ − µ)/(s/√n) ∼ N(0, 1)
Technical: Normality assumption for t-distribution
The Student-t distribution requires the normality assumption. Ways to check it:
• graphical: boxplot, histogram, Q-Q plot (scatter plot of the quantiles of your sample vs. the standard normal)
• numerical: check whether skewness is around 0 and kurtosis is around 3
• formal test: Jarque-Bera test (using skewness and kurtosis)
Video Links
▶ Sampling
▶ Sample mean
▶ Expectation of sample mean
▶ Variance of sample mean
▶ Central limit theorem
▶ Proportion
▶ Unknown variance - t-distribution