4. Sampling Distribution
Chiu Yu KO
2024
Chiu Yu KO 1 / 63
Population and sample
Population:
▶ objects we would like to know about
▶ e.g., ages and incomes of individuals in a city, satisfaction levels of consumers
Sample:
▶ a subset of the population
Goal of inference: use a representative sample (small picture) to make an educated guess about the population (big picture)
Population and sample
Population, sample and random sample
Population
• represented by a bar chart/histogram
• summarized by a (relative) frequency table f(x)
• mean: µ; variance: σ²
Sample
• an observation from the population
Random sample
• a random draw from the population
• a random variable whose probability function is the same as the frequency table f(x)
• for a sample of size n, we write X1, X2, . . . , Xn
Simple Random Sample
Technical. Simple Random Sample
X1, . . . , Xn is a simple random sample if
• X1, . . . , Xn are independent random variables, and
• X1, . . . , Xn follow the same probability function P(x) or f(x)
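As a concrete illustration, a simple random sample can be simulated by making n independent draws from the population's frequency table f(x). A minimal Python sketch, using the running population {1, 2, 3}; the helper name `simple_random_sample` is ours, not from the slides:

```python
import random
from collections import Counter

random.seed(42)  # reproducible draws

# Population values and their relative frequencies f(x)
values = [1, 2, 3]
freqs = [1/3, 1/3, 1/3]

def simple_random_sample(n):
    """n independent draws, each following the same probability function f(x)."""
    return random.choices(values, weights=freqs, k=n)

sample = simple_random_sample(10)
print(sample)
print(Counter(sample))  # empirical frequencies approach f(x) as n grows
```

Because every draw uses the same `f(x)` and draws do not affect each other, the two defining conditions (identical distribution, independence) hold by construction.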
Property of Simple Random Sample
Technical. Property of Simple Random Sample
• A simple random sample in fact has an even stronger property (beyond mean, variance and covariance):
• Each observation follows the same distribution f(x) as the population.
• Hence all summary statistics match the population's: same median, mode, range, IQR, skewness, kurtosis...
Technical. Other sampling methods
A simple random sample is simple but difficult to achieve in practice:
• Online surveys likely exclude seniors who do not use the internet often
• Samples from offline surveys are unlikely to be independent due to geographical correlation (e.g., economic conditions, location preferences)
Advanced sampling methods reduce sampling error:
• Stratified sampling: divide the population into subgroups (strata), take a simple random sample within each stratum, and form a weighted average across strata
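The stratified estimator can be sketched as follows. The strata and numbers below are made up purely for illustration; the idea is only the two-step structure (simple random sample within each stratum, then a weighted average by stratum size):

```python
import random

random.seed(0)

# Hypothetical population split into strata (e.g., age groups); values are incomes
strata = {
    "young":  [30, 32, 35, 31, 33, 34],
    "middle": [50, 55, 52, 58, 54, 51],
    "senior": [40, 42, 41, 43, 45, 44],
}

def stratified_mean(strata, n_per_stratum):
    """SRS within each stratum, then a weighted average across strata."""
    total = sum(len(v) for v in strata.values())
    estimate = 0.0
    for values in strata.values():
        sub = random.sample(values, n_per_stratum)  # simple random sample in stratum
        estimate += (len(values) / total) * (sum(sub) / len(sub))
    return estimate

print(stratified_mean(strata, 3))
```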
Statistics
Statistics
A statistic is a function of a sample X1, ..., Xn
▶ Data summary
▶ Data reduction (simplification)
▶ Examples: sample mean, sample variance
Sample mean
Sample mean
The sample mean is the mean of a sample:
X̄ = (Σ_{i=1}^n Xi)/n = (X1 + . . . + Xn)/n
Properties of sample mean
Expectation of sample mean
µX̄ = E (X̄ ) = µ
Average sample mean from m samples
Draw m samples from the population and compute the sample means X̄1, . . . , X̄m. The expectation of their average is still the population mean:
E((X̄1 + · · · + X̄m)/m) = µ
Example
Consider a population with three numbers: 1, 2, and 3.
Population mean: µ = (1 + 2 + 3)/3 = 2.
First consider a sample of size 1.
Simple random sampling implies that the sample mean is one of three possibilities with equal probability:

X̄     1    2    3
P(X̄)  1/3  1/3  1/3

Expectation of the sample mean is
µ_X̄ = E(X̄) = (1/3)(1) + (1/3)(2) + (1/3)(3) = 2
Example
Now consider a sample of size 2.
Simple random sampling implies that the sample mean comes from one of nine equally likely pairs:

X1\X2  1    2    3
1      1    1.5  2
2      1.5  2    2.5
3      2    2.5  3

Sample mean is X̄ = (X1 + X2)/2:

X̄     1    1.5  2    2.5  3
P(X̄)  1/9  2/9  3/9  2/9  1/9

Expectation of the sample mean is
µ_X̄ = E(X̄) = (1/9)(1) + (2/9)(1.5) + (3/9)(2) + (2/9)(2.5) + (1/9)(3) = 2
Example
Now consider a sample of size 3.
Sample mean is X̄ = (X1 + X2 + X3)/3:

X̄     1     4/3   5/3   2     7/3   8/3   3
P(X̄)  1/27  3/27  6/27  7/27  6/27  3/27  1/27

Expectation of the sample mean is
µ_X̄ = E(X̄) = (1/27)(1) + (3/27)(4/3) + (6/27)(5/3) + (7/27)(2) + (6/27)(7/3) + (3/27)(8/3) + (1/27)(3) = 2
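The size-3 table above can be verified by brute-force enumeration of all 27 equally likely samples; a small Python check using exact fractions:

```python
from itertools import product
from fractions import Fraction
from collections import defaultdict

population = [1, 2, 3]
n = 3

# Tally each possible sample mean over all 3^3 = 27 equally likely samples
dist = defaultdict(Fraction)
for sample in product(population, repeat=n):
    dist[Fraction(sum(sample), n)] += Fraction(1, len(population) ** n)

for mean, prob in sorted(dist.items()):
    print(mean, prob)  # e.g. P(X̄ = 2) = 7/27

expectation = sum(m * p for m, p in dist.items())
print(expectation)  # 2
```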
Technical: Mean of sample mean is population mean
Note that E(aX) = aE(X) and E(X + Y) = E(X) + E(Y).
µ_X̄ = E(X̄)
     = E((Σ_{i=1}^n Xi)/n)
     = (1/n) E(Σ_{i=1}^n Xi)
     = (1/n) Σ_{i=1}^n E(Xi)
     = (1/n)(nµ) = µ.
Dispersion of sample mean
σ²_X̄ = Var(X̄) = σ²/n
Standard error of sample mean
The standard error of the sample mean is its standard deviation:
σ_X̄ = √(σ²/n) = σ/√n
Example
Consider a population with three numbers: 1, 2, and 3.
Population mean: µ = (1 + 2 + 3)/3 = 2.
Population variance: σ² = ((1 − 2)² + (2 − 2)² + (3 − 2)²)/3 = 2/3.
For a sample of size 1:

X̄            1    2    3
(X̄ − E(X̄))²  1    0    1
P(X̄)         1/3  1/3  1/3

Variance of the sample mean is
σ²_X̄ = Var(X̄) = (1/3)(1) + (1/3)(0) + (1/3)(1) = 2/3
Example
Now consider a sample of size 2. We know that E(X̄) = 2. Hence,

X̄            1    1.5   2    2.5   3
(X̄ − E(X̄))²  1    0.25  0    0.25  1
P(X̄)         1/9  2/9   3/9  2/9   1/9

Variance of the sample mean is
σ²_X̄ = Var(X̄) = (1/9)(1) + (2/9)(0.25) + (3/9)(0) + (2/9)(0.25) + (1/9)(1) = 1/3 = (2/3)/2
Example
Now consider a sample of size 3.
Sample mean is X̄ = (X1 + X2 + X3)/3:

X̄            1     4/3   5/3   2     7/3   8/3   3
(X̄ − E(X̄))²  1     4/9   1/9   0     1/9   4/9   1
P(X̄)         1/27  3/27  6/27  7/27  6/27  3/27  1/27

Variance of the sample mean is
σ²_X̄ = (1/27)(1) + (3/27)(4/9) + (6/27)(1/9) + (7/27)(0) + (6/27)(1/9) + (3/27)(4/9) + (1/27)(1) = 2/9 = (2/3)/3
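The pattern Var(X̄) = σ²/n in these examples can be confirmed by exact enumeration; a short Python check (the helper name `var_of_sample_mean` is ours):

```python
from itertools import product
from fractions import Fraction

population = [1, 2, 3]
mu = Fraction(sum(population), len(population))  # 2
sigma2 = sum((Fraction(x) - mu) ** 2 for x in population) / len(population)  # 2/3

def var_of_sample_mean(n):
    """Exact Var(X̄) over all |population|^n equally likely samples of size n."""
    total = Fraction(0)
    for sample in product(population, repeat=n):
        total += (Fraction(sum(sample), n) - mu) ** 2
    return total / len(population) ** n

for n in (1, 2, 3):
    print(n, var_of_sample_mean(n), sigma2 / n)  # the two values agree
```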
Technical: Variance of sample mean
Note that Var(aX) = a²Var(X).
Note also that Var(X + Y) = Var(X) + Var(Y) when X and Y are independent.
Hence, for a simple random sample,
Var(X̄) = Var((1/n) Σ_{i=1}^n Xi) = (1/n²) Σ_{i=1}^n Var(Xi) = (1/n²)(nσ²) = σ²/n.
When sample size gets larger
Law of large numbers
Let X1, . . . , Xn be a random sample from a distribution with mean µ and variance σ². Denote X̄n = (X1 + . . . + Xn)/n.
Law of large numbers: for any t > 0,
Pr(|X̄n − µ| ≥ t) → 0 as n → ∞.
Technical: Markov inequality
Consider X with Pr(X ≥ 0) = 1. Then for all t > 0,
E(X) = Σ_{x<t} x Pr(x) + Σ_{x≥t} x Pr(x)
     ≥ Σ_{x≥t} x Pr(x)
     ≥ Σ_{x≥t} t Pr(x) = t Pr(X ≥ t)
Hence the Markov inequality:
Pr(X ≥ t) ≤ E(X)/t
Technical: Chebyshev inequality
Consider Y = (X − E(X))². Then
Pr(|X − E(X)| ≥ t) = Pr(Y ≥ t²)
                   ≤ E(Y)/t²
                   = Var(X)/t²
where the inequality follows from the Markov inequality.
Hence, we have the Chebyshev inequality: for all t > 0,
Pr(|X − E(X)| ≥ t) ≤ Var(X)/t²
Technical: Law of Large Numbers
Recall the Chebyshev inequality: for all t > 0,
Pr(|X − E(X)| ≥ t) ≤ Var(X)/t².
Applying it to X̄n, with E(X̄n) = µ and Var(X̄n) = σ²/n,
Pr(|X̄n − µ| ≥ t) ≤ σ²/(nt²).
The LLN follows from taking the limit n → ∞ and the complement rule.
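The Chebyshev bound and the LLN can be seen numerically: simulate many samples of size n from the population {1, 2, 3} and estimate Pr(|X̄n − µ| ≥ t). A rough Monte Carlo sketch (the trial count and tolerance t are arbitrary choices):

```python
import random

random.seed(1)

population = [1, 2, 3]
mu, sigma2 = 2.0, 2.0 / 3.0
t = 0.1        # tolerance band around mu
trials = 2000  # samples drawn per n

# Compare simulated Pr(|X̄_n - mu| >= t) with the Chebyshev bound sigma^2/(n t^2)
for n in (10, 100, 1000):
    hits = sum(
        abs(sum(random.choices(population, k=n)) / n - mu) >= t
        for _ in range(trials)
    )
    print(n, hits / trials, sigma2 / (n * t * t))
```

Both columns shrink as n grows; the simulated frequency stays below the bound whenever the bound is below 1 (for small n the bound exceeds 1 and is vacuous).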
When sample size gets larger
Example: distribution of sample mean
Consider a population with three numbers: 1, 2, and 3.
Sample of size 1:

X̄     1    2    3
P(X̄)  1/3  1/3  1/3

Sample of size 2:

X̄     1    1.5  2    2.5  3
P(X̄)  1/9  2/9  3/9  2/9  1/9

Sample of size 3:

X̄     1     4/3   5/3   2     7/3   8/3   3
P(X̄)  1/27  3/27  6/27  7/27  6/27  3/27  1/27

Sample of size 4:

X̄     1     1.25  1.5    1.75   2      2.25   2.5    2.75  3
P(X̄)  1/81  4/81  10/81  16/81  19/81  16/81  10/81  4/81  1/81

As n grows, the distribution of X̄ concentrates around µ = 2 and takes a bell shape.
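The tables above can be generated mechanically; this sketch enumerates the size-4 case and reproduces the counts 1, 4, 10, 16, 19, 16, 10, 4, 1 out of 81:

```python
from itertools import product
from fractions import Fraction
from collections import Counter

population = [1, 2, 3]
n = 4

# Tally the 3^4 = 81 equally likely samples by their sample mean
counts = Counter(Fraction(sum(s), n) for s in product(population, repeat=n))
for mean, count in sorted(counts.items()):
    print(f"{float(mean):.2f}  {count}/81")
```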
Central Limit Theorem
▶ the sample mean approximately follows a normal distribution when the sample is large enough.
Central Limit Theorem
When n gets large, we have
X̄ ∼ N(µ, σ²/n)
or
Z = (X̄ − µ)/σ_X̄ = (X̄ − µ)/(σ/√n) ∼ N(0, 1)

Example
Consider a population with mean 5 and standard deviation 8, and a simple random sample of size 100. What is the probability that the sample mean is less than 4?
Pr(X̄ ≤ 4) = Pr((X̄ − 5)/(8/√100) ≤ (4 − 5)/(8/√100))
          = Pr(Z ≤ (4 − 5)/(8/√100))
          = Pr(Z ≤ −1.25)
          ≈ 0.1057
Graphics for CLT left-tail
[Figure: standard normal density with the left tail below −z = −1.25 shaded; tail area α = 0.1057]
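The tail probability can be computed without a z-table via the error function; a minimal sketch of the CLT example above (µ = 5, σ = 8, n = 100):

```python
import math

def phi(z):
    """Standard normal CDF, expressed through the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Pr(X̄ <= 4) for mu = 5, sigma = 8, n = 100
z = (4 - 5) / (8 / math.sqrt(100))
print(round(z, 2), round(phi(z), 4))
```

This reproduces z = −1.25 and a tail area of about 0.106, matching the slide's ≈ 0.1057 up to table rounding.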
Special case: binary data
Technical. Mean and variance of binary data
Suppose Xi is either 0 or 1. Consider a fraction π of them equal to 1 and a fraction 1 − π equal to 0.
That is, we have X1 = . . . = X_{πn} = 1 and X_{πn+1} = . . . = Xn = 0.
Population mean:
µ = (X1 + . . . + X_{πn} + X_{πn+1} + . . . + Xn)/n = πn/n = π
Population variance:
σ² = ((X1 − π)² + . . . + (X_{πn} − π)² + (X_{πn+1} − π)² + . . . + (Xn − π)²)/n
   = ((1 − π)²πn + (0 − π)²(1 − π)n)/n = π(1 − π)
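A quick exact check of µ = π and σ² = π(1 − π), using a tiny binary population with π = 3/10 (an arbitrary illustration):

```python
from fractions import Fraction

n = 10
pi = Fraction(3, 10)
data = [1] * 3 + [0] * 7  # fraction pi of ones, fraction 1 - pi of zeros

mu = Fraction(sum(data), n)
sigma2 = sum((x - mu) ** 2 for x in data) / n

print(mu, sigma2, pi * (1 - pi))  # mu equals pi, and sigma2 equals pi(1 - pi)
```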
Sample proportion
The sample proportion is the fraction of 1s in the sample:
p = (X1 + . . . + Xn)/n = X̄
Expectation of sample proportion
The expectation of the sample proportion is the population proportion:
Expectation of sample proportion
µ_p = E(p) = π
Dispersion of sample proportion
The variance of the sample proportion is
σ²_p = Var(p) = π(1 − π)/n
Central limit theorem
Central limit theorem for binary data
When n gets large, we have
p = X̄ = (Σ_{i=1}^n Xi)/n ∼ N(π, π(1 − π)/n)
or
Z = (p − π)/σ_p = (p − π)/√(π(1 − π)/n) ∼ N(0, 1)
Equivalently, in terms of the count X = Σ_{i=1}^n Xi:
X ∼ N(nπ, nπ(1 − π))
Proof. Binomial approximation
Let Xi be either 1 or 0 with probabilities p and 1 − p. If np and n(1 − p) are at least 5, the central limit theorem applies:
p = X̄ = (Σ_i Xi)/n ∼ N(p, p(1 − p)/n)
which implies that
X = Σ_{i=1}^n Xi ∼ N(np, np(1 − p)).

Example
Consider n = 100 and p = 0.6. What is the probability that X is less than 55?
Pr(X < 55) = Pr(Z < (55 − 100(0.6))/√(100(0.6)(0.4)))
           = Pr(Z < (55 − 60)/√24)
           = Pr(Z < −1.02) ≈ 0.1539
Graphics for normal approximation
[Figure: standard normal density with the left tail below −z = −1.02 shaded; tail area α = 0.1539]
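The example can be cross-checked against the exact binomial probability; a sketch with no continuity correction, to mirror the slide's calculation:

```python
import math

n, p = 100, 0.6

# Exact Pr(X < 55) = Pr(X <= 54) under the binomial distribution
exact = sum(math.comb(n, k) * p**k * (1 - p)**(n - k) for k in range(55))

# Normal approximation from the CLT: X ~ N(np, np(1-p))
z = (55 - n * p) / math.sqrt(n * p * (1 - p))
approx = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

print(round(exact, 4), round(z, 2), round(approx, 4))
```

The plain approximation is close to the table value 0.1539; replacing 55 with 54.5 (a continuity correction) would move it closer to the exact binomial answer.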
Unknown population variance
The standardization
Z = (X̄ − µ)/σ_X̄ = (X̄ − µ)/(σ/√n) ∼ N(0, 1)
requires the population variance σ², which is typically unknown in practice.
Use sample variance
▶ Use the sample variance:
s² = (1/(n − 1)) Σ_{i=1}^n (Xi − X̄)²
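The n − 1 divisor is what Python's `statistics.variance` uses (the population version `statistics.pvariance` divides by n); a small check with made-up data:

```python
import statistics

data = [1, 3, 2, 2, 3, 1, 2, 3, 3, 1]
n = len(data)

xbar = sum(data) / n
s2 = sum((x - xbar) ** 2 for x in data) / (n - 1)  # divide by n - 1, not n

print(xbar, s2)
print(statistics.variance(data))   # sample variance: same n - 1 divisor
print(statistics.pvariance(data))  # population variance: divides by n
```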
Normality assumption
In general, we do not know the distribution of (X̄ − µ)/(s/√n).
We need an extra assumption on the population:
Normality assumption
The population follows a normal distribution with mean µ and variance σ²:
X ∼ N(µ, σ²)
Sample mean under normality assumption
X̄ = (Σ_{i=1}^n Xi)/n ∼ N(µ, σ²/n)
Technical. Normality assumption
• Assume the population follows the normal distribution with mean µ and variance σ²
• Random sampling implies each observation i follows the same distribution: Xi ∼ N(µ, σ²)
• Hence, we have X̄ = (Σ_{i=1}^n Xi)/n ∼ N(µ, σ²/n).
t-distribution
(X̄ − µ)/s_X̄ = (X̄ − µ)/(s/√n) ∼ t_{n−1}
that is, the t-distribution with n − 1 degrees of freedom
Example
Consider a population that follows a normal distribution with mean 5.
We have a simple random sample of size 100.
We know that the sample variance is 8.
What is the probability that the sample mean is less than 4.5?
Note that (X̄ − 5)/√(8/100) ∼ t99. Then
Pr(X̄ ≤ 4.5) = Pr(t99 ≤ (4.5 − 5)/√(8/100)) = Pr(t99 ≤ −1.768)
Example: t-table
Recall that we need to find Pr(t99 ≤ −1.768). From the t-table, −t99,α = −1.768 corresponds to α = 0.0401, so
Pr(X̄ ≤ 4.5) ≈ 0.0401
[Figure: t99 density with the left tail below −1.768 shaded; tail area α = 0.0401]
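Lacking a t-distribution CDF in the standard library, the value ≈ 0.0401 can be checked by Monte Carlo: simulate many normal samples of size 100, form the t-statistic, and count how often it falls below −1.768. A rough sketch (the σ used is arbitrary, since the t-statistic is scale-free):

```python
import math
import random

random.seed(7)

mu, n, trials = 5.0, 100, 10000
hits = 0
for _ in range(trials):
    xs = [random.gauss(mu, 3.0) for _ in range(n)]  # any sigma gives the same t law
    xbar = sum(xs) / n
    s2 = sum((x - xbar) ** 2 for x in xs) / (n - 1)
    t_stat = (xbar - mu) / math.sqrt(s2 / n)
    hits += t_stat <= -1.768

print(hits / trials)  # roughly 0.04
```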
Graph for t-table
[Figure: t_n density; central area 1 − α between −t_{α/2} and t_{α/2}]
Two tail t-table for small n
This is a typical two-tail t-table showing Pr(−t < tn < t ) = 1 − α:
n/α 0.5 0.4 0.3 0.2 0.1 0.05 0.02 0.01 0.001
1 1.0000 1.3764 1.9626 3.0777 6.3138 12.7062 31.8205 63.6567 636.6192
2 0.8165 1.0607 1.3862 1.8856 2.9200 4.3027 6.9646 9.9248 31.5991
3 0.7649 0.9785 1.2498 1.6377 2.3534 3.1824 4.5407 5.8409 12.9240
4 0.7407 0.9410 1.1896 1.5332 2.1318 2.7764 3.7469 4.6041 8.6103
5 0.7267 0.9195 1.1558 1.4759 2.0150 2.5706 3.3649 4.0321 6.8688
6 0.7176 0.9057 1.1342 1.4398 1.9432 2.4469 3.1427 3.7074 5.9588
7 0.7111 0.8960 1.1192 1.4149 1.8946 2.3646 2.9980 3.4995 5.4079
8 0.7064 0.8889 1.1081 1.3968 1.8595 2.3060 2.8965 3.3554 5.0413
9 0.7027 0.8834 1.0997 1.3830 1.8331 2.2622 2.8214 3.2498 4.7809
10 0.6998 0.8791 1.0931 1.3722 1.8125 2.2281 2.7638 3.1693 4.5869
11 0.6974 0.8755 1.0877 1.3634 1.7959 2.2010 2.7181 3.1058 4.4370
12 0.6955 0.8726 1.0832 1.3562 1.7823 2.1788 2.6810 3.0545 4.3178
13 0.6938 0.8702 1.0795 1.3502 1.7709 2.1604 2.6503 3.0123 4.2208
14 0.6924 0.8681 1.0763 1.3450 1.7613 2.1448 2.6245 2.9768 4.1405
15 0.6912 0.8662 1.0735 1.3406 1.7531 2.1314 2.6025 2.9467 4.0728
16 0.6901 0.8647 1.0711 1.3368 1.7459 2.1199 2.5835 2.9208 4.0150
17 0.6892 0.8633 1.0690 1.3334 1.7396 2.1098 2.5669 2.8982 3.9651
18 0.6884 0.8620 1.0672 1.3304 1.7341 2.1009 2.5524 2.8784 3.9216
19 0.6876 0.8610 1.0655 1.3277 1.7291 2.0930 2.5395 2.8609 3.8834
20 0.6870 0.8600 1.0640 1.3253 1.7247 2.0860 2.5280 2.8453 3.8495
Two tail t-table for middle n
This is a typical two-tail t-table showing Pr(−t < tn < t ) = 1 − α:
n/α 0.5 0.4 0.3 0.2 0.1 0.05 0.02 0.01 0.001
21 0.6864 0.8591 1.0627 1.3232 1.7207 2.0796 2.5176 2.8314 3.8193
22 0.6858 0.8583 1.0614 1.3212 1.7171 2.0739 2.5083 2.8188 3.7921
23 0.6853 0.8575 1.0603 1.3195 1.7139 2.0687 2.4999 2.8073 3.7676
24 0.6848 0.8569 1.0593 1.3178 1.7109 2.0639 2.4922 2.7969 3.7454
25 0.6844 0.8562 1.0584 1.3163 1.7081 2.0595 2.4851 2.7874 3.7251
26 0.6840 0.8557 1.0575 1.3150 1.7056 2.0555 2.4786 2.7787 3.7066
27 0.6837 0.8551 1.0567 1.3137 1.7033 2.0518 2.4727 2.7707 3.6896
28 0.6834 0.8546 1.0560 1.3125 1.7011 2.0484 2.4671 2.7633 3.6739
29 0.6830 0.8542 1.0553 1.3114 1.6991 2.0452 2.4620 2.7564 3.6594
30 0.6828 0.8538 1.0547 1.3104 1.6973 2.0423 2.4573 2.7500 3.6460
40 0.6807 0.8507 1.0500 1.3031 1.6839 2.0211 2.4233 2.7045 3.5510
50 0.6794 0.8489 1.0473 1.2987 1.6759 2.0086 2.4033 2.6778 3.4960
60 0.6786 0.8477 1.0455 1.2958 1.6706 2.0003 2.3901 2.6603 3.4602
70 0.6780 0.8468 1.0442 1.2938 1.6669 1.9944 2.3808 2.6479 3.4350
80 0.6776 0.8461 1.0432 1.2922 1.6641 1.9901 2.3739 2.6387 3.4163
90 0.6772 0.8456 1.0424 1.2910 1.6620 1.9867 2.3685 2.6316 3.4019
100 0.6770 0.8452 1.0418 1.2901 1.6602 1.9840 2.3642 2.6259 3.3905
120 0.6765 0.8446 1.0409 1.2886 1.6577 1.9799 2.3578 2.6174 3.3735
140 0.6762 0.8442 1.0403 1.2876 1.6558 1.9771 2.3533 2.6114 3.3614
160 0.6760 0.8439 1.0398 1.2869 1.6544 1.9749 2.3499 2.6069 3.3524
Two tail t-table for large n
This is a typical two-tail t-table showing Pr(−t < tn < t ) = 1 − α:
n/α 0.5 0.4 0.3 0.2 0.1 0.05 0.02 0.01 0.001
180 0.6759 0.8436 1.0394 1.2863 1.6534 1.9732 2.3472 2.6034 3.3454
190 0.6758 0.8435 1.0393 1.2860 1.6529 1.9725 2.3461 2.6020 3.3425
200 0.6757 0.8434 1.0391 1.2858 1.6525 1.9719 2.3451 2.6006 3.3398
250 0.6755 0.8431 1.0386 1.2849 1.6510 1.9695 2.3414 2.5956 3.3299
300 0.6753 0.8428 1.0382 1.2844 1.6499 1.9679 2.3388 2.5923 3.3233
400 0.6751 0.8425 1.0378 1.2837 1.6487 1.9659 2.3357 2.5882 3.3150
500 0.6750 0.8423 1.0375 1.2832 1.6479 1.9647 2.3338 2.5857 3.3101
600 0.6749 0.8422 1.0373 1.2830 1.6474 1.9639 2.3326 2.5840 3.3068
700 0.6748 0.8421 1.0372 1.2828 1.6470 1.9634 2.3317 2.5829 3.3045
800 0.6748 0.8421 1.0371 1.2826 1.6468 1.9629 2.3310 2.5820 3.3027
900 0.6748 0.8420 1.0370 1.2825 1.6465 1.9626 2.3305 2.5813 3.3014
1000 0.6747 0.8420 1.0370 1.2824 1.6464 1.9623 2.3301 2.5808 3.3003
1200 0.6747 0.8419 1.0369 1.2823 1.6461 1.9619 2.3295 2.5799 3.2987
1400 0.6747 0.8419 1.0368 1.2822 1.6459 1.9617 2.3290 2.5793 3.2975
1600 0.6746 0.8418 1.0368 1.2821 1.6458 1.9614 2.3287 2.5789 3.2966
1800 0.6746 0.8418 1.0367 1.2820 1.6457 1.9613 2.3284 2.5786 3.2959
2000 0.6746 0.8418 1.0367 1.2820 1.6456 1.9612 2.3282 2.5783 3.2954
3000 0.6746 0.8417 1.0366 1.2818 1.6454 1.9608 2.3276 2.5775 3.2938
4000 0.6746 0.8417 1.0366 1.2818 1.6452 1.9606 2.3273 2.5771 3.2930
5000 0.6745 0.8417 1.0365 1.2817 1.6452 1.9604 2.3271 2.5768 3.2925
Technical: Two important properties
Here we provide two important properties without proof.
• Consider X1, . . . , Xn a random sample from N(µ, σ²)
• Denote X̄ = (1/n) Σ_{i=1}^n Xi and s² = (1/(n − 1)) Σ_{i=1}^n (Xi − X̄)²
Then:
• Σ_{i=1}^n (Xi − X̄)²/σ² ∼ χ²_{n−1}
• X̄ and s² are independent random variables.
Technical: Sample mean and t-distribution
• From the first property, we have
s²/σ² = (Σ (Xi − X̄)²/σ²)/(n − 1) ∼ χ²_{n−1}/(n − 1)
• Combining with X̄ ∼ N(µ, σ²/n) and the independence of X̄ and s²,
(X̄ − µ)/(s/√n) = [(X̄ − µ)/(σ/√n)] / √(s²/σ²) ∼ Z/√(χ²_{n−1}/(n − 1)) = t_{n−1}
Technical: Normal distribution for large sample
When the sample size is very large, the t-distribution is close to the standard normal, so approximately
(X̄ − µ)/(s/√n) ∼ N(0, 1)
Technical: Normality assumption for t-distribution
The Student-t distribution requires the normality assumption. Ways to check it:
• graphical: boxplot, histogram, Q-Q plot (scatter plot of the quantiles of your sample vs. the standard normal)
• numerical: check whether skewness is around 0 and kurtosis is around 3
• formal test: Jarque-Bera test (using skewness and kurtosis)
Video Links
▶ Sampling
▶ Sample mean
▶ Expectation of sample mean
▶ Variance of sample mean
▶ Central limit theorem
▶ Proportion
▶ Unknown variance - t-distribution