Lesson 2 Statistical Inference
Models (4433LGLM6Y)
Statistical Inference
Meeting 5
Vahe Avagyan
Biometris, Wageningen University and Research
Statistical inference
• Linear model: y = Xβ + ε.
• Fitting the model to data gives the vectors of fitted values and residuals:
y = Xb + e, with fitted values ŷ = Xb and residuals e = y − Xb,
where the coefficient vector b solves the normal equations X′X b = X′y.
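As a minimal sketch, the normal equations can be solved directly with numpy; the design matrix and response below are synthetic, not the course data:

```python
import numpy as np

# Sketch: fit y = Xb by solving the normal equations X'X b = X'y
# on made-up data with an intercept and k = 2 regressors.
rng = np.random.default_rng(0)
n, k = 20, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
beta_true = np.array([1.0, 2.0, -0.5])
y = X @ beta_true + rng.normal(scale=0.3, size=n)

b = np.linalg.solve(X.T @ X, X.T @ y)  # least-squares coefficients
e = y - X @ b                          # residuals

# The residuals are orthogonal to the columns of X (X'e = 0).
print(np.abs(X.T @ e).max() < 1e-8)
```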
Distribution of least-squares estimator
• LS estimator: b = (X′X)⁻¹X′y.
• b is an unbiased estimator: writing M = (X′X)⁻¹X′, so that b = My,
E(b) = E(My) = M E(y) = (X′X)⁻¹X′Xβ = β.
[Diagram: probability reasons from a population parameter (e.g., μ = E(X)) to a sample statistic (e.g., x̄ = (1/n) Σᵢ₌₁ⁿ xᵢ); statistical inference reasons from the statistic back to the parameter.]
Statistical inference for individual coefficients
• Vector of coefficients b = (B₀, B₁, …, B_k)′:
b ~ N_{k+1}(β, σ_ε² (X′X)⁻¹).
• Individual coefficient:
B_j ~ N(β_j, σ_ε² v_jj), or equivalently (B_j − β_j) / (σ_ε √v_jj) ~ N(0, 1).
• For testing H₀: β_j = β_j^(0) (e.g., H₀: β_j = 1 or any other value), we could use the test statistic:
Z = (B_j − β_j^(0)) / (σ_ε √v_jj).
• σ_ε² is estimated by S_E² = e′e / (n − (k + 1)), where e = y − Xb is the vector of residuals.
• The estimator of the variance-covariance matrix is V̂(b) = S_E² (X′X)⁻¹.
• The estimated standard error is SE(B_j) = S_E √v_jj, where v_jj is the j-th diagonal entry of (X′X)⁻¹.
• To test H₀: β_j = β_j^(0), we can use the test statistic
t = (B_j − β_j^(0)) / SE(B_j) = (B_j − β_j^(0)) / (S_E √v_jj).
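The quantities S_E², SE(B_j), and the t statistics can be sketched in a few lines of numpy; the data below are synthetic, not the course data:

```python
import numpy as np

# Sketch: estimate sigma_eps^2 by S_E^2 = e'e / (n - (k+1)), build
# SE(B_j) from the diagonal of (X'X)^-1, and form t for H0: beta_j = 0.
rng = np.random.default_rng(1)
n, k = 30, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
y = X @ np.array([1.0, 2.0, 0.0]) + rng.normal(size=n)

XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y
e = y - X @ b
s2 = e @ e / (n - (k + 1))           # S_E^2
se = np.sqrt(s2 * np.diag(XtX_inv))  # SE(B_j) = S_E * sqrt(v_jj)
t = b / se                           # t statistics for H0: beta_j = 0
print(t)
```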
Example: Duncan data
• Linear model:
prestige_i = β₀ + β₁ education_i + β₂ income_i + ε_i, for i = 1, …, 45.
• Recall the following steps in hypothesis testing for the slope of education (i.e., β₁):
1. Define the hypotheses: education is not related to prestige (keeping income constant) vs education is related to prestige (keeping income constant):
H₀: β₁ = 0 vs H₁: β₁ ≠ 0.
2. Test statistic:
t = (B₁ − 0) / SE(B₁).
3. If H₀ is true, t follows the t₄₂ distribution (df = n − (k + 1) = 45 − 3 = 42).
4. If H₁ is true, then t tends to smaller (if β₁ < 0) or larger (if β₁ > 0) values than prescribed by the t₄₂ distribution.
Example: t-test for individual slope, two-sided 𝐻𝑎
• p-value:
P = 2 × P(t₄₂ ≥ |t_obs|) = 2 × P(t₄₂ ≥ 5.555) = 2 · 8.65 · 10⁻⁷ = 1.73 · 10⁻⁶.
• Conclusion: P < α = 0.05, therefore reject H₀. The education level required for jobs is related to prestige (keeping income constant).
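The p-value can be checked with scipy, using the observed t = 5.555 and 42 df from this slide:

```python
from scipy import stats

# Two-sided p-value for the education slope: t_obs = 5.555,
# residual df = n - (k+1) = 45 - 3 = 42.
t_obs, df = 5.555, 42
p = 2 * stats.t.sf(abs(t_obs), df)
print(p)  # ~1.7e-06, matching the value on the slide
```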
Example: t-test for individual slope, H₀: β_j = β_j^(0)
• Suppose we test against a null value β_j^(0) ≠ 0.
• Imagine that the value 0.5 has some special meaning in the education example, and we ask if β₁ might be equal to 0.5 (given the data):
H₀: β₁ = 0.5 vs Hₐ: β₁ ≠ 0.5.
• Conclusion: P > 0.05, do not reject H₀. No evidence is found that the slope deviates from 0.5 (keeping income constant).
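A sketch of this test, taking B₁ ≈ 0.54583 and SE(B₁) ≈ 0.09825 as assumed inputs (these figures are not on this slide; they are chosen to reproduce the earlier t = 5.555 for the test against 0):

```python
from scipy import stats

# Sketch: t test of H0: beta_1 = 0.5 (two-sided). B1 and SE(B1)
# are assumed values consistent with t = B1/SE(B1) = 5.555.
B1, se_B1, df = 0.54583, 0.09825, 42
t = (B1 - 0.5) / se_B1
p = 2 * stats.t.sf(abs(t), df)
print(round(t, 3), round(p, 3))  # small t, p well above 0.05
```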
Example: t-test for individual slope, one-sided 𝐻𝑎
• Suppose we would like to test if the relationship is positive. In this case, it makes sense to test with a right-sided Hₐ. The steps are almost the same, with a small difference.
1. Hypotheses: education is not related to prestige (keeping income constant) vs education is positively related to prestige (keeping income constant):
H₀: β₁ = 0 vs Hₐ: β₁ > 0.
2. Test statistic: t = (B₁ − 0) / SE(B₁) (the same as for the two-sided test).
Conclusion: P < α = 0.05, therefore reject H₀. Thus, education is positively related to prestige (keeping income constant).
Note:
• Here we could take half of the p-value as reported by R (i.e., half of the two-sided p-value).
• Can the two-tailed p-value, as reported by R, always be halved for a one-sided Hₐ? (No; why not?)
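The note above can be made concrete: halving the two-sided p-value is only valid when the observed estimate falls on the side of Hₐ. A sketch for Hₐ: β₁ > 0:

```python
from scipy import stats

# One-sided p-value P(t_df >= t_obs) from the two-sided p-value:
# halve it only when t_obs is on the side of Ha (here, positive).
def one_sided_p(t_obs, df):
    p_two = 2 * stats.t.sf(abs(t_obs), df)
    return p_two / 2 if t_obs > 0 else 1 - p_two / 2

print(one_sided_p(5.555, 42))   # small: halving is valid
print(one_sided_p(-5.555, 42))  # close to 1: halving would be wrong
```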
Example: t-test for individual slope, one-sided Hₐ with β^(0) ≠ 0
Example: confidence interval for slope
• CI(β_j) = B_j ± t_{α/2; n−(k+1)} · SE(B_j).
• The CIs do not contain the value 0, which means we can reject H₀: β_j = 0 in a two-sided test.
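A sketch of the 95% interval, again with the assumed Duncan education slope and standard error (B₁ ≈ 0.54583, SE ≈ 0.09825, consistent with the earlier t = 5.555):

```python
from scipy import stats

# Sketch: 95% CI as B_j +/- t_{alpha/2; n-(k+1)} * SE(B_j).
B1, se_B1, df, alpha = 0.54583, 0.09825, 42, 0.05
t_crit = stats.t.ppf(1 - alpha / 2, df)
ci = (B1 - t_crit * se_B1, B1 + t_crit * se_B1)
print(ci)  # the interval excludes 0
```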
Statistical inference for several coefficients: All-slopes
• Recall, RegSS = TSS − RSS, i.e., the difference between the residual sum of squares of the null model (i.e., the intercept-only model) and the current model.
• F = RegMS / RMS is a ratio of two Mean Squares:
• Denominator: Residual Mean Square 𝑅𝑀𝑆 is an estimator of the error variance 𝜎𝜖2 .
• Numerator: Regression Mean Square 𝑅𝑒𝑔𝑀𝑆 is also an estimator of 𝜎𝜖2 , but only if 𝑯𝟎 is true!
• Under 𝐻𝑎 , 𝑅𝑒𝑔𝑀𝑆 tends to be larger than 𝜎𝜖2 , so the ratio tends to be larger than 1.
F-distribution
• If Z₁, …, Z_m are independent N(0, 1) variables, then X² = Σᵢ₌₁ᵐ Zᵢ² ~ χ²_m.
• Suppose X₁² ~ χ²_{df₁} and X₂² ~ χ²_{df₂} are two independent chi-square distributed variables, with degrees of freedom df₁ and df₂, respectively. Then
F ≡ (X₁² / df₁) / (X₂² / df₂) ~ F_{df₁; df₂}.
(The distribution is named after Ronald Fisher.)
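The ratio definition can be checked by simulation; a sketch, using the df values that appear later in this lecture:

```python
import numpy as np

# Sketch: simulate F = (X1^2/df1) / (X2^2/df2) from independent
# chi-square draws and compare the mean with df2/(df2 - 2).
rng = np.random.default_rng(0)
df1, df2, reps = 2, 42, 200_000
x1 = rng.chisquare(df1, reps)
x2 = rng.chisquare(df2, reps)
f = (x1 / df1) / (x2 / df2)
print(f.mean())  # close to 42/40 = 1.05
```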
F-distribution: Examples
• The mean is E(F) = df₂ / (df₂ − 2), for df₂ > 2.
• As for the t-test, we have the following steps for the F-test.
1. Hypothesis test:
𝐻0 : 𝛽1 = 𝛽2 = 0
𝐻1 : at least one is not zero.
2. Test statistic:
F = (RegSS / k) / (RSS / (n − (k + 1))).
3. If H₀ is true, F ~ F₂;₄₂.
4. Observed value:
F = (RegSS / 2) / (RSS / 42) = (36181 / 2) / (7507 / 42) = 101.2.
Conclusion: 𝑃 < 0.05, so reject 𝐻0 . Therefore, education and/or income are related to prestige.
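The computation with the slide's sums of squares, sketched in scipy:

```python
from scipy import stats

# All-slopes F test with the slide's values:
# RegSS = 36181 on k = 2 df, RSS = 7507 on 42 residual df.
RegSS, RSS, k, df_res = 36181, 7507, 2, 42
F = (RegSS / k) / (RSS / df_res)
p = stats.f.sf(F, k, df_res)
print(round(F, 1), p)  # F ~ 101.2, p far below 0.05
```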
Hypothesis Test: Subset of Slopes
• Suppose we would like to test whether a subset of slopes is 0, instead of all slopes:
H₀: β₁ = β₂ = ⋯ = β_q = 0 vs H₁: at least one is not zero.
• For notational convenience, let's focus on the first q regressors, but any subset of βᵢ's may be tested.
• We have RSS = e′e, the residual sum of squares of the full model (FM), and RSS₀ = e₀′e₀, the residual sum of squares of the reduced model (RM).
• The F-ratio is defined as
F₀ = ((RSS₀ − RSS) / q) / (RSS / (n − (k + 1))).
• Under H₀, F₀ ~ F_{q; n−(k+1)}.
• The following holds: RSS₀ − RSS = RegSS − RegSS₀, i.e., any increase in the residual sum of squares is a decrease in the regression sum of squares.
• Therefore, we can also write F₀ = ((RegSS − RegSS₀) / q) / (RSS / (n − (k + 1))).
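A sketch of the subset test on synthetic data: fit the full and reduced models, compare residual sums of squares, and refer the ratio to the F distribution:

```python
import numpy as np
from scipy import stats

# Sketch: subset F test by comparing RSS of the full model (FM)
# with RSS0 of the reduced model (RM). Synthetic data; beta_2 = 0.
rng = np.random.default_rng(2)
n = 45
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = 1 + 0.5 * x1 + 0.0 * x2 + rng.normal(size=n)

def rss(cols):
    X = np.column_stack([np.ones(n)] + cols)
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ b
    return e @ e

RSS = rss([x1, x2])   # full model
RSS0 = rss([x1])      # reduced model under H0: beta_2 = 0
q, df_res = 1, n - 3
F0 = ((RSS0 - RSS) / q) / (RSS / df_res)
p = stats.f.sf(F0, q, df_res)
print(round(F0, 2), round(p, 3))
```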
Example: Duncan data
• Suppose we would like to test 𝐻0 : 𝛽1 = 0 (i.e., education has no association with prestige).
• Another approach in R
Statistical inference for several coefficients: Subset of Slopes
• Let b₁ = (B₁, …, B_q)′ be the LS coefficients of interest from b, and let V₁₁ be the corresponding submatrix of (X′X)⁻¹.
• Test a general hypothesis H₀: β₁ = β₁^(0), where β₁ = (β₁, β₂, …, β_q)′ and β₁^(0) is not necessarily 0:
F₀ = (b₁ − β₁^(0))′ V₁₁⁻¹ (b₁ − β₁^(0)) / (q S_E²) ~ F_{q; n−(k+1)}.
Statistical inference
• The F-statistic is defined as
F₀ = (Lb − c)′ (L(X′X)⁻¹L′)⁻¹ (Lb − c) / (q S_E²) ~ F_{q; n−(k+1)} (under H₀), because
• b ~ N_{k+1}(β, σ_ε² (X′X)⁻¹), and hence
• Lb ~ N_q(Lβ, σ_ε² L(X′X)⁻¹L′).
• Consider H₀: β₁ − β₂ = 0.
• Define L = ? and c = ?
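One possible choice, sketched on synthetic data: for coefficients (β₀, β₁, β₂), take L = [0, 1, −1] and c = 0, so that Lβ = β₁ − β₂:

```python
import numpy as np
from scipy import stats

# Sketch of the general linear hypothesis F test for H0: beta_1 = beta_2,
# using L = [[0, 1, -1]] and c = [0]. Synthetic data with beta_1 = beta_2.
rng = np.random.default_rng(3)
n = 45
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 0.8, 0.8]) + rng.normal(size=n)

XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y
e = y - X @ b
s2 = e @ e / (n - 3)                      # S_E^2

L = np.array([[0.0, 1.0, -1.0]])
c = np.array([0.0])
q = L.shape[0]
diff = L @ b - c
F0 = (diff @ np.linalg.solve(L @ XtX_inv @ L.T, diff)) / (q * s2)
p = stats.f.sf(F0, q, n - 3)
print(round(float(F0), 2), round(float(p), 3))
```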
Predicting new 𝑦-values
• The estimate of the mean (average) prestige μ_y = E(y) at specific values of education and income:
μ̂_y = B₀ + B₁ x*_{n+1,1} + ⋯ + B_k x*_{n+1,k}.
• Suppose, we want to predict the prestige value for a new profession with
• education = 92
• income = 68
• Extrapolation in regression:
• Be concerned not only about the range of each individual predictor, but also about the set of values of several predictors together.
Inference for predictions
• Standard error of the estimated mean:
se(μ̂_y) = S_E √(x*′(X′X)⁻¹x*), i.e., se(μ̂_y)² = S_E² · x*′(X′X)⁻¹x*.
• Standard error of a predicted new observation:
se(Ŷ) = S_E √(x*′(X′X)⁻¹x* + 1), i.e., se(Ŷ)² = S_E² · x*′(X′X)⁻¹x* + S_E² = se(μ̂_y)² + S_E².
Confidence vs prediction interval
• The predict() function provides the standard error of the predicted means.
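A numpy sketch of both standard errors at x* = (1, 92, 68), the education/income values from the earlier slide; the design, coefficients, and noise scale below are synthetic stand-ins, not the Duncan fit:

```python
import numpy as np

# Sketch: se of the estimated mean vs se of a new observation at x*.
rng = np.random.default_rng(4)
n = 45
X = np.column_stack([np.ones(n),
                     rng.uniform(10, 100, n),   # stand-in "education"
                     rng.uniform(10, 90, n)])   # stand-in "income"
y = X @ np.array([-6.0, 0.55, 0.60]) + rng.normal(scale=13, size=n)

XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y
e = y - X @ b
s2 = e @ e / (n - 3)                  # S_E^2

x_star = np.array([1.0, 92.0, 68.0])
h = x_star @ XtX_inv @ x_star
se_mean = np.sqrt(s2 * h)             # se of the estimated mean
se_pred = np.sqrt(s2 * (h + 1))       # se of a new observation
print(se_mean < se_pred)              # prediction se is always larger
```

This mirrors the identity se(Ŷ)² = se(μ̂_y)² + S_E²: the prediction interval is always wider than the confidence interval for the mean.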