Lesson 2 Statistical Inference
Models (4433LGLM6Y)
Statistical Inference
Meeting 5
Vahe Avagyan
Biometris, Wageningen University and Research
Statistical inference
• Linear model: y = Xβ + ε.
• Fitting the model to data gives the vectors of fitted values and residuals:
y = Xb + e, with fitted values ŷ = Xb and residuals e = y − Xb,
where the coefficient vector b solves the normal equations X′X b = X′y.
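As a minimal sketch, the normal equations can be solved directly with numpy; the design matrix and response below are synthetic, not the course data:

```python
import numpy as np

# Sketch: fit y = Xb by solving the normal equations X'X b = X'y
# on made-up data with an intercept and k = 2 regressors.
rng = np.random.default_rng(0)
n, k = 20, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
beta_true = np.array([1.0, 2.0, -0.5])
y = X @ beta_true + rng.normal(scale=0.3, size=n)

b = np.linalg.solve(X.T @ X, X.T @ y)  # least-squares coefficients
e = y - X @ b                          # residuals

# The residuals are orthogonal to the columns of X (X'e = 0).
print(np.abs(X.T @ e).max() < 1e-8)
```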
Distribution of least-squares estimator
• LS estimator: b = (X′X)⁻¹X′y.
• b is an unbiased estimator: writing M = (X′X)⁻¹X′, so that b = My,
E(b) = E(My) = M E(y) = (X′X)⁻¹X′Xβ = β.
[Diagram: probability reasons from a population parameter (e.g., μ = E(X)) to a sample statistic (e.g., x̄ = (1/n) Σᵢ₌₁ⁿ xᵢ); statistical inference reasons from the statistic back to the parameter.]
Statistical inference for individual coefficients
• Vector of coefficients b = (B₀, B₁, …, B_k)′:
b ~ N_{k+1}(β, σ_ε² (X′X)⁻¹).
• Individual coefficient:
B_j ~ N(β_j, σ_ε² v_jj), or equivalently (B_j − β_j) / (σ_ε √v_jj) ~ N(0, 1).
• For testing H₀: β_j = β_j^(0) (e.g., H₀: β_j = 1 or any other value), we could use the test statistic:
Z = (B_j − β_j^(0)) / (σ_ε √v_jj).
• σ_ε² is estimated by S_E² = e′e / (n − (k + 1)), where e = y − Xb is the vector of residuals.
• The estimator of the variance-covariance matrix is V̂(b) = S_E² (X′X)⁻¹.
• The estimated standard error is SE(B_j) = S_E √v_jj, where v_jj is the j-th diagonal entry of (X′X)⁻¹.
• To test H₀: β_j = β_j^(0), we can use the test statistic
t = (B_j − β_j^(0)) / SE(B_j) = (B_j − β_j^(0)) / (S_E √v_jj).
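The quantities S_E², SE(B_j), and the t statistics can be sketched in a few lines of numpy; the data below are synthetic, not the course data:

```python
import numpy as np

# Sketch: estimate sigma_eps^2 by S_E^2 = e'e / (n - (k+1)), build
# SE(B_j) from the diagonal of (X'X)^-1, and form t for H0: beta_j = 0.
rng = np.random.default_rng(1)
n, k = 30, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
y = X @ np.array([1.0, 2.0, 0.0]) + rng.normal(size=n)

XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y
e = y - X @ b
s2 = e @ e / (n - (k + 1))           # S_E^2
se = np.sqrt(s2 * np.diag(XtX_inv))  # SE(B_j) = S_E * sqrt(v_jj)
t = b / se                           # t statistics for H0: beta_j = 0
print(t)
```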
Example: Duncan data
• Linear model:
prestige_i = β₀ + β₁ education_i + β₂ income_i + ε_i, for i = 1, …, 45.
• Recall the following steps in hypothesis testing for the slope of education (i.e., β₁):
1. Define the hypotheses: education is not related to prestige (keeping income constant) vs education is related to prestige (keeping income constant):
H₀: β₁ = 0 vs H₁: β₁ ≠ 0.
2. Test statistic:
t = (B₁ − 0) / SE(B₁).
3. If H₀ is true, t follows the t₄₂ distribution (df = n − (k + 1) = 45 − 3 = 42).
4. If H₁ is true, then t tends to smaller (if β₁ < 0) or larger (if β₁ > 0) values than prescribed by the t₄₂ distribution.
Example: t-test for individual slope, two-sided 𝐻𝑎
• p-value:
P = 2 × P(t₄₂ ≥ |t_obs|) = 2 × P(t₄₂ ≥ 5.555) = 2 · 8.65 · 10⁻⁷ = 1.73 · 10⁻⁶.
• Conclusion: P < α = 0.05, therefore reject H₀. The education level required for jobs is related to prestige (keeping income constant).
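The p-value can be checked with scipy, using the observed t = 5.555 and 42 df from this slide:

```python
from scipy import stats

# Two-sided p-value for the education slope: t_obs = 5.555,
# residual df = n - (k+1) = 45 - 3 = 42.
t_obs, df = 5.555, 42
p = 2 * stats.t.sf(abs(t_obs), df)
print(p)  # ~1.7e-06, matching the value on the slide
```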
Example: t-test for individual slope, H₀: β_j = β_j^(0)
• Suppose we test against a null value β_j^(0) ≠ 0.
• Imagine that the value 0.5 has some special meaning in the education example, and we ask if β₁ might be equal to 0.5 (given the data):
H₀: β₁ = 0.5 vs Hₐ: β₁ ≠ 0.5.
• Conclusion: P > 0.05, do not reject H₀. No evidence is found that the slope deviates from 0.5 (keeping income constant).
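A sketch of this test, taking B₁ ≈ 0.54583 and SE(B₁) ≈ 0.09825 as assumed inputs (these figures are not on this slide; they are chosen to reproduce the earlier t = 5.555 for the test against 0):

```python
from scipy import stats

# Sketch: t test of H0: beta_1 = 0.5 (two-sided). B1 and SE(B1)
# are assumed values consistent with t = B1/SE(B1) = 5.555.
B1, se_B1, df = 0.54583, 0.09825, 42
t = (B1 - 0.5) / se_B1
p = 2 * stats.t.sf(abs(t), df)
print(round(t, 3), round(p, 3))  # small t, p well above 0.05
```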
Example: t-test for individual slope, one-sided 𝐻𝑎
• Suppose we would like to test if the relationship is positive. In this case, it makes sense to test with a right-sided Hₐ. The steps are almost the same, with a small difference.
1. Hypotheses: education is not related to prestige (keeping income constant) vs education is positively related to prestige (keeping income constant):
H₀: β₁ = 0 vs Hₐ: β₁ > 0.
2. Test statistic: t = (B₁ − 0) / SE(B₁) (the same as for the two-sided test).
Conclusion: P < α = 0.05, therefore reject H₀. Thus, education is positively related to prestige (keeping income constant).
Note:
• Here we could take half of the p-value as reported by R (i.e., half of the two-sided p-value).
• Can the two-tailed p-value, as reported by R, always be halved for a one-sided Hₐ? (No; why not?)
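The note above can be made concrete: halving the two-sided p-value is only valid when the observed estimate falls on the side of Hₐ. A sketch for Hₐ: β₁ > 0:

```python
from scipy import stats

# One-sided p-value P(t_df >= t_obs) from the two-sided p-value:
# halve it only when t_obs is on the side of Ha (here, positive).
def one_sided_p(t_obs, df):
    p_two = 2 * stats.t.sf(abs(t_obs), df)
    return p_two / 2 if t_obs > 0 else 1 - p_two / 2

print(one_sided_p(5.555, 42))   # small: halving is valid
print(one_sided_p(-5.555, 42))  # close to 1: halving would be wrong
```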
Example: t-test for individual slope, one-sided Hₐ with β^(0) ≠ 0
Example: confidence interval for slope
• CI(β_j) = B_j ± t_{α/2; n−(k+1)} · SE(B_j).
• The CIs do not contain the value 0, which means we can reject H₀: β_j = 0 in a two-sided test.
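A sketch of the 95% interval, again with the assumed Duncan education slope and standard error (B₁ ≈ 0.54583, SE ≈ 0.09825, consistent with the earlier t = 5.555):

```python
from scipy import stats

# Sketch: 95% CI as B_j +/- t_{alpha/2; n-(k+1)} * SE(B_j).
B1, se_B1, df, alpha = 0.54583, 0.09825, 42, 0.05
t_crit = stats.t.ppf(1 - alpha / 2, df)
ci = (B1 - t_crit * se_B1, B1 + t_crit * se_B1)
print(ci)  # the interval excludes 0
```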
Statistical inference for several coefficients: All-slopes
• Recall, RegSS = TSS − RSS, i.e., the difference between the residual sum of squares of the null model (i.e., the intercept-only model) and the current model.
• F = RegMS / RMS is a ratio of two Mean Squares:
• Denominator: Residual Mean Square 𝑅𝑀𝑆 is an estimator of the error variance 𝜎𝜖2 .
• Numerator: Regression Mean Square 𝑅𝑒𝑔𝑀𝑆 is also an estimator of 𝜎𝜖2 , but only if 𝑯𝟎 is true!
• Under 𝐻𝑎 , 𝑅𝑒𝑔𝑀𝑆 tends to be larger than 𝜎𝜖2 , so the ratio tends to be larger than 1.
F-distribution
• If Z₁, …, Z_m are independent N(0, 1) variables, then X² = Σᵢ₌₁ᵐ Zᵢ² ~ χ²_m.
• Suppose X₁² ~ χ²_{df₁} and X₂² ~ χ²_{df₂} are two independent chi-square distributed variables, with degrees of freedom df₁ and df₂, respectively. Then
F ≡ (X₁² / df₁) / (X₂² / df₂) ~ F_{df₁; df₂}.
(The distribution is named after Ronald Fisher.)
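The ratio definition can be checked by simulation; a sketch, using the df values that appear later in this lecture:

```python
import numpy as np

# Sketch: simulate F = (X1^2/df1) / (X2^2/df2) from independent
# chi-square draws and compare the mean with df2/(df2 - 2).
rng = np.random.default_rng(0)
df1, df2, reps = 2, 42, 200_000
x1 = rng.chisquare(df1, reps)
x2 = rng.chisquare(df2, reps)
f = (x1 / df1) / (x2 / df2)
print(f.mean())  # close to 42/40 = 1.05
```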
F-distribution: Examples
• The mean is E(F) = df₂ / (df₂ − 2), for df₂ > 2.
• As for the t-test, we have the following steps for the F-test.
1. Hypothesis test:
𝐻0 : 𝛽1 = 𝛽2 = 0
𝐻1 : at least one is not zero.
2. Test statistic:
F = (RegSS / k) / (RSS / (n − (k + 1))).
3. If H₀ is true, F ~ F₂;₄₂.
4. Observed value:
F = (RegSS / 2) / (RSS / 42) = (36181 / 2) / (7507 / 42) = 101.2.
Conclusion: 𝑃 < 0.05, so reject 𝐻0 . Therefore, education and/or income are related to prestige.
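The computation with the slide's sums of squares, sketched in scipy:

```python
from scipy import stats

# All-slopes F test with the slide's values:
# RegSS = 36181 on k = 2 df, RSS = 7507 on 42 residual df.
RegSS, RSS, k, df_res = 36181, 7507, 2, 42
F = (RegSS / k) / (RSS / df_res)
p = stats.f.sf(F, k, df_res)
print(round(F, 1), p)  # F ~ 101.2, p far below 0.05
```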
Hypothesis Test: Subset of Slopes
• Suppose we would like to test whether a subset of slopes is 0, instead of all slopes:
H₀: β₁ = β₂ = ⋯ = β_q = 0 vs H₁: at least one is not zero.
• For notational convenience, let's focus on the first q regressors, but any subset of βᵢ's may be tested.
• We have RSS = e′e, the residual sum of squares of the full model (FM), and RSS₀ = e₀′e₀, the residual sum of squares of the reduced model (RM).
• The F-ratio is defined as
F₀ = ((RSS₀ − RSS) / q) / (RSS / (n − (k + 1))).
• Under H₀, F₀ ~ F_{q; n−(k+1)}.
• The following holds: RSS₀ − RSS = RegSS − RegSS₀, i.e., any increase in the residual sum of squares is a decrease in the regression sum of squares.
• Therefore, we can also write F₀ = ((RegSS − RegSS₀) / q) / (RSS / (n − (k + 1))).
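A sketch of the subset test on synthetic data: fit the full and reduced models, compare residual sums of squares, and refer the ratio to the F distribution:

```python
import numpy as np
from scipy import stats

# Sketch: subset F test by comparing RSS of the full model (FM)
# with RSS0 of the reduced model (RM). Synthetic data; beta_2 = 0.
rng = np.random.default_rng(2)
n = 45
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = 1 + 0.5 * x1 + 0.0 * x2 + rng.normal(size=n)

def rss(cols):
    X = np.column_stack([np.ones(n)] + cols)
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ b
    return e @ e

RSS = rss([x1, x2])   # full model
RSS0 = rss([x1])      # reduced model under H0: beta_2 = 0
q, df_res = 1, n - 3
F0 = ((RSS0 - RSS) / q) / (RSS / df_res)
p = stats.f.sf(F0, q, df_res)
print(round(F0, 2), round(p, 3))
```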
Example: Duncan data
• Suppose we would like to test 𝐻0 : 𝛽1 = 0 (i.e., education has no association with prestige).
• Another approach in R
Statistical inference for several coefficients: Subset of Slopes
• Let b₁ = (B₁, …, B_q)′ be the LS coefficients of interest from b, and let V₁₁ be the corresponding submatrix of (X′X)⁻¹.
• Test a general hypothesis H₀: β₁ = β₁^(0), where β₁ = (β₁, β₂, …, β_q)′ and β₁^(0) is not necessarily 0:
F₀ = (b₁ − β₁^(0))′ V₁₁⁻¹ (b₁ − β₁^(0)) / (q S_E²) ~ F_{q; n−(k+1)}.
Statistical inference
• The F-statistic is defined as
F₀ = (Lb − c)′ (L(X′X)⁻¹L′)⁻¹ (Lb − c) / (q S_E²) ~ F_{q; n−(k+1)} (under H₀), because
• b ~ N_{k+1}(β, σ_ε² (X′X)⁻¹), and hence
• Lb ~ N_q(Lβ, σ_ε² L(X′X)⁻¹L′).
• Consider H₀: β₁ − β₂ = 0.
• Define L = ? and c = ?
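One possible choice, sketched on synthetic data: for coefficients (β₀, β₁, β₂), take L = [0, 1, −1] and c = 0, so that Lβ = β₁ − β₂:

```python
import numpy as np
from scipy import stats

# Sketch of the general linear hypothesis F test for H0: beta_1 = beta_2,
# using L = [[0, 1, -1]] and c = [0]. Synthetic data with beta_1 = beta_2.
rng = np.random.default_rng(3)
n = 45
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 0.8, 0.8]) + rng.normal(size=n)

XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y
e = y - X @ b
s2 = e @ e / (n - 3)                      # S_E^2

L = np.array([[0.0, 1.0, -1.0]])
c = np.array([0.0])
q = L.shape[0]
diff = L @ b - c
F0 = (diff @ np.linalg.solve(L @ XtX_inv @ L.T, diff)) / (q * s2)
p = stats.f.sf(F0, q, n - 3)
print(round(float(F0), 2), round(float(p), 3))
```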
Predicting new 𝑦-values
• The estimate of the mean (average) prestige μ_y = E(y) at specific values of education and income:
μ̂_y = B₀ + B₁ x*_{n+1,1} + ⋯ + B_k x*_{n+1,k}.
• Suppose, we want to predict the prestige value for a new profession with
• education = 92
• income = 68
• Extrapolation in regression:
• Be concerned not only about the range of each individual predictor, but also about the set of values of several predictors together.
Inference for predictions
• Standard error of the estimated mean:
se(μ̂_y) = S_E √(x*′(X′X)⁻¹x*), i.e., se(μ̂_y)² = S_E² · x*′(X′X)⁻¹x*.
• Standard error of a predicted new observation:
se(Ŷ) = S_E √(x*′(X′X)⁻¹x* + 1), i.e., se(Ŷ)² = S_E² · x*′(X′X)⁻¹x* + S_E² = se(μ̂_y)² + S_E².
Confidence vs prediction interval
• The predict() function provides the standard error of the predicted means.
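A numpy sketch of both standard errors at x* = (1, 92, 68), the education/income values from the earlier slide; the design, coefficients, and noise scale below are synthetic stand-ins, not the Duncan fit:

```python
import numpy as np

# Sketch: se of the estimated mean vs se of a new observation at x*.
rng = np.random.default_rng(4)
n = 45
X = np.column_stack([np.ones(n),
                     rng.uniform(10, 100, n),   # stand-in "education"
                     rng.uniform(10, 90, n)])   # stand-in "income"
y = X @ np.array([-6.0, 0.55, 0.60]) + rng.normal(scale=13, size=n)

XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y
e = y - X @ b
s2 = e @ e / (n - 3)                  # S_E^2

x_star = np.array([1.0, 92.0, 68.0])
h = x_star @ XtX_inv @ x_star
se_mean = np.sqrt(s2 * h)             # se of the estimated mean
se_pred = np.sqrt(s2 * (h + 1))       # se of a new observation
print(se_mean < se_pred)              # prediction se is always larger
```

This mirrors the identity se(Ŷ)² = se(μ̂_y)² + S_E²: the prediction interval is always wider than the confidence interval for the mean.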