Lesson 2 Statistical Inference


Linear and Generalized Linear Models (4433LGLM6Y)

Statistical Inference
Meeting 5

Vahe Avagyan
Biometris, Wageningen University and Research
Statistical inference

• inference for individual coefficients: t-tests and confidence intervals
• inference for several coefficients: F-tests
• general linear hypotheses
Linear Model Theory

• Linear model (reminder):

$\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\epsilon}$,

where $\boldsymbol{\epsilon} \sim N_n(\mathbf{0}, \sigma_\epsilon^2 \mathbf{I}_n)$ and $\mathbf{X}$ is the $n \times (k+1)$ model matrix.

• Fitting the model to data gives the vectors of fitted values and residuals:

$\mathbf{y} = \mathbf{X}\mathbf{b} + \mathbf{e}$.

• Normal equations to obtain the LS estimator $\mathbf{b}$ of $\boldsymbol{\beta}$:

$(\mathbf{X}'\mathbf{X})\,\mathbf{b} = \mathbf{X}'\mathbf{y}$.
Distribution of least-squares estimator

• LS estimator:

$\mathbf{b} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}$.

• Recall the following properties:

1. $\mathbf{b}$ is a linear estimator: $\mathbf{b} = \mathbf{M}\mathbf{y}$, with $\mathbf{M} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'$.

2. $\mathbf{b}$ is an unbiased estimator: $E(\mathbf{b}) = E(\mathbf{M}\mathbf{y}) = \mathbf{M}\,E(\mathbf{y}) = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{X}\boldsymbol{\beta} = \boldsymbol{\beta}$.

3. $\mathbf{b}$ has variance-covariance matrix $V(\mathbf{b}) = \sigma_\epsilon^2 (\mathbf{X}'\mathbf{X})^{-1}$.

4. $\mathbf{b}$ has a normal distribution if $\mathbf{y}$ is normally distributed. Therefore,

$\mathbf{b} \sim N_{k+1}(\boldsymbol{\beta},\ \sigma_\epsilon^2 (\mathbf{X}'\mathbf{X})^{-1})$.
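(Not from the slides: a minimal R sketch verifying these formulas numerically on simulated data; all object names are illustrative.)

set.seed(1)
n <- 50; k <- 2
X <- cbind(1, matrix(rnorm(n * k), n, k))    # model matrix with intercept
beta <- c(2, 0.5, -1)                        # assumed true coefficients
y <- drop(X %*% beta + rnorm(n, sd = 1.5))   # sigma_eps = 1.5
b <- solve(crossprod(X), crossprod(X, y))    # normal equations: b = (X'X)^{-1} X'y
fit <- lm(y ~ X - 1)                         # same fit via lm()
cbind(manual = drop(b), lm = coef(fit))      # estimates agree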
Statistical inference

• inference for individual coefficients: t-tests and confidence intervals
• inference for several coefficients: F-tests
• general linear hypotheses
What is Statistical Inference?

• Probability: from the population distribution $f_X(x)$ to the sample data $(x_1, \ldots, x_n)$.

• Statistical inference: from a statistic, e.g., $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$, back to a population parameter, e.g., $\mu = E(X)$.
Statistical inference for individual coefficients

• Vector of coefficients $\mathbf{b} = (B_0, B_1, \ldots, B_k)'$:

$\mathbf{b} \sim N_{k+1}(\boldsymbol{\beta},\ \sigma_\epsilon^2 (\mathbf{X}'\mathbf{X})^{-1})$.

• Individual coefficient:

$B_j \sim N(\beta_j,\ \sigma_\epsilon^2 v_{jj})$, or equivalently $\dfrac{B_j - \beta_j}{\sigma_\epsilon \sqrt{v_{jj}}} \sim N(0, 1)$,

where $v_{jj}$ is the $j$-th diagonal entry of $(\mathbf{X}'\mathbf{X})^{-1}$.


Statistical inference for individual coefficients

• For testing $H_0: \beta_j = \beta_j^{(0)}$ (e.g., $H_0: \beta_j = 1$ or any other value), we could use the test statistic:

$Z = \dfrac{B_j - \beta_j^{(0)}}{\sigma_\epsilon \sqrt{v_{jj}}}$.

• If $H_0$ is true (i.e., under $H_0$), $Z \sim N(0, 1)$.

Note: we assume here that $\sigma_\epsilon^2$ is known. What is the problem with this assumption?


Statistical inference for individual coefficients

• $\sigma_\epsilon^2$ is estimated by $S_E^2 = \dfrac{\mathbf{e}'\mathbf{e}}{n-(k+1)}$, where $\mathbf{e} = \mathbf{y} - \mathbf{X}\mathbf{b}$ is the vector of residuals.

• In the variance $V(\mathbf{b}) = \sigma_\epsilon^2 (\mathbf{X}'\mathbf{X})^{-1}$, we simply replace $\sigma_\epsilon^2$ with $S_E^2$.

• The estimator of the variance-covariance matrix is $\hat{V}(\mathbf{b}) = S_E^2 (\mathbf{X}'\mathbf{X})^{-1}$.

• The estimator of the standard error is $SE(B_j) = S_E \sqrt{v_{jj}}$, where $v_{jj}$ is the $j$-th diagonal entry of $(\mathbf{X}'\mathbf{X})^{-1}$.

• To test $H_0: \beta_j = \beta_j^{(0)}$, we can use the test statistic

$t = \dfrac{B_j - \beta_j^{(0)}}{SE(B_j)} = \dfrac{B_j - \beta_j^{(0)}}{S_E \sqrt{v_{jj}}}$.

• If $H_0$ is true, then $t \sim t_{n-(k+1)}$ (see the sketch below).
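(A hedged sketch, not from the slides: computing $S_E^2$, $SE(B_j)$ and the t-statistics by hand on simulated data, then checking against summary(lm).)

set.seed(2)
n <- 40; x1 <- rnorm(n); x2 <- rnorm(n)
y <- 1 + 0.5 * x1 - 0.3 * x2 + rnorm(n)
fit <- lm(y ~ x1 + x2)
X <- model.matrix(fit)                    # n x (k+1) model matrix
k <- ncol(X) - 1
S2E <- sum(resid(fit)^2) / (n - (k + 1))  # S_E^2 = e'e / (n-(k+1))
V <- solve(crossprod(X))                  # (X'X)^{-1}
se_b <- sqrt(S2E * diag(V))               # SE(B_j) = S_E * sqrt(v_jj)
t_b <- coef(fit) / se_b                   # t-statistics for H0: beta_j = 0
cbind(summary(fit)$coefficients[, 1:3], se_b, t_b)  # columns agree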


Student's t-distribution $t_n$ (a generalization of the standard normal distribution)

[Figure: t-distribution densities; portrait of William Gosset ("Student")]

For large df, the t-distribution approaches the standard normal $N(0, 1)$.
Example: Duncan data

• Dataset on the prestige of 45 occupations, to be explained by education and income.

• Linear model:

prestige$_i$ = $\beta_0$ + $\beta_1$ education$_i$ + $\beta_2$ income$_i$ + $\epsilon_i$, for $i = 1, \ldots, 45$.
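(A sketch of the corresponding R fit, assuming the Duncan data from the carData package; the slides show the resulting output.)

library(carData)    # assumes carData is installed
data(Duncan)        # prestige, education, income for 45 occupations
fit <- lm(prestige ~ education + income, data = Duncan)
summary(fit)        # coefficient table with t-tests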
Example: Duncan data

Remember: R always reports the two-tailed P-value for the t-test, with $\beta_j^{(0)} = 0$.
t-test for individual slope, two-sided 𝐻𝑎

• Recall the following steps in hypothesis testing for the slope of education (i.e., $\beta_1$):

1. Define the hypothesis test: education is not related to prestige (keeping income constant) vs education is related to prestige (keeping income constant):

$H_0: \beta_1 = 0$ vs $H_1: \beta_1 \neq 0$

2. Test statistic (the hypothesized value $\beta_1^{(0)} = 0$ enters here):

$t = \dfrac{B_1 - 0}{SE(B_1)}$

3. If $H_0$ is true, then $t \sim t_{42}$ ($n = 45$, $k = 2$, therefore df $= 45 - (2 + 1) = 42$).

4. If $H_a$ is true, then $t$ tends to smaller (if $\beta_1 < 0$) or larger (if $\beta_1 > 0$) values than prescribed by the $t_{42}$ distribution.
Example: t-test for individual slope, two-sided 𝐻𝑎

5. Two-tailed p-value is needed:

$P = 2 \times P(t_{42} \ge |t|)$

6. The outcome of the test statistic (read from the R output):

$t = \dfrac{0.546 - 0}{0.0983} = 5.555$

7. p-value:

$P = 2 \times P(t_{42} \ge 5.555) = 2 \times 8.65 \cdot 10^{-7} = 1.73 \cdot 10^{-6}$.

pt(5.555, 42, lower.tail = FALSE)   # one tail; multiply by 2

Conclusion: $P < \alpha = 0.05$, therefore reject $H_0$. It is shown that the education level required for jobs is related to prestige (keeping income constant).
Example: t-test for individual slope, $H_0: \beta_j = \beta_j^{(0)}$

• Suppose we test against a value $\beta_j^{(0)} \neq 0$.

• The default R output cannot be used directly here, unless we use some trick.

• Imagine that the value 0.5 has some special meaning in the education example, and we ask if $\beta_1$ might be equal to 0.5 (given the data).

1. Define the hypothesis test:

$H_0: \beta_1 = 0.5$ vs $H_a: \beta_1 \neq 0.5$.

2. Test statistic (the same form as for the test against 0):

$t = \dfrac{B_1 - 0.5}{SE(B_1)}$

3. If $H_0$ is true, then $t \sim t_{42}$.


Example: t-test for individual slope, $H_0: \beta_j = \beta_j^{(0)}$ (continued)

4. If $H_a$ is true, $|t|$ tends to larger values than prescribed by the $t_{42}$ distribution.

5. Two-tailed p-value is needed: $P = 2 \times P(t_{42} \ge |t|)$

6. The outcome of the test statistic: $t = \dfrac{0.546 - 0.5}{0.0983} = 0.468$.

7. $P = 2 \times P(t_{42} \ge 0.468) = 2 \times 0.321 = 0.64$

pt(0.468, 42, lower.tail = FALSE)   # one tail; multiply by 2

Conclusion: $P > 0.05$, do not reject $H_0$. No evidence is found that the slope deviates from 0.5 (keeping income constant).
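(Not from the slides: a minimal R sketch of this test computed directly from the coefficient table, avoiding any trick.)

fit <- lm(prestige ~ education + income, data = carData::Duncan)
ct <- coef(summary(fit))["education", ]                    # estimate and SE
t0 <- unname((ct["Estimate"] - 0.5) / ct["Std. Error"])    # t = (B1 - 0.5)/SE(B1)
p0 <- 2 * pt(abs(t0), df.residual(fit), lower.tail = FALSE)
c(t = t0, p = p0)                                          # ~ 0.468 and ~ 0.64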
Example: t-test for individual slope, one-sided 𝐻𝑎

• Suppose we would like to test whether the relationship is positive. In this case, it makes sense to test with a right-sided $H_a$. The steps are almost the same, with a small difference.

1. Define the hypothesis test: education is not related to prestige (keeping income constant) vs education is positively related to prestige (keeping income constant):

$H_0: \beta_1 = 0$
$H_1: \beta_1 > 0$

2. Test statistic: $t = \dfrac{B_1 - 0}{SE(B_1)}$ (the same as for the two-sided test).

3. If $H_0$ is true, then $t \sim t_{42}$.

4. If $H_a$ is true, $t$ tends to larger values than prescribed by the $t_{42}$ distribution.


Example: t-test for individual slope, one-sided 𝐻𝑎

5. Right-tailed p-value is needed: $P = P(t_{42} \ge t)$

6. The outcome of the test statistic: $t = \dfrac{0.546 - 0}{0.0983} = 5.555$.

7. p-value: $P = P(t_{42} \ge 5.555) = 8.65 \cdot 10^{-7}$.

Conclusion: $P < \alpha = 0.05$, therefore reject $H_0$. Thus, education is positively related to prestige (keeping income constant).

Note:

• Here we could take half of the P-value as reported by R (i.e., half of the two-sided p-value).

• Can the two-tailed P-value, as reported by R, always be halved for a one-sided $H_a$? (No; why?)
Example: t-test for individual slope, one-sided $H_a$ with $\beta_1^{(0)} \neq 0$

• Suppose we would like to test whether the slope exceeds 1.

1. Define the hypothesis test:

$H_0: \beta_1 = 1$
$H_1: \beta_1 > 1$

2. Test statistic: $t = \dfrac{B_1 - 1}{SE(B_1)}$.

3. If $H_0$ is true, then $t \sim t_{42}$.

4. If $H_a$ is true, $t$ tends to larger values than prescribed by the $t_{42}$ distribution.

5. Right-tailed p-value is needed: $P = P(t_{42} \ge t)$

6. The outcome of the test statistic: $t = \dfrac{0.546 - 1}{0.0983} = -4.618$.

7. p-value: $P = P(t_{42} \ge -4.618) \approx 0.9999$.

Conclusion: $P > \alpha = 0.05$, therefore we fail to reject $H_0$.

Confidence interval for slope

• We can also use the $100(1 - \alpha)\%$ confidence interval

$CI(\beta_j) = B_j \pm t_{\alpha/2;\ n-(k+1)} \, SE(B_j)$

• If the CI does not contain the value 0, we can reject the two-sided test against 0 at level $\alpha$.
Example: confidence interval for slope

• The $100(1 - \alpha)\%$ confidence interval for the slope is:

$CI(\beta_1) = B_1 \pm t_{0.025;\ 42} \, SE(B_1) = 0.546 \pm 2.018 \times 0.0983 = (0.348;\ 0.744)$
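(In R the same interval comes from confint(); a sketch.)

fit <- lm(prestige ~ education + income, data = carData::Duncan)
confint(fit, "education", level = 0.95)   # ~ (0.348, 0.744)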
Statistical inference for several coefficients: All-slopes

• Multiple regression model for response $Y_i$ and $k$ regressors $x_1, \ldots, x_k$:

$Y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_k x_{ik} + \epsilon_i$, for $i = 1, \ldots, n$

• Global or "omnibus" test that all regressors are unimportant:

$H_0: \beta_1 = \beta_2 = \cdots = \beta_k = 0$
$H_1$: at least one slope is not zero / at least one $x$ has predictive value

• In this case, the F-test statistic is used:

$F = \dfrac{RegMS}{RMS} = \dfrac{RegSS/k}{RSS/(n-(k+1))}$

• Recall, $RegSS = TSS - RSS$, i.e., the difference between the residual sum of squares of the null model (i.e., the intercept-only model) and the current model.
Statistical inference for several coefficients: All-slopes

• $F = \dfrac{RegMS}{RMS}$ is a ratio of two Mean Squares:

• Denominator: the Residual Mean Square $RMS$ is an estimator of the error variance $\sigma_\epsilon^2$.

• Numerator: the Regression Mean Square $RegMS$ is also an estimator of $\sigma_\epsilon^2$, but only if $H_0$ is true!

• Hence, under $H_0$, the ratio $RegMS/RMS$ is close to 1.

• Under $H_a$, $RegMS$ tends to be larger than $\sigma_\epsilon^2$, so the ratio tends to be larger than 1.

• If $H_0$ is true (i.e., under $H_0$), $F \sim F_{k;\ n-(k+1)}$.

• Reject $H_0$ for large values of $F$: right-sided P-value and rejection region (a sketch of the computation follows).
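(A sketch, not from the slides, computing the omnibus F by hand for the Duncan model; summary() reports the same value.)

fit <- lm(prestige ~ education + income, data = carData::Duncan)
null <- lm(prestige ~ 1, data = carData::Duncan)   # intercept-only model
RSS <- sum(resid(fit)^2); TSS <- sum(resid(null)^2)
RegSS <- TSS - RSS
k <- 2; n <- nrow(carData::Duncan)
F0 <- (RegSS / k) / (RSS / (n - (k + 1)))
pf(F0, k, n - (k + 1), lower.tail = FALSE)         # right-tailed p-value
summary(fit)$fstatistic                            # should agree with F0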


Chi-squared distribution $\chi_m^2$

• Suppose $Z_1, \ldots, Z_m$ are independent, standard normal random variables, i.e., $Z_i \sim N(0, 1)$.

• The sum of their squares follows a $\chi_m^2$ distribution, with $m$ degrees of freedom:

$X^2 = \sum_{i=1}^{m} Z_i^2 \sim \chi_m^2$

• The mean is $E(X^2) = m$ (i.e., the df).

F-distribution

• Suppose $X_1^2 \sim \chi_{df_1}^2$ and $X_2^2 \sim \chi_{df_2}^2$ are two independent chi-square distributed variables, with degrees of freedom $df_1$ and $df_2$, respectively.

• The F-distribution is obtained by taking the ratio

$F \equiv \dfrac{X_1^2 / df_1}{X_2^2 / df_2} \sim F_{df_1;\ df_2}$.

• The F-distribution has two degrees of freedom: the numerator $df_1$ and the denominator $df_2$.

[Portrait: Ronald Fisher]
F-distribution: Examples

[Figure: F-distribution densities. Blue line: $df_1 = 4$, $df_2 = 4$; orange line: $df_1 = 20$, $df_2 = 20$.]

• The mean is $E(F) = \dfrac{df_2}{df_2 - 2}$, for $df_2 > 2$.

• If $t \sim t_{df}$, then $t^2 \sim F_{1;\ df}$.

• For $q = 1$, the F-test is equivalent to the t-test.
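(A quick numerical check of the $t^2 \sim F_{1;\ df}$ relationship, not from the slides.)

t0 <- 2.1; df <- 42
2 * pt(abs(t0), df, lower.tail = FALSE)   # two-sided t-test p-value
pf(t0^2, 1, df, lower.tail = FALSE)       # identical F-test p-value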
Statistical inference for several coefficients: All-slopes

• The analysis of variance (ANOVA) table shows the construction of $F$ (a reminder).

• $k$ is the number of regressors in the model.

Example: Duncan data

What is your conclusion?


Example: Duncan data

R reports the sums of squares of education and income.
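(A sketch, not from the slides: these are the sequential sums of squares from anova(), and for this model their sum over the regressors equals RegSS.)

fit <- lm(prestige ~ education + income, data = carData::Duncan)
anova(fit)                                            # sequential SS table
sum(anova(fit)[c("education", "income"), "Sum Sq"])   # RegSS, ~36181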
Example: Duncan data

• As for the t-test, we have the following steps for the F-test.

1. Hypothesis test:

𝐻0 : 𝛽1 = 𝛽2 = 0
𝐻1 : at least one is not zero.

2. Test statistic:

$F = \dfrac{RegSS/k}{RSS/(n-(k+1))}$.

3. If $H_0$ is true, $F \sim F_{2;\ 42}$.

4. If $H_a$ is true, $F$ tends to larger values than prescribed by the $F_{2;\ 42}$ distribution.


Example: Duncan data

5. Right-tailed p-value: $P = P(F_{2;\ 42} \ge F)$

6. The outcome of the test statistic:

$F = \dfrac{RegSS/2}{RSS/42} = \dfrac{36181/2}{7507/42} = 101.2$.

7. P-value: $P = P(F_{2;\ 42} \ge 101.2) = 8.76 \times 10^{-16}$.

Conclusion: $P < 0.05$, so reject $H_0$. Therefore, education and/or income are related to prestige.

• To calculate the p-value use: pf(101.2, 2, 42, lower.tail = FALSE)


Statistical inference for several coefficients: Subset of Slopes

• Inference on groups of coefficients may be needed because

• least-squares estimators are often correlated (off-diagonal elements of $V(\mathbf{b})$ are non-zero);

• interest may lie in a related set of coefficients, as in ANOVA.

• Suppose we would like to test whether a subset of slopes is 0, instead of all slopes:

$H_0: \beta_1 = \beta_2 = \cdots = \beta_q = 0$
$H_1$: at least one is not zero

• For notational convenience, let's focus on the first $q$ regressors, but any subset of the $\beta_i$'s may be tested.
Hypothesis Test: Subset of Slopes

• The F-test is constructed by fitting two nested models:

Full (or initial) model FM:

$Y = \beta_0 + \beta_1 x_1 + \cdots + \beta_q x_q + \beta_{q+1} x_{q+1} + \cdots + \beta_k x_k + \epsilon$.

Reduced model RM:

$Y = \beta_0 + 0 \cdot x_1 + \cdots + 0 \cdot x_q + \beta_{q+1} x_{q+1} + \cdots + \beta_k x_k + \epsilon = \beta_0 + \beta_{q+1} x_{q+1} + \cdots + \beta_k x_k + \epsilon$

FM and RM give residual sums of squares $RSS$ and $RSS_0$, respectively.


Statistical inference for several coefficients: Subset of Slopes

• We have $RSS = \mathbf{e}'\mathbf{e}$, the residual sum of squares of FM, and $RSS_0 = \mathbf{e}_0'\mathbf{e}_0$, the residual sum of squares of RM.

• The F-ratio is defined as

$F_0 = \dfrac{(RSS_0 - RSS)/q}{RSS/(n-(k+1))}$

• Under $H_0$, $F_0 \sim F_{q;\ n-(k+1)}$.

• Is the F-ratio always positive? Why?

• The following holds: $RSS_0 - RSS = RegSS - RegSS_0$, i.e., "any increase in the residual sum of squares is a decrease in the regression sum of squares".

• Therefore, we can write $F_0 = \dfrac{(RegSS - RegSS_0)/q}{RSS/(n-(k+1))}$ (see the sketch below).
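(In R, this nested-model F-test is one anova() call on the two fits; a sketch for the Duncan model.)

RM <- lm(prestige ~ income, data = carData::Duncan)              # reduced model (beta_1 = 0)
FM <- lm(prestige ~ education + income, data = carData::Duncan)  # full model
anova(RM, FM)   # F = ((RSS0 - RSS)/q) / (RSS/(n-(k+1))), here q = 1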
Example: Duncan data

• Suppose we would like to test $H_0: \beta_1 = 0$ (i.e., education has no association with prestige).

• Remember that, if $t \sim t_{df}$, then $t^2 \sim F_{1;\ df}$.

• For $q = 1$, the F-test is equivalent to the t-test.
Example: Duncan data

• Another approach in R
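(The slide's output is not reproduced here; one such approach is drop1(), a hedged sketch.)

fit <- lm(prestige ~ education + income, data = carData::Duncan)
drop1(fit, test = "F")   # per-term F-tests; F for education equals t^2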
Statistical inference for several coefficients: Subset of Slopes


• Let $\mathbf{b}_1 = (B_1, \ldots, B_q)'$ be the LS coefficients of interest from $\mathbf{b}$, and let $\mathbf{V}_{11}$ be the corresponding submatrix of $(\mathbf{X}'\mathbf{X})^{-1}$.

• We can show that $RSS_0 - RSS = \mathbf{b}_1' \mathbf{V}_{11}^{-1} \mathbf{b}_1$, so

$F_0 = \dfrac{(RSS_0 - RSS)/q}{RSS/(n-(k+1))} = \dfrac{\mathbf{b}_1' \mathbf{V}_{11}^{-1} \mathbf{b}_1}{q S_E^2}$

• Test a general hypothesis $H_0: \boldsymbol{\beta}_1 = \boldsymbol{\beta}_1^{(0)}$, where $\boldsymbol{\beta}_1 = (\beta_1, \beta_2, \ldots, \beta_q)'$ and $\boldsymbol{\beta}_1^{(0)}$ is not necessarily $\mathbf{0}$:

$F_0 = \dfrac{(\mathbf{b}_1 - \boldsymbol{\beta}_1^{(0)})' \mathbf{V}_{11}^{-1} (\mathbf{b}_1 - \boldsymbol{\beta}_1^{(0)})}{q S_E^2} \sim F_{q;\ n-(k+1)}$
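(A sketch, not from the slides: since vcov(fit) equals $S_E^2 (\mathbf{X}'\mathbf{X})^{-1}$, the factor $q S_E^2$ is absorbed when the estimated covariance matrix is used directly.)

fit <- lm(prestige ~ education + income, data = carData::Duncan)
idx <- c("education", "income")   # the q = 2 slopes under test
b1 <- coef(fit)[idx]
V11 <- vcov(fit)[idx, idx]        # S_E^2 * submatrix of (X'X)^{-1}
q <- length(idx)
F0 <- drop(t(b1) %*% solve(V11) %*% b1) / q
pf(F0, q, df.residual(fit), lower.tail = FALSE)   # matches the omnibus test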
Statistical inference

• inference for individual coefficients: t-tests and confidence intervals
• inference for several coefficients: F-tests
• general linear hypotheses
General linear hypotheses

• Consider the following linear hypothesis: $H_0: \mathbf{L}_{q \times (k+1)} \boldsymbol{\beta}_{(k+1) \times 1} = \mathbf{c}_{q \times 1}$

• The hypothesis matrix $\mathbf{L}$ has full row rank, $q \le k + 1$.

• The F-statistic is defined as

$F_0 = \dfrac{(\mathbf{L}\mathbf{b} - \mathbf{c})' [\mathbf{L}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{L}']^{-1} (\mathbf{L}\mathbf{b} - \mathbf{c})}{q S_E^2} \sim F_{q;\ n-(k+1)}$ (under $H_0$), because

• $\mathbf{b} \sim N_{k+1}(\boldsymbol{\beta},\ \sigma_\epsilon^2 (\mathbf{X}'\mathbf{X})^{-1})$

• $\mathbf{L}\mathbf{b} \sim N_q(\mathbf{L}\boldsymbol{\beta},\ \sigma_\epsilon^2 \mathbf{L}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{L}')$

• $(\mathbf{L}\mathbf{b} - \mathbf{c})' [\mathbf{L}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{L}']^{-1} (\mathbf{L}\mathbf{b} - \mathbf{c}) / \sigma_\epsilon^2 \sim \chi_q^2$, under $H_0$


General linear hypotheses

• Example (practical exercise). Consider the hypothesis:

$H_0: \beta_1 = \beta_2 = 0$

• We take $\mathbf{L} = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}$ and $\mathbf{c} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}$, so that

$\mathbf{L}\boldsymbol{\beta} = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \end{pmatrix} = \begin{pmatrix} \beta_1 \\ \beta_2 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}$

• Now consider

$H_0: \beta_1 - \beta_2 = 0$

• Define $\mathbf{L} = ?$ and $\mathbf{c} = ?$
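(In R, car::linearHypothesis() carries out this test; a sketch for the first hypothesis. The second hypothesis is left as the exercise.)

library(car)   # assumes car is installed
fit <- lm(prestige ~ education + income, data = carData::Duncan)
linearHypothesis(fit, c("education = 0", "income = 0"))
## Equivalently, supply L and c explicitly:
L <- rbind(c(0, 1, 0), c(0, 0, 1))
linearHypothesis(fit, hypothesis.matrix = L, rhs = c(0, 0))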
Predicting new 𝑦-values

• Forecasting future response values:

• E.g., predicting prestige values based on education and income.

• Two possible interpretations of a prediction based on a given $x$:

• The estimate of the mean (average) prestige $\mu_y = E(y)$ at specific values of education and income:

$\hat{\mu}_y = B_0 + B_1 x^*_{n+1,1} + \cdots + B_k x^*_{n+1,k}$

• The estimated prestige of an individual observation at specific values of education and income:

$\hat{Y}_{n+1} = B_0 + B_1 x^*_{n+1,1} + \cdots + B_k x^*_{n+1,k}$
Example: Duncan data

• Suppose we want to predict the prestige value for a new profession with
• education = 92
• income = 68

• Extrapolation in regression:
• Be concerned not only about each individual predictor's range, but also about the joint set of values of several predictors.
Inference for predictions

• Confidence interval for $\mu_y$ at $x^* = (1, x_1^*, \ldots, x_k^*)'$:

$CI(\mu_y) = \hat{\mu}_y \pm t_{df_E;\ \alpha/2} \, se(\hat{\mu}_y)$

where $df_E$ is the df of the error term and

$se(\hat{\mu}_y) = S_E \sqrt{x^{*\prime} (\mathbf{X}'\mathbf{X})^{-1} x^*} = \sqrt{S_E^2 \, x^{*\prime} (\mathbf{X}'\mathbf{X})^{-1} x^*}$

• Prediction interval for an individual $Y$ (which one is larger and why?):

$PI(Y) = \hat{Y} \pm t_{df_E;\ \alpha/2} \, se(\hat{Y})$

where $df_E$ is the df of the error term and

$se(\hat{Y}) = S_E \sqrt{x^{*\prime} (\mathbf{X}'\mathbf{X})^{-1} x^* + 1} = \sqrt{S_E^2 \, x^{*\prime} (\mathbf{X}'\mathbf{X})^{-1} x^* + S_E^2} = \sqrt{se(\hat{\mu}_y)^2 + S_E^2}$
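(In R both intervals come from predict(); a sketch using the new profession from the earlier slide.)

fit <- lm(prestige ~ education + income, data = carData::Duncan)
new <- data.frame(education = 92, income = 68)
predict(fit, newdata = new, interval = "confidence", se.fit = TRUE)
predict(fit, newdata = new, interval = "prediction")   # wider than the CI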
Confidence vs prediction interval

• A CI gives a range for $E(y)$, whereas a PI gives a range for $y$ itself.

• A PI is wider than a CI because it also accounts for the variability of an individual observation around the mean (the extra $S_E^2$ term).

• A PI predicts an individual value, whereas a CI estimates the mean value.

• A PI concerns a future individual observation, whereas a CI concerns the (fixed) population mean.
Example: Duncan data

• Specify the ‘interval’ argument for a CI or a PI.

• The predict() function provides the standard error of the predicted means.
