CUHK STAT5102 Ch2
Properties of Fitted Regression Line
Some Concerns
Example 1
Parameter    Estimated Value    95% Confidence Interval
Intercept    7.43119            (-1.18518, 16.0476)
Slope        0.755048           (0.452886, 1.05721)
The student concluded from these results that there is a linear association between Y
and X. Is the conclusion warranted? What is the implied level of significance?
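Yes. The 95% confidence interval for the slope, (0.452886, 1.05721), does not contain zero, so H0: β1 = 0 is rejected in favor of H1: β1 ≠ 0. The implied level of significance is α = 0.05.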
Someone questioned the negative lower confidence limit for the intercept, pointing out
that dollar sales cannot be negative even if the population in a district is zero. Discuss.
Example 2
Obtain a 99 percent confidence interval for β1. Interpret your confidence interval.
Does it include zero? Why might the director of admissions be interested in whether
the confidence interval includes zero?
Parameter Estimates
Variable  Label  DF  Parameter Estimate  Standard Error  t Value  Pr > |t|  Standardized Estimate  95% Confidence Limits
From the SAS output, the 99% confidence interval for β1 is (0.00539, 0.07227).
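The same interval can be reproduced by hand as b1 ± t · s{b1}. The sketch below is only an illustration: it uses the slope estimate, standard error, and critical value quoted in this example (0.03883, 0.01277, and t(.005, 118) = 2.61814), the variable names are chosen here for readability, and small differences from the SAS limits come from rounding of the printed inputs.

```python
# Values as printed in this example (rounded on the slides).
b1 = 0.03883        # estimated slope
se_b1 = 0.01277     # estimated standard error of the slope, s{b1}
t_crit = 2.61814    # t(.005, 118): critical value for a two-sided 99% interval

# Confidence interval: estimate plus/minus critical value times standard error.
half_width = t_crit * se_b1
ci_lower = b1 - half_width
ci_upper = b1 + half_width

# The interval excludes zero exactly when the two-sided test at the same
# alpha rejects H0: beta1 = 0.
contains_zero = ci_lower <= 0.0 <= ci_upper

print(f"99% CI for beta1: ({ci_lower:.5f}, {ci_upper:.5f})")
print("Interval contains zero:", contains_zero)
```

Because the interval lies entirely above zero, it points to the same decision as a two-sided t test of β1 = 0 at α = 0.01.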
Hypotheses:
Null Hypothesis H0: β1 = 0
Alternative Hypothesis H1: β1 ≠ 0
Decision Rule:
Reject the null hypothesis if the test statistic satisfies
|T| > t(.005, 118) = 2.61814
Test Statistic:
|T| = (0.03883 – 0)/0.01277 = 3.04072 > 2.61814
Conclusion:
Reject the null hypothesis. At the 0.01 level of significance, we conclude
that there is a linear association between a student's ACT score and GPA
at the end of the freshman year.
Example 2
What is the P-value of your test in part (b)? How does it support the
conclusion reached in part (b)?
The P-value is less than 0.01, as expected, because the null hypothesis
was rejected at the 0.01 level of significance.
Moreover, the P-value is well below 0.01, which provides strong
evidence against the null hypothesis.
Example 3
Estimate the change in the mean service time when the number of copiers serviced
increases by one. Use a 90 percent confidence interval. Interpret your confidence
interval.
Parameter Estimates
Variable  Label  DF  Parameter Estimate  Standard Error  t Value  Pr > |t|  90% Confidence Limits
From the SAS output, the 90% confidence interval for β1 is (14.22314,
15.84735).
Alternative method: t(.05, 43) = 1.6811; therefore, the confidence interval for
β1 is (15.0352 – 1.6811(0.4831), 15.0352 + 1.6811(0.4831)) = (14.22314,
15.84735).
Conclusion: With 90 percent confidence, when the number of copiers serviced
increases by one, the mean service time increases by between 14.22314 and
15.84735 minutes.
Example 3
Conduct a t-test to determine whether or not there is a linear association between X
and Y here; control the α risk at .10. State the alternatives, decision rule, and
conclusion. What is the P-value of your test?
Hypotheses
Null Hypothesis H0: β1 = 0
Alternative Hypothesis H1: β1 ≠ 0
Decision Rule
Reject the null hypothesis if the test statistic satisfies |T| > t(.05, 43) = 1.6811
Test Statistic
|T| = (15.0352 – 0)/0.4831 = 31.122 > 1.6811
Conclusion
Reject the null hypothesis. At the 0.10 level of significance, we conclude that there is
a linear association between X and Y. P-value: P(|T| > 31.122) < 0.0001.
Yes. Since both the upper and lower limits of the 90 percent confidence
interval for β1 are positive (zero is not contained in the interval), the
null hypothesis is also rejected at the same α level.
Example 3
The manufacturer has suggested that the mean required time should not increase by more than 14
minutes for each additional copier that is serviced on a service call. Conduct a test to decide
whether this standard is being satisfied by Tri-City. Control the risk of a Type I error at .05. State
the alternatives, decision rule, and conclusion. What is the P-value of the test?
Hypotheses
Null Hypothesis H0: β1 ≤ 14
Alternative Hypothesis H1: β1 > 14
Decision Rule
Reject the null hypothesis if the test statistic satisfies T > t(.05, 43) = 1.6811
Test Statistic
T = (15.0352 – 14)/0.4831 = 2.1428 > 1.6811
Conclusion
Reject the null hypothesis. At the 0.05 level of significance, we conclude that the
standard is not being satisfied by Tri-City. P-value: P(T > 2.1428) = 0.0189.
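The one-sided test and its P-value can be checked numerically. The sketch below is an illustration under stated assumptions: it uses only the numbers quoted in this example, and the helper functions `t_pdf` and `t_upper_tail` are hypothetical names defined here, which approximate the upper tail of the t distribution by Simpson's rule instead of calling a statistics library.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const - (df + 1) / 2 * math.log1p(x * x / df))

def t_upper_tail(t, df, steps=2000):
    """Approximate P(T > t) for t >= 0 with Simpson's rule on [0, t].

    steps must be even; the area from 0 to t is subtracted from 0.5.
    """
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

# Values as printed in this example.
b1, se_b1 = 15.0352, 0.4831   # slope estimate and its standard error
beta1_null = 14.0             # boundary of H0: beta1 <= 14
df = 43                       # error degrees of freedom (n - 2)
t_crit = 1.6811               # t(.05, 43)

t_stat = (b1 - beta1_null) / se_b1
p_value = t_upper_tail(t_stat, df)
reject = t_stat > t_crit

print(f"T = {t_stat:.4f}, one-sided P-value = {p_value:.4f}, reject H0: {reject}")
```

The test statistic is measured from the boundary value 14 rather than from zero, and only the upper tail counts because the alternative is β1 > 14.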
Example 3
Does b0 give any relevant information here about the “start-up” time on calls –
i.e., about the time required before service work is begun on the copiers at a
customer location?
No. The estimate b0 is negative (about –0.58 minutes, as implied by
Ŷ6 = b0 + b1(6) = 89.6313 with b1 = 15.0352), which does not correspond
to any sensible measurement of time.
Example 4
(refer to Copier maintenance problem in Example 3)
Obtain a 90 percent confidence interval for the mean service time on calls in which
six copiers are serviced. Interpret your confidence interval.
We need to estimate the mean response E(Y6) by Ŷ6 where
Ŷ6 = b0 + b1(6) = 89.6313
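$$t(\alpha/2,\ n-2) = t(.05,\ 43) = 1.6811$$
$$s\{\hat{Y}_6\} = \sqrt{MSE\left[\frac{1}{n} + \frac{(X_h - \bar{X})^2}{\sum_{i=1}^{n}(X_i - \bar{X})^2}\right]} = \sqrt{79.45063\left[\frac{1}{45} + \frac{(6 - 5.1111)^2}{340.4412}\right]}$$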
Example 4
So, the two-sided 90% C.I. for E(Y6) is
$$\hat{Y}_6 \pm t(\alpha/2,\ n-2)\, s\{\hat{Y}_6\} = (87.2838,\ 91.9788)$$
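The interval for the mean response can be reproduced numerically. This is a sketch using only the quantities quoted in this example (MSE = 79.45063, n = 45, X̄ = 5.1111, Σ(Xi – X̄)² = 340.4412, Ŷ6 = 89.6313, t(.05, 43) = 1.6811); the variable names are illustrative.

```python
import math

# Values as printed in this example.
mse = 79.45063      # mean squared error
n = 45              # number of service calls
x_bar = 5.1111      # mean number of copiers serviced
sxx = 340.4412      # sum of squared deviations of X about its mean
x_h = 6             # number of copiers for which the mean is estimated
y_hat = 89.6313     # fitted value b0 + b1 * 6
t_crit = 1.6811     # t(.05, 43)

# Estimated standard deviation of the fitted mean response at x_h.
se_mean = math.sqrt(mse * (1 / n + (x_h - x_bar) ** 2 / sxx))

ci_lower = y_hat - t_crit * se_mean
ci_upper = y_hat + t_crit * se_mean

print(f"s{{Yhat_6}} = {se_mean:.4f}")
print(f"90% CI for E(Y6): ({ci_lower:.4f}, {ci_upper:.4f})")
```

The standard error shrinks as X_h approaches X̄, so the interval is narrowest for calls near the average number of copiers.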
Example 4
Obtain a 90 percent prediction interval for the service time on the next call in which
six copiers are serviced. Is your prediction interval wider than the corresponding
confidence interval in part (a)? Should it be?
Let the service time on the next call in which six copiers are serviced be
Y6(new) which is estimated by Ŷ6.
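The 90% prediction interval is
$$\hat{Y}_6 \pm t(\alpha/2,\ n-2)\, s\{pred\}$$
Now,
$$t(\alpha/2,\ n-2) = t(.05,\ 43) = 1.6811$$
$$s\{pred\} = \sqrt{MSE\left[1 + \frac{1}{n} + \frac{(X_h - \bar{X})^2}{\sum_{i=1}^{n}(X_i - \bar{X})^2}\right]} = \sqrt{79.45063\left[1 + \frac{1}{45} + \frac{(6 - 5.1111)^2}{340.4412}\right]} = 9.0222$$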
Example 4
So, a two-sided 90% prediction interval for Y6(new) is
$$\hat{Y}_6 \pm t(\alpha/2,\ n-2)\, s\{pred\} = (74.4641,\ 104.7985)$$
With 90 percent confidence, we predict that the service time on the next
call in which six copiers are serviced will be between 74.4641 and
104.7985 minutes.
The prediction interval is much wider than the confidence interval in part
(a). This is expected: the prediction variance includes the variability of an
individual call around its mean (σ², estimated by MSE) in addition to the
sampling variability of Ŷ6.
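The comparison between the two intervals can be made concrete with a short calculation. This is a sketch using only the quantities quoted in this example; the variable names are illustrative. The only difference between the two standard errors is the extra "1 +" inside the bracket for the prediction case.

```python
import math

# Values as printed in this example.
mse, n, x_bar, sxx = 79.45063, 45, 5.1111, 340.4412
x_h, y_hat, t_crit = 6, 89.6313, 1.6811

# Shared term: variance factor for estimating the mean response at x_h.
mean_factor = 1 / n + (x_h - x_bar) ** 2 / sxx

se_mean = math.sqrt(mse * mean_factor)        # s{Yhat_6}
se_pred = math.sqrt(mse * (1 + mean_factor))  # s{pred}: adds one MSE for the new call

pi_lower = y_hat - t_crit * se_pred
pi_upper = y_hat + t_crit * se_pred

ci_width = 2 * t_crit * se_mean
pi_width = 2 * t_crit * se_pred

print(f"s{{pred}} = {se_pred:.4f}")
print(f"90% PI for Y6(new): ({pi_lower:.4f}, {pi_upper:.4f})")
print(f"CI width = {ci_width:.4f}, PI width = {pi_width:.4f}")
```

Even with a very large sample, the prediction interval cannot become narrower than about ±t · √MSE, because the variability of a single new call does not shrink with n.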
Example 4
Management wishes to estimate the expected service time per copier on
calls in which six copiers are serviced. Obtain an appropriate 90
percent confidence interval by converting the interval obtained in part
(a). Interpret the converted confidence interval.
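No worked answer appears on this slide, so the following is only one natural reading of "converting" the interval: since the expected time per copier is E(Y6)/6 and 6 is a known constant, both limits of the part (a) interval are divided by 6. The variable names are illustrative.

```python
# 90% confidence interval for the mean service time on six-copier calls,
# as reported in part (a), in minutes.
ci_total = (87.2838, 91.9788)
copiers = 6

# Dividing by a positive constant rescales both limits and keeps the
# confidence level unchanged.
ci_per_copier = tuple(limit / copiers for limit in ci_total)

print(f"90% CI for mean time per copier: "
      f"({ci_per_copier[0]:.4f}, {ci_per_copier[1]:.4f}) minutes")
```

Under this reading, the converted interval says that, with 90 percent confidence, the expected service time per copier on six-copier calls lies between the two printed limits.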
Example 5
Set up the ANOVA table. Which elements are additive?
Analysis of Variance
Source             DF    Sum of Squares    Mean Square    F Value    Pr > F
Model               1    160.00000         160.00000      72.73      <.0001
Error               8     17.60000           2.20000
Corrected Total     9    177.60000
The ANOVA table is given above. The sums of squares and the degrees of
freedom are additive: 160.0 + 17.6 = 177.6 and 1 + 8 = 9. The mean squares
are not additive.
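The additivity claims and the F value can be checked directly from the table entries. This is a small sketch using only the numbers in the ANOVA table; the variable names are illustrative.

```python
# Entries from the ANOVA table.
ssr, sse, ssto = 160.0, 17.6, 177.6          # model, error, corrected total
df_model, df_error, df_total = 1, 8, 9

# Mean squares are sums of squares divided by their degrees of freedom.
msr = ssr / df_model
mse = sse / df_error
f_stat = msr / mse

# Additivity checks (a tolerance guards against floating-point rounding).
ss_additive = abs(ssr + sse - ssto) < 1e-9
df_additive = df_model + df_error == df_total
ms_additive = abs(msr + mse - ssto / df_total) < 1e-9

print(f"MSR = {msr:.2f}, MSE = {mse:.2f}, F = {f_stat:.2f}")
print("Sums of squares additive:", ss_additive)
print("Degrees of freedom additive:", df_additive)
print("Mean squares additive:", ms_additive)
```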
Example 5
Conduct an F test to decide whether or not there is a linear association between the
number of times a carton is transferred and the number of broken ampules; control
the α risk at .05.
Hypotheses
Null Hypothesis H0: β1 = 0
Alternative Hypothesis H1: β1 ≠ 0
Decision Rule
Reject the null hypothesis if the test statistic satisfies F > F(.05, 1, 8) = 5.32
Test Statistic
F = 72.73 (from the ANOVA table)
Conclusion
Reject the null hypothesis. At the 0.05 level of significance, we conclude that there
is a linear association between the number of times a carton is transferred and the
number of broken ampules.
Example 5
Obtain the T statistic for the test in part (b) and demonstrate numerically its
equivalence to the F statistic obtained in part (b).
Hypotheses
Null Hypothesis H0: β1 = 0
Alternative Hypothesis H1: β1 ≠ 0
Decision Rule
Reject the null hypothesis if the test statistic satisfies
|T| > t(.025, 8) = 2.3060 (Note: 2.3060² ≈ 5.32)
Parameter Estimates
Variable  Label  DF  Parameter Estimate  Standard Error  t Value  Pr > |t|  95% Confidence Limits
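The equivalence of the two tests can be illustrated from the ANOVA entries alone. This is a sketch under a stated assumption: the slope estimate and its standard error are not shown in this excerpt, so |T| is recovered here as the square root of F = MSR/MSE, which holds in simple linear regression; the sign of T (the sign of b1) cannot be recovered this way. The variable names are illustrative.

```python
import math

# Entries from the ANOVA table in part (a).
msr, mse = 160.0, 2.2
f_stat = msr / mse       # F statistic from part (b)
f_crit = 5.32            # F(.05, 1, 8) as quoted in part (b)

# In simple linear regression, T^2 = F, so |T| = sqrt(F).
t_abs = math.sqrt(f_stat)

# The two-sided t rule |T| > sqrt(F critical) and the F rule F > F critical
# are the same inequality, so the decisions must agree.
t_crit_implied = math.sqrt(f_crit)
same_decision = (t_abs > t_crit_implied) == (f_stat > f_crit)

print(f"F = {f_stat:.2f}, |T| = {t_abs:.3f}, |T|^2 = {t_abs ** 2:.2f}")
print("t test and F test agree:", same_decision)
```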