Stat 305 Final Practice - Solutions
1. Enterprise Industries produces Fresh, a brand of liquid laundry detergent. In order to more
effectively manage its inventory, the company would like to better predict demand for Fresh. To
develop a prediction model, the company has gathered data concerning demand for Fresh over
the last 30 sales periods (each sales period is defined to be a four-week period). For this data
set, let
x1 = the price (in dollars) of Fresh as offered by Enterprise Industries in the sales period minus
the average industry price (in dollars) of competitors' similar detergents in the sales period
x2 = the advertising expenditure for Fresh (in hundreds of thousands of dollars) in the sales period
y = the demand for Fresh (in hundreds of thousands of bottles) in the sales period
Refer to Output A for parts (a) and (b).
a) [4] Based on your interpretation of the scatterplots provided, state the model equations that
might adequately describe the relationship of i) y with x1 and ii) y with x2. Your answers
here should be similar in form to the following incorrect answer:
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2 + \varepsilon$.
i) $y = \beta_0 + \beta_1 x_1 + \varepsilon$
ii) $y = \beta_0 + \beta_1 x_2 + \beta_2 x_2^2 + \varepsilon$
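b) [2] If one were to fit the model $y = \beta_0 + \beta_1 x_1 + \varepsilon$ (which may or may not correctly reflect the
relationship between y and x1) to the data set, what would be the value of R², the coefficient of
determination?
For a simple linear regression, R² is the square of the sample correlation between y and x1:
$R^2 = 0.8897^2 = 0.7916$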
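The identity used in part (b), R² = r² for a one-predictor least-squares fit, can be checked numerically. This is a minimal sketch: the five (x, y) pairs are hypothetical and only serve to confirm the identity; the last line squares the value 0.8897 quoted in the solution, read here as the correlation between y and x1.

```python
import math

def r_squared_from_fit(x, y):
    """Fit y = b0 + b1*x by least squares and return R^2 = 1 - SSE/SST."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - ybar) ** 2 for yi in y)
    return 1 - sse / sst

def correlation(x, y):
    """Sample (Pearson) correlation between x and y."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    syy = sum((yi - ybar) ** 2 for yi in y)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data, for illustration only.
x_demo = [-0.2, -0.1, 0.0, 0.1, 0.3]
y_demo = [7.1, 7.6, 8.0, 8.3, 9.0]
assert math.isclose(r_squared_from_fit(x_demo, y_demo), correlation(x_demo, y_demo) ** 2)

# Squaring the correlation quoted in the solution:
print(round(0.8897 ** 2, 4))  # 0.7916
```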
From among several models for y as a function of x1 and x2, the following model (Model 1) was
selected: $y = \beta_0 + \beta_1 x_2 + \beta_2 x_2^2 + \beta_3 x_1 + \beta_4 x_1 x_2 + \varepsilon$.
c) [5] A normal quantile plot of residuals and a plot of the residuals versus the predicted values
are shown in Output B. Describe how you may use these plots to examine whether certain
model assumptions are appropriate here. State the assumptions under consideration and
identify clearly the plot you would use for assessing each assumption.
The assumptions under consideration concern the errors: $\varepsilon \overset{iid}{\sim} N(0, \sigma^2)$.
The constant variance assumption can be assessed with the plot of the residuals versus the
predicted values. If the residuals form a fan shape (their spread grows or shrinks as the
predicted values change), then the constant variance assumption may not be appropriate for the
data. If the model is appropriate, one hopes to see the residuals form a patternless cloud
(a horizontal band) around zero.
The normality assumption can be assessed with the normal quantile plot. If the points follow
a fairly linear pattern, especially in the middle of the plot, then the normality assumption is
reasonable for the data.
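To make the normal quantile plot concrete, here is a minimal sketch of how its points are formed: sort the residuals and pair each with a standard normal quantile. The plotting positions (i - 0.5)/n are one common convention and an assumption here (JMP may use a different one), and the residuals are hypothetical. The correlation of the paired values is a rough numeric summary of how straight the plot looks; it does not replace looking at the plot.

```python
from statistics import NormalDist

def normal_quantile_pairs(residuals):
    """Pair each ordered residual with a standard normal quantile.

    Uses plotting positions (i - 0.5) / n, one common convention.
    """
    n = len(residuals)
    std_normal = NormalDist()
    return [(std_normal.inv_cdf((i - 0.5) / n), r)
            for i, r in enumerate(sorted(residuals), start=1)]

def pearson_r(xs, ys):
    """Sample correlation; values near 1 mean the points lie close to a line."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

# Hypothetical residuals, for illustration only.
residuals = [0.12, -0.30, 0.05, 0.21, -0.08, -0.15, 0.33, -0.02, 0.09, -0.25]
pairs = normal_quantile_pairs(residuals)
theoretical, ordered = zip(*pairs)
print(round(pearson_r(theoretical, ordered), 3))
```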
Refer to Output C for parts (d) through (g).
d) [4] Predict the demand for the next sales period (in hundreds of thousands of bottles) if the
price difference will be -.20 (dollars) and the advertising expenditure for Fresh will be 5.0
(hundreds of thousands of dollars).
e) $0.6712 \pm 1.708(0.2027) = (0.32, 1.02)$
Since 0 is not in the interval, we can conclude that $\beta_2 \neq 0$. Hence the quadratic
term is needed in the model. Note that the value used for t is based on 25 df
(n - p = 30 - 5, since Model 1 has five parameters).
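The interval above is estimate ± t × standard error. This sketch redoes that arithmetic with the values quoted in the solution (estimate 0.6712, standard error 0.2027, t = 1.708). Since 1.708 is the 0.95 quantile of the t distribution with 25 df, the interval is a two-sided 90% interval.

```python
def t_interval(estimate, t_crit, std_err):
    """Return (lower, upper) for estimate +/- t_crit * std_err."""
    margin = t_crit * std_err
    return estimate - margin, estimate + margin

n, p = 30, 5           # 30 sales periods; Model 1 has beta_0 through beta_4
df_error = n - p       # error degrees of freedom for the t quantile
lo, hi = t_interval(0.6712, 1.708, 0.2027)
print(df_error, round(lo, 2), round(hi, 2))   # 25 0.32 1.02
print("0 inside interval:", lo <= 0 <= hi)    # 0 inside interval: False
```

Because 0 falls outside the interval, the quadratic term is retained, matching the conclusion above.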
f) [4] State the null and the alternative hypotheses concerning whether the interaction term is
needed in the model or not. Continue to follow the five-step format to perform a hypothesis test.
$H_0: \beta_4 = 0$
$H_a: \beta_4 \neq 0$
$t = \frac{1.4777 - 0}{0.6672} = 2.21$
p-value = 0.0361
Since the p-value is less than 0.05, we can reject the null hypothesis and conclude that
the interaction term is needed in the model.
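The p-value in part (f) comes from the t distribution with 25 error df. As a sketch without a statistics library, the two-sided p-value below is obtained by integrating the t density from 0 to |t| with Simpson's rule; this numerical approach is a swapped-in illustration, not how JMP computes it, but it should land close to the reported 0.0361.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_two_sided_p(t_stat, df, steps=2000):
    """P(|T| > |t_stat|) via Simpson's rule on [0, |t_stat|]; steps must be even."""
    a, b = 0.0, abs(t_stat)
    h = (b - a) / steps
    total = t_pdf(a, df) + t_pdf(b, df)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(a + k * h, df)
    area = total * h / 3              # P(0 < T < |t_stat|)
    return 2 * (0.5 - area)           # both tails

t_stat = (1.4777 - 0) / 0.6672        # (estimate - hypothesized value) / SE
p_value = t_two_sided_p(t_stat, df=25)
print(round(t_stat, 2), round(p_value, 4))
```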
g) [2] Give the estimate for σ².
$\hat{\sigma}^2 = \text{MSE} = 0.04258$
h) [6] Since Enterprise Industries has to pay someone to visit several stores and gather
information on the prices for similar detergents produced by competitors during every sales
period, Enterprise Industries is wondering if using only advertising expenditure to predict
demand is equivalent to using both advertising expenditure and price difference to predict
demand. Output D contains the output for a model that uses only advertising expenditure to
predict demand, i.e., $y = \beta_0 + \beta_1 x_2 + \beta_2 x_2^2 + \varepsilon$ (Model 2). Follow the five-step format to justify
using Model 1 or Model 2 to predict demand. Note that you will need to provide the value of a
test statistic that is distributed according to an F-distribution.
$H_0: \beta_3 = \beta_4 = 0$
$H_a: \beta_j \neq 0$ for at least one $j \in \{3, 4\}$
Using SSE = 2.18 on 27 df for Model 2 and SSE = 1.06 on 25 df for Model 1:
$f = \frac{(2.18 - 1.06)/(27 - 25)}{1.06/25} = 13.2 \sim F_{2,25}$ under $H_0$
$Q(.95) = 3.39$
p-value < .001
The p-value is less than 0.05, so we reject H0. Therefore, use the full model (Model 1) to predict
demand.
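The partial (extra-sum-of-squares) F test in part (h) can be reproduced from the two SSEs and their error df quoted above. When the numerator df is 2, the upper tail of the F distribution has the closed form P(F > f) = (1 + 2f/d)^(-d/2), where d is the denominator df, so no table is needed; this sketch uses that fact and checks it against the tabled 3.39.

```python
def partial_f(sse_reduced, df_reduced, sse_full, df_full):
    """Extra-sum-of-squares F statistic for nested models."""
    numerator = (sse_reduced - sse_full) / (df_reduced - df_full)
    denominator = sse_full / df_full        # MSE of the full model
    return numerator / denominator

def f_sf_num_df_2(f, df_den):
    """P(F > f) for F with (2, df_den) degrees of freedom (closed form)."""
    return (1 + 2 * f / df_den) ** (-df_den / 2)

f_stat = partial_f(2.18, 27, 1.06, 25)      # Model 2 (reduced) vs Model 1 (full)
p_value = f_sf_num_df_2(f_stat, 25)
print(round(f_stat, 1), p_value < 0.001)    # 13.2 True
```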
Output A, Output B, Output C, Output D: (output not reproduced)
Source      DF      Sum of Squares   Mean Square    F Ratio
Model               400              ____c____      ___e___
Error       __a__   ___b___          ____d____
C. Total    24      800
$-0.176 \pm 2.831(0.027) = (-0.252, -0.100)$
d) Give the estimate for σ.
$\hat{\sigma} = \sqrt{\text{MSE}} = \sqrt{0.547} = 0.7396$
f) Interpret the confidence interval calculated by JMP for the observation described by the last line
in the data table.
For a speed of 55 mph and 87 octane, we are 95% confident that the mean mileage is
between 28.97 and 30.19 mpg.
hypotheses, the formula for the test statistic, the formula with the appropriate values as provided
by the JMP output, the p-value and the conclusion.
$H_0: \beta_1 = \beta_2 = 0$
$H_a: \beta_1 \neq 0$ or $\beta_2 \neq 0$ (or both)
$f = \frac{(246.63 - 11.489)/(23 - 21)}{11.489/21} = 214.90$
p-value $= P[F_{2,21} > f] < .0001$
Since the p-value is less than 0.05, we reject H0 and conclude that at least one of the parameters is not
equal to zero. Therefore, the model that includes both speed and octane, along with the
appropriate parameter estimates, should be used to predict average mileage.
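The overall F test above and the estimate of σ in part (d) share the same MSE, 11.489/21. This sketch, assuming 246.63 is the total sum of squares (the SSE of the intercept-only model, on n - 1 = 23 df), recomputes MSE, σ̂, f, and the p-value; the tail probability again uses the closed form for an F distribution with 2 numerator df.

```python
import math

sse_reduced, df_reduced = 246.63, 23   # intercept-only model (total SS)
sse_full, df_full = 11.489, 21         # model with speed and octane

mse = sse_full / df_full               # estimate of sigma^2
sigma_hat = math.sqrt(mse)             # estimate of sigma
f_stat = ((sse_reduced - sse_full) / (df_reduced - df_full)) / mse
p_value = (1 + 2 * f_stat / df_full) ** (-df_full / 2)   # P(F_{2,21} > f)
print(round(mse, 3), round(sigma_hat, 4), round(f_stat, 2), p_value < 0.0001)
```

Carrying the unrounded MSE gives σ̂ ≈ 0.7397; the solution's 0.7396 comes from rounding MSE to 0.547 first.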
Output A: (data table not reproduced)
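$y = \beta_0 + \beta_1 x_{time} + \beta_2 x_{temp} + \beta_3 x_{time} x_{temp} + \varepsilon$ (refer to Output C). Which model (along with the
appropriate parameter estimates) should be used to predict strength, and why? Follow the five-step format
and provide a test statistic that is distributed according to the F-distribution.
$H_0: \beta_3 = 0$
$H_a: \beta_3 \neq 0$
$f = \frac{(364346.2 - 321450.5)/(33 - 32)}{321450.5/32} = 4.27 \sim F_{1,32}$
Using $F_{1,30}$: $Q(.95) = 4.17 < 4.27 < Q(.99) = 7.56$, so .01 < p-value < .05.
Since the p-value is less than 0.05, we reject H0: the interaction term is needed, so the model
with the interaction term should be used to predict strength.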
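The F statistic above can be recomputed from the two SSEs, and the table comparison written out. The quantiles 4.17 and 7.56 are the F(1, 30) values used in the solution; because F quantiles decrease as the denominator df grows, using 30 df in place of 32 is a conservative stand-in when 32 df is not tabled.

```python
sse_reduced, df_reduced = 364346.2, 33   # model without the time*temp interaction
sse_full, df_full = 321450.5, 32         # model with the interaction

f_stat = ((sse_reduced - sse_full) / (df_reduced - df_full)) / (sse_full / df_full)
q95, q99 = 4.17, 7.56                    # F(1, 30) quantiles quoted in the solution
print(round(f_stat, 2), q95 < f_stat < q99)   # 4.27 True
```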
When one says (0, 10) is a 95% CI for μ, one means that the probability
that μ lies within the interval is .95.
A 99% confidence interval is wider than a 95% confidence interval for a given
data set.
Answers: F, F, T.
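Why the second statement is false and the third true can be seen by simulation: μ is fixed, so a computed interval either contains it or not, and "95%" describes the long-run proportion of intervals that cover μ. This sketch assumes a hypothetical normal population with known σ (so a z interval applies); it counts how often 95% intervals cover μ and checks that, for the same data, the 99% interval is wider.

```python
import random
from statistics import NormalDist, mean

def z_interval(sample, sigma, level):
    """Known-sigma z confidence interval for the mean."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * sigma / len(sample) ** 0.5
    xbar = mean(sample)
    return xbar - half_width, xbar + half_width

random.seed(305)
mu, sigma, n, reps = 5.0, 2.0, 20, 2000   # hypothetical population and design
covered = 0
for _ in range(reps):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    lo, hi = z_interval(sample, sigma, 0.95)
    covered += lo <= mu <= hi
    lo99, hi99 = z_interval(sample, sigma, 0.99)
    assert hi99 - lo99 > hi - lo          # same data: the 99% interval is wider
# Close to 0.95: a property of the procedure, not of any single interval.
print(covered / reps)
```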
____________
____________ Theorem.
7. A new experimental drug to reduce cholesterol was developed. Five people were chosen to receive
the new drug. Each person had his/her cholesterol measured before taking the drug. Then each
person took the drug for a six-week period and had his/her cholesterol measured again.
a) Give and interpret a 95% confidence interval for the mean difference between the before and
after cholesterol measurements.
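Person   Before   After
1        200      180
2        220      190
3        180      165
4        195      175
5        240      180

Let $d = \text{after} - \text{before}$. Then $\bar{d} = -29$, $s_d^2 = 330$, n = 5, and df = n - 1 = 5 - 1 = 4, so t = 2.776:
$\left(-29 - 2.776\frac{\sqrt{330}}{\sqrt{5}},\; -29 + 2.776\frac{\sqrt{330}}{\sqrt{5}}\right) = (-51.55, -6.45)$
We are 95% confident that the mean decrease in cholesterol after taking the new drug for six weeks
is between 6.45 and 51.55.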
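The paired interval can be checked from the raw before/after values in the table. This sketch takes differences as after minus before and uses the t value 2.776 (4 df) quoted in the solution; it should reproduce the interval (-51.55, -6.45).

```python
import math
from statistics import mean, stdev

before = [200, 220, 180, 195, 240]
after = [180, 190, 165, 175, 180]
d = [a - b for a, b in zip(after, before)]   # paired differences, after - before

n = len(d)
d_bar, s_d = mean(d), stdev(d)               # sample mean and SD of the differences
t_crit = 2.776                               # t quantile with n - 1 = 4 df
half_width = t_crit * s_d / math.sqrt(n)
print(round(d_bar - half_width, 2), round(d_bar + half_width, 2))
```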
8. An engineer is concerned about spring lifetimes (10³ cycles) under two different levels of stress:
900 N/mm² and 950 N/mm². Below are the data.
950 N/mm²: 225, 171, 198, 189, 189, 135, 162, 135, 117, 162
900 N/mm²: 216, 162, 153, 216, 225, 216, 306, 225, 243, 189
Follow the five-step format to assess the strength of evidence that the difference in mean lifetimes
between the 900 N/mm² stress level and the 950 N/mm² stress level is not equal to zero.
$H_0: \mu_{900} - \mu_{950} = 0$
$H_a: \mu_{900} - \mu_{950} \neq 0$
$\bar{x}_{950} = 168.3$, $s^2_{950} = 1098.9$
$\bar{x}_{900} = 215.1$, $s^2_{900} = 1844.1$
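The sample means and variances above can be recomputed directly from the listed lifetimes with Python's `statistics` module (`variance` uses the n - 1 divisor).

```python
from statistics import mean, variance

stress_950 = [225, 171, 198, 189, 189, 135, 162, 135, 117, 162]
stress_900 = [216, 162, 153, 216, 225, 216, 306, 225, 243, 189]

for label, data in (("950", stress_950), ("900", stress_900)):
    # sample mean and sample variance (divisor n - 1)
    print(label, round(mean(data), 1), round(variance(data), 1))
# 950 168.3 1098.9
# 900 215.1 1844.1
```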
$\bar{x}_{950} = 154.1$, $s^2_{950} = 1315.2$, $n_{950} = 60$
$\bar{x}_{900} = 168.8$, $s^2_{900} = 1902.8$, $n_{900} = 40$
Follow the five-step format to assess the strength of evidence that the difference in mean lifetimes
between the 900 N/mm² stress level and the 950 N/mm² stress level is not equal to zero.
$H_0: \mu_{900} - \mu_{950} = 0$
$H_a: \mu_{900} - \mu_{950} \neq 0$
$z = \frac{168.8 - 154.1 - 0}{\sqrt{\frac{1902.8}{40} + \frac{1315.2}{60}}} = \frac{14.7}{\sqrt{69.49}} = 1.76$
p-value $= P[|Z| > 1.76] = 2P[Z < -1.76] = .078$
The p-value is greater than .05, so we do not reject the null hypothesis. Hence, there is not enough
evidence to conclude that there is a difference in mean lifetimes between the 900 N/mm² and
950 N/mm² stress levels.
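For the large-sample version, a two-sample z statistic and its two-sided p-value can be computed from the quoted summary statistics using `statistics.NormalDist`. Recomputing gives z ≈ 1.76 and p ≈ .078; either way the p-value exceeds .05, so the conclusion is not to reject H0.

```python
import math
from statistics import NormalDist

def two_sample_z(x1, s1_sq, n1, x2, s2_sq, n2, delta0=0.0):
    """Large-sample z statistic and two-sided p-value for H0: mu1 - mu2 = delta0."""
    se = math.sqrt(s1_sq / n1 + s2_sq / n2)   # standard error of x1 - x2
    z = (x1 - x2 - delta0) / se
    p = 2 * NormalDist().cdf(-abs(z))          # P(|Z| > |z|)
    return z, p

z, p = two_sample_z(168.8, 1902.8, 40, 154.1, 1315.2, 60)   # 900 vs 950 N/mm^2
print(round(z, 2), round(p, 3))  # 1.76 0.078
```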