Business Solutions
2.
Descriptive Statistics
Variable   N    Min    Q1      Q3
Orders     28   5.00   11.25   28.75

a.   X̄ = 21.32
b.   S = 13.37
c.   S² = 178.76
d.
If the policy is successful, smaller orders will be eliminated and the mean will
increase.
e.
If the change causes all customers to consolidate a number of small orders into
large orders, the standard deviation will probably decrease. Otherwise, it is very
difficult to tell how the standard deviation will be affected.
f.
Descriptive Statistics
Variable   N    Mean     Median   StDev   SE Mean   Min      Max      Q1       Q3
Prices     12   176654   180000   39440   11385     121450   253000   138325   205625

X̄ = 176,654 and S = 39,440

3.
a.   X̄ = 10.76,  S = 13.71

b.   X̄ ± 1.96 S/√n = 10.76 ± 1.96(13.71)/√30 = 10.76 ± 4.91
     (5.85%, 15.67%)

c.   X̄ ± 2.045 S/√n = 10.76 ± 2.045(13.71)/√30 = 10.76 ± 5.12
     (5.64%, 15.88%)

d.   We see that the 95% confidence intervals in b and c are not much different because
     the multipliers 1.96 and 2.045 are nearly the same magnitude.
     This explains why a sample of size n = 30 is often taken as the cutoff between
     large and small samples.

4.
a.   Point estimate: X̄ = (23.41 + 102.59)/2 = 63

b.   X̄ = 63,  S/√n = 20.2,  1 - α = .90,  Z = 1.645
     X̄ ± 1.645 S/√n = 63 ± 1.645(20.2) = 63 ± 33.23
     (29.77, 96.23)

5.
H0: μ = 12.1
H1: μ > 12.1
α = .05,  n = 100,  X̄ = 13.5,  S = 1.7

Z = (13.5 - 12.1)/(1.7/√100) = 8.235

Reject H0 since the computed Z (8.235) is greater than the critical Z (1.645). The mean has
increased.
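A minimal Python sketch of the z-test computation in Problem 5 (an illustration only,
not part of the original solution; scipy is assumed to be available):

from math import sqrt
from scipy import stats

xbar, mu0, s, n = 13.5, 12.1, 1.7, 100
z = (xbar - mu0) / (s / sqrt(n))        # test statistic, about 8.235
p_value = 1 - stats.norm.cdf(z)         # upper-tail p-value for H1: mu > 12.1
print(round(z, 3), p_value)             # reject H0 at the 5% level since z > 1.645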
6.
X̄ ± 2 S/√n = 8.1 ± 2(5.7)/√49 = 8.1 ± 1.6
Forecast 8.1 empty seats per flight; very likely the mean number of empty seats will lie
between 6.5 and 9.7.
7.
H0: μ = 5.9
H1: μ ≠ 5.9

Test statistic: Z = (X̄ - 5.9)/(S/√n) = (5.60 - 5.9)/(.87/√60) = -2.67

Since |-2.67| = 2.67 > 1.96, reject H0 at the 5% level. The mean satisfaction rating is
different from 5.9.

p-value: P(Z < -2.67 or Z > 2.67) = 2 P(Z > 2.67) = 2(.0038) = .0076, very strong
evidence against H0.
8.
H0: μ = 4
H1: μ > 4
df = n - 1 = 14 - 1 = 13,  X̄ = 4.31,  S = .52

Test statistic: t = (X̄ - 4)/(S/√n) = (4.31 - 4)/(.52/√14) = 2.23

Since 2.23 > 1.771, reject H0 at the 5% level. The medium-size serving contains an
average of more than 4 ounces of yogurt.

p-value: P(t > 2.23) = .022, strong evidence against H0.
9.
H0: μ = 700
H1: μ ≠ 700
n = 50,  X̄ = 715,  S = 50,  α = .05

Z = (715 - 700)/(50/√50) = 2.12

Since the calculated Z is greater than the critical Z (2.12 > 1.96), reject the null hypothesis.
The forecast does not appear to be reasonable.

p-value: P(Z < -2.12 or Z > 2.12) = 2 P(Z > 2.12) = 2(.017) = .034, strong evidence
against H0.
10.
This problem can be used to illustrate how a random sample is selected with Minitab. In
order to generate 30 random numbers from a population of 200 click the following menus:
Calc>Random Data>Integer
The Integer Distribution dialog box shown in the figure below appears. The number of
random digits desired, 30, is entered in the Number of rows of data to generate space. C1
is entered for Store in column(s) and 1 and 200 are entered as the Minimum and Maximum
values. OK is clicked and the 30 random numbers appear in Column 1 of the worksheet.
The null hypothesis that the mean is still 2.9 is true since the actual mean of the
population of data is 2.91 with a standard deviation of 1.608; however, a few students may
reject the null hypothesis, committing a Type I error.
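A rough Python equivalent of the Minitab steps described above (an assumption for
illustration; it is not part of the case answer) draws 30 random integers between 1 and 200:

import numpy as np

rng = np.random.default_rng()                         # fix a seed for reproducibility if desired
sample_ids = rng.integers(low=1, high=201, size=30)   # integers from 1 to 200 inclusive
print(np.sort(sample_ids))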
11.
a, b, c.
ΣX = 59,  ΣY = 6058,  ΣX² = 513,  ΣY² = 4,799,724,  ΣXY = 48,665
r = .938
12.
a, b, c.
ΣX = 53.7,  ΣY = 2312,  ΣX² = 282.55,  ΣY² = 515,878,  ΣXY = 12,029.3
r = .95
Ŷ = 32.5 + 36.4X
Ŷ = 32.5 + 36.4(5.2) = 222
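A sketch of how the least squares coefficients and r in Problem 12 can be computed
directly from the summary sums (n = 11 is inferred here because it reproduces the
reported slope and intercept; the snippet is illustrative, not part of the solution):

from math import sqrt

n, sx, sy, sxx, syy, sxy = 11, 53.7, 2312, 282.55, 515_878, 12_029.3
b1 = (n * sxy - sx * sy) / (n * sxx - sx ** 2)       # slope, about 36.4
b0 = (sy - b1 * sx) / n                              # intercept, about 32.5
r = (n * sxy - sx * sy) / sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))
print(round(b0, 1), round(b1, 1), round(r, 2))       # 32.5 36.4 0.95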
13.
This is a good population for showing how random samples are taken. If three-digit
random numbers are generated from Minitab as demonstrated in Problem 10, the selected
items for the sample can be easily found. In this population, ρ = 0.06, so most
students will get a sample correlation coefficient r close to 0. The least squares line will, in
most cases, have a slope coefficient close to 0, and students will not be able to reject the
null hypothesis H0: β1 = 0 (or, equivalently, ρ = 0) if they carry out the hypothesis test.
14.
a.
15.
b.
c.
d.
X̄ ± 2.33 S/√n = 45.2 ± 2.33(10.3)/√175
(43.4, 47.0)

Hypothesis test:
H0: μ = 44
H1: μ ≠ 44
Test statistic: Z = (X̄ - 44)/(S/√n) = (45.2 - 44)/(10.3/√175) = 1.54
b.
c.
17.
H0: μ = 4.3
H1: μ ≠ 4.3

H0: μ = 1300
H1: μ < 1300

X̄ ± 1.96 S/√n = -1.10 ± 1.96(5.99)/√39 = -1.10 ± 1.88
(-2.98, .78)

μ = .94 (%) is not a realistic value for the mean monthly return of the client's
account since it falls outside the 95% confidence interval. The client may have a
case.
18.
a.
b.
c.
This case can be used to generate a discussion on this point as David chooses α = .01 and ends up
"accepting" the null hypothesis that the mean lifetime is 5000 hours.
Alice's point is valid: the company may be put in a bad position if it insists on very dramatic
evidence before abandoning the notion that its components last 5000 hours. In fact, the indifference
point (p-value) is about .0375; at any significance level higher than this, the null hypothesis of 5000 hours is rejected.
CASE 2-2: MR. TUX
In this case, John Mosby tries some primitive ways of forecasting his monthly sales. The
things he tries make some sort of sense, at least for a first cut, given that he has had no formal
training in forecasting methods. Students should have no trouble finding flaws in his efforts, such
as:
1.
The mean value for each year, if projected into the future, is of little value since
month-to-month variability is missing.
2.
His free-hand method of fitting a regression line through his data can be improved
upon using the least squares method, a technique now found on inexpensive hand
calculators. The large standard deviation for his monthly data suggests considerable
month-to-month variability and, perhaps, a strong
seasonal effect, a factor not accounted for when the values for a year are averaged.
Both the hand-fit regression line and John's interest in dealing with the monthly seasonal
factor suggest techniques to be studied in later chapters. His efforts also point out the value of
learning about well-established formal forecasting methods rather than relying on intuition and very
simple methods in the absence of knowledge about forecasting. We hope students will begin to
appreciate the value of formal forecasting methods after learning about John's initial efforts.
CASE 2-3: ALOMEGA FOOD STORES
Julie's initial look at her data using regression analysis is a good start. She found that the
r-squared value of 36% is not very high. Using more predictor variables, along with examining
their significance in the equation, seems like a good next step. The case suggests that other
techniques may prove even more valuable, techniques to be discussed in the chapters that follow.
Examining the residuals of her equation might prove useful. About how large are these
errors? Are forecast errors in this range acceptable to her? Do the residuals seem to remain in
the same range over time, or do they increase over time? Is a string of negative residuals
followed by a string of positive residuals, or vice versa? These questions involve a deeper
understanding of forecasting using historical values and these matters will be discussed more fully
in later chapters.
CHAPTER 3
EXPLORING DATA PATTERNS AND
CHOOSING A FORECASTING TECHNIQUE
ANSWERS TO PROBLEMS AND CASES
1.
2.
A time series consists of data that are collected, recorded, or observed over successive
increments of time.
3.
The secular trend of a time series is the long-term component that represents the growth or
decline in the series over an extended period of time. The cyclical component is the wavelike fluctuation around the trend. The seasonal component is a pattern of change that
repeats itself year after year. The irregular component is that part of the time
series remaining after the other components have been removed.
4.
Autocorrelation is the correlation between a variable, lagged one or more period, and itself.
5.
The autocorrelation coefficient measures the correlation between a variable, lagged one or
more periods, and itself.
6.
The correlogram is a useful graphical tool for displaying the autocorrelations for various
lags of a time series. Typically, the time lags are shown on a horizontal scale and the
autocorrelation coefficients, the correlations between Yt and Yt-k, are displayed as vertical
bars at the appropriate time lags. The lengths and directions (from 0) of the bars indicate
the magnitude and sign of the autocorrelation coefficients. The lags at which
significant autocorrelations occur provide information about the nature of the time series.
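A minimal sketch (not part of the answer above) of computing the lag-k autocorrelation
coefficients that a correlogram displays; the series used is hypothetical:

import numpy as np

def autocorr(y, k):
    # lag-k autocorrelation: correlation between Y_t and Y_{t-k}
    y = np.asarray(y, dtype=float)
    ybar = y.mean()
    return np.sum((y[k:] - ybar) * (y[:-k] - ybar)) / np.sum((y - ybar) ** 2)

series = [23, 25, 28, 26, 30, 33, 31, 35, 38, 36]
print([round(autocorr(series, k), 3) for k in (1, 2, 3)])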
7.
a. nonstationary series
b. stationary series
c. nonstationary series
d. stationary series

8.
a. stationary series
b. random series
c. trending or nonstationary series
d. seasonal series
e. stationary series
f. trending or nonstationary series
9.
Naive methods, simple averaging methods, moving averages, and Box-Jenkins methods.
Examples are: the number of breakdowns per week on an assembly line having a uniform
production rate; the unit sales of a product or service in the maturation stage of its life
cycle; and the number of sales resulting from a constant level of effort.
10.
11.
Classical decomposition, census II, Winters exponential smoothing, time series multiple
regression, and Box-Jenkins methods. Examples are: electrical consumption,
summer/winter activities (sports like skiing), clothing, and agricultural growing seasons,
retail sales influenced by holidays, three-day weekends, and school calendars.
12.

13.
Year    Yt       Change (Yt - Yt-1)
1985    2,413       -
1986    2,407      -6
1987    2,403      -4
1988    2,396      -7
1989    2,403       7
1990    2,443      40
1991    2,371     -72
1992    2,362      -9
1993    2,334     -28
1994    2,362      28
1995    2,336     -26
1996    2,344       8
1997    2,384      40
1998    2,244    -140
1999    2,358     114
2000    2,329     -29
2001    2,345      16
2002    2,254     -91
2003    2,245      -9
2004    2,279      34
0 ± 1.96 (1/√n)
15.
a. MPE
b. MAPE
c. MSE or RMSE
16.

17.
a.
r1 = .895
H0: ρ1 = 0
H1: ρ1 ≠ 0
Reject if t < -2.069 or t > 2.069

SE(rk) = √[(1 + 2 Σ ri², i = 1, ..., k-1) / n]

SE(r1) = √(1/24) = .204

t = (r1 - 0)/SE(r1) = (.895 - 0)/.204 = 4.39

Since the computed t (4.39) is greater than the critical t (2.069), reject the null.

r2 = .788
H0: ρ2 = 0
H1: ρ2 ≠ 0
Reject if t < -2.069 or t > 2.069

SE(r2) = √[(1 + 2(.895)²)/24] = √(2.6/24) = .33

t = (r2 - 0)/SE(r2) = (.788 - 0)/.33 = 2.39

Since the computed t (2.39) is greater than the critical t (2.069), reject the null.
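A sketch of the SE(rk) and t computations in Problem 17, using the values above and the
text's formula SE(rk) = sqrt((1 + 2*sum of earlier ri^2)/n) (illustrative only):

from math import sqrt

n = 24
r1, r2 = 0.895, 0.788
se_r1 = sqrt(1 / n)                      # no earlier lags in the sum, about .204
t1 = r1 / se_r1                          # about 4.39
se_r2 = sqrt((1 + 2 * r1 ** 2) / n)      # about .33
t2 = r2 / se_r2                          # about 2.39
print(round(t1, 2), round(t2, 2))        # compare each with the critical t = 2.069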
b.
18.
a.
r1 = .376
b.
19.
20.
The data have a quarterly seasonal pattern as shown by the significant autocorrelation
at time lag 4. First quarter earnings tend to be high, third quarter earnings tend to be low.
a.
 t    Yt    Ŷt     et    |et|    et²     |et|/Yt    et/Yt
 1   .40     -      -      -       -         -         -
 2   .29   .40   -.11    .11   .0121     .3793    -.3793
 3   .24   .29   -.05    .05   .0025     .2083    -.2083
 4   .32   .24    .08    .08   .0064     .2500     .2500
 5   .47   .32    .15    .15   .0225     .3191     .3191
 6   .34   .47   -.13    .13   .0169     .3824    -.3824
 7   .30   .34   -.04    .04   .0016     .1333    -.1333
 8   .39   .30    .09    .09   .0081     .2308     .2308
 9   .63   .39    .24    .24   .0576     .3810     .3810
10   .43   .63   -.20    .20   .0400     .4651    -.4651
11   .38   .43   -.05    .05   .0025     .1316    -.1316
12   .49   .38    .11    .11   .0121     .2245     .2245
13   .76   .49    .27    .27   .0729     .3553     .3553
14   .51   .76   -.25    .25   .0625     .4902    -.4902
15   .42   .51   -.09    .09   .0081     .2143    -.2143
16   .61   .42    .19    .19   .0361     .3115     .3115
17   .86   .61    .25    .25   .0625     .2907     .2907
18   .51   .86   -.35    .35   .1225     .6863    -.6863
19   .47   .51   -.04    .04   .0016     .0851    -.0851
20   .63   .47    .16    .16   .0256     .2540     .2540
21   .94   .63    .31    .31   .0961     .3298     .3298
22   .56   .94   -.38    .38   .1444     .6786    -.6786
23   .50   .56   -.06    .06   .0036     .1200    -.1200
24   .65   .50    .15    .15   .0225     .2308     .2308
25   .95   .65    .30    .30   .0900     .3158     .3158
26   .42   .95   -.53    .53   .2809    1.2619   -1.2619
27   .57   .42    .15    .15   .0225     .2632     .2632
28   .60   .57    .03    .03   .0009     .0500     .0500
29   .93   .60    .33    .33   .1089     .3548     .3548
30   .38   .93   -.55    .55   .3025    1.4474   -1.4474
31   .37   .38   -.01    .01   .0001     .0270    -.0270
32   .57   .37    .20    .20   .0400     .3509     .3509
Sum                     5.85  1.6865   11.2227   -2.1988
b.   MAD = 5.85/31 = .189
c.   MSE = 1.6865/31 = .0544,  RMSE = √.0544 = .2332
d.   MAPE = 11.2227/31 = .3620 or 36.2%
e.   MPE = -2.1988/31 = -.0709

21.
a.
b.
The sales time series appears to vary about a fixed level so it is stationary.
c.
The sample autocorrelations die out rapidly. This behavior is consistent with a
stationary series. Note that the sales data are not random. Sales in adjacent
weeks tend to be positively correlated.
22.
a.
16
b.
Since, in this case, the residuals differ from the original observations by the
constant Y = 2460.05 , the residual autocorrelations will be the same as the
autocorrelations for the sales numbers. There is significant residual
autocorrelation at lag 1 and the autocorrelations die out in an exponential fashion.
The random model is not adequate for these data.
23.
17
The autocorrelations are consistent with choice in part b. The autocorrelations fail
to die out rapidly consistent with nonstationary behavior. In addition, there are
relatively large autocorrelations at lags 4 and 8, indicating a quarterly seasonal
pattern.
24.
a.
98/99Inc   98/99For   98/99Err   98/99AbsErr   98/99Err^2   98/99AbE/Inc
  70.01      50.87      19.14       19.14         366.34      0.273390
 133.39      93.83      39.56       39.56        1564.99      0.296574
 129.64      92.51      37.13       37.13        1378.64      0.286409
 100.38      80.55      19.83       19.83         393.23      0.197549
  95.85      70.01      25.84       25.84         667.71      0.269588
 157.76     133.39      24.37       24.37         593.90      0.154475
 126.98     129.64      -2.66        2.66           7.08      0.020948
  93.80     100.38      -6.58        6.58          43.30      0.070149
Sum                                 175.11        5015.17     1.5691
b.
c.
1.
The retail sales series has a trend and a monthly seasonal pattern.
2.
Yes! Julie has determined that her data have a trend and should be first differenced. She has
also found out that the first differenced data are seasonal.
3.
4.
She will know which technique works best by comparing error measurements such as MAD,
MSE or RMSE, MAPE, and MPE.
The retail sales series has a trend and a monthly seasonal pattern.
2.
The patterns appear to be somewhat similar. More actual data is needed in order to reach a
definitive conclusion.
3.
This question should create a lively discussion. There are good reasons to use either set of
data. The retail sales series should probably be used until more actual sales data is available.
This case affords students an opportunity to learn about the use of autocorrelation functions,
and to continue following John Mosby's quest to find a good forecasting method for his data.
With the use of Minitab, the concept of first differencing data is also illustrated. The
summary should conclude that the sales data have both a trend and a seasonal component.
2.
The trend is upward. Since there are significant autocorrelation coefficients at time lags 12
and 24, the data have a monthly seasonal pattern.
3.
There is a 49% random component. That is, about half the variability in John's monthly
sales is not accounted for by trend and seasonal factors. John, and the students analyzing
these results, should realize that finding an accurate method of forecasting these data could
be very difficult.
4.
Yes, the first differences have a seasonal component. Given the autocorrelations at lags 12
and 24, the monthly changes are related 12, 24, months apart. This information should be
used in developing a forecasting model for changes in monthly sales.
First, Dorothy used Minitab to compute the autocorrelation function for the number of new
20
12
Lag
Corr
LBQ
Lag
0.49
4.83
24.08
0.43
3.50
42.86
0.35
2.56
55.51
10
0.18
0.33
2.30
67.18
11
0.23
0.28
1.85
75.60
12
0.36
0.24
1.50
81.61
13
0.24
1.49
87.87
14
22
Corr
LBQ
Lag
Corr
8 0.23
1.40
93.71
15
9 0.17
1.01
96.90
1.09 100.72
LBQ
Lag
Corr
LBQ
0.12
0.64 136.27
22
0.09
0.46 153.39
16
0.14
0.75 138.70
23
0.16
0.83 156.84
17
0.22
1.14 144.37
24
0.25
1.26 165.14
1.35 106.87
18
0.06
0.33 144.86
2.05 121.68
19
0.11
0.58 146.40
0.23
1.25 127.70
20
0.13
0.69 148.66
0.24
1.30 134.55
21
0.17
0.87 152.33
Since the autocorrelations failed to die out rapidly, Dorothy concluded her series was
trending or nonstationary. She then decided to difference her time series.
21
Lag
Corr
12
LBQ
1 -0.42 -4.11
17.43
0.05
Lag
Corr
LBQ
8 0.03
0.26
9 -0.06 -0.52
Lag
22
Corr
LBQ
18.49
15 -0.12 -0.93
Lag
Corr
LBQ
29.32
22 -0.12 -0.92
41.93
0.41
17.66
18.91
16 -0.04 -0.32
29.52
23 -0.03 -0.26
42.09
3 -0.04 -0.33
17.82
10 0.00
0.02
18.91
17
1.67
34.85
24
47.00
0.01
0.10
17.83
11 -0.08 -0.69
19.67
18 -0.18 -1.41
38.93
0.02
0.17
17.87
12 0.20
1.65
24.07
19
0.09
38.95
6 -0.07 -0.57
18.34
13 -0.14 -1.11
26.20
20 -0.02 -0.11
38.98
18.39
14 0.11
27.72
21
40.02
0.02
0.17
0.92
0.21
0.01
0.09
0.69
0.19 1.44
2.
The differences appear to be stationary and are correlated in consecutive time periods. Given
the somewhat large autocorrelations at lags 12 and 24, a monthly seasonal pattern should be
considered.
3.
Dorothy would recommend that various seasonal techniques such as Winters method of
exponential smoothing (Chapter 4), classical decomposition (Chapter 5), time series
multiple regression (Chapter 8) and Box-Jenkins methods (ARIMA models in Chapter 9) be
considered.
22
Autocorrelations suggest an up and down pattern that is very regular. If one month is
relatively high, next month tends to be relatively low and so forth. Very regular
pattern is suggested by persistence of autocorrelations at relatively large lags.
The changing of the sign of the autocorrelations from one lag to the next is consistent with
an up and down pattern in the time series. If high sales tend to be followed by low sales or
low sales by high sales, autocorrelations at odd lags will be negative and autocorrelations at
even lags positive.
The relatively large autocorrelation at lag 12, 0.53, suggests there may also be a seasonal
pattern. This issue is explored in Case 5-6.
CASE 3-5: SURTIDO COOKIES
1.
A time series plot and the autocorrelation function for Surtido Cookies sales follow.
23
The graphical evidence above suggests Surtido Cookies sales vary about a fixed level with
a strong monthly seasonal component. Sales are typically high near the end of the year and
low during the beginning of the year.
2.
03Sales
1072617
NaiveFor
681117
510005
579541
771350
590556
549689
497059
652449
636358
Err
391500
AbsErr
391500
AbsE/03Sales
0.364995
16.3%
-39684
82482
118901
-45802
Sum
39684
82482
118901
45802
678369
0.077811
0.142323
0.154147
0.077557
0.816833
MAD appears large because of the big numbers for sales. MAPE is fairly large but
perhaps tolerable. In any event, Jame is convinced he can do better.
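The error measures used throughout these solutions (MAD, MSE, RMSE, MAPE, MPE) can be
reproduced with a short helper like the sketch below (an assumed utility, not part of the
case; the example numbers are hypothetical):

import numpy as np

def error_measures(actual, forecast):
    a, f = np.asarray(actual, float), np.asarray(forecast, float)
    e = a - f
    return {"MAD": np.mean(np.abs(e)),
            "MSE": np.mean(e ** 2),
            "RMSE": np.sqrt(np.mean(e ** 2)),
            "MAPE": np.mean(np.abs(e) / a) * 100,
            "MPE": np.mean(e / a) * 100}

print(error_measures([100, 110, 120], [95, 112, 118]))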
CHAPTER 4
MOVING AVERAGES AND SMOOTHING METHODS
ANSWERS TO PROBLEMS AND CASES
1.
Exponential smoothing
2.
Naive
3.
Moving average
4.
5.
6.
a.
 t   Yt      Ŷt      et     |et|    et²      |et|/Yt   et/Yt
 1   19.39   19.00    .39    .39    .1521     .020      .020
 2   18.96   19.39   -.43    .43    .1849     .023     -.023
 3   18.20   18.96   -.76    .76    .5776     .042     -.042
 4   17.89   18.20   -.31    .31    .0961     .017     -.017
 5   18.43   17.89    .54    .54    .2916     .029      .029
 6   19.98   18.43   1.55   1.55   2.4025     .078      .078
 7   19.51   19.98   -.47    .47    .2209     .024     -.024
 8   20.63   19.51   1.12   1.12   1.2544     .054      .054
 9   19.78   20.63   -.85    .85    .7225     .043     -.043
10   21.25   19.78   1.47   1.47   2.1609     .069      .069
11   21.18   21.25   -.07    .07    .0049     .003     -.003
12   22.14   21.18    .96    .96    .9216     .043      .043
Sum                         8.92   8.990      .445      .141

b. MAD = 8.92/12 = .74
c. MSE = 8.99/12 = .75
d. MAPE = .445/12 = .0371
e. MPE = .141/12 = .0118
f. 22.14
7.
Price    AVER1     FITS1     RESI1
19.39       *         *          *
18.96       *         *          *
18.20    18.8500      *          *
17.89    18.3500   18.8500   -0.96000
18.43    18.1733   18.3500    0.08000
19.98    18.7667   18.1733    1.80667
19.51    19.3067   18.7667    0.74333
20.63    20.0400   19.3067    1.32333
19.78    19.9733   20.0400   -0.26000
21.25    20.5533   19.9733    1.27667
21.18    20.7367   20.5533    0.62667
22.14    21.5233   20.7367    1.40333

Accuracy Measures
MAPE: 4.6319   MAD: 0.9422   MSE: 1.1728
8.
a.
Avg:   *  *  *  *  212  216  219  221.2
Fits:  *  *  *  *  *    212  216  219  221.2
Res:   *  *  *  *  *    8    9    7

Accuracy Measures
MAPE: 3.5779   MAD: 8.0000   MSE: 64.6667
b. & c.
Smoothed
200.000
204.000
208.400
211.440
214.464
216.678
220.007
222.404
Accuracy Measures
MAPE: 3.2144
Forecast
200.000
200.000
204.000
208.400
211.440
214.646
216.678
220.007
222.404
MAD: 7.0013
MSE: 58.9657
27
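A sketch of the simple exponential smoothing computation in Problem 8 b and c. The data
series and alpha = .4 are inferred from the fitted values shown above (the initial smoothed
value is set to 200), so treat this as an illustration rather than the book's own worksheet:

def ses(y, alpha, s0):
    smoothed, forecasts, s = [], [s0], s0      # forecast for period t is S(t-1)
    for obs in y:
        s = alpha * obs + (1 - alpha) * s
        smoothed.append(s)
        forecasts.append(s)
    return smoothed, forecasts

data = [200, 210, 215, 216, 219, 220, 225, 226]
smoothed, forecasts = ses(data, alpha=0.4, s0=200.0)
print([round(v, 3) for v in smoothed])         # 200.0, 204.0, 208.4, ..., 222.404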
9.
a. & c, d, e, f
Month   Yield    MA       Forecast   Error
  1      9.29     *          *          *
  2      9.99     *          *          *
  3     10.16    9.813       *          *
  4     10.25   10.133     9.813      0.437
  5     10.61   10.340    10.133      0.477
  6     11.07   10.643    10.340      0.730
  7     11.52   11.067    10.643      0.877
  8     11.09   11.227    11.067      0.023
  9     10.80   11.137    11.227     -0.427
 10     10.50   10.797    11.137     -0.637
 11     10.86   10.720    10.797      0.063
 12      9.97   10.443    10.720     -0.750

Accuracy Measures
MAPE: 4.5875   MAD: 0.4911   MSE: 0.3193   MPE: .6904
b. & c, d, e, f
Month Yield
1
9.29
2
9.99
3 10.16
4 10.25
5 10.61
6 11.07
7 11.52
8 11.09
9 10.80
10 10.50
11 10.86
12
9.97
MA
*
*
*
*
10.060
10.416
10.722
10.908
11.018
10.996
10.954
10.644
Forecast
*
*
*
*
*
10.060
10.416
10.722
10.908
11.018
10.996
10.954
Error
*
*
*
*
*
1.010
1.104
0.368
-0.108
-0.518
-0.136
-0.984
Accuracy Measures
MAPE: 5.5830
MAD: 0.6040
Forecast for month 13 (Jan.) is 10.644
29
MSE: 0.5202
MPE: .7100
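A sketch (not from the solution) of the moving-average forecasts and MAPE used in
Problem 9; the yields are copied from the tables above:

import numpy as np

yields = [9.29, 9.99, 10.16, 10.25, 10.61, 11.07, 11.52, 11.09, 10.80, 10.50, 10.86, 9.97]

def ma_forecasts(y, k):
    # forecast for period t is the average of the k most recent observations
    return {t: sum(y[t - k:t]) / k for t in range(k, len(y))}

fc3 = ma_forecasts(yields, 3)
mape = np.mean([abs(yields[t] - f) / yields[t] for t, f in fc3.items()]) * 100
print(round(fc3[3], 3), round(mape, 4))        # first forecast 9.813, MAPE about 4.59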
g.
10.
MAD: 0.6300
MSE: 0.5568
MPE: 5.0588
11.
No! The accuracy measures favor the three-month moving average procedure, but the
values of the forecasts are not much different.
See plot below.
Yt     Smoothed   Forecast    Error
205    205.000    205.000      0.0000
251    228.000    205.000     46.0000
304    266.000    228.000     76.0000
284    275.000    266.000     18.0000
352    313.500    275.000     77.0000
300    306.750    313.500    -13.5000
241    273.875    306.750    -65.7500
284    278.938    273.875     10.1250
312    295.469    278.938     33.0625
289    292.234    295.469     -6.4688
385    338.617    292.234     92.7656
256    297.309    338.617    -82.6172

Accuracy Measures
MAPE: 14.67   MAD: 43.44   MSE: 2943.24
12.
MAPE: 8.425
MAD: 1.894
MSE: 5.462
(Actual: 26.47)
Based on the error measures and the forecast for Q2 of 1996, the naive method
and simple exponential smoothing are comparable. Either method could be used.
13.
a.
α = .4
Accuracy Measures
MAPE: 14.05
MAD: 24.02
MSE: 1174.50

α = .6
Accuracy Measures
MAPE: 14.68
MAD: 24.56
MSE: 1080.21

d.
Looking at the error measures, there is not much difference between the two
choices of smoothing constant. The error measures for α = .4 are slightly better.
The forecasts for the two choices of smoothing constant are also not much
different.
The residual autocorrelations for α = .4 are shown below. The residual
autocorrelations for α = .6 are similar. There are significant residual
autocorrelations.
32
14.
None of the techniques do much better than the naive method. Simple exponential
smoothing with α close to 1, say α = .95, is essentially the naive method.

Accuracy Measures for Naive Method
MAPE: 42.57
MAD: 1.685
MSD: 4.935

Using the naive method, the forecast for 2000 would be 6.85.
15.
A time series plot of quarterly Revenues and the autocorrelation function show
that the data are seasonal with a trend. After some experimentation, Winters
multiplicative smoothing with smoothing constants α (level) = 0.8, β (trend) = 0.1
and γ (seasonal) = 0.1 is used to forecast future Revenues. See plot below.

Accuracy Measures
MAPE      3.8
MAD      69.1
MSE   11146.4

Forecasts
Quarter   Forecast   Lower     Upper
  71      2444.63    2275.34   2613.92
  72      1987.98    1773.84   2202.12
  73      2237.98    1969.23   2506.72
  74      1887.74    1559.46   2216.01
  75      2456.18    2065.70   2846.65
  76      1997.36    1543.10   2451.62
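A hedged sketch of Winters multiplicative smoothing with statsmodels (an assumed library
choice; the revenue series is not reproduced in this manual, so the quarterly values below
are placeholders):

import pandas as pd
from statsmodels.tsa.holtwinters import ExponentialSmoothing

revenues = pd.Series([1200, 980, 1100, 950] * 5,
                     index=pd.period_range("1990Q1", periods=20, freq="Q"))
model = ExponentialSmoothing(revenues, trend="add", seasonal="mul",
                             seasonal_periods=4).fit(smoothing_level=0.8,
                                                     smoothing_trend=0.1,
                                                     smoothing_seasonal=0.1)
print(model.forecast(6))                       # forecasts for the next six quarters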
16.
a.
34
The data appear to be seasonal with relatively large sales in August, September,
October and November, and relatively small sales in July and December.
b. & c. The Excel spreadsheet for calculating MAPE for the nave forecasts and
the simple exponential smoothing forecasts is shown below.
is calculated with a divisor of 23 (since the first smoothed value is set equal
to the first observation). Using a divisor of 24 gives MAPE2 = 7.69%, the
value reported by Minitab.
d.
e.
g.
36
17.
a.
The four-week moving average seems to represent the data a little better.
Compare the error measures for the four-week moving average in the figure below
with the five-week moving average results in Figure 4-4.
b.
37
18.
a.
As the order of the moving average increases, the smoothed data become more
wavelike. Looking at the results for orders k =10 and k = 15, and counting the
number of years from one peak to the next, it appears as if the number of severe
earthquakes is on about a 30 year cycle.
b.
38
19.
c.
a.
39
b.
40
The forecasts seem reasonable but the residual autocorrelation function below has
a significant spike at lag 1. So although Winters procedure captures the trend and
seasonality, there is still some association in consecutive observations not
accounted for by Winters method.
20.
This time series is trending upward and has a seasonal pattern with third and fourth
quarter Gap sales relatively large. Moreover the variability in this series is increasing
with the level suggesting a multiplicative Winters smoothing procedure or a
transformation of the data (say logarithms of sales) to stabilize the variability.
The results of Winters multiplicative smoothing with smoothing constants
α = β = γ = .2 are shown in the plot below.
42
Quarter   Forecast   Lower     Upper
 101      3644.18    3423.79   3864.57
 102      3775.78    3551.94   3999.62
 103      4269.27    4041.58   4496.96
 104      5267.82    5035.90   5499.74
148
= 17 + 12 = 29
12
Jan
Feb
Mar
Apr
May
Jun
Jul
Aug
Sep
Oct
Nov
Dec
29
26
32
35
42
50
56
53
45
35
38
29
44
The naive forecasts are not unreasonable but the Winters forecasts seem to have captured
the seasonal pattern a little better, particularly for the first 3 months of the year. Notice that if the
trend and seasonal pattern are strong, Winters smoothing procedure can work well even with only
two years of monthly data.
CASE 4-2: MR TUX
This case shows how several exponential smoothing methods can be applied to the Mr. Tux
data. John Mosby tries simple exponential smoothing and exponential smoothing with adjustments
for trend and seasonal factors, along with a three-month moving average.
Students can begin to see that several forecasting methods are typically tried when an
important variable must be forecast. Some method of comparing them must be used, such as the
three accuracy methods discussed in this case. Students should be asked their opinions of John's
progress in his forecasting efforts given these accuracy values. It should be apparent to most that
the degree of accuracy achieved is not sufficient and that further study is needed. Students should
be reminded that they are looking at actual data, and that the problems faced by John Mosby really
occurred.
1.
Of the methods attempted, Winters multiplicative smoothing was the best method John
found. Each forecast was typically off by about 25,825. The error in each forecast was
about 22% of the value of the variable being forecast.
2.
There are other choices for the smoothing constants that lead to smaller error measures.
For example, with α = β = γ = .1, MAD = 22,634 and MAPE = 20.
3.
John should examine plots of the residuals and the residual autocorrelations. If Winters
procedure is adequate, the residuals should appear to be random. In addition, John can
examine the forecasts for the next 12 months to see if they appear to be reasonable.
45
4.
The ideal value for MPE is 0. If MPE is negative, then, on average, the predicted values
are too high (larger than the actual values).
Students should realize immediately that simply using the basic naive approach of
using last period to predict this period will not allow for forecasts for the rest of
1993. Since the autocorrelation coefficients presented in Case 3-3 indicate
some seasonality, a naive model using April 1992 to predict April 1993, May 1992 to
predict May 1993 and so forth might be tried. This approach produces the error
measures
MAD = 23.39
MSE = 861.34
MAPE = 18.95
over the data region, and are not particularly attractive given the magnitudes of the new
client numbers.
2.
A moving average model of any order cannot be defended since any moving average
will produce flat line forecasts for the rest of 1993. That is, the forecasts will lie along a
horizontal line whose level is the last value for the moving average. The seasonal pattern
will be ignored.
3.
4.
5.
Using Winters procedure in 4, the forecasts for the remainder of 1993 are:
Month      Forecast
Apr/1993   148
May/1993   141
Jun/1993   148
Jul/1993   141
Aug/1993   143
Sep/1993   136
Oct/1993   159
Nov/1993   146
Dec/1993   126
6.
2.
adequate. Also, a naive model that combined seasonal and trend estimates (similar to
Equation 4.5) was found to be adequate. The trend and seasonal pattern in actual
Murphy Brothers sales are consistent and pronounced so a naive model is likely to
work well.
3.
Based on the forecasting methods tested, actual Murphy Brothers sales data should be
used. A plot of the results for the best Winters procedure follows.
An examination of the autocorrelation coefficients for the residuals from this Winters
model shown below indicates that none of them are significantly different from zero.
However, Julie decided to use the naive model because it was very simple and she could
explain it to her father.
48
The time series plot for Orders shows a slight upward trend and a seasonal pattern
with peaks in December. Because of the relatively small data set, the autocorrelations
are only computed for a limited number of lags, 6 in this case. Consequently with
monthly data, the seasonality does not show up in the autocorrelation function. There
is significant positive autocorrelation at lag 1, so Orders in consecutive months are
correlated.
The time series plot for CPO shows a downward trend but a seasonal component is
not readily apparent. There is significant positive autocorrelation at lag 1 and the
autocorrelations die out relatively slowly. The CPO series is nonstationary and
observations in consecutive time periods are correlated.
2.
Forecast   Lower     Upper
3524720    3072265   3977174
3885780    3431589   4339972
3656581    3200544   4112618
4141277    3683287   4599266
3.
Simple exponential smoothing with α = .77 (the optimal α in Minitab) represents
the CPO data well but, like any averaging procedure, produces flat-line forecasts.
Forecasts of CPO for the next 4 months are:
Month Forecast Lower Upper
Jul/2003 0.1045 0.0787 0.1303
Aug/2003 0.1045 0.0787 0.1303
Sep/2003 0.1045 0.0787 0.1303
Oct/2003 0.1045 0.0787 0.1303
The results for simple exponential smoothing are pictured below. There are no
significant residual autocorrelations (see plot below).
50
4.
Month      Forecast
Jul/2003   368333
Aug/2003   406064
Sep/2003   382113
Oct/2003   432763
5.
6.
It may or may not be better to focus on the number of units and contacts per unit
to get a forecast of contacts. It depends on the nature of the data (ease of modeling)
and the amount of relevant data available.
2.
Lower     Upper
1249.9    1681.7
1252.6    1728.4
1189.3    1718.1
1171.2    1759.6
1242.3    1895.1
1192.4    1913.0
3.
4.
The forecasts from Winters smoothing show an upward trend. If they are
to be believed, perhaps additional medical staff are required to handle the
expected increased demand. At this point however, further study is required.
1.
Jame learned that Surtido Cookie sales have a strong seasonal pattern
(sales are relatively high during the last two months of the year, low during
the spring) with very little, if any, trend (see Case 3-5).
2.
The autocorrelation function for sales (see Case 3-5) is consistent with
the time series plot. The autocorrelations die out (consistent with no
trend) and have a spike at the seasonal lag 12 (consistent with a seasonal
component).
3.
Forecast   Lower     Upper
 653254      91351   1215157
 712159     141453   1282865
 655889      75368   1236411
1532946     941647   2124245
1710520    1107533   2313507
2133888    1518354   2749421
1903589    1274702   2532476
4.
Forecast
618914
685615
622795
1447864
1630271
2038257
1817989
These forecasts have the same pattern as the forecasts generated by Winters
method but are uniformly lower. Winters forecasts seem more consistent
with recent history.
CHAPTER 5
55
The purpose of decomposing a time series variable is to observe its various elements
in isolation. By doing so, insights into the causes of the variability of the series are
frequently gained. A second important reason for isolating time series components
is to facilitate the forecasting process.
2.
The multiplicative components model works best when the variability of the time
series increases with the level. That is, the values of the series spread out as the
trend increases, and the set of observations have the appearance of a megaphone
or funnel.
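A minimal sketch of a multiplicative decomposition in Python (an assumed alternative to
the Minitab decomposition used in these answers; the monthly series is hypothetical):

import pandas as pd
from statsmodels.tsa.seasonal import seasonal_decompose

sales = pd.Series([110, 105, 130, 140, 150, 165, 160, 155, 150, 145, 170, 210] * 4,
                  index=pd.date_range("2000-01-31", periods=48, freq="M"))
result = seasonal_decompose(sales, model="multiplicative", period=12)
print(result.seasonal[:12])          # seasonal indices; trend and resid are also available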
3.
The basic forces that affect and help explain the trend-cycle of a series are
population growth, price inflation, technological change, and productivity increases.
4.
a.
Exponential
b.
c.
Linear
5.
Weather and the calendar year such as holidays affect the seasonal component.
6.
a. & b.
c.
23.89 billion
d.
648.5 billion
56
7.
e.
f.
Inflation, population growth, and new technology affect the trend of capital
spending.
a. & b.
c.
d.
8.
9.
10.
11.
Ŷ = T × S = 850(1.12) = $952
Ŷ = T × S = 900(.827) = $744.3
12.
Month   Sales ($ Thousands)   Seasonal Index (%)   Deseasonalized Data
Jan     125                    51                  245
Feb     113                    50                  226
Mar     189                    87                  217
Apr     201                    93                  216
May     206                    95                  217
Jun     241                    99                  243
Jul     230                    96                  240
Aug     245                    89                  275
Sep     271                   103                  263
Oct     291                   120                  243
Nov     320                   131                  244
Dec     419                   189                  222
The statement is not true. When the data are deseasonalized, it shows that business
is about the same.
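The deseasonalizing arithmetic in Problem 12 (deseasonalized value = actual sales divided
by seasonal index/100) can be checked with a few lines of Python (illustrative only):

sales   = [125, 113, 189, 201, 206, 241, 230, 245, 271, 291, 320, 419]
indices = [ 51,  50,  87,  93,  95,  99,  96,  89, 103, 120, 131, 189]
print([round(s / (i / 100)) for s, i in zip(sales, indices)])   # 245, 226, 217, ...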
13.
a. & b. Would use both the trend and seasonal indices to forecast although seasonal
component is not strong in this example (see plot and seasonal indices below).
Seasonal Indices
Period   Index
  1      0.969
  2      1.026
  3      1.000
  4      1.005
Forecasts
Period     Forecast
Q3/1996    3305.39
Q4/1996    3343.02

c.
14.
a.
Multiplicative Model
Data
Cavanaugh Sales
Length 77
NMissing 0
Fitted Trend Equation
Yt = 72.6 + 6.01*t
Seasonal Indices
Period   Index
  1      1.278
  2      0.907
  3      0.616
  4      0.482
  5      0.426
  6      0.467
  7      0.653
  8      0.863
  9      1.365
 10      1.790
 11      1.865
 12      1.288
60
b.
c.
15.
a.
Additive Model
Data
LnSales
Length 77
NMissing 0
Fitted Trend Equation
Yt = 4.6462 + 0.0215*t
Seasonal Indices
Period   Index
  1      0.335
  2     -0.018
  3     -0.402
  4     -0.637
  5     -0.714
  6     -0.571
  7     -0.273
  8     -0.001
  9      0.470
 10      0.723
 11      0.747
 12      0.342

b.
c. & d.
e.
a.
Multiplicative Model
Data
Disney Sales
Length 63
NMissing 0
Fitted Trend Equation
Yt = -302.9 + 44.9*t
Seasonal Indices
Period   Index
  1      0.957
  2      1.022
  3      1.046
  4      0.975

b.
There is a significant trend but it is not a linear trend. First quarter sales
tend to be relatively low and third quarter sales tend to be relatively high.
However, the plot in part a indicates a multiplicative decomposition with a
linear trend is not an adequate representation of Disney sales. Perhaps
better to do a multiplicative decomposition with a quadratic trend. Even
better, in this case, is to do an additive decomposition with the logarithms
of Disney sales.
63
c.
With the right decomposition, would use both the trend and seasonal
components to generate forecasts.
d.
Forecasts
Quarter    Forecast
Q4/1995    2506
Q1/1996    2502
Q2/1996    2719
Q3/1996    2830
Q4/1996    2681
However, the plot in part a indicates that forecasts generated from a
multiplicative decomposition with a linear trend are likely to be too low.
17.
a.
c.
Seasonal Indices
Period   Index
  1      0.947
  2      0.950
  3      0.961
  4      0.998
  5      1.004
  6      1.007
  7      1.022
  8      1.070
  9      1.045
 10      0.982
 11      0.995
 12      1.019

Forecast:  171.2   174.9   180.5

18.
Multiplicative Model
Data
U.S. Retail Sales
Length 84
NMissing 0
Fitted Trend Equation
Yt = 128.814 + 0.677*t
Seasonal Indices
Period   Index
  1      0.880
  2      0.859
  3      0.991
  4      0.986
  5      1.031
  6      1.021
  7      1.007
  8      1.035
  9      0.973
 10      0.991
 11      1.015
 12      1.210

Forecast   Actual
164.0      167.0
160.7      164.0
186.1      192.1
185.8      187.5
194.9      201.4
193.6      202.6
191.8      194.9
197.7      204.2
186.6      192.8
190.6      194.0
196.1      202.4
234.5      238.0
Forecasts maintain the seasonal pattern but are uniformly below the actual
retail sales for 1995. However, MPE = MAPE = 2.49% is relatively small.
66
19.
a.   Jan = 600/1.2 = 500

c.
Jan   Ŷ = 500(1.20) = 600
Feb   Ŷ = (140 + 5(73))(1.37) = 692
Mar   Ŷ = (140 + 5(74))(1.00) = 510
Apr   Ŷ = (140 + 5(75))(0.33) = 170
May   Ŷ = (140 + 5(76))(0.47) = 244
Jun   Ŷ = (140 + 5(77))(1.25) = 656
Jul   Ŷ = (140 + 5(78))(1.53) = 811
Aug   Ŷ = (140 + 5(79))(1.51) = 808
Sep   Ŷ = (140 + 5(80))(0.95) = 513
Oct   Ŷ = (140 + 5(81))(0.60) = 327
Nov   Ŷ = (140 + 5(82))(0.82) = 451
Dec   Ŷ = (140 + 5(83))(0.97) = 538
22.
Deflating a time series removes the effects of dollar inflation and permits the analyst
to examine the series in constant dollars.
23.
1289.73(2.847) = 3,671.86
24.
Jan   303,589
Feb   251,254
Mar   303,556
Apr   317,872
May   329,551
Jun   261,362
Jul   336,417

25.
Multiplicative Model
Data
Employed Men
Length 130
NMissing 0
Fitted Trend Equation
Yt = 65355 + 72.7*t
Seasonal Indices
Month   Index     Month   Index
  1     0.981       7     1.019
  2     0.985       8     1.014
  3     0.990       9     1.002
  4     0.995      10     1.004
  5     1.002      11     0.999
  6     1.014      12     0.995
68
Forecasts
Month      Forecast
Nov/2003   74791.4
Dec/2003   74581.7
Jan/2004   73607.8
Feb/2004   73954.0
Mar/2004   74393.4
Apr/2004   74887.2
May/2004   75454.0
Jun/2004   76419.5
Jul/2004   76894.1
Aug/2004   76564.4
Sep/2004   75757.2
Oct/2004   76005.6
A multiplicative decomposition with a default linear trend is not quite right for these
data. There is some curvature in the time series as the plot of the seasonally adjusted
data indicates. Not surprisingly, there is a strong seasonal component with
employment relatively high in the summer and relatively low in the winter. In spite
of the not quite linear trend, the forecasts seem reasonable.
26.
A linear trend is not appropriate for the employed men data. The plot below shows
a quadratic trend fit to the data of Table P-25.
Although better than a linear trend, the quadratic trend is not quite right. Employment
for the years 2000-2003 seems to have leveled off. No simple trend curve is
likely to provide an excellent fit to these data. The residual autocorrelation
function below indicates a prominent seasonal component since there are large
autocorrelations at the seasonal lag S = 12 and its multiples.
70
27.
Multiplicative Model
Data
Wal-Mart Sales
Length 56
NMissing 0
Fitted Trend Equation
Yt = 1157 + 1088*t
Seasonal Indices
Quarter   Index
Q1        0.923
Q2        0.986
Q3        0.958
Q4        1.133
71
72
Actuals
65443
70466
69261
82819
Slight upward curvature in the Wal-Mart sales data so a linear trend is not quite
right. Not surprisingly, there is a strong seasonal component with 4th quarter
sales relatively high and 1st quarter sales relatively low. The forecasts for 2004
are uniformly below the actuals (primarily the result of the linear trend assumption)
although the seasonal pattern is maintained. Here MPE = MAPE = 9.92%.
Multiplicative decomposition better than additive decomposition but any
decomposition that assumes a linear trend will not forecast sales for 2004 well.
28.
A linear trend fit to the Wal-Mart sales data of Table P-27 is shown below. A
linear trend misses the upward curvature in the data.
73
A quadratic trend provides a better fit to the Wal-Mart sales data (see plot
below). The autocorrelation function for the residuals from a quadratic
trend fit suggests a prominent seasonal component since there are large
autocorrelations at the seasonal lag S = 4 and its multiples.
74
75
2.
3.
MONTH   SEASONAL ADJUSTMENT FACTORS
Jan     0.693     17.32   25.97
Feb     0.707     18.41   27.23
Mar     0.935     25.34   30.01
Apr     1.142     32.13   46.38
May     1.526     44.52   63.57
Jun     1.940     58.61   82.82
Jul     1.479     46.23   64.69
Aug     0.998     32.23   44.68
Sep     0.757     25.22   34.67
Oct     0.373     12.83   17.49
Nov     0.291     10.32   13.95
Dec     1.290     47.06   63.17
76
4.
77
5.
Trend*Seasonality (T*S):   MAD = 1.52
Linear Trend Model:        MAD = 9.87
6.
If you had to limit your choices to the models in 2 and 4, the linear trend model is
78
better (judged by MAD and MSE) than any of the Holt smoothing procedures.
However, the Trend*Seasonality (T*S) model is best. This procedure is the only
one that takes account of the trend and seasonality in Small Engine Doctor sales.
CASE 5-2: MR. TUX
At last, John is able to deal directly with the strong seasonal effect in his monthly data.
Students find it interesting that in addition to using these to forecast, John's banker wants them to
justify variable loan payments.
To forecast using decomposition, students see that both the C and I components must be
estimated. We like to emphasize that studying the C column in the computer printout is helpful,
but that other study is needed to estimate the course of the economy over the next several months.
The computer is not able to make such forecasts with accuracy, as anyone who follows economic
news well knows.
Thinking about Johns efforts to balance his seasonal business to achieve a more uniform
sales picture can generate a good class discussion. This is usually the goal of any business;
examples such as boats/skis or bikes/skis illustrate this effort in many seasonal businesses. In fact,
John Mosby put a great deal of effort into expanding his Seattle business in order to balance his
seasonal effect. Along with his shirt making business, he has achieved a rather uniform monthly
sales volume.
1.
The two sentences might look something like this: A computer analysis of John
Mosby's monthly sales data clearly shows the strong variation by month. I think we
are justified in letting him make variable monthly loan payments based on the seasonal
indices shown in the computer printout.
2.
Since John expects to do twice as much business in Seattle as Spokane, the Seattle
indices he should try to achieve will be only half as far from 100 as the Spokane
indices, and on the opposite side of 100:
        Spokane   Seattle
Jan      31.4     134.3
Feb      47.2     126.4
Mar      88.8     105.6
Apr     177.9      61.1
May     191.8      54.1
Jun     118.6      90.7
Jul     102.9      98.6
Aug     128.7      85.7
Sep      93.8     103.1
Oct      81.5     109.3
Nov      60.4     119.8
Dec      77.1     111.5
3.
Using the sales figures for January and February of 2005, to get average (100%) sales
dollars, divide the actual sales by the corresponding seasonal index:
79
80
The number of new clients tends to be relatively large during the first three months
of the year.
Forecasts
Month      Forecast
Apr/2003   153.207
May/2003   145.121
Jun/2003   158.062
Jul/2003   142.440
Aug/2003   148.560
Sep/2003   137.749
Oct/2003   166.161
Nov/2003   137.261
Dec/2003   124.277
There is one, possibly two, large positive residuals (irregularities) at the beginning of the
series but there are no significant residual autocorrelations.
Jun/2002   8314.2
Jul/2002   8351.5
Aug/2002   8388.8
Sep/2002   8426.1

2.
Forecast   Actual
7453.2     7120
7462.5     7124
8058.7     7817
7873.1     7538
8223.5     7921
8140.9     7757
8308.8     7816
8611.1     8208
8368.2     7828
Holts linear smoothing was adequate for the seasonally adjusted data, but the
forecasts above are uniformly above the actual values for the first nine months of
2002.
3.
Using the same procedure as in 2, the forecast for October, 2002 is 8609.2.
4.
The pattern for the three sets of data shows a trend and monthly seasonality.

Seasonal Indices
Month   Index
  1     0.937
  2     0.922
  3     0.972
  4     0.963
  5     0.925
  6     1.016
  7     1.063
  8     1.094
  9     1.094
 11     1.025
 12     0.936

MAD: 814   MSD: 1276220
2.
Decomposition analysis works pretty well for AAA Washington data. There is a
slight downward trend in emergency road service call volume with a pronounced
seasonal component. Volume tends to be relatively high in the summer and
early fall. There is significant residual autocorrelation at lag 1 (see plot below) so
not all the association in the data has been accounted for by the decomposition.
85
Finally, a 12-month forecast is generated using both the trend line and the seasonal
indices. The forecasts seem reasonable.
Month      Forecast
Jan/2007   785348
Feb/2007   326276
Mar/2007   585307
Apr/2007   391827
May/2007   558299
Jun/2007   453257
Jul/2007   520615
Aug/2007   319029
Sep/2007   614997
Oct/2007   394599
Nov/2007   377580
Dec/2007   235312
Multiplicative Model
Data
SurtidoSales
Length 41
NMissing 0
Fitted Trend Equation
Yt = 907625 + 4736*t
Seasonal Indices
Month   Index
  1     0.696
  2     0.546
  3     0.517
  4     0.678
  5     0.658
  6     0.615
  7     0.716
  8     0.567
  9     1.527
 10     1.664
 11     1.988
 12     1.829
87
88
2.
3.
Month      Forecast
Jun/2003    680763
Jul/2003    795362
Aug/2003    633209
Sep/2003   1710846
Oct/2003   1872289
Nov/2003   2246745
Dec/2003   2076183
The linear trend in sales has a slight upward slope. The seasonal indices show that
cookie sales are relatively high the last four months of the year with a peak in
November and relatively low the rest of the year.
The residual autocorrelation function is shown below. There are no significant
residual autocorrelations.
89
The multiplicative decomposition adequately accounts for the trend and seasonality
in the data. The forecasts are very reasonable. Jame should change his thinking
about the value of decomposition analysis.
CASE 5-8: SOUTHWEST MEDICAL CENTER
1.
2.
Multiplicative Model
Data
Total Visits
Length 114
NMissing 0
Fitted Trend Equation
Yt = 955.6 + 4.02*t
90
Seasonal Indices
Month   Index
  1     0.972
  2     1.039
  3     0.943
  4     0.884
  5     1.039
  6     0.935
  7     1.043
  8     1.033
  9     0.995
 10     1.007
 11     1.091
 12     1.019
91
Forecasts
Month      Forecast     Month      Forecast
Mar/2004   1479         Sep/2004   1401
Apr/2004   1469         Oct/2004   1502
May/2004   1419         Nov/2004   1367
Jun/2004   1440         Dec/2004   1284
Jul/2004   1564         Jan/2005   1514
Aug/2004   1464         Feb/2005   1367

3.
There is a distinct upward trend in total visits. The seasonal indices show that
visits in December (4th month of fiscal year) tend to be relatively low and visits
in July (11th month of fiscal year) tend to be relatively high.

4.
The residual autocorrelation function is shown below.
There are significant residual autocorrelations. The residuals are far from random.
The forecasts may be reasonable given the last three fiscal years of data. However,
looking at the time series decomposition plot in 2, it is clear a decomposition analysis
is not able to describe the middle two or three fiscal years of data. For some
reason, visits for these fiscal years, in general, appear to be unusually high. A
decomposition analysis does not adequately describe Mary's data and leaves her
perplexed.
CHAPTER 6
REGRESSION ANALYSIS
ANSWERS TO PROBLEMS AND CASES
1.
Option b is inconsistent because the regression coefficient and the correlation coefficient
must have the same sign.
3.
SE Coef
0.2501
T
P
2.48 0.038
94
Value   0.10919

sf = sy.x √[1 + 1/n + (X - X̄)²/Σ(X - X̄)²]

sf = .471 √[1 + 1/10 + (3 - 19.78)²/2148.9] = .471 √(1 + .1 + .131) = .471 √1.231

sf = .471(1.110) = .523
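A sketch of the forecast standard error computation in Problem 4, using the quantities
shown above (illustrative only):

from math import sqrt

s_yx, n, x_new, x_bar, ssx = 0.471, 10, 3, 19.78, 2148.9
s_f = s_yx * sqrt(1 + 1 / n + (x_new - x_bar) ** 2 / ssx)
print(round(s_f, 3))      # about .523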
5.
The regression equation is
Cost = 208.2 + 70.92 Age

Analysis of Variance
Source              SS        MS       F       P
Regression      634820    634820   50.96   0.000
Residual Error   87197     12457
Total           722017
6.
a, b and d.
96
Analysis of Variance
Source            DF        SS        MS       F       P
Regression         1   27032.3   27032.3   83.74   0.000
Residual Error     9    2905.4     322.8
Total             10   29937.6
f. Based on the residuals versus the fitted values plot, there is no reason to
doubt the adequacy of the simple linear regression model.
97
a, b, c & d.
The regression equation is
Orders = 15.8 + 1.11 Catalogs
Predictor
Coef SE Coef
Constant 15.846
3.092
Catalogs 1.1132 0.3596
P
0.000
0.011
DF
SS
MS
F
P
1 317.53 317.53 9.58 0.011
10 331.38
33.14
11 648.92
g. See Fit and 90% PI at end of computer printout above. A 90% prediction interval
for mail orders when 10(000) catalogs are distributed is (16, 38)---16,000 to 38,000.
8.
Coef SE Coef
3538.1
744.4
-418.3
150.8
T
4.75
-2.77
P
0.001
0.024
9.
a. The firms seem to be using very similar rationale since r = .959. Also, from the fitted
line plot below, notice the fitted line is not far from the 45° line through the origin (with
intercept 0 and slope 1).
99
b. If ABC bids 1.01, the predicted competitor's bid is 101.212. A 95% prediction
interval (PI) is given below.

New Obs   Fit       SE Fit   95% CI                 95% PI
101       101.212   0.164    (100.872, 101.552)     (99.637, 102.786)
c. Assume normally distributed errors about the population regression line and
treat the least squares line as if it were the population regression line (n is reasonably
large in this case). Then at an ABC bid of 101, possible competitor bids are normally
distributed about the fitted value 101.212 with a standard deviation estimated by
sy.x = .743. Consequently, the probability that ABC will have the bid is
P(Z ≥ (101 - 101.212)/.743) = P(Z ≥ -.285) ≈ .61.
10.
a. Only if the sample size is large enough. The t statistic associated with the
slope coefficient or the F ratio should be consulted to determine if the population
regression line slope is significantly different from a horizontal line with zero
slope.
b. It will typically produce significant results, not necessarily useful results.
The coefficient of determination, r2, might be small, so forecasting using the fitted
line is unlikely to produce a useful result.
11.
Predictor    Coef      SE Coef      T       P
Constant    2217.4      316.2      7.01   0.000
            -144.95      27.96    -5.18   0.001

Analysis of Variance
Source             MS        F       P
Regression      559607    26.88   0.001
Residual Error   20822

c. Reject H0: β1 = 0 at the 5% level since t = -5.18 and its p-value = .001 < .05.
d. If interest rate increases by 1%, on average the number of building permits will
decrease by 145.
e. From the computer output above, r2 = .793.
f. Interest rate explains about 79% of the variation in number of building permits issued.
12.
Predictor    Coef       SE Coef      T       P
Constant    -17.731      4.626     -3.83   0.003
Size         0.35495     0.02332   15.22   0.000

Analysis of Variance
Source            DF      SS      MS        F       P
Regression         1   14331   14331    231.77   0.000
Residual Error    11     680      62
Total             12   15011

c. Reject H0: β1 = 0 at the 5% level since t = 15.22 and its p-value = .000 < .05
d.
102
Residual Versus Fits plot shows curvature in scatter not captured by straight line fit.
e. Model with quadratic term in Batch Size fits well. Results with Size**2 as
predictor variable follow.
The regression equation is
Defectives = 4.70 + 0.00101 Size**2
Predictor    Coef          SE Coef        T       P
Constant     4.6973        0.9997        4.70   0.001
Size**2      0.00100793    0.00001930   52.22   0.000

Analysis of Variance
Source              SS      MS          F        P
Regression       14951   14951    2727.00    0.000
Residual Error      60       5
Total            15011

f. Reject H0: β1 = 0 at the 5% level since t = 52.22 and its p-value = .000 < .05
103
95% CI
(92.829, 97.993)
95% PI
(89.647, 101.175)
a.
104
Unusual Observations
Obs   Assessed   Market   Fit      SE Fit   Residual   St Resid
 3      64.6     87.200   87.423   1.199     -0.223     -0.10 X
26      72.0     97.200   90.483   0.578      6.717      2.83R
t = (1.30 - 2.0)/.153 = -4.58 (p-value = .000) suggests β1 = 2 is not supported by
the data. Appears that operating expenses have a fixed cost component
represented by the intercept b0 = 18.88, and are then about 1.3 times player costs.
e.   Ŷ = 58.6,  Ŷ ± 2.064 sf

f.   Unusual Observations
     Obs   PlayCosts   OpExpens
      7      18.0        60.00
Predictor    SE Coef       T       P
Constant      553.6      -1.47   0.158
Families      0.05622     4.02   0.001

Analysis of Variance
Source             DF         SS          MS        F       P
Regression          1   10855642    10855642    16.15   0.001
Residual Error     21   14113925      672092
Total              22   24969567
Although the regression is significant, the residual versus fit plot indicates the
magnitudes of the residuals increase with the level. This behavior and the
scatter diagram in a suggest that consumption is not evenly distributed about
the regression line. That is, the data have a megaphone-like appearance. A
straight line regression model for these data is not adequate.
c & d. The response variable is converted to the natural log of newsprint consumption
(LnConsum).
The regression equation is
LnConsum = 5.70 + 0.000134 Families
Predictor
Coef
SE Coef
T
P
Constant
5.6987
0.3302 17.26 0.000
Families 0.00013413 0.00003353
4.00 0.001
S = 0.488968 R-Sq = 43.2% R-Sq(adj) = 40.5%
Analysis of Variance
Source            DF       SS       MS        F       P
Regression         1   3.8252   3.8252    16.00   0.001
Residual Error    21   5.0209   0.2391
Total             22   8.8461
The regression is significant (F = 16, p value = .001) although only 43% of the
variation in ln(consumption) is explained by families. The residual plots
above suggest the straight line regression of ln(consumption) on families is
adequate. This simple linear regression model with ln(consumption) is better
than the same model with consumption as the response.
e. Using the results in c, a forecast of ln(consumption) with 10,000 families is
7.040 so a forecast of consumption is 1,141.
f. Other variables that will influence newsprint consumption include number of
papers published and retail sales (influencing newspaper advertising).
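A small sketch of the back-transformation used in part e: the forecast of ln(consumption)
from the fitted log-linear equation is converted to consumption units with exp()
(coefficients are taken from the Minitab output above):

from math import exp

b0, b1 = 5.6987, 0.00013413
families = 10_000
ln_forecast = b0 + b1 * families      # about 7.040
print(round(exp(ln_forecast)))        # about 1141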
17.
a. Can see from fitted line plot below that growth in number of steakhouses is
exponential, not linear.
108
Predictor    Coef      SE Coef     T      P
Constant    0.3476     0.3507     0.99   0.378
            0.81990    0.09004    9.11   0.001

Analysis of Variance
Source              SS       MS        F       P
Regression      11.764   11.764    82.91   0.001
Residual Error   0.568    0.142
Total           12.332
18.
a. Can see from fitted line plot below that growth in number of copy centers is
exponential, not linear.
109
Predictor    Coef       SE Coef      T       P
Constant    -0.3049     0.1070     -2.85   0.015
             0.48302    0.01257    38.42   0.000

Analysis of Variance
Source              SS       MS          F        P
Regression      53.078   53.078    1476.38    0.000
Residual Error   0.431    0.036
Total           53.509
19.
number of employees.
c. r2 = .15. Only 15% of the variation in profits per employee is explained by the
number of employees.
d. The regression is not significant. There is no point in using the fitted function to
generate forecasts for profits per employee for a given number of employees.
20.
Coef
25.013
-0.7125
SE Coef
T
P
5.679 4.40 0.001
0.2912 -2.45 0.029
P
0.029
The regression is now significant at the 5% level (t value = -2.45, p value = .029 < .05).
r2 has increased from 15% to 31.5%. These results suggest there is a linear
relationship between profits per employee and number of employees. A single
observation can have a large influence on the regression analysis, particularly when
the number of observations is relatively small. However, the relatively small r2 of 31.5%
indicates there will be a fair amount of uncertainty associated with any forecast of
profits per employee. Dun and Bradstreet should not be thrown out unless there is some
good (non-numerical) reason not to include this firm with the others.
21.
Predictor    Coef      SE Coef      T       P
Constant    0.683      1.691      0.40   0.690
            0.92230    0.08487   10.87   0.000

Analysis of Variance
Source            DF       SS       MS        F       P
Regression         1   3833.4   3833.4    118.09   0.000
Residual Error    24    779.1     32.5
Total             25   4612.5
22.
e. The plot of the residuals versus the fitted values has a megaphone-like appearance.
The residuals are numerically smaller for smaller projects than for larger projects.
Estimated costs are more accurate predictors of actual costs for inexpensive (smaller)
projects than for expensive (larger) projects.

a. The regression is significant (t value = 14.71, p value = .000).
b. r2 = .90 or 90% of the variation in ln(actual costs) is explained by
ln(estimated costs).
c. If ln(estimated costs) were a perfect predictor of ln(actual costs), then β0 = 0, β1 = 1.
The estimated intercept coefficient, .003, is consistent with β0 = 0. With the
t value = .02 and its p value = .987, cannot reject the null hypothesis H0: β0 = 0.
To check the hypothesis H0: β1 = 1, compute t = (.968 - 1)/.0658 = -.49, which is not
in the rejection region for a two-sided test at any reasonable significance level.
The estimated slope coefficient, .968, is consistent with β1 = 1.
d. ln(24) = 3.178, so forecast of ln(actual cost) = .0026 + .968(3.178) = 3.079. Forecast
of actual cost is e^3.079 = 21.737.
CASE 6-1: TIGER TRANSPORT
This case asks students to summarize the analysis in a report to management. We find this a useful
exercise since it requires students to put the application and results of a statistical procedure into their
own words. If they are able to do this, they understand the technique.
This case illustrates the use of regression analysis in a situation where determining a good
regression equation is only the first step. The results must then be priced out in order to
arrive at a rational decision regarding a pricing policy. This situation can generate a discussion regarding
the general nature of quantitative techniques: they aid in the decision-making
process rather than replace it. Possible policies regarding the small-load charge can be
discussed after the cost of such loads is determined. One approach would be to take small loads
at company cost, which is low. The resultant goodwill might pay off in increased regular
business. Another would be to charge a low cost for small loads but only if the customer agrees to
book a certain number of large loads.
The low out-of-pocket cost involved in adding small loads can focus management attention
in other directions. Since no significant costs need to be recovered by the small load charge,
a policy based on other considerations is appropriate.
CASE 6-2: BUTCHER PRODUCTS, INC.
1.
The 89 degree temperature is 24 degrees off ideal (89 - 65 = 24). This value is placed into
the regression equation yielding a forecast number of units per day of 338.
2.
Once again, the temperature is 24 degrees from ideal (65 - 41 = 24). For X = 24, a forecast
of 338 units is calculated from the regression equation.
3.
Since there is a fairly strong relationship between output and deviation from ideal
temperature (r = -.80), higher output may well result from efforts to control the
temperature in the work area so that it is close to 65 degrees. Gene should consider ways
to do this.
4.
Gene has made a decent start towards finding an effective forecasting tool. However,
since about 36% of the variation in output is unexplained, he should look for additional
important predictor variables.
The correlation coefficient is: r = .927. The corresponding t = 8.9 for testing
H0: ρ = 0 has a p value of .000. We reject H0 and conclude the correlation between
days absent and employee age holds for the population.
2.
Y = 4.28 + .254X
113
3.
r2 = .859. About 86% of Y's (absent days) variability can be explained through
knowledge of X (employee age).
4.
The null hypothesis H0: β1 = 0 is rejected using either t = 8.9, p value = .000 or the
F = 79.3 with p value = .000. There is a significant relation between absent days and
employee age.
5.
Placing X = 24 into the prediction equation yields a Y forecast of 1.8 absent days per year.
6.
If time and cost are not factors, it might be helpful to take a larger sample to see if these
small sample results hold. If results hold, a larger sample will very likely produce
more precise interval forecasts.
7.
The fitted function is likely to produce useful forecasts, although 95% prediction
intervals can be fairly wide because of the small sample size.
After John uses simple regression analysis to forecast his monthly sales volume, he is
not satisfied with the results. The low r-squared value (56.3%) disappoints him.
The high seasonal variation should be discussed as a cause of his poor fit
when using only the month number to forecast sales. Using dummy variables to
account for the monthly effect is one possibility. After this topic is covered in
Chapter 7, you can have the students return to this case.
2.
Not adequate.
3.
The idea of serial correlation can be mentioned at this point. The possibility of
autocorrelated residuals can be introduced based on John's Durbin-Watson statistic.
In fact, the DW is low, indicating definite autocorrelation. A class discussion about
this problem and what might be done about it is useful. After this topic is covered
in Chapter 8, you can have the students return to this case. We hope that by this
time students appreciate the difficulties involved in real-life forecasting.
Compromises and multiple attempts are the norm, not the exception.
Coef      SE Coef   T     P
32.68     31.94     1.02  0.312
0.003487  0.001076  3.24  0.002

F = 10.51   P = 0.002
The correlation of Clients and Index = 0.752. The relation is significant (see below).
The regression equation is
Clients = - 199 + 2.94 Index
Predictor  Coef     SE Coef  T      P
Constant   -198.65  28.64    -6.94  0.000
Index      2.9400   0.2619   11.23  0.000
2.
Source          DF  SS     MS     F       P
Regression      1   49993  49993  126.04  0.000
Residual Error  97  38475  397
Total           98  88468
The regression equation is Clients = - 199 + 2.94 BI
Jan 1993: Clients = - 199 + 2.94 (125) = 168.5
Feb 1993: Clients = - 199 + 2.94 (125) = 168.5
Mar 1993: Clients = - 199 + 2.94 (130) = 183.2
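A quick Python sketch of these forecast calculations, using the rounded fitted equation reported above, follows.

# Forecasts from Clients = -199 + 2.94 Index
for month, index in [("Jan 1993", 125), ("Feb 1993", 125), ("Mar 1993", 130)]:
    print(month, -199 + 2.94 * index)   # 168.5, 168.5, 183.2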
Note: Students might develop a new equation that leaves out the first three months of
data for 1993. This is a better way to determine whether the model works and the
results are:
The regression equation is
Clients = - 204 + 2.99 Index
Predictor  Coef     SE Coef  T      P
Constant   -203.85  31.37    -6.50  0.000
Index      2.9898   0.2883   10.37  0.000
S = 20.0046 R-Sq = 53.4% R-Sq(adj) = 52.9%
Analysis of Variance
Source          DF  SS     MS     F       P
Regression      1   43028  43028  107.52  0.000
Residual Error  94  37617  400
Total           95  80645
Coef    SE Coef  T       P
469.58  32.07    14.64   0.000
-37719  3461     -10.90  0.000
3.
Source          DF  SS     MS     F       P
Regression      1   45015  45015  118.76  0.000
Residual Error  94  35630  379
Total           95  80645

          Actual  Forecast  Forecast  Forecast (RecipIndex predictor)
Jan 1993  152     169       170       168
Feb 1993  151     169       170       168
Mar 1993  199     183       185       180
4.
Only if the business activity index could itself be forecasted accurately. Otherwise, it is
not a viable predictor because the values for the business activity index are not
available in a timely fashion.
5.
6.
If a good regression equation can be developed in which the changes in the predictor
variable lead the response, it might be possible to accurately forecast the rest of 1993.
However, if the regression equation is based on coincident changes in the predictor
variable and response, forecasts for the rest of 1993 could not be developed since values
for the predictor variable are not known in advance.
1.
The four linear regression models are shown below. Both temperature and rainfall are
potential predictor variables.
The regression equation is
Calls = 18366 + 467 Rate
Predictor  Coef   SE Coef  T      P
Constant   18366  1129     16.27  0.000
Rate       467.4  174.2    2.68   0.010
S = 1740.10 R-Sq = 11.0% R-Sq(adj) = 9.5%
The regression equation is
Calls = 28582 - 137 Temp
Predictor  Coef     SE Coef  T      P
Constant   28582.2  956.0    29.90  0.000
Temp       -137.44  18.06    -7.61  0.000
Coef     SE Coef  T      P
20068.9  351.7    57.07  0.000
400.30   84.20    4.75   0.000
Coef       SE Coef   T      P
27980      3769      7.42   0.000
-0.015670  0.008703  -1.80  0.078
A new temperature variable is created and labeled NewTemp.
The correlation coefficient between Calls and NewTemp is .724, indicating a fairly
strong positive linear relationship. However, examination of the fitted line plot below
suggests there is a curvilinear relation between Calls and NewTemp
4.
Coef     SE Coef  T      P
20044.4  203.1    98.68  0.000
5.3817   0.5462   9.85   0.000

Analysis of Variance
Source          MS         F      P
Regression      119870408  97.08  0.000
Residual Error  1234744
CHAPTER 7
MULTIPLE REGRESSION
ANSWERS TO PROBLEMS AND CASES
1.
A good predictor variable is highly related to the dependent variable but not too
highly related to other predictor variables.
2.
The population of Y values is normally distributed about E(Y), the plane formed by the
regression equation. The variance of the Y values around the regression plane is
constant. The residuals are independent of each other, implying a random sample. A linear
relationship exists between Y and each predictor variable.
3.
The net regression coefficient measures the average change in the dependent variable per
unit change in the relevant independent variable, holding the other independent variables
constant.
4.
5.
6.
7.
8.
a. Correlations:

           Time   Amount
   Amount  0.959
   Items   0.876  0.923
Predictor  Coef     SE Coef  T      P      VIF
Constant   0.4217   0.5864   0.72   0.483
Amount     0.08715  0.01611  5.41   0.000  6.756
Items      -0.0386  0.1131   -0.34  0.737  6.756

Analysis of Variance
Source          SS       MS      F      P
Regression      128.988  64.494  87.71  0.000
Residual Error  11.030   0.735
Total           140.018
Amount and Items are highly collinear (correlation = .923, VIF = 6.756). Not both
variables are needed in the regression function. Deleting Items, which has the
non-significant t value, gives the best regression below.
95% CI
(5.504, 6.512)
95% PI
(4.171, 7.845)
9.
Coef    SE Coef  T      P      VIF
3.519   3.161    1.11   0.302
2.2776  0.8126   2.80   0.026  4.016
-0.411  1.236    -0.33  0.749  4.016
The non-significant predictor variable should be dropped from the regression function
and the analysis redone with only Income as the predictor variable.
10.
a. Both high temperature and traffic count are positively related to number of six-packs
sold and have potential as good predictor variables. There is some collinearity
(r = .68) between the predictor variables but perhaps not enough to limit their
value.
b. Reject H0: β1 = 0 if |t| > 2.898

   t = b1/s_b1 = .78207/.22694 = 3.45

   Reject H0 because 3.45 > 2.898 and conclude that the regression coefficient for
   the high temperature variable is unequal to zero in the population.

   Reject H0: β2 = 0 if |t| > 2.898

   t = b2/s_b2 = .06795/.02026 = 3.35

   Reject H0 because 3.35 > 2.898 and conclude that the regression coefficient for
   the traffic count variable is unequal to zero in the population.
c. Ŷ = -26.706 + .78207(60) + .06795(500) = 54 (six-packs)

d. R² = 1 - Σ(Y - Ŷ)²/Σ(Y - Ȳ)² = 1 - 2727.9/14316.9 = .81

   We are able to explain 81% of the variation in the number of six-packs sold using
   knowledge of daily high temperature and daily traffic count.

e. s_y·x's = sqrt(Σ(Y - Ŷ)²/(n - k - 1)) = sqrt(2727.9/(20 - 3)) = sqrt(160.46) = 12.67
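A short Python sketch of the part c-e calculations follows; n = 20 is an assumption consistent with the divisor 20 - 3 used above.

import math

sse, sst = 2727.9, 14316.9
n, k = 20, 2                                     # n assumed from the divisor above
r_sq = 1 - sse / sst                             # part d
s_yx = math.sqrt(sse / (n - k - 1))              # part e
y_hat = -26.706 + .78207 * 60 + .06795 * 500     # part c
print(round(r_sq, 2), round(s_yx, 2), round(y_hat))   # 0.81 12.67 54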
f. If there is an increase of one degree in high temperature while the traffic count
is held constant, beer sales increase on average by .78 six-packs.
g. The predictor variables explain 81% of the variation in six-packs sold. Both
predictor variables are significant. However, it would be prudent to examine the
residuals (not available in the problem) before deciding to use the fitted regression
function for forecasting.
11.
a. Scatter diagram follows. Female drivers indicated by solid circles, male drivers by
diamonds.
12.
         Sales  Outlets
Outlets  0.739
Auto     0.548  0.670
Number of retail outlets is positively related to annual sales, r12 = .74, and is
potentially a good predictor variable. Number of automobiles registered is
moderately related to annual sales, r13 = .55, and is positively correlated with
number of retail outlets, r23 = .67. Given number of retail outlets in the
regression function, number of automobiles registered may not be required.
b. The regression equation is
Sales = 10.1 + 0.0110 Outlets + 0.195 Auto
Predictor  Coef      SE Coef   T     P      VIF
Constant   10.109    7.220     1.40  0.199
Outlets    0.010989  0.005200  2.11  0.068  1.813
Auto       0.1947    0.6398    0.30  0.769  1.813

Analysis of Variance
Source          MS     F     P
Regression      521.8  4.91  0.041
Residual Error  106.2
s_y·x's = sqrt(Σ(Y - Ŷ)²/(n - k - 1)) = sqrt(849.6/(11 - 3)) = sqrt(106.2) = 10.3
f. If one retail outlet is added while the number of automobiles registered remains
constant, sales will increase by an average of .011 million, or $11,000. If one
million more automobiles are registered while the number of retail outlets remains
constant, sales will increase by an average of .195 million, or $195,000. However,
these regression coefficients are suspect due to collinearity between the predictor
variables.
g. New predictor variables should be tried.
13.
         Sales  Outlets  Auto
Outlets  0.739
Auto     0.548  0.670
         0.936  0.556    0.281
Analysis of Variance
Source          DF  SS       MS      F      P
Regression      3   1843.40  614.47  86.32  0.000
Residual Error  7   49.83    7.12
Total           10  1893.23
95% CI: (22.865, 31.746)    95% PI: (19.591, 35.020)

Coef     SE Coef  T      P      VIF
-4.027   2.468    -1.63  0.141
0.6209   0.1382   4.49   0.002  1.086
0.43017  0.03489  12.33  0.000  1.086
14.
Measures of fit are nearly the same as those for the full model and there is no longer
a multicollinearity problem.
a. Reject H0: β1 = 0 if |t| > 3.1.

   t = .65/.05 = 13

   Reject H0 and conclude that the regression coefficient for the aptitude test variable
   is significantly different from zero in the population.

   Similarly, reject H0: β2 = 0 if |t| > 3.1.

   t = 20.6/1.69 = 12.2

   Reject H0 and conclude that the regression coefficient for the effort index variable
   is significantly different from zero in the population.
b. If the effort index increases one point while the aptitude test score remains constant,
sales performance increases by an average of $20,600.
c. Y = 16.57 + .65(75) + 20.6(.5) = 75.62
d. Σ(Y - Ŷ)² = s²y·x's (n - 3) = (3.56)²(14 - 3) = 139.4

e. Σ(Y - Ȳ)² = s²y (n - 1) = (16.57)²(14 - 1) = 3569.3

f. R² = 1 - Σ(Y - Ŷ)²/Σ(Y - Ȳ)² = 1 - 139.4/3569.3 = 1 - .039 = .961

   Adjusted R² = 1 - [SSE/(n - k - 1)]/[SST/(n - 1)] = 1 - (134.90/11)/(3569.3/13) = .955
15.
a. Scatter plot for cash purchases versus number of items (rectangles) and credit card
purchases versus number of items (solid circles) follows.
Notice that for a given number of items, sales from cash purchases are estimated to
be about $18.60 less than gross sales from credit card purchases.
c. The regression in part b is significant. The number of items sold and whether
the purchases were cash or credit card explains approximately 83% of the
variation in gross sales. The predictor variable Items is clearly significant. The
coefficient of the dummy variable X2 is significantly different from 0 at the
10% level but not at the 5% level. From the residual plots below we see that
there are a few large residuals (see, in particular, cash sales for day 25 and credit
card sales for day 1); but overall, plots do not indicate any serious departures
from the usual regression assumptions.
df = 47
ERA
SO
BA
RUNS
HR
SO
BA
RUNS
HR
SB
0.049
0.446
0.627
0.209
0.190
-0.393
0.015 -0.007
0.279 -0.209 0.645
0.490 -0.215 0.154 0.664
-0.404 -0.062 -0.207 -0.162 -0.305
Step       1      2
Constant   20.40  71.23
RUNS       0.087  0.115
  T-Value  3.94   10.89
  P-Value  0.001  0.000
ERA               -18.0
  T-Value         -9.52
  P-Value         0.000
S          7.72   3.55
R-Sq       39.28  87.72
The fitted function from the stepwise program is:
WINS = 71.23 + .115 RUNS - 18 ERA with R2 = 88%
17.
a. View will enter the stepwise regression function first since it has the largest
correlation with Price. After that the order of entry is difficult to determine from
the correlation matrix alone. Several of the predictor variable pairs are fairly highly
correlated so multicollinearity could be a problem. For example, once View is in the
model, Elevation may not enter (be significant). Slope and Area are correlated so
it may be only one of these predictors is required.
b. As pointed out in part a, it is difficult to determine the results of a stepwise program.
However, a two predictor model will probably work as well as any in this case.
Potential two predictor models include View and Area or View and Slope.
18.
SE Coef  T      P      VIF
31.67    -1.36  0.192
0.3397   1.09   0.290  1.473
0.2917   1.21   0.246  1.445
11.04    1.73   0.103  1.481

Analysis of Variance
Source          SS      MS      F     P
Regression      3071.1  1023.7  5.29  0.010
Residual Error  3096.7  193.5
Total           6167.8
Unusual Observations
Obs  X1  Y      Fit    SE Fit  Residual  St Resid
20   95  57.00  84.43  4.73    -27.43    -2.10R

R denotes an observation with a large standardized residual.

Predicted Values for New Observations
New Obs  Fit    SE Fit  95% CI          95% PI
1        80.88  3.36    (73.77, 88.00)  (50.55, 111.22)
F = 5.29 with a p value = .010, so the regression is significant at the 1% level.
The predicted final exam score for within term exam scores of 86 and 77 and a
GPA of 3.4 is Y = 81
The variance inflation factors (VIFs) are all small (near 1); however, the t ratios and
corresponding p values suggest that each of the predictor variables could be dropped
from the regression equation. Since the F ratio was significant, we conclude that
multicollinearity is a problem.
d. Mean leverage = (3+1)/20= .20. None of the observations are high leverage points.
e. From the regression output above, observation 20 has a large standardized residual.
The fitted model over-predicts the response (final exam score) for this student.
19.
Stepwise regression results, with significance level .05 to enter and leave the
regression function, follow.
Step       1
Constant   -26.24
X3         31.4
  T-Value  3.30
  P-Value  0.004
S          14.6
R-Sq       37.71
R-Sq(adj)  34.25
The best regression model relates final exam score to the single predictor
variable grade point average.
All possible regression results are summarized in the following table.
Predictor Variables  R²
X1                   .295
X2                   .301
X3                   .377
X1, X2               .404
X1, X3               .452
X2, X3               .460
X1, X2, X3           .498

The R² criterion would suggest using all three predictor variables. However, the
results in problem 7.18 suggest there is a multicollinearity problem with three
predictors. The best two independent variable model uses predictors X2 and X3.
When this model is fit, X2 is not required. We end up with a model involving the
single predictor X3, the model selected by the stepwise procedure.
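A minimal Python sketch of the comparison, computing adjusted R² for each candidate model from the table above (with n = 20, the sample size used in problem 18), follows.

n = 20
models = [("X1", .295, 1), ("X2", .301, 1), ("X3", .377, 1),
          ("X1,X2", .404, 2), ("X1,X3", .452, 2), ("X2,X3", .460, 2),
          ("X1,X2,X3", .498, 3)]
for name, r_sq, k in models:
    adj = 1 - (1 - r_sq) * (n - 1) / (n - k - 1)   # adjusted R-squared
    print(name, round(adj, 3))                     # X3 alone gives about .342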
20.
Coef     SE Coef  T      P      VIF
5.6865   0.6103   9.32   0.000
-0.5046  0.1170   -4.31  0.000  1.0
0.2553   0.0725   3.52   0.001  1.0
-0.0246  0.0130   -1.90  0.064  1.0

R-Sq = 42.8%   R-Sq(adj) = 39.1%

Fit: 5.9055, 7.0645
21.
Predictor    Coef        SE Coef     T      P      VIF
Constant     7.608       8.503       0.89   0.401
Accounts     -0.00457    0.02378     -0.19  0.853  25.965
Accounts**2  0.00003361  0.00000893  3.76   0.007  25.965

Analysis of Variance
Source          DF  SS     MS     F       P
Regression      2   51130  25565  165.95  0.000
Residual Error  7   1078   154
Total           9   52208
SE Coef  T      P      VIF
9.146    -3.36  0.006
1.049    4.01   0.002  2.019
8.412    2.08   0.059  2.019

Analysis of Variance
Source          DF  SS      MS      F      P
Regression      2   2777.0  1388.5  32.57  0.000
Residual Error  12  511.6   42.6
Total           14  3288.7
23.
Using the final model from problem 22 with H2S = 7.3 and Lactic = 1.85
Predicted Values for New Observations
New Obs  Fit    SE Fit  95% CI          95% PI
1        32.36  3.02    (25.78, 38.95)  (16.69, 48.04)

Since s_y·x's = 6.53 and t.025 = 2.179, a large-sample 95% prediction interval is:
32.36 ± 2.179(6.53) = (18.13, 46.59)
Notice the large sample 95% prediction interval is not too much different than the
actual 95% prediction interval (PI) above.
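The large-sample interval can be verified with a couple of lines of Python:

fit, s_yx, t_crit = 32.36, 6.53, 2.179
print(round(fit - t_crit * s_yx, 2), round(fit + t_crit * s_yx, 2))   # 18.13 46.59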
Although the fit in this case is relatively good, the standard error of the estimate is
somewhat large, so there is a fair amount of uncertainty associated with any forecast.
It may be a good idea to collect more data and, perhaps, investigate additional
predictor variables.
24.
TotRev     1.96
  T-Value  11.94
  P-Value  0.000
S          13.7
R-Sq       85.59
R-Sq(adj)  84.99
Results from stepwise program are not surprising given the definitions of the
variables and the strong (and in some cases perfect) multicollinearity.
c. The coefficient of TotRev from the stepwise program is 1.96 and the constant
is relatively small and, in fact, insignificant. Consequently, Franchise Value is,
on average, about twice Total Revenue.
SE Coef  T     P
4.138    4.56  0.000
0.1528   8.52  0.000

Source          MS      F      P
Regression      2101.7  72.56  0.000
Residual Error  29.0

Unusual Observations
Obs  PlayerCt  OpExpens
7    18.0      60.00
What questions do you think Judy will have for Ron? The students always seem
to come up with questions that Ms. Johnson will ask. The key is that Ron should be able
to answer them. Possible issues include:
Are all the predictor variables in the final model required? Is a simpler model
with fewer predictor variables feasible?
Do the estimated regression coefficients in the final model make sense and are
they reliable?
Four observations have large standardized residuals. Is this a cause for concern?
Is the final model a good one and can it be confidently used to forecast the
utility's bond interest rate at the time of issuance?
Is multiple regression the appropriate statistical method to use for this situation?
CASE 7-2: AAA WASHINGTON
1.
The multiple regression model that includes both unemployment rate and average
monthly temperature is shown below. Temperature is the only good predictor variable.
2.
Yes.
3.
Predictor  Coef    SE Coef  T      P
Constant   21405   1830     11.70  0.000
           -88.36  19.21    -4.60  0.000
Lag11Rate  756.3   172.0    4.40   0.000
Predictor   Coef     SE Coef  T      P
Constant    17060.2  847.0    20.14  0.000
Lg11Rate    635.4    146.5    4.34   0.000
NewTemp     -112.00  47.70    -2.35  0.023
NewTemp**2  7.592    1.657    4.58   0.000

Forecasts of Calls: 24010, 17424, 24861, 19205
The regression is significant. The R 2 of 78.1% looks good. The t statistic for each
of the predictor variables is large with a very small p-value. The VIFs are relatively
small for the three predictors indicating that multicollinearity is not a problem. The
residual plots shown in Figure 7-4 indicate that this model is valid. Dr. Hanke has
developed a good model to forecast ERA.
2.
The matrix plot below of ERA versus each of five potential predictor variables does
not show any obvious nonlinear relationships. There does not appear to be any
reason to develop a new model.
3.
The regression results with WHIP replacing OBA as a predictor variable follow.
The residual plots are very similar to those in Figure 7-4.
The regression equation is
ERA = - 2.81 + 4.43 WHIP + 0.101 CMD + 0.862 HR/9
Predictor  Coef     SE Coef  T      P      VIF
Constant   -2.8105  0.4873   -5.77  0.000
WHIP       4.4333   0.3135   14.14  0.000  1.959
CMD        0.10076  0.04254  2.37   0.019  1.793
HR/9       0.8623   0.1195   7.22   0.000  1.135

Analysis of Variance: P = 0.000
The fit and the adequacy of this model are virtually indistinguishable from the
corresponding model with OBA instead of WHIP as a predictor. The estimated
coefficients of CMD and HR/9 are nearly the same in both models. Both models are
good. The original model with OBA as a predictor has a slightly higher R2 and a
slightly smaller standard error of the estimate. Using these criteria, it is the preferred
model.
The project may not be doomed to failure. A lot can be learned from investigating the
influence of the various independent variables on WINS. However, the best regression model
does not explain a large percentage of the variation in WINS, R2 = 34%, so the experts have
a point. There will be a lot of uncertainty associated with any forecast of WINS. The stepwise
selection of the best predictor variables and the subsequent full regression output follow.
Stepwise Regression: WINS versus THROWS, ERA, ...
Alpha-to-Enter: 0.05 Alpha-to-Remove: 0.05
Response is WINS on 10 predictors, with N = 138
Step       1       2
Constant   20.531  5.543
ERA        -2.16   -2.01
  T-Value  -7.00   -6.80
  P-Value  0.000   0.000
RUNS               0.0182
  T-Value          3.86
  P-Value          0.000
S          3.33    3.17
R-Sq       26.51   33.83
R-Sq(adj)  25.97   32.85
CHAPTER 8
REGRESSION WITH TIME SERIES DATA
If not properly accounted for, serial correlation can lead to false inferences under the
usual regression assumptions. Regressions can be judged significant when, in fact,
they are not; coefficient standard errors can be under- (or over-) estimated, so individual
terms in the regression function may be judged significant (or insignificant) when they
are not (or are); and so forth.
2.
Serial correlation often arises naturally in time series data. Series, like employment,
whose magnitudes are naturally related to the seasons of the year will be autocorrelated.
Series, like sales, that arise because of a consistently applied mechanism, like advertising
or effort, will be related from one period to the next (serially correlated). In the analysis
of time series data, autocorrelated residuals arise because of a model specification error
or incorrect functional form: the autocorrelation in the series is not properly accounted
for.
3.
4.
Durbin-Watson statistic
5.
Reject H0 if DW < 1.10. Since 1.0 < 1.10, reject and conclude that the errors are
positively autocorrelated.
6.
Reject H0 if DW < 1.55, Do not reject H0 if DW > 1.62. Since 1.6 falls between 1.55
and 1.62, the test is inconclusive.
7.
8.
A predictor variable is generated by using the Y variable lagged one or more periods.
9.
Source          DF  SS      MS      F      P
Regression      2   223.39  111.69  21.29  0.000
Residual Error  13  68.19   5.25
Total           15  291.58

H1: ρ > 0
Using the .05 significance level for a sample size of 16 with 2 predictor variables,
dL = .98. Since DW = .61 < .98, reject H0 and conclude the observations are positively
serially correlated.
10.
SE Coef  T      P
59496    5.21   0.000
7240     3.37   0.007
97706    -1.98  0.076
47412    4.58   0.001
11.
With n = 14, k = 3 and α = .05, DW = 1.14 gives an indeterminate test for serial
correlation.
Serial correlation is not a problem. However, it is interesting to see whether the students
realize that collinearity is a likely problem since Customer and Charge are highly correlated.
Correlation matrix:
          Revenue  Use    Charge
Use       0.187
Charge    0.989    0.109
Customer  0.918    0.426  0.891
Analysis of Variance
Source          DF  SS     MS     F       P
Regression      2   76938  38469  774.66  0.000
Residual Error  25  1241   50
Total           27  78180

Durbin-Watson statistic = 1.82064 (cannot reject H0: ρ = 0 at any reasonable
significance level)
12.
Candidate predictor variables: Earnings, Dividend, Payout.
The best model, after taking account of the initial multicollinearity, uses the predictor
variables Earnings and Payout (ratio).
The regression equation is
Share = 4749 + 6651 Earnings + 171 Payout
Predictor  Coef    SE Coef  T     P      VIF
Constant   4749    5844     0.81  0.424
Earnings   6651    1546     4.30  0.000  1.002
Payout     171.40  50.49    3.39  0.002  1.002
S = 3922.16 R-Sq = 53.4% R-Sq(adj) = 49.7%
Analysis of Variance
Source          DF  SS         MS         F      P
Regression      2   440912859  220456429  14.33  0.000
Residual Error  25  384584454  15383378
Total           27  825497313
13.
a.
b. No. The residual autocorrelation function for the residuals from the straight line fit
indicates significant positive autocorrelation. The independent errors assumption
is not viable.
c. The fitted line plot with the natural logarithms of Passengers as the dependent variable
and the residual autocorrelation function follow.
The residual autocorrelation function looks a little better than that in part b,
but there is still significant positive autocorrelation at lag 1.
d. Exponential trend plot for Passengers follows along with residual autocorrelation
function.
=195.
Coef   SE Coef  T     P
16.61  27.99    0.59  0.563
8.801  1.020    8.63  0.000
30.02  41.67    0.72  0.484
Forecasts for the 3rd and 4th quarters can be done using several different
approaches. This is best left to the student, with a discussion of why they
used a particular method. One method is to average the past values
of Permits for the 1st and 2nd quarters and use these averages in the model.
This will result in forecasts: 3rd quarter 514; 4th quarter 235.
15.
Quarter  Sales  S2  S3  S4
1        16.3   0   0   0
2        17.7   1   0   0
3        28.1   0   1   0
4        34.3   0   0   1
Predictor  Coef    SE Coef  T      P
Constant   19.292  2.074    9.30   0.000
S2         -1.425  2.933    -0.49  0.630
S3         11.163  2.999    3.72   0.001
S4         33.254  2.999    11.09  0.000
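A Python sketch of fitting this quarterly dummy variable model follows; the sales data are hypothetical and simply stand in for the series used above.

import numpy as np

sales = np.array([16.3, 17.7, 28.1, 34.3, 18.2, 19.5, 30.0, 36.1])   # hypothetical two years
quarter = np.tile([1, 2, 3, 4], len(sales) // 4)
X = np.column_stack([np.ones(len(sales)),
                     (quarter == 2).astype(float),    # S2
                     (quarter == 3).astype(float),    # S3
                     (quarter == 4).astype(float)])   # S4
b, *_ = np.linalg.lstsq(X, sales, rcond=None)
print(b)   # intercept = quarter 1 mean; each dummy coefficient = that quarter's mean minus the quarter 1 mean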
17.
SE Coef  T     P
97.70    1.52  0.145
2.034    4.50  0.000

Source          DF  SS       MS       F      P
Regression      1   1164598  1164598  20.27  0.000
Residual Error  18  1034389  57466
Total           19  2198987
Durbin-Watson statistic = 1.1237
Here DiffSales = Y't = Yt - Yt-1 and DiffIncome = X't = Xt - Xt-1. The results
involving simple differences are close to the results obtained by the method of
generalized differences in Example 8.5. The estimated slope coefficient is 9.16
versus an estimated slope coefficient of 9.26 obtained with generalized differences.
The intercept coefficient 149 is also somewhat consistent with the intercept coefficient
54483(1 - .997) = 163 for the generalized differences procedure. We would expect the
two methods to produce similar results since the estimated autocorrelation coefficient
.997 is nearly 1.
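A short Python sketch of the simple-differencing calculation, with hypothetical data, follows.

import numpy as np

sales = np.array([120.0, 131.0, 145.0, 160.0, 172.0, 188.0])   # hypothetical Yt
income = np.array([10.0, 11.2, 12.6, 14.1, 15.3, 16.9])        # hypothetical Xt
diff_sales, diff_income = np.diff(sales), np.diff(income)      # Y't and X't
X = np.column_stack([np.ones(len(diff_income)), diff_income])
b, *_ = np.linalg.lstsq(X, diff_sales, rcond=None)
print(b)   # intercept and slope of the regression of DiffSales on DiffIncome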
18.
Source          DF  SS      MS     F     P
Regression      1   430.0   430.0  4.23  0.054   (1) Regression is not significant at the .01 level
Residual Error  18  1829.0  101.6
Total           19  2259.0

Durbin-Watson statistic = 0.4135. With α = .05, dL = 1.20, so positive
autocorrelation is indicated. The model can be improved by allowing for
autocorrelated observations (errors).
b. The regression equation is
Savings = - 3.14 + 0.0763 Income + 20.2 War Year
Predictor  Coef     SE Coef  T      P
Constant   -3.141   2.504    -1.25  0.227
Income     0.07632  0.01279  5.97   0.000
War Year   20.165   2.375    8.49   0.000   (1) Given Income, War Year makes a
                                            significant contribution at the .01 level.

S = 4.53134   R-Sq = 84.5%   R-Sq(adj) = 82.7%
Analysis of Variance
Source          DF  SS       MS      F      P
Regression      2   1909.94  954.97  46.51  0.000
Residual Error  17  349.06   20.53
Total           19  2259.00

Durbin-Watson statistic = 2.010   (2) No significant autocorrelation of any kind is indicated.
Using all the usual criteria for judging the adequacy of a regression model, this model
is much better than the simple linear regression model in part a.
19.
a.
The data are clearly seasonal with fourth quarter sales large and sales for the
remaining quarters relatively small. Seasonality is confirmed by the
autocorrelation function with significant autocorrelation at the seasonal
lag 4.
b. From the autocorrelation function observations 4 periods apart are highly
positively correlated. Therefore an autoregressive model with sales lagged 4
time periods as the predictor variable might be appropriate.
c. The regression equation is
Sales = 421 + 0.853 Lg4Sales
24 cases used, 4 cases contain missing values
Predictor  Coef     SE Coef  T     P
Constant   421.4    230.0    1.83  0.081
Lg4Sales   0.85273  0.09286  9.18  0.000
d. The forecasts for the four quarters are compared to the Value Line estimates of
2150, 2350, 2600, and 3400.
Forecasts are not bad but they are below the Value Line estimates for the
last 3 quarters and the difference becomes increasingly larger.
e. Value line estimates for the last 3 quarters of 2003-04 seem increasingly optimistic.
f. Model in part c can be improved by allowing for significant lag 1 residual
autocorrelation. One approach is to include sales lagged 1 quarter as an additional
predictor variable.
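A brief Python sketch of the lag-4 autoregressive forecast; the sales history shown is hypothetical, while the coefficients come from the fitted equation above.

sales = [1650, 1700, 1900, 2400, 1800, 1850, 2050, 2600]   # hypothetical quarterly sales
lag4_pairs = list(zip(sales[:-4], sales[4:]))              # (Lg4Sales, Sales) pairs used in the fit
forecast_next = 421.4 + 0.85273 * sales[-4]                # one-quarter-ahead forecast
print(lag4_pairs, round(forecast_next))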
20.
Income ChickPrice
0.932
0.957
0.986
PorkPrice
0.970
0.928
0.941
Step        1        2
Constant    28.86    37.72
Income      0.00970  0.01454
  T-Value   10.90    6.54
  P-Value   0.000    0.000
ChickPrice           -0.29
  T-Value            -2.34
  P-Value            0.030
S           2.58     2.34
R-Sq        84.98    88.21
R-Sq(adj)   84.27    87.03
c. There is high multicollinearity among the predictor variables so the final model
depends on which non-significant predictor variable is deleted first. If BeefPrice is
deleted, the final model is the one selected by stepwise regression (using a .05 level
for determining significance of individual terms) with significant lag 1 residual
autocorrelation. If Income is deleted first, then the final model involves the three
Price predictor variables as shown below. There is no significant residual
autocorrelation but large VIFs, although the coefficients of the predictor variables
have the right signs. In this data set, Income is essentially a proxy for the three
price variables.
The regression equation is
ChickConsum = 37.9 - 0.665 ChickPrice + 0.195 PorkPrice + 0.123 BeefPrice
Predictor   Coef     SE Coef  T      P      VIF
Constant    37.859   3.672    10.31  0.000
ChickPrice  -0.6646  0.1702   -3.90  0.001  17.649
PorkPrice   0.19516  0.05874  3.32   0.004  21.109
BeefPrice   0.12291  0.02625  4.68   0.000  9.011
Step       1       2
Constant   1.729   2.375
LnIncome   0.283   0.440
  T-Value  14.32   15.40
  P-Value  0.000   0.000
LnChickP           -0.445
  T-Value          -6.06
  P-Value          0.000
S          0.0528  0.0321
R-Sq       90.71   96.72
R-Sq(adj)  90.27   96.40
SE Coef  T      P      VIF
0.1344   17.67  0.000
0.02857  15.40  0.000  5.649
0.07342  -6.06  0.000  5.649
Analysis of Variance
Source          DF  SS       MS       F       P
Regression      2   0.61001  0.30500  295.30  0.000
Residual Error  20  0.02066  0.00103
Total           22  0.63067
Analysis of Variance
Source          DF  SS      MS     F     P
Regression      2   8.039   4.020  2.72  0.091
Residual Error  19  28.033  1.475
Total           21  36.073

Durbin-Watson statistic = 1.642
Very little explanatory power in the predictor variables. If the non-significant DiffIncome
is dropped from the model, the resulting regression is significant at the .05 level, R 2 is
virtually unchanged and the standard error of the estimate decreases slightly. The residual
plots look good and there is no evidence of autocorrelation. With the very low R 2, the fitted
function is not useful for forecasting the change (difference) in chicken consumption.
23. The regression equation is
ChickConsum = 1.94 + 0.975 LagChickC
22 cases used, 1 cases contain missing values
Predictor  Coef     SE Coef  T      P
Constant   1.945    1.823    1.07   0.299
LagChickC  0.97493  0.04687  20.80  0.000
S = 1.33349 R-Sq = 95.6% R-Sq(adj) = 95.4%
Analysis of Variance
Source          DF  SS      MS      F       P
Regression      1   769.45  769.45  432.71  0.000
Residual Error  20  35.56   1.78
Total           21  805.01
Yt - Yt-1 = (Xt - Xt-1) + (εt - εt-1) = at + εt - εt-1 = ut, say
Xt - Xt-1 = at
Here the independent error at has mean 0 and constant variance. So the first differences for
both Yt and Xt are stationary, and X and Y are cointegrated of order 1. The cointegrating
linear combination is: Yt - Xt = εt.
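A small Python simulation of this setup can be used in class to illustrate the point; the error variances below are arbitrary.

import numpy as np

rng = np.random.default_rng(0)
a = rng.normal(0, 1, 300)            # independent errors a_t
x = np.cumsum(a)                     # Xt = Xt-1 + a_t, a random walk (non-stationary)
eps = rng.normal(0, 1, 300)
y = x + eps                          # Yt = Xt + eps_t, also non-stationary
print(round(np.var(y), 1), round(np.var(y - x), 1))   # y wanders widely; y - x has variance near 1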
CASE 8-2: BUSINESS ACTIVITY INDEX FOR SPOKANE COUNTY
1.
2.
Would it have been better to eliminate multicollinearity first and then tackle
autocorrelation?
Answer: No. In order to solve the autocorrelation problem, the nature of the data was
changed (first differenced). If multicollinearity were solved first, one or more important
variables may have been eliminated. Autocorrelation must be accounted for first so the
usual regression assumptions apply; then multicollinearity can be tackled.
3.
4.
Should the regression done on the first differences have been through the origin?
Answer: Perhaps. An intercept can be included in the regression model and then
checked for significance. Ordinarily, a regression with first differenced data does
not require an intercept term.
5.
6.
What conclusions can be drawn from a comparison of the Spokane County business
activity index and the GNP?
Answer: The Spokane business activity seems to be extremely stable. It was not
affected by the national recessions of 1970 and 1974. The large peak in 1974 was
caused by Expo 74 (a world fair). It would be inappropriate in this case to expect
the Spokane economy to follow national patterns.
2.
3.
4.
Would another type of forecasting model be more effective for forecasting weekly sales?
Answer: Possibly! Jim will investigate Box-Jenkins ARIMA models in Chapter 9.
With the seasonal variation accounted for by sales lagged 12 months as the predictor variable,
R² is large (91%) and there is no residual autocorrelation. However, this model does not
include predictor variables directly under John's control, like price, so he would not be able
to determine how a change in price (or changes in other operational variables) might affect
future sales.
SE Coef  T      P
41.23    -7.09  0.000
0.3404   9.93   0.000
0.09740  3.80   0.000
0.02882  -2.28  0.026
SE Coef  T      P
26.96    -5.01  0.000
0.2421   10.37  0.000
8.443    -0.45  0.655
Predictor  Coef     SE Coef  T      P
S3         5.686    8.469    0.67   0.504
S4         -15.869  8.445    -1.88  0.064
S5         -21.146  8.441    -2.51  0.015
S6         -13.580  8.443    -1.61  0.112
S7         -20.641  8.441    -2.45  0.017
S8         -19.650  8.443    -2.33  0.023
S9         -25.857  8.441    -3.06  0.003
S10        -6.869   8.445    -0.81  0.419
S11        -19.014  8.448    -2.25  0.027
S12        -33.143  8.441    -3.93  0.000

Analysis of Variance
Source          DF  SS       MS      F      P
Regression      12  39111.7  3259.3  13.07  0.000
Residual Error  71  17704.7  249.4
Total           83  56816.3
          Forecast  Actual
Jan 1993  179       151
Feb 1993  175       152
Mar 1993  197       199
Forecasts for Jan and Feb 1993 are high compared to actual numbers of clients but
forecast for Mar 1993 is very close to the actual number of new clients
Autoregressive model:
Autoregressive models with number of new clients lagged 1, 4 and 12 months were
tried. None of these models proved to be useful for forecasting. The best model had number of
new clients lagged 1 month. The results are displayed below.
The regression equation is
Client = 61.4 + 0.487 LagClients
95 cases used, 1 cases contain missing values
Predictor   Coef     SE Coef  T     P
Constant    61.41    10.91    5.63  0.000
LagClients  0.48678  0.08796  5.53  0.000
The results for the best model are shown below (see also solution to Case 7-2). Each of
the independent variables is significantly different from 0 at the .05 level. The signs of
the coefficients are what we would expect them to be.
The regression equation is
Calls = 17060 + 635 Lg11Rate - 112 NewTemp + 7.59 NewTemp**2
Predictor   Coef     SE Coef  T      P
Constant    17060.2  847.0    20.14  0.000
Lg11Rate    635.4    146.5    4.34   0.000
NewTemp     -112.00  47.70    -2.35  0.023
NewTemp**2  7.592    1.657    4.58   0.000
S = 941.792 R-Sq = 75.0% R-Sq(adj) = 73.5%
Analysis of Variance
Source          DF  SS         MS        F      P
Regression      3   140771801  46923934  52.90  0.000
Residual Error  53  47009523   886972
Total           56  187781324
Serial correlation is not a problem. The value of the Durbin-Watson statistic (1.62)
would not reject the null hypothesis of no serial correlation. There are no
significant residual autocorrelations. Restricting attention to integer powers, 2 is the
best choice for the exponential transformation. Allowing other choices for powers,
e.g. 2.4, may improve the fit a bit but is not as nice as an integer power.
3.
The memo to Mr. DeCoria should use all the usual inferential and descriptive summaries
to defend the model in part 1. A residual analysis should also be included.
2.
Selling the final regression model to management, including the irascible Jackson
Tilson, ties the statistical exercise in the Alomega case to the real world of business
management. The idea of selling the statistical results to management can be
the focus of team presentations to the class with the instructor playing the role of
Tilson. Working through the presentation of results to the class adds an important
real world element to the statistical analysis.
3.
As noted in the case, the advertising predictor variables are under the control of
Alomega management. Students can demonstrate the usefulness of this result by
choosing reasonable future values for these advertising variables and generating forecasts.
However, students must recognize the regression equation does not necessarily
imply a cause and effect relationship between advertising expenditures and sales. In
addition, conditions under which the model was developed may change in the future.
4.
All forecasts, including the ones using Julie's regression equation, assume a future
that is identical to the past except for the identified predictor variables. If her
model is used to generate forecasts for Alomega, she should check the model
accuracy on a regular basis. The errors encountered as the future unfolds should
be compared to those in the data used to generate the model. If significant
changes or trends are observed, the model should be updated to include the most
recent data, along with possibly discarding some of the oldest data. Alternatively,
a different approach to the forecasting problem can be sought if the forecasting errors
suggest that the current regression model is inadequate.
The positive coefficient on November makes sense because cookie sales are seasonal:
sales are relatively high each year in November, the month before the Christmas holidays.
2.
James' model looks good. Almost 94% of the variation in cookie sales is explained
by the model. The residual analysis indicates the usual regression assumptions are
tenable, including the independence assumption.
3.
Forecasts:
June 2003       733,122
July 2003       799,823
August 2003     737,002
September 2003  1,562,070
October 2003    1,744,477
November 2003   2,152,463
December 2003   1,932,194

4.
SE Coef  T      P
91884    1.26   0.219
0.08732  10.88  0.000

Source          MS           F       P
Regression      7.03141E+12  118.35  0.000
Residual Error  59412957997
5.
June 2003       717,956
July 2003       632,126
August 2003     681,996
September 2003  1,642,130
October 2003    1,801,762
November 2003   2,113,392
December 2003   1,844,434
Both models fit the data well. Apart from July 2003, the forecasts generated by the
models are very close to one another. Dummy variable regression explains more of
the variation in cookie sales but the autoregression is simpler. Could make a case for
either model.
The regression results along with residual plots and the residual autocorrelation
function follow.
The regression equation is
Total Visits = 997 + 3.98 Time - 81.4 Sep + 5.3 Oct - 118 Nov - 149 Dec
- 24.2 Jan - 116 Feb + 23.8 Mar + 18.2 Apr - 30.5 May - 39.4 Jun
+ 35.2 Jul
Predictor  Coef     SE Coef  T      P
Constant   996.97   58.42    17.06  0.000
Time       3.9820   0.4444   8.96   0.000
Sep        -81.38   71.69    -1.14  0.259
Oct        5.34     71.67    0.07   0.941
Nov        -118.34  71.66    -1.65  0.102
Dec        -148.62  71.66    -2.07  0.041
Jan        -24.21   71.65    -0.34  0.736
Feb        -116.39  71.65    -1.62  0.107
Mar        23.80    73.55    0.32   0.747
Apr        18.15    73.53    0.25   0.806
May        -30.50   73.53    -0.41  0.679
Jun        -39.37   73.52    -0.54  0.593
Jul        35.20    73.51    0.48   0.633

Analysis of Variance
Source          DF   SS       MS      F     P
Regression      12   2353707  196142  8.07  0.000
Residual Error  101  2456198  24319
Total           113  4809905
Mary has a right to be disappointed. This regression model does not fit well. Even
allowing for seasonality, only the Dec seasonal dummy variable is significant at the
.05 level. The residual plots clearly show a poor fit in the middle of the series and
there is a considerable amount of significant residual autocorrelation.
2.
Mary might try an autoregression with different choices of lags of total visits
as predictor variable(s). She might try to fit a Box-Jenkins ARIMA model to
be discussed in Chapter 9. Regardless, finding an adequate model for this
time series will be challenging.
CHAPTER 9
a. 0 ± .196
b. Series is random
c. Series could be a stationary autoregressive process or series could be non-stationary.
Interpretation depends on how fast the autocorrelations decay to 0.
d. Seasonal series with period of 4
2.
t  Yt    Ŷt      et
1  32.5  35.000  -2.500
2  36.6  34.375  2.225
3  33.3  36.306  -3.006
4  31.9  33.581  -1.681
3.
a. Ŷ61 = 75.65   Ŷ62 = 84.04   Ŷ63 = 87.82
b. Ŷ62 = 76.55   Ŷ63 = 84.45
c. 75.65 ± 23.2
4.
a.
   Model  Autocorrelations  Partial Autocorrelations
   AR     die out           cut off
   MA     cut off           die out
   ARIMA  die out           die out

5.
a. MA(2)
b. AR(1)
c. ARIMA(1,0,1)
6.
Since Q = 44.3 > 19.675, reject H0 and conclude model is not adequate. Also,
there is a significant residual autocorrelation at lag 2. Add a MA term to the
model at lag 2 and fit an ARIMA(1,1,2) model.
7.
The least squares estimate of the constant term, .7127, is virtually the same as
the least squares slope coefficient in the straight line fit shown in part a. Also,
the first order moving average coefficient is essentially 1. These two results
are consistent with a straight line time trend regression model for the original data.
Suppose Yt is demand in time period t. The straight line time trend regression
model is: Yt = β0 + β1 t + εt. Thus Yt-1 = β0 + β1(t - 1) + εt-1 and
Yt - Yt-1 = β1 + εt - εt-1. The latter is an ARIMA(0,1,1) model with a constant
term (the slope coefficient in the straight line model) and a first order moving
average coefficient of 1.
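A quick Python simulation, with an arbitrary error standard deviation, illustrates the equivalence; the trend values 19.97 and .71 are taken from the straight line fit in part a.

import numpy as np

rng = np.random.default_rng(1)
t = np.arange(1, 201)
y = 19.97 + 0.71 * t + rng.normal(0, 1.0, size=t.size)   # straight line trend plus noise
w = np.diff(y)                                           # Yt - Yt-1 = .71 + eps_t - eps_{t-1}
print(round(w.mean(), 2))                                # close to .71, the ARIMA(0,1,1) constant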
There is some residual autocorrelation (particularly at lag 2) for both the straight
line fit and the ARIMA(0,1,1) fit, but the usual residual plots indicate no other
problems.
c. Prediction equations for period 53.
   Straight line model: Ŷ53 = 19.97 + .71(53)
   ARIMA model: Ŷ53 = Y52 + .71 - 1.00 e52
d. The forecasts for the next four periods from forecast origin t = 52 for the
ARIMA model follow.
8.
Since the autocorrelation coefficients drop off after one time lag and the partial
autocorrelation coefficients trail off, an MA(1) model should be adequate. The best
model is Ŷt = 56.1853 - (-0.7064) e(t-1). For example,
Ŷ126 = 56.1853 + 0.7064 e125 and Ŷ127 = 56.1853.
The critical 5% chi-square value for 10 df is 18.31. Since the calculated chi-square
Q for the residual autocorrelations equals 7.4, the model is deemed adequate.
The autocorrelation and partial autocorrelation plots for the original series follow.
Autocorrelation Function for Yt (plot output omitted)
Partial Autocorrelation Function for Yt (plot output omitted)
Forecast  95% Lower  95% Upper
52.3696   44.6754    60.0637
56.1853   46.7651    65.6054
56.1853   46.7651    65.6054
Since the autocorrelation coefficients trail off and the partial autocorrelation
coefficients cut off after one time lag, an AR(1) model should be adequate.
The best model is Ŷt = 109.628 - 0.9377 Yt-1. For forecast origin t = 80,
Ŷ81 = 109.628 - 0.9377 Y80
Autocorrelation and partial autocorrelation functions for the series (plots and
detailed output omitted). The estimated parameter t values are -19.17 and 179.57.
Number of observations: 80
Residuals: SS = 2325.19 (backforecasts excluded)
MS = 29.81 DF = 78
Modified Box-Pierce (Ljung-Box) Chi-Square statistic
Lag         12            24            36            48
Chi-Square  24.8 (DF=10)  39.4 (DF=22)  74.0 (DF=34)  83.9 (DF=46)
Period  Forecast  95% Lower  95% Upper
81      29.9234   19.2199    40.6269
82      81.5688   66.8957    96.2419
83      33.1408   15.7088    50.5728
The critical 5% chi-square value for 10 df's is 18.31. Since the calculated chi-square
Q for the residual autocorrelations equals 24.8, the model is deemed inadequate. An
examination of the individual residual autocorrelations suggests it might be possible to
improve the model by adding a MA term at lag 2.
10.
As can be seen below, the autocorrelations for the original series are slow to die out. This
behavior indicates the series may be non-stationary. The autocorrelations for the
differenced data cut off after lag 1 and the partial autocorrelations die out. This suggests
an ARIMA(0,1,1) model. When this model is fit (see the computer output below), there
are no significant residual autocorrelations and the residual plots look good. The
forecasting equation from the fitted model is
Ŷt = Yt-1 - (-0.3714) e(t-1)
Ŷ81 = Y80 - (-0.3714) e80 = 266.9 + 0.3714(3.4647) = 268.19
Autocorrelation functions for the original and differenced series (plot output omitted).
Type  Coef     StDev   T
MA 1  -0.3714  0.1052  -3.53
Forecast  95% Lower  95% Upper
268.741   245.848    291.635
268.741   229.885    307.597
268.741   218.787    318.695
The critical 5% chi-square value for 11 df's is 19.68. Since the calculated
chi-square Q for the residual autocorrelations equals 9.2, the model is deemed adequate.
11.
The slow decline in the early, non-seasonal lags indicates the need for regular
differencing.
Autocorrelation Function for Yt (plot output omitted)
Autocorrelation and partial autocorrelation functions for the differenced and
seasonally differenced data (plot output omitted). The fitted parameters have
t values of 10.09 and 9.85.
Forecast  95% Lower  95% Upper
163500    146991     180009
158300    141277     175322
177084    159562     194606
178792    160785     196798
188706    170227     207185
184846    165907     203785
191921    172532     211310
188746    168918     208574
185194    164936     205451
187669    166991     208348
188084    166993     209175
221521    200025     243016
The critical 5% chi-square value for 10 df's is 18.31. Since the calculated
chi-square Q for the residual autocorrelations equals 3, the model is deemed adequate.
12.
a. See part b.
b. The autocorrelation coefficient plot below indicates that the data are
non-stationary. Therefore, the data should be first differenced. The
autocorrelation coefficient and partial autocorrelation coefficient plots for
the first differenced data are also shown.
Autocorrelation and partial autocorrelation functions for the original and first
differenced data (plot output omitted).
Parameter estimate: T = 2.53, P = .015

Forecast  95% Lower  95% Upper
311.560   300.094    323.026
314.418   294.895    333.941
d. The residual plots look good and there are no significant residual autocorrelations.
13.
One question that might arise is should the student use the first 145 observations or
all 150 observations. With this many observations, it will not make much difference.
The autocorrelation function using all the data below is slow to die out and suggests
the DEF time series is non-stationary. Therefore, the differenced data should be investigated.
The autocorrelation coefficient and partial autocorrelation coefficient plots for the first
differenced data follow.
185
It appears that the autocorrelations for the differenced data cut off after lag one
and that the partial autocorrelations die out. This suggests a regular MA term in a model
for the differenced data so an ARIMA(0,1,1) model is identified. If 145 observations
are used, the forecasting equation from the fitted model is
Ŷt = Yt-1 - 0.7179 e(t-1)
Forecast  95% Lower  95% Upper  Actual
133.815   128.832    138.797    135.2
133.814   128.637    138.991    139.2
133.814   128.450    139.178    136.8
133.813   128.268    139.358    136.0
133.813   128.092    139.533    134.4
This model fits well. The usual residual analysis indicates no model inadequacies.
Comparing the forecasts with the actuals for the five days from forecast origin t = 145
using MAPE gives MAPE = 1.82%.
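The MAPE value can be checked with a few lines of Python using the forecasts and actuals in the table above.

forecasts = [133.815, 133.814, 133.814, 133.813, 133.813]
actuals = [135.2, 139.2, 136.8, 136.0, 134.4]
mape = 100 * sum(abs(a - f) / a for a, f in zip(actuals, forecasts)) / len(actuals)
print(round(mape, 2))   # about 1.82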
14.
The sample autocorrelation and partial autocorrelation functions below suggest an
AR(2) or, equivalently, an ARIMA(2,0,0) model. The computer output follows along
with the residual autocorrelation function.
Final Estimates of Parameters
Type      Coef     SE Coef  T       P
AR 1      1.4837   0.0732   20.26   0.000
AR 2      -0.7619  0.0729   -10.45  0.000
Constant  17.181   1.381    12.44   0.000
Mean      61.757   4.965
Number of observations: 90
Residuals: SS = 14914.5 (backforecasts excluded)
MS = 171.4 DF = 87
Modified Box-Pierce (Ljung-Box) Chi-Square statistic
Lag         12     24     36     48
Chi-Square  19.9   25.9   41.7   55.9
DF          9      21     33     45
P-Value     0.018  0.209  0.142  0.128
Forecasts from period 90
Period  Forecast  95% Lower  95% Upper
91      110.333   84.665     136.001
The forecast of 110 accidents for the 91st week seems reasonable given the history
of the series near that point.
There is no evidence of annual seasonality in these data but since there is less
than two years of weekly observations, seasonality, if it exists, would be virtually
impossible to detect.
15.
The time series plot that follows suggests the Price series is non-stationary. This
is corroborated by the autocorrelations which are slow to die out. The differenced
series should be investigated.
The autocorrelation function for the differenced data below suggests the
differenced series is random. The partial autocorrelation function for the
differenced data has a similar appearance.
An ARIMA(0,1,0) model is identified for the price of corn. For this model
a forecast of the next observation at forecast origin t is given by Yt +1 =Yt . Forecasts
two steps ahead are the same, similarly for three steps ahead and so forth. In other
words, this model produces flat line forecasts whose intercept is given by Yt .
So, forecasts of the price of corn for the next 12 months are all given by the last
observation or 251 cents per bushel.
16.
The variation in the Cavanaugh sales series increases with the level, so a log
transformation seems appropriate. Let Yt be the natural log of sales and
Date       ForecastLnSales  ForecastSales
Jun. 2000  5.76675          320
Jul. 2000  6.11484          453
Aug. 2000  6.40039          602
Sep. 2000  6.80928          906
Oct. 2000  7.09153          1202
Nov. 2000  7.14969          1274
Dec. 2000  6.85211          946
The residual autocorrelation at lag 2 can be ignored or, alternatively, one can fit the
ARIMA(0,0,2)(0,1,1)12 model.
17.
The variation in Disney sales increases with the level, so a log transformation
seems appropriate. Let Yt be the natural log of sales and Wt = Yt - Yt-4 be the
seasonally differenced series. Two ARIMA models that represent the data
reasonably well are given by the representations ARIMA(1,0,0)(0,1,1)4 and
ARIMA(0,1,1)(0,1,1)4. The former model contains a constant. The results for
the ARIMA(1,0,0)(0,1,1)4 process are displayed below.
Fitted model: Wt = .50 Wt-1 + .089 + εt - .49 εt-4
Final Estimates of Parameters
Type      Coef    SE Coef  T      P
AR 1      0.4991  0.1164   4.29   0.000
SMA 4     0.4863  0.1196   4.07   0.000
Constant  0.0886  0.0063   14.07  0.000
Differencing: 0 regular, 1 seasonal of order 4
Number of observations: Original series 63, after differencing 59
Forecasts:
Date     ForecastLnSales  ForecastSales
Q4 1995  8.25008          3828
Q1 1996  8.12423          3375
Q2 1996  8.11642          3349
Q3 1996  8.24372          3804
Q4 1996  8.43698          4615
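The sales forecasts are just the exponentiated log-scale forecasts, as the following Python sketch confirms.

import math
ln_forecasts = [8.25008, 8.12423, 8.11642, 8.24372, 8.43698]
print([round(math.exp(v)) for v in ln_forecasts])   # 3828, 3375, 3349, 3804, 4615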
18.
The data were transformed by taking natural logs; however, an ARIMA model
may be fit to the original observations. Let Yt be the natural log of demand
and let Wt = Yt - Yt-1 - Yt-12 + Yt-13 be the series after taking one seasonal
difference followed by a regular difference. An ARIMA(0,1,1)(0,1,1)12 model
represents the log demand series well. The results follow.
Fitted model: Wt = εt - .63 εt-1 - .57 εt-12 + (.63)(.57) εt-13
Final Estimates of Parameters
Type    Coef    SE Coef  T     P
MA 1    0.6309  0.0724   8.71  0.000
SMA 12  0.5735  0.0849   6.75  0.000
Differencing: 1 regular, 1 seasonal of order 12
Number of observations: Original series 129, after differencing 116
Forecasts:
Date       ForecastLnDemand  ForecastDemand
Oct. 1996  5.23761           188
Nov. 1996  5.29666           200
Dec. 1996  5.33704           208
19.
Lag         12     24     36     48
Chi-Square  13.2   19.3   26.2   52.6
DF          11     23     35     47
P-Value     0.280  0.681  0.858  0.266
20.
Forecast  95% Lower  95% Upper
73653.4   73166.1    74140.6
73448.7   72759.7    74137.8
72571.8   71727.9    73415.7
72904.3   71929.8    73878.7
73200.8   72111.4    74290.3
73711.5   72518.1    74905.0
74218.7   72929.6    75507.7
75021.6   73643.5    76399.7
75459.7   73998.0    76921.4
75114.5   73573.8    76655.3
74519.0   72903.0    76134.9
74681.4   72993.6    76369.2
The variation in Wal-Mart sales increases with the level, so a log transformation
seems appropriate. Let Yt be the natural log of sales and Wt = Yt - Yt-4 be the
seasonally differenced series. Examination of the autocorrelation function for Wt
leads to the identification of an ARIMA(0,1,0)(0,1,1)4 model.
The results follow.
Fitted model: Wt = εt - .52 εt-4
Final Estimates of Parameters
Type   Coef    SE Coef  T     P
SMA 4  0.5249  0.1185   4.43  0.000
Differencing: 1 regular, 1 seasonal of order 4
Number of observations: Original series 60, after differencing 55
Residuals:
21.
Forecasts
LnSales  Sales
11.1671  70,764
11.2514  76,988
11.2408  76,176
11.4233  91,427
11.2660  78,120
11.3503  84,991
11.3397  84,095
11.5223  100,942
Summary of model fit and forecasts for the next 5 years follow.
Final Estimates of Parameters
Type      Coef    SE Coef  T      P
AR 1      0.5486  0.0845   6.49   0.000
Constant  9.0295  0.6082   14.85  0.000
Mean      20.001  1.347

Number of observations: 100

Modified Box-Pierce (Ljung-Box) Chi-Square statistic
Lag         12     24     36     48
Chi-Square         23.2   30.7   45.9
DF          10     22     34     46
P-Value     0.358  0.388  0.631  0.477
22.
Forecast  95% Lower  95% Upper
21.6463   9.7236     33.5691
20.9037   7.3049     34.5026
20.4964   6.4323     34.5605
20.2729   6.0718     34.4741
20.1504   5.9082     34.3925
20.0831   5.8287     34.3376
Since the variation in the series increases with the level, a log transformation is indicated.
An examination of the autocorrelations and partial autocorrelations for LnGapSales leads
to the identification of an ARIMA(0,1,0)(0,1,1) 4 model. Summary of model fit and
forecasts for the next 8 quarters follow.
Final Estimates of Parameters
Type   Coef    SE Coef  T     P
SMA 4  0.2780  0.1004   2.77  0.007
Differencing: 1 regular, 1 seasonal of order 4
Number of observations: Original series 100, after differencing 95
Residuals:
23.
The long strings of 0s (no Influenza A positive cases) of uneven lengths might create
identification and fitting problems for ARIMA modeling. On the other hand, a simple
AR(1) model with an AR coefficient of about .8 and no constant term might provide
reasonable one week ahead forecasts for the number of positive cases. These forecasts
can be generated with the understanding that any non-integer forecast less than 1 is set
to 0 and any non-integer forecast greater than 1 is rounded to the closest integer.
Type      Coef
AR 1      0.5997
Constant  1921.7
Mean      4800.8
Forecast  95% Lower  95% Upper  Actual
3249.49   1251.36    5247.62    2431
3870.48   1540.58    6200.38    2796
4242.89   1804.68    6681.10    4432
4466.23   1990.23    6942.23    5714
Forecasts are too high for the first two weeks of January 1983 and too low for the next
two weeks. Note, however, that actual sales fall within the 95% prediction
interval limits for each of the four weeks.
4.
5.
The best model in Chapter 8 for the original Restaurant Sales data is an autoregressive
model with an added dummy variable to represent the period during the year when
Marquette University is in session. So, because of the additional dummy variable, this
model fits the data better than the AR(1) model in part 1. If the dummy variable were not
present, the two models would be the same. Consequently, we would expect better
forecasts with the AR + dummy variable model than with the simple AR model.
Regardless, however, if forecasts are compared to actuals from forecast origin 104 (last
week in 1982), the usual measures of forecast accuracy (RMSE, MAPE, etc.) are likely
to be relatively large since a large portion of the variation in sales is not accounted for
by the AR + dummy variable model.
At the very least the parameters in the AR(1) model should be re-estimated if the
new data are combined with the old data. A better approach is to combine the data
and then go through the usual ARIMA model building process again. It may be that the
combined data suggest the form of the ARIMA model has changed. In this case, an AR(1)
is still appropriate when the new data are combined with the old data.
Box-Jenkins ARIMA models account for the autocorrelation in the observed series using
possibly differenced data, lagged dependent variables and current and previous errors.
There are no potential causal (exogenous) independent variables in these models so they
are often difficult to explain to management. Best to demonstrate the results.
2.
Autocorrelation and partial autocorrelation plots for the regular and seasonally
differenced data suggest a non-seasonal AR(2) term (the partial autocorrelations cut
off after lag 2 and the autocorrelations die out). No seasonal MA or AR terms should
be included. However, here is a case where, say, the ARIMA(2,1,0)(0,1,0)12
model is more complex than necessary and a much simpler model works well. A time
series plot of the seasonally differenced Mr. Tux data is shown below along with the
sample autocorrelation function for these differences.
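Plots of this kind can be generated with a short Python sketch; the file and column names below are hypothetical placeholders for the Mr. Tux data.

    import pandas as pd
    import matplotlib.pyplot as plt
    from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

    # Hypothetical file and column names for the monthly Mr. Tux sales.
    sales = pd.read_csv("MrTux.csv")["Sales"]

    # One seasonal (lag 12) difference, and one regular plus one seasonal
    # difference, as discussed in the text.
    diff_12 = sales.diff(12).dropna()
    diff_1_12 = sales.diff(12).diff(1).dropna()

    fig, axes = plt.subplots(2, 2, figsize=(10, 6))
    plot_acf(diff_12, lags=24, ax=axes[0, 0], title="ACF: seasonal difference")
    plot_pacf(diff_12, lags=24, ax=axes[0, 1], title="PACF: seasonal difference")
    plot_acf(diff_1_12, lags=24, ax=axes[1, 0], title="ACF: regular + seasonal difference")
    plot_pacf(diff_1_12, lags=24, ax=axes[1, 1], title="PACF: regular + seasonal difference")
    plt.tight_layout()
    plt.show()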
3.
To fit the model W_t = Y_t - Y_(t-12) = θ_0 + ε_t to the Mr. Tux data, simply set θ_0 equal
to the mean of the seasonal differences. Here θ_0 = 32,174. Since the residuals from this model
differ from the seasonal differences by the constant θ_0 = 32,174, the residual
autocorrelation function will be identical to the autocorrelation function for the seasonal
differences shown in part 2. The forecasting equation is simply Y_t = 32,174 + Y_(t-12).
Setting t = 97 through t = 108, we have the forecasts for the 12 months of 2006:

Y_97 = 32,174 + 71,043 = 103,217
Y_98 = 185,104
Y_99 = 282,733
Y_100 = 441,741
Y_101 = 426,921
Y_102 = 305,048
Y_104 = 407,576
Y_105 = 227,583
Y_106 = 205,692
Y_107 = 213,876
Y_108 = 290,887

The sales forecasts for 2006 are obtained by adding 32,174 to the sales for
each of the 12 months of 2005.
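The drift estimate and the 2006 forecasts can be reproduced with a short Python sketch; the file and column names are hypothetical, and the series is assumed to end with December 2005.

    import numpy as np
    import pandas as pd

    # Hypothetical file and column names; 'sales' holds the monthly Mr. Tux
    # observations, assumed here to end with December 2005 (t = 96).
    sales = pd.read_csv("MrTux.csv")["Sales"].to_numpy()

    # Seasonal random walk with drift: W_t = Y_t - Y_(t-12) = theta_0 + e_t.
    # The natural estimate of theta_0 is the mean of the seasonal differences
    # (the text reports about 32,174 for these data).
    theta0 = (sales[12:] - sales[:-12]).mean()

    # The 2006 forecasts simply add the drift to the corresponding 2005 values.
    forecasts_2006 = sales[-12:] + theta0
    print(round(theta0), forecasts_2006.round())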
CASE 9-3: CONSUMER CREDIT COUNSELING
2.
The autocorrelation function plot below indicates that the data are non-stationary.
The autocorrelations are slow to die out. In addition, there is a spike at lag 12 and a
smaller spike at lag 24 indicating some seasonality.
The autocorrelation functions for the differenced series (DiffClients), the seasonally
differenced series (Diff12Clients) and the series with one regular and one seasonal
difference (DiffDiff12Clients) follow.
SE Coef   T      P
0.1055    4.37   0.000

Modified Box-Pierce (Ljung-Box) Chi-Square statistic
Lag          12      24      36      48
Chi-Square   10.9    20.3    30.8    37.4
DF           11      23      35      47
P-Value      0.452   0.623   0.669   0.842
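The chi-square statistics above can be reproduced from any fitted model's residuals. A Python sketch follows; the file name, column name, and model order are illustrative assumptions only, not the model actually identified in the case.

    import pandas as pd
    from statsmodels.tsa.statespace.sarimax import SARIMAX
    from statsmodels.stats.diagnostic import acorr_ljungbox

    # Hypothetical file/column names for the monthly number of new clients.
    clients = pd.read_csv("CCCClients.csv")["Clients"]

    # Illustrative model only -- substitute whatever ARIMA model was identified.
    fit = SARIMAX(clients, order=(0, 0, 0), seasonal_order=(0, 1, 1, 12)).fit(disp=False)

    # Chi-square statistics for groups of residual autocorrelations,
    # comparable to the Minitab output above.
    print(acorr_ljungbox(fit.resid, lags=[12, 24, 36, 48], return_df=True))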
Forecasts from period March 1993

Period     Forecast   95% Lower   95% Upper
Apr 1993   123.181    70.931      175.431
May 1993   122.960    70.710      175.210
Jun 1993   140.803    88.553      193.053
Jul 1993   150.944    98.694      203.194
Aug 1993   140.056    87.806      192.306
Sep 1993   134.285    82.035      186.535
Oct 1993   146.517    94.267      198.767
Nov 1993   146.953    94.703      199.203
Dec 1993   126.243    73.993      178.493
The forecast for 1961 using the AR(2) model is 1290. The revised error
measures are:

MAD = 114
MAPE = 7.1%

2.

The results from fitting an ARIMA(1,1,0) model, one-step-ahead forecasts, and
actuals follow.
Final Estimates of Parameters

Type    Coef     SE Coef   T      P
AR 1    0.4551   0.1408    3.23   0.002

Modified Box-Pierce (Ljung-Box) Chi-Square statistic
Lag          36      48
Chi-Square   25.3    *
DF           35      *
P-Value      0.885   *
Year    Actual   Forecast   Error
1949    1984     1905         79
1950    1787     2018       -231
1951    1689     1697         -8
1952    1866     1644        222
1953    1896     1947        -51
1954    1684     1910       -226
1955    1633     1588         45
1956    1657     1610         47
1957    1569     1668        -99
1958    1390     1529       -139
1959    1397     1309         88
1960    1289     1400       -111
MAD = 112 and MAPE = 6.9%, essentially the same as those for the AR(2) model. The choice
of one model over the other depends upon whether one believes the sales series is
non-stationary or nearly non-stationary.
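These error measures can be checked directly from the table above with a few lines of Python:

    import numpy as np

    # Actual and one-step-ahead forecast values from the table above (1949-1960).
    actual = np.array([1984, 1787, 1689, 1866, 1896, 1684,
                       1633, 1657, 1569, 1390, 1397, 1289])
    forecast = np.array([1905, 2018, 1697, 1644, 1947, 1910,
                         1588, 1610, 1668, 1529, 1309, 1400])

    error = actual - forecast
    mad = np.mean(np.abs(error))                  # about 112
    mape = 100 * np.mean(np.abs(error) / actual)  # about 6.9%
    print(round(mad, 1), round(mape, 1))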
3.
Fitted model:

Y_t = Y_(t-12) + 50.479 + ε_t - .792 ε_(t-12)
3.
Fitted model:
SE Coef   T       P
0.3022    -2.11   0.046

Modified Box-Pierce (Ljung-Box) Chi-Square statistic
Lag          24   36   48
Chi-Square   *    *    *
DF           *    *    *
P-Value      *    *    *
This model was suggested by an examination of the plots of the autocorrelation and
partial autocorrelation functions for the original series and the first-differenced series.
Another potential model is an ARIMA(1,0,0)(0,0,1)12 model. But if this model is fit to
the data, the estimate of the autoregressive parameter turns out to be very nearly 1,
confirming the choice of the initial ARIMA(0,1,0)(0,0,1)12 model.
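A sketch of this comparison in Python (statsmodels); the file and column names are placeholders, and the inclusion of a constant in the first candidate is an illustrative choice.

    import pandas as pd
    from statsmodels.tsa.statespace.sarimax import SARIMAX

    # Hypothetical file and column names for the monthly series discussed above.
    y = pd.read_csv("monthly_series.csv")["Sales"]

    # Candidate 1: stationary AR(1) plus a seasonal MA(1) term.
    fit1 = SARIMAX(y, order=(1, 0, 0), seasonal_order=(0, 0, 1, 12), trend="c").fit(disp=False)
    print(fit1.params)   # an AR(1) estimate near 1 points toward differencing instead

    # Candidate 2: one regular difference plus a seasonal MA(1) term.
    fit2 = SARIMAX(y, order=(0, 1, 0), seasonal_order=(0, 0, 1, 12)).fit(disp=False)
    print(fit2.summary())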
2.
The model in part 1 is adequate. There is no residual autocorrelation, and the residual plots
that follow look good.
3.
Period   Forecast   95% Lower   95% Upper
26        426280      242397      610163
27        492809      232759      752859
28        527275      208780      845770
29        535656      167890      903422
30        545614      134439      956789
31        692161      241741     1142580
32        554640       68131     1041149
33        494570      -25530     1014669
34        484265      -67384     1035914
35        471355     -110135     1052844
36        462995     -146876     1072867
37        491232     -145757     1128222
The pattern of the forecasts is reasonable but the forecast of the seasonal peak in
December (recall this series starts in June) is very likely to be much too low. The
actual December peak may be captured by the 95% prediction limits but, because of
the small sample size, these limits are wide. The lower prediction limit is even
negative for some lead times.
4.
The sample size in this case is small. With only two years of monthly data, it is
difficult to estimate the seasonality precisely. Although an ARIMA model
does provide some insights into the nature of this series, another modeling approach
may produce more readily acceptable forecasts.
SE Coef   T      P
0.1910    3.74   0.001

Modified Box-Pierce (Ljung-Box) Chi-Square statistic
Lag          12      24      36   48
Chi-Square   7.4     14.2    *    *
DF           11      23      *    *
P-Value      0.770   0.921   *    *
Cookie sales have a strong and quite consistent seasonal component but
little or no growth. Following the usual pattern of looking at autocorrelations
and partial autocorrelations for the original series and its various differences, the
best patterns for model identification appear to be those for the original series and
the seasonally differenced series. In either case, a seasonal moving average term of
order 12 is included in the model to accommodate seasonality and can be deleted if
it is not significant. Fitting an ARIMA(1,0,0)(0,0,1)12 model gives an estimated
autoregressive coefficient of about .9, suggesting perhaps a model with a regular
difference; that route, however, leads to residual autocorrelations and unattractive
forecasts, so this line of inquiry is not useful. The ARIMA model above involving the
seasonally differenced data fits well and, as we shall see, produces reasonable forecasts.
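A sketch of such a fit in Python follows; the file and column names are placeholders, and the order shown is one way to express a seasonal MA(1) term applied to the seasonally differenced data.

    import pandas as pd
    from statsmodels.tsa.statespace.sarimax import SARIMAX

    # Hypothetical file and column names for the monthly cookie sales.
    cookie_sales = pd.read_csv("CookieSales.csv")["Sales"]

    # Seasonal MA(1) on the seasonally differenced data: ARIMA(0,0,0)(0,1,1)12.
    fit = SARIMAX(cookie_sales, order=(0, 0, 0), seasonal_order=(0, 1, 1, 12)).fit(disp=False)

    fcst = fit.get_forecast(steps=12)
    print(fcst.predicted_mean)        # point forecasts for the next 12 months
    print(fcst.conf_int(alpha=0.05))  # 95% prediction limits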
2.
3.
The forecasts for the next 12 months follow. Judging from the time series plot,
they seem very reasonable.
Forecasts from period 41

Period   Forecast   95% Lower   95% Upper
42        627865      328983      926748
43        721336      422453     1020219
44        658579      359696      957461
45       1533503     1234620     1832386
46       1628889     1330007     1927772
47       2070440     1771557     2369323
48       1805503     1506620     2104385
49        778148      479265     1077031
50        534265      235382      833148
51        525169      226286      824052
52        697168      398285      996051
53        624876      325994      923759
Various plots follow. Given these plots, Mary's initial model seems reasonable.
2.
Modified Box-Pierce (Ljung-Box) Chi-Square statistic
Lag          12      24      36      48
Chi-Square   21.2    53.0    72.9    88.1
DF           10      22      34      46
P-Value      0.020   0.000   0.000   0.000
95% Limits
Lower      Upper
1223.70    1615.49
1205.16    1670.99
1121.28    1650.90
1083.27    1669.79
1140.30    1778.66
1088.12    1774.41
 999.88    1730.98
1069.83    1843.14
 917.79    1731.12
 877.70    1729.17
 998.71    1886.68
 888.71    1811.74
Collectively, the residual autocorrelations are larger than they would be for random
errors; however, they suggest no obvious additional terms to add to the ARIMA model.
Apart from the large residual at month 68, the residual plots look good. The forecasts
seem reasonable but the 95% prediction limits are fairly wide.
3.
Total visits for fiscal years 4, 5, and 6 seem somewhat removed from the rest of the data.
Total visits for these fiscal years are, as a group, somewhat larger than the remaining
observations. Did something unusual happen during these years? Were total visits
defined differently? This particular feature makes modeling difficult.
CHAPTER 10
JUDGMENTAL FORECASTING AND FORECAST ADJUSTMENTS
ANSWERS TO PROBLEMS AND CASES
1.
The Delphi method can be used in any forecasting situation where there is little or no
historical data and there is expert opinion (experience) available. Two examples might
be:
a.
Month   Averaged Forecast
1       4924.5
2       5976.0
3       6769.0
4       4708.0
5       4964.0
6       6102.0
7       8212.5
8       6178.5
9       4806.5
10      4228.5
b.
Month   Weighted Average Forecast
1       4721.4
2       5956.8
3       6731.2
4       4601.2
5       4991.6
6       6385.2
7       8362.2
8       6320.4
9       4596.8
10      4474.2
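The calculations behind parts a and b are simple and weighted averages of two sets of forecasts. A Python sketch follows with made-up forecast values and illustrative weights; the actual individual forecasts and weights used in the problem are not reproduced here.

    import numpy as np

    # Hypothetical forecasts from two different methods for the same five months.
    f1 = np.array([5200., 6100., 6800., 4700., 5000.])
    f2 = np.array([4700., 5900., 6700., 4600., 4900.])

    # a. Simple average of the two sets of forecasts.
    averaged = (f1 + f2) / 2

    # b. Weighted average; the weights (0.7, 0.3) are illustrative only and would
    #    normally reflect the relative accuracy of the two methods.
    weighted = 0.7 * f1 + 0.3 * f2
    print(averaged)
    print(weighted)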
The naïve forecasting model is not very accurate. The MSE equals
8,648,047,253.
2.
The MSE for the multiple regression model (from the regression output)
equals 2,097,765,646, which is quite a bit less than that of the naïve model.
3.
If the naïve approach had been more accurate, combining methods would have
been worth a try.
4.
If Julie did combine forecasts, she should use a weighted average that definitely
favored the multiple regression model.
These articles are more abundant than many realize. They also appear in more "popular"
journals, particularly financial markets titles such as Technical Analysis of Stocks &
Commodities.
The interested student with access to a neural network simulator should enjoy
this assignment. In addition to the "backpropagation" approach, students might
try radial basis functions and least mean squares if they are available.
3.
CHAPTER 11
MANAGING THE FORECASTING PROCESS
ANSWERS TO PROBLEMS AND CASES
1.
a. One response: Forecasts may not be right, but they improve the odds of being
close to right. More importantly, if there is no agreed-upon set of forecasts to
drive planning, then different groups may develop their own procedures to guide
planning, with potential chaos as the result.
b. One response: Analogy: "If you think education is expensive, try ignorance."
Having a good set of forecasts is like walking while looking ahead instead of at
your shoes. Planning without forecasts will lead to inefficient operations,
suboptimal returns on investment, poor customer service, and so forth.
c. One response: Good forecasts require not only good quantitative skills but also
an in-depth understanding of the business or, more generally, the
forecasting environment and, ultimately, good communication skills to sell the
forecasts to management.
This case invites students to think about how to use some of the forecasting techniques
discussed in Chapter 11. Guy Preston is trying to get his managers to think about the
long-range position of the company, as opposed to the short-range thinking that most
managers are involved in on a daily basis. The case might generate a class discussion
about the tendency of managers to shorten their planning horizons too much in the
daily press of business.

Guy has asked his managers to write scenarios for the future: a worst case, a status quo,
and a most likely scenario. His next task might be to discuss each of these three
possibilities and any differences of opinion that might emerge. This could then be
followed by a second round of written scenarios from each participant.
2.
The instructor should point out that the purpose of Guy's retreat is to expand the
planning horizon of his managers. He should be prepared to continue this effort after
the first round of written scenarios: it is quite possible that his team is still caught
up in the affairs of the day and not really engaged in long-range thinking. He should
encourage expanded thinking after the discussion phase and try to sustain it throughout
the day.
3.
There are two possible benefits from Guy's retreat. First, he may gain valuable insights
into the company's future to use in his own long-range thinking. Second, and
probably more important, his managers may come away with an increased
awareness of the importance of expanding their planning horizons. If this is true,
the company will probably be in a better position to face the future.
case ( = 0), we would expect Holt's procedure to fit and forecast better here.
Therefore, there is no reason to consider a combination of forecasts. Combining
forecasts is best considered when the sets of forecasts are produced by different
procedures.
2.
Jill should definitely update her historical data as new data points arrive. Since she
is using a computer program to do the forecasting, there would be very little effort
involved in this process. Why not update and re-run every quarter for a while?
3.
After the results for a few additional quarters (say 4) become available, the analysis
can be re-done to see if the current model is still viable. Model parameters can be
re-estimated after each new observation if appropriate computer software is available.
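If, as suggested in part 1, Holt's linear smoothing is the procedure in use, re-fitting with the combined data takes only a few lines. A Python sketch follows; the file and column names are hypothetical.

    import pandas as pd
    from statsmodels.tsa.holtwinters import Holt

    # Hypothetical file names: the original observations and the newly arrived
    # quarters are read, combined, and the model is simply re-fit.
    y_old = pd.read_csv("exports_original.csv")["Exports"]
    y_new = pd.read_csv("exports_new.csv")["Exports"]
    y = pd.concat([y_old, y_new], ignore_index=True)

    fit = Holt(y).fit()     # smoothing parameters re-estimated from all the data
    print(fit.params)
    print(fit.forecast(4))  # forecasts for the next four quarters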
4.
Box-Jenkins ARIMA methodology is not well suited for small sample sizes and
can be difficult to explain to a non-statistician.
This case illustrates the practical problems that are typically encountered when
attempting to forecast a time series in a business setting. Among the problems Jill
encounters are:
She chooses to forecast a national variable for which data values are available
in the Survey of Current Business. Will this variable correlate well with the
actual Y value of interest (her firm's export sales)?
Her initial sample size is only 13.
When she attempts to gather more data, she finds that the series underwent a
definition change during the recent past, resulting in inconsistent data. She must
shift her focus to another surrogate variable.
Her data plot indicates a bump in the data and she decides a more
consistent series would result if she dropped the first few data points.
A real-life forecasting project could very likely involve difficulties such as those
Jill encountered in this case, or perhaps even more. For this reason this case is a "good
read" for forecasting students as they finish their studies, since it shows that judgment
and skill must be involved in the forecasting effort: forecasting problems are not usually
as clean and straightforward as textbook problems.
end of each chapter instead of contrived data. We didn't know what would happen when we tried
to forecast this variable, but we think it turned out well because no one method was superior.
The case in Chapter 11 summarizes the different ways John used to forecast his monthly
sales, and asks students to comment on his efforts. We think a key point is that a lot of real data
sets do not lend themselves to accurate forecasting, and that continually trying different methods is
required. For the Mr. Tux data, there are fairly simple seasonal models (see the cases in Chapters
8 and 9) that represent the data well and provide reasonable forecasts.
What advice should we give to John Mosby for the future? Some suggestions to offer
might include:
1. Update the data set as future monthly values become available and re-run the
most promising analyses to see if the current forecasting model is still viable.
2. Consider combining forecasts from two different methods.
3. Try to develop a useful relationship between monthly sales and regional
economic variables. Perhaps the area unemployment rate or an economic activity
index would correlate well with John's sales. Perhaps some demographic
variables would correlate well. If several variables were collected over the
months of John's sales data, a good regression equation might result.
This would allow John to understand how his sales are tied to the local
environment.
CASE 11-5: ALOMEGA FOOD STORES
1.
Julie has to choose between two different methods of forecasting her company's
monthly sales. Students should review the results of these two efforts and decide
which offers the better choice. We find that class presentations by student teams
are valuable, as they move the analysis beyond the computer results to simulate
implementing these results in a real situation.
2.
3.
4.