Statistical Modeling: Dependent Variable and Independent Variables
In statistical modeling, regression analysis is a set of statistical processes for estimating the relationships among variables. It
includes many techniques for modeling and analyzing several variables, when the focus is on the relationship between a dependent
variable and one or more independent variables (or 'predictors'). More specifically, regression analysis helps one understand how
the typical value of the dependent variable (or 'criterion variable') changes when any one of the independent variables is varied,
while the other independent variables are held fixed.
The linear regression technique works with any two variables. But in forecasting, one of your variables is time and the other is the
variable for which you need the forecast. For example, for a sales forecast, assume that at the end of month one your sales were at
12,000 units. At the end of months two, three and four, sales were at 14,000, 15,000 and 17,000. The following example uses linear
regression to forecast sales for months five and six.
Define the month number as x and your monthly sales in thousands as y. In the example, your data points (x,y) are (1,12), (2,14),
(3,15) and (4,17). The first step is to total all the x values and all the y values and find the average of each. For the example, define
your total sales in thousands as Yt, which equals 58. Define the sum of the month numbers as Xt, equal to 10. The average sales, called
Ya, is 58/4 = 14.5. The average month number, called Xa, is 10/4 = 2.5.
Calculate the squares of each x value, the total of the squares of x, the products of each x and y value pair and the total of the
products. For the example, the squares of the x values are 1, 4, 9, and 16, and their sum is 30. Call this total X2t. The products of
each x and y value pair are 1 x 12, 2 x 14, 3 x 15 and 4 x 17. The results are 12, 28, 45, 68 and the sum is 153. Call this value XYt.
To find b and c in the equation y = bx + c, first calculate Sxx: the sum of the squares of x (X2t = 30) minus the square of the sum
of the x values (Xt² = 100) divided by the number of data points, which is four. Sxx = 30 - 100/4 = 5. Then calculate Sxy: the
sum of the products of x and y (XYt = 153) minus the sum of the x values (Xt = 10) times the sum of the y values (Yt = 58) divided by
the number of data points. Sxy = 153 - 580/4 = 8.
The constant b = Sxy/Sxx = 1.6. The constant c = the average of y, Ya = 14.5, minus b times the average of x, Xa = 2.5; c = 14.5 - 2.5 x
1.6 = 10.5. The equation for your sales forecast in thousands is y = 1.6x + 10.5. The sales forecast for month 5 is 1.6 times 5 plus 10.5
= 18.5 and the sales forecast for month 6 is 1.6 times 6 plus 10.5 = 20.1. Your sales under present trends will be 18,500 and 20,100 in
months five and six.
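The same least-squares calculation can be written as a short Python sketch; the variable names simply mirror the totals defined above, and the data are the four months from the example.

```python
# Least-squares trend line for the sales example above.
# x = month number, y = sales in thousands of units.
x = [1, 2, 3, 4]
y = [12, 14, 15, 17]

n = len(x)
xt = sum(x)                                  # Xt  = 10
yt = sum(y)                                  # Yt  = 58
x2t = sum(xi * xi for xi in x)               # X2t = 30
xyt = sum(xi * yi for xi, yi in zip(x, y))   # XYt = 153

sxx = x2t - xt ** 2 / n                      # 30 - 100/4 = 5
sxy = xyt - xt * yt / n                      # 153 - 580/4 = 8

b = sxy / sxx                                # slope = 1.6
c = yt / n - b * (xt / n)                    # intercept = 14.5 - 1.6 * 2.5 = 10.5

for month in (5, 6):
    print(f"Month {month} forecast: {b * month + c:.1f} thousand units")
# Month 5 forecast: 18.5 thousand units
# Month 6 forecast: 20.1 thousand units
```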
Delphi Method
First, the group facilitator selects a group of experts based on the topic being examined. Once all participants are confirmed, each
member of the group is sent a questionnaire with the instructions to comment on each topic based on their personal opinion,
experience or previous research. The questionnaires are returned to the facilitator who groups the comments and prepares copies
of the information. A copy of the compiled comments is sent to each participant, along with the opportunity to comment further.
At the end of each comment session, all questionnaires are returned to the facilitator who decides if another round is necessary or if
the results are ready for publishing. The questionnaire rounds can be repeated as many times as necessary to achieve a general
sense of consensus.
A time series is a sequence of data points, measured typically at successive points in time spaced at uniform time
intervals. Examples of time series are the daily closing value of the Dow Jones Industrial Average and the annual flow
volume of the Nile River at Aswan. Time series are used in statistics, signal processing, pattern
recognition, econometrics, mathematical finance, weather forecasting, earthquake prediction, electroencephalography,
control engineering, astronomy, and communications engineering. Clearly, the application of time series forecasting
and analysis spans multiple domains and businesses.
Time series forecasting methods produce forecasts based solely on historical values, and they are widely used in business
situations where forecasts of a year or less are required. These methods are particularly suited to sales, marketing,
finance, and production planning, and they have the advantage of relative simplicity, but certain factors need to be
considered:
Time series methods are better suited for short-term forecasts (i.e., less than a year).
Time series forecasting relies on sufficient past data being available, and on that data being of high quality and truly representative.
Time series methods are best suited to relatively stable situations. Where substantial fluctuations are common and underlying
conditions are subject to extreme change, time series methods may give relatively poor results.
Averaging methods
If a time series is generated by a constant process subject to random error, then the mean is a useful
statistic and can be used as a forecast for the next period.
Averaging methods are suitable for stationary time series data, where the series is in equilibrium around
a constant value (the underlying mean) with a constant variance over time. A short code sketch of the averaging and smoothing methods follows the overview below.
The simplest exponential smoothing method is single exponential smoothing (SES), in which only one
parameter needs to be estimated.
Holt’s method makes use of two different parameters and allows forecasting for series with trend.
Holt-Winters’ method involves three smoothing parameters to smooth the data, the trend, and the
seasonal index.
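To make these distinctions concrete, here is a minimal plain-Python sketch (no forecasting library assumed) of the mean forecast for a stationary series, single exponential smoothing with one parameter α, and Holt's method with two parameters (α for the level, β for the trend). The series and parameter values are illustrative only, and the seasonal Holt-Winters' variant is omitted for brevity.

```python
def mean_forecast(y):
    """Forecast for the next period of a stationary series: the sample mean."""
    return sum(y) / len(y)

def ses_forecast(y, alpha):
    """Single exponential smoothing (one parameter); forecast for the next period."""
    f = y[0]                          # initialise with the first observation
    for obs in y[1:]:
        f = alpha * obs + (1 - alpha) * f
    return f

def holt_forecast(y, alpha, beta, h=1):
    """Holt's linear method (two parameters): smooths a level and a trend."""
    level = y[0]
    trend = y[1] - y[0]               # simple initial trend estimate
    for obs in y[1:]:
        prev_level = level
        level = alpha * obs + (1 - alpha) * (prev_level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
    return level + h * trend          # forecast h periods ahead

series = [12, 14, 15, 17]             # the monthly sales example above, in thousands
print(mean_forecast(series))          # 14.5
print(ses_forecast(series, alpha=0.4))
print(holt_forecast(series, alpha=0.5, beta=0.3))
```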
Moving Average
A large k is desirable when there are wide, infrequent fluctuations in the series.
A small k is most desirable when there are sudden shifts in the level of the series.
For quarterly data, a four-quarter moving average, MA(4), eliminates or averages out seasonal effects.
The forecast for the next period is the average of the k most recent observations:

$$F_{t+1} = \hat{y}_{t+1} = \frac{y_t + y_{t-1} + y_{t-2} + \cdots + y_{t-k+1}}{k} = \frac{1}{k}\sum_{i=t-k+1}^{t} y_i$$

where k is the number of terms in the moving average.
The moving average model does not handle trend or seasonality very well, although it can do better than the total mean.
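Expressed in code, the moving average forecast for the next period is simply the mean of the last k observations. The function below is a minimal sketch; the quarterly series shown is illustrative only.

```python
def moving_average_forecast(y, k):
    """MA(k) forecast for the next period: the mean of the last k observations."""
    if len(y) < k:
        raise ValueError("need at least k observations")
    return sum(y[-k:]) / k

quarterly = [23, 40, 25, 27, 32, 48, 33, 37]    # illustrative quarterly data
print(moving_average_forecast(quarterly, k=4))  # MA(4) averages out the seasonal effect: 37.5
```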
The weekly sales figures (in millions of dollars) presented in the following table are used by a major department
store to determine the need for temporary sales personnel.
Exponential Smoothing Methods
This method provides an exponentially weighted moving average of all previously observed values.
The aim is to estimate the current level and use it as a forecast of future value.
α = smoothing constant.
The forecast Ft+1 is based on weighting the most recent observation yt with a weight α and weighting the
most recent forecast Ft with a weight of 1 - α.
The implication of exponential smoothing can be better seen if the previous equation is expanded by replacing Ft
with its components as follows:

$$F_{t+1} = \alpha y_t + (1-\alpha)F_t$$
$$= \alpha y_t + (1-\alpha)\left[\alpha y_{t-1} + (1-\alpha)F_{t-1}\right]$$
$$= \alpha y_t + \alpha(1-\alpha)y_{t-1} + (1-\alpha)^2 F_{t-1}$$

If this substitution process is repeated by replacing Ft-1 by its components, Ft-2 by its components, and so
on, the result is:

$$F_{t+1} = \alpha y_t + \alpha(1-\alpha)y_{t-1} + \alpha(1-\alpha)^2 y_{t-2} + \alpha(1-\alpha)^3 y_{t-3} + \cdots + (1-\alpha)^{t-1} y_1$$
Therefore, Ft+1 is the weighted moving average of all past observations.
The following table shows the weights assigned to past observations for α = 0.2, 0.4, 0.6, 0.8, and 0.9.
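The weights themselves are easy to generate: the observation j periods in the past receives weight α(1 - α)^j, so a larger α concentrates the weight on the most recent observations. A small illustrative sketch:

```python
# Weight given to the observation j periods in the past: alpha * (1 - alpha) ** j
for alpha in (0.2, 0.4, 0.6, 0.8, 0.9):
    weights = [round(alpha * (1 - alpha) ** j, 4) for j in range(5)]
    print(alpha, weights)
```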
The exponential smoothing equation rewritten in the following form elucidates the role of the weighting factor α:

$$F_{t+1} = F_t + \alpha(y_t - F_t)$$

The exponential smoothing forecast is the old forecast plus an adjustment for the error that occurred in the last
forecast.
The value of the smoothing constant α must be between 0 and 1.
If stable predictions with smoothed random variation are desired, then a small value of α is desirable.
If a rapid response to a real change in the pattern of observations is desired, a large value of α is appropriate.
To estimate α, forecasts are computed for α equal to 0.1, 0.2, 0.3, …, 0.9, and the sum of squared forecast errors is
computed for each.
The value of α with the smallest RMSE is chosen for use in producing the future forecasts.
Use the average of the first five or six observations for the initial smoothed value.
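The points above can be combined into a short sketch: single exponential smoothing in its error-correction form, initialised with the average of the first five observations, with a grid search over α = 0.1, …, 0.9 selecting the value with the smallest RMSE. The data are the weekly sales figures from the table below; the function names are illustrative.

```python
import math

def ses_one_step(y, alpha, n_init=5):
    """One-step-ahead SES forecasts using the error-correction form
    F(t+1) = F(t) + alpha * (y(t) - F(t))."""
    f = sum(y[:n_init]) / n_init      # initial smoothed value: mean of the first n_init observations
    fitted = []
    for obs in y:
        fitted.append(f)              # forecast for this period, made before observing it
        f = f + alpha * (obs - f)     # adjust the forecast by a fraction of the error
    return fitted, f                  # in-sample forecasts and the forecast for the next period

def rmse(actual, predicted):
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

weekly_sales = [5.3, 4.4, 5.4, 5.8, 5.6, 4.8, 5.6, 5.6, 5.4, 6.5,
                5.1, 5.8, 5.0, 6.2, 5.6, 6.7, 5.2, 5.5, 5.8, 5.1,
                5.8, 6.7, 5.2, 6.0, 5.8]

best_alpha = min((a / 10 for a in range(1, 10)),
                 key=lambda a: rmse(weekly_sales, ses_one_step(weekly_sales, a)[0]))
print("alpha with smallest RMSE:", best_alpha)
print("forecast for next week:", round(ses_one_step(weekly_sales, best_alpha)[1], 2))
```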
Period (t)   Sales (y)
1            5.3
2            4.4
3            5.4
4            5.8
5            5.6
6            4.8
7            5.6
8            5.6
9            5.4
10           6.5
11           5.1
12           5.8
13           5.0
14           6.2
15           5.6
16           6.7
17           5.2
18           5.5
19           5.8
20           5.1
21           5.8
22           6.7
23           5.2
24           6.0
25           5.8

[Figure: "Weekly Sales" line chart of sales (millions of dollars) versus week.]
Use a three-week moving average (k = 3) of the department store sales to forecast sales for weeks 24 and 26.
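As a sketch of the requested calculation, the MA(3) forecast for week 24 averages the sales of weeks 21-23, and the forecast for week 26 averages weeks 23-25:

```python
sales = {1: 5.3, 2: 4.4, 3: 5.4, 4: 5.8, 5: 5.6, 6: 4.8, 7: 5.6, 8: 5.6,
         9: 5.4, 10: 6.5, 11: 5.1, 12: 5.8, 13: 5.0, 14: 6.2, 15: 5.6,
         16: 6.7, 17: 5.2, 18: 5.5, 19: 5.8, 20: 5.1, 21: 5.8, 22: 6.7,
         23: 5.2, 24: 6.0, 25: 5.8}

def ma_forecast(week, k=3):
    """MA(k) forecast for the given week: the mean of the k preceding weeks' sales."""
    window = [sales[w] for w in range(week - k, week)]
    return sum(window) / k

print(round(ma_forecast(24), 2))   # (5.8 + 6.7 + 5.2) / 3 = 5.9
print(round(ma_forecast(26), 2))   # (5.2 + 6.0 + 5.8) / 3 ≈ 5.67
```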