Bus Analytics Chapter 3
Linear Regression
OBJECTIVE:
• Explain and demonstrate how linear regression is used to understand the relationship
between a dependent variable and an independent variable.
“The goal is to turn data into information, and information into insight.”
Managerial decisions are often based on the relationship between two or more variables.
Examples:
✓ After considering the relationship between advertising expenditures and sales, a
marketing manager might attempt to predict sales for a given level of advertising
expenditures.
✓ A public utility might use the relationship between the daily high temperature and the
demand for electricity to predict electricity usage on the basis of next month’s
anticipated daily high temperatures.
Regression Analysis
- is the study of relationships between variables
- a statistical procedure which can be used to develop an equation showing how the
variables are related
Example:
In analyzing the effect of advertising expenditures on sales, a marketing manager’s desire to
predict sales would suggest making sales the dependent variable. Advertising expenditure
would be the independent variable used to help predict sales.
Business Analytics 2nd Semester 2021-2022
Simple Regression - A regression analysis involving one independent variable and one
dependent variable
In statistical notation y denotes the dependent variable and x denotes the independent variable.
Linear Regression - A regression analysis for which any one unit change in the independent
variable, x, is assumed to result in the same change in the dependent variable, y.
In the Butler Trucking Company example, the population consists of all the driving
assignments that can be made by the company. For every driving assignment in the
population, there is a value of x (miles traveled) and a corresponding value of y (travel time in
hours).
The simple linear regression model is:
$$y = \beta_0 + \beta_1 x + \varepsilon$$
β0 and β1 are characteristics of the population and so are referred to as the parameters of
the model.
ε (the Greek letter epsilon) is a random variable referred to as the error term.
The error term accounts for the variability in y that cannot be explained by the linear
relationship between x and y.
A random variable is the outcome of a random experiment (such as the drawing of a random
sample) and so represents an uncertain outcome.
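To make the role of ε concrete, the sketch below simulates the model y = β0 + β1x + ε. The parameter values (β0 = 1.0, β1 = 0.07) and the normally distributed error with mean zero and standard deviation 0.5 are hypothetical assumptions chosen only for illustration; in practice the population parameters are unknown. Individual values of y vary from draw to draw because of ε, while their average settles near β0 + β1x.

```python
import random
import statistics

# Hypothetical population parameters (assumed for illustration only).
BETA0 = 1.0    # y-intercept
BETA1 = 0.07   # slope: hours added per mile
SIGMA = 0.5    # assumed standard deviation of the error term


def simulate_y(x: float, rng: random.Random) -> float:
    """Draw one value of y from the model y = beta0 + beta1*x + epsilon."""
    epsilon = rng.gauss(0.0, SIGMA)  # assumed normal error with mean zero
    return BETA0 + BETA1 * x + epsilon


rng = random.Random(42)  # fixed seed so the run is reproducible
draws = [simulate_y(75, rng) for _ in range(10_000)]

# Individual draws differ because of the random error term...
print(min(draws), max(draws))
# ...but their average is close to beta0 + beta1*75 = 6.25.
print(statistics.mean(draws))
```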
Regression Equation – is the equation that describes how the expected value of y, denoted
E(y|x), is related to x. The regression equation for simple linear regression follows:
$$E(y \mid x) = \beta_0 + \beta_1 x$$
Where:
E(y|x) is the expected value (mean) of y for a given value of x
β0 is the y-intercept of the regression line
β1 is the slope of the regression line
The graph of the simple linear regression equation is called the regression line.
The regression line in Panel A shows that the mean value of y is related positively to x, with
larger values of E(y|x) associated with larger values of x.
In Panel B, the mean value of y is related negatively to x, with smaller values of E(y|x)
associated with larger values of x.
In Panel C, the mean value of y is not related to x; that is, E(y|x) is the same for every value
of x.
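The regression equation E(y|x) = β0 + β1x can be evaluated directly. The sketch below, using hypothetical parameter values, checks two properties described above: a one-unit increase in x changes E(y|x) by the same amount, β1, wherever it occurs (the linearity assumption), and the sign of β1 determines whether the line looks like Panel A (positive), Panel B (negative), or Panel C (no relationship). The function names are our own.

```python
def expected_y(beta0: float, beta1: float, x: float) -> float:
    """Mean of y for a given x under the simple linear regression equation."""
    return beta0 + beta1 * x


def relationship(beta1: float) -> str:
    """Describe the regression line by the sign of its slope."""
    if beta1 > 0:
        return "positive (Panel A)"
    if beta1 < 0:
        return "negative (Panel B)"
    return "no relationship (Panel C)"


# Hypothetical parameters, assumed for illustration only.
beta0, beta1 = 2.0, 0.5

# The change in E(y|x) for a one-unit increase in x is beta1 at every x.
changes = [expected_y(beta0, beta1, x + 1) - expected_y(beta0, beta1, x)
           for x in (0, 10, 100)]
print(changes)  # → [0.5, 0.5, 0.5]

print(relationship(0.5), relationship(-0.5), relationship(0.0), sep="; ")
```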
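Because the values of the population parameters β0 and β1 are not known in practice, they are
estimated with the sample statistics b0 and b1. Substituting b0 and b1 for β0 and β1 in the
regression equation, we obtain the estimated regression equation.
The estimated regression equation for simple linear regression follows:
$$\hat{y} = b_0 + b_1 x$$
Where:
ŷ is the point estimator of E(y|x)
b0 is the y-intercept of the estimated regression line
b1 is the slope of the estimated regression line
Estimated Regression Line - The graph of the estimated simple linear regression equation.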
The Estimation Process in Simple Linear Regression
Example:
To estimate the mean or expected value of travel time for a driving assignment of 75 miles,
Butler Trucking would substitute the value of 75 for x in the estimated regression equation.
In some cases, however, Butler Trucking may be more interested in predicting travel time for
an upcoming driving assignment of a particular length.
For example, suppose Butler Trucking would like to predict travel time for a new 75-mile
driving assignment the company is considering. To predict travel time for this assignment,
Butler Trucking would also substitute the value of 75 for x in the estimated regression
equation. The resulting value of ŷ therefore provides both a point estimate of E(y|x) for a
given value of x and a prediction of an individual value of y for a given value of x.
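Both uses of the estimated regression equation come down to the same calculation, ŷ = b0 + b1x with x = 75. The sketch below assumes hypothetical coefficients b0 = 1.0 and b1 = 0.07 because the fitted Butler Trucking coefficients are not stated in this section; substitute the actual estimates when they are available.

```python
def y_hat(b0: float, b1: float, x: float) -> float:
    """Estimated regression equation: y-hat = b0 + b1 * x."""
    return b0 + b1 * x


# Hypothetical sample estimates (NOT the actual Butler Trucking results).
b0, b1 = 1.0, 0.07

miles = 75
estimate = y_hat(b0, b1, miles)

# The same number serves two purposes:
# 1) a point estimate of the mean travel time E(y|x) for all 75-mile assignments;
# 2) a prediction of the travel time for one new 75-mile assignment.
print(f"Estimated travel time for {miles} miles: {estimate:.2f} hours")
```

With these assumed coefficients the printed estimate is 6.25 hours; only the interpretation differs between the two uses, not the arithmetic.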
We see that driving assignment 1, with x1 = 100 and y1 = 9.3, is a driving assignment of 100
miles and a travel time of 9.3 hours.
Driving assignment 2, with x2 = 50 and y2 = 4.8, is a driving assignment of 50 miles and a travel
time of 4.8 hours.
The shortest travel time is for driving assignment 5, which requires 50 miles with a travel time
of 4.2 hours.
Miles traveled is shown on the horizontal axis, and travel time (in hours) is shown on the vertical axis.
Scatter charts for regression analysis are constructed with the independent variable x on the horizontal
axis and the dependent variable y on the vertical axis.
The scatter chart enables us to observe the data graphically and to draw preliminary
conclusions about the possible relationship between the variables.
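The scatter chart suggests a positive linear relationship between miles traveled and travel
time: longer driving assignments tend to have longer travel times. We therefore choose the
simple linear regression model to represent this relationship. Given that choice, our next task
is to use the sample data in the table above to determine the values of b0 and b1 in the
estimated simple linear regression equation.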
For the ith driving assignment, the estimated regression equation provides
$$\hat{y}_i = b_0 + b_1 x_i$$
where:
ŷi = predicted travel time (in hours) for the ith driving assignment
b0 = the y-intercept of the estimated regression line
b1 = the slope of the estimated regression line
xi = miles traveled for the ith driving assignment
With yi denoting the observed (actual) travel time for driving assignment i and ŷi
representing the predicted travel time for driving assignment i, every driving assignment in the
sample will have an observed travel time yi and a predicted travel time ŷi.
For the estimated regression line to provide a good fit to the data, the differences
between the observed travel times yi and the predicted travel times ŷi should be small.
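The differences yi − ŷi are called residuals. The sketch below computes them for the three driving assignments cited above (assignments 1, 2, and 5), using hypothetical coefficients b0 = 1.0 and b1 = 0.07 rather than fitted values, and adds up their squares, the quantity the least squares method works with.

```python
# Observed (miles, hours) pairs for the assignments cited in the text.
observations = {1: (100, 9.3), 2: (50, 4.8), 5: (50, 4.2)}

# Hypothetical coefficients, assumed for illustration only.
b0, b1 = 1.0, 0.07

residuals = {}
for i, (x_i, y_i) in observations.items():
    y_hat_i = b0 + b1 * x_i        # predicted travel time for assignment i
    residuals[i] = y_i - y_hat_i   # observed minus predicted

for i, e_i in residuals.items():
    print(f"assignment {i}: residual = {e_i:+.2f} hours")

# Sum of squared residuals for these three observations.
sum_sq = sum(e_i ** 2 for e_i in residuals.values())
print(f"sum of squared residuals = {sum_sq:.2f}")
```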
The least squares method uses the sample data to provide the values of b0 and b1 that
minimize the sum of the squares of the deviations between the observed values of the
dependent variable yi and the predicted values of the dependent variable ŷi.
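The criterion for the least squares method is given by the following equation:
$$\min \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \min \sum_{i=1}^{n} (y_i - b_0 - b_1 x_i)^2$$
where n is the number of driving assignments in the sample.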
This is known as the least squares method for estimating the regression equation.
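Setting the partial derivatives of the sum of squared deviations with respect to b0 and b1 equal to zero gives the standard closed-form solution b1 = Σ(xi − x̄)(yi − ȳ) / Σ(xi − x̄)² and b0 = ȳ − b1x̄, where x̄ and ȳ are the sample means. The sketch below implements these formulas in plain Python. The data are hypothetical (miles, hours) pairs, not the Butler Trucking sample, and the function names least_squares and sse are our own.

```python
from statistics import mean


def least_squares(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Return (b0, b1) minimizing sum((y_i - (b0 + b1 * x_i)) ** 2)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired observations")
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    b1 = sxy / sxx             # slope
    b0 = y_bar - b1 * x_bar    # intercept: the line passes through (x_bar, y_bar)
    return b0, b1


def sse(xs: list[float], ys: list[float], b0: float, b1: float) -> float:
    """Sum of squared deviations between observed and predicted y."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical sample: miles traveled and travel time in hours.
miles = [50, 60, 70, 80, 90, 100]
hours = [4.5, 5.1, 5.9, 6.4, 7.2, 7.9]

b0, b1 = least_squares(miles, hours)
print(f"y-hat = {b0:.4f} + {b1:.4f}x")

# Shifting the fitted line up by 0.1 hours increases the sum of squared deviations.
print(sse(miles, hours, b0, b1) < sse(miles, hours, b0 + 0.1, b1))
```

For this made-up sample the slope works out to 119 / 1750 = 0.068 hours per mile. Any other choice of b0 and b1 gives a larger sum of squared deviations, which is exactly the least squares criterion.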