Bus Analytics Chapter 3


Business Analytics 2nd Semester 2021-2022

Linear Regression
OBJECTIVE:
• Explain and demonstrate how linear regression is used to understand the relationship
between a dependent variable and an independent variable.

TIME FRAME: 7.5 hours

“The goal is to turn data into information, and information into insight.”

- Carly Fiorina, former CEO of Hewlett-Packard

Managerial decisions are often based on the relationship between two or more variables.
Examples:
✓ after considering the relationship between advertising expenditures and sales, a
marketing manager might attempt to predict sales for a given level of advertising
expenditures
✓ a public utility might use the relationship between the daily high temperature and the
demand for electricity to predict electricity usage on the basis of next month’s
anticipated daily high temperatures

Regression Analysis
- is the study of relationships between variables
- a statistical procedure which can be used to develop an equation showing how the
variables are related

Dependent Variable (response or target) - the variable being predicted
- the single variable being explained by the regression

Independent Variables (predictor or explanatory variables) - the variables being used to
predict the value of the dependent variable
- used to explain the dependent variable

Example:
In analyzing the effect of advertising expenditures on sales, a marketing manager’s desire to
predict sales would suggest making sales the dependent variable. Advertising expenditure
would be the independent variable used to help predict sales.


Simple Regression - A regression analysis involving one independent variable and one
dependent variable

In statistical notation y denotes the dependent variable and x denotes the independent variable.

Linear Regression - A regression analysis for which any one unit change in the independent
variable, x, is assumed to result in the same change in the dependent variable, y.

Multiple Regression - Regression analysis involving two or more independent variables

THE SIMPLE LINEAR REGRESSION MODEL


Butler Trucking Company is an independent trucking company in southern California.
A major portion of Butler’s business involves deliveries throughout its local area.
To develop better work schedules, the managers want to estimate the total daily travel
times for their drivers. The managers believe that the total daily travel times (denoted by y)
are closely related to the number of miles traveled in making the daily deliveries (denoted by
x).
Using regression analysis, we can develop an equation showing how the dependent
variable y is related to the independent variable x.

Regression Model and Regression Equation


Regression Model - The equation that describes how y is related to x and an error term

In the Butler Trucking Company example, the population consists of all the driving
assignments that can be made by the company. For every driving assignment in the
population, there is a value of x (miles traveled) and a corresponding value of y (travel time in
hours).

The regression model used in simple linear regression follows:

y = β0 + β1x + ε

β0 and β1 are characteristics of the population and so are referred to as the parameters of
the model

ε (the Greek letter epsilon) is a random variable referred to as the error term.

The error term accounts for the variability in y that cannot be explained by the linear
relationship between x and y.

A random variable is the outcome of a random experiment (such as the drawing of a random
sample) and so represents an uncertain outcome.
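The role of the error term can be illustrated with a small simulation. This is only a sketch: the parameter values BETA0, BETA1, and SIGMA are hypothetical, and the normal distribution for ε is an assumption made for illustration, not something stated in this chapter.

```python
import random

def simulate_travel_time(miles, beta0, beta1, sigma, rng):
    """Draw one value of y from the model y = beta0 + beta1 * x + epsilon."""
    # Error term: assumed here to be normal with mean 0 and standard deviation sigma.
    epsilon = rng.gauss(0.0, sigma)
    return beta0 + beta1 * miles + epsilon

# Hypothetical population parameters, chosen only for illustration.
BETA0, BETA1, SIGMA = 1.0, 0.08, 0.5

rng = random.Random(42)  # fixed seed so the run is repeatable
draws = [simulate_travel_time(75, BETA0, BETA1, SIGMA, rng) for _ in range(10_000)]

expected = BETA0 + BETA1 * 75      # mean of y at x = 75 under the assumed parameters
average = sum(draws) / len(draws)  # average of the simulated travel times
print(f"mean of y at x=75: {expected:.2f}, simulated average: {average:.2f}")
```

Individual simulated travel times differ from one another because of ε, but their average settles close to the mean of y at x = 75.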

Regression Equation - the equation that describes how the expected value of y, denoted
E(y|x), is related to x. The regression equation for simple linear regression follows:

E(y|x) = β0 + β1x

Where:
E(y|x) is the mean or expected value of y for a given value of x

β0 is the y-intercept of the regression line
β1 is the slope of the regression line

The graph of the simple linear regression equation is a straight line.

Examples of possible regression lines are shown below:


Possible Regression Lines in Simple Linear Regression

The regression line in Panel A shows that the mean value of y is related positively to x, with
larger values of E(y|x) associated with larger values of x.
In Panel B, the mean value of y is related negatively to x, with smaller values of E(y|x)
associated with larger values of x.

In Panel C, the mean value of y is not related to x; that is, E(y|x) is the same for every value
of x.
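The three cases described for Panels A, B, and C can be reproduced numerically. In this sketch the (β0, β1) pairs are hypothetical values picked only to give a positive, a negative, and a zero slope; they are not read from the figure.

```python
def expected_y(x, beta0, beta1):
    """Regression equation: E(y|x) = beta0 + beta1 * x."""
    return beta0 + beta1 * x

def describe_relationship(beta1):
    """Classify the relationship between x and the mean of y by the sign of the slope."""
    if beta1 > 0:
        return "positive"
    if beta1 < 0:
        return "negative"
    return "none"

# Hypothetical (beta0, beta1) pairs standing in for Panels A, B, and C.
panels = {"A": (1.0, 0.5), "B": (6.0, -0.5), "C": (3.0, 0.0)}

for name, (beta0, beta1) in panels.items():
    means = [expected_y(x, beta0, beta1) for x in (0, 2, 4, 6)]
    print(f"Panel {name}: relationship = {describe_relationship(beta1)}, E(y|x) = {means}")
```

Only the sign of β1 determines which panel a line resembles; β0 shifts the line up or down without changing its direction.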

Estimated Regression Equation


Sample Statistics (denoted b0 and b1) - are computed as estimates of the population
parameters β0 and β1

Substituting the values of the sample statistics b0 and b1 for β0 and β1 in the regression
equation, we obtain the estimated regression equation.
The estimated regression equation for simple linear regression follows:

Simple Linear Regression Estimated Regression Equation

ŷ = b0 + b1x


b0 is the estimated y-intercept
b1 is the estimated slope

Estimated Regression Line - the graph of the estimated simple linear regression equation

The Estimation Process in Simple Linear Regression

Example:
To estimate the mean or expected value of travel time for a driving assignment of 75 miles,
Butler Trucking would substitute the value of 75 for x in the estimated regression equation.
In some cases, however, Butler Trucking may be more interested in predicting travel time for
an upcoming driving assignment of a particular length.
For example, suppose Butler Trucking would like to predict travel time for a new 75-mile
driving assignment the company is considering. To make this prediction, Butler Trucking
would also substitute the value of 75 for x in the estimated regression equation.
The resulting value of ŷ provides both a point estimate of E(y|x) for a given value of x and
a prediction of an individual value of y for a given value of x.
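The substitution step can be sketched in a few lines. The coefficients B0 and B1 below are hypothetical placeholders, since no estimated values have been computed at this point in the chapter; the function name predict_travel_time is likewise only illustrative.

```python
def predict_travel_time(miles, b0, b1):
    """Evaluate the estimated regression equation y-hat = b0 + b1 * x."""
    return b0 + b1 * miles

# Hypothetical estimates of the intercept and slope, for illustration only.
B0, B1 = 1.0, 0.08

# Substitute x = 75 miles into the estimated regression equation.
y_hat = predict_travel_time(75, B0, B1)
print(f"Predicted travel time for a 75-mile assignment: {y_hat:.1f} hours")
```

The same number serves as the point estimate of the mean travel time for all 75-mile assignments and as the prediction for one new 75-mile assignment.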

LEAST SQUARES METHOD


Least Squares Method - is a procedure for using sample data to find the estimated
regression equation
Example:
Suppose data were collected from a sample of ten Butler Trucking Company driving
assignments. For the ith observation or driving assignment in the sample, xi is the miles
traveled and yi is the travel time (in hours).
The values of xi and yi for the ten driving assignments in the sample are summarized in the
table below:
Miles Traveled and Travel Time (In Hours) For Ten Butler Trucking Company Driving
Assignments

We see that driving assignment 1, with x1 = 100 and y1 = 9.3, is a driving assignment of 100
miles and a travel time of 9.3 hours.

Driving assignment 2, with x2 = 50 and y2 = 4.8, is a driving assignment of 50 miles and a travel
time of 4.8 hours.
The shortest travel time is for driving assignment 5, which covers 50 miles with a travel time
of 4.2 hours.


The figure below is a scatter chart of the data in the table.

Miles traveled is shown on the horizontal axis, and travel time (in hours) is shown on the
vertical axis.

Scatter charts for regression analysis are constructed with the independent variable x on the horizontal
axis and the dependent variable y on the vertical axis.

The scatter chart enables us to observe the data graphically and to draw preliminary
conclusions about the possible relationship between the variables.

From the figure above, the following conclusions can be drawn:


• Longer travel times appear to coincide with more miles traveled
• The relationship between the travel time and miles traveled appears to be
approximated by a straight line
• A positive linear relationship is indicated between x and y

We therefore choose the simple linear regression model to represent this relationship. Given
that choice, our next task is to use the sample data in the table above to determine the values
of b0 and b1 in the estimated simple linear regression equation.

For the ith driving assignment, the estimated regression equation provides

ŷi = b0 + b1xi

where:
ŷi = predicted travel time (in hours) for the ith driving assignment
b0 = the y-intercept of the estimated regression line
b1 = the slope of the estimated regression line
xi = miles traveled for the ith driving assignment

With yi denoting the observed (actual) travel time for driving assignment i and ŷi
representing the predicted travel time for driving assignment i, every driving assignment in the
sample will have an observed travel time yi and a predicted travel time ŷi.
For the estimated regression line to provide a good fit to the data, the differences
between the observed travel times yi and the predicted travel times ŷi should be small.
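The idea of small differences between observed and predicted values can be made concrete by comparing two candidate lines. This sketch uses only the three observations quoted in the text (driving assignments 1, 2, and 5), not the full sample of ten, and the two candidate (b0, b1) pairs are hypothetical.

```python
# Only the three observations quoted in the text: (miles, hours) for assignments 1, 2, 5.
observations = [(100, 9.3), (50, 4.8), (50, 4.2)]

def residuals(data, b0, b1):
    """Differences between observed y and predicted y-hat = b0 + b1 * x."""
    return [y - (b0 + b1 * x) for x, y in data]

def sum_squared_errors(data, b0, b1):
    """Sum of the squared differences between observed and predicted values."""
    return sum(e ** 2 for e in residuals(data, b0, b1))

# Two hypothetical candidate lines, given as (b0, b1).
line_1 = (0.0, 0.10)
line_2 = (1.0, 0.07)

sse_1 = sum_squared_errors(observations, *line_1)
sse_2 = sum_squared_errors(observations, *line_2)
print(f"line 1: SSE = {sse_1:.2f}")
print(f"line 2: SSE = {sse_2:.2f}")
```

The candidate with the smaller sum of squared differences fits these three points more closely, which is the comparison the least squares method carries out over all possible lines.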

The least squares method uses the sample data to provide the values of b0 and b1 that
minimize the sum of the squares of the deviations between the observed values of the
dependent variable yi and the predicted values of the dependent variable ŷi.
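The criterion for the least squares method is given by the following equation:

min Σ (yi − ŷi)² = min Σ (yi − b0 − b1xi)²

where the sums run over all n observations in the sample (i = 1, ..., n).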

This is known as the least squares method for estimating the regression equation.
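For simple linear regression, the minimizing values have a standard closed form: b1 = Σ(xi − x̄)(yi − ȳ) / Σ(xi − x̄)² and b0 = ȳ − b1x̄. These formulas are not stated in this section, so treat the sketch below as an illustration under that assumption. It uses only the three observations quoted in the text (assignments 1, 2, and 5), so the result is not the estimate for the full ten-assignment sample.

```python
# Only the three observations quoted in the text: (miles, hours) for assignments 1, 2, 5.
observations = [(100, 9.3), (50, 4.8), (50, 4.2)]

def least_squares(data):
    """Return (b0, b1) from the standard closed-form least squares formulas."""
    n = len(data)
    x_bar = sum(x for x, _ in data) / n
    y_bar = sum(y for _, y in data) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in data)
    s_xx = sum((x - x_bar) ** 2 for x, _ in data)
    b1 = s_xy / s_xx          # estimated slope
    b0 = y_bar - b1 * x_bar   # estimated y-intercept
    return b0, b1

def sum_squared_errors(data, b0, b1):
    """Least squares criterion: sum of (y_i - b0 - b1 * x_i)^2."""
    return sum((y - b0 - b1 * x) ** 2 for x, y in data)

b0, b1 = least_squares(observations)
best_sse = sum_squared_errors(observations, b0, b1)
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}, SSE = {best_sse:.3f}")

# Nudging either coefficient away from the estimates should increase the SSE.
for db0, db1 in [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.005), (0.0, -0.005)]:
    nudged = sum_squared_errors(observations, b0 + db0, b1 + db1)
    print(f"b0 {db0:+.3f}, b1 {db1:+.3f}: SSE = {nudged:.3f}")
```

Every nudged line gives a larger sum of squared deviations than the fitted line, which is what it means for b0 and b1 to satisfy the least squares criterion.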
