AP Stats 3.2
3.2: Least Squares Regressions
Section 3.2
Least-Squares Regression
After this section, you should be able to…
Format 2:
Predicted backpack weight = 16.3 + 0.0908(student’s weight)
Interpreting Linear Regression
• Y-intercept: A student weighing zero pounds is predicted
to have a backpack weight of 16.3 pounds (no practical
interpretation).
• Slope: For each additional pound that the student
weighs, their backpack is predicted to weigh an
additional 0.0908 pounds, on average.
Interpreting Linear Regression
Interpret the y-intercept and slope values in
context. Is there any practical interpretation?
ŷ = 37x + 270
x = Hours Studied for the SAT
ŷ = Predicted SAT Math Score
Interpreting Linear Regression
ŷ = 37x + 270
Y-intercept: If a student studies for zero hours,
then the student’s predicted SAT Math score is 270
points. This makes sense.
Slope: For each additional hour the student
studies, his/her score is predicted to increase
37 points, on average. This makes sense.
Predicted Value
What is the predicted SAT Math score for a student who
studies 12 hours?
ŷ = 37x + 270
Hours Studied for the SAT (x)
Predicted SAT Math Score (y)
ŷ = 37(12) + 270
Predicted Score: 714 points
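The substitution above can be checked with a short Python sketch. The function name `predict_sat_math` is made up for illustration; the slope and intercept are the ones in the regression equation on the slide.

```python
def predict_sat_math(hours_studied):
    """Predicted SAT Math score from the slide's line: y-hat = 37x + 270."""
    slope = 37       # predicted points gained per additional hour studied
    intercept = 270  # predicted score at zero hours studied
    return slope * hours_studied + intercept


print(predict_sat_math(12))  # 714
```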
Self Check Quiz!
Self Check Quiz: Calculate the Regression
Equation
Answer:
ŷ = 50 + x
ŷ = predicted reading score
x = number of IQ points above 100
Self Check Quiz: Interpreting Regression Lines &
Predicted Value
Data on the IQ test scores and reading test scores for a
group of fifth-grade children resulted in the following
regression line:
predicted reading score = −33.4 + 0.882(IQ score)
[Figure labels: residual; negative residuals (below line)]
How to Calculate the
Residual
1. Calculate the predicted value by
plugging x into the LSRE.
2. Determine the observed/actual value.
3. Subtract: residual = observed value − predicted value.
Calculate the Residual
1. If a student weighs 170 pounds and their backpack weighs
35 pounds, what is the value of the residual?
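As a rough worked check of this question, the sketch below plugs 170 pounds into the backpack equation from earlier in the section (predicted backpack weight = 16.3 + 0.0908 × student's weight) and subtracts the prediction from the observed 35 pounds. The function names are hypothetical and used only for illustration.

```python
def predicted_backpack_weight(student_weight):
    """Backpack regression line from the slides: 16.3 + 0.0908 * weight."""
    return 16.3 + 0.0908 * student_weight


def residual(observed, predicted):
    """Residual = observed value - predicted value."""
    return observed - predicted


predicted = predicted_backpack_weight(170)  # 16.3 + 0.0908 * 170
res = residual(35, predicted)               # observed 35 lb minus prediction
print(f"predicted = {predicted:.3f} lb, residual = {res:.3f} lb")
# predicted = 31.736 lb, residual = 3.264 lb
```

The residual is positive, so this student's backpack is heavier than the line predicts and the point lies above the line.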
Pattern in residuals: linear model not appropriate
Should You Use LSRL?
1.
2.
Interpreting Computer Regression
Output
Be sure you can locate the slope and the y-intercept, and
determine the equation of the LSRL.
ŷ = -0.0034415x + 3.5051
ŷ = predicted....
x = explanatory variable
r²: Coefficient of Determination
r² tells us how much better the LSRL does at predicting values of y
than simply guessing the mean y for each value in the dataset.
Answers:
1. 87.5% of the variation in SAT score is
explained by the linear relationship with the
number of hours studied.
Answer:
s = 0.740
Interpretation: When the least-squares regression line is used
to predict fat gain, the predictions are typically off by about
0.740 kilograms.
S: Standard Deviation of the
Residuals
If we use a least-squares regression line to predict the
values of a response variable y from an explanatory variable
x, the standard deviation of the residuals (s) is given by
$$s = \sqrt{\frac{\sum \text{residuals}^2}{n-2}} = \sqrt{\frac{\sum (y_i - \hat{y}_i)^2}{n-2}}$$
S represents the typical or average error (residual).
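The formula for s can be sketched in a few lines of Python. The observed and predicted values below are made up purely to show the arithmetic (they do not come from the slides or from an actual fitted line), and `residual_std_dev` is a hypothetical helper name.

```python
import math


def residual_std_dev(observed, predicted):
    """s = sqrt(sum of squared residuals / (n - 2))."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    n = len(observed)
    if n <= 2:
        raise ValueError("need more than 2 points, since the divisor is n - 2")
    # Each residual is observed minus predicted; square and add them up.
    sse = sum((y - y_hat) ** 2 for y, y_hat in zip(observed, predicted))
    return math.sqrt(sse / (n - 2))


# Made-up values for illustration only (assumption, not slide data).
observed = [3.0, 5.0, 4.0, 8.0]
predicted = [2.0, 4.0, 6.0, 8.0]

# Residuals are 1, 1, -2, 0, so the sum of squares is 6 and s = sqrt(6 / 2).
print(round(residual_std_dev(observed, predicted), 3))  # 1.732
```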
The left graph is perfectly linear. In the right graph, the last value was
changed from (5, 5) to (8, 5). This point is clearly influential because it
changed the graph significantly. However, its residual is very small.
Correlation and
Regression Limitations
The distinction between explanatory and
response variables is important in regression.
Correlation and
Regression Limitations
Correlation and regression lines describe
only linear relationships.
NO!!!
Correlation and Regression
Limitations
Correlation and least-squares regression
lines are not resistant.
Correlation and Regression
Wisdom
Association Does Not Imply Causation
An association between an explanatory variable x and a
response variable y, even if it is very strong, is not by itself
good evidence that changes in x actually cause changes in y.
$$r^2 = 1 - \frac{SSE}{SST}, \qquad SSE = \sum \text{residual}^2, \qquad SST = \sum (y_i - \bar{y})^2$$
In practice,
just square the correlation r.
Accounted for Error
If we use the LSRL to make our predictions,
the sum of the squared residuals is 30.90.
SSE = 30.90
1 – SSE/SST = 1 – 30.90/83.87
r² = 0.632
63.2% of the variation in backpack weight is
accounted for by the linear model relating pack
weight to body weight.
Unaccounted for Error
SSE/SST = 30.90/83.87
SSE/SST = 0.368
Therefore, 36.8% of
the variation in pack
weight is
unaccounted for by
the least-squares
regression line.
If we use the mean backpack weight as
our prediction, the sum of the squared
residuals is 83.87.
SST = 83.87
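The two percentages can be reproduced from the sums of squares stated on these slides (SSE = 30.90 and SST = 83.87). This is only a sketch of the arithmetic; `r_squared` is a hypothetical helper name.

```python
def r_squared(sse, sst):
    """Coefficient of determination: r^2 = 1 - SSE / SST."""
    return 1 - sse / sst


sse = 30.90  # sum of squared residuals using the LSRL
sst = 83.87  # sum of squared residuals using the mean backpack weight

accounted = r_squared(sse, sst)  # share of variation the line accounts for
unaccounted = sse / sst          # share of variation left over

print(f"accounted for:   {accounted:.3f}")    # accounted for:   0.632
print(f"unaccounted for: {unaccounted:.3f}")  # unaccounted for: 0.368
```

The two shares always add to 1, which is why the slides can report either one.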
Interpreting a Regression Line
Consider the regression line from the example (pg. 164)
“Does Fidgeting Keep You Slim?” Identify the slope and
y-intercept and interpret each value in context.
fatgain = 3.505 - 0.00344(NEA change)