Fu Ch11 Linear Regression


Chapter 11

Regression and Correlation Methods

EPI 809/Spring 2008 1


Learning Objectives
1. Describe the Linear Regression Model
2. State the Regression Modeling Steps
3. Explain Ordinary Least Squares
4. Compute Regression Coefficients
5. Understand and check model assumptions
6. Predict Response Variable
7. Comment on SAS Output
8. Correlation Models
9. Link between a correlation model and a regression model
10. Test of the Coefficient of Correlation


Models



What is a Model?

1. Representation of Some Phenomenon

Non-Math/Stats Model


What is a Math/Stats Model?
1. Often Describes Relationship between Variables
2. Types
- Deterministic Models (no randomness)
- Probabilistic Models (with randomness)


Deterministic Models
1. Hypothesize Exact Relationships
2. Suitable When Prediction Error is Negligible
3. Example: Body mass index (BMI) is a measure of body fat based on height and weight
  • Metric Formula: $\mathrm{BMI} = \dfrac{\text{Weight in Kilograms}}{(\text{Height in Meters})^2}$
  • Non-metric Formula: $\mathrm{BMI} = \dfrac{\text{Weight (pounds)} \times 703}{(\text{Height in inches})^2}$
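The two BMI formulas can be sketched in Python to show what "deterministic" means here: the same inputs always give the same output, with no error term. This is only an illustration; the function names and the example person (70 kg, 1.75 m, roughly 154.3 lb and 68.9 in) are assumptions, not part of the slides.

```python
def bmi_metric(weight_kg: float, height_m: float) -> float:
    """BMI from weight in kilograms and height in meters."""
    return weight_kg / height_m ** 2


def bmi_nonmetric(weight_lb: float, height_in: float) -> float:
    """BMI from weight in pounds and height in inches (703 converts the units)."""
    return weight_lb * 703 / height_in ** 2


# Hypothetical person, expressed in both unit systems.
print(round(bmi_metric(70, 1.75), 2))
print(round(bmi_nonmetric(154.3, 68.9), 2))
```

The two results agree to within rounding of the unit conversions, which is all the factor 703 is meant to achieve.
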


Probabilistic Models
1. Hypothesize 2 Components
• Deterministic
• Random Error
2. Example: Systolic blood pressure (SBP) of newborns Is 6 Times the Age in days + Random Error
  • $\mathrm{SBP} = 6 \times \text{age (days)} + \varepsilon$
  • Random Error May Be Due to Factors Other Than age in days (e.g. Birthweight)
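A small simulation can make the two components concrete. The sketch below keeps the deterministic part (6 times age in days) from the slide, but the normal distribution for the random error and its standard deviation of 5 are assumptions chosen only for illustration; the slide does not specify either.

```python
import random


def simulate_sbp(age_days, sigma=5.0, rng=None):
    """Deterministic part (6 * age) plus a random error term.

    The N(0, sigma) error and sigma=5.0 are illustrative assumptions.
    """
    if rng is None:
        rng = random.Random()
    return 6 * age_days + rng.gauss(0.0, sigma)


rng = random.Random(809)  # fixed seed so the run is repeatable
for age in (1, 2, 3, 4, 5):
    # age, deterministic component, one simulated observation
    print(age, 6 * age, round(simulate_sbp(age, rng=rng), 1))
```

Rerunning with a different seed changes the simulated observations but not the deterministic column, which is the distinction the slide draws.
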


Types of Probabilistic Models

Probabilistic Models
- Regression Models
- Correlation Models
- Other Models


Regression Models



Regression Models

• Relationship between one dependent variable and explanatory variable(s)
• Use equation to set up relationship
  - Numerical Dependent (Response) Variable
  - 1 or More Numerical or Categorical Independent (Explanatory) Variables
• Used Mainly for Prediction & Estimation


Regression Modeling Steps
1. Hypothesize Deterministic Component
  • Estimate Unknown Parameters
2. Specify Probability Distribution of Random Error Term
  • Estimate Standard Deviation of Error
3. Evaluate the fitted Model
4. Use Model for Prediction & Estimation


Model Specification



Specifying the deterministic component

1. Define the dependent variable and independent variable
2. Hypothesize Nature of Relationship
  • Expected Effects (i.e., Coefficients' Signs)
  • Functional Form (Linear or Non-Linear)
  • Interactions


Model Specification Is Based on Theory

1. Theory of Field (e.g., Epidemiology)
2. Mathematical Theory
3. Previous Research
4. 'Common Sense'


Thinking Challenge: Which Is More Logical?
[Figure: four candidate plots of CD+ counts (vertical axis) against Years since seroconversion (horizontal axis)]


OB/GYN Study



Types of Regression Models

Regression Models
- Simple (1 Explanatory Variable)
  • Linear
  • Non-Linear
- Multiple (2+ Explanatory Variables)
  • Linear
  • Non-Linear


Linear Regression Model




Linear Equations

$$Y = mX + b$$

• $m$ = Slope = (Change in Y) / (Change in X)
• $b$ = Y-intercept
[Figure: a straight line plotted on X and Y axes]


Linear Regression Model

1. Relationship Between Variables Is a Linear Function

$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$$

  • $Y_i$: Dependent (Response) Variable (e.g., CD+ c.)
  • $X_i$: Independent (Explanatory) Variable (e.g., Years s. serocon.)
  • $\beta_0$: Population Y-Intercept
  • $\beta_1$: Population Slope
  • $\varepsilon_i$: Random Error

Population & Sample Regression Models

• Population: Unknown Relationship $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$
• Random Sample: estimated relationship $Y_i = \hat{\beta}_0 + \hat{\beta}_1 X_i + \hat{\varepsilon}_i$

Population Linear Regression Model

$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$$

$$E(Y) = \beta_0 + \beta_1 X_i$$

• $\varepsilon_i$ = Random error
[Figure: observed values scattered about the population regression line $E(Y)$, plotted on X and Y axes]

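Sample Linear Regression Model

$$Y_i = \hat{\beta}_0 + \hat{\beta}_1 X_i + \hat{\varepsilon}_i$$

$$\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_i$$

• $\hat{\varepsilon}_i$ = Random error
[Figure: fitted sample line on X and Y axes, showing an observed value and an unsampled observation]
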
Estimating Parameters:
Least Squares Method



Scatter plot
1. Plot of All $(X_i, Y_i)$ Pairs
2. Suggests How Well Model Will Fit
[Figure: scatter plot of Y against X, both axes running from 0 to 60]


Thinking Challenge

How would you draw a line through the points? How do you determine which line 'fits best'?

[Figure series: the same scatter plot (both axes 0 to 60) with candidate lines drawn through the points: slope changed with intercept unchanged; slope unchanged with intercept changed; slope changed with intercept changed]

Least Squares
1. 'Best Fit' Means the Differences Between Actual Y Values & Predicted Y Values Are a Minimum. But Positive Differences Offset Negative ones. So square errors!

$$\sum_{i=1}^{n} \left(Y_i - \hat{Y}_i\right)^2 = \sum_{i=1}^{n} \hat{\varepsilon}_i^2$$

2. LS Minimizes the Sum of the Squared Differences (errors) (SSE)

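The point about offsetting differences can be checked numerically. The sketch below uses the estriol and birthweight data from the obstetrics example in this chapter; the flat candidate line through the mean of Y is a hypothetical choice made for contrast, and the helper names are illustrative.

```python
# Estriol (x) and birthweight (y) values from the obstetrics example.
xs = [1, 2, 3, 4, 5]
ys = [1, 1, 2, 2, 4]


def residuals(b0, b1):
    """Observed minus predicted Y for the candidate line y = b0 + b1 * x."""
    return [y - (b0 + b1 * x) for x, y in zip(xs, ys)]


def sse(b0, b1):
    """Sum of squared errors for the candidate line."""
    return sum(e ** 2 for e in residuals(b0, b1))


# (2.0, 0.0): flat line through the mean of Y (hypothetical candidate).
# (-0.1, 0.7): intercept and slope from the worked example.
for b0, b1 in [(2.0, 0.0), (-0.1, 0.7)]:
    print(b0, b1, round(sum(residuals(b0, b1)), 6), round(sse(b0, b1), 6))
```

Both candidate lines have residuals that sum to (numerically) zero, so the plain sum cannot separate them; the sum of squared errors can, and it is smaller for the second line.
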
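Least Squares Graphically

LS minimizes $\sum_{i=1}^{n} \hat{\varepsilon}_i^2 = \hat{\varepsilon}_1^2 + \hat{\varepsilon}_2^2 + \hat{\varepsilon}_3^2 + \hat{\varepsilon}_4^2$

[Figure: four points and the fitted line $\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_i$ on X and Y axes; the residuals $\hat{\varepsilon}_1, \dots, \hat{\varepsilon}_4$ are marked, with $Y_2 = \hat{\beta}_0 + \hat{\beta}_1 X_2 + \hat{\varepsilon}_2$ labelled]
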
Coefficient Equations

• Prediction equation

$$\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i$$

• Sample slope

$$\hat{\beta}_1 = \frac{SS_{xy}}{SS_{xx}} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}$$

• Sample Y-intercept

$$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$$

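The coefficient equations translate almost line for line into code. The following is a minimal sketch in plain Python; the function name `least_squares` and the four test points (which lie exactly on y = 1 + 2x) are assumptions made for illustration.

```python
def least_squares(xs, ys):
    """Return (b0_hat, b1_hat) for the simple linear regression of ys on xs."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    ss_xx = sum((x - x_bar) ** 2 for x in xs)
    b1_hat = ss_xy / ss_xx           # sample slope = SSxy / SSxx
    b0_hat = y_bar - b1_hat * x_bar  # sample Y-intercept
    return b0_hat, b1_hat


# Hypothetical points lying exactly on y = 1 + 2x, so the fit recovers that line.
b0, b1 = least_squares([0, 1, 2, 3], [1, 3, 5, 7])
print(b0, b1)  # prints: 1.0 2.0
```

Because the points have no scatter, the estimates equal the true intercept and slope; with real data they would only approximate the population values.
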
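Derivation of Parameters (1)

Least Squares (L-S): Minimize squared error

$$\sum_{i=1}^{n} \varepsilon_i^2 = \sum_{i=1}^{n} \left(y_i - \beta_0 - \beta_1 x_i\right)^2$$

Setting the partial derivative with respect to $\beta_0$ to zero:

$$0 = \frac{\partial \sum \varepsilon_i^2}{\partial \beta_0} = \frac{\partial \sum \left(y_i - \beta_0 - \beta_1 x_i\right)^2}{\partial \beta_0} = -2\left(n\bar{y} - n\beta_0 - n\beta_1 \bar{x}\right)$$

$$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$$
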
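Derivation of Parameters (2)

Least Squares (L-S): Minimize squared error

Setting the partial derivative with respect to $\beta_1$ to zero, then substituting $\beta_0 = \bar{y} - \beta_1 \bar{x}$:

$$0 = \frac{\partial \sum \varepsilon_i^2}{\partial \beta_1} = \frac{\partial \sum \left(y_i - \beta_0 - \beta_1 x_i\right)^2}{\partial \beta_1} = -2\sum x_i \left(y_i - \beta_0 - \beta_1 x_i\right) = -2\sum x_i \left(y_i - \bar{y} + \beta_1 \bar{x} - \beta_1 x_i\right)$$

$$\beta_1 \sum x_i (x_i - \bar{x}) = \sum x_i (y_i - \bar{y})$$

Because $\sum (x_i - \bar{x}) = 0$ and $\sum (y_i - \bar{y}) = 0$, $x_i$ can be replaced by $(x_i - \bar{x})$ on both sides without changing either sum:

$$\beta_1 \sum (x_i - \bar{x})(x_i - \bar{x}) = \sum (x_i - \bar{x})(y_i - \bar{y})$$

$$\hat{\beta}_1 = \frac{SS_{xy}}{SS_{xx}}$$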
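The two derivative conditions say that, at the least-squares solution, the residuals sum to zero and so do the residuals weighted by x. A quick numerical check, using the estriol data and the estimates (-0.10 and 0.70) from the worked example in this chapter, might look like this; the variable names are illustrative.

```python
# Estriol (x) and birthweight (y) values from the obstetrics example.
xs = [1, 2, 3, 4, 5]
ys = [1, 1, 2, 2, 4]
b0_hat, b1_hat = -0.10, 0.70  # estimates from the worked solution

residuals = [y - (b0_hat + b1_hat * x) for x, y in zip(xs, ys)]

sum_e = sum(residuals)                               # condition from d/d(beta0)
sum_xe = sum(x * e for x, e in zip(xs, residuals))   # condition from d/d(beta1)

# Compare against a small tolerance because of floating-point rounding.
print(abs(sum_e) < 1e-9, abs(sum_xe) < 1e-9)  # prints: True True
```

Any other choice of intercept and slope would leave at least one of the two sums non-zero.
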


Computation Table

Xi      Yi      Xi^2      Yi^2      XiYi
X1      Y1      X1^2      Y1^2      X1Y1
X2      Y2      X2^2      Y2^2      X2Y2
:       :       :         :         :
Xn      Yn      Xn^2      Yn^2      XnYn
ΣXi     ΣYi     ΣXi^2     ΣYi^2     ΣXiYi


Interpretation of Coefficients

1. Slope ($\hat{\beta}_1$)
  • Estimated Y Changes by $\hat{\beta}_1$ for Each 1 Unit Increase in X
  • If $\hat{\beta}_1 = 2$, then Y Is Expected to Increase by 2 for Each 1 Unit Increase in X
2. Y-Intercept ($\hat{\beta}_0$)
  • Average Value of Y When X = 0
  • If $\hat{\beta}_0 = 4$, then Average Y Is Expected to Be 4 When X Is 0

Parameter Estimation Example

Obstetrics: What is the relationship between Mother's Estriol level & Birthweight using the following data?

Estriol (mg/24h)    Birthweight (g/1000)
1                   1
2                   1
3                   2
4                   2
5                   4


Scatterplot
Birthweight vs. Estriol level
[Figure: scatter plot of Birthweight (vertical axis, 0 to 4) against Estriol level (horizontal axis, 0 to 6)]


Parameter Estimation Solution Table

        Xi      Yi      Xi^2    Yi^2    XiYi
        1       1       1       1       1
        2       1       4       1       2
        3       2       9       4       6
        4       2       16      4       8
        5       4       25      16      20
Total   15      10      55      26      37

Parameter Estimation Solution

$$\hat{\beta}_1 = \frac{\sum_{i=1}^{n} X_i Y_i - \dfrac{\left(\sum_{i=1}^{n} X_i\right)\left(\sum_{i=1}^{n} Y_i\right)}{n}}{\sum_{i=1}^{n} X_i^2 - \dfrac{\left(\sum_{i=1}^{n} X_i\right)^2}{n}} = \frac{37 - \dfrac{(15)(10)}{5}}{55 - \dfrac{(15)^2}{5}} = 0.70$$

$$\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X} = 2 - (0.70)(3) = -0.10$$

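The hand calculation uses only the column totals from the solution table, so it can be reproduced from those five numbers. This is a small sketch; the function name and its parameter names are illustrative rather than anything defined in the slides.

```python
def slope_intercept_from_totals(n, sum_x, sum_y, sum_x2, sum_xy):
    """Least-squares estimates computed from the computation-table totals."""
    b1_hat = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
    b0_hat = sum_y / n - b1_hat * sum_x / n  # Y-bar minus slope times X-bar
    return b0_hat, b1_hat


# Totals from the estriol/birthweight solution table.
b0, b1 = slope_intercept_from_totals(n=5, sum_x=15, sum_y=10, sum_x2=55, sum_xy=37)
print(round(b0, 2), round(b1, 2))  # prints: -0.1 0.7
```

This shortcut form is algebraically the same as SSxy / SSxx; it simply avoids computing the deviations from the means one observation at a time.
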
Coefficient Interpretation Solution

1. Slope ($\hat{\beta}_1$)
  • Birthweight (Y) Is Expected to Increase by 0.7 Units (700 g) for Each 1 mg/24h Increase in Estriol (X)
2. Intercept ($\hat{\beta}_0$)
  • Average Birthweight (Y) Is -0.10 Units When Estriol level (X) Is 0
    - Difficult to explain
    - The birthweight should always be positive


SAS code for fitting a simple linear regression

data BW;                        /* Read the data into SAS */
  input estriol birthw @@;      /* @@ reads several observations per line */
  cards;
1 1 2 1 3 2
4 2 5 4
;
run;

proc reg data=BW;               /* Fit the linear regression model */
  model birthw = estriol;       /* response = explanatory variable */
run;


Parameter Estimation SAS Computer Output

Parameter Estimates

                    Parameter     Standard
Variable     DF      Estimate        Error    t Value    Pr > |t|
Intercept     1      -0.10000      0.63509      -0.16      0.8849
Estriol       1       0.70000      0.19149       3.66      0.0354

The Intercept estimate is $\hat{\beta}_0$ and the Estriol estimate is $\hat{\beta}_1$.


Parameter Estimation Thinking Challenge

You're a veterinary epidemiologist for the county cooperative. You gather the following data:

Food (lb.)    Milk yield (lb.)
4             3.0
6             5.5
10            6.5
12            9.0

What is the relationship between cows' food intake and milk yield?


Scattergram
Milk Yield vs. Food intake*
[Figure: scatter plot of Milk Yield (lb.) (vertical axis, 0 to 10) against Food intake (lb.) (horizontal axis, 0 to 15)]


Parameter Estimation Solution Table*

        Xi      Yi      Xi^2    Yi^2      XiYi
        4       3.0     16      9.00      12
        6       5.5     36      30.25     33
        10      6.5     100     42.25     65
        12      9.0     144     81.00     108
Total   32      24.0    296     162.50    218


Parameter Estimation Solution*

$$\hat{\beta}_1 = \frac{\sum_{i=1}^{n} X_i Y_i - \dfrac{\left(\sum_{i=1}^{n} X_i\right)\left(\sum_{i=1}^{n} Y_i\right)}{n}}{\sum_{i=1}^{n} X_i^2 - \dfrac{\left(\sum_{i=1}^{n} X_i\right)^2}{n}} = \frac{218 - \dfrac{(32)(24)}{4}}{296 - \dfrac{(32)^2}{4}} = 0.65$$

$$\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X} = 6 - (0.65)(8) = 0.80$$

Coefficient Interpretation Solution*

1. Slope ($\hat{\beta}_1$)
  • Milk Yield (Y) Is Expected to Increase by 0.65 lb. for Each 1 lb. Increase in Food intake (X)
2. Y-Intercept ($\hat{\beta}_0$)
  • Average Milk yield (Y) Is Expected to Be 0.8 lb. When Food intake (X) Is 0
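Once the line is fitted, prediction is just evaluating it at a new X. The sketch below uses the milk-yield estimates (0.80 and 0.65) and the observed food range (4 to 12 lb.) from the example; the function names and the idea of flagging values outside the observed range are illustrative additions, not something the slides prescribe.

```python
B0_HAT, B1_HAT = 0.80, 0.65    # estimates from the milk-yield solution
FOOD_MIN, FOOD_MAX = 4, 12     # smallest and largest food intake in the data


def predict_milk_yield(food_lb):
    """Predicted milk yield (lb.) from the fitted line."""
    return B0_HAT + B1_HAT * food_lb


def is_extrapolation(food_lb):
    """True when food_lb lies outside the range of the observed data."""
    return not (FOOD_MIN <= food_lb <= FOOD_MAX)


for food in (8, 0):
    print(food, round(predict_milk_yield(food), 2), is_extrapolation(food))
```

The prediction at 8 lb. falls inside the observed range, whereas the value at 0 lb. is the intercept itself and lies outside it, which is one reason intercepts can be hard to interpret literally.
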
