Correlation and Regression - The Simple Case
NOTE:
A correlation between two variables is not
sufficient to establish causality between them;
hence, one should not infer causality from
correlational designs.
SCATTERPLOT
a plot of the pairs of values of two (quantitative)
variables on a rectangular coordinate plane
an effective tool for presenting possible
relationships between two (quantitative) variables
(Possible) RELATIONSHIPS:
None
Linear (numerically assessed by CORRELATION)
Non-Linear
EXAMPLE #1: ATTITUDE AND
ATTRACTION
Have you ever wondered whether it is true that opposites
attract? We've all been with couples in which the two
individuals seem so different from each other. But is this
the usual experience? Does similarity or dissimilarity
foster attraction?
A social psychologist investigating this problem asked 15 college
students to fill out a questionnaire concerning their attitudes
toward a variety of topics. Some time later, they were shown the
attitudes of a stranger toward the same items and were asked to
rate how much they would probably like the stranger and how much
they would probably enjoy working with him. The stranger's attitudes
were actually made up by the experimenter, who varied across
subjects the proportion of the stranger's attitudes that were
similar to those held by the rater.
Thus, for each subject, data were collected on his attitudes and on
his attraction to a stranger based on the stranger's attitudes toward
the same items. If similarity attracts, then there should be a direct
relationship between attraction to the stranger and the proportion of
the stranger's attitudes that are similar to the subject's.
The data are presented in the following table. The higher the
attraction, the higher the score. The maximum possible
attraction score is 14.
MAGNITUDE (|r|):
closer to 1: strong to almost perfect (linear) relationship
closer to 0: weak to almost no (linear) relationship
Magnitude (|r|)   Strength
0.00 to 0.20      Very Weak
0.20 to 0.40      Weak
0.40 to 0.60      Moderate
0.60 to 0.80      Strong
0.80 to 1.00      Very Strong
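A small helper can make the strength scale concrete. This is a minimal sketch: the function name `correlation_strength` is hypothetical, and because adjacent bands in the table share their endpoints, it assumes each band includes its lower bound (so |r| = 0.20 counts as Weak).

```python
def correlation_strength(r: float) -> str:
    """Return the verbal strength label for a correlation coefficient r.

    Assumption: each band includes its lower endpoint (0.20 -> "Weak"),
    and |r| = 1.00 is "Very Strong".
    """
    magnitude = abs(r)  # strength depends only on |r|; the sign gives the direction
    if magnitude > 1:
        raise ValueError("a correlation coefficient must lie in [-1, 1]")
    # (exclusive upper bound, label) for every band except the last
    bands = [(0.20, "Very Weak"), (0.40, "Weak"), (0.60, "Moderate"), (0.80, "Strong")]
    for upper, label in bands:
        if magnitude < upper:
            return label
    return "Very Strong"


print(correlation_strength(-0.35))  # Weak
print(correlation_strength(0.85))   # Very Strong
```

Note that a negative r such as -0.35 gets the same strength label as +0.35; only the direction differs.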
CORRELATION AND SCATTERPLOTS
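PEARSON'S r
also known as Pearson's Product Moment
Correlation Coefficient
describes the linear relationship between two interval- and/or ratio-level variables
$$r = \frac{SP}{\sqrt{SS_X \, SS_Y}}$$
where
$$SP = \sum (x_i - \bar{x})(y_i - \bar{y}) = \sum x_i y_i - \frac{\left(\sum x_i\right)\left(\sum y_i\right)}{n}$$
$$SS_X = \sum (x_i - \bar{x})^2 = \sum x_i^2 - \frac{\left(\sum x_i\right)^2}{n}, \qquad SS_Y = \sum (y_i - \bar{y})^2 = \sum y_i^2 - \frac{\left(\sum y_i\right)^2}{n}$$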
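The definitional form of the formula translates directly into code. This is a minimal sketch; the name `pearson_r` and the toy data are illustrative assumptions, not the study's table.

```python
import math


def pearson_r(x, y):
    """Pearson's r computed as SP / sqrt(SS_X * SS_Y) from paired samples."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must be paired samples with at least 2 pairs")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sp = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))  # sum of products
    ss_x = sum((xi - x_bar) ** 2 for xi in x)  # sum of squares of X
    ss_y = sum((yi - y_bar) ** 2 for yi in y)  # sum of squares of Y
    return sp / math.sqrt(ss_x * ss_y)


# Hypothetical toy data, NOT the attitude-attraction table
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(round(pearson_r(x, y), 4))  # 0.7746
```

Here SP = 6, SS_X = 10 and SS_Y = 6, so r = 6/√60 ≈ 0.7746.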
EXAMPLE #2: ATTITUDE AND
ATTRACTION
r computed from the data: VERY STRONG and positive (+)
Is it significant?
t-test on the significance of $\rho$
EXAMPLE #3: ATTITUDE AND
ATTRACTION
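$H_0: \rho = 0$ (i.e., there is no linear relationship between X and Y)
vs.
$H_a: \rho \neq 0$ (i.e., there is a linear relationship between X and Y)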
STEP 2: SPECIFY $\alpha$
REJECT $H_0$ if
$|t| > t_{\alpha/2,\, n-2}$
OR
p-value < $\alpha$
STEP 5: COMPUTE THE TEST STATISTIC
$$t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}}$$
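As a sketch, the statistic can be computed from r and n alone. The function name is hypothetical; the usage lines recover r as the positive square root of the R² = 87.70% reported later for this example (the fitted slope is positive), and the critical value 2.160 is the standard two-tailed table value of $t_{0.025,13}$, assuming α = 0.05.

```python
import math


def t_for_correlation(r: float, n: int) -> float:
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    if not -1 < r < 1:
        raise ValueError("r must lie strictly between -1 and 1")
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


# Attitude-attraction example: n = 15 students, r = +sqrt(0.8770)
n = 15
r = math.sqrt(0.8770)
t = t_for_correlation(r, n)
t_crit = 2.160  # t_{0.025, 13} from a t table, assuming alpha = 0.05
print(round(t, 2), abs(t) > t_crit)  # 9.63 True
```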
no provided!!
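STEP 5: COMPUTE THE TEST STATISTIC
$t = \dfrac{r\sqrt{n-2}}{\sqrt{1-r^2}}$, using r from Example #2 and $n = 15$ ($n - 2 = 13$ degrees of freedom)
Aside: p-value < $\alpha$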
STEP 6: MAKE A DECISION
$t > t_{\alpha/2,\, 13}$
OR
p-value < $\alpha$
REJECT $H_0$
STEP 7: FORM A CONCLUSION
PURPOSE:
for prediction
EXAMPLES
Predict an employee's success in the company using
the college grades they earned.
RESPONSE (Y): success
PREDICTOR (X): college grades
Regression
a statistical technique that uses the
relationship between two or more variables for
prediction
Linear Regression
a regression technique wherein the regression function
is taken to be of the linear form (i.e., the use of a
linear equation for prediction)
SIMPLE LINEAR REGRESSION MODEL
Simple Linear Regression
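a linear regression which uses only one predictor
(X) to explain/predict a response (Y)
$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$$
where
$Y_i$ = response value of the ith observation
$X_i$ = predictor value of the ith observation
$\beta_0$ = intercept; $\beta_1$ = slope
$\varepsilon_i$ = random error term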
THE BEST-FIT LINE
$\hat{y} = b_0 + b_1 x$
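ORDINARY LEAST SQUARES (OLS) REGRESSION
chooses $b_0$ and $b_1$ to minimize the sum of squared vertical deviations $\sum (y_i - b_0 - b_1 x_i)^2$
Estimators:
To estimate $\beta_1$:
$$b_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}$$
To estimate $\beta_0$:
$$b_0 = \frac{1}{n}\left(\sum y_i - b_1 \sum x_i\right) = \bar{y} - b_1 \bar{x}$$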
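The two estimators can be sketched in a few lines. The function name `ols_fit` and the toy data are illustrative assumptions, not the study's data.

```python
def ols_fit(x, y):
    """Least-squares estimates (b0, b1) for the line y_hat = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sp = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ss_x = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sp / ss_x           # slope: SP / SS_X
    b0 = y_bar - b1 * x_bar  # intercept: the line passes through (x_bar, y_bar)
    return b0, b1


# Hypothetical toy data, NOT the attitude-attraction table
b0, b1 = ols_fit([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(b0, 4), round(b1, 4))  # 2.2 0.6
```

Computing $b_0$ as $\bar{y} - b_1\bar{x}$ makes explicit that the fitted line always passes through the point of means.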
EXAMPLE #5: ATTITUDE AND
ATTRACTION
Estimated Regression Equation (BEST-FIT LINE):
$\hat{y} = 6.6136 + 5.0812x$
$b_0 = 6.6136$
The expected value of one's attraction level to another person is 6.6136
whenever the proportion of similar attitudes they share is 0.
(Only interpretable whenever 0 is a possible value of X!!)
$b_1 = 5.0812$
One's attraction level to another person is expected to INCREASE by
5.0812 units for every 1-unit increase in the proportion of similar
attitudes shared. (Since a proportion runs from 0 to 1, this equals an
expected increase of about 0.51 units for every 0.10 increase.)
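The estimated equation can be used for prediction as sketched below. The function name is hypothetical, and the guard restricting x to [0, 1] is a design choice (a proportion cannot leave that range); the study's observed proportions may cover a narrower range, so predictions near the ends may still be extrapolations.

```python
B0 = 6.6136  # estimated intercept (Example #5)
B1 = 5.0812  # estimated slope (Example #5)


def predict_attraction(proportion_similar: float) -> float:
    """Fitted attraction score y_hat = B0 + B1 * x for a proportion x in [0, 1]."""
    if not 0.0 <= proportion_similar <= 1.0:
        raise ValueError("x is a proportion, so it must lie in [0, 1]")
    return B0 + B1 * proportion_similar


for x in (0.0, 0.5, 1.0):
    print(x, round(predict_attraction(x), 4))
# 0.0 6.6136
# 0.5 9.1542
# 1.0 11.6948
```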
PREDICTION ERROR
Residual (ith):
the difference between the observed value $y_i$ and
the corresponding fitted value $\hat{y}_i$, denoted by $e_i$,
i.e.,
$$e_i = y_i - \hat{y}_i$$
NOTE:
$e_i > 0$: response i is UNDERESTIMATED
$e_i < 0$: response i is OVERESTIMATED
$e_i = 0$: response i is EXACT
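The sign rule can be sketched as a small function. The name `classify_residual` is hypothetical, and the two observed scores in the usage lines are invented for illustration; they are not subjects from the study.

```python
def classify_residual(observed: float, fitted: float) -> tuple[float, str]:
    """Return e = y - y_hat and whether the response was under- or overestimated."""
    e = observed - fitted
    if e > 0:
        return e, "UNDERESTIMATED"  # the line predicted too low
    if e < 0:
        return e, "OVERESTIMATED"   # the line predicted too high
    return e, "EXACT"


# Hypothetical subjects, both with x = 0.5, so y_hat = 6.6136 + 5.0812 * 0.5
y_hat = 6.6136 + 5.0812 * 0.5
for y in (10, 8):
    e, label = classify_residual(y, y_hat)
    print(y, round(e, 4), label)
# 10 0.8458 UNDERESTIMATED
# 8 -1.1542 OVERESTIMATED
```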
EXAMPLE #6: ATTITUDE AND
ATTRACTION
$\hat{y}_i = 6.6136 + 5.0812\,x_i$, and $e_i = y_i - \hat{y}_i < 0$
OVERESTIMATED
$\hat{y}_j = 6.6136 + 5.0812\,x_j$, and $e_j = y_j - \hat{y}_j > 0$
UNDERESTIMATED
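MEAN SQUARED ERROR
$$MSE = \frac{SSE}{n-2} = \frac{\sum (y_i - \hat{y}_i)^2}{n-2}$$
the average squared prediction error (i.e., the sum of squared
deviations between the actual and the predicted values, divided
by its degrees of freedom, n - 2)
PURPOSE:
provides a measure of the precision of the prediction
*If only the standard error of the estimate $s$ is reported, square it: $MSE = s^2$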
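ROOT MEAN SQUARED ERROR (STANDARD ERROR OF THE ESTIMATE)
$$s = \sqrt{MSE} = \sqrt{\frac{\sum (y_i - \hat{y}_i)^2}{n-2}}$$
the typical size of a prediction error (i.e., the positive square
root of the average squared deviations between the actual and the
predicted values), expressed in the units of Y
PURPOSE:
provides a measure of the precision of the prediction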
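Both quantities can be sketched from the residuals of a fitted line. The function name `mse_and_s` and the toy data (with their least-squares line 2.2 + 0.6x) are illustrative assumptions.

```python
import math


def mse_and_s(x, y, b0, b1):
    """Return (MSE, s) for the fitted line y_hat = b0 + b1 * x."""
    n = len(x)
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))  # sum of squared residuals
    mse = sse / (n - 2)  # two degrees of freedom are used to estimate b0 and b1
    return mse, math.sqrt(mse)


# Hypothetical toy data with its least-squares line y_hat = 2.2 + 0.6x
mse, s = mse_and_s([1, 2, 3, 4, 5], [2, 4, 5, 4, 5], 2.2, 0.6)
print(round(mse, 4), round(s, 4))  # 0.8 0.8944
```

Dividing by n - 2 rather than n is why MSE is only approximately an average of the squared errors.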
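$H_0: \beta_1 = 0$ (i.e., X is not a significant predictor of Y)
vs.
$H_a: \beta_1 \neq 0$ (i.e., X is a significant predictor of Y)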
STEP 2: SPECIFY $\alpha$
REJECT $H_0$ if
$|t| > t_{\alpha/2,\, n-2}$
OR
p-value < $\alpha$
STEP 5: COMPUTE THE TEST STATISTIC
$$t = \frac{b_1}{se(b_1)}$$
where
$$se(b_1) = \sqrt{\frac{MSE}{\sum (x_i - \bar{x})^2}}$$
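The slope test can be sketched end to end on toy data. The function name `slope_t_test` and the data are illustrative assumptions, not the study's output.

```python
import math


def slope_t_test(x, y):
    """Return (b1, se_b1, t) for H0: beta1 = 0 in simple linear regression."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    ss_x = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / ss_x
    b0 = y_bar - b1 * x_bar
    mse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y)) / (n - 2)
    se_b1 = math.sqrt(mse / ss_x)  # se(b1) = sqrt(MSE / sum((x_i - x_bar)^2))
    return b1, se_b1, b1 / se_b1


# Hypothetical toy data, NOT the attitude-attraction table
b1, se_b1, t = slope_t_test([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(b1, 4), round(se_b1, 4), round(t, 4))  # 0.6 0.2828 2.1213
```

For the same toy data r ≈ 0.7746, and $r\sqrt{n-2}/\sqrt{1-r^2}$ also gives ≈ 2.1213, which illustrates the link between the correlation t-test and the slope t-test.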
EXAMPLE #8: ATTITUDE AND
ATTRACTION
STEP 5: COMPUTE THE TEST STATISTIC
$t = b_1 / se(b_1)$, with $b_1 = 5.0812$ and $se(b_1)$ from the regression output
Aside: p-value < $\alpha$
STEP 6: MAKE A DECISION
$t > t_{\alpha/2,\, 13}$
OR
p-value < $\alpha$
REJECT $H_0$
STEP 7: FORM A CONCLUSION
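NOTE:
Notice the similarities between assessing the
statistical significance of the correlation between
X and Y (no IV-DV relationship) and determining whether
X is a significant predictor of Y (w/ IV-DV relationship).
In SLR, both tests use the same statistic:
$$t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}} = \frac{b_1}{se(b_1)}$$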
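PARTITIONING THE TOTAL DEVIATION
$$y_i - \bar{y} = (\hat{y}_i - \bar{y}) + (y_i - \hat{y}_i)$$
Total Deviation ($y_i - \bar{y}$): a component of the VARIANCE of Y!!
Deviation of Fitted Value around Mean ($\hat{y}_i - \bar{y}$)
Deviation around Fitted Regression Line ($y_i - \hat{y}_i$)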
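PARTITIONING THE SUM OF SQUARES
$$SSTO = SSR + SSE$$
$$\sum (y_i - \bar{y})^2 = \sum (\hat{y}_i - \bar{y})^2 + \sum (y_i - \hat{y}_i)^2$$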
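The partition can be checked numerically. This sketch assumes hypothetical toy data and their least-squares line 2.2 + 0.6x; the identity SSTO = SSR + SSE holds for the least-squares line, not for an arbitrary line.

```python
def sums_of_squares(x, y, b0, b1):
    """Return (SSTO, SSR, SSE) for the fitted line y_hat = b0 + b1 * x."""
    y_bar = sum(y) / len(y)
    fitted = [b0 + b1 * xi for xi in x]
    ssto = sum((yi - y_bar) ** 2 for yi in y)               # total
    ssr = sum((fi - y_bar) ** 2 for fi in fitted)           # explained by the line
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))  # left in the residuals
    return ssto, ssr, sse


# Hypothetical toy data with its least-squares line y_hat = 2.2 + 0.6x
ssto, ssr, sse = sums_of_squares([1, 2, 3, 4, 5], [2, 4, 5, 4, 5], 2.2, 0.6)
print(round(ssto, 4), round(ssr, 4), round(sse, 4))  # 6.0 3.6 2.4
```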
COEFFICIENT OF DETERMINATION
$$R^2 = \frac{SSR}{SSTO} = 1 - \frac{SSE}{SSTO} \quad (\times 100\%)$$
the proportionate reduction of total variation in Y
associated with the use of the predictor variable X
i.e., the amount of variation in Y explained by the
(variations in) X
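A short sketch computes R² from the sums of squares and checks it against r² for simple linear regression. The function name `determination` and the toy data are illustrative assumptions.

```python
import math


def determination(x, y):
    """Return (R^2, r) for the least-squares line of y on x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sp = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ss_x = sum((xi - x_bar) ** 2 for xi in x)
    ssto = sum((yi - y_bar) ** 2 for yi in y)  # also SS_Y
    b1 = sp / ss_x
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    r = sp / math.sqrt(ss_x * ssto)  # Pearson's r
    return 1 - sse / ssto, r


# Hypothetical toy data, NOT the attitude-attraction table
r2, r = determination([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(r2, 4), round(r ** 2, 4))  # 0.6 0.6
```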
PROPERTIES:
$0 \le R^2 \le 1$ (i.e., 0% to 100%)
in SLR, $R^2 = r^2$
EXAMPLE #9: ATTITUDE AND
ATTRACTION
$R^2 = 87.70\%$
87.70% of the total variation in
attraction level is explained by the
differences in attitude similarity.
Is it significant?
EXAMPLE #10: ATTITUDE AND
ATTRACTION
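$H_0$: The model does not fit the data.
(i.e., X is not a significant predictor of Y)
vs.
$H_a$: The model fits the data.
(i.e., X is a significant predictor of Y)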
STEP 2: SPECIFY $\alpha$
REJECT $H_0$ if
$F > F_{\alpha,\, 1,\, n-2}$
OR
p-value < $\alpha$
STEP 5: COMPUTE THE TEST STATISTIC
ANOVA Table:
SOURCE              SS      df      MS                    F-stat
Regression          SSR     1       MSR = SSR / 1         F = MSR / MSE
Error (Residual)    SSE     n - 2   MSE = SSE / (n - 2)
TOTAL               SSTO    n - 1
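The table can be assembled programmatically as sketched here. The function name `anova_slr`, the dictionary layout, and the toy data are illustrative assumptions, not a particular package's output format.

```python
def anova_slr(x, y):
    """ANOVA table for simple linear regression, as a dict of rows."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    ss_x = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / ss_x
    b0 = y_bar - b1 * x_bar
    ssto = sum((yi - y_bar) ** 2 for yi in y)
    sse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    ssr = ssto - sse     # from SSTO = SSR + SSE
    msr = ssr / 1        # regression df = 1
    mse = sse / (n - 2)  # error df = n - 2
    return {
        "Regression": {"SS": ssr, "df": 1, "MS": msr, "F": msr / mse},
        "Error": {"SS": sse, "df": n - 2, "MS": mse},
        "Total": {"SS": ssto, "df": n - 1},
    }


# Hypothetical toy data: SSR = 3.6, SSE = 2.4, so F = 3.6 / 0.8 = 4.5
table = anova_slr([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
for source, row in table.items():
    print(source, {key: round(value, 4) for key, value in row.items()})
```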
EXAMPLE #10: ATTITUDE AND
ATTRACTION
STEP 5: COMPUTE THE TEST STATISTIC
$F = MSR / MSE$, read from the example's ANOVA table
Aside: p-value < $\alpha$
STEP 6: MAKE A DECISION
$F > F_{\alpha,\, 1,\, 13}$
OR
p-value < $\alpha$
REJECT $H_0$
STEP 7: FORM A CONCLUSION
NOTE:
For SLRM, the t-test on the significance of $\beta_1$ and the F-test on
model significance are EQUIVALENT: they give the same p-value and the same decision.
X is a significant predictor of Y $\iff$ the model fits the data
$$F = t^2$$
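The equivalence can be illustrated with the example's own summary values. This sketch uses only n = 15 and R² = 0.8770 from the slides, takes r as the positive square root (the fitted slope is positive), and assumes α = 0.05 with the standard table value $F_{0.05,1,13} \approx 4.67$.

```python
import math

n = 15         # students in the attitude-attraction study
r_sq = 0.8770  # R^2 = 87.70% (Example #9)

# F = MSR / MSE = (SSR / 1) / (SSE / (n - 2)), and SSR / SSE = R^2 / (1 - R^2)
f_stat = r_sq * (n - 2) / (1 - r_sq)
# t for H0: beta1 = 0, using r = +sqrt(R^2) because the fitted slope is positive
t_stat = math.sqrt(r_sq) * math.sqrt(n - 2) / math.sqrt(1 - r_sq)

f_crit = 4.67  # F_{0.05, 1, 13} from an F table, assuming alpha = 0.05
print(round(f_stat, 2), round(t_stat ** 2, 2), f_stat > f_crit)  # 92.69 92.69 True
```

Because R² is rounded to four digits on the slides, these values may differ slightly from those computed on the raw data.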
CONSIDERATIONS ON APPLYING REGRESSION
ANALYSIS
Linearity
the relationship between the two interval- or ratio-level variables X and Y must be linear
Homoscedasticity
the error terms must have a constant variance
Independence
the error terms must not be correlated with one another