Correlation and Regression - The Simple Case


CORRELATION AND REGRESSION: THE SIMPLE CASE


PSYSTA2 Week 4
CORRELATION
RATIONALE
It is often desirable to determine whether the
scores of one distribution are related to the scores
of another distribution.
PURPOSES:
to assess (linear) relationship between variables
to provide an initial step for prediction
to provide an initial assessment of possible
causal relationship
to assess test-retest reliability (of instruments)
EXAMPLES
Is there a relationship between the college grades
earned by a company's employees and their
success in the company?

Assuming an individual's IQ is stable from month
to month, can we ascertain that a particular IQ
test is consistent if it is administered twice,
one month apart, to the same people?

(Correlational) Design: two (uncontrolled)
variables, each at either an interval or ratio
measurement scale
NO CAUSATION implied!!
no distinction between Ind. and Dep. variables
CORRELATION ≠ CAUSATION
Possible Explanations of Correlation:
the correlation between $X$ and $Y$ may be
spurious
$X$ causes $Y$
$Y$ causes $X$
a third variable confounds with both $X$ and $Y$

NOTE:
The point is that a correlation between two
variables is not sufficient to establish causality
between them; hence, one should not infer
causality from correlational designs.
SCATTERPLOT
a plot of the pairs of values of two (quantitative)
variables on a rectangular coordinate plane
an effective tool for presenting possible
relationships between two (quantitative) variables

(Possible) RELATIONSHIPS:
None
Linear (numerically assessed by CORRELATION)
Non-Linear
EXAMPLE #1: ATTITUDE AND
ATTRACTION
Have you ever wondered whether it is true that opposites
attract? We've all been with couples in which the two
individuals seem so different from each other. But is this
the usual experience? Does similarity or dissimilarity
foster attraction?
A social psychologist investigating this problem asked 15 college
students to fill out a questionnaire concerning their attitudes
toward a variety of topics. Some time later, they were shown the
attitudes of a stranger to the same items and were asked to
rate the stranger as to probable liking for the stranger and
probable enjoyment of working with him. The attitudes of the
stranger were really made up by the experimenter and varied
over subjects regarding the proportion of attitudes held by the
stranger that were similar to those held by the rater.
Thus, for each subject, data were collected concerning his
attitudes and the attraction of a stranger based on the stranger's
attitudes to the same items. If similarities attract, then there
should be a direct relationship between the attraction of the
stranger and the proportion of his similar attitudes.
The data are presented in the following table. The higher the
attraction, the higher is the score. The maximum possible
attraction score is 14.
Construct a scatterplot for this data.
CORRELATION COEFFICIENT
expresses quantitatively the magnitude and
direction of the relationship between two variables
using a normalized scale (i.e., ranging from -1 to
1)
a measure of the strength of the linear association
between two variables

Any correlation coefficient has two components:


(1) the sign indicates either a positive or a negative
linear relationship (i.e., direction); (2) the absolute
value indicates the strength of the relationship (i.e.,
magnitude).
TYPES OF (LINEAR) RELATIONSHIP
DIRECTION:
+ : direct relationship
- : inverse relationship

MAGNITUDE:
closer to 1 : strong to almost perfect (linear)
relationship
closer to 0 : weak to almost no (linear) relationship

Magnitude      Strength
0.00 - 0.20    Very Weak
0.20 - 0.40    Weak
0.40 - 0.60    Moderate
0.60 - 0.80    Strong
0.80 - 1.00    Very Strong
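The direction and magnitude rules above can be sketched as a small lookup. This is only an illustration: the function name `describe_correlation` is hypothetical, and because the table's bands share their end points, the sketch assumes that a value sitting exactly on a cut-off (e.g., 0.20) belongs to the higher band.

```python
def describe_correlation(r):
    """Label the direction and strength of a correlation coefficient r."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient must lie in [-1, 1]")

    # The sign gives the direction of the linear relationship.
    if r > 0:
        direction = "direct (+)"
    elif r < 0:
        direction = "inverse (-)"
    else:
        direction = "none"

    # The absolute value gives the strength. Assumption: a value exactly
    # on a cut-off (e.g., 0.20) is placed in the higher band.
    magnitude = abs(r)
    if magnitude < 0.20:
        strength = "Very Weak"
    elif magnitude < 0.40:
        strength = "Weak"
    elif magnitude < 0.60:
        strength = "Moderate"
    elif magnitude < 0.80:
        strength = "Strong"
    else:
        strength = "Very Strong"
    return direction, strength


print(describe_correlation(0.85))   # ('direct (+)', 'Very Strong')
print(describe_correlation(-0.35))  # ('inverse (-)', 'Weak')
```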
CORRELATION AND SCATTERPLOTS
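PEARSON'S r
also known as the Pearson Product-Moment
Correlation Coefficient
describes the linear relationship between interval
and/or ratio variables

a measure of the extent to which paired scores occupy
the same or opposite positions within their own
distributions

$$r = \frac{\sum z_X z_Y}{n - 1} = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum (X_i - \bar{X})^2 \sum (Y_i - \bar{Y})^2}}$$

where $z_X$ and $z_Y$ are the standard scores of $X$ and $Y$ (computed with the sample standard deviations) and $n$ is the number of pairs of observations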
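As a rough illustration of Pearson's r, the sketch below computes it two ways: from deviations about the means, and from standard scores as the sum of z-score products divided by n - 1. The data are hypothetical (they are not the lecture's table), and the function names are made up for this example.

```python
import math
import statistics


def pearson_r(x, y):
    """Pearson's r from deviations about the means."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired, with at least 2 pairs")
    mean_x, mean_y = statistics.mean(x), statistics.mean(y)
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    s_yy = sum((yi - mean_y) ** 2 for yi in y)
    return s_xy / math.sqrt(s_xx * s_yy)


def pearson_r_from_z(x, y):
    """Same coefficient via standard scores: sum(z_x * z_y) / (n - 1)."""
    n = len(x)
    mean_x, mean_y = statistics.mean(x), statistics.mean(y)
    sd_x, sd_y = statistics.stdev(x), statistics.stdev(y)  # sample SDs
    products = (((xi - mean_x) / sd_x) * ((yi - mean_y) / sd_y)
                for xi, yi in zip(x, y))
    return sum(products) / (n - 1)


# Hypothetical paired scores (NOT the lecture's data).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r = pearson_r(x, y)
r_z = pearson_r_from_z(x, y)
print(round(r, 4), round(r_z, 4))  # both forms give 0.7746
```

Both forms agree because the z-score version simply divides each deviation by its own sample standard deviation before multiplying.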
EXAMPLE #2: ATTITUDE AND
ATTRACTION

For the Attitude and Attraction example:

Compute and interpret the correlation coefficient $r$ between
similarity of attitudes and attraction.

$r \approx 0.9365$ (the positive square root of the $R^2 = 87.70\%$ reported for this example)
VERY STRONG and +

Is it significant?

t-test on significance of $\rho$
EXAMPLE #3: ATTITUDE AND
ATTRACTION

For the Attitude and Attraction example:

Is there a significant correlation in the population between
attitude similarity and attraction?
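STEP 1: IDENTIFY $H_0$ AND $H_a$

For the Attitude and Attraction example:

Let $\rho$ = true (population) correlation between attitude
similarity and attraction. Then we wish to evaluate the
following hypotheses:

$H_0: \rho = 0$
(i.e., no linear correlation in the population)
v.s.
$H_a: \rho \neq 0$
(i.e., there is a linear correlation in the population)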
STEP 2: SPECIFY $\alpha$

For the Attitude and Attraction example:

Suppose we set the significance level of the test at 5%.

$\alpha = 0.05$
STEP 3: IDENTIFY THE TEST TO USE

t-test on significance of $\rho$ (with df = $n - 2$ = 13)

NOTE: $n$ is the number of pairs of observations


STEP 4: IDENTIFY THE REJECTION RULE

For the Attitude and Attraction example:

Rejection Rule: [*See board for Illustration]

REJECT $H_0$ if

$|t_c| > t_{\alpha/2,\, n-2} = t_{0.025,\, 13} = 2.160$
OR

p-value $< \alpha = 0.05$
STEP 5: COMPUTE THE TEST STATISTIC

For t-test on significance of $\rho$ (with df = $n - 2$):

$t_c = \frac{r\sqrt{n - 2}}{\sqrt{1 - r^2}}$
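The test statistic for the significance of a correlation, t = r * sqrt(n - 2) / sqrt(1 - r^2), can be sketched as follows. The inputs are assumptions: n = 15 comes from the example, while r = 0.9365 is taken here as the positive square root of the R^2 = 87.70% reported later for the same data. The critical value 2.160 is the tabled two-tailed t value for alpha = 0.05 with 13 degrees of freedom; the function name is hypothetical.

```python
import math


def t_for_correlation(r, n):
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    if n < 3:
        raise ValueError("need at least 3 pairs of observations")
    if abs(r) >= 1:
        raise ValueError("t is undefined when |r| = 1")
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


r = 0.9365          # assumption: sqrt of the reported R^2 = 0.8770
n = 15              # number of pairs of observations in the example
T_CRITICAL = 2.160  # tabled t(0.025, df = 13)

t_c = t_for_correlation(r, n)
reject_h0 = abs(t_c) > T_CRITICAL
print(f"t = {t_c:.2f}, reject H0: {reject_h0}")
```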


STEP 5: COMPUTE THE TEST STATISTIC

For the Attitude and Attraction example:

$t_c \approx 9.63$ (using $r \approx 0.9365$ and $n = 15$)

Aside: p-value $< 0.001$
STEP 6: MAKE A DECISION

For the Attitude and Attraction example:

$|t_c| \approx 9.63 > t_{0.025,\, 13} = 2.160$
OR
p-value $< 0.001 < \alpha = 0.05$
REJECT $H_0$
STEP 7: FORM A CONCLUSION

For the Attitude and Attraction example:

At $\alpha$ = 5%, one REJECTS $H_0$. Therefore, one has sufficient
evidence to say that attitude similarity and attraction
are significantly (POSITIVELY and VERY STRONGLY)
correlated.
SIMPLE LINEAR REGRESSION
RATIONALE
Oftentimes, it is desired not only to determine
how the scores of one distribution are related to
the scores of another distribution, but also to use
this relationship to predict one score given the
other.

PURPOSE:
for prediction
EXAMPLES
Predict employees' success in the company using
the college grades that they have earned.
RESPONSE ($Y$): success
PREDICTOR ($X$): college grades

Predict a person's suicidal tendencies based on his/her
emotional stability, both of which are measured using
composite indices/scores.
RESPONSE ($Y$): suicidal tendency
PREDICTOR ($X$): emotional stability

Design: one dependent variable (response) and one
independent variable (predictor), both at either an
interval or ratio measurement scale
NO CAUSATION implied!! (unless done for controlled experiments)
LINEAR REGRESSION

Regression
a statistical technique which considers using the
relationship between two or more variables for
prediction

Linear Regression
a regression technique wherein the regression function
is taken to be of the linear form (i.e., the use of a
linear equation for prediction)
SIMPLE LINEAR REGRESSION MODEL
Simple Linear Regression
a linear regression which uses only one predictor
($X$) to explain/predict a response ($Y$)

$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$
where
$Y_i$ = response value
$\beta_0$ (y-intercept) and $\beta_1$ (slope) are parameters
$X_i$ = the value of the predictor variable
$\varepsilon_i$ is the random error component


MEANING OF REGRESSION PARAMETERS
also known as regression coefficients

$\beta_1$ is the slope of the regression line.
It indicates the change in the mean of the
probability distribution of $Y$ per unit increase in
$X$.

$\beta_0$ is the y-intercept of the regression line.
When the scope of the model includes $X = 0$, $\beta_0$
gives the mean of the probability distribution of
$Y$ at $X = 0$. When the scope of the model does not
cover $X = 0$, $\beta_0$ does not have any particular
meaning as a separate term in the regression
model.
SIMPLE LINEAR REGRESSION MODEL AND SCATTERPLOTS
[Figure: scatterplots with regression lines for $\beta_1 > 0$, $\beta_1 = 0$, and $\beta_1 < 0$]
THE BEST-FIT LINE

For any particular scatterplot, an infinite number of
linear regression equations can be identified.
GOAL: Identify the BEST-FIT Line.
EXAMPLE #4: ATTITUDE AND
ATTRACTION

Consider the Attitude and Attraction example:

Suppose that it is of particular interest to determine
how one's attraction level to a particular person can be
predicted by the amount of similar attitudes shared.

BEST-FIT LINE

How does one determine this line?


ORDINARY LEAST SQUARES (OLS) REGRESSION

the estimated linear regression
which minimizes the error
(deviation) sum of squares, i.e.,

$SSE = \sum (Y_i - \hat{Y}_i)^2$
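Estimators:
To estimate $\beta_1$:
$b_1 = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2}$

To estimate $\beta_0$:
$b_0 = \frac{1}{n}\left(\sum Y_i - b_1 \sum X_i\right) = \bar{Y} - b_1 \bar{X}$

Estimated Regression Equation:
$\hat{Y} = b_0 + b_1 X$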
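A minimal sketch of the OLS estimators follows: the slope is the sum of cross-products of deviations divided by the sum of squared deviations of the predictor, and the intercept is the mean response minus the slope times the mean predictor. The data are hypothetical (not the lecture's table), and `ols_fit` is a made-up helper name.

```python
def ols_fit(x, y):
    """Return (b0, b1): least-squares intercept and slope."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must be paired, with at least 2 pairs")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    if s_xx == 0:
        raise ValueError("the predictor must take at least 2 distinct values")
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = s_xy / s_xx            # slope estimate
    b0 = mean_y - b1 * mean_x   # intercept estimate
    return b0, b1


# Hypothetical paired scores (NOT the lecture's data).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

b0, b1 = ols_fit(x, y)
print(f"Y-hat = {b0:.4f} + {b1:.4f} X")  # Y-hat = 2.2000 + 0.6000 X
```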
EXAMPLE #5: ATTITUDE AND ATTRACTION

BEST-FIT LINE

$\hat{Y} = 6.6136 + 5.0812X$
Estimated Regression Equation:
$\hat{Y} = 6.6136 + 5.0812X$

$b_0 = 6.6136$
The expected value of one's attraction level to
another person is 6.6136 whenever the
proportion of similar attitudes they
share is 0.
(Only interpretable whenever 0 is a possible
value of $X$!!)

$b_1 = 5.0812$
One's attraction level to another person is
expected to INCREASE by 5.0812 units for
every 1-unit increase in proportion of similar
attitudes shared.
PREDICTION ERROR
Residual (ith):
the difference between the observed value $Y_i$ and
the corresponding fitted value $\hat{Y}_i$, denoted by $e_i$,
i.e.,
$e_i = Y_i - \hat{Y}_i$

NOTE:
$e_i > 0$ : response i is UNDERESTIMATED
$e_i < 0$ : response i is OVERESTIMATED
$e_i = 0$ : response i is EXACT
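The residual rule can be illustrated with the estimated equation quoted in this document (intercept 6.6136, slope 5.0812). The two observations below are hypothetical, chosen only to show one overestimated and one underestimated case; the helper names and the small tolerance used for "EXACT" are assumptions of this sketch.

```python
B0, B1 = 6.6136, 5.0812  # estimates quoted in the document


def predict(x):
    """Fitted attraction score for a given proportion of similar attitudes."""
    return B0 + B1 * x


def classify_residual(y_observed, y_fitted, tol=1e-9):
    """Return (residual, label), where residual = observed - fitted."""
    e = y_observed - y_fitted
    if e > tol:
        label = "UNDERESTIMATED"   # fitted value is below the observed one
    elif e < -tol:
        label = "OVERESTIMATED"    # fitted value is above the observed one
    else:
        label = "EXACT"
    return e, label


# Hypothetical (proportion, attraction) observations, NOT the lecture's data.
observations = [(0.50, 9.0), (0.80, 11.0)]
for x_i, y_i in observations:
    e_i, label = classify_residual(y_i, predict(x_i))
    print(f"X = {x_i:.2f}: fitted = {predict(x_i):.4f}, "
          f"residual = {e_i:.4f} ({label})")
```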
EXAMPLE #6: ATTITUDE AND
ATTRACTION

$\hat{Y}_i = 6.6136 + 5.0812X_i$, with $e_i = Y_i - \hat{Y}_i < 0$
OVERESTIMATED

$\hat{Y}_i = 6.6136 + 5.0812X_i$, with $e_i = Y_i - \hat{Y}_i > 0$
UNDERESTIMATED
MEAN SQUARED ERROR


$MSE = \frac{SSE}{n - 2} = \frac{\sum (Y_i - \hat{Y}_i)^2}{n - 2}$

the average squared prediction error (i.e., the average
squared deviation between the actual and the
predicted values)

PURPOSES:
provides measure of precision of the prediction
allows assessment of model significance
Is $X$ a significant predictor of $Y$?
Is the fitted model a good fit for the data?
EXAMPLE #7: ATTITUDE AND
ATTRACTION
$MSE = s^2$, where $s$ is the standard error of the estimate (*square this value)
ROOT MEAN SQUARED ERROR

$s = \sqrt{MSE} = \sqrt{\frac{SSE}{n - 2}}$

the average prediction error (i.e., positive square
root of the average squared deviations
between the actual and the predicted values)

PURPOSES:
provides measure of precision of the prediction
allows assessment of model significance
Is $X$ a significant predictor of $Y$?
Is the fitted model a good fit for the data?
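The mean squared error and its positive square root can be sketched as below. The divisor is n - 2 because two parameters (intercept and slope) are estimated. The data and the coefficients 2.2 and 0.6 are hypothetical (they are the least-squares estimates for these illustrative points, not the lecture's values), and the helper name is made up.

```python
import math


def mse_and_rmse(y, y_fitted, n_params=2):
    """Return (MSE, sqrt(MSE)) using n - n_params degrees of freedom."""
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_fitted))
    df = len(y) - n_params
    if df <= 0:
        raise ValueError("need more observations than estimated parameters")
    mse = sse / df
    return mse, math.sqrt(mse)


# Hypothetical paired scores (NOT the lecture's data).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b0, b1 = 2.2, 0.6  # least-squares estimates for these hypothetical points

fitted = [b0 + b1 * xi for xi in x]
mse, rmse = mse_and_rmse(y, fitted)
print(f"MSE = {mse:.4f}, sqrt(MSE) = {rmse:.4f}")  # MSE = 0.8000, sqrt(MSE) = 0.8944
```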
EXAMPLE #8: ATTITUDE AND
ATTRACTION

For the Attitude and Attraction example:

Is the proportion of similar attitudes shared a significant
predictor of one's attraction level to another person?
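STEP 1: IDENTIFY $H_0$ AND $H_a$

For the Attitude and Attraction example:

Let $\beta_1$ = true (population) slope coefficient between attitude
similarity and attraction. Then we wish to evaluate the
following hypotheses:

$H_0: \beta_1 = 0$
(i.e., attitude similarity is not a significant predictor of attraction)
v.s.
$H_a: \beta_1 \neq 0$
(i.e., attitude similarity is a significant predictor of attraction)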
STEP 2: SPECIFY $\alpha$

For the Attitude and Attraction example:

Suppose we set the significance level of the test at 5%.

$\alpha = 0.05$
STEP 3: IDENTIFY THE TEST TO USE

t-test on significance of $\beta_1$ (with df = $n - 2$ = 13)

NOTE: $n$ is the number of pairs of observations


STEP 4: IDENTIFY THE REJECTION RULE

For the Attitude and Attraction example:

Rejection Rule: [*See board for Illustration]

REJECT $H_0$ if

$|t_c| > t_{\alpha/2,\, n-2} = t_{0.025,\, 13} = 2.160$
OR

p-value $< \alpha = 0.05$
STEP 5: COMPUTE THE TEST STATISTIC

For t-test on significance of $\beta_1$ (with df = $n - 2$):

$t_c = \frac{b_1}{s(b_1)}$

where

$s(b_1) = \sqrt{\frac{MSE}{\sum (X_i - \bar{X})^2}}$
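A sketch of the slope test: the statistic is the estimated slope divided by its standard error, where the standard error is the square root of MSE over the sum of squared deviations of the predictor. The data are hypothetical and deliberately small (n = 5, so df = 3), which is why the tabled critical value used here is 3.182 rather than the 2.160 of the lecture's example; the function name is made up.

```python
import math


def slope_t_test(x, y):
    """Return (b1, se_b1, t) for the test of H0: beta_1 = 0."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least 3 paired observations")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = s_xy / s_xx
    b0 = mean_y - b1 * mean_x
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    mse = sse / (n - 2)
    se_b1 = math.sqrt(mse / s_xx)  # standard error of the slope
    if se_b1 == 0:
        raise ValueError("perfect fit: the t statistic is undefined")
    return b1, se_b1, b1 / se_b1


# Hypothetical paired scores (NOT the lecture's data).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

b1, se_b1, t_c = slope_t_test(x, y)
T_CRITICAL = 3.182  # tabled t(0.025, df = 3) for this small sample
print(f"b1 = {b1:.4f}, s(b1) = {se_b1:.4f}, t = {t_c:.4f}")
print("reject H0" if abs(t_c) > T_CRITICAL else "fail to reject H0")
```

With these illustrative points the slope is not significant at the 5% level, which shows that a positive slope estimate alone does not settle the question.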

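STEP 5: COMPUTE THE TEST STATISTIC

For the Attitude and Attraction example:

$t_c \approx 9.63$ (from $t_c = \sqrt{R^2 (n - 2) / (1 - R^2)}$ with $R^2 = 87.70\%$ and $n = 15$)

Aside: p-value $< 0.001$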
STEP 6: MAKE A DECISION

For the Attitude and Attraction example:

$|t_c| \approx 9.63 > t_{0.025,\, 13} = 2.160$
OR
p-value $< 0.001 < \alpha = 0.05$
REJECT $H_0$
STEP 7: FORM A CONCLUSION

For the Attitude and Attraction example:

At $\alpha$ = 5%, one REJECTS $H_0$. Therefore, one has sufficient
evidence to say that the proportion of similar attitudes
shared is a significant predictor of one's attraction
level to another person.

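NOTE:
Notice the similarities between assessing the
statistical significance of the correlation between
$X$ and $Y$ (no IV-DV relationship) and determining
whether $X$ is a significant predictor of $Y$
(w/ IV-DV relationship).

$b_1 = (r)\left(\frac{s_Y}{s_X}\right)$, where $s_X$ and $s_Y$ are the sample standard deviations of $X$ and $Y$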
AN ANALYSIS OF VARIANCE APPROACH

Partitioning the Total Deviation:

$(Y_i - \bar{Y}) = (\hat{Y}_i - \bar{Y}) + (Y_i - \hat{Y}_i)$

Total Deviation = Deviation of Fitted Regression Value around Mean + Deviation around Fitted Regression Line

(the Total Deviation is a component of the VARIANCE of $Y$!!)
PARTITIONING THE SUM OF SQUARES

$\sum (Y_i - \bar{Y})^2 = \sum (\hat{Y}_i - \bar{Y})^2 + \sum (Y_i - \hat{Y}_i)^2$

SSTO = SSR + SSE

SSR : explained variation
SSE : unexplained variation
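The partition SSTO = SSR + SSE can be checked numerically. The sketch below fits a least-squares line to hypothetical data (not the lecture's table) and computes the three sums of squares separately, so that the identity is verified rather than assumed; the function name is made up.

```python
def sums_of_squares(x, y):
    """Return (SSTO, SSR, SSE) for a least-squares line fitted to (x, y)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = s_xy / s_xx
    b0 = mean_y - b1 * mean_x
    fitted = [b0 + b1 * xi for xi in x]

    ssto = sum((yi - mean_y) ** 2 for yi in y)               # total
    ssr = sum((fi - mean_y) ** 2 for fi in fitted)           # explained
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))   # unexplained
    return ssto, ssr, sse


# Hypothetical paired scores (NOT the lecture's data).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

ssto, ssr, sse = sums_of_squares(x, y)
print(f"SSTO = {ssto:.2f}, SSR = {ssr:.2f}, SSE = {sse:.2f}")
print("partition holds:", abs(ssto - (ssr + sse)) < 1e-9)
```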
COEFFICIENT OF DETERMINATION



$R^2 = \frac{SSR}{SSTO} \times 100\%$

the proportionate reduction of total variation in $Y$
associated with the use of the predictor variable $X$
The amount of variation in $Y$ explained by the
(variations in) $X$

PROPERTIES:

the higher the value, the better the model fit
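As a small illustration, the coefficient of determination is the ratio SSR / SSTO. The sums of squares below are hypothetical, and the function name is made up. The last lines use the fact that, in simple linear regression, R^2 equals the square of Pearson's r, applied to the 87.70% reported in this document.

```python
import math


def coefficient_of_determination(ssr, ssto):
    """Return R^2 = SSR / SSTO as a proportion between 0 and 1."""
    if ssto <= 0:
        raise ValueError("SSTO must be positive")
    if not 0 <= ssr <= ssto:
        raise ValueError("SSR must lie between 0 and SSTO")
    return ssr / ssto


# Hypothetical sums of squares (NOT the lecture's values).
r2 = coefficient_of_determination(ssr=3.6, ssto=6.0)
print(f"R^2 = {100 * r2:.2f}%")  # R^2 = 60.00%

# With one predictor, |r| is the square root of R^2.
abs_r = math.sqrt(0.8770)  # R^2 = 87.70% as reported in the document
print(round(abs_r, 4))     # 0.9365
```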


EXAMPLE #9: ATTITUDE AND
ATTRACTION

For the Attitude and Attraction example:

Compute and interpret the coefficient of determination $R^2$
from the estimated (OLS) regression equation between similarity
of attitudes and attraction.

$R^2 = 87.70\%$
87.70% of the total variation in
attraction level is explained by the
differences in attitude similarity.

Is it significant?
EXAMPLE #10: ATTITUDE AND
ATTRACTION

For the Attitude and Attraction example:

Does the identified (linear regression) model (i.e., predicting one's
attraction level to another person using the proportion of similar
attitudes shared between them) provide a good fit for the data?
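STEP 1: IDENTIFY $H_0$ AND $H_a$

For the Attitude and Attraction example:

$H_0$: The regression model does not fit the data.
(i.e., $\beta_1 = 0$)
v.s.
$H_a$: The regression model fits the data.
(i.e., $\beta_1 \neq 0$)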
STEP 2: SPECIFY $\alpha$

For the Attitude and Attraction example:

Suppose we set the significance level of the test at 5%.

$\alpha = 0.05$
STEP 3: IDENTIFY THE TEST TO USE

F-test on model significance (with df1 = 1 and df2 = $n - 2$ = 13)
df1: numerator df; df2: denominator df

NOTE: $n$ is the number of pairs of observations


STEP 4: IDENTIFY THE REJECTION RULE

For the Attitude and Attraction example:

Rejection Rule: [*See board for Illustration]

REJECT $H_0$ if

$F_c > F_{\alpha,\, df_1,\, df_2} = F_{0.05,\, 1,\, 13} = 4.67$
OR

p-value $< \alpha = 0.05$
STEP 5: COMPUTE THE TEST STATISTIC

For F-test on model significance (with df1 = 1 and df2 = $n - 2$ = 13):

ANOVA Table:
SOURCE              SS      df       MS                     Fstat
Regression          SSR     1        MSR = SSR / 1          F = MSR / MSE
Error (Residual)    SSE     n - 2    MSE = SSE / (n - 2)
TOTAL               SSTO    n - 1
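The ANOVA table can be sketched as a function that returns each entry. The data are hypothetical (n = 5, so the error df is 3), which means the relevant tabled critical value is F(0.05; 1, 3) = 10.13 rather than the value used in the lecture's example; the function and key names are made up.

```python
def anova_table(x, y):
    """Return the simple-linear-regression ANOVA entries as a dict."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least 3 paired observations")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = s_xy / s_xx
    b0 = mean_y - b1 * mean_x
    fitted = [b0 + b1 * xi for xi in x]

    ssr = sum((fi - mean_y) ** 2 for fi in fitted)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    msr = ssr / 1          # regression df = 1
    mse = sse / (n - 2)    # error df = n - 2
    return {
        "SSR": ssr, "SSE": sse, "SSTO": ssr + sse,
        "df_regression": 1, "df_error": n - 2, "df_total": n - 1,
        "MSR": msr, "MSE": mse, "F": msr / mse,
    }


# Hypothetical paired scores (NOT the lecture's data).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

table = anova_table(x, y)
F_CRITICAL = 10.13  # tabled F(0.05; 1, 3) for this small sample
print(f"F = {table['F']:.2f}")
print("reject H0" if table["F"] > F_CRITICAL else "fail to reject H0")
```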
STEP 5: COMPUTE THE TEST STATISTIC

For the Attitude and Attraction example:

$F_c \approx 92.7$ (from $F_c = R^2 (n - 2) / (1 - R^2)$ with $R^2 = 87.70\%$ and $n = 15$)

Aside: p-value $< 0.001$
STEP 6: MAKE A DECISION

For the Attitude and Attraction example:

$F_c \approx 92.7 > F_{0.05,\, 1,\, 13} = 4.67$
OR
p-value $< 0.001 < \alpha = 0.05$
REJECT $H_0$
STEP 7: FORM A CONCLUSION

For the Attitude and Attraction example:

At $\alpha$ = 5%, one REJECTS $H_0$. Therefore, one has sufficient
evidence to say that the identified regression model
(i.e., predicting one's attraction level to another
person using the proportion of similar attitudes
shared between them) provides a good fit for the data.

NOTE:
For SLRM, the t-test on significance of $\beta_1$ and the F-test on
model significance are EQUIVALENT.
$X$ is a significant predictor of $Y$ $\Leftrightarrow$ the model fits the
data
$t_c^2 = F_c$
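The link between the two tests can be checked with the figures this document reports (R^2 = 87.70%, n = 15). The sketch relies on the identity F = R^2 (n - 2) / (1 - R^2) for simple linear regression and on t^2 = F; the critical values 2.160 and 4.667 are tabled values for df = 13 and (1, 13), and the variable names are made up.

```python
import math

r_squared = 0.8770  # reported in the document
n = 15              # number of pairs of observations

# F statistic from R^2: (SSR / 1) / (SSE / (n - 2)), rewritten via R^2.
f_stat = r_squared / (1 - r_squared) * (n - 2)

# The slope estimate is positive, so t is the positive square root of F.
t_stat = math.sqrt(f_stat)

T_CRITICAL = 2.160  # tabled t(0.025, df = 13)
F_CRITICAL = 4.667  # tabled F(0.05; 1, 13)

print(f"F = {f_stat:.2f}, t = {t_stat:.2f}")
print("critical values agree:", abs(T_CRITICAL ** 2 - F_CRITICAL) < 0.01)
print("same decision:", (t_stat > T_CRITICAL) == (f_stat > F_CRITICAL))
```

Because the critical values satisfy the same squared relationship as the statistics, the two tests always lead to the same decision in the one-predictor case.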
CONSIDERATIONS ON APPLYING REGRESSION
ANALYSIS
The relationship between two interval- or ratio-level variables
$X$ and $Y$ must be linear.

The applicability of the estimated regression equation only ranges
within the observed scope of $X$.

The estimation of the regression model is sensitive to outliers.

The validity of the regression model is subject to the validity of
the model assumptions on the distribution of the error terms ($\varepsilon_i$'s).
Normality
the error terms must be normally distributed

Homoscedasticity
the error terms must have a constant variance

Independence
the error terms must not be correlated with one another
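The point about the observed scope of the predictor can be made concrete with a prediction helper that refuses to extrapolate. The coefficients are the estimates quoted in this document; the observed range of 0.30 to 0.95 is purely an assumption for illustration, since the data table is not reproduced here, and the function name is hypothetical.

```python
def predict_within_scope(x_new, b0, b1, x_min, x_max):
    """Return the fitted value only if x_new lies in the observed range."""
    if not x_min <= x_new <= x_max:
        raise ValueError(
            f"x = {x_new} is outside the observed scope [{x_min}, {x_max}]"
        )
    return b0 + b1 * x_new


B0, B1 = 6.6136, 5.0812    # estimates quoted in the document
X_MIN, X_MAX = 0.30, 0.95  # assumed observed range of the predictor

print(round(predict_within_scope(0.50, B0, B1, X_MIN, X_MAX), 4))  # 9.1542

try:
    predict_within_scope(1.50, B0, B1, X_MIN, X_MAX)
except ValueError as err:
    print("refused:", err)
```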
