Correlation


Exploring Relationships

Lesson 1: Correlation
Bivariate Data

Bivariate data is data in which two variables are measured on an
individual.
The response variable is the variable whose value can be explained
or determined based upon the value of the predictor variable.
A lurking variable is one that is related to the response and/or
predictor variable, but is excluded from the analysis.

Unit 2: Probability Distributions t z f


Scatter Diagrams
A scatter diagram shows the relationship between two quantitative
variables measured on the same individual.
The value of the predictor is read on the
horizontal axis and the response variable
on the vertical axis.
Each individual in the data set is
represented by a point in the scatter
diagram.
Do not connect the points when drawing
a scatter diagram.

Example 1: Drawing a Scatter Diagram
P 202, #16. An engineer wanted to determine how the weight of a car
affected the gas mileage. The data represent the weight of various
domestic cars and their city mileage rating (in mpg) for the 2001
model year.

Weight (pounds)   Miles Per Gallon
3565              19
3440              20
3970              17
3305              19
3340              20
3200              20
3230              19
2560              28
2520              28
3065              20
3600              18
3300              19
3625              19
3590              19
2605              23
2370              28

(a) Determine which is the likely predictor variable and which is the
likely response variable.
Predictor variable: weight
Response variable: mileage

Example 1: Drawing a Scatter Diagram
(b) Draw a scatter diagram.
[Figure: "Weight vs. Mileage" scatter diagram; horizontal axis Weight
(lbs), 2000 to 4000; vertical axis City Mileage (MPG), 15 to 30.]
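A scatter diagram of this kind can also be sketched in plain text. The snippet below is a minimal illustration, not the method used in the lesson: `text_scatter` is a hypothetical helper, and the grid size (40 by 12 characters) is an arbitrary assumption. It places one unconnected mark per car, with the predictor (weight) on the horizontal axis and the response (mileage) on the vertical axis.

```python
def text_scatter(xs, ys, width=40, height=12):
    """Return text rows with one '*' per (x, y) pair; points are not connected."""
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        # Scale each value to a column/row index (assumes x and y are not constant).
        col = round((x - x_min) / (x_max - x_min) * (width - 1))
        row = round((y - y_min) / (y_max - y_min) * (height - 1))
        grid[height - 1 - row][col] = "*"  # row 0 is printed first, so flip vertically
    return ["".join(cells) for cells in grid]


# Example 1 data: weight (pounds) is the predictor, city mileage (mpg) the response.
weight = [3565, 3440, 3970, 3305, 3340, 3200, 3230, 2560,
          2520, 3065, 3600, 3300, 3625, 3590, 2605, 2370]
mpg = [19, 20, 17, 19, 20, 20, 19, 28, 28, 20, 18, 19, 19, 19, 23, 28]

rows = text_scatter(weight, mpg)
print("City Mileage (MPG)")
for line in rows:
    print("|" + line)
print("+" + "-" * 40 + "  Weight (lbs)")
```

The lightest car lands in the top-left corner and the heaviest in the bottom-right, which is the downward trend the hand-drawn diagram shows.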

Relationships Between Two Variables
Scatter diagrams reveal the type of relationship or trend that exists
between two variables.
[Figure: five example scatter diagrams labeled Linear (Decreasing),
Linear (Increasing), Nonlinear, Nonlinear, and No trend.]

Example 2: Identifying the Trend
P 199, #1 – 4. Determine whether the relationship between the
variables is linear or non-linear. If linear, indicate whether there is a
positive or negative trend.
[Figure: four scatter diagrams, numbered 1 to 4.]
1. Nonlinear
2. Linear, negative
3. Linear, positive
4. Nonlinear

Positive Linear Relationships
Two variables that are linearly related are said to be positively
associated when above-average values of one variable are associated
with above-average values of the other variable.
That is, two variables are positively associated when, as the values
of the predictor variable increase, the values of the response
variable also increase.
[Figure: scatter diagram divided into quadrants I, II, III, and IV.]

Negative Linear Relationships
Two variables that are linearly related are said to be negatively
associated when above-average values of one variable are associated
with below-average values of the other variable.
That is, two variables are negatively associated when, as the values
of the predictor variable increase, the values of the response
variable decrease.
[Figure: scatter diagram divided into quadrants I, II, III, and IV.]
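One way to make the quadrant picture concrete is to count how many points fall in each quadrant. The sketch below assumes the quadrant boundaries are drawn at the sample means of the two variables (the extracted slide does not show where the lines sit), and `quadrant_counts` is a hypothetical helper name. It uses the weight and mileage data from Example 1.

```python
from collections import Counter
from statistics import mean


def quadrant_counts(xs, ys):
    """Count points per quadrant, with the axes placed at the sample means."""
    x_bar, y_bar = mean(xs), mean(ys)
    counts = Counter()
    for x, y in zip(xs, ys):
        if x == x_bar or y == y_bar:
            counts["on a mean line"] += 1       # not in any quadrant
        elif x > x_bar:
            counts["I" if y > y_bar else "IV"] += 1
        else:
            counts["II" if y > y_bar else "III"] += 1
    return counts


weight = [3565, 3440, 3970, 3305, 3340, 3200, 3230, 2560,
          2520, 3065, 3600, 3300, 3625, 3590, 2605, 2370]
mpg = [19, 20, 17, 19, 20, 20, 19, 28, 28, 20, 18, 19, 19, 19, 23, 28]

counts = quadrant_counts(weight, mpg)
for name in ("I", "II", "III", "IV"):
    print(name, counts[name])
```

For these data most points land in quadrants II and IV (above-average weight paired with below-average mileage, and the reverse), which is the pattern of a negative association.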

Measuring the Strength of the Linear Relationship
The linear correlation coefficient (or Pearson product moment
correlation coefficient) is a measure of the strength of linear relation
between two quantitative variables.
We use the Greek letter ρ (rho) to represent the population
correlation coefficient and r to represent the sample correlation
coefficient.
We shall only present the formula for the sample correlation
coefficient:
\[ r = \frac{\sum \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right)}{n - 1} \]
The correlation coefficient is a unitless measure of association. The
units of measure for x and y play no role in the interpretation of r.
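The formula and the unitless property can be illustrated with a short sketch. The data below are hypothetical (they do not come from the lesson), and `pearson_r` is an illustrative name. The function sums the products of z-scores and divides by n - 1, then the x values are converted from hours to minutes to check that r does not change.

```python
from statistics import mean, stdev


def pearson_r(xs, ys):
    """Sample correlation: sum of products of z-scores, divided by n - 1."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations (divide by n - 1)
    total = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
                for x, y in zip(xs, ys))
    return total / (n - 1)


# Hypothetical data: x measured in hours, y in arbitrary units.
hours = [1, 2, 3, 4, 5]
score = [2, 4, 5, 4, 5]
minutes = [60 * h for h in hours]  # same quantity, different unit

r_hours = pearson_r(hours, score)
r_minutes = pearson_r(minutes, score)
print(round(r_hours, 4), round(r_minutes, 4))
```

Both calls give the same value, because rescaling x rescales the deviations and the standard deviation by the same factor, leaving each z-score unchanged.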

Properties of the Linear Correlation Coefficient
The linear correlation coefficient is always between –1 and 1,
inclusive.
If r = +1, there is a perfect positive linear relation between the two
variables.
The closer r is to +1, the stronger the evidence of positive
association between the two variables.
[Figure: example scatter diagrams with r = 1, r ≈ .9, and r ≈ .4.]

Properties of the Linear Correlation Coefficient
If r = –1, there is a perfect negative linear relation between the two
variables.
The closer r is to –1, the stronger the evidence of negative
association between the two variables.
[Figure: example scatter diagrams with r = –1, r ≈ –.9, and r ≈ –.4.]

Properties of the Linear Correlation Coefficient
If r is close to 0, there is little or no linear relation between the two
variables.

[Figure: two scatter diagrams with r ≈ 0, one showing no relationship
and one showing a nonlinear relationship.]
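That r near 0 rules out only a linear relation can be checked with a small sketch. The data are hypothetical: y is exactly x squared on x values symmetric about zero, so the two variables are perfectly related, just not linearly. The function uses sums of deviations, which is algebraically the same as the z-score formula.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Sample correlation computed from sums of deviations about the means."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)


# Hypothetical data: a perfect nonlinear (quadratic) relationship.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]

r = pearson_r(xs, ys)
print(r)
```

Here the positive and negative products of deviations cancel, so r is zero even though y is completely determined by x.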

Example 3: Estimating Correlation from a Scatter Plot
P 200, # 6. Match the correlation coefficient to the scatter diagram.
(a) r = –0.969
(b) r = –0.049
(c) r = –1
(d) r = –0.992
[Figure: four scatter diagrams; in the order shown they are matched to
(c) r = –1, (d) r = –0.992, (b) r = –0.049, and (a) r = –0.969.]

Example 4: Anticipating Correlation
P 205, #27. For each of the following statements, state whether you
think the variables will have a positive correlation, negative
correlation, or no correlation.
(a) Number of children in the household under the age of 3 and
expenditures on diapers. Positive correlation
(b) Interest rates on car loans and the number of cars sold.
Negative correlation
(c) Number of hours per week on the treadmill and cholesterol level.
Negative correlation
(d) Price of a Big Mac and the number of McDonald’s french fries
sold in a week. Negative correlation
(e) Shoe size and IQ. No correlation

Calculating the Correlation Coefficient
A more efficient formula for computing the correlation coefficient is

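\[ r = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}} \]
where
\[ S_{xx} = \sum (x_i - \bar{x})^2 = \sum x_i^2 - \frac{\left( \sum x_i \right)^2}{n} \]
\[ S_{yy} = \sum (y_i - \bar{y})^2 = \sum y_i^2 - \frac{\left( \sum y_i \right)^2}{n} \]
\[ S_{xy} = \sum (x_i - \bar{x})(y_i - \bar{y}) = \sum x_i y_i - \frac{\left( \sum x_i \right) \left( \sum y_i \right)}{n} \]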
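A brief sketch of why this agrees with the earlier definition of r may help. It uses only the symbols already defined, plus the standard fact that the sample standard deviations satisfy s_x = sqrt(S_xx / (n - 1)) and s_y = sqrt(S_yy / (n - 1)).

```latex
\begin{align*}
S_{xx} &= \sum (x_i - \bar{x})^2
        = \sum x_i^2 - 2\bar{x} \sum x_i + n\bar{x}^2
        = \sum x_i^2 - \frac{\left( \sum x_i \right)^2}{n},
        \qquad \text{since } \bar{x} = \frac{1}{n} \sum x_i, \\[4pt]
r &= \frac{\sum \left( \frac{x_i - \bar{x}}{s_x} \right)
               \left( \frac{y_i - \bar{y}}{s_y} \right)}{n - 1}
   = \frac{S_{xy}}{(n - 1)\, s_x s_y}
   = \frac{S_{xy}}{(n - 1) \sqrt{\frac{S_{xx}}{n - 1}} \sqrt{\frac{S_{yy}}{n - 1}}}
   = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}}.
\end{align*}
```

The same expansion applied to S_yy and S_xy gives their shortcut forms.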

Example 5: Computing a Correlation
P 200, # 8. Given the data:

x   y
2   5.7
3   5.2
5   2.8
6   1.9
6   2.2

(a) Draw a scatter diagram.
[Figure: scatter diagram of the data; horizontal axis x, 1 to 6;
vertical axis y, 0 to 6.]

Example 5: Computing a Correlation
(b) Compute the correlation coefficient.
Compute x², y², and xy. Sum all columns.

x    y      x²    y²      xy
2    5.7    4     32.49   11.4
3    5.2    9     27.04   15.6
5    2.8    25    7.84    14.0
6    1.9    36    3.61    11.4
6    2.2    36    4.84    13.2
22   17.8   110   75.82   65.6   (column sums)

Calculate S_xx, S_yy, and S_xy.
\[ S_{xx} = 110 - \frac{22^2}{5} = 13.2 \qquad S_{yy} = 75.82 - \frac{17.8^2}{5} = 12.452 \]
\[ S_{xy} = 65.6 - \frac{(22)(17.8)}{5} = -12.72 \]
Calculate the correlation:
\[ r = \frac{-12.72}{\sqrt{(13.2)(12.452)}} = -.99 \]

Example 5: Computing a Correlation
(c) Comment on the relationship between x and y.
The correlation coefficient indicates there is a strong negative
linear relationship between x and y.
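The arithmetic in Example 5 can be checked with a few lines of code. This is a minimal sketch that follows the shortcut formulas step by step; the variable names are illustrative choices, not part of the lesson.

```python
from math import sqrt

# Example 5 data
x = [2, 3, 5, 6, 6]
y = [5.7, 5.2, 2.8, 1.9, 2.2]
n = len(x)

# Column sums from the table: x, y, x^2, y^2, xy
sum_x, sum_y = sum(x), sum(y)
sum_x2 = sum(v * v for v in x)
sum_y2 = sum(v * v for v in y)
sum_xy = sum(a * b for a, b in zip(x, y))

# Shortcut formulas for S_xx, S_yy, S_xy
s_xx = sum_x2 - sum_x ** 2 / n
s_yy = sum_y2 - sum_y ** 2 / n
s_xy = sum_xy - sum_x * sum_y / n

r = s_xy / sqrt(s_xx * s_yy)
print(round(s_xx, 3), round(s_yy, 3), round(s_xy, 3), round(r, 2))
```

The printed values should agree with the hand computation: 13.2, 12.452, -12.72, and r of about -0.99.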

Example 6: Weight vs. Mileage Rating
P 202, #16. The data represent the weight of various domestic cars
and their city mileage rating (in mpg) for the 2001 model year (see
the table in Example 1).
(c) What type of relation appears to exist between the weight of a
car and its city mileage rating?
[Figure: "Weight vs. Mileage" scatter diagram; horizontal axis Weight
(lbs), 2000 to 4000; vertical axis City Mileage (MPG), 15 to 30.]
There is a negative linear relationship between weight and mileage.

Example 6: Weight vs. Mileage Rating
(d) Compute the linear correlation coefficient between the weight of
a car and its city mileage rating.
r = –.92

Correlation & Causation
A word of caution when interpreting the correlation coefficient:
A linear correlation coefficient computed from observational data
does not imply causation between the variables, even when it
indicates a strong positive or negative association.
The predictor and response variables may both be determined by an
unknown lurking variable.
If data are obtained through a controlled experiment, then a strong
linear correlation can support a conclusion of causation.

Example 7: Brain Size and Intelligence
P 203, #21. Researchers interested in whether a person’s brain size is
related to mental capacity selected a sample of 20 students who had
SAT scores higher than 1350 and administered an IQ test. Brain size
was determined by an MRI scan.

Gender   MRI Count   IQ       Gender   MRI Count   IQ
Female   816932      133      Male     949395      140
Female   951545      137      Male     1001121     140
Female   991305      138      Male     1038437     139
Female   833868      132      Male     965353      133
Female   856472      140      Male     955466      133
Female   852244      132      Male     1079549     141
Female   790619      135      Male     924059      135
Female   866662      130      Male     955003      139
Female   857782      133      Male     935494      141
Female   948066      133      Male     949589      144

(a) Use the TI-83 to draw a scatter diagram treating MRI count as the
predictor variable and IQ as the response variable.

Example 7: Brain Size and Intelligence
(b) Use the TI-83 to compute the correlation coefficient between the
MRI count and IQ. Do they appear to be linearly related?

Example 7: Brain Size and Intelligence
(c) Gender is a lurking variable in the analysis. Draw separate
scatter diagrams for each gender. What do you notice?

Example 7: Brain Size and Intelligence
(d) Calculate the correlation coefficient separately for males and
females. Do you still believe that MRI count and IQ are linearly
related?
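The lesson asks for these computations on a TI-83. As an alternative way to check the work, parts (b) and (d) can be sketched in code. The snippet below is not the lesson's method: `pearson_r` is an illustrative helper, and the data are typed in from the table above.

```python
from math import sqrt


def pearson_r(pairs):
    """Sample correlation of (x, y) pairs using sums of deviations."""
    n = len(pairs)
    x_bar = sum(x for x, _ in pairs) / n
    y_bar = sum(y for _, y in pairs) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in pairs)
    s_xx = sum((x - x_bar) ** 2 for x, _ in pairs)
    s_yy = sum((y - y_bar) ** 2 for _, y in pairs)
    return s_xy / sqrt(s_xx * s_yy)


# (MRI count, IQ) pairs from the Example 7 table
female = [(816932, 133), (951545, 137), (991305, 138), (833868, 132),
          (856472, 140), (852244, 132), (790619, 135), (866662, 130),
          (857782, 133), (948066, 133)]
male = [(949395, 140), (1001121, 140), (1038437, 139), (965353, 133),
        (955466, 133), (1079549, 141), (924059, 135), (955003, 139),
        (935494, 141), (949589, 144)]

r_all = pearson_r(female + male)  # part (b): all 20 students pooled
r_female = pearson_r(female)      # part (d): females only
r_male = pearson_r(male)          # part (d): males only
print(round(r_all, 3), round(r_female, 3), round(r_male, 3))
```

With the data as transcribed here, the pooled coefficient is roughly 0.55, while the within-gender coefficients are noticeably smaller (roughly 0.36 for females and 0.24 for males). That gap is the kind of effect a lurking variable such as gender can produce.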
