Bio-L8- Correlation and Regression Analysis


Linear Regression and Correlation

Correlation and Regression:


It frequently happens that statisticians want to describe with a single number the relationship between two sets of scores. A number that measures the relationship between two sets of scores is called a correlation coefficient.

Scattergram:
Consider a list of pairs of numerical values representing variables 𝑥 and 𝑦. The scattergram of the data is simply a picture of the pairs of values as points in the coordinate plane 𝑅². The picture sometimes indicates a relationship between the points, as illustrated in the following examples:
Correlation Coefficient:
Pearson defined 𝑟 so that it has a minimum possible value of −1 and a maximum possible value of +1. When the sample points lie exactly on a line sloping down to the right, we say there is perfect negative correlation: 𝑟 = −1. When the sample points lie exactly on a line sloping up to the right, we say there is perfect positive correlation: 𝑟 = +1. When there is no tendency of the points to lie on a straight line, we say there is no correlation: 𝑟 = 0.
If 𝑟 is near +1 or −1, we say we have high correlation. If 𝑟 is near zero, we say we have low correlation.
$$r = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^{2}\;\sum_{i=1}^{n}(y_i - \bar{y})^{2}}}$$

or

$$r = \frac{\sum_{i=1}^{n} x_i y_i - \left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)/n}{\sqrt{\left(\sum_{i=1}^{n} x_i^{2} - \frac{\left(\sum_{i=1}^{n} x_i\right)^{2}}{n}\right)\left(\sum_{i=1}^{n} y_i^{2} - \frac{\left(\sum_{i=1}^{n} y_i\right)^{2}}{n}\right)}}$$
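The computational form of 𝑟 can be checked with a short script. A minimal sketch (the `pearson_r` helper and the sample lists are illustrative, not from the notes):

```python
# Pearson correlation coefficient, using the computational formula
# r = [sum(xy) - sum(x)sum(y)/n] / sqrt([sum(x^2) - (sum x)^2/n][sum(y^2) - (sum y)^2/n]).
def pearson_r(x, y):
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    num = sxy - sx * sy / n
    den = ((sxx - sx**2 / n) * (syy - sy**2 / n)) ** 0.5
    return num / den

# Perfectly linear increasing data gives perfect positive correlation.
print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # 1.0
```

Data lying exactly on a downward-sloping line would give −1.0, matching the perfect-negative-correlation case described above.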
Regression Line :
A regression line is a straight line that describes how a response variable
y changes as an explanatory variable x changes. We often use a
regression line to predict the value of y for a given value of x.

𝑦 = 𝑎 + 𝑏𝑥

$$b = \frac{\sum_{i=1}^{n} x_i y_i - n\bar{x}\bar{y}}{\sum_{i=1}^{n} x_i^{2} - n\bar{x}^{2}}, \qquad a = \bar{y} - b\bar{x}$$
Example: for a data set of 𝑛 = 6 pairs with

$$\sum_{i=1}^{n} x_i y_i = 711, \quad \sum_{i=1}^{n} x_i = 47\ (\bar{x} = 47/6), \quad \sum_{i=1}^{n} y_i = 79\ (\bar{y} = 79/6), \quad \sum_{i=1}^{n} x_i^{2} = 423,$$

a.) $b = 1.68$, $a = \bar{y} - b\bar{x} = 0.005$, so the fitted line is $y = 0.005 + 1.68x$.

b.) When $x = 4$: $y = 0.005 + 1.68(4) = 6.725$

When $x = 1$: $y = 0.005 + 1.68(1) = 1.685$

When $x = 15$: $y = 0.005 + 1.68(15) = 25.205$
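The slope and intercept can be recomputed directly from the summary statistics. A quick sketch (note: at full precision the intercept comes out essentially 0; the notes' 0.005 reflects rounding 𝑏 to 1.68 before computing 𝑎):

```python
# Recompute b and a from the summary statistics given above:
# n = 6, sum(xy) = 711, sum(x) = 47, sum(y) = 79, sum(x^2) = 423.
n = 6
sum_xy, sum_x, sum_y, sum_x2 = 711, 47, 79, 423
x_bar, y_bar = sum_x / n, sum_y / n

b = (sum_xy - n * x_bar * y_bar) / (sum_x2 - n * x_bar**2)
a = y_bar - b * x_bar

print(round(b, 3), round(a, 3))  # b ~ 1.681, a ~ 0
print(a + b * 4)                 # prediction at x = 4, ~ 6.72
```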

Example: suppose we have the following data:


𝑥 6 6 7 8 9 9
𝑦 5 6 6 7 7 8

Then: 1) construct a scattergram and draw the calculated regression line; 2) find the correlation coefficient and interpret it.


Simple and Multiple Linear Regression Models
Definition: A multiple linear regression model relating a random response 𝑌 to a set of predictor variables 𝑥1, . . . , 𝑥𝑘 is an equation of the form

$$Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + \varepsilon,$$

where 𝛽0, . . . , 𝛽𝑘 are unknown parameters, 𝑥1, . . . , 𝑥𝑘 are the independent non-random variables, and 𝜀 is a random variable representing an error term. We assume that 𝐸(𝜀) = 0, or equivalently, $E(Y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k$.

Definition: If $Y = \beta_0 + \beta_1 x + \varepsilon$, this is called a simple linear regression model. Here, 𝛽0 is the y-intercept of the line and 𝛽1 is the slope of the line. The term 𝜀 is the error component.
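The coefficients of a multiple linear regression model are typically estimated by least squares on a design matrix. A hedged sketch assuming NumPy is available; the data are synthetic, generated from known coefficients so the recovered estimates can be compared against them:

```python
import numpy as np

# Fit a multiple linear regression Y = b0 + b1*x1 + b2*x2 + e by least squares.
# Synthetic data from known coefficients (2.0, 1.5, -0.5) with small noise.
rng = np.random.default_rng(0)
n = 50
x1 = rng.uniform(0, 10, n)
x2 = rng.uniform(0, 5, n)
y = 2.0 + 1.5 * x1 - 0.5 * x2 + rng.normal(0, 0.1, n)

# Design matrix: a leading column of ones carries the intercept b0.
X = np.column_stack([np.ones(n), x1, x2])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta.round(2))  # close to [2.0, 1.5, -0.5]
```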
The Method of Least Squares
Let (𝑥1, 𝑦1), (𝑥2, 𝑦2), . . . , (𝑥𝑛, 𝑦𝑛) be the 𝑛 observed data points, with corresponding errors 𝜀𝑖, 𝑖 = 1, . . . , 𝑛. That is, $Y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$, 𝑖 = 1, 2, . . . , 𝑛.
We assume that the errors 𝜀𝑖 are independent and identically distributed with $E(\varepsilon_i) = 0$ and $Var(\varepsilon_i) = \sigma^2$, 𝑖 = 1, 2, . . . , 𝑛.
One way to decide how well a straight line fits the set of data is to determine the extent to which the data points deviate from the line. The straight-line model for the response 𝑌 for a given 𝑥 is $Y = \beta_0 + \beta_1 x + \varepsilon$. Because we assumed that 𝐸(𝜀) = 0, the expected value of 𝑌 is given by $E(Y) = \beta_0 + \beta_1 x$.
The estimator of 𝐸(𝑌), denoted by $\hat{Y}$, can be obtained by using the estimators $\hat{\beta}_0$ and $\hat{\beta}_1$ of the parameters 𝛽0 and 𝛽1, respectively. Then the fitted regression line we are looking for is given by $\hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 x$.
For observed values (𝑥𝑖, 𝑦𝑖), we obtain the estimated value of 𝑦𝑖 as $\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i$.
The deviation of the observed 𝑦𝑖 from its predicted value $\hat{y}_i$, called the 𝑖th residual, is defined by $e_i = y_i - \hat{y}_i = y_i - (\hat{\beta}_0 + \hat{\beta}_1 x_i)$.
The residuals, or errors $e_i$, are the vertical distances between the observed and predicted values of the 𝑦𝑖's.
Definition: The sum of squares for errors (SSE), or sum of squares of the residuals, for all 𝑛 data points is $SSE = \sum_{i=1}^{n} e_i^{2} = \sum_{i=1}^{n}\left[y_i - (\hat{\beta}_0 + \hat{\beta}_1 x_i)\right]^{2}$.
The least-squares approach to estimation is to find $\hat{\beta}_0$ and $\hat{\beta}_1$ that minimize the sum of squared residuals, SSE.
Derivation of $\hat{\beta}_0$ and $\hat{\beta}_1$: To simplify the formula for $\hat{\beta}_1$, set

$$S_{xx} = \sum_{i=1}^{n} x_i^{2} - \frac{\left(\sum_{i=1}^{n} x_i\right)^{2}}{n}, \qquad S_{xy} = \sum_{i=1}^{n} x_i y_i - \frac{\left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n}$$

Then

$$\hat{\beta}_1 = \frac{S_{xy}}{S_{xx}}, \qquad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}, \qquad \hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 x.$$

Example 1: Use the method of least squares to fit a straight line to the accompanying data points. Give the estimates of 𝛽0 and 𝛽1. Plot the points and sketch the fitted least-squares line. The observed data values are given in the following table.

𝑥: −1  0  2  −2  5  6  8  11  12  −3
𝑦: −5  −4  2  −7  6  9  13  21  20  −9

Solution: Form a table to compute various terms

$$S_{xx} = \sum_{i=1}^{n} x_i^{2} - \frac{\left(\sum_{i=1}^{n} x_i\right)^{2}}{n} = 408 - \frac{(38)^{2}}{10} = 263.6$$

$$S_{xy} = \sum_{i=1}^{n} x_i y_i - \frac{\left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n} = 709 - \frac{(38)(46)}{10} = 534.2$$

$$\bar{x} = 3.8, \qquad \bar{y} = 4.6$$

Therefore, $\hat{\beta}_1 = S_{xy}/S_{xx} = 534.2/263.6 = 2.0266$
and $\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} = 4.6 - (2.0266)(3.8) = -3.1011$.
Hence, the least-squares line for these data is
$$\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x = -3.1011 + 2.0266\,x.$$
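Example 1 can be verified numerically. A short sketch (at full precision the intercept is −3.1009; the notes' −3.1011 comes from rounding $\hat{\beta}_1$ to 2.0266 first):

```python
# Verify Example 1: least-squares fit of the ten (x, y) pairs.
x = [-1, 0, 2, -2, 5, 6, 8, 11, 12, -3]
y = [-5, -4, 2, -7, 6, 9, 13, 21, 20, -9]
n = len(x)

s_xx = sum(v * v for v in x) - sum(x) ** 2 / n
s_xy = sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n

b1 = s_xy / s_xx                   # estimated slope
b0 = sum(y) / n - b1 * sum(x) / n  # estimated intercept

print(s_xx, s_xy)                  # 263.6 and 534.2, as in the notes
print(round(b1, 4), round(b0, 4))  # ~2.0266 and ~ -3.101
```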

EXAMPLE: Fit a least-squares line to the following data. Also find the trend values $\hat{y}$ and show that $\sum(y - \hat{y}) = 0$.

𝑥 1 2 3 4 5
𝑦 2 5 3 8 7
H.W
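The homework itself is left to the reader, but the stated property holds for any least-squares fit: the residuals always sum to zero. A quick sketch confirming this with the data above:

```python
# For any least-squares line, the residuals y - y_hat sum to zero.
x = [1, 2, 3, 4, 5]
y = [2, 5, 3, 8, 7]
n = len(x)

x_bar, y_bar = sum(x) / n, sum(y) / n
b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / \
    sum((xi - x_bar) ** 2 for xi in x)
a = y_bar - b * x_bar

trend = [a + b * xi for xi in x]                 # trend values y-hat
print(sum(yi - ti for yi, ti in zip(y, trend)))  # ~0 (floating-point rounding)
```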
