EEGQ 3131-Adjustment Computations-Lesson 3
SIMPLE LINEAR REGRESSION
Simple Linear Regression Model
A simple linear regression model between two variables $x$ and $y$ is written as
$$y = a + bx$$
where $x$ is the independent variable, $y$ is the dependent variable, $a$ is the $y$-intercept and $b$ is the slope of the line.
This model describes an exact relationship between $x$ and $y$, which real observations almost never satisfy. The complete simple linear regression model therefore includes an error term:
$$y = a + bx + e$$
where $e$ is the random error.
After the model has been estimated, the computed value $\hat{y}$ of the dependent variable can be compared with the observed value $y$, so that for each data point
$$e = y - \hat{y} = y - a - bx$$
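The error of each data point can be computed directly from this definition. The following is a minimal sketch; the function name `residuals` is chosen here for illustration, and the data and coefficients are those of the four-point worked example in this lesson.

```python
def residuals(x, y, a, b):
    """Return e_i = y_i - (a + b*x_i) for each data point."""
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]


# Data and fitted coefficients from the four-point worked example.
x = [0, 2, 4, 6]
y = [11, 16, 19, 26]
a, b = 10.8, 2.4

e = residuals(x, y, a, b)
print([round(ei, 2) for ei in e])  # [0.2, 0.4, -1.4, 0.8]
```

For a least-squares line the errors sum to zero (up to floating-point rounding), which is a quick check that $a$ and $b$ were computed consistently.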
Limitations of Simple Linear Regression Model
The value of the dependent variable should not be estimated for a value of the independent variable (or vice versa) that lies beyond the range of the data on which the regression is based. If $x$ ranges from, say, 200 to 400, you cannot reliably predict $y$ for $x = 1500$ or $x = 2000$, i.e. for any value outside that range.
The analysis is confined to normally distributed data.
The aim is to estimate the slope and the $y$-intercept of the best-fitting line.
The method is also based on the assumption that the best-fitting line must pass through the mean of the data.
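The range limitation can be enforced in code by refusing to extrapolate. The sketch below is only illustrative: the function name `predict_in_range`, the sample $x$ values and the coefficients `a` and `b` are hypothetical and are not taken from any data set in this lesson.

```python
def predict_in_range(x_new, x_data, a, b):
    """Predict y = a + b*x only when x_new lies inside the observed x range."""
    lo, hi = min(x_data), max(x_data)
    if not lo <= x_new <= hi:
        raise ValueError(f"x = {x_new} is outside the fitted range [{lo}, {hi}]")
    return a + b * x_new


# Hypothetical observations spanning 200 to 400, with hypothetical coefficients.
x_data = [200, 250, 300, 350, 400]
a, b = 1.0, 0.5

print(predict_in_range(300, x_data, a, b))  # 151.0
try:
    predict_in_range(1500, x_data, a, b)
except ValueError as err:
    print("refused:", err)
```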
METHODS OF SIMPLE LINEAR REGRESSION
Simple Linear regression by least-squares method
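First compute the slope $b$ using the formula
$$b = \frac{S_{xy}}{S_{xx}}$$
where
$$S_{xy} = \sum xy - \frac{\left(\sum x\right)\left(\sum y\right)}{n}$$
$$S_{xx} = \sum x^2 - \frac{\left(\sum x\right)^2}{n}$$
We then compute the means of the independent ($x$) and dependent ($y$) variables using
$$\bar{x} = \frac{\sum x}{n} \quad \text{and} \quad \bar{y} = \frac{\sum y}{n}$$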
Since the regression line for a set of $n$ data points must pass through the means of $x$ and $y$, the point $(\bar{x}, \bar{y})$ satisfies the regression equation:
$$\bar{y} = a + b\bar{x}$$
where $\bar{y}$ and $\bar{x}$ are the means of $y$ and $x$ respectively. The $y$-intercept $a$ is therefore given by
$$a = \bar{y} - b\bar{x}$$
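One way to see why the fitted line passes through the means is sketched below, assuming the usual least-squares criterion that $a$ and $b$ minimise the sum of squared errors. The symbol $Q$ is introduced only for this illustration.

```latex
\begin{align*}
Q(a, b) &= \sum (y - a - bx)^2 \\
\frac{\partial Q}{\partial a} &= -2 \sum (y - a - bx) = 0 \\
\sum y &= na + b \sum x \\
\bar{y} &= a + b\bar{x}
\end{align*}
```

Dividing the third line by $n$ gives the last line, which rearranges to $a = \bar{y} - b\bar{x}$.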
There are errors since we are simply taking a straight line and forcing it to fit into the given
data in the best possible way.
We then have to estimate the standard deviation 𝜎𝑒 which measures the spread of the errors
around the regression line and is calculated using:
$$\sigma_e = \sqrt{\frac{S_{yy} - bS_{xy}}{n - 2}} \quad \text{where} \quad S_{yy} = \sum y^2 - \frac{\left(\sum y\right)^2}{n}$$
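The steps of the least-squares method can be collected into one short routine. This is a minimal sketch rather than a definitive implementation; the function name `fit_line` and its return order are assumptions made for illustration.

```python
import math


def fit_line(x, y):
    """Least-squares fit of y = a + b*x; returns (a, b, sigma_e)."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)

    # Corrected sums of products and squares.
    s_xy = sum(xi * yi for xi, yi in zip(x, y)) - sum_x * sum_y / n
    s_xx = sum(xi ** 2 for xi in x) - sum_x ** 2 / n
    s_yy = sum(yi ** 2 for yi in y) - sum_y ** 2 / n

    b = s_xy / s_xx                   # slope
    a = sum_y / n - b * sum_x / n     # intercept: a = y_bar - b * x_bar

    # Guard against a tiny negative value caused by rounding on a perfect fit.
    sigma_e = math.sqrt(max(0.0, s_yy - b * s_xy) / (n - 2))
    return a, b, sigma_e


# Usage with a small data set: x = 0, 2, 4, 6 and y = 11, 16, 19, 26.
a, b, sigma_e = fit_line([0, 2, 4, 6], [11, 16, 19, 26])
print(f"a = {a:.1f}, b = {b:.1f}, sigma_e = {sigma_e:.3f}")
```

The routine needs at least three data points, because $\sigma_e$ divides by $n - 2$, and the $x$ values must not all be equal, otherwise $S_{xx} = 0$.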
Example
Fit the following data using simple linear regression by least squares method
x     0     2     4     6
y    11    16    19    26

n     x     y     xy    x²    y²
1     0     11    0     0     121
2     2     16    32    4     256
3     4     19    76    16    361
4     6     26    156   36    676
Σ     12    72    264   56    1,414

First compute the slope $b$ using the formula
$$b = \frac{S_{xy}}{S_{xx}}$$
where
$$S_{xy} = \sum xy - \frac{\left(\sum x\right)\left(\sum y\right)}{n} = 264 - \frac{12(72)}{4} = 264 - 216 = 48$$
$$S_{xx} = \sum x^2 - \frac{\left(\sum x\right)^2}{n} = 56 - \frac{12^2}{4} = 56 - 36 = 20$$
Therefore
$$b = \frac{S_{xy}}{S_{xx}} = \frac{48}{20} = 2.4$$
We then compute the means of $x$ and $y$ using
$$\bar{x} = \frac{\sum x}{n} = \frac{12}{4} = 3 \quad \text{and} \quad \bar{y} = \frac{\sum y}{n} = \frac{72}{4} = 18$$
But $b = 2.4$, therefore
$$a = \bar{y} - b\bar{x} = 18 - 2.4(3) = 18 - 7.2 = 10.8$$
Hence
$$\hat{y} = 10.8 + 2.4x$$
We can then estimate the standard deviation $\sigma_e$, which measures the spread of the errors around the regression line, using
$$\sigma_e = \sqrt{\frac{S_{yy} - bS_{xy}}{n - 2}}$$
where
$$S_{yy} = \sum y^2 - \frac{\left(\sum y\right)^2}{n} = 1414 - \frac{72^2}{4}$$
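For this four-point data set the arithmetic can be carried through with the sums already tabulated ($\sum y = 72$, $\sum y^2 = 1414$, $n = 4$) and the values $S_{xy} = 48$ and $b = 2.4$ found earlier. The following is a sketch of that calculation:

```latex
\begin{align*}
S_{yy} &= 1414 - \frac{72^2}{4} = 1414 - 1296 = 118 \\
\sigma_e &= \sqrt{\frac{118 - 2.4(48)}{4 - 2}} = \sqrt{\frac{118 - 115.2}{2}} = \sqrt{1.4} \approx 1.18
\end{align*}
```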
Example
An experiment gives the relationship between force (N) and velocity (m/s) for a suspended object in a wind tunnel as

Velocity (m/s)   10   20   30    40    50    60     70    80
Force (N)        24   68   378   552   608   1218   831   1452

i. Use the linear least-squares regression method to determine the coefficients $a$ and $b$ in the function $y = a + bx$ that best fits the data.
ii. Estimate the force when the velocity is 55 m/s.
iii. Determine the standard error $\sigma_e$ for the data.
n     x      y       xy        x²       y²
1     10     24      240       100      576
2     20     68      1,360     400      4,624
3     30     378     11,340    900      142,884
4     40     552     22,080    1,600    304,704
5     50     608     30,400    2,500    369,664
6     60     1,218   73,080    3,600    1,483,524
7     70     831     58,170    4,900    690,561
8     80     1,452   116,160   6,400    2,108,304
Σ     360    5,131   312,830   20,400   5,104,841

First compute the slope $b$ using the formula
$$b = \frac{S_{xy}}{S_{xx}}$$
where
$$S_{xy} = \sum xy - \frac{\left(\sum x\right)\left(\sum y\right)}{n} = 312830 - \frac{360(5131)}{8} = 312830 - 230895 = 81935$$
$$S_{xx} = \sum x^2 - \frac{\left(\sum x\right)^2}{n} = 20400 - \frac{360^2}{8} = 20400 - 16200 = 4200$$
Therefore
$$b = \frac{S_{xy}}{S_{xx}} = \frac{81935}{4200} \approx 19.51$$
We then compute the means of $x$ and $y$ using
$$\bar{x} = \frac{\sum x}{n} = \frac{360}{8} = 45\ \text{m/s} \quad \text{and} \quad \bar{y} = \frac{\sum y}{n} = \frac{5131}{8} = 641.375\ \text{N}$$
With the unrounded slope $b = 81935/4200 \approx 19.5083$ (19.51 to two decimal places),
$$a = \bar{y} - b\bar{x} = 641.375 - 19.5083(45) \approx -236.5$$
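Parts ii and iii of the wind tunnel example follow from the same sums. The sketch below recomputes the fit from the raw data and then evaluates the force at 55 m/s and the standard error. The variable names are chosen for illustration, and the printed values come from this computation rather than from the lesson.

```python
import math

velocity = [10, 20, 30, 40, 50, 60, 70, 80]        # x, in m/s
force = [24, 68, 378, 552, 608, 1218, 831, 1452]   # y, in N

n = len(velocity)
sum_x, sum_y = sum(velocity), sum(force)

s_xy = sum(x * y for x, y in zip(velocity, force)) - sum_x * sum_y / n
s_xx = sum(x ** 2 for x in velocity) - sum_x ** 2 / n
s_yy = sum(y ** 2 for y in force) - sum_y ** 2 / n

b = s_xy / s_xx                 # slope
a = sum_y / n - b * sum_x / n   # intercept

force_at_55 = a + b * 55        # part ii: 55 m/s lies inside the 10 to 80 m/s range
sigma_e = math.sqrt((s_yy - b * s_xy) / (n - 2))   # part iii

print(f"a = {a:.1f} N, b = {b:.2f} N per m/s")      # a = -236.5 N, b = 19.51 N per m/s
print(f"force at 55 m/s = {force_at_55:.1f} N")     # force at 55 m/s = 836.5 N
print(f"sigma_e = {sigma_e:.1f} N")                 # sigma_e = 189.5 N
```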
$$y - \bar{y} = \gamma \, \frac{\sigma_y}{\sigma_x} \, (x - \bar{x})$$
into the format
$$y = a + bx$$
Previous Example
An experiment gives the relationship between force (N) and velocity (m/s) for a
suspended object in a wind tunnel as
Velocity (m/s) 10 20 30 40 50 60 70 80
Force (N) 24 68 378 552 608 1218 831 1452
Use the coefficient of correlation method to determine the coefficients a and b in the
function 𝑦 = 𝑎 + 𝑏𝑥 that best fits the data
$$\sigma_y = \sqrt{\frac{\sum (y - \bar{y})^2}{n}} = \sqrt{\frac{1813946}{8}} = 476.17565$$
𝑦 = −236.500 + 19.508343𝑥
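The coefficient of correlation method can be sketched for the wind tunnel data as follows. The sketch assumes that $\gamma$ is the Pearson correlation coefficient and that $\sigma_x$ and $\sigma_y$ are population standard deviations (divisor $n$), which is consistent with the $\sigma_y$ value computed above. The variable names are illustrative.

```python
import math

velocity = [10, 20, 30, 40, 50, 60, 70, 80]        # x, in m/s
force = [24, 68, 378, 552, 608, 1218, 831, 1452]   # y, in N

n = len(velocity)
mean_x = sum(velocity) / n
mean_y = sum(force) / n

# Population standard deviations (divide by n, not n - 1).
sigma_x = math.sqrt(sum((x - mean_x) ** 2 for x in velocity) / n)
sigma_y = math.sqrt(sum((y - mean_y) ** 2 for y in force) / n)

# Assumed definition: gamma is the Pearson correlation coefficient.
cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(velocity, force)) / n
gamma = cov_xy / (sigma_x * sigma_y)

# Rearranging y - mean_y = gamma * (sigma_y / sigma_x) * (x - mean_x) into y = a + b*x.
b = gamma * sigma_y / sigma_x
a = mean_y - b * mean_x

print(f"gamma = {gamma:.4f}, sigma_x = {sigma_x:.4f}, sigma_y = {sigma_y:.4f}")
print(f"y = {a:.3f} + {b:.4f}x")
```

Because $\gamma \sigma_y / \sigma_x$ simplifies algebraically to $S_{xy}/S_{xx}$, this method gives the same line as the least-squares method. Small differences in the last digits come from rounding intermediate values.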
$$y - 64.75 = 0.875 \left(\frac{2.25}{2.35}\right) (x - 64.25)$$
$$y - 64.75 = 0.837766x - 53.8265$$
$$y = 10.9235 + 0.837766x$$
Application of Simple linear regression in geospatial science
The application of simple linear regression in geospatial science is best illustrated using a geospatial information system (GIS).
Clients are mostly interested in maps whose production relies on models that predict from actual observations.
GIS can be used to investigate associations between variables such as:
the distribution of mosquito species responsible for malaria transmission, i.e. species distribution vs prevalence
temperature and relative humidity vs malaria prevalence
NDVI vs rainfall patterns
crime vs uneducated youth
population vs poverty index
geologic rock types vs groundwater recharge