Curve Fitting - DS
Curve fitting is the process of finding the “BEST FIT” curve for a given set of data. It is the
representation of the relationship between two variables by means of an algebraic equation.
On the basis of this mathematical equation, predictions can be made in many statistical problems.
Suppose n pairs of values $(x_1, y_1), (x_2, y_2), \dots, (x_n, y_n)$ of two variables x and y are given. When these values are plotted in a rectangular coordinate system (the xy-plane), the resulting set of points is known as a scatter diagram.
The scatter diagram exhibits the trend of the data, and it is often possible to visualize a smooth curve approximating the data. Such a curve is known as an approximating curve.
Let $P(x_i, y_i)$ be a point on the scatter diagram, and let the ordinate at P meet the curve $y = f(x)$ at Q and the x-axis at M. Then
$$QP = MP - MQ = y_i - f(x_i).$$
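The distance QP is known as the deviation, error, or residual and is denoted by $d_i$. It may be positive, negative, or zero depending on whether P lies above, below, or on the curve.
Residuals $d_1, d_2, \dots, d_n$ for all n points are obtained in the same way. Because positive and negative residuals would cancel in a plain sum, they are squared and added:
$$E = d_1^2 + d_2^2 + \dots + d_n^2 = \sum_{i=1}^{n} [y_i - f(x_i)]^2.$$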
If E = 0, then all n points lie on the curve y = f(x). If E is not zero, then f(x) is chosen so that E is a minimum; i.e., the best-fitting curve to the set of points is the one for which E is a minimum.
(Finding this minimum requires CALCULUS, and it is a first glimpse of the vital role optimization problems play in predictive analysis.)
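To make E concrete, here is a minimal Python sketch that computes the residuals $d_i = y_i - f(x_i)$ and their sum of squares for a candidate curve. The helper names `residuals` and `sum_squared_residuals` are illustrative, not from the notes. The data and the fitted line y = 0.72 + 1.33x are taken from Example 1; the comparison line y = 1 + 1.3x is an arbitrary choice made only for illustration.

```python
def residuals(xs, ys, f):
    """Return the residuals d_i = y_i - f(x_i) for each data point."""
    return [y - f(x) for x, y in zip(xs, ys)]


def sum_squared_residuals(xs, ys, f):
    """Return E = d_1^2 + d_2^2 + ... + d_n^2 for the candidate curve f."""
    return sum(d * d for d in residuals(xs, ys, f))


# Data from Example 1.
xs = [0, 1, 2, 3, 4]
ys = [1, 1.8, 3.3, 4.5, 6.3]

# The least-squares line from Example 1 and an arbitrary nearby line.
E_fitted = sum_squared_residuals(xs, ys, lambda x: 0.72 + 1.33 * x)
E_other = sum_squared_residuals(xs, ys, lambda x: 1.0 + 1.3 * x)

print(f"E for y = 0.72 + 1.33x: {E_fitted:.4f}")
print(f"E for y = 1.00 + 1.30x: {E_other:.4f}")
```

With these numbers the fitted line gives E ≈ 0.259 and the comparison line E ≈ 0.51, consistent with the least-squares line having the smaller E.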
This method is known as the LEAST-SQUARES METHOD. It does not determine the form of the curve y = f(x); the form is chosen in advance, and the method determines the values of the parameters in the equation of the curve.
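For a straight line y = a + bx,
$$E = \sum_{i=1}^{n} (y_i - a - b x_i)^2,$$
and setting $\partial E/\partial a = 0$ and $\partial E/\partial b = 0$ gives
$$\sum y = na + b\sum x \qquad \text{(A)}$$
$$\sum xy = a\sum x + b\sum x^2 \qquad \text{(B)}$$
The two equations (A) and (B) are known as normal equations. They can be solved simultaneously to give the best values of ‘a’ and ‘b’, and the best-fitting straight line is obtained by substituting these values into y = a + bx.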
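The normal equations (A) and (B) form a 2 × 2 linear system, so a and b have a closed form. The sketch below, assuming plain Python lists as input, solves the system with Cramer's rule (one of several ways to solve it simultaneously); `fit_line` is a hypothetical helper name. The data of Example 1 serve as a check.

```python
def fit_line(xs, ys):
    """Least-squares straight line y = a + b*x via the normal equations.

    (A)  sum(y)  = n*a      + b*sum(x)
    (B)  sum(xy) = a*sum(x) + b*sum(x^2)
    The 2x2 system is solved with Cramer's rule.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    sx = sum(xs)
    sy = sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    det = n * sxx - sx * sx  # zero when all x values are equal
    if det == 0:
        raise ValueError("all x values are equal; the line is not determined")
    a = (sy * sxx - sx * sxy) / det
    b = (n * sxy - sx * sy) / det
    return a, b


# Example 1 data
a, b = fit_line([0, 1, 2, 3, 4], [1, 1.8, 3.3, 4.5, 6.3])
print(f"y = {a:.2f} + {b:.2f}x")
print(f"y(2.5) = {a + b * 2.5:.3f}")
```

This should reproduce a ≈ 0.72, b ≈ 1.33 and the estimate y(2.5) ≈ 4.045 from Example 1.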
Example 1:
Fit a straight line to the following data. Also, estimate the value of ‘y’ at x = 2.5.
x 0 1 2 3 4
y 1 1.8 3.3 4.5 6.3
Here n = 5.
       x    y     x²   xy
       0    1     0    0
       1    1.8   1    1.8
       2    3.3   4    6.6
       3    4.5   9    13.5
       4    6.3   16   25.2
Total  10   16.9  30   47.1
16.9 = 5a + 10b
47.1 = 10a + 30b
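Solving the above system of equations (multiplying the first equation by 2 and subtracting it from the second gives 13.3 = 10b):
b = 1.33 and a = (16.9 - 10 × 1.33)/5 = 0.72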
Hence, the required straight line equation is y = 0.72 + 1.33 x
y (at x = 2.5) = 0.72 + 1.33(2.5) = 4.045
Example 2:
Fit a straight line to the following data, taking ‘x’ as the dependent variable.
x 1 3 4 6 8 9 11 14
y 1 2 4 4 5 7 8 9
Solution:
If ‘x’ is considered as the dependent variable and ‘y’ as the independent variable, the equation of the straight line to be fitted to the data is x = a + by, and the normal equations become Σx = na + bΣy and Σxy = aΣy + bΣy².
Here n = 8
       x    y    y²   xy
       1    1    1    1
       3    2    4    6
       4    4    16   16
       6    4    16   24
       8    5    25   40
       9    7    49   63
       11   8    64   88
       14   9    81   126
Total  56   40   256  364
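56 = 8a + 40b
364 = 40a + 256b
Solving these simultaneously gives a = -0.5 and b = 1.5, so the required straight line is x = -0.5 + 1.5y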
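Taking x as the dependent variable simply swaps the roles of x and y in the normal equations. The sketch below, assuming plain Python lists, fits x = a + by with a hypothetical `fit_line` helper (Cramer's rule, as for any 2 × 2 system) and, for comparison only, also fits y on x to show that the two lines are not the same.

```python
def fit_line(us, vs):
    """Least-squares line v = a + b*u from the normal equations
    sum(v) = n*a + b*sum(u)  and  sum(u*v) = a*sum(u) + b*sum(u^2),
    solved here with Cramer's rule."""
    n = len(us)
    su, sv = sum(us), sum(vs)
    suu = sum(u * u for u in us)
    suv = sum(u * v for u, v in zip(us, vs))
    det = n * suu - su * su  # zero only if all u values are equal
    a = (sv * suu - su * suv) / det
    b = (n * suv - su * sv) / det
    return a, b


x = [1, 3, 4, 6, 8, 9, 11, 14]
y = [1, 2, 4, 4, 5, 7, 8, 9]

# x is the dependent variable: regress x on y, i.e. fit x = a + b*y.
a_xy, b_xy = fit_line(y, x)
print(f"x on y: x = {a_xy:.2f} + {b_xy:.2f}y")

# For comparison only: regress y on x, i.e. fit y = a + b*x.
a_yx, b_yx = fit_line(x, y)
print(f"y on x: y = {a_yx:.4f} + {b_yx:.4f}x")
```

The x-on-y fit gives a = -0.5 and b = 1.5. Rearranged, x = -0.5 + 1.5y becomes y ≈ 0.333 + 0.667x, whereas the y-on-x fit gives y ≈ 0.545 + 0.636x: the two regression lines generally differ and coincide only when the points are exactly collinear.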
Problem:
T (degrees) 19 25 30 36 40 45 50
R 76 77 79 80 82 83 85
Find a relation R = a + bT, where ‘a’ and ‘b’ are constants to be determined.
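For a second-degree parabola y = a + bx + cx², minimizing
$$E = \sum_{i=1}^{n} (y_i - a - b x_i - c x_i^2)^2$$
by setting $\partial E/\partial a = \partial E/\partial b = \partial E/\partial c = 0$ gives
$$\sum y = na + b\sum x + c\sum x^2 \qquad \text{(A)}$$
$$\sum xy = a\sum x + b\sum x^2 + c\sum x^3 \qquad \text{(B)}$$
$$\sum x^2 y = a\sum x^2 + b\sum x^3 + c\sum x^4 \qquad \text{(C)}$$
The equations (A), (B) and (C) are known as ‘normal equations’. These equations can be solved simultaneously to give the best values of ‘a’, ‘b’ and ‘c’. The best-fitting parabola is obtained by substituting the values of ‘a’, ‘b’ and ‘c’ into y = a + bx + cx².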
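Because (A), (B) and (C) are three linear equations in a, b and c, any linear-system solver works. The sketch below is one possible plain-Python version, assuming list inputs: it builds the sums of powers of x and solves the system with Gaussian elimination with partial pivoting (a technique chosen here, not prescribed by the notes). `solve_linear` and `fit_parabola` are hypothetical helper names; the data of Example 3 serve as a check.

```python
def solve_linear(A, rhs):
    """Solve the square system A v = rhs by Gaussian elimination
    with partial pivoting (A and rhs are plain Python lists)."""
    n = len(A)
    # Build the augmented matrix [A | rhs] as floats.
    M = [list(map(float, row)) + [float(r)] for row, r in zip(A, rhs)]
    for col in range(n):
        # Swap in the row with the largest pivot to limit rounding error.
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if M[pivot][col] == 0:
            raise ValueError("normal equations are singular")
        M[col], M[pivot] = M[pivot], M[col]
        # Eliminate this column from the rows below.
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= factor * M[col][k]
    # Back substitution.
    v = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][k] * v[k] for k in range(r + 1, n))
        v[r] = (M[r][n] - s) / M[r][r]
    return v


def fit_parabola(xs, ys):
    """Least-squares parabola y = a + b*x + c*x^2 from normal equations (A)-(C)."""
    def s(p):  # sum of x^p (s(0) equals n)
        return sum(x ** p for x in xs)

    def t(p):  # sum of x^p * y
        return sum(x ** p * y for x, y in zip(xs, ys))

    A = [[s(0), s(1), s(2)],
         [s(1), s(2), s(3)],
         [s(2), s(3), s(4)]]
    rhs = [t(0), t(1), t(2)]
    a, b, c = solve_linear(A, rhs)
    return a, b, c


# Example 3 data
a, b, c = fit_parabola([1, 2, 3, 4], [1.7, 1.8, 2.3, 3.2])
print(f"a = {a:.2f}, b = {b:.2f}, c = {c:.2f}")
```

For the Example 3 data this should give a ≈ 2, b ≈ -0.5 and c ≈ 0.2, matching the hand solution.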
Example 3:
x 1 2 3 4
y 1.7 1.8 2.3 3.2
Solution:
Let the relationship between the variables ‘x’ and ‘y’ be y = a + bx + cx².
Here n = 4
       x    y     x²   x³   x⁴    xy    x²y
       1    1.7   1    1    1     1.7   1.7
       2    1.8   4    8    16    3.6   7.2
       3    2.3   9    27   81    6.9   20.7
       4    3.2   16   64   256   12.8  51.2
Total  10   9     30   100  354   25    80.8
9 = 4a + 10b + 30c
25 = 10a + 30b + 100c
80.8 = 30a + 100b + 354c
Solving these simultaneously gives a = 2, b = -0.5, c = 0.2, so
y = 2 - 0.5x + 0.2x²
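The notes state that E = 0 exactly when every point lies on the fitted curve. As an added check (not part of the original solution), evaluating the Example 3 parabola at the four data points shows that this happens here: every residual is zero up to floating-point rounding. The function name `fitted` is illustrative.

```python
xs = [1, 2, 3, 4]
ys = [1.7, 1.8, 2.3, 3.2]


def fitted(x):
    """Parabola found in Example 3: y = 2 - 0.5x + 0.2x^2."""
    return 2 - 0.5 * x + 0.2 * x ** 2


# Residuals d_i = y_i - f(x_i) and their sum of squares E.
residuals = [y - fitted(x) for x, y in zip(xs, ys)]
E = sum(d * d for d in residuals)
print("residuals:", residuals)
print("E =", E)
```

So the least-squares parabola passes through all four points of Example 3.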
Problem:
Fit a curve y = a + bx + cx² to the following data using the method of least squares.
x 3 5 7 9 11 13
y 2 3 4 6 5 8
Example 4:
x 1 2 3 4 5 6
y 2.51 5.82 9.93 14.84 20.55 27.06
Solution:
HOW ??
Here n = 6
       x    y       x²   x³    x⁴     xy      x²y
       1    2.51    1    1     1      2.51    2.51
       2    5.82    4    8     16     11.64   23.28
       3    9.93    9    27    81     29.79   89.37
       4    14.84   16   64    256    59.36   237.44
       5    20.55   25   125   625    102.75  513.75
       6    27.06   36   216   1296   162.36  974.16
Total  21   80.71   91   441   2275   368.41  1840.51
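The totals above feed the normal equations (A), (B) and (C). As an added sketch (not from the notes), the code below first checks the second differences of y: since x is equally spaced, nearly constant second differences indicate that a second-degree curve suits the data. It then solves the 3 × 3 normal equations by Cramer's rule. `det3` and `fit_parabola_cramer` are hypothetical helper names.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def fit_parabola_cramer(xs, ys):
    """Least-squares y = a + b*x + c*x^2: solve normal equations by Cramer's rule."""
    def s(p):  # sum of x^p (s(0) equals n)
        return sum(x ** p for x in xs)

    def t(p):  # sum of x^p * y
        return sum(x ** p * y for x, y in zip(xs, ys))

    A = [[s(0), s(1), s(2)],
         [s(1), s(2), s(3)],
         [s(2), s(3), s(4)]]
    rhs = [t(0), t(1), t(2)]
    D = det3(A)
    coeffs = []
    for j in range(3):
        # Replace column j of A with the right-hand side.
        Aj = [row[:j] + [r] + row[j + 1:] for row, r in zip(A, rhs)]
        coeffs.append(det3(Aj) / D)
    return tuple(coeffs)


xs = [1, 2, 3, 4, 5, 6]
ys = [2.51, 5.82, 9.93, 14.84, 20.55, 27.06]

# First and second differences of y (x has constant spacing 1).
first = [nxt - cur for cur, nxt in zip(ys, ys[1:])]
second = [nxt - cur for cur, nxt in zip(first, first[1:])]
print("second differences:", [round(d, 6) for d in second])

a, b, c = fit_parabola_cramer(xs, ys)
print(f"a = {a:.4f}, b = {b:.4f}, c = {c:.4f}")
```

Here every second difference is 0.8, so the points lie exactly on a parabola (with spacing 1, the second difference equals 2c, giving c = 0.4), and the solve returns a ≈ 0, b ≈ 2.11, c ≈ 0.4, i.e. y = 2.11x + 0.4x² with E = 0 up to rounding.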