Least Squares Fitting of Data to a Curve
Gerald Recktenwald
Portland State University
Department of Mechanical Engineering
[email protected]
The latest version of this PDF file, along with other supplemental material
for the book, can be found at www.prenhall.com/recktenwald.
Given data points (x_i, y_i), i = 1, ..., m, fit a line

    F(x) = α x + β

Questions:
• How do we define a good fit?
• How do we compute α and β once a definition of "good fit" is chosen?
[Figure: scatter plot of the (x, y) data with a candidate line fit, 1 ≤ x ≤ 6]
    r_i = y_i − F(x_i) = y_i − (α x_i + β)
[Figure: the same data with the vertical residuals r_i between each point and the line]
    ρ = Σ_{i=1}^{m} [y_i − (α x_i + β)]²
[Figure: distances d1, d2, d3, d4 measured perpendicular to the fit line at x1, x2, x3, x4]
Minimizing Σ d_i² is known as the Orthogonal Distance Regression problem.
This problem is readily solved with Matlab, but it is beyond the scope of
our present discussion.¹

¹ See, e.g., Åke Björck, Numerical Methods for Least Squares Problems, SIAM,
Philadelphia, 1996.
Minimizing

    ρ = Σ_{i=1}^{m} r_i²

with respect to α and β gives the pair of normal equations

    Sxx α + Sx β = Sxy        (1)
    Sx α  + m β  = Sy         (2)

where

    Sxx = Σ_{i=1}^{m} x_i²        Sx = Σ_{i=1}^{m} x_i
    Sxy = Σ_{i=1}^{m} x_i y_i     Sy = Σ_{i=1}^{m} y_i

Note: Sxx, Sx, Sxy, and Sy can be computed directly from the given
(x_i, y_i) data. Thus, Equations (1) and (2) are two equations for the two
unknowns, α and β.
Solving for the unknowns gives

    α = (Sx Sy − m Sxy) / d        (3)

    β = (Sx Sxy − Sxx Sy) / d      (4)

with

    d = Sx² − m Sxx                (5)
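As a sketch, Equations (3)-(5) translate directly into code. Here is a minimal NumPy version; the function name linefit_sums is ours, not from the text:

```python
import numpy as np

def linefit_sums(x, y):
    """Least squares line y = alpha*x + beta from the summation formulas:
    alpha = (Sx*Sy - m*Sxy)/d, beta = (Sx*Sxy - Sxx*Sy)/d,
    with d = Sx**2 - m*Sxx (Equations 3-5)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    m = len(x)
    Sx, Sy = x.sum(), y.sum()
    Sxx, Sxy = (x * x).sum(), (x * y).sum()
    d = Sx**2 - m * Sxx
    alpha = (Sx * Sy - m * Sxy) / d
    beta = (Sx * Sxy - Sxx * Sy) / d
    return alpha, beta
```

For data lying exactly on a line, the formulas recover the slope and intercept to machine precision.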
Now, let's rederive the equations for the fit. This will give us insight
into the process of fitting arbitrary linear combinations of functions.
    α x_1 + β = y_1
    α x_2 + β = y_2

or

    [ x_1  1 ] [ α ]   [ y_1 ]
    [ x_2  1 ] [ β ] = [ y_2 ]

or, for all m data points,

    A c = y

where

        [ x_1  1 ]                      [ y_1 ]
    A = [ x_2  1 ]      c = [ α ]   y = [ y_2 ]
        [  ...   ]          [ β ]       [ ... ]
        [ x_m  1 ]                      [ y_m ]
Minimizing ρ requires

    ∂ρ/∂c = −2 Aᵀ y + 2 Aᵀ A c = 0

or

    (Aᵀ A) c = Aᵀ y
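A minimal NumPy sketch of this solve, using the same four data points as the linefit call that follows (variable names are ours):

```python
import numpy as np

# Same data as the linefit example: four points roughly along a line.
x = np.array([1.0, 2.0, 4.0, 5.0])
y = np.array([1.0, 2.0, 2.0, 3.0])

# Build A with columns [x, 1] so that A @ [alpha, beta] approximates y.
A = np.column_stack([x, np.ones_like(x)])

# Route 1: form and solve the normal equations (A^T A) c = A^T y.
c_normal = np.linalg.solve(A.T @ A, A.T @ y)

# Route 2: let a least squares solver handle A c ~= y directly
# (better conditioned than explicitly forming A^T A).
c_lstsq, *_ = np.linalg.lstsq(A, y, rcond=None)
```

Both routes give the same line, y = 0.4 x + 0.8, for this data.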
>> x = [1 2 4 5];
>> y = [1 2 2 3];
>> c = linefit(x,y)
[Figure: y data and fit function, plotted for 0 ≤ x ≤ 6]
Define:

    ŷ_i = value of the fit function at the known data points;
          ŷ_i = c_1 x_i + c_2 for a line fit

    ȳ = average of the y values;  ȳ = (1/m) Σ y_i

Then:

    R² = Σ (ŷ_i − ȳ)² / Σ (y_i − ȳ)² = 1 − ‖r‖₂² / Σ (y_i − ȳ)²

and 0 ≤ R² ≤ 1.
Graphical interpretation of

    R² = 1 − ‖r‖₂² / Σ (y_i − ȳ)²

[Figure: vertical distances between the y data and the fit line, 1 ≤ x ≤ 6]
[Figure: vertical distances between the given y data and the average ȳ of
the y; the vertical lines show the contributions to Σ (y_i − ȳ)², 1 ≤ x ≤ 6]
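The R² formula above can be sketched directly in NumPy; the function name r_squared is ours:

```python
import numpy as np

def r_squared(y, yhat):
    """Coefficient of determination: R^2 = 1 - ||r||_2^2 / sum((y - ybar)^2),
    where r = y - yhat and ybar is the mean of the y data."""
    y = np.asarray(y, dtype=float)
    yhat = np.asarray(yhat, dtype=float)
    r = y - yhat
    return 1.0 - (r @ r) / np.sum((y - y.mean()) ** 2)
```

A perfect fit gives R² = 1; a fit no better than the constant ȳ gives R² = 0.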
[Figure: Bulk Modulus (GPa), roughly 180 to 205 GPa, versus temperature
T (°C), 0 to 1500 °C]
Examples:

    y = c1 e^(c2 x)    →  ln y = α x + β
    y = c1 x^c2        →  ln y = α ln x + β
    y = c1 x e^(c2 x)  →  ln(y/x) = α x + β
Consider

    y = c1 e^(c2 x)                          (6)

Taking the logarithm of both sides yields

    ln y = ln c1 + c2 x

Introducing the variables

    v = ln y    b = ln c1    a = c2

transforms Equation (6) to the line

    v = a x + b
[Figure: y = c1 e^(c2 x) plotted on linear axes (left) and on semilog axes
(right), where ln y = c2 x + ln c1 appears as a straight line; 0 ≤ x ≤ 2]
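As a sketch, the transformation can be implemented with an ordinary line fit to (x, ln y); the function name expfit and the synthetic data are our assumptions, not from the text:

```python
import numpy as np

def expfit(x, y):
    """Fit y = c1*exp(c2*x) by a straight-line fit to (x, ln y):
    ln y = c2*x + ln(c1), i.e. v = a*x + b with a = c2, b = ln(c1)."""
    v = np.log(np.asarray(y, dtype=float))
    a, b = np.polyfit(np.asarray(x, dtype=float), v, 1)
    return np.exp(b), a   # c1, c2
```

On noise-free data generated from the model, the transformation recovers c1 and c2 exactly (up to roundoff).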
Consider

    y = c1 x^c2                              (7)

Taking the logarithm of both sides yields

    ln y = ln c1 + c2 ln x

Introducing the variables

    v = ln y    u = ln x    b = ln c1    a = c2

transforms Equation (7) to the line

    v = a u + b
[Figure: y = c1 x^c2 plotted on linear axes (left) and on log-log axes
(right), where ln y = c2 ln x + ln c1 appears as a straight line]
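A matching sketch for the power-law case, fitting a line to (ln x, ln y); again the name powfit is ours:

```python
import numpy as np

def powfit(x, y):
    """Fit y = c1*x**c2 by a straight-line fit in log-log coordinates:
    ln y = c2*ln(x) + ln(c1), i.e. v = a*u + b with u = ln(x)."""
    u = np.log(np.asarray(x, dtype=float))
    v = np.log(np.asarray(y, dtype=float))
    a, b = np.polyfit(u, v, 1)
    return np.exp(b), a   # c1, c2
```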
Consider

    y = c1 x e^(c2 x)                        (8)

The transformation

    v = ln(y/x)    a = c2    b = ln c1

converts Equation (8) to the line

    v = a x + b
[Figure: y = c1 x e^(c2 x) plotted versus x (left) and ln(y/x) plotted
versus x (right), where the transformed data falls on a straight line;
0 ≤ x ≤ 6]
function c = xexpfit(x,y)
% xexpfit  Least squares fit of data to y = c(1)*x*exp(c(2)*x)
%
% Synopsis:  c = xexpfit(x,y)
%
% Input:   x,y = vectors of independent and dependent variable values
%
% Output:  c = vector of coefficients of y = c(1)*x*exp(c(2)*x)
z = log(y(:)./x(:));     %  Transform:  ln(y/x) = c(2)*x + ln(c(1))
p = polyfit(x(:),z,1);   %  Line fit to the transformed data
c = [exp(p(2)); p(1)];   %  Recover c(1) = exp(b) and c(2) = a
[Figure: original curve, noisy data, and fit for y = c1 x e^(c2 x);
recovered coefficients c1 = 5.770, c2 = −3.233 from 200 points in a
synthetic data set, 0 ≤ x ≤ 2]
The general fit function is a linear combination of n basis functions

    F(x) = Σ_{j=1}^{n} c_j f_j(x)

The basis functions f_1(x), ..., f_n(x) are known functions of x. The
coefficients

    c_1, c_2, ..., c_n

are determined by the least squares method.
For example,

    1,  x,  x²,  x^(2/3),  sin x,  e^x,  x e^(4x),  cos(ln 25x)

are all valid basis functions: they are functions of x alone. Functions in
which the fit parameters appear nonlinearly, such as sin(c1 x), e^(c2 x),
or x^(c3), are not valid basis functions as long as the c_j are the
parameters of the fit.
Since F(x_i) ≠ y_i in general, the residual for each data point is

    r_i = y_i − F(x_i) = y_i − Σ_{j=1}^{n} c_j f_j(x_i)

The least-squares solution gives the c_j that minimize ‖r‖₂.
        [ c_1 ]          [ y_1 ]
    c = [ c_2 ] ,    y = [ y_2 ]
        [ ... ]          [ ... ]
        [ c_n ]          [ y_m ]
Example data:

    x   1      2      3      4      5      6
    y   10     5.49   0.89   −0.14  −1.07  0.84

[Figure: the (x, y) data plotted for 0 ≤ x ≤ 7]
The A matrix is

        [ x_1²  x_1  1 ]
    A = [ x_2²  x_2  1 ]
        [  ...  ...  . ]
        [ x_m²  x_m  1 ]
>> x = (1:6)';
>> y = [10 5.49 0.89 -0.14 -1.07 0.84]';
>> A = [x.^2 x ones(size(x))];
>> disp(A'*A)
        2275         441          91
         441          91          21
          91          21           6
>> disp(A'*y)
   41.2200
   22.7800
   16.0100
>> c = (A'*A)\(A'*y)
c =
    0.8354
   -7.7478
   17.1160
    F(x) = c1 x² + c2 x + c3
[Figure: the data and the quadratic fit F(x) = c1 x² + c2 x + c3, 0 ≤ x ≤ 7]
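The same solve can be sketched in NumPy; np.linalg.lstsq solves the overdetermined system directly and is mathematically equivalent to forming the normal equations as in the Matlab session:

```python
import numpy as np

x = np.arange(1.0, 7.0)
y = np.array([10.0, 5.49, 0.89, -0.14, -1.07, 0.84])

# Quadratic basis: columns x^2, x, 1, matching the A matrix above.
A = np.column_stack([x**2, x, np.ones_like(x)])

# Solve A c ~= y in the least squares sense.
c, *_ = np.linalg.lstsq(A, y, rcond=None)
```

The resulting c should match the coefficients computed in the Matlab session to the digits shown.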
>> x = (1:6)';
>> y = [10 5.49 0.89 -0.14 -1.07 0.84]';
>> A = [1./x x];
>> c = (A'*A)\(A'*y)

    F(x) = c1/x + c2 x
[Figure: the data and the fit F(x) = c1/x + c2 x, 0 ≤ x ≤ 7]
Thus, if A is available, the fit function evaluated at the data points is

    ŷ = A c

function A = xinvxfun(x)
A = [ 1./x(:) x(:) ];
Advantages:
Disadvantages:
function A = poly2Basis(x)
A = [ x(:).^2 x(:) ones(size(x(:))) ];

To fit F(x) = c1/x + c2 x, the basis functions are 1/x and x. Create the
m-file xinvxfun.m:

function A = xinvxfun(x)
A = [ 1./x(:) x(:) ];
Syntax:
c = polyfit(x,y,n)
[c,S] = polyfit(x,y,n)
Syntax:
yf = polyval(c,xf)
[yf,dy] = polyval(c,xf,S)
Example data:

    x   1      2      3      4      5      6
    y   10     5.49   0.89   −0.14  −1.07  0.84

In Matlab:

>> x = (1:6)';
>> y = [10 5.49 0.89 -0.14 -1.07 0.84]';
>> c = polyfit(x,y,3);
>> xfit = linspace(min(x),max(x));
>> yfit = polyval(c,xfit);
>> plot(x,y,'o',xfit,yfit,'--')

[Figure: the data (circles) and the cubic polynomial fit (dashed line),
1 ≤ x ≤ 6]
[Figure: thermal conductivity (W/m/C), roughly 0 to 35, versus temperature
(K), 0 to 60 K]
The conductivity data is fit with the model

    k(T) = ( c1/T + c2 T² )^(−1)

To fit using linear least squares we need to write this as

    γ(T) = 1/k(T) = c1/T + c2 T²
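A sketch of this two-basis-function fit in NumPy, using synthetic data generated from assumed coefficients; the name gammafit and the coefficient values are ours, not the conductivity data in the plot:

```python
import numpy as np

def gammafit(T, k):
    """Fit gamma(T) = 1/k(T) = c1/T + c2*T**2 by linear least squares
    with basis functions 1/T and T**2."""
    T = np.asarray(T, dtype=float)
    gamma = 1.0 / np.asarray(k, dtype=float)
    A = np.column_stack([1.0 / T, T**2])
    c, *_ = np.linalg.lstsq(A, gamma, rcond=None)
    return c

# Synthetic check: generate k(T) from assumed c1 = 0.005, c2 = 2e-5
# (illustrative values only), then recover them from the data.
T = np.linspace(2.0, 60.0, 30)
k = 1.0 / (0.005 / T + 2e-5 * T**2)
```

Because the synthetic γ data lies exactly in the span of the basis functions, the least squares solve recovers c1 and c2 to roundoff.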