Curve Fitting: Deyan Mihaylov

Download as pdf or txt
Download as pdf or txt
You are on page 1of 20

Curve fitting

Deyan Mihaylov Merton College, University of Oxford Abstract In this report I have investigated the method of least squares for fitting a polynomial of arbitrary degree into a given data set. Further I implemented the method into an application allowing the user to load their own data. The program is able to fit not only linear but also higher degree polynomials. Introduction Curve fitting is important technique in the scope of experimental physics, where a particular relation has to be extracted from a large set of collected data. Where a smooth function is required, the method of least squares proves very useful in performing the regression analysis. The method itself was first developed by Carl Friedrich Gauss circa 1794. It corresponds to finding the minimum sum of the squares of the residuals of the experimental data from the fitted values. The fitted polynomial represents the best approximation to the set. The application is able to find this function but also to present the fitted values according to it. Theoretical background Generating random data The first task related to the project was to design a generator of data set with randomly distributed error, simulating experimental error of collected data. Principle values are to be generated using a polynomial of arbitrary degree and coefficients. Furthermore, for each value is generated a random deviation to simulate experimental error:

+ +

+
is a random number

where e is the r.m.s. square deviation to be imposed and generated with Gaussian distribution:

=
where
,

(2

are random numbers in the interval (0;1).

Straight line fit 1

The task to fit a regression line in a given data set is an introduction to the more complicated problem of fitting a polynomial of an arbitrary degree. The straight line has the form:

=
The squares sum is given by:

which we can minimise by taking the partial derivatives with respect to the constants and setting them to zero. The equations we obtain are:

+ +

= =

By solving this system of linear equations with respect to a and b we can identify them to be:

By directly computing the sums and hence calculating the constants we can easily find the coefficients. Polynomial curve fitting Now we are about to accomplish the task we set ourselves fit a polynomial curve in a data set. In a similar manner to the previous section we can define the squared deviation to be:

+ +

We can again minimise this expression, this time obtaining m+1 linear equations determining the m+1 coefficients that we want to find out: 2

=
with k being the number of the equation and A and v defined to be:

=
and

=
It is important to note that A is actually a matrix, while v is a vector, and therefore the system of equations could be written in the form:

=
where c is the vector containing the coefficients which we are about to determine. Technical details The application was developed in the C environment using xCode under Mac OS. Its licensed under GNU, which allows the program to be modified for personal needs and redistributed both in its original and new version. The task is carefully explained in the script for the project please refer to it if you need further clarification. Note: The code of the application is appended in the end of this report. Section 1 generatig a data set and storing it in a file #include <stdio.h> #include <string.h> #include <stdlib.h> #include <time.h> #include <math.h> Our first job is to include the libraries we are going to use in this case, the more specific ones are string.h dealing with strings, time.h containing functions allowing

random numbers to be generated and math.h to be able to use more complicated mathematics functions. int main () { int n, degree, i, j, k, l; Starting the main function, we declare some of the variables we are going to use note that other variables are declared throughout the programs where needed. degree contains the degree of the polynomial to be used for generating the data set and the other one-letter variables are counters. char suffix[4]; printf("Please enter the number of points to be generated:\n"); scanf("%d", &n); printf("Please enter the degree of polinomial to be used:\n"); scanf("%d", &degree);

double c[degree+1], e; printf("Please enter the experimental error for the data set:\n"); scanf("%lf", &e); for(i=0;i<=degree;i++) { switch(i) { case 1: strcpy(suffix, "st"); break; case 2: strcpy(suffix, "nd"); break; case 3: strcpy(suffix, "rd"); break; default: strcpy(suffix, "th"); } printf("Please enter the value of %d%s coefficient:\n", i, suffix); scanf("%lf", &(c[i])); } Since I wanted to create an easy-to-use application, I have developed a userfriendly interface for entering the variables. The user is first prompted to enter the number of points in the data set, the degree of the polynomial and the

experimental error. After that, using a loop, the program prompts for the values of the coefficients. double step, x, r1, r2, y, s, g; srand48((unsigned int) time(NULL)); FILE *fout; if((fout=fopen("fit.txt", "w"))==NULL) { printf("Cannot open the file."); exit(EXIT_FAILURE); } step=1.0/n; for(j=1;j<=n;j++) { x=j*step; r1=drand48(); r2=drand48(); g=sqrt(-2*log(r1))*cos(2*M_PI*r2); s=0; for(k=0;k<=degree;k++) { s+=c[k]*pow(x,k); } y=s+e*g; fprintf(fout, "%f %f\n", x, y); } fclose(fout); In this fragment of the program the data is actually generated and stored in a file. Firstly we define several more variables. Since we have to use values for x uniformly distributed in the interval (0;1), step contains the increase in x. After that we define the random operator that we are going to use for the random values. From that point on, the code starts to deal with file manipulation. For further explanation of these commands, one can refer to the Computing practical guide.

The script creates a loop that runs from 0 to 1 through all the multiples of the step (defined to be 1/n), thus creating n different but equally spaced values for x. To every value of x is associated a value of y. The value of y consists of 2 parts one of them computed used the polynomial and the value for x, and the other one dependent on two random numbers, which are then used to generate a random Gaussian number. The value for x and the combined value for y are then stored in a text file suitable for further use by other programs. exit(0); } The program is then finished with an exit function. Section 2 using an external data set to perform a linear fit Note: The practical task for this section included drawing a graph. Please find the printed graph appended in the end of the report. #include <stdio.h> #include <stdlib.h> #include <math.h> Once again we start by including the relevant libraries. int main () { FILE *fp; int i; double x[100], y[100]; if((fp=fopen("fit.data", "r"))==NULL) { printf("Cannot open file."); exit(EXIT_FAILURE); } i=0; while(!feof(fp)) { fscanf(fp, "%lf\t%lf\n", &x[i], &y[i]); i++; } 6

fclose(fp);

Here we use the reversed process of writing into a file we read data from a file and store it in arrays in order to manipulate it. int j; double sumx, sumy, sumx2, sumxy, a, b, d2, rms; sumx=0; sumy=0; sumx2=0; sumxy=0; printf("Measured values for x and y:\n"); for(j=0;j<i;j++) { printf("%g %g\n", x[j], y[j]); sumx+=x[j]; sumy+=y[j]; sumx2+=x[j]*x[j]; sumxy+=x[j]*y[j]; } a=(sumx2*sumy-sumxy*sumx)/(i*sumx2-sumx*sumx); b=(sumx*sumy-i*sumxy)/(sumx*sumx-i*sumx2); What follows is to determine the coefficients a and b. To do this, we define several new variables where we are going to store the aforementioned sums. After running loops to compute these, we simply use the expressions for a and b to getthe fitted values we needed. d2=0; for(j=0;j<i;j++) { d2+=(y[i]-a-b*x[i])*(y[i]-a-b*x[i]); } rms=sqrt(d2/i); printf("a = %g, b = %g, rms = %g\n", a, b, rms); 7

exit(0); } The code finishes with computing the additional parameters that are to be printed and outputting the results. Section 3 polynomial curve fit #include <stdio.h> #include <string.h> #include <stdlib.h> #include <time.h> #include <math.h>

double a[100][150], v[100], coef[10000], c[200]; Again the needed libraries are included. The difference in this case is that we need to declare the matrix and the vector for the system of simultaneous equations as global variables. int solveLinearEquations(int size) { double temp; int i,j,counter; double mat[size][size+1]; /* Merge the matrix A and the vector B into one matrix */ for (i=0; i<size; i++) { for (j=0; j<size;j++) mat[i][j] = a[i][j]; mat[i][size] = v[i]; } for (i=0;i<=size-1;i++) { temp=mat[i][i]; for (j=0;j<=size;j++) { mat[i][j]=mat[i][j]/temp; } 8

for (counter=i+1;counter<=size-1;counter++) { temp=mat[counter][i]; for (j=0;j<=size;j++) { mat[counter][j]=-1*temp*mat[i][j]+mat[counter][j]; } } } for (i=size-1;i>=0;i--) { for (counter=i-1;counter>=0;counter--) { temp=mat[counter][i]; for (j=0;j<=size;j++) { mat[counter][j]=-1*temp*mat[i][j]+mat[counter][j]; } } } for (i=0; i<size; i++) c[i] = mat[i][size]; return 0; } This function solves the simultaneous equations. It is important to note that this function was provided for download and its not my work. int main () { int n, m, i, j, k; char suffix[4]; printf("Please enter the number of points to be generated:\n"); scanf("%d", &n); printf("Please enter the degree of polinomial to be used:\n"); scanf("%d", &m); double e; printf("Please enter the experimental error for the data set:\n"); scanf("%lf", &e); for(i=0;i<=m;i++) 9

{ switch(i) { case 1: strcpy(suffix, "st"); break; case 2: strcpy(suffix, "nd"); break; case 3: strcpy(suffix, "rd"); break; default: strcpy(suffix, "th"); } printf("Please enter the value of %d%s coefficient:\n", i, suffix); scanf("%lf", &(coef[i])); } double step, x[200], r1, r2, y[200], s, g, fitted[200], d2, rms; srand48((unsigned int) time(NULL)); step=1.0/n; for(j=1;j<=n;j++) { x[j]=j*step; r1=drand48(); r2=drand48(); g=sqrt(-2*log(r1))*cos(2*M_PI*r2); s=0; for(k=0;k<=m;k++) { s+=coef[k]*pow(x[j],k); } y[j]=s+e*g; } This part of the source code copies the first program with the only difference that now the values for x and y arent stored in a file but directly in arrays so they can be used at later times in the program. Our next task is to store values in the matrix A and the vector v according to the relations we stated in the theoretical part:

10

=
This is the meanng of the following code:

printf("Please enter the degree of the polinomial to be fitted:\n"); int fitdegree; scanf("%d", &fitdegree); for(k=0;k<=fitdegree;k++) { v[k]=0; for(j=1;j<=n;j++) { v[k]+=y[j]*pow(x[j],k); } for(i=1;i<=fitdegree+1;i++) { a[k][i-1]=0; for(j=1;j<=n;j++) { a[k][i-1]+=pow(x[j],(i+k-1)); } } } Finally we have to apply the function it is going to store the coefficients in the array c. The output requires both the experimental and the fitted values of y and the r.m.s. of the fit as well. solveLinearEquations(fitdegree+1); printf("simulated data\t fitted value\n"); d2=0; for(i=1;i<=n;i++) { fitted[i]=0; 11

for(j=0;j<=fitdegree;j++) { fitted[i]+=c[j]*pow(x[i],j); } d2+=pow((y[i]-fitted[i]), 2); printf("%f\t%f\n", y[i], fitted[i]); } for(i=0;i<=fitdegree;i++) { printf("c%d = %f;\n", i, c[i]); } rms=sqrt(d2/n); printf("R.m.s. deviation of the fit: %f\n", rms); exit(0); } Conclusion The program works properly for a small experimental error (what would be the real case of an experiment) the coefficients agree to better than 5%. More interesting is the case with big error such as the proposed tasks 2) and 3). In that case the points are much more randomly spaced and sometimes a polynomial of higher degree appears to have smaller r.m.s. deviation, which means that its better fitted. However, for small errors the proper fit function is the one that is used to generate the set. Further work on the program would include a tool for plotting the points and the fitted polynomial on the same graph. References Handbook of the Physics Computing Course 2009-2010, University of Oxford, Sub-faculty of Physics

12

Source code for Section 1 #include <stdio.h> #include <string.h> #include <stdlib.h> #include <time.h> #include <math.h>

int main () { int n, degree, i, j, k, l; char suffix[4]; printf("Please enter the number of points to be generated:\n"); scanf("%d", &n); printf("Please enter the degree of polinomial to be used:\n"); scanf("%d", &degree);

double c[degree+1], e; printf("Please enter the experimental error for the data set:\n"); scanf("%lf", &e); for(i=0;i<=degree;i++) { switch(i) { case 1: strcpy(suffix, "st"); break; case 2: strcpy(suffix, "nd"); break; case 3: strcpy(suffix, "rd"); break; default: strcpy(suffix, "th"); } printf("Please enter the value of %d%s coefficient:\n", i, suffix); scanf("%lf", &(c[i])); } double step, x, r1, r2, y, s, g; srand48((unsigned int) time(NULL));

13

FILE *fout; if((fout=fopen("fit.txt", "w"))==NULL) { printf("Cannot open the file."); exit(EXIT_FAILURE); } step=1.0/n; for(j=1;j<=n;j++) { x=j*step; r1=drand48(); r2=drand48(); g=sqrt(-2*log(r1))*cos(2*M_PI*r2); s=0; for(k=0;k<=degree;k++) { s+=c[k]*pow(x,k); } y=s+e*g; fprintf(fout, "%f %f\n", x, y); } fclose(fout); exit(0); }

14

Source code for Section 2 #include <stdio.h> #include <stdlib.h> #include <math.h>

int main () { FILE *fp; int i; double x[100], y[100]; if((fp=fopen("fit.data", "r"))==NULL) { printf("Cannot open file."); exit(EXIT_FAILURE); } i=0; while(!feof(fp)) { fscanf(fp, "%lf\t%lf\n", &x[i], &y[i]); i++; } fclose(fp); int j; double sumx, sumy, sumx2, sumxy, a, b, d2, rms; sumx=0; sumy=0; sumx2=0; sumxy=0; printf("Measured values for x and y:\n"); for(j=0;j<i;j++) { printf("%g %g\n", x[j], y[j]); sumx+=x[j];

15

sumy+=y[j]; sumx2+=x[j]*x[j]; sumxy+=x[j]*y[j]; } a=(sumx2*sumy-sumxy*sumx)/(i*sumx2-sumx*sumx); b=(sumx*sumy-i*sumxy)/(sumx*sumx-i*sumx2); d2=0; for(j=0;j<i;j++) { d2+=(y[i]-a-b*x[i])*(y[i]-a-b*x[i]); } rms=sqrt(d2/i); printf("a = %g, b = %g, rms = %g\n", a, b, rms); exit(0); }

16

Source code for Section 3 #include <stdio.h> #include <string.h> #include <stdlib.h> #include <time.h> #include <math.h>

double a[100][150], v[100], coef[10000], c[200];

int solveLinearEquations(int size) { double temp; int i,j,counter; double mat[size][size+1]; /* Merge the matrix A and the vector B into one matrix */ for (i=0; i<size; i++) { for (j=0; j<size;j++) mat[i][j] = a[i][j]; mat[i][size] = v[i]; } for (i=0;i<=size-1;i++) { temp=mat[i][i]; for (j=0;j<=size;j++) { mat[i][j]=mat[i][j]/temp; } for (counter=i+1;counter<=size-1;counter++) { temp=mat[counter][i]; for (j=0;j<=size;j++) { mat[counter][j]=-1*temp*mat[i][j]+mat[counter][j]; } }

17

} for (i=size-1;i>=0;i--) { for (counter=i-1;counter>=0;counter--) { temp=mat[counter][i]; for (j=0;j<=size;j++) { mat[counter][j]=-1*temp*mat[i][j]+mat[counter][j]; } } } for (i=0; i<size; i++) c[i] = mat[i][size]; return 0; }

int main () { int n, m, i, j, k; char suffix[4]; printf("Please enter the number of points to be generated:\n"); scanf("%d", &n); printf("Please enter the degree of polinomial to be used:\n"); scanf("%d", &m); double e; printf("Please enter the experimental error for the data set:\n"); scanf("%lf", &e); for(i=0;i<=m;i++) { switch(i) { case 1: strcpy(suffix, "st"); break; case 2: strcpy(suffix, "nd"); break; case 3: strcpy(suffix, "rd"); break; default: strcpy(suffix, "th");

18

} printf("Please enter the value of %d%s coefficient:\n", i, suffix); scanf("%lf", &(coef[i])); } double step, x[200], r1, r2, y[200], s, g, fitted[200], d2, rms; srand48((unsigned int) time(NULL)); step=1.0/n; for(j=1;j<=n;j++) { x[j]=j*step; r1=drand48(); r2=drand48(); g=sqrt(-2*log(r1))*cos(2*M_PI*r2); s=0; for(k=0;k<=m;k++) { s+=coef[k]*pow(x[j],k); } y[j]=s+e*g; } printf("Please enter the degree of the polinomial to be fitted:\n"); int fitdegree; scanf("%d", &fitdegree); for(k=0;k<=fitdegree;k++) { v[k]=0; for(j=1;j<=n;j++) { v[k]+=y[j]*pow(x[j],k); } for(i=1;i<=fitdegree+1;i++) {

19

a[k][i-1]=0; for(j=1;j<=n;j++) { a[k][i-1]+=pow(x[j],(i+k-1)); } } } solveLinearEquations(fitdegree+1); printf("simulated data\t fitted value\n"); d2=0; for(i=1;i<=n;i++) { fitted[i]=0; for(j=0;j<=fitdegree;j++) { fitted[i]+=c[j]*pow(x[i],j); } d2+=pow((y[i]-fitted[i]), 2); printf("%f\t%f\n", y[i], fitted[i]); } for(i=0;i<=fitdegree;i++) { printf("c%d = %f;\n", i, c[i]); } rms=sqrt(d2/n); printf("R.m.s. deviation of the fit: %f\n", rms); exit(0); }

20

You might also like