
I wrote an N-section algorithm for finding roots of a function. The working principle is exactly the same as in the bisection method, except that the range is divided into N equal parts instead of two. This is my C code:

/*
    Compile with: clang -O3 -o nsect nsect.c -Wall -DCOUNT=5000000 -DNSECT=? -funroll-loops
*/

#include <math.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/time.h>

#define N 6

#ifndef COUNT
#error COUNT not defined!
#endif

#ifndef NSECT
#define NSECT 2
#endif

float polynomial[N]; // Coefficients of the current polynomial, constant term first

// Evaluate the polynomial at x using Horner's method
float horner( const float poly[N], float x )
{
    float val = poly[N-1];
    for ( int i = N - 2; i >= 0; i-- )
        val = poly[i] + x * val;
    return val;
}

float f( float x )
{
    return horner( polynomial, x );
}

// Find a root of f in [a, b] by repeatedly splitting the bracket into NSECT equal subintervals
float nsect( float a, float b, const float eps_x )
{
    float fa = f( a );
    float fb = f( b );
    if ( fa == 0 ) return a;
    else if ( fb == 0 ) return b;
    else if ( fa * fb > 0 ) return 0; // No sign change over [a, b] - report no root found

    float x[NSECT];
    float fx[NSECT];

    while ( b - a > eps_x )
    {   
        x[0] = a;
        if ( ( fx[0] = f( x[0] ) ) == 0 ) return x[0];

        // Evaluate the NSECT - 1 interior points and look for a sign change
        int found = 0;
        for ( int i = 0; i < NSECT - 1; i++ )
        {
            x[i + 1] = a + ( b - a ) * (float)( i + 1 ) / NSECT;
            if ( ( fx[i + 1] = f( x[i + 1] ) ) == 0 )
                return x[i + 1];
            else if ( fx[i] * fx[i + 1] < 0 )
            {
                a = x[i];
                b = x[i + 1];
                found = 1;
                break;
            }
        }

        // No interior sign change - the root must lie in the last subinterval
        if ( !found )
            a = x[NSECT - 1];

    }

    return ( a + b ) * 0.5f;
}

int main( int argc, char **argv )
{
    struct timeval t0, t1;
    float *polys = malloc( COUNT * sizeof( float ) * N );
    float *p = polys;
    for ( int i = 0; i < COUNT * N; i++ )
        scanf( "%f", p++ ); 

    float xsum = 0; // Accumulate the roots so the loop isn't optimized away when they aren't printed
    p = polys;
    gettimeofday( &t0, NULL );
    for ( int i = 0; i < COUNT; i++, p += N )
    {
        memcpy( polynomial, p, N * sizeof( float ) );
        float x = nsect( -100, 100, 1e-3 );
        xsum += x;

        #ifdef PRINT_ROOTS
            fprintf( stderr, "%f\n", x );
        #endif
    }
    gettimeofday( &t1, NULL );

    fprintf( stderr, "xsum: %f\n", xsum );
    printf( "%f ms\n", ( ( t1.tv_sec - t0.tv_sec ) * 1e6 + ( t1.tv_usec - t0.tv_usec ) ) * 1e-3 );
    free( polys );
}

EDIT: This is the command I used to compile the code: clang -O3 -o nsect nsect.c -Wall -DCOUNT=5000000 -DNSECT=? -funroll-loops. I ran everything on an i7-8700K.

I decided to test the algorithm's performance for different values of N. The test consists of measuring the time needed to find any root in the range (-100; 100) for each of 5,000,000 polynomials of degree 5. The polynomials are randomly generated and have real coefficients ranging from -5 to 5. The polynomial values are calculated using Horner's method.
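
The question doesn't include the generator, so here is a minimal sketch of one that produces input in the format main() expects (COUNT * N whitespace-separated coefficients in [-5, 5]). The file name, fixed seed and use of rand() are my own choices, not part of the original test setup:

/*
    gen.c (hypothetical) -- emits COUNT * N random coefficients in [-5, 5],
    one per line, suitable for piping into nsect via stdin.
    Compile with e.g.: clang -O2 -o gen gen.c -DCOUNT=5000000
*/

#include <stdio.h>
#include <stdlib.h>

#define N 6            /* a degree-5 polynomial has 6 coefficients */

#ifndef COUNT
#define COUNT 5000000
#endif

int main( void )
{
    srand( 1 ); /* fixed seed so runs are repeatable */
    for ( long i = 0; i < (long)COUNT * N; i++ )
        printf( "%f\n", ( (float)rand() / RAND_MAX ) * 10.0f - 5.0f );
    return 0;
}

With something like this, the benchmark can be run as ./gen | ./nsect.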

These are the results I got from running the code 10 times for each N (x = N, y = time in ms):

My understanding of the worst-case performance here is that the amount of work done in the main while loop is proportional to N, and the loop needs log_N(C) iterations to complete (where C > 1 is a constant: the ratio of the initial search range to the requested accuracy). This yields the following equation:

f(N) = N · log_N(C) = N · ln(C) / ln(N)

The plot of this function looks very similar to the violet curve I used above to roughly match the data.
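
For reference, dropping the constant factor ln(C) and evaluating the N / ln(N) part of that equation for a few values of N gives (my own arithmetic, rounded to three decimals):

\[
\frac{2}{\ln 2} \approx 2.885,\quad
\frac{3}{\ln 3} \approx 2.731,\quad
\frac{4}{\ln 4} \approx 2.885,\quad
\frac{5}{\ln 5} \approx 3.107,\quad
\frac{6}{\ln 6} \approx 3.349
\]

so over the integers the minimum is at N = 3, and f(2) = f(4) exactly, since ln 4 = 2 ln 2.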

Now, I have some (hopefully correct) conclusions and questions:

  • Firstly, is my approach valid at all?
  • Does the function I came up with really describe the relation between the number of operations and N?
  • I think this is the most interesting one: what causes such a significant difference between N=2 and all the others? I consistently get this pattern for all my test data.

Moreover:

  • That function has its minimum at N = e ≈ 2.718, which is closer to 3 than to 2. Also, f(3) < f(2) holds. Assuming the equation I came up with is correct, that would imply that the trisection method may actually be more efficient than bisection. It seems a bit counterintuitive, but the measurements seem to confirm it. Is this right?
  • If so, why does the trisection method seem to be so unpopular compared to bisection?

Thank you

  • These root solvers are essentially, after averaging over many random functions and scaling the initial interval to length 1, listing the digits of a number (the root to which they are converging) in base N. The most efficient numeral system is the one with base 3. Commented Sep 20, 2019 at 19:11
  • A bisection algorithm is structurally very simple, just like bubble sort among sorting algorithms. For a trisection implementation you need at least two branches to select the next interval. If you are that concerned about the performance of bisection, a more striking improvement is obtained by selecting the midpoint from the function values, e.g. as the secant root, as in regula falsi and its variants, then Dekker's method and the widely used Brent's method. Commented Sep 20, 2019 at 19:22
  • @PeterCordes I specified the build command in the comment at the top of the source code. It's not very visible there, sorry about that. I used the clang compiler with -O3 optimization. I didn't, however, try -ffast-math or enabling vectorization. My CPU is an i7-8700K.
    – Jacajack
    Commented Sep 21, 2019 at 7:54
  • In the inner loop you are recomputing fa == f(x[0]) and possibly fb == f(x[NSECT]). This has a massive influence on the runtime in the bisection and trisection cases, not so much in the higher divisions. Commented Sep 23, 2019 at 10:29
  • @LutzL I didn't realize how huge the difference caused by that can be. I optimized the code so it reuses f(x) values from previous iterations, and N=2 is now comparable to all the others. Thank you for pointing that out! (A sketch of that kind of reuse is shown after these comments.)
    – Jacajack
    Commented Sep 23, 2019 at 14:49
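
For illustration, here is a minimal sketch of the endpoint reuse described in the last two comments. It is not the asker's actual optimized code; it is a drop-in replacement for the nsect() function from the question (relying on the same f and NSECT) that keeps f(a) and f(b) across iterations, so each pass costs only NSECT - 1 new evaluations instead of NSECT:

float nsect( float a, float b, const float eps_x )
{
    float fa = f( a );
    float fb = f( b );
    if ( fa == 0 ) return a;
    else if ( fb == 0 ) return b;
    else if ( fa * fb > 0 ) return 0; // No sign change over [a, b] - report no root found

    while ( b - a > eps_x )
    {
        // The left endpoint is already evaluated - start scanning from it
        float prev_x = a, prev_f = fa;
        int found = 0;

        for ( int i = 1; i < NSECT; i++ )
        {
            float xi = a + ( b - a ) * (float)i / NSECT;
            float fi = f( xi );
            if ( fi == 0 ) return xi;
            if ( prev_f * fi < 0 )
            {
                // Sign change between the previous point and xi - shrink the bracket
                a = prev_x;  fa = prev_f;
                b = xi;      fb = fi;
                found = 1;
                break;
            }
            prev_x = xi;
            prev_f = fi;
        }

        if ( !found )
        {
            // No interior sign change - the root is in the last subinterval [prev_x, b]
            a = prev_x;
            fa = prev_f;
        }
    }

    return ( a + b ) * 0.5f;
}

For NSECT = 2 this means one function evaluation per bracket halving instead of two, which is consistent with the N=2 spike disappearing after the asker's optimization.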

1 Answer


I am commenting on the general question, not on your particular code, which I don't fully understand.

Assuming that there is a single root known to lie in an interval of length L and the desired accuracy is ε, you will need log(L/ε)/log(N) subdivision stages. Every subdivision stage requires N-1 evaluations of the function (not N) to tell which of the N subintervals contains the root.

Hence, neglecting overhead, the total cost is proportional to (N-1)/log(N). Starting from N=2, the values of this ratio are:

1.44, 1.82, 2.16, 2.49, 2.79...

and they keep increasing.

Hence the theoretical optimum is at N=2. This is why trisection is not used.
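
To double-check those ratios, here is a small standalone snippet (mine, not part of the original answer) that prints (N-1)/ln(N) for a range of N:

/* ratios.c (illustrative) -- per-root cost factor (N - 1) / ln(N) for N = 2..10.
   Compile with e.g.: clang -O2 -o ratios ratios.c -lm */
#include <math.h>
#include <stdio.h>

int main( void )
{
    for ( int n = 2; n <= 10; n++ )
        printf( "N = %2d: %.2f\n", n, ( n - 1 ) / log( n ) );
    return 0;
}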

  • I optimized the code a bit, and there's no longer a spike for N=2. Now (N-1)/log(N) does indeed seem to be the right way to describe the complexity. Thank you for your explanation
    – Jacajack
    Commented Sep 23, 2019 at 15:02
