
LITTLE o'S AND BIG O'S.

Suppose you have a function $f(x)$ with $f(a) = 0$ and you want to consider how quickly the function goes to zero around $a$. Ideally you would want to find a simple function $g$ (for example $g(x) = (x-a)^n$) which also vanishes at $a$, such that $g$ and $f$ are almost equal around $a$. The little o and big O notation want to express something like that, but only state that $f$ goes to zero at least as fast as $g$ (big O), or faster than $g$ (little o). For error terms this is of course sufficient (you just want to know that the error term is small), so they are used mostly in that context.

Definition 1. We say $f(x) = O(g(x))$ as $x \to a$ if there exists a constant $M$ such that $|f(x)| \le M|g(x)|$ in some punctured neighborhood of $a$, that is, for $x \in (a - \delta, a + \delta) \setminus \{a\}$ for some value of $\delta$. We say $f(x) = o(g(x))$ as $x \to a$ if $\lim_{x \to a} \frac{f(x)}{g(x)} = 0$. This implies that there exists a punctured neighborhood of $a$ on which $g$ does not vanish.

As notation this means that whenever we write $O(x^4)$ we mean an unspecified function $h(x)$ which satisfies the requirement that $h(x) = O(x^4)$. Likewise, whenever we write $o(x^4)$ you should read that as denoting a further unspecified function $h(x)$ which satisfies $h(x) = o(x^4)$. This means that we don't quite know what this error term is, but we have singled out its most important property (how small it is around the point we are looking at). This allows us to write the error term for the Taylor polynomial approximation in a simple way.

Theorem 1. Let $f : U \to \mathbb{R}$ be an $(n+1)$-times continuously differentiable map. Let $P_n(x) = \sum_{k=0}^{n} \frac{f^{(k)}(a)}{k!}(x-a)^k$ be the $n$-th order Taylor polynomial of $f$ at $a$; then
$$f(x) = P_n(x) + O((x-a)^{n+1}).$$
If $f : U \to \mathbb{R}$ is only $n$-times continuously differentiable, then we find $f(x) = P_n(x) + o((x-a)^n)$. Conversely, suppose $Q_n(x)$ is a polynomial of degree at most $n$ such that $f(x) = Q_n(x) + o((x-a)^n)$; then $Q_n$ is the $n$-th order Taylor polynomial of $f$ around $a$.

In writing this down you get to choose the value of $n$: the larger you choose $n$, the better your approximation of the function is (the error term becomes smaller, at least around the point $a$), but the more complicated your approximation is (it is a polynomial of higher degree).
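Such expansions are also easy to generate by machine; here is a minimal sketch using Python's sympy library (assuming it is available), which reproduces the first two examples below:

```python
from sympy import exp, series, symbols

x = symbols('x')

# Theorem 1 with n = 3: e^x = P_3(x) + O(x^4) around a = 0.
print(series(exp(x), x, 0, 4))      # 1 + x + x**2/2 + x**3/6 + O(x**4)

# Second order expansion of 1/(1 - x) around 0.
print(series(1/(1 - x), x, 0, 3))   # 1 + x + x**2 + O(x**3)
```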
Let us now write down some examples; the first two are derived from Taylor polynomials, the rest can be checked directly.

- We have $e^x = 1 + x + \frac{1}{2}x^2 + \frac{1}{6}x^3 + O(x^4)$ as $x \to 0$;
- We have $\frac{1}{1-x} = 1 + x + x^2 + O(x^3) = 1 + x + x^2 + o(x^2)$ as $x \to 0$;
- We have $|x|^3 = O(x^3) = o(x^2)$ as $x \to 0$;
- We have $\cosh(x) = O(e^x) = o(e^{\frac{5}{4}x})$ as $x \to \infty$;
- We have $1/\sin(x) = O(1/x) = o(1/x^{3/2})$ as $x \to 0$.

The difference between big O and little o is exemplified here: $o(x^2)$ is smaller than $O(x^2)$, so using little o tells you more than using big O; but in all these examples we had to put bigger functions after the o than after the O, in which case it says less after all. That is, saying the error is $o(x^2)$ is stronger than saying the error is $O(x^2)$ (as $x \to 0$); but unfortunately, you often have to choose between saying the error is $o(x^2)$ or $O(x^3)$ (as $x \to 0$), in which case the latter is the strongest statement.
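A bounded ratio $f(x)/g(x)$ witnesses $f = O(g)$, and a ratio tending to 0 witnesses $f = o(g)$, so such claims can be sanity-checked numerically. A minimal sketch for the cosh example (the exponent $5/4$ is the one used above):

```python
import math

# cosh(x)/e^x stays bounded (it tends to 1/2): cosh(x) = O(e^x).
# cosh(x)/e^(5x/4) tends to 0:                 cosh(x) = o(e^(5x/4)).
for xval in [5.0, 10.0, 20.0, 40.0]:
    print(f"x = {xval:4.0f}:",
          f"cosh/e^x = {math.cosh(xval) / math.exp(xval):.6f},",
          f"cosh/e^(5x/4) = {math.cosh(xval) / math.exp(1.25 * xval):.3e}")
```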


We want to use these error behaviors to simplify our calculations. Therefore it is vital that we can calculate with them easily. The following rules hold for calculating with these errors.

Theorem 2.
(1) $f(x) = O(f(x))$;
(2) If $f(x) = o(g(x))$ then $f(x) = O(g(x))$;
(3) If $f(x) = O(g(x))$ then $O(f(x)) + O(g(x)) = O(g(x))$;
(4) If $f(x) = O(g(x))$ then $o(f(x)) + o(g(x)) = o(g(x))$;
(5) Let $c \neq 0$; then $cO(g(x)) = O(g(x))$ and $c\,o(g(x)) = o(g(x))$;
(6) $O(f(x))O(g(x)) = O(f(x)g(x))$;
(7) $o(f(x))O(g(x)) = o(f(x)g(x))$;
(8) If $g(x) = o(1)$ then $\frac{1}{1 + o(g(x))} = 1 + o(g(x))$, and $\frac{1}{1 + O(g(x))} = 1 + O(g(x))$.

For example, the fourth rule should be read as: the sum of any two functions $h$ and $k$, where $h = o(f(x))$ and $k = o(g(x))$, satisfies $h + k = o(g(x))$ (as long as $f(x) = O(g(x))$). Do not ever simply divide by some $o(g(x))$ or $O(g(x))$: we only know these terms are small, so they could be 0.

The proofs are included for completeness; they are not that interesting, and you may just skim or skip them.

Proof. The proofs are all really basic, consisting mostly of painstakingly checking the definitions.
(1) Indeed $|f(x)| \le 1 \cdot |f(x)|$, i.e. you can just use the bound $M = 1$;
(2) Indeed, if $\lim_{x \to a} f(x)/g(x) = 0$ then for $\varepsilon = 1$ there exists a $\delta$ such that if $0 < |x - a| < \delta$, then $|f(x)/g(x)| < 1$, so $|f(x)| < |g(x)|$;
(3) If $h(x) = O(f(x))$ and $j(x) = O(g(x))$, then there exist constants $M_1$, $M_2$, and $M_3$ such that $|h(x)| \le M_1|f(x)|$, $|j(x)| \le M_2|g(x)|$, and $|f(x)| \le M_3|g(x)|$ in a neighborhood around $a$. We then conclude that $|h(x) + j(x)| \le |h(x)| + |j(x)| \le M_1|f(x)| + M_2|g(x)| \le M_1 M_3|g(x)| + M_2|g(x)| = (M_1 M_3 + M_2)|g(x)|$. Thus $h(x) + j(x) = O(g(x))$;
(4) Similarly, if $h = o(f(x))$ and $j = o(g(x))$ then (with again $M_3$ such that $|f(x)| \le M_3|g(x)|$ in a neighborhood of $a$) we have $|h(x)/g(x)| = |h(x)/f(x)|\,|f(x)/g(x)| \le M_3|h(x)/f(x)|$, so since $\lim_{x \to a} h(x)/f(x) = 0$ we have $\lim_{x \to a} h(x)/g(x) = 0$ by the squeezing lemma. We conclude $\lim_{x \to a} (h(x) + j(x))/g(x) = \lim_{x \to a} h(x)/g(x) + \lim_{x \to a} j(x)/g(x) = 0 + 0 = 0$;
(5) If $h(x) = O(g(x))$ then there exists $M$ such that $|h(x)| \le M|g(x)|$, so $|ch(x)| = |c|\,|h(x)| \le |c|M|g(x)|$, so $ch(x) = O(g(x))$ as well. Likewise, if $h(x) = o(g(x))$, then we have $\lim_{x \to a} ch(x)/g(x) = c \lim_{x \to a} h(x)/g(x) = c \cdot 0 = 0$, so $ch(x) = o(g(x))$;
(6) If $h(x) = O(f(x))$ and $j(x) = O(g(x))$ there exist $M_1$ and $M_2$ such that in a neighborhood of $a$ we have $|h(x)| \le M_1|f(x)|$ and $|j(x)| \le M_2|g(x)|$. We conclude that in this neighborhood we have $|h(x)j(x)| = |h(x)|\,|j(x)| \le M_1 M_2|f(x)|\,|g(x)| = M_1 M_2|f(x)g(x)|$. Thus $h(x)j(x) = O(f(x)g(x))$;
(7) If $h(x) = o(f(x))$ and $j(x) = O(g(x))$ then we know that $\lim_{x \to a} h(x)/f(x) = 0$ and that $j(x)/g(x)$ is bounded. By a previous homework problem we have seen that this implies that $\lim_{x \to a} \frac{h(x)}{f(x)} \cdot \frac{j(x)}{g(x)} = 0$ as well;
(8) To prove $\frac{1}{1 + o(g(x))} = 1 + o(g(x))$, suppose $h(x) = o(g(x))$; then
$$\lim_{x \to a} \frac{\frac{1}{1+h(x)} - 1}{g(x)} = \lim_{x \to a} \frac{\frac{1 - (1 + h(x))}{1 + h(x)}}{g(x)} = \lim_{x \to a} \frac{-h(x)}{g(x)} \cdot \frac{1}{1 + h(x)} = -0 \cdot \frac{1}{1 + 0} = 0.$$
In the last step we used that $h(x)/g(x) \to 0$ as $x \to a$ (since $h(x) = o(g(x))$) and that $h(x) \to 0$ (since $g(x) = o(1)$). In particular we see that $\frac{1}{1 + h(x)} - 1 = o(g(x))$. Now if $h(x) = O(g(x))$ we choose $M$ such that $|h(x)| \le M|g(x)|$ in a neighborhood of $a$. Then we get
$$\frac{1}{1 + h(x)} - 1 = \frac{-h(x)}{1 + h(x)} = \frac{-h(x)}{1 + \frac{h(x)}{g(x)}\,g(x)}.$$
Since $g(x) = o(1)$ we see that $|g(x)| < \frac{1}{2M}$ in a close enough neighborhood of $a$. Then $\left|\frac{h(x)}{g(x)}\,g(x)\right| \le M \cdot \frac{1}{2M} = \frac{1}{2}$ in a neighborhood around $a$. Thus $\left|\frac{1}{1 + h(x)}\right| < 2$ in this neighborhood. We conclude that in this neighborhood $\left|\frac{1}{1 + h(x)} - 1\right| < 2|h(x)| \le 2M|g(x)|$, so it is $O(g(x))$.
In the case the functions $f$ and $g$ are polynomials these rules simplify to the following.

Proposition 1. Around 0 we have
(1) $x^a = O(x^b)$ for all $b \le a$ and $x^a = o(x^b)$ for all $b < a$;
(2) $O(x^a) + O(x^b) = O(x^{\min(a,b)})$, $o(x^a) + o(x^b) = o(x^{\min(a,b)})$, and
$$O(x^a) + o(x^b) = \begin{cases} o(x^b) & \text{if } b < a, \\ O(x^a) & \text{if } b \ge a; \end{cases}$$
(3) $cO(x^a) = O(x^a)$ and $c\,o(x^a) = o(x^a)$;
(4) $x^b O(x^a) = O(x^{a+b})$ and $x^b o(x^a) = o(x^{a+b})$;
(5) $O(x^a)O(x^b) = O(x^{a+b})$, $O(x^a)o(x^b) = o(x^{a+b})$, and $o(x^a)o(x^b) = o(x^{a+b})$.
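sympy's O class implements exactly this arithmetic for powers of $x$ around 0, so Proposition 1 can be checked interactively; a minimal sketch:

```python
from sympy import O, symbols

x = symbols('x')

# Rule (2): the smaller exponent wins in a sum around 0.
print(O(x**2) + O(x**3))   # O(x**2)

# Rule (4): multiplying by a power of x shifts the exponent.
print(x**2 * O(x**3))      # O(x**5)

# Rule (5): exponents add under multiplication.
print(O(x**2) * O(x**3))   # O(x**5)
```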

LITTLE OS AND BIG OS.

This allows us to find some simple rules for calculating Taylor polynomials.

Theorem 3. If $f$ and $g$ are $n$ times continuously differentiable and $P_n(x)$, respectively $Q_n(x)$, is the $n$-th order Taylor polynomial of $f$, respectively $g$, around $a$, then the $n$-th order Taylor polynomial of $f(x)g(x)$ is given by $P_n(x)Q_n(x)$ with the terms $(x-a)^s$ with $s > n$ omitted.

Proof. Write $R_n(x)$ for the $n$-th degree polynomial obtained by removing the terms $(x-a)^s$ with $s > n$ from $P_n(x)Q_n(x)$. We see that $R_n(x) = P_n(x)Q_n(x) + O((x-a)^{n+1}) = P_n(x)Q_n(x) + o((x-a)^n)$. Now $f(x) = P_n(x) + o((x-a)^n)$ and $g(x) = Q_n(x) + o((x-a)^n)$. Hence
$$f(x)g(x) = P_n(x)Q_n(x) + P_n(x)\,o((x-a)^n) + Q_n(x)\,o((x-a)^n) + o((x-a)^n)\,o((x-a)^n) = R_n(x) + o((x-a)^n) + o((x-a)^n) + o((x-a)^n) + o((x-a)^{2n}) = R_n(x) + o((x-a)^n),$$
where we used that $P_n$ and $Q_n$ are bounded near $a$, so multiplying by them preserves $o((x-a)^n)$. Thus we can use the converse characterization of the Taylor polynomials to conclude that $R_n$ is the $n$-th order Taylor polynomial of $f(x)g(x)$ around $a$.

In a similar way we get the Taylor polynomial for compositions of functions.

Theorem 4. Suppose $f$ and $g$ are $n$ times continuously differentiable. Let $P_n(x)$ be the $n$-th order Taylor polynomial of $f$ around $g(a)$ and $Q_n(x)$ the $n$-th order Taylor polynomial of $g$ around $a$. Then the $n$-th degree polynomial obtained by removing terms $(x-a)^s$ for $s > n$ from $P_n(Q_n(x))$ is the $n$-th order Taylor polynomial of $f(g(x))$.

Proof. Write $R_n(x)$ for the polynomial obtained by removing terms $(x-a)^s$ for $s > n$ from $P_n(Q_n(x))$. Then $R_n(x) = P_n(Q_n(x)) + o((x-a)^n)$ as before. Now we see that
$$f(g(x)) = P_n(Q_n(x) + o((x-a)^n)) + o((Q_n(x) - g(a))^n).$$
First we notice that $Q_n(a) = g(a)$, as $Q_n$ is the Taylor polynomial of $g$ around $a$, so $Q_n(x) - g(a) = O(x-a)$. Therefore we get that $o((Q_n(x) - g(a))^n) = o((x-a)^n)$. Then we observe that $P_n(Q_n(x) + o((x-a)^n))$ equals a sum of terms of the form $(x-a)^k\,o((x-a)^n)^l$ for some $k, l \ge 0$. Thus either $l = 0$, there are no $o((x-a)^n)$ factors, and we get a term from $P_n(Q_n(x))$; or $l > 0$ and this term is $o((x-a)^n)$ (or even smaller). We conclude that $P_n(Q_n(x) + o((x-a)^n)) = P_n(Q_n(x)) + o((x-a)^n)$. Together we get that $f(g(x)) = P_n(Q_n(x)) + o((x-a)^n) = R_n(x) + o((x-a)^n)$. By the converse characterization of the Taylor polynomials we see that $R_n$ is the $n$-th order Taylor polynomial of $f(g(x))$.
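The recipe of Theorem 3 is easy to try out in sympy: multiply the Taylor polynomials, discard the degrees above $n$, and compare with the expansion computed directly. A sketch (my choice of example: $f = e^x$, $g = \cos x$, $n = 4$):

```python
from sympy import O, cos, exp, series, symbols

x = symbols('x')
n = 4

# Fourth order Taylor polynomials of f and g around 0 (O-term removed).
P = series(exp(x), x, 0, n + 1).removeO()
Q = series(cos(x), x, 0, n + 1).removeO()

# Multiply, then drop all terms x^s with s > n by adding O(x^(n+1)):
# this is exactly the truncation described in Theorem 3.
truncated_product = (P * Q).expand() + O(x**(n + 1))

# Direct n-th order expansion of the product f(x)g(x).
direct = series(exp(x) * cos(x), x, 0, n + 1)

print(truncated_product)  # 1 + x - x**3/3 - x**4/6 + O(x**5)
print(direct)             # 1 + x - x**3/3 - x**4/6 + O(x**5)
```

The two printed expansions agree; Theorem 4 can be checked the same way by expanding a composition such as $e^{\sin(x)}$ directly.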
Let us now give some examples.

$\sin(x)\cos(x)$: We know $\sin(x) = x - \frac{x^3}{6} + o(x^4)$ and $\cos(x) = 1 - \frac{1}{2}x^2 + \frac{1}{24}x^4 + o(x^4)$. Thus their product is (where we throw all terms $o(x^4)$ and smaller onto one pile in the second step)
$$\begin{aligned}
\sin(x)\cos(x) &= \left(x - \frac{x^3}{6} + o(x^4)\right)\left(1 - \frac{1}{2}x^2 + \frac{1}{24}x^4 + o(x^4)\right) \\
&= x - \frac{1}{2}x^3 + \frac{1}{24}x^5 + x\,o(x^4) - \frac{x^3}{6} + \frac{1}{12}x^5 - \frac{1}{144}x^7 - \frac{x^3}{6}\,o(x^4) \\
&\quad + o(x^4) - \frac{1}{2}x^2\,o(x^4) + \frac{1}{24}x^4\,o(x^4) + o(x^4)\,o(x^4) \\
&= x - \frac{1}{2}x^3 - \frac{1}{6}x^3 + o(x^4) = x - \frac{2}{3}x^3 + o(x^4).
\end{aligned}$$
Thus the fourth order Taylor polynomial is $x - \frac{2}{3}x^3$. In further calculations I will immediately remove the terms which combine into the error term (in this case $o(x^4)$).


Another, much quicker way of calculating this is by using the doubling formula for the sine:
$$\sin(x)\cos(x) = \frac{1}{2}\sin(2x) = \frac{1}{2}\left((2x) - \frac{(2x)^3}{6} + o((2x)^4)\right) = x - \frac{8}{2 \cdot 6}x^3 + o(x^4) = x - \frac{2}{3}x^3 + o(x^4).$$
Fortunately the result remains the same.
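Both routes give the same answer as a direct machine expansion; a sketch, assuming sympy:

```python
from sympy import cos, series, sin, symbols

x = symbols('x')

# The x^4 coefficient vanishes, so the fourth order Taylor polynomial
# of sin(x)cos(x) is x - 2*x**3/3, as computed above.
print(series(sin(x) * cos(x), x, 0, 5))   # x - 2*x**3/3 + O(x**5)
```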

$\cos(\sqrt{1 + \log(1+x)} - 1)$ around $x = 0$: You don't really want to differentiate this function a few times to calculate its Taylor polynomial. Fortunately we can first write (differentiating $\log(1+x)$)
$$\log(1+x) = x - \frac{1}{2}x^2 + \frac{1}{3}x^3 - \frac{1}{4}x^4 + o(x^4).$$
Then taking the square root we see that (differentiating $\sqrt{1+u}$)
$$\sqrt{1+u} = 1 + \frac{1}{2}u - \frac{1}{8}u^2 + \frac{1}{16}u^3 - \frac{5}{128}u^4 + o(u^4).$$
Plugging in $u = \log(1+x)$ gives us
$$\begin{aligned}
\sqrt{1+\log(1+x)} &= 1 + \frac{1}{2}\left(x - \frac{1}{2}x^2 + \frac{1}{3}x^3 - \frac{1}{4}x^4 + o(x^4)\right) - \frac{1}{8}\left(x - \frac{1}{2}x^2 + \frac{1}{3}x^3 - \frac{1}{4}x^4 + o(x^4)\right)^2 \\
&\quad + \frac{1}{16}\left(x - \frac{1}{2}x^2 + \frac{1}{3}x^3 - \frac{1}{4}x^4 + o(x^4)\right)^3 - \frac{5}{128}\left(x - \frac{1}{2}x^2 + \frac{1}{3}x^3 - \frac{1}{4}x^4 + o(x^4)\right)^4 \\
&\quad + o\left(\left(x - \frac{1}{2}x^2 + \frac{1}{3}x^3 - \frac{1}{4}x^4 + o(x^4)\right)^4\right) \\
&= 1 + \frac{1}{2}\left(x - \frac{1}{2}x^2 + \frac{1}{3}x^3 - \frac{1}{4}x^4 + o(x^4)\right) - \frac{1}{8}\left(x^2 - x^3 + \frac{11}{12}x^4 + o(x^4)\right) \\
&\quad + \frac{1}{16}\left(x^3 - \frac{3}{2}x^4 + o(x^4)\right) - \frac{5}{128}\left(x^4 + o(x^4)\right) + o(x^4) \\
&= 1 + \frac{1}{2}x + \left(-\frac{1}{2}\cdot\frac{1}{2} - \frac{1}{8}\right)x^2 + \left(\frac{1}{2}\cdot\frac{1}{3} + \frac{1}{8} + \frac{1}{16}\right)x^3 + \left(-\frac{1}{2}\cdot\frac{1}{4} - \frac{1}{8}\cdot\frac{11}{12} - \frac{1}{16}\cdot\frac{3}{2} - \frac{5}{128}\right)x^4 + o(x^4) \\
&= 1 + \frac{1}{2}x - \frac{3}{8}x^2 + \frac{17}{48}x^3 - \frac{143}{384}x^4 + o(x^4).
\end{aligned}$$
Thus we subtract 1 and plug this into
$$\cos(v) = 1 - \frac{1}{2}v^2 + \frac{1}{24}v^4 + o(v^4)$$
to get (note $o(v^4) = o(x^4)$, so we can simplify that term immediately)
$$\begin{aligned}
\cos(\sqrt{1+\log(1+x)} - 1) &= 1 - \frac{1}{2}\left(\frac{1}{2}x - \frac{3}{8}x^2 + \frac{17}{48}x^3 - \frac{143}{384}x^4 + o(x^4)\right)^2 + \frac{1}{24}\left(\frac{1}{2}x - \frac{3}{8}x^2 + \frac{17}{48}x^3 - \frac{143}{384}x^4 + o(x^4)\right)^4 + o(x^4) \\
&= 1 - \frac{1}{2}\left(\frac{1}{4}x^2 - \frac{3}{8}x^3 + \frac{95}{192}x^4 + o(x^4)\right) + \frac{1}{24}\left(\frac{1}{16}x^4 + o(x^4)\right) + o(x^4) \\
&= 1 - \frac{1}{8}x^2 + \frac{3}{16}x^3 - \frac{47}{192}x^4 + o(x^4).
\end{aligned}$$
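Here too the result is easy to double-check by machine; a sketch, assuming sympy:

```python
from sympy import cos, log, series, sqrt, symbols

x = symbols('x')

# Fourth order expansion of cos(sqrt(1 + log(1 + x)) - 1) around 0.
print(series(cos(sqrt(1 + log(1 + x)) - 1), x, 0, 5))
# 1 - x**2/8 + 3*x**3/16 - 47*x**4/192 + O(x**5)
```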


Admittedly the calculation here was somewhat tedious (the complexity quickly increases with the order of approximation; if you want just the third order term, we have to do less than half the work), but we did not have to calculate any ugly derivatives. I challenge you to try and differentiate $\cos(\sqrt{1 + \log(1+x)} - 1)$, which might well be more work. Moreover, the power of the method lies in the fact that, as long as you know low order approximations of the basic functions appearing, you know them for all functions you can build out of these functions.

It should be noted that not all relevant kinds of behavior as $x \to 0$ can be expressed by powers of $x$; other terms you will often see are $o(\log(x)^k)$. Around infinity the relevant behaviors are, apart from polynomial growth/decay ($O(x^n)$: growth for $n > 0$, decay for $n < 0$), exponential growth/decay ($O(e^{ax})$ for $a > 0$, respectively $a < 0$) and logarithmic growth/decay ($O(\log(x)^n)$). The exponential behavior grows or decays fastest, the logarithmic slowest, with polynomial growth somewhere in between. If $a, n, k > 0$ are positive numbers we get the sequence (with the left hand side being the smallest and the right hand side the largest)
$$O(e^{-ax}) = O(x^{-n}) = O(\log(x)^{-k}) = O(1) = O(\log(x)^{k}) = O(x^{n}) = O(e^{ax}).$$
That is: any function that is $O(e^{-ax})$ is also $O(x^{-n})$, and any function that is $O(x^{-n})$ is also $O(\log(x)^{-k})$, etc. (But not the other way around!)
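To get a feeling for how slowly these comparisons kick in, one can tabulate ratios of successive terms in the chain for large $x$. A quick numerical sketch; the parameter choices ($k = 3$, $n = 1/2$ for the first ratio, $n = 3$, $a = 1/10$ for the second) are for illustration only:

```python
import math

# Ratios log(x)^3 / x^(1/2) and x^3 / e^(x/10) for increasingly large x.
# Both tend to 0, but the first one does so very slowly.
# Computed as exp(difference of logarithms) to avoid overflow.
for xval in [1e3, 1e6, 1e9, 1e12]:
    log_over_poly = math.exp(3 * math.log(math.log(xval)) - 0.5 * math.log(xval))
    poly_over_exp = math.exp(3 * math.log(xval) - 0.1 * xval)
    print(f"x = {xval:.0e}: log^3/sqrt(x) = {log_over_poly:.3e}, "
          f"x^3/e^(x/10) = {poly_over_exp:.3e}")
```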
