Lectures On Approximation by Polynomials: J.G. Burkill

Lectures On
Approximation By Polynomials
By
J.G. Burkill
No part of this book may be reproduced in any

form by print, microfilm or any other means
without written permission from the Tata Insti-
tute of Fundamental Research, Apollo Pier Road,
Bombay-1
Tata Institute of Fundamental Research,

Bombay
1959
Contents
1 Weierstrass’s Theorem 1
1 Approximation by Polynomials . . . . . . . . . . . . . . 1
2 Singular Integrals and Landau’s Proof . . . . . . . . . . 4
3 Bernstein Polynomials . . . . . . . . . . . . . . . . . . 7
2 The Polynomial of Best Approximation... 11

4 The Lagrange Polynomial . . . . . . . . . . . . . . . . . 11
5 Best Approximation . . . . . . . . . . . . . . . . . . . . 12
6 Chebyshev polynomials . . . . . . . . . . . . . . . . . . 16
3 Approximations to |x| 19
7 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
8 Oscillating polynomials . . . . . . . . . . . . . . . . . . 20
9 Approximation to |x| . . . . . . . . . . . . . . . . . . . 25
4 Trigonometric Polynomials 29
10 Trigonometric polynomials. Modulus of Continuity . . . 29
11 Fourier and Fejer Sums . . . . . . . . . . . . . . . . . . 33
5 Inequalities, etc. 41
12 Bernstein’s and Markoff’s Inequalities . . . . . . . . . . 41
13 Structural Properties Depend on the... . . . . . . . . . . 45
14 Divergence of the Lagrange Sequence . . . . . . . . . . 47
iii
iv Contents
6 Approximation in Terms of Differences 51

15 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
16 Definition and Properties of the nth Difference . . . . . . 53
1 Runge’s Theorem . . . . . . . . . . . . . . . . . . . . . 59
2 Interpolation . . . . . . . . . . . . . . . . . . . . . . . . 63
3 Best Approximation . . . . . . . . . . . . . . . . . . . . 66
Chapter 1
Weierstrass’s Theorem
1 Approximation by Polynomials
n
ar xr is that its value for 1
P
A basic property of a polynomial P(x) =
0
a given x can be calculated (e.g. by a machine) in a finite number of
steps. A central problem of mathematical analysis is the approximation
to more general functions by polynomials an the estimation of how small
the discrepancy can be made. A discussion of this problem should be
included in any University course of analysis. Not only are the results
important, but their proofs admirably illustrate a number of powerful
methods.
This account will be confined to the leading theorems, stated in their
fundamental rather than their most general forms. There are many ex-
cellent systematic presentations in the literature, to which this may serve
as an introduction.
Variables and functions will be real. We say that f (x) is C(a, b)
meaning that f (x) is continuous for a ≤ x ≤ b.p(x) or q(x) always
denotes a polynomial; pn (x) is a polynomial of degree at most n. In this
course, the goodness (or badness!) of the fit of a particular polynomial
p(x) to the function f (x) will always be measured by
sup | f (x) − p(x)|,
where the sup. is taken over a ≤ x ≤ b.
1
2 1. Weierstrass’s Theorem
There are other useful ways of defining a ‘distance’ between f (x)

and p(x), e.g.
Z b
f (x) − p(x) 2 dx,

a
but we shall not deal with them here.
2 The interval (a, b) will commonly be taken to be (0, 1) or (−1, 1) as
may be convenient in particular context; there will be no loss of gener-
ality. Our enquiry is restricted to finite intervals. The numbers ǫ will
always be supposed greater than 0. The Halmos symbol /// denotes the
end of a proof.
Theorem 1 (Weierstrass 1885). If f (x) is C(a, b), then, given ε, we can
find p(x) such that
sup | f (x) − p(x)| < ε.
This is the fundamental theorem of the subject. An alternative state-
ment of it is that a continuous function is the sum of a uniformly conver-
gent series of polynomials. For let pn1 (x), pn2 (x), · · · (n1 ≤ n2 ≤ · · · ) be
1
polynomials corresponding to ε, ε, . . . , ε/2n . . .. Then the series
2

pn1 (x) + pn2 (x) − pn1 (x) + · · ·
converges uniformly to f (x).

We shall give three proofs of Weierstrass’s theorem. The first and
simplest is that of Lebesgue (1898). It is based on a polynomial ap-
proximation to the particular function |x| in (−1, 1). We shall study this
function closely in Chapter III, and shall learn a lot from it.
Lemma. There is a sequence of polynomials converging uniformly to |x|
for −1 ≤ x ≤ 1.
√
Proof. If u = 1 − x2 . then |x| = (1 − u), and 0 ≤ u ≤ 1 corresponds to
1 ≥ |x| ≥ 0.
√
3 (1 − u) has a binomial expansion in which the term in un is −cn un
where
1.3.5 . . . (2n − 3)
cn = (n ≥ 2)
2.4.6 . . . 2n
1. Approximation by Polynomials 3
We can prove that this series, which certainly converges for |u| < 1,
also converges for u = 1. This follows either from Gauss’s test applied
to !
cn 3 1
=1+ +0 2
cn+1 2n n
or by proving (on the lines of the Lemma following Theorem 2) that
cn ∼ n√An .
√
By Abel’s limit theorem, the series for (1 − u) converges uniformly
for 0 ≤ u ≤ 1, i.e., |x| is uniform limit of a sequence of polynomials for
−1 ≤ x ≤ 1.
Corollary. Let
g(x) = 0 for x < 0
g(x) = 0 for 0 ≤ x ≤ k.
Then g(x) is the limit of a uniformly convergent sequence of polyno-
mials in −k ≤ x ≤ k.
Proof. Changing the variable by a factor k, we may suppose that k is 1.
Then
1
2 g(x) = (x + |x|).
2
Proof of theorem 1. Given ε, we can find a function l(x) whose graph
is a polygon with vertices at (a, y0 ), (x1 , y1 ), . . . , (xi , yi ), . . . , (b, yn ) such
that
1
| f (x) − l(x)| < ε.
2
Now l(x) is the sum of constant multiples of functions of the type 4
g(x − xi ) defined in the Corollary, namely,
n−1
X
l(x) = y0 + ci g(x − xi ).
0
For the right hand side is linear in each (xi , xi+1 ), and the ci give the
right value of l(x) at the vertices if
y1 =y0 + c0 (x1 − x0 )
······
i−1
X
yi =y0 + c(xi − xk ).
k=0
By the lemma and corollary, we can find a polynomial p(x) such that
1
|l(x) − p(x)| < ε, a ≤ x ≤ b
2
and this gives | f (x) − p(x)| < ε, a ≤ x ≤ b.
2 Singular Integrals and Landau’s Proof

Weierstrass’s own proof of Theorem 1 rested on the limit as n → ∞ of
the ‘singular integral’
Z ∞
n n o
√ exp −n2 (t − x)2 f (t)dt.
π −∞
The essence of the argument is that, if n is large, the exponential
‘kernel’ is small except in a small interval round t = x, and so the inte-
gral is nearly equal to f (x). This integral is not, however, a polynomial
in x and, to complete the proof, Weierstrass had to approximate to the
exponential by the sum of a finite number of terms of its series. A natu-
ral step, taken, by Landau and by de la Vallee Poussin, was to start with
a singular integral which is a polynomial in x. An appropriate kernel to
replace Weierstrass’s exponential factor is
1 − (t − x)2 n

5 which (for large n) falls away rapidly from the value 1 as t moves away
form x.We need a theorem about the convergence of singular integrals,
and this is best stated for a general kernel Kn (t − x).
Theorem 2. Let
Z 1
Jn = Kn (u)du
−1
Z δ
Ln (δ) = Kn (u)du (0 < δ < 1)
−δ
2. Singular Integrals and Landau’s Proof 5
Suppose that
(i) Kn (u) ≥ 0
(ii) for each fixed δ, Ln (δ)/Jn → 1, as n → ∞.
Suppose that f (x) is C(0, 1) and 0 < a < b < 1. Then, as n → ∞,
Z1
1
In (x) = Kn (t − x) f (t)dt → f (x)
Jn
0
uniformly for a ≤ x ≤ b.
Proof. In In (x), we shall split up the integral over (0, 1)
(Z x−δ Z x+δ Z 1 )
1
(1) In (x) = + + ,
Jn 0 x−δ x+δ
where 0 < x − δ < x + δ < 1. Consider first the integral over (x − δ, x + δ).
Given ε, we can, by the continuity of f (x), find δ = δ(ε) such that
| f (t) − f (x)| < ε if a ≤ x ≤ b, |t − x| ≤ δ.

Suppose further that δ < min(a, 1 − b). Then the middle term on the
R. H. S. of (1)
Z δ
1
= Kn (u) f (x + u)du
Jn −δ
Z δ
Ln (δ) 1
= f (x) + Kn (u) f (x + u) − f (x) du.
Jn Jn −δ
The first term in the last line tends to f (x), from (ii) of the hypothe- 6
sis. The second term is, by (i), numerically less than εLn (δ)/Jn , that is,
less than ε.
Now return to equation (1) and consider the first term on the R. H.
S. Let M = sup | f (x)|in (0, 1).
Z x−δ
M −δ
Z
1
| Kn (t − x) f (t)dt| ≤ Kn (u)du
Jn 0 Jn −x
( )
Ln (δ)
≤ M 1−
Jn
→ 0 as n → ∞.
A similar estimate holds for the third term of (1).
All the inequalities in the above argument are independent of x, and,
collecting the results, we have proved that In (x) → f (x) uniformly for
a ≤ x ≤ b.
If, in Theorem 2, we take, following Landau
Kn (u) = (1 − u2 )n ,
then In (x) is a polynomial in x of degree 2n. We have, therefore, a
second proof of Theorem 1 as soon as we have proved, as we do in the
following Lemma, that this Kn (u) satisfies the conditions of Theorem 2.
Lemma. In Theorem 2, Kn (u) may be taken to be (1 − u2 )n .
Proof.
1
1 2 0π
Z Z
2 n
Jn = (1 − u ) du = 2 0 sin2n+1 θdθ
−1
= 2S 2n+1 , say.
2.4 . . . 2n
S 2n+1 =
3.5 . . . (2n + 1)
7 From the inequalities
S 2n > S 2n+1 > S 2n+2 ,
it is easily proved that
r r
π π
Jn ∼ and Jn > .
n n+1

Then
R1
Ln (δ) 2 δ (1 − u2 )n du
1− =
Jn Jn
r
n+1
< 2(1 − δ2 )n → 0 as n → ∞.
π
3. Bernstein Polynomials 7
3 Bernstein Polynomials
We shall give a third proof of Theorem 1. It has the advantage of em-
bodying a definite construction for the approximating polynomials.
Definition. Write ln,m (x) = (nm )xm (1 − x)n−m , 0 ≤ m ≤ n. The nth Bern-
stein polynomials of f (x) in (0, 1) is defined to be
n
X
Bn (x) = Bn ( f ; x) = f (m/n)ln,m (x).
m=0
Bn (x) has degree n (at most).

Theorem 3. Let f (x) be C(0, 1). Then, as n → ∞, Bn (x) → f (x) uni-
formly.
Note. We can see what underlies this. ln,m (x) has a maximum at x =
m/n. So the terms of Bn (x) for which m/n is near to x are those which
contribute most. It is, in fact, the analogue for a finite sum of the ’singu-
lar integral’ notion. Then two schemes, for sum and integral, could be
combined into one by using a Steltjes integral.
Lemmas on ln,m (x).
The sums on the R.H.S. being taken for values of m such that 0 ≤ 8
m ≤ n,
X
1= ln,m (x)
X
nx = mln,m (x)
X
nx(1 − x) = (nx − m)2 ln,m (x).
Proof. With a view to differentiating with regard to y, we write
X
e + (1 − x) n =
y
(nm )emy (1 − x)n−m

Put ey = x and we have the first result. Differentiate with regard to y and
put ey = x and we have second. Differentiating again gives
X
nx + n(n − 1)x2 = m2 ln,m (x)
Multiply the three equations in turn by n2 x2 , −2x, 1 and add. This gives
the third result in the lemma.
Proof of theorem 2. Given ε, there is δ such that | f (x1 ) − f (x2 )| < ε if

|x1 − x2 | < δ. Now,
n
X
f (x) − Bn (x) = f (x) − f (m/n) ln,m (x).
m=0
P
Divide the sum on the R.H.S. into parts: l taken over those values
m P P P
of m for which |x − | < δ, and 2 the rest. Then | 1 | ≤ ε 1 ln,m (x) ≤
n
Pn
ε ln.m (x) = ε. If M is sup | f (x)| in 0 ≤ x ≤ 1,
0
X X
| | ≤ 2M ln,m (x)
2 2
X (nx − m)2
≤ 2M ln,m (x)
2
n2 δ2
≤ 2Mnx(1 − x)/n2 δ2 , from the Lemma
≤ M/2nδ2
≤ ε + M/2nδ2 . Choose n > M/2εδ2

P P
9 So | f (x) − Bn (x)| ≤ | 1 |+| 2|
and the R.H.S. ≤ 2ε.
Remarks on Bernstein polynomials.

(1) They have applications to the theory of probability, moment prob-
lems and the summation of series. See Lorentz, Bernstein polyno-
mials, (Toronto 1953).
(2) In questions of polynomial approximation, it is a disadvantage that

the Bernstein polynomial of a polynomial pn (x) is not, in general,
pn (x), e.g.
1
for x2 , B2 (x) is x(1 + x)
2
1
for x(1 − x), B2 (x) is x(1 − x).
2
3. Bernstein Polynomials 9
For most of the useful systems of polynomials, the approximation

within the system to a given pn (x) is pn (x), e.g. with Legendre
polynomials,
1 2
x2 = P0 (x) + P2 (x).
3 3
Notes on Chapter I
Notes at the end of a chapter may include exercises (with hints for 10
solutions), extensions of the theorem and suggestions for further read-
ing.
1. In the Lemma of §1, prove that the p polynomial consisting of the

terms up to x2n in the expansion of 1 − (1 − x2 ) approximates

√
to |x| in (−1, 1) with a greatest error which ∼ A/ n.
1 1
2. Let f (x) = − |x − | in (0, 1). (This is an adaptation of |x| to
2 2
the interval (0, 1)). As in 1, investigate the order of magnitude of
1
the error at x = given by (a) the Landau singular integral, (b) the
2
Bernstein, approximations to f (x).
3. Theorem 1 can be extended to a function of two (or more) variables,

say f (x, y) for 0 ≤ x ≤ 1, 0 ≤ y ≤ 1. Suggest a method of proof.
4. If f ′ (x) is continuous, then

d
Bn ( f ; x) → f ′ (x) uniformly.
dx
A similar result for the Landau integral.
5. Readers who like to place theorems on analysis in an abstract set-

ting will be interested in Stone’s extension of Theorem 1. See Math.
Magazine 21 (1948) 167 and 237, or Lorentz, 9, or Rudin, Principles
of Mathematical Analysis (New York 1953), 134.
Hints for 1 − 4
11 1. All the cn are positive n ≥ 2). Error is greatest when u = 1, i.e.,

∞
P
x = 0, and is cr .
n+1
∞
P A R ∞ dx √
This ∼ √ ∼A n √ ∼ A/ n.
n+1 r r x x
√
2. A/ n. For (b), approximate to factorials by Stirling’s formula.
3. Could use
R 1R 1n on n on
0 0
1 − (t − x)2 1 − (u − y)2 f (t, u)dtdu
R 2
1 2 )n dt
−1
(1 − t
or (with some labour extend Bernstein’s, as in P. L. Butzer, Cana-

dian Journal of Mathematics, 5(1953), 107, or Lorentz, 51.
4. Lorentz, 26.
For Landau, with Kn (u) = (1 − u2 )n in Theorem 2,
Z 1 Z 1 Z 1
d ∂Kn ∂Kn
Kn (t − x) f (t)dt = f (t)dt = − f (t)dt
dx 0 0 ∂x 0 ∂t
and integrate by parts.
Chapter 2
The Polynomial of Best
Approximation Chebyshev
Polynomials
4 The Lagrange Polynomial

We are given n + 1 values of x, 12
x0 , x1 , . . . , xn
and n + 1 constants c0 , c1 , . . . , cn .
Q
Write (x) = (x − x0 ) · · · (x − xn ).
The polynomial p(x) of degree at most n which takes the values ci
Q
at xi is, by the partial- fraction rule for p(x)/ (x),
n
Y X 1 ci
(x) Q .
0
x − xi (xi )
If the ci are the values at xi of a function f (x), we call p(x) the

Lagrange polynomial of f (x) at the xi . Here we follow the usual termi-
nology, although Waring (1779) used the polynomial before Lagrange
(1795) and indeed it is clear that the formula was known to Newton.
Suppose that the values x0 , . . . , xn are fixed. The following lemmas
follow from the definition of the Lagrange polynomial.
11
12 2. The Polynomial of Best Approximation...
Lemma 1. Given an aggregate of polynomials pα (x) of degree at most

n, where α runs through an index-set I, such that
|pα (xi )| ≤ A, α in I; i = 0, . . . , n.
Then, if aα,r is the coefficient of xr in pα (x),
|aα,r | ≤ AB,
where B is independent of α.
Proof. Write pα (xi ) for ci . We have a B depending only on the xi .
13 Lemma 2. (For brevity of expression, we translate de la Vallee Poussin,

Leo̧ns, 74). If, at n + 1 given points, two polynomials of degree at most
n take ’infinitely close’ values, their corresponding coefficients are in-
finitely close.
Proof. Given ε, we have two polynomials say pα (x), qα (x) which dif-
fer by at most ε for each of the values x0 , . . . , xn . By Lemma 1, their
corresponding coefficients differ by at most Bε.
5 Best Approximation
Let Pn be the set of polynomials p(x) of degree less than or equal to n.
Then
P0 ⊂ P1 ⊂ P2 · · ·
Define, for any particular p in Pn ,
d(p, f ) = sup | f (x) − p(x)| f or a ≤ x ≤ b.
Let d = dn = dn ( f ) = inf d(p, f ) for all p in Pn . Then d ≥ 0.

Our first aim is to prove that there exists a p in Pn for which the inf. is
attained, i.e., that, given f (x) of C(a, b), there is a polynomial of degree
n of best approximation. Later we shall prove uniqueness.
If f is given,
d0 ≥ d1 ≥ d2 . . . ,
5. Best Approximation 13
and Theorem 1 asserts that lim dn = 0.

The existence of a polynomial of best approximation was known
to Chebyshev (or Tschebyscheff) (1821 - 1894) who was one of the
founders of the subject. The necessary proof was supplied by Borel
(1905).
Theorem 4. There is a polynomial p(x) in Pn for which 14
sup | f (x) − p(x)| = d(= dn ).

Proof. All our polynomials being Pn , we do not need the suffix n to
denote degree, and the suffixes in p1 , p2 , . . . will be used to specify par-
ticular polynomials of Pn . As here, we shall commonly omit the variable
x form a polynomial p(x) or a function f (x).
By definition of d, there is a polynomial pm with

1
d ≤ d(pm , f ) < d + .
m
For all m and a ≤ x ≤ b,
|pm (x)| ≤ d + 1 + sup | f (x)| = A.
By §4, Lemma 1, the n + 1 coefficients of powers x0 , x1 , . . . , xn in the
pm (x) all lie in a bounded region of n + 1 space. This set of points in
n + 1 space has at least one limit point, defining a polynomial p(x) for
which
d( f, p) ≤ d( f, pm ) + d(pm , p)
1
≤d+ +ε
m
where ε → 0 as m → ∞ through a sub-sequence for which there is
convergence of the coefficients to their limits.
Therefore d( f, p) = d.
Theorem 5. If f (x) is C(a, b) and p(x) satisfies Theorem 4 there are
n + 2 values (or more) at which
f (x) − p(x) = ±d,
with alternating sign.
Proof. g(x) = f (x) − p(x) is continuous. Divide (a, b) into sub-intervals 15

such that g(x) does not take the value 0 in any (closed) sub-interval in
which it takes the value ±d. Denote by l1 , l2 , . . . , lm (numbered from left
to right) those of the sub-intervals in which g(x) takes the value +d or
−d. Define ε1 , ε2 , . . . , εm to be +1 or −1 according as the value is +d or
−d. We have to prove that there are at least are at least n + 1 changes of
sign in the sequence of ε′ s. Suppose there are fewer. We shall obtain a
contradiction by constructing a polynomial of better approximation than
p(x).
If all the ε′ s have the same sign, say +, add a small constant to p(x).
This gives a polynomial of better approximation.
Generally, suppose that there are k changes of sign in the sequence
of ε′ s, where k ≤ n. Let εi , εi+1 be different. Then li and li+1 cannot
abut (since g(x) does not vanish in either), so we can choose a value of
x; lying between them. We have thus k values of x; call them
x1 , x2 , . . . , xk .
Define h(x) = ε1 (x1 − x)(x2 − x) · · · (xk − x).h(x) has the same sign as
g(x) in each of sub-intervals l. We shall prove that, if η is small enough,
the polynomial of Pn
p(x) + ηh(x)
has better approximation to f (x) than p(x) has.
In those intervals of the original subdivision which are not l′ s,
sup |g(x)| = d′ (say) < d.
16 Choose η to make |ηh(x)| < d − d′ (a ≤ x ≤ b), now,
| f − p − ηh| = |g − ηh|.
In the l′ s, this is less than d, since g, h have the same sign.

And, in the sub-intervals other than l′ s,
|g − ηh| ≤ |g| + |ηh| < d′ + (d − d′ ) = d.
So p + ηh approximates better to f than p does.

Theorem 6. The polynomial p(x) of Theorem 4 is unique.

Proof. Suppose that two polynomials p, q satisfy Theorem 4. Let r =
1 1
(p+q). Then f −r = 21 ( f − p)+ ( f −q). Therefore r satisfies Theorem
2 2
4, and so, by Theorem 5,
f − r = ±d
for n + 2 values of x.
But f − r = d only if f − p = f − q = d. Therefore there are n + 2
values of x for which p(x) and q(x), polynomials of degree at most n,
are equal. Therefore p(x) ≡ q(x).
In future we can (by Theorem 6) describe as the best Pn that poly-
nomial p(x) in Pn for which
sup | f (x) − p(x)| = d,
where d = inf d( f, q) for all q(x) in Pn . The number d (or dn if it is
necessary to make then n explicit) may be called the best approximation.
Theorem 7. Suppose f is C(a, b) and q is in Pn . Let there be n + 2
values of x at which f − q takes values alternating in sign
d1 , −d2 , d3 , . . . , (−1)n+1 dn+2 .
Then the best approximation d satisfies 17
d ≥ min di
Proof. Suppose that d < di (i = 1, . . . , n + 2) and let p be the best Pn .
Then p − q = ( f − q) − ( f − p) takes alternate signs at the n + 2 values in
the hypothesis. Therefore p − q (which is in Pn ) has at least n + 1 zeros.
This is a contradiction.
Corollary. Let q be in Pn and let
sup | f − q| = d′ .
Suppose that f − q takes the values ±d′ alternately for n + 2 values
of x. Then d′ = d and q is the best Pn .
Proof. By theorem 6, d ≥ d′ . But d ≤ d′ , since d is the best approxima-
tion.
6 Chebyshev polynomials
Theorem 4 guarantees the existence of the best Pn for a given f . It is
only in special cases that the explicit calculation of this polynomial is
practicable. Theorem 7 and its corollary can often be turned to use.
Easy exercises:
1 1
For x2 in (0, 1), the best P0 is , the best P1 is x + .
2 8
1
For x4 in (−1, 1), the best P3 is x2 + .
8
Consider now the general problem.
A. Among all pn (x) with coefficient of xn equal to 1, find that which de-
viates least from 0 in (−1, 1) in other words, that for which
sup |pn (x)| = d is least.
This problem can be stated in the equivalent form.
B. Find the best approximation in Pn−1 to xn in (−1, 1).
18 From Theorem 7 (Corollary) we wish to find a pn (x) which takes the

values ±d alternately at n+1 points (why not n+2 points?). Enlightened
guessing soon leads to the answer
pn (x) = d cos nθ where x = cos θ.
It is worth while to give Chebyshev’s own proof of this, which does

not depend on guesswork.
Theorem 8. Among all pn (x) with coefficient of xn equal to 1, the poly-

nomial
2−n+1 cos nθ where x = cos θ
deviates least from 0 in (−1, 1).
Proof. let p(x) = xn +· · · be the required polynomial, and d = sup |p(x)|.

By Theorem 5, there are n+ 1 values of x (at least) where p(x) = ±d.

These may be end-points or interior points of (−1, 1). At such a point
6. Chebyshev polynomials 17
which is an interior point, p(x) has a maximum or minimum and so

p′ (x) = 0. Since p′ (x) has degree n − 1, the n + 1 values must be 1, −1
and n − 1 others, say x1 , x2 , . . . , xn−1 .
The two polynomials of degree 2n
d2 − p2 and (1 − x2 )p′2
have the same zeros, namely, 1, −1 and each of x1 , x2 , . . . , xn−1 doubly.

Comparing the coefficients of x2n we have
n2 (d2 − p2 ) = (1 − x2 )p′2 .
Solving this differential equation for p we find, putting x = cos θ,
p = d cos(nθ + C)
Since p(x) is a polynomial, C = 0. But
cos nθ = 2n−1 cosn θ + lower powers of cos θ,

and so = 2−n+1 .
The polynomials revealed by Theorem 8 are named after Chebyshev 19

and (following the alternative spelling of his name) we define
T n (x) = cos(n arc cos x).
The early members of the sequence are
T 0 (x) = 1, T 1 (x) = x T 2 (x) = 2x2 − 1

T 3 (x) = 4x3 − 3x, T 4 (x) = 8x4 − 8x2 + 1.
Their mode of definition is restricted to (−1, 1) and it is in that inter-

val that their utility mainly lies. But many of their properties hold for all
values of x. Some useful results are collected in Theorem 9; the proofs
can easily be supplied.
Theorem 9. (1) y = T n (x) satisfies the differential equation
(1 − x2 )y′′ − xy′ + n2 y = 0.
(2) T n (x) is the coefficient of tn in the expansion of the generating func-

tion
1 − tx
1 − 2tx + t2
(3) the recurrence relation
T n (x) = 2xT n−1 (x) − T n−2 (x) (n ≥ 2).
(4) an explicit formula for the coefficients

!
k n n − k n−2k−1 n−2k
X
T n (x) = (−1) 2 x
n−k n
n
summed for 0 ≤ k ≤
2
p
(5) orthogonality with the weight-function 1/ (1 − x2 )
Z 1 
T n (x)T m (x) 0(m , n)


dx = 
 1 π(m = n)
p
−1 2
(1 − x )

2
(6) for |x| > 1,

p p
2T n (x) = {x + (x2 − 1)}n + {x + (x2 − 1)}n .
Note
20 The calculation of polynomials of best approximation is in practice

troublesome. See de la Vallee Poussin, Chapter V I. For a method of cal-
culation by a convergent sequence, see Polya, Comptes Rendus (Paris)
157 (1913), 840.
Chapter 3
Approximations to |x|
7
We now take up the central problem of polynomial approximation, 21
namely
Given a function f (x), how high is the degree of the polynomial
necessary to approach it with an assigned accuracy?
The answer may well depend on structural properties of f (x). For
instance, we may guess (rightly) that we can predict a lower degree if
f (x) is assumed to be differentiable instead of only continuous. The
best theorems on these matter lie fairly deep. We shall go through some
heuristic motion of finding from particular cases what truth appears to
be and then deciding how to try to establish it.
A useful function to study with care is |x| in (−1, 1). This function
was the basis of Lebesgue’s proof of Theorem 1.
From Exercises 1, 2 at the end of Chapter I, the deviation from |x|
of a polynomial of degree n of any of the three types used in proving
√
Theorem 1 is of order 1/ n.
Let us clarify our mode of speech. If, for some p(x) in Pn ,
| f (x) − p(x)| = 0{ϕ(n)}
we will say that the approximation is 0{ϕ(n)}. If, moreover, there is no
p(x) in Pn for which
| f (x) − p(x)| = 0{ϕ(n)}
19
20 3. Approximations to |x|
we will say that the approximation is actually 0{ϕ(n)}.

Study of the proofs of Theorem 1 might lead us to conjecture that
√
the best approximation in pn to |x| is actually 0(1/ n).
22 We proceed to show that, in fact, it is actually 0(1/n). This will be
proved, following Bernstein, Lecons Ch.I, by elementary (though rather
lengthy) algebra.
To approximation to |x| in (−1, 1) is the same thing as to approximate
to x in (0, 1) by polynomials whose exponents are all even, and this is
what we shall do.
If d2n is the best approximation to x in (0, 1) by
a0 + a1 x2 + · · · + an x2n
we shall prove that

1 1 1
> d2n > √ · .
2n + 1 4(1 + 2) 2n −1
Bernstein went further and proved that d2n ∼ C/n, where C is a

constant which he evaluated as 0.282 ± 0.004.
The theorems of this Chapter will not be used later in the course,
and any one who wishes may note above inequalities for d2n and pass
on to Chapter IV.
8 Oscillating polynomials
Definition. If 0 ≤ α0 < α1 · · · < αn and Ai , 0 (all i), we say that
p(x) = A0 xα0 + A1 xα1 + · · · + An xαn
is an oscillating polynomial in (0, 1) if sup |p(x)| is attained for n + 1

values of x in 0 ≤ x ≤ 1. We shall suppose the α′ s integers.
Illustrations.
(1) αi = 2i + 1 T 2n+1 (x) satisfies

√
(2) αi = i T 2n ( x).
8. Oscillating polynomials 21
23 Lemma 1. The polynomial p(x) in the definition has at most n positive

zeros. If it has n the coefficients alternate in sign.
From Descrates’ rule fo signs.
Lemma 2. The coefficients of an oscillating polynomial alternate in
sign.
Proof. (1) Let α0 = 0 (and A0 , 0). There are at most n − 1 changes of
sign in the coefficients of p′ (x). Therefore p′ (x) has at most n − 1
positive zeros. The n + 1 values of x at which sup |p(x)| is attained
must be n − 1 zeros of p′ (x) and x = 0, x = 1. So p′ (x) has n −
1 zeros, say x1 , x2 , . . . , xn−1 lying inside (0, 1) and its coefficients
A1 , . . . , An−1 alternate in sign.
p(x) has no maxima or minima other than these n − 1. Therefore
p(0), p(x1 ), . . . , p(xn−1 ), p(1)
alternate in sign. Therefore p(x) has n zeros. Therefore A0 , A1 , . . . ,

An alternate in sign
(2) Let α0 > 0. Then p(0) = 0. So sup |p(x)| is attained at n points

inside (0, 1) which are roots of p′ (x) = 0. Therefore the coefficients
alternate in sign.

Corollary. p(x) takes the values ± sup |p(x)| with + and − sign alter-
nately.
n
Ai xαi is an oscillating polynomial in (0, 1).q(x)
P
Theorem 10. p(x) =
i=0
is another polynomial Bi xαi with the same exponents. One coefficient
P
of p is the same as the corresponding one of q (say A j = B j), where
α j > 0. Then
sup |q| > sup |p|.
Proof. If not, p − q takes alternate signs (may be 0) for the n + 1 values 24
of x for which p takes its numerically greatest value. Therefore p − q
has at least n zeros in 0 ≤ x ≤ 1. But, since A j = B j it has only n terms,
and so at most n− 1 changes of sign in its coefficients and so (by Lemma

1) at most n − 1 positive zeros. This is a contradiction.
Converse of Theorem 10. p(x) and q(x) are two polynomials with the
same exponents and one coefficient the same (A j = B j, where α j > 0).
If
sup |p| < sup |q|
for every such q, then p is an oscillating polynomial.
Proof. We gives the gist of the proof, without setting out all the detail
in full. It uses a ’deformation’ argument like that of theorem 5.
Suppose that p(x) is not an oscillating polynomial. Then p(x) takes

the values ±M, where M = sup |p(x)|, at h points, say xk (k = 1, . . . , h),
where h < n + 1. We can construct a polynomial r(x) = Ci xαi with
P
C j = 0 and r(xk ) = p(xk ). (The n + 1 coefficients Ci have to satisfy at
most n + 1 equations; the determinant can be proved , 0).
We can take ε and intervals round the xk , outside which |p(x)| <
M − ε and inside each of which p(x) and r(x) have the same sign.
Choose λ to make λ|r(x)| < ε for 0 ≤ x ≤ 1.

Then sup |p − λr| < sup |p|.
But p − λr satisfies the conditions for a q, giving a contradiction.

25 Apply theorem 10, taking p(x) to be a constant multiple of one of
√
the oscillating polynomials T 2n ( x) and T 2n+1 (x). We obtain
Corollary 1. If q(x) = a0 + a1 x + · · · an xn and M = sup |q(x)| in (0, 1),

then
|ai | ≤ M|ti |(i = 0, 1, . . . , n),
√
26 where ti is the coefficient of xi in T 2n ( x).
Corollary 2. If q(x) = a0 x + a1 x3 + · · · an x2n+1 and M = sup |q(x)| in

(0, 1), then
|ai | ≤ M|ti |(i = 0, . . . , n)
where ti is the coefficient of x2i+1 in T 2n+1 (x).
8. Oscillating polynomials 23
Theorem 11. To a given set of exponents there corresponds an oscillat-

ing polynomial in (0, 1), which is unique except for a constant factor.
Proof. Let α0 , α1 , . . . , αn be the given exponents in ascending order.

Suppose that the coefficient of xαk is given to be K.
We need to prove that among all the polynomials with the given
exponents
q(x) = B0 xα0 + · · · + Bk−1 xαk−1 + kxαk + · · · + Bn xαn ,
there is a unique q(x) for which sup |q(x)| attains its lower bound.
0≤x≤1
Clearly sup |q(x)| is a continuous function of the n variables (B0 · · · ,
Bk−1 , Bk+1 , . . . , Bn ). Its lower bound is greater than 0 by Corollary 1 of
Theorem 10. It is less than or equal to K, as is seen by taking the B′ s to
be small. Again by Corollary 1, we need only consider values of Bi for
which
|Bi | ≤ K|ti | (i = 0, 1, . . . , k − 1, k + 1, . . . , n),
√
where ti is the coefficient of xαi in T 2αn ( x).
The Bi lie in a bounded closed region of n space, and so they have at
least one set of values for which sup |q(x)| attains its lower bound. This
proves the existence of an oscillating polynomial. Uniqueness follows
from Theorem 10.
Theorem 12. If
p(x) = xα0 + A1 xαβ11 + · · · + An xαβnn

and q(x) = xα0 + B1 xβ1 + · · · + Bn xβn
are both oscillating polynomials in (0, 1) where
0 < α0 < β1 < α1 < β2 < · · · < βn < αn ,
then sup |p(x)| > sup |q(x)|.

Proof. By Lemma 2, the coefficients fo of p(x) alternate in sign and so

do those of q(x).
q(x) − p(x) = B1 xβ1 − A1 xα1 + B2 xβ2 − · · · − Axαn n
has n variation of sign, and so the equation
q(x) − p(x) = 0
has at most n positive roots.
Suppose the theorem false and
sup |p(x)| ≤ sup |q(x)|.
Then q(x) − p(x) has the sign of q(x) (it may be 0) for the values
xn (k = 1, . . . , n + 1) at which |q(x)| takes its maximum value. Therefore
q(x) − p(x) vanishes for n values ξ1 , . . . , ξn such that
xi ≤ ξ1 ≤ x2 ≤ ξ2 ≤ · · · ≤ ξn ≤ xn+1 .
27 Moreover, there are n + 1x′ s and only nξ ′ s so at least one x, say xi

must satisfy ξi−1 < xi < ξi (giving meaning to ξ0 , ξn+1 ).
We shall now compute the sign of q(xi ) by two different methods
and obtain contradictory results.
Firstly, in (0, ξ1 ), q(x) − p(x) has the sign of its dominant term B1 xβ1 ,
which is negative. By following the changes of sign along the sequence,
q(x) − p(x) has sign (−1)i in (ξi−1 , ξi ). At xi , q(x) − p(x) and also q(x)
have the sign (−1)i .
Secondly, for small values of x, q(x) had the sign of its first term,
which is positive. Therefore q(x1 ) > 0. So q(x2 ) < 0, and generally,
q(xi ) has the sign (−1)i+1 .
This is a contradiction.
The same arguments can be used to prove
Theorem 12 (Extension). If
p(x) = A0 xα0 + · · · + Ai−1 xαi−1 + xm + Ai+1 xαi+1 + · · · + An xαn

9. Approximation to |x| 25
q(x) = B0 xβ0 + · · · + Bi−1 xβi−1 + xm + Bi+1 xβi+1 + · · · + Bn xβn
are both oscillating polynomials in (0, 1), where
0 ≤ α0 < β0 < · · · < αi−1 < βi−1 < m < βi+1 < αi+1 · · · < βn < αn ,
then
sup |p(x)| > sup |q(x)|.
9 Approximation to |x|
Theorem 13. If
p(x) = x + a1 x2 + a2 x4 + · · · + an x2n
is an oscillating polynomial in (0, 1), then

1 1
> sup |p(x)| > √ (n > 1).
2n + 1 2(1 + 2)(2n − 1)
Note. If n = 1, the second inequality is! to be replaced by equality. The 28
1 1
oscillating polynomial is x − + √ x2 .
2 2
Proof. Take n > 1. By Theorem 12, sup |p(x)| is less than the supremum
of the oscillating polynomial
x + b1 x3 + · · ·
with exponents 1, 3, 5, . . . 2n + 1. But that polynomial is (−1)n T 2n+1 (x)/

(2n + 1). This gives the first inequality of the theorem.
By Theorems 12 and 10, the oscillating polynomial x + b1 x3 + · · · +

bn−1 x2n−1 has smaller maximum modulus than the polynomial x+c2 x4 +
· · · + cr x2n with exponents 1, 4, 6, . . . , 2n. But the former polynomial is
(−1)n−1 T 2n−1 (x)/(2n − 1), with maximum modules 1/(2n − 1). We shall
now construct a polynomial of the latter form (with no term in x2 ).
With the notation of the hypothesis for p(x), write sup |p(x)| = m.
Then, if µ > 0,
!2 !2n
x + a x
· · · + an
x
≤ m.
1 + µ 1
1+µ 1+µ
Therefore
|x(1 + µ) + a1 x2 + a′2 x4 + · · · + a′n x2n | ≤ m(1 + µ)2

i.e., |p(x) + µ(x + c2 x4 + · · · + cn x2n )| ≤ m(1 + µ)2 .
Therefore
n o
|µ(x + c2 x4 + · · · + cn x2n )| ≤ m (1 + µ)2 + 1
n o
and so |x + c2 x4 + · · · + cn x2n | ≤ m (1 + µ)2 + 1 /µ
29 This is true for all positive values of µ, and so √ we can replace the
right-hand side by its minimum, which is 2m(1 + 2).
As we said, the maximum modulus of a polynomial with exponents
1, 4, 6, . . . , 2n is greater than 1/(2n − 1) and therefore
1 1
m> √ .
2(1 + 2) 2n − 1
We have now all the material for the final result announced at the
end of §7.
Theorem 14. If d2n is the best approximation to x in (0, 1) by
a0 + a1 x2 + · · · + an x2n ,
1 1 1
then > d2n > √ · .
2n + 1 4(1 + 2) 2n − 1
Proof. d2n is the maximum modulus of the oscillating polynomial
A0 + x + A1 x2 + A2 x4 + · · · + An x2n .
Let p(x) and m have the same meanings as in Theorem 13.

9. Approximation to |x| 27
By Theorem 10, d2n < m.

Write q(x) − A0 = x + A1 x2 + · · · + An x2n .
So, by Theorem 10,
sup |q(x) − A0 |
is greater than the maximum modulus of p(x), the oscillating polynomial

with exponents 1, 2, 4, . . . , 2n and coefficient of x equal to 1; that is to
say, is greater than m.
But sup |q(x) − A0 | ≤ d2n + |A0 | ≤ 2d2n .
Therefore 2d2n > m.
The inequalities of Theorem 13 for m give the started inequalities
for d2n .
Notes
1. Example. Find the polynomial in Pn for which the coefficient of xk 30

is 1 and which deviates least from 0.
2. The definition on page 22 of an oscillating polynomial can be ex-

tended to a system
A0 ϕ0 + · · · + An ϕn (x),
if the ϕ′ s satisfy certain conditions. See Bernstein, Lecons, 1 or As-

chieser, 67
Hint
√
1. If k, n are both even or both odd, consider T n (x), otherwise T 2n ( x).
Chapter 4
Trigonometric Polynomials
10 Trigonometric polynomials. Modulus of Conti-

nuity
The central problem of approximation, namely the degree of the poly- 31
nomial required an assigned closeness to a given function, yields more
easily to trigonometric than to algebraic treatment. Trigonometric series
and in particular Fourier series have been in the fore-front of Analysis
for something like a century, and knowledge about them has been avail-
able for any problem of approximation.
A trigonometric polynomials is
1
t(x) = a0 + (a1 cos x + b1 sin x) + · · · + (an cos nx + bn sin nx). This
2
can be written tn (x) if an , 0 or bn , 0 and we wish to display the order
of the polynomial. We can denote by T n the set of all polynomials which
are sums of multiples of cos kx and sin kx for <≤ k ≤ n. (There will be
no confusion with the Chebyshev polynomials T n (x) of §6).
The function t(x) is periodic with period 2π (and, in general, with no
smaller period). We say that f (x) is C(2π) if it is continuous with period
2π.
The problem of approximating to a C(2π) function by a trigono-
metric polynomial is essentially the same as that of approximating to
a C(a, b) function by an algebraic polynomials. In the first place, the
analogue of Theorem 1 holds.
29
30 4. Trigonometric Polynomials
Theorem 15 (Weierstrass). If f (x) is C(2π) then, given ε, there is t(x)

such that
| f (x) − t(x)| < ε (all x)
This will emerge as a by-product of theorem 18, and we shall give
an independent proof here. You should, however, read Notes 1 − 3 at the
end of this chapter.
32 In statements about periodic functions, values of x differing by mul-

tiple of 2π will be regarded as the same.
Lemma 1. The equation tn (x) = 0 has at most 2n roots.

1
(Prove by expressing in term of tan x or of exp ix).
2
Corollary 1. Two tn′ s which take the same values at 2n + 1 points are
identical.
Corollary 2. If two tn′ s have 2n common zeros one is a consult multiple

of the order
The reader should verify that there is an analogue of the Lagrange

polynomial of §4, namely
The polynomial in T n which takes the values ci at xi (i = 0, 1, . . . , 2n)
is
X 1 ci
P(x) x−xi P′ (x )
2 sin 2 i
Y x − xi
where P(x) = sin .
2
We shall take for granted the trigonometric analogues of Theorem
4 − 7 (pages 14 − 17) about best approximation. Briefly, for a given
f (x) in C(2π), there is unique t(x) of best approximation in T n which
is characterized by f (x) − t(x) taking its greatest numerical value, with
alternating sign, for alt least 2n + 2 values of x. Proofs can be found in
the book of de la Vallee Poussin or Natanson.
10. Trigonometric polynomials. Modulus of Continuity 31
Illustrations:
1) If f (x) = tn−1 (x) + (an cos nx + bn sin nx). then tn−1 (x) gives the
best approximation in T n−1 to f (x).
p
Proof. f − tn−1 takes the values ± (a2n + b2n ) alternately at 2n points.

2) An interesting example is Weiertrass’s non-differentiable function 33

∞
X
f (x) = ar cos br x
r=0
where 0 < a < 1, b is an odd integer and ab > 1. We shall prove that
the best approximation in T n to f (x) is
k
X
t(x) = ar cos br x, where bk ≤ n < bk+1 .
r=0
P∞
Proof. f (x) − t(x) = k+1 ar cos br x.
∞
ar at x = 0. Cos bk+1 x takes the
P
This takes its greatest value
k+1
values ±1 alternately at integral multiples of π/bk+1 , of which there are
2bk+1 in a period.
Since b is an odd integer, cos br x for r > k + 1 takes the same values
at those points as cos bk+1 x.
Now 2bk+1 ≥ 2n + 2 and so f (x) − t(x) takes its numerically greatest
value for at least 2n + 2 values ofx.
Corollary. The approximation given by this t(x) is A/nα , where α =
log(1/a)/ log b.
Proof. The approximation is
ak+1 b−α(k+1) 1 1
= ∼ · α
1−a 1−a 1−a n

Modulus of continuity. Let f (x) be C(a, b) and define

ω(δ) = sup | f (x2 ) − f (x1 )| for |x2 − x1 | ≤ δ.
Then ω(δ) is continuous, increases as δ increases, and tends to 0 as
δ tends to 0. We shall find that the rapidity with which ω(1/n) tends to
0 as n → ∞ gives the clue to the approximation to f (x) attainable in Pn
or T n .
34 If f (x) is C(2π), the same definition of ω(δ) holds. Observe that now
the greatest value ofω(δ) is ω(π)
Properties of ω(δ) are collected in the following theorem.
Theorem 16. (1) If n is an integer,
ω(nδ) ≤ nω(δ).
(2) If k > 0, ω(kδ) ≤ (k + 1)ω(δ).

(3) If ω(δ) = for some δ > 0, then f (x) is a constant.
n−1
P
Proof. (1) f (x + nh) − f (x) = f (x + kh + h) − f (x + kh) .
k=o
For h ≤ δ, the R.H.S. is numerically at most nω(δ).
(2) If k is not an integer, let n be the integer next greater. Then
ω(kδ) ≤ ω(nδ) ≤ nω(δ) ≤ (k + 1)ω(δ).
(3) f (x) is constant in any interval less that δ, and so everywhere.

Lipschitz condition. Def. f (x) satisfies the Lipschitz condition of

order α(briefly, is Lip. α) in a given interval, if for every x1 , x2 in it,
| f (x2 ) − f (x1 )| ≤ A|x2 − x1 |α .
It follows that ω(δ) ≤ Aδα .
In this, 0 < α ≤ 1. If α > 1, f (x) can only be a constant,be a
constant, because then
ω(δ) ≤ nω(δ/n) ≤ Aδα /nα−1 .
Making n → ∞, we have ω(δ) = 0.
11. Fourier and Fejer Sums 33
11 Fourier and Fejer Sums

We collect for reference in Theorem 17 some well-known facts. Proofs 35
can be found in any text-book of analysis which includes a chapter on
Fourier Series.
Theorem 17. (1) The sum
n
1 X
S n = a0 + (ar cos rx + br sin rx)
2 r=1
for the Fourier Series of f (x) is equal to

1
Z π sin n + 12 t
{ f (x + t) + f (x − t)} dt.
π 0 2 sin 12 t
(2) | f (x) − S n (x)| < M(A log n + B), where M = sup | f (x)| and A, B are
constants.
(3) If σn = (S 0 + S 1 · · · + S n−1 )/n

is the Fejer (C1) sum of the Fourier series of f (x), then
Z π !2
1 sin nt
σn = f (x + 2t) dt.
nπ 0 sin t
1 P∞ 1
(4) 2
= −∞ (t , kπ).
sin t (t + kπ)2
The result (2), which cannot be improved, shows that, in general,
the Fourier series of a function gives a poor approximation in the sense
measured by sup | f (x) − S n (x)|. As the R.H.S. of (2) tends to infinity
with n, (2) does not include Weierstrass’s Theorem 15. The sense in
which the Fourier series does give the best approximation is the mean-
square sense (omitted here). It is known that the Fejer sums σn of (3)
behave more regularly than the Fourier sums S n ; this is due to the kernel
(sin nt/ sin t)2 in the integral for σn being positive, whereas the kernel in
S n takes both signs. The next theorem gives the approximation to f (x)
afforded by (σn (x).
Theorem 18. If f (x) has modulus of continuity ω(δ), then 36
| f (x) − σn (x)| ≤ Aω(1/n)|logω(1/n)|.
Proof. We first put σn (x) into a form more convenient than that of The-
orem 17(3). Since f is periodic
Z π
sin2 nt
Z (k+1)π
sin2 nt
f (x + 2t) dt = f (x + 2t) dt.
0 (t + kπ)2 kπ (t2 )

Then, from (3) and (4) of Theorem 17,

Z π
sin2 nt sin2 nt
Z ∞
1 1
σn = f (x + 2t) dt = f (x + 2t) dt,
nπ 0 sin2 t nπ −∞ t2
and so, by changing the variable from t to t/n,
1 ∞ 2t sin2 t
Z
σn = f (x + ) 2 dt
π −∞ n t
) 2
1 ∞
Z (
2t sin t
σn − f = f (x + ) − f (x) dt.
π −∞ n t2
Therefore
∞
sin2 t
Z
2
| f − σn | ≤ ω(2t/n) dt.
π 0 t2
The integral on the R.H.S. is the sum of the integrals over (0, 1),
(1, X) and (X, ∞). This gives
( Z X Z ∞ )
2 dt dt
| f − σn | ≤ ω(2/n) + ω(2t/n) 2 + ω(π) 2
π 1 t X t
( Z X )
2 t+1 ω(π)
≤ ω(2/n) + ω(2/n) 2
dt +
π 1 t X
( )
2 ω(π)
≤ ω(2/n)(2 + log X) + .
π X
Choose X = 1/ω(2/n) and we have a result equivalent to that stated.
37 Corollary 1. Theorem 15
AB
Corollary 2. If ω(δ) < Aδα (0 < α < 1), then | f − σn | < nα , where
B = B(α) is independent of f .
Proof.
2 ∞ sin2 t
Z
| f − σn | ≤ ω(2t/n) 2 dt
π 0 t
α+1 Z ∞ 2
2 A α sin t
≤ t dt
πnα 0 t2
The estimates in Chapter III would lead us to suspect that, if we can
find a t(x) which approximates to f (x) more closely than σn (x) does, we
may get rid of the log ω(1/n) on the R.H.S. of Theorem 18. It is easy to
see how to try to do this. The logarithm arises from integrating a term
in 1/t. The Fejer sum is
Z ∞ ! !
1 2t sin t
Fr (x, n) = f x+ 2r dt
Jr −∞ n t
for r = 1 and Jr = π. If r ≥ 2, there will be no term in 1/t. We shall
achieve our purpose by taking r = 2.
R∞
sin t 4

2π
Lemma 1. (1) J2 = −∞ t dt = 3 .
(2) F2 (x, n) is in T 2n−1 .

Proof. (1)
Zπ ∞
X 1
J2 = sin4 t dt
−∞
(t + kπ)4
0
Zπ
d2
!
1 4 1
= sin t 2 dt
6 dt sin2 t
0
Zπ !
1 4 6 4 2π
= sin t 4
− 2
dt = .
6 sin t sin t 3
0
(2) Reversing the steps by which F1 (x, n) was obtained in the first part
of the proof of Theorem 18. we have
Zπ
d2
!
3 3 4 1
F2 (x, n) = f (x + 2t) sin nt 2 dt.
2π 6n dt sin2 t
0
d2
! !
4 1 4 6 4
38 Then sin nt 2 = sin nt − .
dt sin2 t sin4 t sin2 t
sin nt
Now is the sum of multiples of cos kt where k ≤ n − 1. Hence
sin t !
d 2 1
sin4 nt 2 is the sum of multiples of cos kt where k ≤ 4n − 2.
dt sin2 t
Moreover, the expression is even and has period π, so k takes only even
values, 21 say, where 1 ≤ 2n − 1. Finally,
Z π
1 2π
Z
f (x + 2n) cos 21tdt = f (u) cos l(u − x)du
0 2 0
and F2 (x, n) is in T 2n−1 .
Theorem 19. | f (x) − F2 (x, n)| ≤ 3ω(1/n).
3 R∞ 2t sin t !4
Proof. F2 (x, n) − f (x) = f (x + ) − f (x) dt.
2π −∞ n t
!
2t 2|t|
Now | f (x + ) − f (x)| ≤ ω ≤ (2|t| + 1)ω(1/n) from Theorem
n n
16 (2). Therefore
!4
3 ∞
Z
sin t
| f (x) − F2 (x, n)| ≤ ω(1/n) (2t + 1) dt = Aω(1/n),
π 0 t
6 R ∞ sin4 t
where A = 1 + dt. But
π 0 t3
1 π sin3 t
Z ∞ 4
sin3 t
Z ∞ Z
sin t
dt | 2 |dt = dt = 1
0 t3 0 t 2 0 sin2 t
(again by use of Theorem 17(4)).
6
So A < 1 + < 3.
π
Theorem 20. If f (x) is C(2π) and f ′ (x) is continuous with modulus of

continuity ω1 (δ), then
A
| f (x) − F2 (x, n)| ≤ ω1 (1/n) where A < 5/2.
n
!4
3 R∞
( ! ! )
2t 2t sin t
Proof. F2 (x, n) − f (x) = f x+ + f x− − 2 f (x) dt.
2π 0 n n t
The modulus of the term within { } is 39
2 t ′
Z ( ! !)
2u ′ 2u
| f x+ − f x− du|
n 0 n n
Zt !
2 4u
≤ ω1 du
n n
0
! Zt
2 1
≤ ω1 (4u + 1)du
n n
0
!
2 1
= ω1 (2t2 + t)
n n
Therefore
!
A 1
| f (x) − F2 (x, n)| ≤ ω1 ,
n n
where
Z∞ !4 Z∞
3 sin t 3 2 3 sin4 t 3 π 5
A= (2t2 + t) dt = sin tdt + 3
dt < +1 <
π t π π t π 2 2
0 0
Theorem 20 can be extended to higher derivatives. If f (x) has an

r-th derivative with modulus of continuity ωr (δ), the approximation at-
tainable in T n is a constant multiple of n−r ωr (1/n).
Notes on Chapter IV 40
1. Use the singular integral (de la Vallee Poussin)

Zπ
1 1
cos2n (t − x) f (t)dt,
Jn 2
−π
where Jn is the value of the integral when f (t) = 1, to give a direct

proof of Theorem 15.
2. Assuming Theorem 15 proved, deduce Theorem 1 from it.
3. Deduce Theorem 15 from Theorem 1 as follows:
(a) Prove that, if f (x) is C(0, π), it can be approximated uniformly

by a t(x) containing cosines only.
(b) By applying (a) to the even functions
2g(x) = f (x) + f (−x)

2h(x) = f (x) − f (−x) sin x,
deduce that f (x) is uniformly approximately by a t(x).
4. With the notation of Illustration (2), Corollary, page 31, prove that
Weierstrass’s function ar cos br x satisfies a Lipschitz condition of
P
order α.
41 Hints
1 Follow Theorem 2. Detail is in Natanson, 10.
2 Approximate to cos kx and sin kx by a finite number of terms of their

expansions in powers of x.
3 (a) Put y = cos x.

(b) g(x), h(x) are uniformly approximately in (−π, π). So is g(x)
sin2 x + h(x) sin2 x. So is f (x) cos2 x, and hence f (x)(sin2 x +
cos2 x).
4 Given h, choose n so that bn h ≤ 1 < bn+1 h.

∞
X
f (x + h) − f (x − h) = −2 ar sin br h sin br x
1
n
X ∞
X
= +
1 n+1
∞ ∞
2an+1
X X
≤ 2 ar =

n+1

n+1
1−a
n n
an bn − 1 2ban+1
X X
≤ 2h ar br = 2abh <

1

1
ab − 1 ab − 1
But an+1 = b−α(n+1) < hα .

Hence | f (x + h) − f (x − h)| < Ahα .
With more trouble (Aschieser and Krein, 167) this can be proved
best possible.
Chapter 5
Inequalities, etc.
12 Bernstein’s and Markoff’s Inequalities

1
Theorem 21 (Bernstein). If t(x) = ao + n1 (ak cos kx + bk sin kx) then 42
P
2
|t′ (x)| ≤ n sup |t(x)|.
Proof. Suppose, on the contrary, that
sup |t′ (x)| = nl,
where l > sup |t(x)|.
t′ (x), being continuous, attains its bounds and so, for some c, t′ (a) =
±nl and we will suppose that
t′ (c) = nl.
Since nl is a maximum value of t′ (x),
t′′ (c) = 0.
Define S (x) = l sin n(x − c) − t(x).
Then r(x) = S (x) = nl cos n(x − c) − t′ (x).
′
S (x) and r(x) both have order n.

Consider the points
uo = c + π/2n, uk = uo + kπ/n(1 ≤ k ≤ 2n).
41
42 5. Inequalities, etc.
Then
S (uo ) = 1 − t(uo ) > 0

S (u1 ) = 1 − t(u1 ) < 0
······
S (u2n ) = 1 − t(u2n ) > 0
Each of the 2n intervals (uo , u1 ), (u1 , u2 ), . . . , (u2n−1 , u2n ) then contains a

zero of S (x).say
S (yi ) = 0,
43 where ui < yi < ui+1 , (0 ≤ i ≤ 2n − 1). Clearly
y2n−1 < yo + 2π.

Write y2n = yo + 2π.
Then S (y2n ) = S (yo ) = 0.
By Rolle’s Theorem, there is a zero xi of r(x) inside each interval

(yi , yi+1 ) where 0 ≤ i ≤ 2n − 1. Clearly
x2n−1 < xo + 2π.
Now r(c) = nl − t′ (c) = 0.

Since the polynomial r(x) of order n has at most 2n zeros, it follows
that, for some k,
c ≡ xk (mod 2π).
But r′ (c) = −t′′ (c) = 0.
Therefore c(and so xk ) is a double zero (at least) of r(x).
Therefore the xi (0 ≤ i ≤ 2n − 1) provide at least 2n + 1 zeros of
r(x). This is only possible if r(x) ≡ 0, and so S (x) is a constant. But
S (uo ) > 0 and S (u1 ) < 0 and we have a contradiction
Corollary 1. t(x) = sin nx shows that the result is the best possible.
Corollary 2. The algebraic equivalent is- If p(x) has degree n and
|p(x)| ≤ M in (−1, 1), then
p
|p′ (x)| ≤ nM (1 − x2 ).
12. Bernstein’s and Markoff’s Inequalities 43
Proof. Put
t(θ) = p(cos θ)
t (θ) = −p′ (cos θ) sin(θ).
′
The bound for |p′ (x)| given in Corollary 2 fails at the end-points ±1. A 44
better result, due to Markoff, is
|p′ (x)| ≤ Mn2 ,
as will be proved in Theorem 22.
Lemma 1. Let
(2k − 1)π
xk = cos (k = 1, 2, . . . , n)
2n
be the zeros of the Chebyschev polynomial T n (x). If q(x) is in Pn−1 , then
n
1X T n (x)
q
q(x) = (−1)k−1 (1 − x2k )q(xk ) · .
n k=1 x − xk
Proof. Both sides are in Pn−1 and so it is sufficient to show that they
agree for the n values xk . As x → xk ,
T n (x) n
→ T n′ (xk ) = q sin(n arc cos xk )
x − xk
(1 − x2k )
n(−1)k−1
= q
(1 − x2k )
Also, for x = xk , every term on the R.H.S. except the k-th vanishes.
1
Lemma 2. Suppose that q(x) is in Pn−1 and |q(x)| ≤ p (−1 <
(1 − x2 )
x < 1).
Then |q(x)| ≤ n (−1 ≤ x ≤ 1).

Proof. With the notation of Lemma 1, if −x1 = xn ≤ x ≤ x1 ,
π 1
p q
(1 − x2 ) ≥ (1 − x21 ) = sin ≥ .
2n n
Therefore Lemma 2 is true for xn ≤ x ≤ x1 . If x1 < x ≤ 1 (or

−1 ≤ x < xn ) Lemma 1 gives
1 X T n (x)
|q(x)| ≤ | |,
n x − xk
45 since all the x − xk are positive (or all negative). Now

Y
T n (x) = 2n−1 (x − xk ),
′
T n (x) X 1
and so = .
T n (x) x − xk

1
Therefore |q(x)| ≤ |T n′ (x)|.
n
′ n sin nθ
But, if x = cos θ, T n (x) = , which gives
sin θ
|T n′ (x)| ≤ n2 .
Theorem 22 (Markoff). If p(x) is in Pn , then
|p′ (x)| ≤ n2 sup |p(x)| − 1 ≤ x ≤ 1.
Proof. If sup |p(x)| = M, take in Lemma 2,
p′ (x)
q(x) = .
Mn

Corollary. p(x) = T n (x) shows that the result is the best possible.
13. Structural Properties Depend on the... 45
13 Structural Properties Depend on the closeness of

the approximation
Theorem 21 can be used to prove theorems of a type converse to The-
orems 18-20. Theorem 23, which is complementary to Theorem 18,
Corollary 2, will suffice to show the method.
Theorem 23. Let f (x) be C(2π). Suppose that, for all n, the best ap-
proximation in T n to f (x) is less than A/nα , where 0 < α < 1. Then f (x)
is Lip.α.
Proof. Let tn (x) satisfy
A
| f (x) − tn (x)| ≤ .
nα
Define u0 (x) = t1 (x)
an (x)t2n (x) − t2n−1 (x) (n ≥ 1).

∞
P
Then f (x) is the sum of the uniformly convergent series un (x). 46
0
1
Choose δ with 0 < δ ≤ , and m such that
2
1
2m−1 ≤ < 2m .
δ
Suppose |x − y| ≤ δ. We have
m−1
X ∞
X ∞
X
| f (x) − f (y)| ≤ |un (x) − un (y)| + |un (x)| + |un (y)|.
0 m m
We shall find upper bounds for the terms on the R.H.S.
|un (x)| ≤ |t2n (x) − f (x)| + | f (x) − t2n−1 (x)|

A A A(1 + 2α )
≤ nα + (n−1) = .
2 2 α 2nα
Therefore
∞ ∞
X X
α 1 A(1 + 2α ) 1
|un (x)| ≤ A(1 + 2 ) = .
m m
2nα 1 − 2−α 2mα
This gives
m−1
X B
| f (x) − f (y)| ≤ |un (x) − un (y)| + .
o
2mα
Theorem 21 applied to un (x) gives
|u′n (x)| ≤ 2n sup |un (x)| ≤ A(1 + 2α )2n(1−α) .
By the mean-value theorem,
|un (x) − un (y)| ≤ |u′n (ξ)||x − y| ≤ A(1 + 2α )2n(1−α) δ
Therefore
m−1
X B
| f (x) − f (y)| ≤ A(1 + 2α )δ 2n(1−α) + .
o
2mα
1
Putting C = A(1 + 2α ) and using < δ, we have
2m
m−1
X
ω(δ) ≤ Cδ 2n(1−α) + Bδα.
o
47 If now α < 1,
m−1
X 2m(1−α) − 1 2m(1−α)
2n(1−α) = < .
o
21−α − 1 21−α − 1
2
Use now 2m ≤ and we find
δ
21−α
!
ω(δ) < 1−α C + B δα
2 −1
See Notes 1-4 at the end of the Chapter.
14. Divergence of the Lagrange Sequence 47
14 Divergence of the Lagrange Sequence

There is a sense in which the Lagrange polynomial of degree n (§4) fit-
ted to a function f (x) at n + 1 points equally spaced through an interval
follows the function closely. It is natural to expect that, by increasing n,
the approximation would improve and we might, for instance, find an-
other proof of Theorem 1 on these lines. Such expectations are falsified.
Unless heavy restrictions are laid on f (x), the sequence of Lagrange
polynomials diverges except for certain special values of x.
We shall construct an example of this phenomenon.
Lemma. Let p(x) be the Lagrange polynomial which takes the value 0
at the 2m values of x
k/m (−m ≤ k ≤ m; k , 2)
!
1
and takes the value 1/m when x = 2/m. Then, if m is odd, |p |→∞
2
as m → ∞.
Proof. The polynomial p(x), of degree 2m, is
1 (x + 1)(x + m−1
m ) . . . x(x − 1/m)(x − 3/m) . . . (x − 1)
m−1
m (2/m + 1)(2/m + m ) . . . 2/m(2/m − 1/m)(2/m − 3/m) . . . (2/m − 1)
!
1
This gives for |p | 48
2
1 3m(3m − 2) . . . m(m − 2)(m − 6) . . . 1.1.3. . . . m

m 22m (m + 1)(m + 1) . . . 2.1.1.2 . . . (m − 2)

This can be estimated by forming it into factorials and using Stirb-

ing’s theorem. More simply, we can prove that it tends to ∞ by grouping
the factors as follows:
!
1 m−1
|p |= A2 BC,
2 2(m + 1)(m + 2)(m − 4)
3.54 . . . m
where A=
2.4 . . . (m − 1)
(m + 2)(m + 4) . . . (2m + 1)
B=
(m + 1)(m + 3) . . . 2m
! ! !
2m + 3 3m + 5 3m
C= − ... .
m+1 m+3 2m − 2
Here A > 1, B > 1, and the factors of C decrease from left to right,
the last being greater than 3/2. So C > (3/2)m−1 .
1
Note.x = has been taken for ease of calculation. The conclusion holds
2
for other values of x.
Theorem 24 (Borel). There is f (x) in C(−1, 1) whose nth Lagrange
polynomial does not converge to f (x) as n → ∞.
Proof. Define a continuous curve Ck which coincides with Ox outside
the interval (3−k−1 , 3−k ) and has maximum 3−k−1 at the midpoint of that
interval. For example we can define Ck by
n o
y = 3−k−1 sin (3k+1 x − 1)π/2 .

We shall use the Ck to construct a curve S .Pk,S (x) will denote the
Lagrange polynomial which takes the same values as S for the values
x = 1/3k , where −3k ≤ 1 (integer) ≤ 3k . !
1
49 We shall construct S so that Pk,S does not converge to the point
2
1
on S where x = . Observe first that Pk,Ck−1 is the Lagrange polynomial
2
in the Lemma with m = 3k . From the Lemma, given A, there is h1 such
that !
1
|Pk,Ck−1 | > 2A if k − 1 > h1 .
2
There are two possibilities:
!
1
(a) With h1 fixed, Pk,Ch1 does not tend to 0 as k → ∞. Then S can
2
be taken to be Ch1 ; or
14. Divergence of the Lagrange Sequence 49
(b) there exists r such that

!
1 1
|Pk,Ch1 | < A for all k > r1 .
2 2
Choose h2 > max(h1 , r1 ).

Define D2,k to be the sine-curves in Ch1 and Ck−1 (k − 1 ≥ h2 ), and,
for the rest, the x-axis in (−1, 1).
1
D2,k is a continuous curve; its ordinate for x = is 0, and
2
Pk,D2,k = Pk,Ch1 + Pk,Ck−1 .
From above, since k − 1 ≥ h2 ,

!
1 1
|Pk,D2,k | > 2A − A.
2 2
Again, there are two possibilities:
!
1
(a) With h2 fixed,Pk,D2,k does not tend to 0 as k → ∞. Then S can
2
be taken to be D2,h2 ; or
(b) there exists r2 such that
!
1 1
|Pk,D2,kh | < · A for all k > r2 .
2 2 4
Choose h3 > max(h2 , r2 ). 50

Define D3,k to be the sine-curves in Ch1 , Ch2 and Ck−1 (k − 1 ≥ h3 )
and, for the rest, the -axis in (−1, 1). After n repetitions, there are two
possibilities:
(a) There is a Dn,hn for which the k th Lagrange polynomial does not
1
tend to 0 at x = ; and this serves for S ; or
2
(b) there is an infinite sequence Dn,hn for which
!
1 1 1 A
|Phn +1,Dn,hn | > 2A − A − A − . . . − n−1 > A.
2 2 4 2
As n → ∞, Dn,hn defines a continuous curve S whose ordinate for

1
x = is 0. Its Lagrange polynomial takes values greater than A for
2
1
x = when its degree is h1 , h2 , . . . , hn , . . ..
2
51 Notes
1. Weierstrass’s function ar cos br x illustrates Theorem 18 (Corol-

P
lary 2) and Theorem 23. See Chapter IV, note 4.
2. If α = 1, the best that can be proved in Theorem 23 is that ω(δ) <

Aδ log(1/δ). The latter part of the argument can be adapted for
this purpose (Natanson, 91).
1
The function ∞
P sin nx
1 n2
satisfies dn < , but is not in Lip.1 (Natan-
n
son, 93).
3. A condition which is necessary and sufficient of dn < A/n is that
| f (x + h) − 2 f (x) + f (x − h)| < Bh.
(Zymund, Duke Mathematical Journal, 12(1945)47 or Natanson,

96).
4. (Extension of Theorem 23). If, for f (x), dn < A/n p+α (p =

integer, 0 < α < 1), then f (x) has a pth derivative f (p) (x) in Lip
α.
5. For further ‘negative results’ like Theorem 24, see Natanson, 369
– 388. For example, the Lagrange polynomial taking the values of
|x| at n equally spaced points in (−1, 1) converges to |x| as n → ∞
for no value of x expect 0, ±1.
Chapter 6
Approximation in Terms of
Differences
15
This is the only chapter in the course, of which the results are not clas- 52
sical. The point of view here might lead to a re-orientation towards
algebraic rather than trigonometric polynomials.
In Theorem 20 (and its known extensions) the approximation attain-
able in Pn or T n to a differentiable function f (x) is expressed in terms of
its first higher derivative. We shall now give simple examples which lead
us to suppose that bounds of differences of f (x) rather than derivatives
may be more directly related to the closeness of the approximation.
Example 1. If f (x) attains its greatest value at x2 and its least at x1 , then
the best approximation in P0 is
1
{ f (x1 ) + f (x2 )}
2
1 1
and do = { f (x2 ) − f (x1 )} = sup |∆ f |
2 2
This depends solely on the first difference of f (x); the derivative of

f (x)- if it exists-has no bearing on it.
Now raise the degree by one.
51
52 6. Approximation in Terms of Differences
Example 2. If f (x) is C(0, 1), there is a linear function p(x) for which
| f (x) − p(x)| ≤ sup | f (x + 2h) − 2 f (x + h) + f (x)|,
the sup being taken over all x, h such that
0 ≤ x ≤ x + 2h ≤ 1.
53 Proof. Define p(x) to be equal to f (x) at x = 0 and x = 1 Write
g(x) = f (x) − p(x).
Then |g(x)| attains its maximum, M, for 0 ≤ x ≤ 1 at x1 , say.

1
If 0 ≤ x1 ≤ , take x = 0, h = x1 . Then
2
|g(x + 2h) − 2g(x + h) + g(x)| = |g(2x1 ) − 2g(x1 ) + g(0)| ≥ M.
1
If < x1 ≤ 1 take x = 2x1 − 1, h = 1 − x1 . Then
2
|g(x + 2h) − 2g(x + h) + g(x)| = |g(1) − 2g(x1 ) + g(1 − x1 )| ≥ M.
But the second difference of g(x) and f (x) are equal.

By a longer argument it is possible to prove the corresponding result
for P2 and the third difference.
The general result was conjectured in 1949 by H. Burkill.
Theorem 25. There is a number Kn depending only on n such that given

f (x) in C(a, b), there is a polynomial p(x) in Pn−1 for which
| f (x) − p(x)| ≤ Kn sup |∆n ( f )|
(where the supremum is taken for all sets of n + 1 points x, . . . , x + nh in

(a, b))
16. Definition and Properties of the nth Difference 53
The theorem looks innocent, but attempts at it failed until Whitney

it in 1955. He took for his p(x) the Lagrange polynomial for the points
of division of (a, b) into n − 1 equal parts. His work does not yield an
estimate of Kn for general n; in view of Theorem 24, we should hardly
expect good value of Kn .
54 Whitney’s elegant arguments are too long for reproduction here, and
the reader is referred to his paper in journal de Mathematiques 36(1957),
67-95.
It is worth observing, however, that instead of the usual nth differ-
ence with equal increments, we can take a more general nth difference
depending of the values of f (x) at n + 1 arbitrary points. The difficulty
then disappears and the polynomial of best approximation can be used
instead of the Lagrange polynomial.
16 Definition and Properties of the nth Difference

If
ϕ(u) = (u − ho )(u − h1 ) · · · (u − hn ),
the nth divided difference of f (x) for the values specified is commonly
defined by
n
X f (hi )
Dn = Dn ( f ; ho , . . . , hn ) = ′ (h )
.
i=0
ϕ i
In what follows it will be convenient to suppose that
ho > h1 > · · · > hn
To define an nth difference ∆n , as distinct from a divided difference,

we naturally take
∆n = Hn Dn ,
where Hn is homogeneous of degree n in the h′ s.
The most suitable definition of Hn appears to be
Hn = 2n /T n ,
n
X
where T n = T n (ho , h1 , . . . , hn ) = |ϕ′ (hi )|−1 .
i=0
55 In the special case of equal increments with ho − hn = nh, this gives

Hn = n!hn , which is right.
As a further check on the appropriateness of our Hn , we observe that
if a function is numerically less than A, its nth difference ∆n is numeri-
cally less than 2n A.
In working with Dn , ∆n , etc., we shall specify the function and the
values of the variables only so far as is necessary for clarity.
We are now in a position to restate and prove Theorem 25, taking
∆n ( f ) to be the difference Hn Dn ( f ) just defined.
Theorem 25′ . Theorem 25 is true with Kn = 2−n and ∆n ( f ) as just de-

fined and the supremum taken over all values of ho , . . . , hn in (a, b).
Proof. Given f (x), take p(x) to be its polynomial of best approximation

of degree at most n − 1. Then f (x) − p(x) takes its greatest numerical
value at n + 1 points, with signs alternately + and −. These n + 1 points
we takes as ho , . . . , hn .
Since the nth difference of a polynomial of degree n − 1 is 0, we have

n
X f (hi ) − p(hi )
∆n ( f ; ho , . . . , hn ) = Hn
ϕ′ (hi )
X i=o
So |∆n | = Hnd |ϕ′ (hi )|−1 = 2n d,
by definition of Hn .
Therefore, for all x in (−1, 1),
| f (x) − p(x)| ≤ d ≤ 2−n sup |∆n ( f )|.
This proves Theorem 25′ .

56 Alternatively we can prove Theorem 25′ , starting from the upper
bound of |∆n | instead of from the polynomial of best approximation.
16. Definition and Properties of the nth Difference 55
Suppose, then, that sup |∆n | = L and that the bound L is assumed
for the values ho , h1 , . . . , hn of the independent variable. Define points
(hi , yi ) for i = 0, 1, . . . , by taking
L
yi = f (hi ) − (−1)k ,
2n
where k is i or i + 1 according as ∆n ( f, ho , . . . , hn ) is positive or negative.
Construct a p(x) of Pn−1 through the n points (hi , yi ) for i = 0, 1, . . . ,
n − 1. Write
g(x) = f (x) − p(x).
Since ∆n (p) ≡ 0, |∆n (g)| = |∆n ( f )| attains its upper bound for h0 , h1 ,
. . . , hn . From the definition of ∆n , the value of g(hn ) which makes
|∆n (g, h0 , h1 , . . . , hn )| = L is (−1)k L/2n , where k is n or n + 1 accord-
ing as ∆n ( f, ho , . . . , hn ) is positive or negative.
We prove that |g(x)| ≤ 2−n L for all x. Suppose that |g| takes values
greater than L/2n , say g(h′i ) > L/2n for a value h′i between hi−1 and hi+1
where g(hi ) = 2−n L. Then, from the definition of T n ,
Dn (g, ho , . . . , hi−1 , h′i , hi+1 , . . . , hn )

> 2−n LT n (ho , . . . , hi−1 , h′i , hi+1 , . . . , hn )
and so, by definition of ∆n ,
∆n (g, ho , . . . , hi−1 , h′i , hi+1 , . . . , hn ) > L,
which is a contradiction. We have therefore
| f (x) − p(x)| = |g(x)| ≤ 2−n sup |∆n ( f )|,
which is Theorem 25′ . 57

For the next result let us call the n + 1 values
π π
−1, − cos , . . . , cos , 1
n n
at which the Chebyshev polynomial cos(n arc cos x) assumes the values
±1 the Chebyshev points of the interval (−1, 1).
Theorem 26. Suppose that 1 ≥> ho > h1 ≥ · · · ≥ hn ≥ −1. Then
T n (ho , . . . , hn ) ≥ 2n−1
and the sign = holds if and only if the hi are the Chebyshev points.
Proof. The polynomial qn (x) of degree n which takes the value (−1)i at
hi (i = 0, . . . , n) is
n
X (−1)i
qn (x) = ϕ(x) .
i=0
ϕ′ (hi ) (x − hi )
Then qn (x) = an xn + · · · + ao , where

n
X 1
an = = T n (ho , . . . , hn ).
i=o
|ϕ′ (hi )|
Write tn (x) = 2−n+1 an cos(n arc cos x).

Then qn (x) − tn (x) has degree n − 1 at most.
If an < 2n−1 , then |tn (x)| < 1 and qn (x) − tn (x) has the sign of qn (x)
for the n + 1 values ho , . . . , h1 . If an = 2n−1 the same is true on the
understanding that qn (x) − tn (x) may vanish for any of these values. So
the polynomial qn (x) − tn (x), of degree at most n − 1, has n zeros. This
is a contradiction if an < 2n−1 , and is only possible for an = 2n−1 when
qn (x) ≡ tn (x). This proves the theorem.
Corollary. For 1 ≥ ho ≥ · · · ≥ hn ≥ −1, Hn ≤ 2.

Bibliography
[1] Achieser,N.I.- (English by Hyman), Theory of Approximation, 58

New Yor, 1956.
[2] Bertstein,S.- Lecons sur les properties extremales et la meilleure

approximation des fucntions analytiques d′ une variables reelle,
paris, 1926
[3] Jackson,D.- The theory of approximation, (Ameriacan Mathemat-

ical Society Colloquium, XI New York, 1930.
[4] Natanson,I,P. - (German by Bogel), Konstruktive Funktionentheo-

rie, Berlin, 1955.
[5] de la Vallee Poussin, G.J.- Lecons sur 1’ approximation ses func-

tions dune variable reelle, Paris, 1919.
57
APPENDIX
Approximation by
Polynomials in the Complex
Domain
1 Runge’s Theorem
The problem considered till now was the approximation of a given con- 59
tinuous function on a finite closed interval by polynomials in a real vari-
able. Even for functions of two variables, we considered only the prob-
lem of approximation by polynomials in two independent real variables
x, y. in what follows, we shall consider the approximation of a function
in a domain in the plane (open connected set) by polynomials in the
complex variable z = x + iy (which are analytic functions of the variable
z).
Let pn (z) be a sequence of polynomials and suppose that G (which
we assume is not empty) is the largest open set in which pn (z) converges,
uniformly on every compact subset. (This is the type of approximation
we shall consider; the problem of approximation on closed sets more
difficult). By Weierstrass’s theorem the limit of the sequence pn (z) is an
analytic function in G. There is moreover, a purely topological restric-
tion on G, viz., every connected component D of G is simply connected:
for, if C is a sample closed curve contained in D and B is its interior, then
59
60 BIBLIOGRAPHY
(maximum modulus principle)

sup |pn (z) − pm (z)| = sup |pn (z) − pm (z)| → as 0n, m → ∞
z∈B∪C z∈C
so that the sequence pn (z) converges uniformly on B∪C. Hence B∪C ⊂

G and since D is a connected component and C ⊂ D, B ⊂ D.
60 The main theorem, which is the analogue of Weierstrass’s approx-
imation theorem (T h.1, p.2) and which includes a converse of the re-
marks made above, runs as follows.
Theorem A (Runge). Let D be a domain in the plane and f an analytic
function in D. Then f can be approximated, uniformly on every compact
subset of D, by rational functions whose poles lie outside D. If D is
simply connected, f may be approximated by polynomials.
We begin by funding a sequence of open regions Gn , n = 1, 2, 3, . . .
bounded by polygons such that Gn is relatively compact in Gn+1 , whose
limit is D. We may take Gn as a subsequence of the sequence Bm , where
k
Bm is defined to be the interior of the union of those squares m ≤
2
k+1 1 l+1 m
x ≤ m , m ≤ y ≤ m , k, 1 integers, |k|, |l| ≤ 22 , which lie in D.
2 2 2
The boundary of Gn can be split into a finite number of simple closed
polygons Cn,k can be joined by a simple are which does not meet Gn to
S
a point on the boundary of D. If Cn = Cn,k is the boundary of Gn we
k
have Z
1 f (t)
f (z) = dt, z ∈ Gn .
2πi t−z
Cn
By the definition of the integral, we may approximate f (z) uniformly
in Gn−1 by finite sums of the form
1 X f (tr )
fn (z) = (tr+1 − tr )
2πi tr − z
where tr are certain points on Cn . Hence if εn ↓ 0, we can find a se-
quence of rational functions Rn such that Rn has poles at most on Cn
and
(1) | f (z) − Rn (z)| < εn in Gn−1 .
1. Runge’s Theorem 61
The main idea in the proof of the theorem is contained in the next 61
step, which we state as a separate lemma.
Lemma. Let C be a simple are joining the points zo and z1 and K a
compact set not meeting C. Then, any rational function whose only
possible pole is at zo can be approximated, uniformly on K, by rational
functions which no poles expect possibly at z1 .
Proof of the lemma. Let ε > 0 be given and 2d be the distance
between C and K. We find points ao , . . . , aµ on C, ao = zo , aµ = z1 such
that |ak+1 − ak | ≤ d. Let R(z) be the given rational function. There are
two polynomials p and q so that
!
1
R(z) = p(z) + q
z − zo
!
1
and we have only to approximate q = f (z). Since f (z) is ana-
z − zo
lytic in |z−a1 | > d and is finite at ∞, the Laurent expansion of f (z) about
a1 contains no positive powers of z − a1 and converges uniformly on ev-
ery compact subset of |z − a1 | > d. A suitable! partial sum then gives us
1 ε
a polynomial p1 with f (z) − p1 < for z ∈ K. Repeating
z − a1 µ+1
this process, we find successively, polynomials p j , j = 1, . . . , µ with
! !
1 1 ε
p
j − p j+1 < for z ∈ K.
z − aj z − a j+1 µ+1
Then clearly
!
1
f (z) − p
µ < ε, z ∈ K
z − aµ
and the lemma follows.
Proof of Theorem. Let D be any domain and f analytic in D. Let Gn
be the sequence of regions exhausting D described above. There is a 62
rational function rn with poles at most on Cn+1 such that

f (z) − r (z) < 1 on G .
n 2n n
62 BIBLIOGRAPHY
Every point of the boundary of Gn+1 can be joined be an arc not

meeting Ḡn to the boundary of D, so that, by the lemma, there is a
1
rational function Rn with poles outside D such that Rn (z) − rn (z) <
2n
1
on Gn and | f (z) − Rn (z)| < on Gn . The first part of Runge’s theorem
n
is proved. If D is simply connected, then every connected component
of the complement is unbounded (unless D is the whole plane in which
case the theorem is trivial). Hence every point of the boundary of Gn+1
can be joined to a point z1 (|z1 | ≥ 2r) by an arc which does not meet Ḡn ,
r being such that Gn is contained in the circle |z| < r.
Now it follows as above that there is a rational function Rn (z) with
all its poles lying in |z| ≥ 2r, with

f (z) − R (z) < 1 on G .
n n
2n
If we expand Rn (z) in a Taylor series about z = 0 (which converges
uniformly for |z| ≤ r), then a suitable partial sum pn (z) satisfies

R (z) − p (z) < 1 on G
n n n
2n
so that f (z) − p (z) < 1 on G .
n n n
This complete the proof of Runge’s theorem

The same argument proves the following theorem
THEOREM A1 . Let D be any plane domain. From each connected

component of the complement of D, choose a point zα . Then any analytic
63 function in D can be approximated uniformly on every compact set in D
by rational functions which have poles at most at the points zα .
Runge’s theorem is of importance in the theory of functions. As

an instance of its applicability we prove the following extension to an
arbitrary of Mittag-Leffler’s theorem
THEOREM. Let D be a plane domain and aν , ν = 1, 2, . . . sequence of

points in D having no limit point in D. Let pν be polynomials (without
2. Interpolation 63
constant term). Then there is a meromorphic! function f in D with poles

1
at most at the aν such that f (z) − pν is analytic at aν .
z − aν
Proof. We can construct a sequence Gn , n = 1, 2, . . . of regions so that
S
Gn is relatively compact in Gn+1 , n Gn = D and so that any point of the
boundary of Gn can be joined to a point not in D by an arc not meeting
Gn . Let !
X 1
fn (z) = pν
a ∈G
z − aν
ν n
the sum being over those (finitely many) aν which lie in Gn . Since fn+2 −
fn+1 is analytic in Gn+1 , we can find a rational function Rn+1 with no
poles in D such that

f (z) − f (z) − R (z) < 1 for z in G .
n+2 n+1 n+1
2n
n
Since the poles of the Rn+1 lie outside D and the series
∞
X
( fn+1 − fn − Rn )
n=no +1
converges uniformly in Gno , it follows easily that we may take 64
∞
X
f (z) = f2 (z) + ( fn+1 (z) − fn (z) − Rn (z)).
n=2
2 Interpolation
For functions f of a real variable, if pn is the (Lagrange) polynomial p
of degree n which agrees with f at n + 1 equally spaces points on an
interval, the sequence pn in general diverges as n → ∞. The behaviour
for functions of a complex variable is more satisfactory. We proceed to
prove two of the main theorems.
64 BIBLIOGRAPHY
Let C be a simple closed rectifiable curve and f (z) a function an-

alytic inside and on C. Let t1 , . . . , tn+1 be n + 1 points inside C (not
necessarily distinct). Then the polynomial pn (z) of degree n such that
pn (ti ) = f (ti ), i = 1, . . . , n + 1 (multiplicity being taken into account if
some of the ti coincide) is easily seen to be given by
Z
1 (z − t1 ) · · · (z − tn+1 ) f (t)
f (z) − pn (z) = dt.
2πi (t − t1 ) · · · (t − tn+1 ) t − z
C
Our first theorem is as follows. It is also due to Runge.
THEOREM B. Let f (z) be analytic for |z| < R, R > 1 and pn (z) the
polynomial of degree n with pn (zi ) = f (zi ), i = 0, 1, . . . , n, where the zi
are the (n + 1)th roots of unity.
Then
pn (z) → f (z) as n → ∞
uniformly for |z| ≤ ̺ < R.
65 Proof. Let C be the circle |z| = ρ′ , ρ′ > ρ, ρ > 1. Then, for |z| ≤ ρ,
Z n+1
1 z − 1 f (t)
f (z) − pn (z) = dt,
2πi c tn+1 − 1 t − z
so that
zn+1 − 1 f (t)
Z
1
| f (z) − pn (z)| = | dt|
2πc tn+1 − 1 t − z
1 + ρn+1
≤ ′n+1 M (M = sup | f (z)|)
(ρ − 1)(ρ′ − ρ) z∈C
→ 0 as n → ∞ since ρ′ > ρ, ρ′ > 1.
The next theorem is due to Fejer and is considerably deeper; it con-

tains Theorem B.
Let C be a simple closed curve and suppose that w(z) maps the exte-
rior of C one-one conformally onto |w| > 1 in such a way that the points
2. Interpolation 65
at infinity correspond. Then, as is well known, w(z) is one-one continu-

ous on C. Let α(n)
i , i = 0, 1, . . . , n be the n + 1 points of C corresponding
to the (n + 1)th roots of unity in the w-plane. Then we have
THEOREM C. Let f (z) be a function analytic inside and on Cand pn (z)
the polynomial of degree n which equals f (z) at the points α(c)
i . Then
pn (z) → f (z), uniformly inside and on C.
We begin with a lemma.
Lemma.
n
Y 1
(n) (n+1)
lim z − α i = A|w(z)|.
n→∞
i=0
uniformly on any compact set exterior to C, A > 0 being a constant 66
(depending on C).
Proof of the lemma. Let z = z(w) be the inverse of w = w(z) and let
wo , . . . , wn be the (n + 1)th roots of unity. We prove first that
n
1
Q z(w)−z(wi ) (n+1)
1. lim
w−wi
= A.
n→∞ i=0
The logarithm of the term on the left is
1 Pn

z(w)−z(wi ) 1
R2π z(w)−z(eiθ )
2. lim n+1 i=o log w−wi = 2π log w−eiθ dθ

n→∞
o
and the limit is uniform for w in a compact set in |w| > 1.
z(w) − z(ζ)
Now is an analytic function of ζ for |ζ| > 1 and fixed w,
w−ζ
including ζ = ∞, ζ = w. Hence the integral in (2) is equal to

z(w) − z(ζ)
lim log = log A, say
ζ→∞ w−ζ
(we have only to make the substitution ζ → 1/ζ and use the Poison
integral).
From (1), it follows that
n
Q 1
n 1 |z(w) − z(wi )| (n+1)
Y |z(w) − z(wi )| (n+1)
i=0
lim = lim
n→∞
i=o
|w − wi |1/(n+1) n→∞ |wn+1 − 1|1/(n+1)
66 BIBLIOGRAPHY
n
Q 1
lim |z(w) − z(wi ) (n+1)
n→∞ i=0
= =A
|w|
and the lemma follows on substituting w = w(z).
67 Proof of Theorem. Let CR (R > 1) denote the image under z(w) of the
circle |w| = R; we can choose R > 1 such that f is analytic inside and on
CR . Let 1 < r1 < r2 < R; and put
n
Y
πn (z) = (z − α(n)
i ).
i=o
We have Z
1 πn (z) f (t)
f (z) − pn (z) = dt.
2πi πn (t) t − z
Cr2
If z is on Cr1 and t on Cr2 ,

1
πn (z) n+1 r1
lim = (by the lemma)
n→∞ πn (t) r2
1
n+1 r1
so that lim sup | f (z) − pn (z)| ≤ < 1.
n→∞ z∈C
r1
r2
Consequently f (z) − pn (z) → o uniformly for z on Cr1 .

The theorem follows at once from the maximal modulus principle.
3 Best Approximation
In this section we shall consider the problem of best approximation.
68 Let K be a compact set containing infinitely many points and f (z) a
continuous function on K. Our aim is to prove the existence and unique-
ness of a polynomial pn (z) of degree n such that

d( f, pn ) = sup f (z) − pn (z)

z∈K
is least. in general, of course, this minimum d( f, pn ) does not tend to

zero as n → ∞.
Existence of a polynomial of best approximation.
Let Pn be the family of all polynomials of degree ≤ n, and let f (z)
be a continuous function on the compact set K. Let
d( f ) = d = inf d( f, p) = inf (sup | f (z) − p(z)|).

p∈Pn p∈Pn z∈K
Then we have the
THEOREM D. There exists a p ∈ Pn with d( f, p) = d.
Proof. Any polynomial p ∈ Pn takes values 0 or 1 at n points at most.

Hence Pn is a quasi-normal family of order n, (theorem of Montel, see
[1] p. 67) i.e., given a sequence pν of polynomials in Pn , there is a
subsequence pνk and n points zi such that pνk converges, uniformly on
every compact set not containing the zi , either to a finite limit function
or to ∞. In the first case it is clear that pνk converges uniformly on any
compact set (which may contain some of the zi ).
There is a sequence p(ν) of polynomials of Pn so that d( f, p(ν) ) → d.

Then, clearly, if z ∈ K, |p(ν) (z)| ≤ d + 1 + sup | f (ζ)| for large ν (we may 69
ζ∈K
suppose that holds for all ν). Let p(νk ) be a subsequence converging out-
side n points zi , uniformly on compact sets. Since K contains infinitely
many points there are points of K not equal to any zi , and it these points
z, |p(νk ) (z)| is bounded. Hence the limit outside zi is finite and conse-
quently, p(νk ) converges uniformly on any compact set. From Cauchy’s
inequality, it follows then that the corresponding co-efficients of p(νk )
converge, so that lim p(νk ) (z) = p(z) ∈ Pn . Then we have
k→∞
d ≤ d( f, p) ≤ d( f, p(νk ) ) + d(p(νk ) , p) → d
so that d( f, p) = d.
[If K contains a circle |z − a| < r, r > o, the existence of a sequence
p(νk ) converging uniformly on any compact set follows at once, as in the
case (Ch.II, T h.4, p.14) if we use the Cauchy inequalities.]
68 BIBLIOGRAPHY
Uniqueness of the polynomial of best approximation.

We shall deduce the uniqueness fro the following theorem, as in the
case of a real variable.
Let p ∈ Pn satisfy d( f, p) = d( f ). Then | f (z) − p(z)| attains its
maximum at atleast n + 2 distinct points of K.
(The proof is similar in principle to the proof of T h.5, p.14).
Proof. Suppose that f (z)−p(z) = g(z) attains its maximum modulus at m

points (m ≤ n + 1)z1 , . . . , zm of K. Then, we can construct a polynomial
70 q(z) of degree n such that q(zi ) = g(zi ). Given ε > o, we can find δ > 0
so that if
|ζ1 − ζ2 | < δ, |g(ζ1 ) − g(ζ2 )| < ε, |q(ζ1 ) − q(ζ2 )| < ε
Let K 1 be the set obtained from K by removing the points of the

(open) discs |z − zi | < δ. Then
sup |g(z)| = d1 < d = sup |g(z)|.

z∈K 1 z∈K
Let 1 > η > o be sufficiently small. Consider g(z) − ηq(z); then for
|z − zi | < δ, |g(z) − g(zi )| < ε, |ηq(z) − ηq(zi )| < ηε, so that |g(z) − ηq(z)| =
|η(g(zi ) − g(z)) + ηg(z) − g(z) + η(q(z) − q(zi ))| (since q(zi ) = g(zi )) <
ηε + ηε + (1 − η)d < d if 2ε < d
If we choose η so small that
sup |g(z) − ηq(z)| < d

z∈K 1
we have sup |g(z) − ηq(z)| < d
z∈K
and d( f, p + ηq) < d, contradicting the definition of d.
THEOREM E. The polynomial of degree ≤ n of best approximation is

unique.
1
Proof. Let d( f, p) = d = d( f, q); let r(z) = (p(z) + q(z)).
2
Then
1 1
| f (z) − r(z)| = | ( f (z) − p(z)) + ( f (z) − q(z))| ≤ d
2 2
Let z1 , . . . , zn+2 be points at which | f (zi ) − r(zi )| = d. Then, unless 71
1
f (zi ) − p(zi ) = f (zi ) − q(zi ) = wi with |wi | = d, | f (zi ) − p(zi ) + f (zi ) −
2
q(zi )| < d. Hence, p(z) and q(z) take the same value at the n + 2 points
zi and since they are polynomials of degree n, p(z) = q(z).
Bibliography
[1] Montel, P. : Lecons sur les familles normales de fonctions analy-

tiques et leurs applications, Gauthiers-Villars, Paris, 1927
[2] Walsh,J.L. : Approximation by polynomials in the complex do-

main, Mem. des Sciences Mathem., Paris, 1935
[3] Walsh, J.L. : Interpolation and approximation by rational func-

tions in the complex domain, Amer. Math. Soc. Colloquium Pub-
lications, Vol.XX, 1935.
71

Lectures On Approximation by Polynomials: J.G. Burkill

Uploaded by

Copyright:

Available Formats

Lectures On Approximation by Polynomials: J.G. Burkill

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Lectures On Approximation by Polynomials: J.G. Burkill

Uploaded by

Copyright:

Available Formats

Lectures On

No part of this book may be reproduced in any

Tata Institute of Fundamental Research,

2 The Polynomial of Best Approximation... 11

6 Approximation in Terms of Differences 51

sup | f (x) − p(x)|,

where the sup. is taken over a ≤ x ≤ b.

There are other useful ways of defining a ‘distance’ between f (x)

converges uniformly to f (x).

2 Singular Integrals and Landau’s Proof

Bn (x) has degree n (at most).

Proof of theorem 2. Given ε, there is δ such that | f (x1 ) − f (x2 )| < ε if

≤ ε + M/2nδ2 . Choose n > M/2εδ2

Remarks on Bernstein polynomials.

(2) In questions of polynomial approximation, it is a disadvantage that

For most of the useful systems of polynomials, the approximation

1. In the Lemma of §1, prove that the p polynomial consisting of the

3. Theorem 1 can be extended to a function of two (or more) variables,

4. If f ′ (x) is continuous, then

A similar result for the Landau integral.

5. Readers who like to place theorems on analysis in an abstract set-

11 1. All the cn are positive n ≥ 2). Error is greatest when u = 1, i.e.,

or (with some labour extend Bernstein’s, as in P. L. Butzer, Cana-

4 The Lagrange Polynomial

If the ci are the values at xi of a function f (x), we call p(x) the

Lemma 1. Given an aggregate of polynomials pα (x) of degree at most

Then, if aα,r is the coefficient of xr in pα (x),

Proof. Write pα (xi ) for ci . We have a B depending only on the xi .

13 Lemma 2. (For brevity of expression, we translate de la Vallee Poussin,

d(p, f ) = sup | f (x) − p(x)| f or a ≤ x ≤ b.

Let d = dn = dn ( f ) = inf d(p, f ) for all p in Pn . Then d ≥ 0.

and Theorem 1 asserts that lim dn = 0.

sup | f (x) − p(x)| = d(= dn ).

By definition of d, there is a polynomial pm with

Proof. g(x) = f (x) − p(x) is continuous. Divide (a, b) into sub-intervals 15

sup |g(x)| = d′ (say) < d.

16 Choose η to make |ηh(x)| < d − d′ (a ≤ x ≤ b), now,

In the l′ s, this is less than d, since g, h have the same sign.

|g − ηh| ≤ |g| + |ηh| < d′ + (d − d′ ) = d.

So p + ηh approximates better to f than p does.

Theorem 6. The polynomial p(x) of Theorem 4 is unique.

B. Find the best approximation in Pn−1 to xn in (−1, 1).

18 From Theorem 7 (Corollary) we wish to find a pn (x) which takes the

pn (x) = d cos nθ where x = cos θ.

It is worth while to give Chebyshev’s own proof of this, which does

Theorem 8. Among all pn (x) with coefficient of xn equal to 1, the poly-

Proof. let p(x) = xn +· · · be the required polynomial, and d = sup |p(x)|.

By Theorem 5, there are n+ 1 values of x (at least) where p(x) = ±d.

which is an interior point, p(x) has a maximum or minimum and so

have the same zeros, namely, 1, −1 and each of x1 , x2 , . . . , xn−1 doubly.

Solving this differential equation for p we find, putting x = cos θ,

Since p(x) is a polynomial, C = 0. But

cos nθ = 2n−1 cosn θ + lower powers of cos θ,

The polynomials revealed by Theorem 8 are named after Chebyshev 19

T n (x) = cos(n arc cos x).

The early members of the sequence are

T 0 (x) = 1, T 1 (x) = x T 2 (x) = 2x2 − 1

Their mode of definition is restricted to (−1, 1) and it is in that inter-

(2) T n (x) is the coefficient of tn in the expansion of the generating func-