Ineq Lagrange PDF
Ineq Lagrange PDF
Ineq Lagrange PDF
c Nagy
Many (classical) inequalities can be proven by setting up and solving certain optimization
problems. In turn, such optimization problems can be handled using the method of Lagrange
Multipliers (see the Theorem 2 below).
A. Compactness (in RN )
When solving optimization problems, the following notions are extremely important.
Definitions. Suppose N is a positive integer, and A is a non-empty subset in RN .
A. We say that A is bounded, if there exists some constant m, such that: kak ≤ m, for all
a in A.
B. We say that A is closed, if whenever (an )∞ n=1 is a sequence of points, which converges
to some x, it follows that x itself belongs to A.
1
Corollary. If A ⊂ RN is compact, and f : A → R is continuous, then there exist points
a0 , a1 ∈ A, such that: f (a0 ) ≤ f (a) ≤ f (a1 ), ∀ a ∈ A. (That is, f attains its maximum and
minimum values on A.)
B. Lagrange Multipliers
Theorem 2 (Lagrange). Assume g is some continuously differentiable real-valued func-
tion, defined on some (open) domain D in RN , and K is some real number, such that the
level set A = {x ∈ D | g(x) = K} has the following property:
(∗) for every a ∈ A the gradient (∇g)(a) is non-zero, i.e.
∂g ∂g
(a), . . . , (a) 6= (0, . . . , 0).
∂x1 ∂xN
Suppose f is another real-valued continuously differentiable function, defined on D. If x ∈ A
is local maximum point for f on A, i.e.
(∗∗) there exists some neighborhood V of x in RN , such that
f (x) ≥ f (a), ∀ a ∈ A ∩ V,
then there exists some real number λ, such that (∇f )(x) = λ(∇g)(x), that is,
∂f ∂g
(x) = λ (x), ∀ j = 1, . . . , N.
∂xj ∂xj
The system (1) has N + 1 equations and N + 1 unknowns: λ and the coordinates of x.
In our examples, the hyper-surface A will be compact, so the function f will have an
absolute maximum value on A.
C. Applications
Arithmetic Vs. Geometric Means Inequality. Given a1 , a2 , . . . , aN ≥ 0, one has
the inequality:
√ a1 + a2 + · · · + aN
N
a1 a2 · · · aN ≤ (2)
N
Moreover, one has equality in (2), if and only if a1 = a2 = · · · = aN .
2
Proof. It is clear that the statement does not change if all a’s are multiplied by a positive
constant. Therefore2 the above statement can be rephrased as follows.
a1 a2 · · · aN ≤ 1 (3)
We now restate (∗) as an optimization problem, using the substitution aj = x2j , so we consider
√
the sphere of radius N in RN , that is, the set
which is a level set for the function g(x1 , . . . , xN ) = x21 + · · · + x2N . Of course, the gradient
of g is: (∇g)(x1 , . . . , xN ) = (2x1 , . . . , 2xN ), which never vanishes on A, so A is indeed a
hyper-surface. Note also that A is compact. The objective function is f : A → R defined by
f (x) = x1 x2 · · · xN , x = (x1 , x2 , . . . , xN ) ∈ A,
for which we must find the absolute maximum on A. Our ”guess” is the follwoing.
(**) The maximum value maxx∈A f (x) is equal to 1, and all points x = (x1 , x2 , . . . , xN ) ∈ A,
with f (x) = 1, satisfy the condition
x x · · · xN = λx1
2 3
x1 x3 · · · xN = λx2
.
..
x1 · · · xbj · · · xN = λxj (5)
.
..
x x · · · xN −1 = λxN
12 2
x1 + · · · + x2N = N
3
(b) λ 6= 0, in which case we have (using the last equation) x21 = x22 = · · · = x2N = 1, which
forces f (x) = ±1.
Clearly (a) does not yield a maximum. Case (b), however proves statement (∗∗).
1 1
Hölder’s Inequality. Let p, q > 1 be two real numbers satisfying + = 1. Given
p q
a1 , a2 , . . . , aN , b1 , b2 , . . . , bN ≥ 0, one has the inequality:
a1 b1 + a2 b2 + . . . aN bN ≤ [ap1 + ap2 + · · · + apN ]1/p · [bq1 + aq2 + · · · + bqN ]1/q (7)
Moreover, one has equality in (7), if and only the N -tuples (ap1 , ap2 , . . . , apN ) and (bq1 , bq2 , . . . , bqN )
are proportional.
Proof. It is clear that the statement does not change if all a’s are multiplied by a positive
constant and all the b’s are multiplied by another positive constant. Therefore3 the above
statement can be rephrased as follows.
(∗) Whenever a1 , a2 , . . . , aN , b1 , b2 , . . . , bN ≥ 0 are such that
ap1 + ap2 + · · · + apN = bq1 + bq2 + · · · + bqn = 1,
it follows that
a1 b1 + a2 b2 + . . . aN bn ≤ 1 (8)
Moreover, one has equality in (8), if and only if apj = bqj , for all j = 1, 2, . . . , N .
We now restate (∗) as an optimization problem, as follows. We first fix a1 , a2 , . . . , aN ≥ 0,
such that ap1 + ap2 + · · · + apN = 1. Second (with the substitutions bj = x2j in mind) we consider
the function g : RN → R defined by
g(x) = (x21 )q + (x22 )q + · · · + (x2N )q , x = (x1 , x2 , . . . , xN ) ∈ RN .
Note that the definition of g involves the “funny” function4 φ(t) = (t2 )q , whose derivative is
(2q)(t2 )q
if t 6= 0
φ0 (t) = t (9)
0 if t = 0
In particular, φ is continuously differentiable, and so is g, whose gradient is
(∇g)(x) = φ0 (x1 ), φ0 (x2 ), . . . , φ0 (xN ) , x = (x1 , x2 , . . . , xN ) ∈ RN .
(10)
Consider now the set A = {x ∈ RN | g(x) = 1}. Since for every x = (x1 , x2 , . . . , xN ) ∈ A,
at least one coordinate is non-zero, using (10) and (9) we see that (∇g)(x) is non-zero, for
each x ∈ A, so A is a hyper-surface. Obviously A is also compact. Our objective function is
f : A → R defined by
f (x) = a1 x21 + a2 x22 + . . . an x2N , x = (x1 , x2 , . . . , xN ) ∈ A,
for which we must find the absolute maximum on A. Our ”guess” is the following.
3
The trivial case when a1 = a2 = · · · = aN = b1 = b2 = · · · = bN = 0 is omitted.
4
One might be tempted to write φ( t) = t2q , but this would be incorrect, since we allow t to be negative.
4
(∗∗) The maximum value maxx∈A f (x) is equal to 1, and all points x = (x1 , x2 , . . . , xN ) ∈ A,
with f (x) = 1, satisfy the condition
If we write down the system of equations (1) for this particular setting, it reads:
2a1 x1 = λφ0 (x1 )
0
2a2 x2 = λφ (x2 )
.. (12)
.
2aN xN = λφ0 (xN )
(x2 )q + · · · + (x2 )q = 1
1 N
Fix a solution (λ, x) of (12). If we multiply the j th equation by xj , then using (9) we obtain
the system
a1 x21 = (λq)(x21 )q
2 2 q
a2 x2 = (λq)(x2 )
.. (13)
.
2 2 q
aN x = (λq)(x )
(x2 )qN+ · · · + (xN2 )q = 1
1 N
so upon adding the top N equations, and using the bottom we one, we obtain
Going back to the system (13), let j1 , . . . , jm be the indices for which xj 6= 0, so that for
each k = 1, . . . , m, the jk th equation yields
1 1
Taking then the pth power (and using the identity + = 1, which gives p(q − 1) = q) we
p q
now obtain
apjk = (λq)p (x2jk )q , ∀ k = 1, . . . , m. (15)
If we sum up the above identities (and use the fact that xj = 0, for all j 6∈ {j1 , . . . , jm }),
then by the last equation in (12) we obtain
apj1 + · · · + apjm = (λq)p [(x2j1 )q + · · · + (x2jm )q ] = (λq)p [(x21 )q + (x22 )q + · · · + (x2N )q ] = (λq)p ,
This of course yields λq ≤ 1, so going back to (14) we obtain f (x) ≤ 1. Up to this point we
have shown that maxx∈A f (x) ≤ 1, which, by the way, proves (8).
To finish the proof of (∗∗) we need to:
5
(a) show that maxx∈A f (x) = 1, and
We begin with assertion (b). If f (x) = 1, then there is some λ such that (λ, x) is a solution
of (12). As we have seen in (14), this forces λq = 1, so going back to (16) this also forces
the inequality there to be an equality. This would mean, of course, that aj = 0(= xj ),
∀ j 6∈ {j1 , . . . , jm }, so by (15) the desired conclusion (11) follows. To prove the above
assertion (a), all we have to do is to produce one x ∈ A, with f (x) = 1. Such an x, however,
p
Moreover, equality holds in (17), if and only the N -tuples (x1 |x1 |p−1 , x2 |x2 |p−1 , . . . , xN |xN |p−1 )
and (y 1 |y1 |q−1 , y 2 |y2 |q−1 , . . . , y N |yN |q−1 ) are proportional.
Proof. Without loss of generality (multiply, if necessary all the x’s by a positive constant,
and multiply the all the y’s by another positive constant), we can assume that
The inequality (17) follows immediately from (7), applied with aj = |xj | and bj = |yj |:
Assume now we have equality in (17), which forces equality in both inequalities from (19)
and (20). First of all, by the preceding result, the equality in (20) forces the sequences
(ap1 , ap2 , . . . , apN ) and (bq1 , bq2 , . . . , bqN ), so by (18) we have in fact the equalities |yj |q = |xj |p ,
∀ j = 1, . . . , N , so if we denote this common value by cj , we have
1/p 1/q
|xj | = cj and |yj | = cj , ∀ j = 1, . . . , N. (21)
. Secondly, let us observe that the fact that we have equality in the first inequality in (19)
is an instance of equality in the “polygon” inequality
and we know that equality holds in (22) if and only if all z’s are on a ray, i.e. a set of the
form Rw = {ρw | ρ ≥ 0}, for some w ∈ C, with |w| = 1, which means that
zj = w|zj |, ∀ j = 1, . . . , N. (23)
6
If we apply this to zj = xj yj , using (21) it follows that we have the equalities xj yj = wcj , so
upon after multiplying by y j we now have xj |yj |2 = wy j cj , i.e.
2/q
xj cj = wy j cj , ∀ j = 1, . . . , N. (24)
1/q 1−1/q 1/q 1−1/p
Dividing by cj (in the case when cj 6= 0) yields wy j cj = xj cj = xj cj , which by
(21) reads:
wy j |yj |q−1 = xj |xj |p−1 , ∀ j = 1, . . . , N. (25)
Strictly speaking, (25) was obtained only when cj 6= 0, but is is trivially satisfied if cj = 0,
in which case we have xj = yj = 0.
Comment. The special case of Hölder’s Inequality that corresponds to p = q = 2 is
known as the Cauchy-Bunyakovsky-Schwartz Inequality. In this special case the inequality
reads
1/2 2 1/2
|x1 y1 + x2 y2 + . . . xN yN | ≤ |x1 |2 + |x2 |2 + · · · + |xN |2 · |y1 | + |y2 |2 + · · · + |yN |2
, (26)
and equality holds in (26), if and only if the N -tuples (x1 , x2 , . . . , xN ) and (y 1 , y 2 , . . . , y N )
are proportional.
Indeed, in the case p = q = 2, if we argue as above (with same notations), under the
assumption (18), the equality in (26) will yield the equality (24), which in this case reads
xj cj = wy j cj , thus forcing directly: xj = wy j , ∀ j = 1, . . . , N .