Ineq Lagrange PDF

Download as pdf or txt
Download as pdf or txt
You are on page 1of 7

Gabriel

c Nagy

Inequalities Via Lagrange Multipliers

Many (classical) inequalities can be proven by setting up and solving certain optimization
problems. In turn, such optimization problems can be handled using the method of Lagrange
Multipliers (see the Theorem 2 below).

A. Compactness (in RN )
When solving optimization problems, the following notions are extremely important.
Definitions. Suppose N is a positive integer, and A is a non-empty subset in RN .
A. We say that A is bounded, if there exists some constant m, such that: kak ≤ m, for all
a in A.

B. We say that A is closed, if whenever (an )∞ n=1 is a sequence of points, which converges
to some x, it follows that x itself belongs to A.

C. We say that A is compact, if it is both closed and bounded.


Comments. The norm k . k mentioned in Definition A can be any norm1 . Two frequently
used choices are
q
kxk2 = x21 + · · · + x2N ,
kxk∞ = max{|x1 |, . . . , |xN |},
x = (x1 , . . . , xN ) ∈ RN

For a sequence (an )∞ N ∞


n=1 of vectors in R , the condition that (an )n=1 converges to x ∈ R
N

means that limn→∞ kan − xk = 0, were k . k is any norm on RN .


The above definition of compactness is ad-hoc, and specific to RN . Another important
characterization, specific to metric spaces, is the following: every sequence in A has a sub-
sequence which converges to a point in A. (The fact that this characterization is equivalent
to the condition stated in Definition C follows from the well-known property of bounded
sequences: every bounded sequence in R has a convergent sub-sequence. Using this property,
one can easily show that the same statement is true with R replaced with RN .)
The most general characterization of compactness (for so-called topological spaces) is
beyond the scope of these notes.
Compactness is an essential property that is sought when attacking optimization prob-
lems, as suggested by the following results.
Theorem 1. If A ⊂ RN is compact, and f : A → RM is continuous, then f (A) is
compact.
1
On RN any two norms k . k and k . k0 are equivalent, in the sense that there exists positive constants C
and D, such that Ckxk ≤ kxk0 ≤ Dkxk, for every x.

1
Corollary. If A ⊂ RN is compact, and f : A → R is continuous, then there exist points
a0 , a1 ∈ A, such that: f (a0 ) ≤ f (a) ≤ f (a1 ), ∀ a ∈ A. (That is, f attains its maximum and
minimum values on A.)

B. Lagrange Multipliers
Theorem 2 (Lagrange). Assume g is some continuously differentiable real-valued func-
tion, defined on some (open) domain D in RN , and K is some real number, such that the
level set A = {x ∈ D | g(x) = K} has the following property:
(∗) for every a ∈ A the gradient (∇g)(a) is non-zero, i.e.
∂g ∂g 
(a), . . . , (a) 6= (0, . . . , 0).
∂x1 ∂xN
Suppose f is another real-valued continuously differentiable function, defined on D. If x ∈ A
is local maximum point for f on A, i.e.
(∗∗) there exists some neighborhood V of x in RN , such that
f (x) ≥ f (a), ∀ a ∈ A ∩ V,

then there exists some real number λ, such that (∇f )(x) = λ(∇g)(x), that is,
∂f ∂g
(x) = λ (x), ∀ j = 1, . . . , N.
∂xj ∂xj

Comments. A level set A satisfying condition (∗) is called a hyper-surface in RN . In


practice, “the candidates” for x satisfying (∗∗) are found by solving (or just narrowing down
the solution set of) the system of equations

∂f ∂g

 (x) = λ (x)
∂x ∂x

1 1



 ..

 .
∂f ∂g (1)
 (x) = λ (x)



 ∂xN ∂xN



g(x) = K

The system (1) has N + 1 equations and N + 1 unknowns: λ and the coordinates of x.
In our examples, the hyper-surface A will be compact, so the function f will have an
absolute maximum value on A.

C. Applications
Arithmetic Vs. Geometric Means Inequality. Given a1 , a2 , . . . , aN ≥ 0, one has
the inequality:
√ a1 + a2 + · · · + aN
N
a1 a2 · · · aN ≤ (2)
N
Moreover, one has equality in (2), if and only if a1 = a2 = · · · = aN .

2
Proof. It is clear that the statement does not change if all a’s are multiplied by a positive
constant. Therefore2 the above statement can be rephrased as follows.

(∗) Whenever a1 , a2 , . . . , aN ≥ 0 are such that a1 + a2 + · · · + aN = N , it follows that

a1 a2 · · · aN ≤ 1 (3)

Moreover, one has equality in (3), if and only if a1 = a2 = · · · = aN = 1.

We now restate (∗) as an optimization problem, using the substitution aj = x2j , so we consider

the sphere of radius N in RN , that is, the set

A = {x = (x1 , x2 , . . . , xN ) ∈ RN | x21 + · · · + x2N = N },

which is a level set for the function g(x1 , . . . , xN ) = x21 + · · · + x2N . Of course, the gradient
of g is: (∇g)(x1 , . . . , xN ) = (2x1 , . . . , 2xN ), which never vanishes on A, so A is indeed a
hyper-surface. Note also that A is compact. The objective function is f : A → R defined by

f (x) = x1 x2 · · · xN , x = (x1 , x2 , . . . , xN ) ∈ A,

for which we must find the absolute maximum on A. Our ”guess” is the follwoing.

(**) The maximum value maxx∈A f (x) is equal to 1, and all points x = (x1 , x2 , . . . , xN ) ∈ A,
with f (x) = 1, satisfy the condition

|x1 | = |x2 | = · · · = |xN | = 1. (4)

The system of equations (1) reads (after replacing λ with λ/2)

x x · · · xN = λx1

 2 3



 x1 x3 · · · xN = λx2
 .
 ..



x1 · · · xbj · · · xN = λxj (5)


 .
..


x x · · · xN −1 = λxN


 12 2


x1 + · · · + x2N = N

(The notation xbj indicates that the factor xj is missing.)


Multiplying each one of the top N equations by its missing factor from the left-hand side,
yields: 
λx21 = λx22 = · · · = λx2N = x1 x2 · · · xN
(6)
x21 + · · · + x2N = N
Now we see that every solution (λ, x) of (3) satisfies one of the following conditions:

(a) λ = 0, in which case f (x) = 0;


2
The trivial case when a1 = a2 = · · · = aN = 0 is omitted.

3
(b) λ 6= 0, in which case we have (using the last equation) x21 = x22 = · · · = x2N = 1, which
forces f (x) = ±1.
Clearly (a) does not yield a maximum. Case (b), however proves statement (∗∗).
1 1
Hölder’s Inequality. Let p, q > 1 be two real numbers satisfying + = 1. Given
p q
a1 , a2 , . . . , aN , b1 , b2 , . . . , bN ≥ 0, one has the inequality:
a1 b1 + a2 b2 + . . . aN bN ≤ [ap1 + ap2 + · · · + apN ]1/p · [bq1 + aq2 + · · · + bqN ]1/q (7)
Moreover, one has equality in (7), if and only the N -tuples (ap1 , ap2 , . . . , apN ) and (bq1 , bq2 , . . . , bqN )
are proportional.
Proof. It is clear that the statement does not change if all a’s are multiplied by a positive
constant and all the b’s are multiplied by another positive constant. Therefore3 the above
statement can be rephrased as follows.
(∗) Whenever a1 , a2 , . . . , aN , b1 , b2 , . . . , bN ≥ 0 are such that
ap1 + ap2 + · · · + apN = bq1 + bq2 + · · · + bqn = 1,
it follows that
a1 b1 + a2 b2 + . . . aN bn ≤ 1 (8)
Moreover, one has equality in (8), if and only if apj = bqj , for all j = 1, 2, . . . , N .
We now restate (∗) as an optimization problem, as follows. We first fix a1 , a2 , . . . , aN ≥ 0,
such that ap1 + ap2 + · · · + apN = 1. Second (with the substitutions bj = x2j in mind) we consider
the function g : RN → R defined by
g(x) = (x21 )q + (x22 )q + · · · + (x2N )q , x = (x1 , x2 , . . . , xN ) ∈ RN .
Note that the definition of g involves the “funny” function4 φ(t) = (t2 )q , whose derivative is
(2q)(t2 )q


 if t 6= 0
φ0 (t) = t (9)


0 if t = 0
In particular, φ is continuously differentiable, and so is g, whose gradient is
(∇g)(x) = φ0 (x1 ), φ0 (x2 ), . . . , φ0 (xN ) , x = (x1 , x2 , . . . , xN ) ∈ RN .

(10)
Consider now the set A = {x ∈ RN | g(x) = 1}. Since for every x = (x1 , x2 , . . . , xN ) ∈ A,
at least one coordinate is non-zero, using (10) and (9) we see that (∇g)(x) is non-zero, for
each x ∈ A, so A is a hyper-surface. Obviously A is also compact. Our objective function is
f : A → R defined by
f (x) = a1 x21 + a2 x22 + . . . an x2N , x = (x1 , x2 , . . . , xN ) ∈ A,
for which we must find the absolute maximum on A. Our ”guess” is the following.
3
The trivial case when a1 = a2 = · · · = aN = b1 = b2 = · · · = bN = 0 is omitted.
4
One might be tempted to write φ( t) = t2q , but this would be incorrect, since we allow t to be negative.

4
(∗∗) The maximum value maxx∈A f (x) is equal to 1, and all points x = (x1 , x2 , . . . , xN ) ∈ A,
with f (x) = 1, satisfy the condition

(x2j )q = apj , ∀ j = 1, 2, . . . , N. (11)

If we write down the system of equations (1) for this particular setting, it reads:


 2a1 x1 = λφ0 (x1 )
0
 2a2 x2 = λφ (x2 )



.. (12)
.
2aN xN = λφ0 (xN )




 (x2 )q + · · · + (x2 )q = 1

1 N

Fix a solution (λ, x) of (12). If we multiply the j th equation by xj , then using (9) we obtain
the system 

 a1 x21 = (λq)(x21 )q
2 2 q
 a2 x2 = (λq)(x2 )



.. (13)
.
2 2 q

 aN x = (λq)(x )
 (x2 )qN+ · · · + (xN2 )q = 1



1 N

so upon adding the top N equations, and using the bottom we one, we obtain

f (x) = a1 x21 + a2 x22 + · · · + aN x2N = λq[(x21 )q + · · · + (x2N )q ] = λq. (14)

Going back to the system (13), let j1 , . . . , jm be the indices for which xj 6= 0, so that for
each k = 1, . . . , m, the jk th equation yields

ajk = (λq)(x2jk )q−1 .

1 1
Taking then the pth power (and using the identity + = 1, which gives p(q − 1) = q) we
p q
now obtain
apjk = (λq)p (x2jk )q , ∀ k = 1, . . . , m. (15)
If we sum up the above identities (and use the fact that xj = 0, for all j 6∈ {j1 , . . . , jm }),
then by the last equation in (12) we obtain

apj1 + · · · + apjm = (λq)p [(x2j1 )q + · · · + (x2jm )q ] = (λq)p [(x21 )q + (x22 )q + · · · + (x2N )q ] = (λq)p ,

so using the assumption on the a’s we will get:

(λq)p = apj1 + · · · + apjm ≤ ap1 + ap2 + · · · + apN = 1. (16)

This of course yields λq ≤ 1, so going back to (14) we obtain f (x) ≤ 1. Up to this point we
have shown that maxx∈A f (x) ≤ 1, which, by the way, proves (8).
To finish the proof of (∗∗) we need to:

5
(a) show that maxx∈A f (x) = 1, and

(b) show that any x ∈ A with f (x) = 1 satisfies (11).

We begin with assertion (b). If f (x) = 1, then there is some λ such that (λ, x) is a solution
of (12). As we have seen in (14), this forces λq = 1, so going back to (16) this also forces
the inequality there to be an equality. This would mean, of course, that aj = 0(= xj ),
∀ j 6∈ {j1 , . . . , jm }, so by (15) the desired conclusion (11) follows. To prove the above
assertion (a), all we have to do is to produce one x ∈ A, with f (x) = 1. Such an x, however,
p

can be easily defined by xj = aj2q .


Hölder’s Inequality for Complex Numbers. Let p, q > 1 be two real numbers
1 1
satisfying + = 1. Given x1 , x2 , . . . , xN , y1 , y2 , . . . , yN ∈ C, one has the inequality:
p q
1/p  q 1/q
|x1 y1 + x2 y2 + . . . xN yN ≤ |x1 |p + |x2 |p + · · · + |xN |p · |y1 | + |y2 |q + · · · + |yN |q

(17)

Moreover, equality holds in (17), if and only the N -tuples (x1 |x1 |p−1 , x2 |x2 |p−1 , . . . , xN |xN |p−1 )
and (y 1 |y1 |q−1 , y 2 |y2 |q−1 , . . . , y N |yN |q−1 ) are proportional.
Proof. Without loss of generality (multiply, if necessary all the x’s by a positive constant,
and multiply the all the y’s by another positive constant), we can assume that

|x1 |p + |x2 |p + · · · + |xN |p = |y1 |q + |y2 |q + · · · + |yN |q . (18)

The inequality (17) follows immediately from (7), applied with aj = |xj | and bj = |yj |:

|x1 y1 + x2 y2 + · · · + xN yN | ≤ |x1 | · |y1 | + |x2 | · |y2 | + · · · + |xN | · |yN | ≤ (19)


1/p  q 1/q
≤ |x1 |p + |x2 |p + · · · + |xN |p · |y1 | + |y2 |q + · · · + |yN |q

(20)

Assume now we have equality in (17), which forces equality in both inequalities from (19)
and (20). First of all, by the preceding result, the equality in (20) forces the sequences
(ap1 , ap2 , . . . , apN ) and (bq1 , bq2 , . . . , bqN ), so by (18) we have in fact the equalities |yj |q = |xj |p ,
∀ j = 1, . . . , N , so if we denote this common value by cj , we have
1/p 1/q
|xj | = cj and |yj | = cj , ∀ j = 1, . . . , N. (21)

. Secondly, let us observe that the fact that we have equality in the first inequality in (19)
is an instance of equality in the “polygon” inequality

|z1 + z2 + · · · + zN | ≤ |z1 | + |z2 | + · · · + |zN |, (22)

and we know that equality holds in (22) if and only if all z’s are on a ray, i.e. a set of the
form Rw = {ρw | ρ ≥ 0}, for some w ∈ C, with |w| = 1, which means that

zj = w|zj |, ∀ j = 1, . . . , N. (23)

6
If we apply this to zj = xj yj , using (21) it follows that we have the equalities xj yj = wcj , so
upon after multiplying by y j we now have xj |yj |2 = wy j cj , i.e.
2/q
xj cj = wy j cj , ∀ j = 1, . . . , N. (24)
1/q 1−1/q 1/q 1−1/p
Dividing by cj (in the case when cj 6= 0) yields wy j cj = xj cj = xj cj , which by
(21) reads:
wy j |yj |q−1 = xj |xj |p−1 , ∀ j = 1, . . . , N. (25)
Strictly speaking, (25) was obtained only when cj 6= 0, but is is trivially satisfied if cj = 0,
in which case we have xj = yj = 0.
Comment. The special case of Hölder’s Inequality that corresponds to p = q = 2 is
known as the Cauchy-Bunyakovsky-Schwartz Inequality. In this special case the inequality
reads
1/2  2 1/2
|x1 y1 + x2 y2 + . . . xN yN | ≤ |x1 |2 + |x2 |2 + · · · + |xN |2 · |y1 | + |y2 |2 + · · · + |yN |2

, (26)

and equality holds in (26), if and only if the N -tuples (x1 , x2 , . . . , xN ) and (y 1 , y 2 , . . . , y N )
are proportional.
Indeed, in the case p = q = 2, if we argue as above (with same notations), under the
assumption (18), the equality in (26) will yield the equality (24), which in this case reads
xj cj = wy j cj , thus forcing directly: xj = wy j , ∀ j = 1, . . . , N .

You might also like