EC400 Optimisation and Fixed Points Course Pack


London School of Economics Dr Francesco Nava

Department of Economics Office: 32L.3.20

EC400 – Optimization and Fixed Points


Syllabus 2020/21

The course is based on three recorded lectures (available on Moodle), three synchronous Q&A
sessions (approximately 1 hour long each), and three classes (2.5 hours each). The lectures revise
essential mathematical notions for static optimization and fixed point theorems. The classes show
how to apply these notions to solve relevant problems.

Course Outline

Lecture 1: Tools for Optimization

Quadratic Forms

Determinant

Taylor’s Expansion

Concavity and Convexity

Quasi-Concavity and Quasi-Convexity

Lecture 2: Optimization

Unconstrained Optimization

Constrained Optimization: Equality Constraints

Lagrange Theorem

Constrained Optimization: Inequality Constraints

Kuhn-Tucker Theorem

Arrow-Enthoven Theorem

Lecture 3: Comparative Statics and Fixed Points

Envelope Theorem

Implicit Function Theorem

Correspondences

Fixed Point Theorems

Theorem of the Maximum


Suggested Textbooks

Simon & Blume, Mathematics for Economists, Norton, 1994.

Takayama, Mathematical Economics, Cambridge University Press, 1985.

Chiang & Wainwright, Fundamental Methods of Mathematical Economics, McGraw-Hill, 2005.

Dixit, Optimization in Economic Theory, Oxford University Press, 1990.

Ok, Real Analysis with Economic Applications, Princeton University Press, 2007.

Chiang, Elements of Dynamic Optimization, McGraw-Hill, 1992.

Kamien & Schwartz, Dynamic Optimization, Elsevier Science, 1991.

Exercises

Three problem sets to test background and preparation.

Access to past exam papers to prepare for the exam.

London School of Economics Dr Francesco Nava
Department of Economics Office: 32L.3.20

EC400 – Math for Microeconomics


Lecture Notes 2020/21

Course Outline

Section 1: Tools for Optimization

Quadratic Forms

Determinant

Taylor’s Expansion

Concavity and Convexity

Quasi-Concavity and Quasi-Convexity

Section 2: Optimization

Unconstrained Optimization

Constrained Optimization: Equality Constraints

Lagrange Theorem

Constrained Optimization: Inequality Constraints

Kuhn-Tucker Theorem

Arrow-Enthoven Theorem

Section 3: Comparative Statics and Fixed Points

Envelope Theorem

Implicit Function Theorem

Correspondences

Fixed Point Theorems

Theorem of the Maximum


Section 1: Tools for Optimization
Quadratic Forms and Their Determinant

Defining Quadratic Forms

Quadratic forms are useful and important because: (i) they are the simplest functions after linear ones; (ii) the conditions used in optimization techniques are stated in terms of quadratic forms; (iii) many relevant economic optimization problems have a quadratic objective function, such as risk-minimization problems in finance (where riskiness is measured by the variance of the returns from investments).

Definition: A quadratic form on R^n is a real-valued function

$$Q(x_1, x_2, \ldots, x_n) = \sum_{i \leq j} a_{ij} x_i x_j.$$

Among all functions of one variable, the simplest functions with a unique global extremum are the pure quadratic forms x^2 and -x^2. The level curve of a general quadratic form in R^2 is

$$a_{11} x_1^2 + a_{12} x_1 x_2 + a_{22} x_2^2 = b;$$

such a curve can take the form of an ellipse, a hyperbola, a pair of lines, or the empty set. A quadratic form in R^2, a_{11} x_1^2 + a_{12} x_1 x_2 + a_{22} x_2^2, can always be written as

$$\begin{pmatrix} x_1 & x_2 \end{pmatrix} \begin{pmatrix} a_{11} & a_{12} \\ 0 & a_{22} \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = x^T A x.$$

Every quadratic form can be represented as

$$Q(x) = x^T A x,$$

where A is a (unique) symmetric matrix,

$$A = \begin{pmatrix} a_{11} & a_{12}/2 & \cdots & a_{1n}/2 \\ a_{21}/2 & a_{22} & \cdots & a_{2n}/2 \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1}/2 & a_{n2}/2 & \cdots & a_{nn} \end{pmatrix}.$$

Conversely, if A is a symmetric matrix, then the real-valued function Q(x) = x^T A x is a quadratic form.
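As an illustration (not part of the original notes), the following sketch builds the symmetric representation of a quadratic form numerically and checks that it reproduces the original coefficients; the coefficient values are arbitrary and numpy is assumed to be available.

```python
import numpy as np

# Coefficients of Q(x) = 2*x1^2 + 6*x1*x2 + 7*x2^2 (arbitrary illustrative values)
a11, a12, a22 = 2.0, 6.0, 7.0

# Unique symmetric representation: off-diagonal entries are halved
A = np.array([[a11, a12 / 2],
              [a12 / 2, a22]])

x = np.array([1.5, -0.5])
q_direct = a11 * x[0]**2 + a12 * x[0] * x[1] + a22 * x[1]**2
q_matrix = x @ A @ x
print(q_direct, q_matrix)  # both evaluate to the same number
```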

Definiteness of Quadratic Forms

Remark: Any quadratic form always takes the value 0 when x = 0.

We want to know whether x = 0 is a maximum, a minimum, or neither. For example, for the quadratic form a x^2:

1. if a > 0, then a x^2 is non-negative and equals 0 only when x = 0, which implies that the quadratic form is positive definite and that x = 0 is a global minimizer;

2. if a < 0, then the function is negative definite and x = 0 is a global maximizer.

In two dimensions there are additional intermediate cases; the quadratic form is:

1. positive definite when it is always non-negative, and positive for all x ≠ 0, e.g.
$$x_1^2 + x_2^2;$$

2. negative definite when it is always non-positive, and negative for all x ≠ 0, e.g.
$$-(x_1^2 + x_2^2);$$

3. indefinite when it can take both positive and negative values, e.g.
$$x_1^2 - x_2^2;$$

4. positive semidefinite when it is non-negative, but equal to zero for some x ≠ 0, e.g.
$$(x_1 + x_2)^2;$$

5. negative semidefinite when it is non-positive, but equal to zero for some x ≠ 0, e.g.
$$-(x_1 + x_2)^2.$$

We apply the same terminology to the symmetric matrix A and the implied quadratic form Q(x) = x^T A x.

Definition: Let A be an (n × n) symmetric matrix. Then A is:
(a) positive definite if x^T A x > 0 for all x ≠ 0 in R^n;
(b) positive semidefinite if x^T A x ≥ 0 for all x ≠ 0 in R^n;
(c) negative definite if x^T A x < 0 for all x ≠ 0 in R^n;
(d) negative semidefinite if x^T A x ≤ 0 for all x ≠ 0 in R^n;
(e) indefinite if x^T A x > 0 for some x ≠ 0 in R^n and x^T A x < 0 for some other x ≠ 0 in R^n.

Application: A function y = f(x) of one variable is concave if its second derivative satisfies f''(x) ≤ 0 on some interval. The generalization of this concept to higher dimensions states that a function is concave on some region if its second derivative matrix (the Hessian) is negative semidefinite for all x in that region.

The Determinant of a Matrix

The determinant of a matrix is a unique scalar associated with the matrix. Rather than showing that the many usual definitions of the determinant are all the same by comparing them to each other, I am going to state general properties of the determinant that are enough to specify uniquely what number you should get when given a matrix. Then, you may check that all of the definitions of the determinant that you have encountered satisfy those properties. What we are really looking for is a function that takes n vectors (the n columns of the matrix) and returns a number. Consider a matrix A, and denote its ith column by A_i, so that

$$A = (A_1, A_2, \ldots, A_n).$$

Assume we are working with real numbers. Standard operations change the value of the determinant as follows:

1. switching two columns changes the sign,

$$-\det A = \det(A_2, A_1, A_3, \ldots, A_n);$$

2. multiplying one column by a constant multiplies the determinant by that constant,

$$a \det A = \det(a A_1, A_2, \ldots, A_n);$$

3. the determinant is linear in each column,

$$\det(A_1 + B_1, A_2, \ldots, A_n) = \det(A_1, A_2, \ldots, A_n) + \det(B_1, A_2, \ldots, A_n).$$

These three properties, together with the fact that the determinant of the identity matrix is one, are enough to define a unique function that takes n vectors (each of length n) as inputs and returns a real number, which amounts to the determinant of the matrix given by those vectors.

This observation helps with some of the interpretations of the determinant. In particular, there is a nice geometric way to think of a determinant. Consider the unit cube in n-dimensional space (that is, the set of vectors of length n with coordinates 0 or 1 in each spot). The determinant of the linear transformation (i.e. of the matrix) T is the signed volume of the region you get by applying T to the unit cube. Consider how that follows from the abstract definition. If you apply the identity to the unit cube, you get back the unit cube, and the volume of the unit cube is 1. If you stretch the cube by a constant factor in one direction only, the new volume is that constant. And if you stack two blocks together aligned on the same direction, their combined volume is the sum of their volumes: this all shows that the signed volume we have is linear in each coordinate when considered as a function of the input vectors. Finally, when you switch two of the vectors that define the unit cube, you flip the orientation.

With this geometric definition of the determinant, if you are familiar with multivariate calculus, you could think about why determinants (the Jacobian) appear when we change coordinates while integrating. Hint: a derivative is a linear approximation of the associated function, and consider a "differential volume element" in your starting coordinate system. It is not too much work to check that the area of the parallelogram formed by the vectors (a, b) and (c, d) is det((a, b), (c, d)).

Computing the Determinant of a Matrix

Computing the determinant (det A, or equivalently |A|) of a matrix A is straightforward in two dimensions,

$$\det A = \det\begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix} = a_{11}a_{22} - a_{12}a_{21},$$

and in three dimensions,

$$\det A = \det\begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix} = a_{11}\det\begin{pmatrix} a_{22} & a_{23} \\ a_{32} & a_{33} \end{pmatrix} - a_{12}\det\begin{pmatrix} a_{21} & a_{23} \\ a_{31} & a_{33} \end{pmatrix} + a_{13}\det\begin{pmatrix} a_{21} & a_{22} \\ a_{31} & a_{32} \end{pmatrix}.$$

With more than three dimensions, standard recursive definitions can be applied for its calculation. To do so, denote by:

a_ij the entry in row i and column j of a matrix A;

M_ij the submatrix obtained by removing row i and column j from a matrix A.

If so, Laplace's formula (expansion along row i) states that:

$$\det A = \sum_{j=1}^{n} (-1)^{i+j} a_{ij} \det M_{ij}.$$
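A minimal numerical sketch of Laplace's expansion along the first row (purely illustrative, assuming numpy is available; any linear-algebra library already provides a determinant routine):

```python
import numpy as np

def det_laplace(A: np.ndarray) -> float:
    """Determinant via cofactor (Laplace) expansion along row 0."""
    n = A.shape[0]
    if n == 1:
        return float(A[0, 0])
    total = 0.0
    for j in range(n):
        # Minor M_0j: delete row 0 and column j
        minor = np.delete(np.delete(A, 0, axis=0), j, axis=1)
        total += (-1) ** j * A[0, j] * det_laplace(minor)
    return total

A = np.array([[2.0, 3.0], [3.0, 7.0]])
print(det_laplace(A), np.linalg.det(A))  # both should be 5.0
```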

Submatrices and Minors

Definition: Let A be an (n × n) matrix. Any (k × k) submatrix of A formed by deleting (n − k) columns, say columns (i_1, i_2, ..., i_{n−k}), and the corresponding (n − k) rows from A, (i_1, i_2, ..., i_{n−k}), is called a kth order principal submatrix of A. The determinant of a kth order principal submatrix is called a kth order principal minor of A.

Definition: Let A be an (n × n) matrix. The kth order principal submatrix of A obtained by deleting the last (n − k) rows and columns from A is called the kth order leading principal submatrix of A, denoted L_k. Its determinant is called the kth order leading principal minor of A.

Example: A general (3 × 3) matrix

$$A = \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix}$$

possesses one third order principal minor, namely det A; three second order principal minors,

$$\det\begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}, \quad \det\begin{pmatrix} a_{22} & a_{23} \\ a_{32} & a_{33} \end{pmatrix}, \quad \det\begin{pmatrix} a_{11} & a_{13} \\ a_{31} & a_{33} \end{pmatrix};$$

and three first order principal minors, a_{11}, a_{22}, a_{33}. The leading principal minors are

$$L_3 = \det A, \quad L_2 = \det\begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}, \quad L_1 = a_{11}.$$

Testing the Definiteness of a Matrix

Let A be an (n × n) symmetric matrix. Then A is:

positive definite if and only if every leading principal minor is strictly positive,
$$|L_1| > 0, \ |L_2| > 0, \ |L_3| > 0, \ldots;$$

negative definite if and only if the leading principal minors alternate in sign,
$$|L_1| < 0, \ |L_2| > 0, \ |L_3| < 0, \ldots,$$
with the kth order leading principal minor having the same sign as (−1)^k;

positive semidefinite if and only if every principal minor of A is non-negative;

negative semidefinite if and only if every principal minor of odd order is non-positive and every principal minor of even order is non-negative.
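The leading-principal-minor test is easy to mechanize. A small sketch (illustrative only, assuming numpy; the matrix A reproduces the first example tested later in these notes):

```python
import numpy as np

def leading_principal_minors(A: np.ndarray) -> list[float]:
    """Return [|L_1|, |L_2|, ..., |L_n|] for a square matrix A."""
    n = A.shape[0]
    return [float(np.linalg.det(A[:k, :k])) for k in range(1, n + 1)]

def classify(A: np.ndarray) -> str:
    minors = leading_principal_minors(A)
    if all(m > 0 for m in minors):
        return "positive definite"
    # Negative definite: k-th leading minor has the sign of (-1)^k
    if all((-1) ** k * m > 0 for k, m in enumerate(minors, start=1)):
        return "negative definite"
    return "no conclusion from leading minors alone"

A = np.array([[2.0, 3.0], [3.0, 7.0]])
print(leading_principal_minors(A), classify(A))  # [2.0, 5.0] positive definite
```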

Example: Diagonal matrices correspond to the simplest quadratic forms,

$$A = \begin{pmatrix} a_1 & 0 & 0 \\ 0 & a_2 & 0 \\ 0 & 0 & a_3 \end{pmatrix}.$$

Any diagonal quadratic form can be written as

$$x^T A x = a_1 x_1^2 + a_2 x_2^2 + a_3 x_3^2.$$

Obviously such a quadratic form will be: positive (negative) definite if and only if every a_i is positive (negative); positive (negative) semidefinite if and only if every a_i is non-negative (non-positive); indefinite if there are two entries a_i with opposite signs.

Example: Consider a (2 × 2) symmetric matrix A and its corresponding quadratic form:

$$Q(x_1, x_2) = \begin{pmatrix} x_1 & x_2 \end{pmatrix} \begin{pmatrix} a & b \\ b & c \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = a x_1^2 + 2b x_1 x_2 + c x_2^2.$$

If a = 0, then Q cannot be definite since Q(1, 0) = 0. If a ≠ 0 instead, add and subtract b^2 x_2^2 / a to get

$$Q(x_1, x_2) = a x_1^2 + 2b x_1 x_2 + c x_2^2 + \frac{b^2}{a} x_2^2 - \frac{b^2}{a} x_2^2 = a\left(x_1^2 + \frac{2b x_1 x_2}{a} + \frac{b^2}{a^2} x_2^2\right) - \frac{b^2}{a} x_2^2 + c x_2^2 = a\left(x_1 + \frac{b}{a} x_2\right)^2 + \frac{ac - b^2}{a} x_2^2.$$

If both coefficients, a and (ac − b^2)/a, are positive, then Q will never be negative. It will equal 0 only when x_1 + (b/a) x_2 = 0 and x_2 = 0, in other words, when x_1 = 0 and x_2 = 0. That is, if

$$a > 0 \quad \text{and} \quad \det A = ac - b^2 > 0,$$

then Q is positive definite. Conversely, if Q is positive definite, then both a and det A = ac − b^2 are positive.

Similarly, Q is negative definite if and only if both coefficients are negative, which occurs if and only if a < 0 and ac − b^2 > 0, that is, when the leading principal minors alternate in sign. Finally, if ac − b^2 < 0 (and a ≠ 0), then the two coefficients will have opposite signs and Q will be indefinite.

Examples Testing Definiteness:

Consider $A = \begin{pmatrix} 2 & 3 \\ 3 & 7 \end{pmatrix}$. Since |A_1| = 2 > 0 and |A_2| = 5 > 0, A is positive definite.

Consider $B = \begin{pmatrix} 2 & 4 \\ 4 & 7 \end{pmatrix}$. Since |B_1| = 2 > 0 and |B_2| = -2 < 0, B is indefinite.

Consider Q(x) = b x_1^2 + a(x_2^2 + x_3^2) + 2b x_2 x_3. Its principal submatrices are:

$$3\text{rd}: \begin{pmatrix} b & 0 & 0 \\ 0 & a & b \\ 0 & b & a \end{pmatrix}; \quad 2\text{nd}: \begin{pmatrix} b & 0 \\ 0 & a \end{pmatrix}, \ \begin{pmatrix} b & 0 \\ 0 & a \end{pmatrix}, \ \begin{pmatrix} a & b \\ b & a \end{pmatrix}; \quad 1\text{st}: (b), (a), (a).$$

Q(x) is positive semidefinite if all principal minors are non-negative, which requires:

$$a \geq 0, \quad b \geq 0, \quad ab \geq 0, \quad a^2 - b^2 \geq 0, \quad a^2 b - b^3 \geq 0.$$

For these to hold at once it must be that a ≥ 0, b ≥ 0, a ≥ b. Q(x) is negative semidefinite if all principal minors suitably alternate in sign, which requires:

$$a \leq 0, \quad b \leq 0, \quad ab \geq 0, \quad a^2 - b^2 \geq 0, \quad a^2 b - b^3 \leq 0.$$

For these to hold at once it must be that a ≤ 0, b ≤ 0, b ≥ a. Thus, the quadratic form is indefinite if and only if either ab < b^2 or ab < 0.
Section 1: Tools for Optimization
Taylor Theorem
Taylor’s Expansion

The second tool needed for maximization is Taylor's expansion (or approximation). The first order Taylor approximation of a function f : R → R satisfies

$$P(h \mid a) = f(a) + f'(a)h.$$

The polynomial approximates f around a, since f(a + h) can be written as

$$f(a + h) = f(a) + f'(a)h + R(h \mid a),$$

where R(h | a) is the difference between the two sides of the approximation. The definition of the derivative f'(a) implies that R(h | a)/h → 0 as h → 0.

Geometrically, this formalizes the approximation of f by its tangent line at (a, f(a)). Analytically, it describes the best approximation of f by a polynomial of degree 1.

Definition: The kth order Taylor expansion of f at x = a is

$$P_k(h \mid a) = f(a) + f'(a)h + \frac{f''(a)}{2!}h^2 + \ldots + \frac{f^{[k]}(a)}{k!}h^k.$$

Moreover, the corresponding Taylor residual R_k(h | a) satisfies

$$R_k(h \mid a) = f(a + h) - P_k(h \mid a) \quad \text{and} \quad \lim_{h \to 0} \frac{R_k(h \mid a)}{h^k} = 0.$$

Example: Compute the first and second order Taylor polynomials of the exponential function f(a) = e^a at a = 0. Since all the derivatives of f at a = 0 equal 1:

$$P_1(h \mid 0) = 1 + h, \qquad P_2(h \mid 0) = 1 + h + \frac{h^2}{2}.$$

For h = 0.2, P_1(0.2 | 0) = 1.2 and P_2(0.2 | 0) = 1.22, compared with the actual value e^{0.2} ≈ 1.2214.
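A quick numerical check of this example (illustrative sketch):

```python
import math

a, h = 0.0, 0.2
# Taylor polynomials of e^x around a = 0 (all derivatives equal 1 there)
p1 = 1 + h
p2 = 1 + h + h**2 / 2
print(p1, p2, math.exp(a + h))  # 1.2, 1.22, 1.2214...
```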

Multivariate Taylor's Expansion

For functions of several variables, the first order Taylor expansion satisfies

$$P_1(h \mid a) = F(a) + \frac{\partial F}{\partial x_1}(a) h_1 + \ldots + \frac{\partial F}{\partial x_n}(a) h_n = F(a) + DF(a) h,$$

where DF(a) denotes the Jacobian,

$$DF(a) = \left( \frac{\partial F}{\partial x_1}(a), \ldots, \frac{\partial F}{\partial x_n}(a) \right).$$

As before, the approximation holds since

$$\lim_{h \to 0} \frac{R_1(h \mid a)}{\|h\|} = \lim_{h \to 0} \frac{F(a + h) - F(a) - DF(a)h}{\|h\|} = 0.$$

The second order Taylor expansion satisfies

$$P_2(h \mid a) = F(a) + DF(a)h + \frac{1}{2!} h^T D^2 F(a) h,$$

where D^2 F(a) denotes the Hessian,

$$D^2 F(a) = \begin{pmatrix} \frac{\partial^2 F}{\partial x_1 \partial x_1}(a) & \cdots & \frac{\partial^2 F}{\partial x_n \partial x_1}(a) \\ \vdots & \ddots & \vdots \\ \frac{\partial^2 F}{\partial x_1 \partial x_n}(a) & \cdots & \frac{\partial^2 F}{\partial x_n \partial x_n}(a) \end{pmatrix}.$$

The extension to order k then follows accordingly.
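To make the formula concrete, here is a small sketch (not from the original notes, assuming numpy) approximating F(x, y) = x^3 - y^3 + 9xy, the function used in the critical-point example later in these notes, around a point; the gradient and Hessian are hard-coded from the analytic derivatives:

```python
import numpy as np

def F(x, y):
    return x**3 - y**3 + 9 * x * y

def taylor2(a, h):
    """Second-order Taylor approximation of F at point a, displacement h."""
    x, y = a
    grad = np.array([3 * x**2 + 9 * y, -3 * y**2 + 9 * x])   # DF(a)
    hess = np.array([[6 * x, 9.0], [9.0, -6 * y]])            # D^2 F(a)
    return F(x, y) + grad @ h + 0.5 * h @ hess @ h

a = np.array([1.0, 1.0])
h = np.array([0.1, -0.05])
print(taylor2(a, h), F(*(a + h)))  # close, since h is small
```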

Section 1: Tools for Optimization
Concavity and Quasi-Concavity

Concavity and Convexity

Definition: A set U is convex if for all x, y ∈ U and for all t ∈ [0, 1],

$$tx + (1 - t)y \in U.$$

Definition: A real valued function f defined on a convex subset U of R^n is:

(1) concave, if for all x, y ∈ U and for all t ∈ [0, 1],

$$f(tx + (1 - t)y) \geq t f(x) + (1 - t) f(y);$$

(2) convex, if for all x, y ∈ U and for all t ∈ [0, 1],

$$f(tx + (1 - t)y) \leq t f(x) + (1 - t) f(y).$$

Observations:
(1) A function f is concave if and only if −f is convex.
(2) Linear functions are both convex and concave.
(3) The definitions of concavity and convexity require f to have a convex set as a domain.

More on Concavity

Claim: Let f be a differentiable function on a convex subset U of R^n. Then f is concave on U if and only if for all x, y ∈ U,

$$f(y) - f(x) \leq Df(x)(y - x) = \sum_{i=1}^{n} \frac{\partial f(x)}{\partial x_i}(y_i - x_i).$$

Proof: The function f : R → R is concave if and only if

$$t f(y) + (1 - t) f(x) \leq f(ty + (1 - t)x) \iff t(f(y) - f(x)) + f(x) \leq f(x + t(y - x)) \iff$$
$$f(y) - f(x) \leq \frac{f(x + t(y - x)) - f(x)}{t} \iff f(y) - f(x) \leq \frac{f(x + h) - f(x)}{h}(y - x),$$

for h = t(y − x). Taking limits as h → 0, the latter implies that

$$f(y) - f(x) \leq f'(x)(y - x). \tag{1}$$

To prove the converse, let z = ty + (1 − t)x. By applying equation (1), we have that

$$f(y) - f(z) \leq f'(z)(y - z),$$
$$f(x) - f(z) \leq f'(z)(x - z).$$

Summing the first inequality multiplied by t and the second multiplied by (1 − t), we find the result:

$$[f(y) - f(z)]t + [f(x) - f(z)](1 - t) \leq 0 \iff t f(y) + (1 - t) f(x) \leq f(z).$$

Claim: A twice differentiable function f on an open convex subset U of R^n is concave on U if and only if its Hessian D^2 f(x) is negative semidefinite for all x in U. The function f is a convex function if and only if D^2 f(x) is positive semidefinite for all x in U.

Proof: The function f : R → R is concave if and only if

$$f(y) - f(x) \leq f'(x)(y - x) \quad \text{and} \quad f(x) - f(y) \leq f'(y)(x - y).$$

Summing these and dividing by (y − x)^2 implies that

$$\frac{f'(y) - f'(x)}{y - x} \leq 0.$$

The limit of the last expression as y → x yields f''(x) ≤ 0. To prove the converse, pick y ≥ x, and note that by the Fundamental Theorem of Calculus and f'' ≤ 0, we have that

$$f(y) - f(x) = \int_x^y f'(t) \, dt \leq \int_x^y f'(x) \, dt = f'(x)(y - x).$$

Claim: Let f_1, ..., f_k be concave functions defined on the same convex set U ⊆ R^n, and let a_1, a_2, ..., a_k be positive numbers. Then the function

$$g = a_1 f_1 + a_2 f_2 + \ldots + a_k f_k$$

is a concave function on U.

Proof: Immediately observe:

$$g(tx + (1 - t)y) = \sum_{i=1}^{k} a_i f_i(tx + (1 - t)y) \geq \sum_{i=1}^{k} a_i \left( t f_i(x) + (1 - t) f_i(y) \right) = t \sum_{i=1}^{k} a_i f_i(x) + (1 - t) \sum_{i=1}^{k} a_i f_i(y) = t g(x) + (1 - t) g(y).$$

Example: Consider a firm whose production function is y = g(x), where y denotes output and x denotes the input bundle. If p denotes the price of output and w_i is the cost per unit of input i, then the firm's profit function satisfies

$$\pi(x) = p g(x) - (w_1 x_1 + w_2 x_2 + \ldots + w_n x_n).$$

Thus, the profit function is a concave function whenever the production function g is concave, since the previous claim applies and since −(w_1 x_1 + w_2 x_2 + \ldots + w_n x_n) is linear and, thus, concave.

Quasiconcave and Quasiconvex Functions

Definition: Consider a real valued function f defined on U ⊆ R^n:

(1) its a-level set is
$$C_a = \{x \in U \mid f(x) = a\};$$
(2) its a-uppercontour set is
$$C_a^+ = \{x \in U \mid f(x) \geq a\};$$
(3) its a-lowercontour set is
$$C_a^- = \{x \in U \mid f(x) \leq a\}.$$

Definition: A real valued function f defined on a convex set U ⊆ R^n is:

(1) quasiconcave if C_a^+ is a convex set for every real number a;
(2) quasiconvex if C_a^- is a convex set for every real number a.

The level sets of a quasiconcave function bound convex subsets from below.
The level sets of a quasiconvex function bound convex subsets from above.

Claim: Every concave function is quasiconcave and every convex function is quasiconvex.

Proof: Let x, y ∈ C_a^+, so that f(x) ≥ a and f(y) ≥ a, and let f be concave. If so:

$$f(tx + (1 - t)y) \geq t f(x) + (1 - t) f(y) \geq t a + (1 - t) a = a.$$

So tx + (1 − t)y is in C_a^+ and hence this set is convex. We have shown that if f is concave, it is also quasiconcave. Try to show that every convex function is quasiconvex.

The Economic Content of Quasiconcavity

Quasi-concavity is a desirable property of economic objective functions such as preferences:

(1) The convexity of uppercontour sets is a natural requirement for utility and production functions. For example, consider an indifference curve of the concave utility function f(x, y), and take two bundles on this indifference curve. The set of bundles which are preferred to them is a convex set. In particular, the bundles that mix their contents are in this preferred set. Thus, a consumer with a concave utility function always prefers a mixture of any two bundles with the same utility to either of them.

(2) A more important advantage of this shape of the indifference curve is that it displays a diminishing marginal rate of substitution. As one moves left to right along the indifference curve, increasing consumption of good x, the consumer is willing to give up more and more units of good x to gain an additional unit of good y. This is a property of concave utility functions because each level set forms the boundary of a convex region.

Properties of Quasiconcavity

Observations:
(1) Any increasing transformation of a concave function is quasiconcave.
(2) Any monotone function f(x) on R is both quasiconcave and quasiconvex.
(3) A single peaked function f(x) on R is quasiconcave.
(4) The function min{x, y} is quasiconcave, as C_a^+ is convex.

Proof of (1): Let f be quasiconcave and defined on a convex set U in R^n. Consider an increasing transformation g : R → R with g' > 0 (you will prove the case in which g' ≥ 0 in the problem set). For any function h, let C_k^+(h) denote its uppercontour sets. So if h(x) = g(f(x)), we have that

$$C_k^+(f) = C_{g(k)}^+(h).$$

But as C_k^+(f) is convex for all k, so is C_{g(k)}^+(h), and hence h is quasiconcave.

Example: To show that monotone functions on R^2_+ need not be quasiconcave, consider the function

$$f(x, y) = x^2 + y^2.$$

This function is clearly increasing on R^2_+, as Df(x, y) = (2x, 2y). However, the function is not quasiconcave. In fact, (1, 0), (0, 1) ∈ C_1^+, whereas (1/2, 1/2) ∉ C_1^+, as

$$f(1/2, 1/2) = 1/2 < 1.$$

Claim: Let f be a function defined on a convex set U ⊆ R^n. The following are equivalent:

1. f is a quasiconcave function on U;

2. for all x, y ∈ U and t ∈ [0, 1],

$$f(tx + (1 - t)y) \geq \min\{f(x), f(y)\}.$$

Proof: To prove that (1) implies (2), take any x, y ∈ U and let min{f(x), f(y)} = k. The latter implies that x, y ∈ C_k^+. But if f is quasiconcave, C_k^+ must be convex and thus tx + (1 − t)y ∈ C_k^+ for all t ∈ [0, 1]. Thus, for any t ∈ [0, 1],

$$f(tx + (1 - t)y) \geq k = \min\{f(x), f(y)\}.$$

To prove the converse, take any x, y ∈ C_k^+ and note that (2) implies that for all t ∈ [0, 1],

$$f(tx + (1 - t)y) \geq \min\{f(x), f(y)\} \geq k.$$

Thus, tx + (1 − t)y ∈ C_k^+ for all t ∈ [0, 1] and all x, y ∈ C_k^+, meaning that C_k^+ is convex for all k, and hence f is quasiconcave.

Section 2: Optimization

Definition of Extreme Points

Optimization plays a crucial role in economic problems. We start by analyzing unconstrained optimization problems.

Definition: The ball B(x | ε) centred at x of radius ε is the set of all vectors y in R^n whose distance from x is less than ε,

$$B(x \mid \varepsilon) = \{ y \in \mathbb{R}^n \mid \|y - x\| < \varepsilon \}.$$

Definition: Let f(x) be a real valued function defined on C ⊆ R^n. A point x* ∈ C is:

(1) a global maximizer for f(x) on C if
$$f(x^*) \geq f(x) \text{ for all } x \in C;$$

(2) a strict global maximizer for f(x) on C if
$$f(x^*) > f(x) \text{ for all } x \in C \text{ such that } x \neq x^*;$$

(3) a local maximizer for f(x) if there is a strictly positive number ε such that
$$f(x^*) \geq f(x) \text{ for all } x \in C \cap B(x^* \mid \varepsilon);$$

(4) a strict local maximizer for f(x) if there is a strictly positive number ε such that
$$f(x^*) > f(x) \text{ for all } x \in C \cap B(x^* \mid \varepsilon) \text{ such that } x \neq x^*;$$

(5) a critical point for f(x) if the first partial derivatives of f(x) exist at x* and
$$\frac{\partial f(x^*)}{\partial x_i} = 0 \text{ for } i = 1, 2, \ldots, n \iff Df(x^*) = 0.$$

Example: To find the critical points of F(x, y) = x^3 - y^3 + 9xy, set

$$\frac{\partial F}{\partial x}(x, y) = 3x^2 + 9y = 0 \quad \text{and} \quad \frac{\partial F}{\partial y}(x, y) = -3y^2 + 9x = 0.$$

The critical points are (0, 0) and (3, -3).
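A sketch verifying these critical points symbolically (assuming sympy is available; real solutions are filtered out by hand):

```python
import sympy as sp

x, y = sp.symbols("x y")
F = x**3 - y**3 + 9 * x * y

# First-order conditions of the unconstrained problem
foc = [sp.diff(F, x), sp.diff(F, y)]
sols = sp.solve(foc, [x, y], dict=True)
real_sols = [s for s in sols if all(v.is_real for v in s.values())]
print(real_sols)  # [{x: 0, y: 0}, {x: 3, y: -3}]
```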

Do Extreme Points Exist?

Definition: A set is compact if it is closed and bounded.

Extreme Value Theorem: Let C be a compact subset of R^n and f(x) be a continuous function defined on C. Then there exists a point x̄ in C which is a global maximizer of f, and a point x̲ in C which is a global minimizer of f. That is,

$$f(\underline{x}) \leq f(x) \leq f(\bar{x}) \text{ for all } x \in C.$$

Unconstrained Optimization: Functions of One Variable

Consider real valued functions f(x) that are continuously differentiable on an interval I (so that f'(x) exists and is continuous).

First order necessary condition for a maximum in R:

If x* is a local maximizer of f(x), then either x* is an end point of I or f'(x*) = 0.

Second order sufficient conditions for a maximum in R:

If f''(x) is continuous on I and x* is a critical point of f(x), then x* is:

a global maximizer of f on I if f''(x) ≤ 0 for all x ∈ I;

a strict global maximizer of f on I if f''(x) < 0 for all x ∈ I such that x ≠ x*;

a strict local maximizer of f on I if f''(x*) < 0.

Unconstrained Optimization: Functions of Several Variables

Consider real valued functions f(x) that are continuously differentiable on a subset C ⊆ R^n (so that all first partial derivatives exist and are continuous).

First order necessary conditions for a maximum in R^n:

If x* is in the interior of C and is a local maximizer of f(x), then x* is a critical point of f,

$$Df(x^*) = \left( \frac{\partial f}{\partial x_1}(x^*), \ldots, \frac{\partial f}{\partial x_n}(x^*) \right) = 0.$$

How does one then determine whether a critical point is a local maximum or a local minimum? To this end one has to consider the Hessian of the map f (the matrix of second order partial derivatives). By the Clairaut-Schwarz theorem, the Hessian is always a symmetric matrix, as cross-partial derivatives are equal (provided the function has continuous second order partial derivatives).

Second order sufficient conditions for a local maximum in R^n:

Let f(x) be a function for which all second partial derivatives exist and are continuous on a subset C ⊆ R^n. If x* is a critical point of f and D^2 f(x*) is negative (positive) definite, then x* is a strict local maximizer (minimizer) of f.

Observations:
(1) If x* is an interior point and a global maximum (minimum) of f, then D^2 f(x*) is negative (positive) semidefinite.
(2) If x* is a critical point and D^2 f(x*) is negative (positive) semidefinite, then x* may not be a local maximum (minimum).

For a counterexample to (2), consider f(x) = x^3. Clearly, Df(0) = 0 and D^2 f(0) = 0 is semidefinite. But x = 0 is neither a maximum nor a minimum.

Example: Again consider the example f(x_1, x_2) = x_1^3 - x_2^3 + 9 x_1 x_2. Compute the Hessian:

$$D^2 f(x_1, x_2) = \begin{pmatrix} 6 x_1 & 9 \\ 9 & -6 x_2 \end{pmatrix}.$$

The first order leading principal minor is 6 x_1 and the second order leading principal minor is det(D^2 f(x_1, x_2)) = -36 x_1 x_2 - 81. At (0, 0) the two minors are 0 and -81, and hence the matrix is indefinite. The point is neither a local min nor a local max (it is a saddle point). At (3, -3) the two minors are positive and hence the point is a strict local minimum of f. However, this is not a global minimum (why? set x_1 = 0 and let x_2 → ∞).
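A short numerical check of this classification (sketch, assuming numpy; eigenvalue signs are an equivalent definiteness test for symmetric matrices):

```python
import numpy as np

def hessian(x1, x2):
    return np.array([[6 * x1, 9.0], [9.0, -6 * x2]])

for point in [(0.0, 0.0), (3.0, -3.0)]:
    eigvals = np.linalg.eigvalsh(hessian(*point))
    print(point, eigvals)  # (0,0): mixed signs (saddle); (3,-3): both positive (local min)
```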

Unconstrained Optimization: More on Sufficient Conditions

Proof of Local SOC: By Taylor's Theorem we have that

$$f(x^* + h) = f(x^*) + Df(x^*)h + \frac{1}{2} h^T D^2 f(x^*) h + R_2(h \mid x^*).$$

Ignore R_2(h | x*) as it is small, and note that by assumption Df(x*) = 0. If so,

$$f(x^* + h) - f(x^*) \approx \frac{1}{2} h^T D^2 f(x^*) h.$$

If D^2 f(x*) is negative definite, for all small enough h ≠ 0 the right hand side is negative. This implies that

$$f(x^* + h) - f(x^*) < 0,$$

or in other words x* is a strict local maximizer of f.

Claim: If f is a differentiable concave function defined on a convex set U and x, y ∈ U, then

$$Df(x)(y - x) \leq 0 \implies f(y) \leq f(x).$$

Moreover, x* is a global maximizer of f if one of the following holds: (i) Df(x*)(y − x*) ≤ 0 for all y ∈ U; (ii) Df(x*) = 0.

Proof: As the function is differentiable, concavity is equivalent to

$$f(y) - f(x) \leq Df(x)(y - x).$$

If so, since by assumption Df(x)(y − x) ≤ 0, we must have that

$$f(y) - f(x) \leq 0.$$

It follows immediately that fulfilling (i) or (ii) implies that x* is a global maximizer of f.

Second order sufficient conditions for a global maximum (minimum) in R^n:

Let x* be a critical point of a twice differentiable function f(x) on R^n. Then x* is:
(1) a global maximizer (minimizer) for f if D^2 f(x) is negative (positive) semidefinite on R^n;
(2) a strict global maximizer (minimizer) for f if D^2 f(x) is negative (positive) definite on R^n.

Comment: The property that critical points of concave functions are global maximizers is an important one in economic theory. For example, many economic principles, such as the marginal rate of substitution equals the price ratio, or marginal revenue equals marginal cost, are simply the first order necessary conditions of the corresponding maximization problem. Ideally, one would like such conditions also to be sufficient for guaranteeing that utility or profit is being maximized. This will indeed be the case when the objective function is concave.

Unconstrained Optimization: Summary of Necessary and Sufficient Conditions

Consider a subset U ⊆ R^n, a vector x* ∈ U, and a twice continuously differentiable objective function f : U → R.

Recall that B(x | ε) denotes an open ball of radius ε around x, and let B̊(x | ε) = B(x | ε) \ {x} denote the corresponding punctured ball.

Consider the following conditions, where ≤ 0 and < 0 applied to the Hessian denote negative semidefiniteness and negative definiteness, respectively:

(A) Df(x*) = 0 & D^2 f(x*) ≤ 0.
(B) Df(x*) = 0, D^2 f(x*) ≤ 0, & D^2 f(x) < 0 for any x ∈ B̊(x* | ε) for some ε > 0.
(C) Df(x*) = 0 & D^2 f(x) ≤ 0 for any x ∈ U.
(D) Df(x*) = 0 & D^2 f(x) ≤ 0 for any x ∈ B(x* | ε) for some ε > 0.
(E) Df(x*) = 0, D^2 f(x) ≤ 0 for any x ∈ U & D^2 f(x) < 0 for any x ∈ B̊(x* | ε) for some ε > 0.

The following table summarizes necessary and sufficient conditions for a maximum:

                    Necessary    Sufficient
Strict Global Max       B            E
Global Max           A or D          C
Strict Local Max        B            B
Local Max            A or D          D

Constrained Optimization: A General Problem

We now analyze optimization problems in which constraints limit the choice of the variables:

$$\max_{x \in \mathbb{R}^n} f(x) \quad \text{subject to} \quad g_1(x) \leq 0, \ldots, g_k(x) \leq 0, \quad h_1(x) = 0, \ldots, h_m(x) = 0.$$

The function f is called the objective function.
The functions g are called inequality constraints.
The functions h are called equality constraints.

Example: The classical utility maximization problem:

$$\max_{x \in \mathbb{R}^n} U(x_1, x_2, \ldots, x_n) \quad \text{subject to} \quad p_1 x_1 + p_2 x_2 + \ldots + p_n x_n \leq m, \quad x_1 \geq 0, \ x_2 \geq 0, \ldots, x_n \geq 0,$$

where we can treat the latter constraints as −x_i ≤ 0.

Equality Constraints: Necessary Conditions

Consider a problem with two variables and one equality constraint,

$$\max_{x_1, x_2} f(x_1, x_2) \quad \text{subject to} \quad h(x_1, x_2) = c,$$

for instance the budget constraint p_1 x_1 + p_2 x_2 = m.

Geometrical Representation:
Draw the constraint on the (x_1, x_2) plane.
Draw representative samples of level curves of the objective function f.
The goal is to find the highest-value level curve of f which meets the constraint.
It cannot cross the constraint set; it therefore must be tangent to it.

To find the slope of the level set f(x_1, x_2) = k, use total differentiation:

$$\frac{\partial f(x_1, x_2)}{\partial x_1} dx_1 + \frac{\partial f(x_1, x_2)}{\partial x_2} dx_2 = 0 \implies \frac{dx_2}{dx_1} = -\frac{\partial f(x_1, x_2)}{\partial x_1} \Big/ \frac{\partial f(x_1, x_2)}{\partial x_2}.$$

The slope of the constraint similarly amounts to

$$\frac{dx_2}{dx_1} = -\frac{\partial h(x_1, x_2)}{\partial x_1} \Big/ \frac{\partial h(x_1, x_2)}{\partial x_2}.$$

Hence, tangency at the optimal x* requires

$$\frac{\partial f(x^*)/\partial x_1}{\partial f(x^*)/\partial x_2} = \frac{\partial h(x^*)/\partial x_1}{\partial h(x^*)/\partial x_2}.$$

Manipulate the previous equation and let λ be the common value satisfying

$$\lambda = \frac{\partial f(x^*)/\partial x_1}{\partial h(x^*)/\partial x_1} = \frac{\partial f(x^*)/\partial x_2}{\partial h(x^*)/\partial x_2}.$$

If so, it is possible to re-write these two equations as:

$$\frac{\partial f}{\partial x_1}(x^*) - \lambda \frac{\partial h}{\partial x_1}(x^*) = 0,$$
$$\frac{\partial f}{\partial x_2}(x^*) - \lambda \frac{\partial h}{\partial x_2}(x^*) = 0.$$

The solution of the maximization problem is thus defined by the following system of three equations in three unknowns:

$$\frac{\partial f}{\partial x_1}(x^*) - \lambda \frac{\partial h}{\partial x_1}(x^*) = 0,$$
$$\frac{\partial f}{\partial x_2}(x^*) - \lambda \frac{\partial h}{\partial x_2}(x^*) = 0,$$
$$h(x_1^*, x_2^*) - c = 0.$$
Alternatively, form the Lagrangian function associated with the maximization problem,

$$L(x_1, x_2, \lambda) = f(x_1, x_2) - \lambda (h(x_1, x_2) - c),$$

and find the critical points of L by solving

$$\frac{\partial L}{\partial x_1} = 0, \quad \frac{\partial L}{\partial x_2} = 0 \quad \text{and} \quad \frac{\partial L}{\partial \lambda} = 0,$$

and note that this results in the same system of equations defining the optimum. The variable λ is called the Lagrange multiplier. The Lagrange approach allows us to reduce a constrained optimization problem in two variables to an unconstrained problem in three variables.

Caveat: The procedure fails if ∂h(x*)/∂x_1 = ∂h(x*)/∂x_2 = 0. A constraint qualification must be added requiring x* not to be a critical point of h, or formally Dh(x*) ≠ 0. The procedure is not well defined when a critical point of the constraint belongs to the constraint set, since the slope of the constraint is not defined at that point.

Lagrange Theorem: Let f and h be continuously differentiable functions from R^n to R. Let x* maximize f(x) subject to h(x) = 0, and suppose that x* is not a critical point of h. Then there is a real number λ* such that (x*, λ*) is a critical point of the Lagrange function

$$L(x, \lambda) = f(x) - \lambda h(x).$$

Intuition: The Lagrangian amounts to the objective function diminished by a penalty (the Lagrange multiplier) for violating the constraint. As the constraint holds with equality when ∂L/∂λ = 0, no penalty is ever paid at the optimum. But the Lagrange multiplier characterizes the smallest fine for which it is not worthwhile to violate the constraint. The same analysis easily extends to the case of several equality constraints.

Example: Consider the following problem:

$$\max_{x_1, x_2} x_1 x_2 \quad \text{subject to} \quad x_1 + 4 x_2 = 16.$$

The constraint qualification is satisfied trivially, the Lagrangian amounts to

$$L(x_1, x_2, \lambda) = x_1 x_2 - \lambda (x_1 + 4 x_2 - 16),$$

and the first order conditions amount to

$$x_2 - \lambda = 0, \quad x_1 - 4\lambda = 0, \quad x_1 + 4 x_2 - 16 = 0.$$

The only solution requires x_1 = 8, x_2 = 2, λ = 2.
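This example is simple enough to verify symbolically; a sketch using sympy (illustrative only):

```python
import sympy as sp

x1, x2, lam = sp.symbols("x1 x2 lam", real=True)
L = x1 * x2 - lam * (x1 + 4 * x2 - 16)  # Lagrangian of the example above

foc = [sp.diff(L, v) for v in (x1, x2, lam)]
print(sp.solve(foc, [x1, x2, lam], dict=True))  # [{x1: 8, x2: 2, lam: 2}]
```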

Equality Constraints: Lagrange Theorem in General

Lagrange Theorem: Let f and h_i for all i ∈ {1, ..., m} be continuously differentiable functions from R^n to R. Let x* maximize f(x) subject to h(x) = 0, and suppose that the gradients Dh_1(x*), ..., Dh_m(x*) are linearly independent. Then there exists a vector λ = (λ_1, ..., λ_m) such that

$$\underbrace{Df(x^*)}_{1 \times n} = \underbrace{\lambda}_{1 \times m} \underbrace{Dh(x^*)}_{m \times n} = \sum_{i=1}^{m} \lambda_i Dh_i(x^*).$$

Proof: For convenience, consider the norm ‖h‖ = (h^T h)^{1/2}. For a, c > 0, define the following auxiliary function

$$f^a(x) = f(x) - \frac{a}{2} \|h(x)\|^2 - \frac{c}{2} \|x - x^*\|^2.$$

Consider the neighborhood B(x* | ε) = {x ∈ R^n | ε ≥ ‖x − x*‖}. Let x^a maximize f^a(x) subject to x ∈ B(x* | ε) (where x^a exists by the extreme value theorem). For any a > 0, by optimality of x^a, for all x ∈ B(x* | ε),

$$f^a(x^a) \geq f^a(x^*) = f(x^*). \tag{2}$$

As f(x) is bounded on B(x* | ε) by continuity, we must have that

$$\lim_{a \to \infty} \|h(x^a)\|^2 = 0,$$

or else lim_{a→∞} f^a(x^a) = −∞. Thus, if lim_{a→∞} x^a = x̄, we must have that ‖h(x̄)‖^2 = 0. Taking limits of equation (2) implies that

$$\lim_{a \to \infty} f^a(x^a) = f(\bar{x}) - \frac{c}{2}\|\bar{x} - x^*\|^2 \geq f(x^*).$$

But if so, x̄ = x*, given that f(x̄) ≤ f(x*) as x* maximizes f(x) subject to h(x) = 0.

Necessary conditions for the optimality of x^a require

$$\underbrace{Df^a(x^a)}_{1 \times n} = \underbrace{Df(x^a)}_{1 \times n} - a \underbrace{h(x^a)^T}_{1 \times m} \underbrace{Dh(x^a)}_{m \times n} - c \underbrace{(x^a - x^*)^T}_{1 \times n} = 0. \tag{3}$$

As Dh(x*) has linearly independent rows, Dh(x) must have linearly independent rows for any x ∈ B(x* | ε) when ε is sufficiently small, since h is continuously differentiable. But if so, (Dh(x) Dh(x)^T)^{-1} exists for all x ∈ B(x* | ε). Then, post-multiplying equation (3) by Dh(x^a)^T (Dh(x^a) Dh(x^a)^T)^{-1} implies that

$$\left( Df(x^a) - c(x^a - x^*)^T \right) Dh(x^a)^T \left( Dh(x^a) Dh(x^a)^T \right)^{-1} = a\, h(x^a)^T.$$

Taking limits as a diverges implies that

$$\lim_{a \to \infty} a\, h(x^a)^T = Df(x^*) Dh(x^*)^T \left( Dh(x^*) Dh(x^*)^T \right)^{-1} \equiv \lambda.$$

Taking limits of equation (3) then delivers the desired result, since

$$Df(x^*) = \lambda\, Dh(x^*).$$

Inequality Constraints: Introduction

With equality constraints, the solution to a maximization problem satisfies

$$Df(x^*) = \lambda Dh(x^*),$$

and no further restriction is necessary on λ.

Now consider a problem with two variables and one inequality constraint,

$$\max_{x_1, x_2} f(x_1, x_2) \quad \text{subject to} \quad g(x_1, x_2) \leq 0.$$

Graphical Representation:
Either the solution is on the boundary of the constraint set.
If so, the constraint is binding and there is a tangency at the solution.
Or the solution is in the interior of the constraint set.
If so, the constraint is slack and the solution is unaffected.

When the constraint binds, g(x_1, x_2) = 0, the problem is similar to one with an equality constraint. However, even when the constraint binds, a restriction has to be imposed on the sign of the Lagrange multiplier λ. The gradients still satisfy

$$Df(x^*) = \lambda Dg(x^*).$$

But now the sign of λ is important. In fact, the gradients must point in the same direction, or else f could be increased while still satisfying the constraint. This means that λ ≥ 0. This is the main difference between inequality and equality constraints.

To solve such a problem, we still construct the Lagrangian

$$L(x_1, x_2, \lambda) = f(x_1, x_2) - \lambda g(x_1, x_2),$$

and then find the critical points of L by setting

$$\frac{\partial L}{\partial x_i} = \frac{\partial f(x_1, x_2)}{\partial x_i} - \lambda \frac{\partial g(x_1, x_2)}{\partial x_i} = 0 \text{ for all } i \in \{1, 2\}.$$

But what about ∂L/∂λ?
(1) If the maximum x* is achieved when g(x*) = 0, then the problem is solved as for an equality constraint and ∂L/∂λ = 0.
(2) If the maximum x* is achieved when g(x*) < 0, the constraint is not binding and the optimal solution is interior. If so, the point x* is an unconstrained local maximum and satisfies

$$\frac{\partial f}{\partial x_1}(x^*) = \frac{\partial f}{\partial x_2}(x^*) = 0,$$

and we can therefore still use the Lagrangian, provided that we set λ = 0.

In other words, either the constraint is binding and g(x_1, x_2) = 0, or it does not bind and λ = 0, as a slack constraint cannot affect the solution to a maximization problem. In short, the following complementary slackness condition has to hold:

$$\lambda\, g(x_1, x_2) = 0.$$

Inequality Constraints: Examples

Recall the constrained optimization problem with two variables and one inequality constraint. Let f and g be continuously differentiable functions, and let x* = (x_1^*, x_2^*) be a solution to max f(x) subject to g(x) ≤ 0. If g(x*) = 0, suppose that x* is not a critical point of g. Then there is a real number λ* such that

$$D_x L(x^*, \lambda^*) = 0, \quad \lambda^* D_\lambda L(x^*, \lambda^*) = 0, \quad \lambda^* \geq 0 \quad \text{and} \quad g(x^*) \leq 0,$$

where the Lagrangian amounts to L(x, λ) = f(x) − λ g(x).

Example, One Constraint: Consider a profit-maximizing firm that produces output y from input x according to the production function y = x^{0.5}. Let the price of output be 2, and that of the input be 1. The firm cannot buy more than a > 0 units of input. This firm's maximization problem amounts to

$$\max_x 2 x^{0.5} - x \quad \text{subject to} \quad x - a \leq 0.$$

The Lagrangian associated with this problem is

$$L(x, \lambda) = 2 x^{0.5} - x - \lambda [x - a].$$

First order necessary conditions for this problem require

$$x^{-0.5} - 1 = \lambda, \quad \lambda(x - a) = 0, \quad \lambda \geq 0 \quad \text{and} \quad x - a \leq 0.$$

To solve the system of equations, consider two cases:

(1) If λ > 0, the constraint must bind. Thus, x = a and

$$\lambda = f'(a) = a^{-0.5} - 1.$$

For this solution to be viable, λ > 0, which amounts to

$$\lambda = a^{-0.5} - 1 > 0 \iff a < 1.$$

(2) If λ = 0, the constraint must be slack. From the first order condition,

$$x^{-0.5} - 1 = 0 \iff x = 1.$$

The solution thus requires x = 1 and λ = 0, which is viable only if a ≥ 1.
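A numerical sanity check of the two cases (sketch, assuming scipy is available; the solver minimizes, so the objective is negated):

```python
from scipy.optimize import minimize_scalar

def solve(a: float) -> float:
    # Maximize 2*sqrt(x) - x on the feasible interval [0, a]
    res = minimize_scalar(lambda x: -(2 * x**0.5 - x), bounds=(0.0, a), method="bounded")
    return res.x

print(solve(0.25))  # constraint binds: optimum at x = a = 0.25
print(solve(4.0))   # constraint slack: optimum near x = 1
```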

Inequality Constraints: Several Constraints

The generalization to several inequality constraints is straightforward. However, now a subset of constraints may be binding at the optimum.

Example, More Constraints: Consider the following maximization problem:

$$\max x_1 x_2 x_3 \quad \text{subject to} \quad x_1 + x_2 + x_3 \leq 3, \quad x_1 \geq 0, \quad x_2 \geq 0 \quad \text{and} \quad x_3 \geq 0.$$

The Lagrangian associated with this problem satisfies:

$$L(x, \lambda) = x_1 x_2 x_3 + \lambda_1 x_1 + \lambda_2 x_2 + \lambda_3 x_3 - \lambda_4 (x_1 + x_2 + x_3 - 3).$$

Before solving the Lagrangian, observe that at the relevant critical points λ_1 = λ_2 = λ_3 = 0. In fact, if λ_i > 0, complementary slackness would require x_i = 0 and thus x_1 x_2 x_3 = 0. Obviously we could improve, for instance by setting x_i = 1/2 for all i.

What remains to be established is whether λ_4 > 0 or λ_4 = 0 at the optimum. But λ_4 > 0, since if the constraint were slack,

$$x_1 + x_2 + x_3 < 3,$$

it would be possible to meet the constraint while increasing the value of the function x_1 x_2 x_3 by increasing any one of the three variables.

Thus, from the first order conditions we obtain that

$$x_1 x_2 = x_2 x_3 = x_1 x_3 = \lambda_4.$$

It follows that λ_4 = x_1 = x_2 = x_3 = 1 at the solution.

Inequality Constraints: Sufficient Conditions

Consider the following general constrained optimization problem,

$$\max_{x \in \mathbb{R}^n} f(x) \quad \text{subject to} \quad g(x) \leq 0,$$

where g consists of m constraints, g(x) = (g_1(x), ..., g_m(x))^T. For this problem we have characterized necessary conditions for a maximum. These conditions state that any solution x* of the constrained optimization problem must also be a critical point of the Lagrangian,

$$L(x, \lambda) = f(x) - \lambda g(x),$$

where λ = (λ_1, ..., λ_m) denotes the vector of multipliers.

Problem: We would also like to know:

whether any critical point of the Lagrangian is a maximizer of the Lagrangian;

whether any maximizer of the Lagrangian solves the constrained optimization problem.

Solution: To answer the latter, let (x*, λ*) satisfy the necessary conditions for a maximum:

$$\frac{\partial L(x^*, \lambda^*)}{\partial x_i} = \frac{\partial f(x^*)}{\partial x_i} - \lambda^* \frac{\partial g(x^*)}{\partial x_i} = 0 \text{ for any } i \in \{1, \ldots, n\},$$
$$\lambda_j^* \frac{\partial L(x^*, \lambda^*)}{\partial \lambda_j} = 0, \quad \lambda_j^* \geq 0 \quad \text{and} \quad g_j(x^*) \leq 0 \text{ for any } j \in \{1, \ldots, m\}.$$

Claim: If x* maximizes the Lagrangian, it also maximizes f subject to g(x) ≤ 0.

Proof: To see this, note that by complementary slackness λ* g(x*) = 0, and thus

$$L(x^*, \lambda^*) = f(x^*) - \lambda^* g(x^*) = f(x^*).$$

By λ* ≥ 0 and g(x) ≤ 0 for all other x, we get that

$$L(x, \lambda^*) = f(x) - \lambda^* g(x) \geq f(x).$$

Moreover, since x* maximizes the Lagrangian, for all x,

$$L(x^*, \lambda^*) = f(x^*) - \lambda^* g(x^*) \geq f(x) - \lambda^* g(x) = L(x, \lambda^*),$$

which implies that

$$f(x^*) \geq f(x).$$

Thus, if x* maximizes the Lagrangian, it also maximizes f(x) subject to g(x) ≤ 0.

To conclude the argument, recall the main result from unconstrained optimization: if f is a concave function defined on a convex subset of R^n and x_0 is a point in the interior at which Df(x_0) = 0, then f(x) ≤ f(x_0) for all x. You have shown in class that, in the constrained optimization problem, if f is concave and g is convex, then the Lagrangian function is also concave, which means that first order conditions are sufficient in this case.

Inequality Constraints: Karush-Kuhn-Tucker Theorem

Again, consider the general constrained optimization problem

$$\max_{x \in \mathbb{R}^n} f(x) \quad \text{subject to} \quad h(x) = 0 \quad \text{and} \quad g(x) \leq 0.$$

We need to impose additional constraint qualifications to ensure that the problem is not degenerate. These can be stated in many ways; two equivalent requirements appear below. Denote by g_E = (g_1, ..., g_e)^T the vector of binding constraints (namely, those for which (λ_1, ..., λ_e) > 0), while letting g_{e+1} to g_k denote the slack ones. Posit that the multipliers associated with all equality constraints differ from 0 (or else, drop those for which this is not the case).

KKT Constraint Qualifications: Consider one of the following two restrictions:

Slater Condition: there exists a point x ∈ R^n such that g(x) < 0 and h(x) = 0, and h is affine.

Linear Independence: the gradients of the binding inequality constraints, D_x g_E(x*), and the gradients of the equality constraints, D_x h(x*), are linearly independent at the solution x*.

Recall that linear independence amounts to requiring

$$w^T \begin{pmatrix} D_x g_E(x^*) \\ D_x h(x^*) \end{pmatrix} \neq 0 \quad \text{for any } w \in \mathbb{R}^{e+m} \setminus \{0\}.$$

KKT Theorem: Let f, h and g be differentiable. Let f be concave, h_j be convex for all j ∈ {1, ..., k}, g_j be convex for all j ∈ {1, ..., m}, and let a constraint qualification hold. Then x* solves the constrained optimization problem if and only if (x*, λ, μ) solves

$$Df(x^*) - \mu Dh(x^*) - \lambda Dg(x^*) = 0,$$
$$h(x^*) = 0, \quad \lambda g(x^*) = 0, \quad \lambda \geq 0 \quad \text{and} \quad g(x^*) \leq 0.$$

Inequality Constraints: Local Sufficient Conditions

Concavity of the objective function and convexity of the constraints imply that any critical point of the Lagrangian is a global solution of the constrained optimization problem. If such conditions are not satisfied, we can still rely on sufficient conditions for local maxima.

Mechanically, one can solve constrained optimization problems in the following way:

(1) Form the Lagrangian L(x, λ) = f(x) − λ g(x), and find a solution to the KKT necessary conditions:

$$D_x L(x^*, \lambda^*) = 0, \quad \lambda^* D_\lambda L(x^*, \lambda^*)^T = 0, \quad \lambda^* \geq 0 \quad \text{and} \quad g(x^*) \leq 0.$$

(2) Let g_1 to g_e denote the binding constraints and g_{e+1} to g_k the slack ones. Define g_E = (g_1, ..., g_e)^T.

(3) Observe that the set {v ∈ R^n | D_x g_E(x*) v = 0} identifies the hyperplane tangent to the binding constraint set at the critical point x*. In particular, the hyperplane tangent to constraint i ∈ {1, ..., e} at x* is {x ∈ R^n | D_x g_i(x*)(x − x*) = 0}, which we normalize to {v ∈ R^n | D_x g_i(x*) v = 0}. Each of these hyperplanes can be interpreted as a linear map from R^{n−1} into R. Now, {v ∈ R^n | D_x g_E(x*) v = 0} denotes the intersection of these hyperplanes, which can be interpreted as a linear map from R^{n−e} into R. Note that if n = e, we are looking at a point, as there are no free parameters.

(4) For local second order conditions to hold, we need the Lagrangian to be negative definite for any non-degenerate point in a neighborhood of x* on such a linear subspace tangent to the constraint set. Formally, this requires the Hessian of the Lagrangian with respect to x at the critical point (x*, λ*) to be negative definite on the hyperplane {v ∈ R^n | D_x g_E(x*) v = 0}. That is, we require one of the following two conditions to hold:

(i) D_x g_E(x*) v = 0 ⟹ v^T (D_x^2 L(x*, λ*)) v < 0, for all v ∈ R^n \ {0};

(ii) D_x g_E(x*)(x − x*) = 0 ⟹ (x − x*)^T (D_x^2 L(x*, λ*))(x − x*) < 0, for all x ∈ R^n \ {x*}.

If so, x* is a strict local maximum of f on the constraint set.

(5) A simple way to check this condition is to form the bordered Hessian,

$$D^2 L(x^*, \lambda^*) = \begin{pmatrix} D_\lambda^2 L(x^*, \lambda^*) & D_\lambda D_x L(x^*, \lambda^*) \\ D_\lambda D_x L(x^*, \lambda^*)^T & D_x^2 L(x^*, \lambda^*) \end{pmatrix} = \begin{pmatrix} 0 & -D_x g_E(x^*) \\ -D_x g_E(x^*)^T & D_x^2 L(x^*, \lambda^*) \end{pmatrix}.$$

(6) If the largest n − e leading principal minors of D^2 L(x*, λ*) alternate in sign, with the sign of the largest leading principal minor the same as the sign of (−1)^n, then sufficient second order conditions hold for the critical point to be a local maximum of the constrained maximization problem.

Remark: If the largest n − e leading principal minors of D^2 L(x*, λ*) have the same sign as (−1)^e, then sufficient second order conditions hold for the critical point to be a local minimum of the constrained maximization problem.

Inequality Constraints: Arrow-Enthoven Theorem

The KKT Theorem applies to concave objective functions and convex constraints. Many economic objective functions are, however, only quasi-concave, and many constraints are only quasi-convex. The following result generalizes the KKT Theorem to these settings. For the sake of brevity, we state the result only for inequality constraints; the generalization is straightforward.

AE Theorem: Let f and g be differentiable. Let f be quasi-concave and g_j be quasi-convex for all j. Then any solution (x*, λ*) to the KKT necessary conditions,

$$D_x L(x^*, \lambda^*) = 0, \quad \lambda^* D_\lambda L(x^*, \lambda^*)^T = 0, \quad \lambda^* \geq 0 \quad \text{and} \quad g(x^*) \leq 0,$$

solves the constrained optimization problem, provided that D_x f(x*) ≠ 0 and that f(x) is twice differentiable in the neighborhood of x*.

Numerous alternative regularity conditions can replace the latter requirement, which only deals with slopes being well defined.

Section 3: Comparative Statics and Fixed Points
Value Functions and Envelope Theorem

The value function is the objective function of a constrained optimization problem evaluated at the solution. Profit functions and indirect utility functions are examples of maximum value functions, whereas cost functions and expenditure functions are minimum value functions.

Consider a general constrained optimization problem:

$$\max_{x \in \mathbb{R}^n} f(x \mid b) \quad \text{subject to} \quad g(x \mid b) \leq 0,$$

where g(x | b) = (g_1(x | b), ..., g_m(x | b))^T denotes the vector of constraints and b a parameter. For this constrained optimization problem:

let x(b) = (x_1(b), ..., x_n(b))^T denote the optimal solution;

let λ(b) = (λ_1(b), ..., λ_m(b)) denote the corresponding Lagrange multipliers;

let L(x, λ | b) denote the Lagrangian of the constrained optimization problem;

omit any slack constraints, so that λ(b) >> 0.

Definition: The maximum value function of the problem amounts to

$$v(b) = f(x(b) \mid b).$$

The Envelope Theorem relates changes in parameters to changes in the value function.

Envelope Theorem: Suppose that (x(b), λ(b)) are differentiable functions and that x(b) satisfies the constraint qualification. If so,

$$\frac{dv(b)}{db} = \frac{\partial L(x(b), \lambda(b) \mid b)}{\partial b}.$$

A General Proof of the Envelope Theorem

Envelope Theorem: Suppose that (x(b), λ(b)) are differentiable functions and that x(b) satisfies the constraint qualification. If so,

$$\frac{dv(b)}{db} = \frac{\partial L(x(b), \lambda(b) \mid b)}{\partial b}.$$

Proof: Recall that the first order conditions for the constrained optimization problem require that

$$\underbrace{D_x f(x(b) \mid b)}_{1 \times n} = \underbrace{\lambda(b)}_{1 \times m} \underbrace{D_x g(x(b) \mid b)}_{m \times n}.$$

Observe that, as all the constraints bind, total differentiation implies that

$$\underbrace{D_x g(x(b) \mid b)}_{m \times n} \underbrace{D_b x(b)}_{n \times 1} + \underbrace{D_b g(x(b) \mid b)}_{m \times 1} = 0.$$

Finally, note that by partially differentiating the Lagrangian we get

$$\underbrace{D_b L(x(b), \lambda(b) \mid b)}_{1 \times 1} = \underbrace{D_b f(x(b) \mid b)}_{1 \times 1} - \underbrace{\lambda(b)}_{1 \times m} \underbrace{D_b g(x(b) \mid b)}_{m \times 1}.$$

To prove the result observe that

$$\frac{dv(b)}{db} = \frac{df(x(b) \mid b)}{db} = D_b f(x(b) \mid b) + D_x f(x(b) \mid b) D_b x(b) = D_b f(x(b) \mid b) + \lambda(b) D_x g(x(b) \mid b) D_b x(b) = D_b f(x(b) \mid b) - \lambda(b) D_b g(x(b) \mid b) = \frac{\partial L(x(b), \lambda(b) \mid b)}{\partial b},$$

where the first equality is definitional, the second holds by simple differentiation, the third holds by the first order conditions, the fourth by totally differentiating the constraints, and the fifth amounts to the partial derivative of the Lagrangian.

The Meaning of the Lagrange Multiplier

Consider a simple problem with n variables and 1 equality constraint,

$$\max_{x \in \mathbb{R}^n} f(x) \quad \text{subject to} \quad h(x) = b.$$

The Lagrangian of this problem satisfies

$$L(x, \lambda \mid b) = f(x) - \lambda (h(x) - b).$$

For any value b, by the first order conditions the solution satisfies:

$$D_x f(x(b)) - \lambda(b) D_x h(x(b)) = 0.$$

Furthermore, since h(x(b)) = b for all b:

$$D_x h(x(b)) D_b x(b) = 1.$$

Therefore, using the chain rule we obtain the desired result,

$$\frac{df(x(b))}{db} = D_x f(x(b)) D_b x(b) = \lambda(b) D_x h(x(b)) D_b x(b) = \lambda(b).$$

Interpretation: This observation allows us to attach to the multiplier the economic interpretation of a shadow price. For instance, in a model in which a firm maximizes profits subject to some resource constraint, the multiplier identifies how valuable an additional unit of input would be to the firm's profits, or how much the maximum value of the firm changes when the constraint is relaxed. In other words, it is the maximum amount the firm would be willing to pay to acquire another unit of input. This immediately follows as

$$\frac{d}{db} f(x(b)) = \lambda(b) = \frac{\partial}{\partial b} L(x(b), \lambda(b) \mid b).$$

Envelope Theorem Intuition

Consider the implications of the envelope theorem for a scalar parameter b in an unconstrained problem,

$$\max_{x \in \mathbb{R}^n} f(x \mid b).$$

Let x(b) denote the solution of this problem, and assume it to be continuous and differentiable in b.

In this setup the envelope theorem simplifies to

$$\frac{df(x(b) \mid b)}{db} = \frac{\partial f(x(b) \mid b)}{\partial x} \frac{\partial x(b)}{\partial b} + \frac{\partial f(x(b) \mid b)}{\partial b} = \frac{\partial f(x(b) \mid b)}{\partial b},$$

since by the first order conditions

$$\frac{\partial f(x(b) \mid b)}{\partial x} = 0.$$

Intuition: When we are already at a maximum, changing the parameters of the problem or the constraints slightly does not affect the value through changes in the solution x(b), because the first order conditions hold. Thus, only the partial derivative matters. If there are multiple solutions, you can still use the envelope theorem, but you have to make sure that you do not jump to a different solution in a discrete manner.

Implicit Function Theorem and Comparative Statics

Many relevant economic problems study how an equilibrium or a solution to an optimization problem is affected by changes in the exogenous variables. The implicit function theorem (IFT) is the prime tool for such comparisons.

The IFT provides sufficient conditions for a set of simultaneous equations

$$F_i(x \mid b) = F_i(x_1, \ldots, x_n \mid b_1, \ldots, b_m) = 0 \text{ for all } i = 1, \ldots, n, \quad \text{or equivalently} \quad F(x \mid b) = 0,$$

to have a solution defined by implicit functions

$$x_i = x_i(b) = x_i(b_1, \ldots, b_m) \text{ for all } i = 1, \ldots, n, \quad \text{or equivalently} \quad x = x(b).$$

In other words, the IFT disciplines the system of equations so as to ensure that the n equations defining the system can be solved for the n variables x_1, ..., x_n, even when an explicit form solution cannot be obtained.

Implicit Function Theorem: Consider the system of equations F(x | b) = 0. Let every function F_i have continuous partial derivatives with respect to all x and b variables. Consider a point x* that solves the system of equations at parameter values b*, and at which the determinant of the n × n Jacobian with respect to the x variables is not zero,

$$|D_x F(x^* \mid b^*)| = \left| \frac{\partial F(x^* \mid b^*)}{\partial x} \right| = \begin{vmatrix} \frac{\partial F_1}{\partial x_1} & \frac{\partial F_1}{\partial x_2} & \cdots & \frac{\partial F_1}{\partial x_n} \\ \frac{\partial F_2}{\partial x_1} & \frac{\partial F_2}{\partial x_2} & \cdots & \frac{\partial F_2}{\partial x_n} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial F_n}{\partial x_1} & \frac{\partial F_n}{\partial x_2} & \cdots & \frac{\partial F_n}{\partial x_n} \end{vmatrix} \neq 0.$$

If so, there exists an m-dimensional neighborhood of b* in which the variables x are functions of b. For every vector b in the neighborhood, F(x(b) | b) = 0, thereby giving to the system of equations the status of a set of identities in this neighborhood. Moreover, the implicit functions x(b) are continuous and have continuous partial derivatives with respect to all the variables b.

Thus, the IFT allows us to find the partial derivatives of the implicit functions x(b) without solving for the x variables. To do so, since in the neighborhood of the solution the equations have the status of identities, we can set the total differential to zero, dF(x(b) | b)/db = 0. For instance, when considering only db_i ≠ 0 and setting db_j = 0 for j ≠ i, the result in matrix notation implies that

$$\frac{\partial F(x(b) \mid b)}{\partial x} \frac{\partial x(b)}{\partial b_i} + \frac{\partial F(x(b) \mid b)}{\partial b_i} = 0 \implies \frac{\partial x(b)}{\partial b_i} = - \left( \frac{\partial F(x(b) \mid b)}{\partial x} \right)^{-1} \frac{\partial F(x(b) \mid b)}{\partial b_i}.$$

Since |D_x F(x | b)| ≠ 0, a unique nontrivial solution exists to this linear system, which by Cramer's rule satisfies

$$\frac{\partial x_j(b)}{\partial b_i} = -\frac{|D_x F(x \mid b)_j|}{|D_x F(x \mid b)|},$$

where D_x F(x | b)_j is the matrix formed by replacing the jth column of D_x F(x | b) with the vector D_{b_i} F(x | b).

Example: Implicit Function Theorem

For n = 2 and m = 1, consider the system of equations

$$F(x \mid b) = \begin{pmatrix} x_1 - e^{x_2} \\ x_1^2 + x_2^2 - b \end{pmatrix} = 0.$$

Restrict attention to the case in which x ≥ 0. When x ≠ 0, we have that

$$|D_x F(x \mid b)| = \begin{vmatrix} 1 & -e^{x_2} \\ 2x_1 & 2x_2 \end{vmatrix} = 2x_2 + 2x_1 e^{x_2} > 0.$$

Thus, when x ≠ 0, the IFT applies and there exist implicit functions such that

$$F(x(b) \mid b) = \begin{pmatrix} x_1(b) - e^{x_2(b)} \\ x_1(b)^2 + x_2(b)^2 - b \end{pmatrix} = 0.$$

By totally differentiating the system, we find

$$\frac{dF(x(b) \mid b)}{db} = \begin{pmatrix} x_1'(b) - x_2'(b)\, e^{x_2(b)} \\ 2 x_1'(b) x_1(b) + 2 x_2'(b) x_2(b) - 1 \end{pmatrix} = 0.$$

Consequently, by solving the linear system, we find that

$$\frac{\partial x(b)}{\partial b} = - \left( \frac{\partial F(x(b) \mid b)}{\partial x} \right)^{-1} \frac{\partial F(x(b) \mid b)}{\partial b} = \begin{pmatrix} e^{x_2} / (2x_2 + 2x_1 e^{x_2}) \\ 1 / (2x_2 + 2x_1 e^{x_2}) \end{pmatrix}.$$
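A symbolic check of these implicit derivatives (sketch, assuming sympy is available):

```python
import sympy as sp

x1, x2, b = sp.symbols("x1 x2 b", positive=True)
F = sp.Matrix([x1 - sp.exp(x2), x1**2 + x2**2 - b])

J = F.jacobian([x1, x2])          # Jacobian with respect to the endogenous variables
dx_db = -J.inv() * F.diff(b)      # implicit derivatives dx/db
print(sp.simplify(dx_db))         # [exp(x2)/(2*x2 + 2*x1*exp(x2)), 1/(2*x2 + 2*x1*exp(x2))]
```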

Implicit Function Theorem and Optimization

The previous argument was designed to hold for arbitrary systems of equations. Optimization problems, however, have a unique feature: the condition |D_x F(x | b)| ≠ 0 always holds. This is the case since in an optimization problem this matrix simply coincides with the Hessian of the Lagrangian (that is, the bordered Hessian).

This means that we can indeed take the first order conditions, or the set of equilibrium conditions, totally differentiate them, and find how the endogenous variables change with the exogenous ones in the neighborhood of the solution.

An Example: Consider a simple optimization problem with two variables and one equality constraint,

$$\max_{x \in \mathbb{R}^2} f(x) \quad \text{subject to} \quad g(x) \leq b.$$

First order conditions require

$$F^1(\lambda, x \mid b) = -g(x) + b = 0,$$
$$F^2(\lambda, x \mid b) = f_1(x) - \lambda g_1(x) = 0,$$
$$F^3(\lambda, x \mid b) = f_2(x) - \lambda g_2(x) = 0.$$

We need to verify that the Jacobian of the system is different from zero,

$$|D_{\lambda, x} F(\lambda, x \mid b)| = \begin{vmatrix} \frac{\partial F^1}{\partial \lambda} & \frac{\partial F^1}{\partial x_1} & \frac{\partial F^1}{\partial x_2} \\ \frac{\partial F^2}{\partial \lambda} & \frac{\partial F^2}{\partial x_1} & \frac{\partial F^2}{\partial x_2} \\ \frac{\partial F^3}{\partial \lambda} & \frac{\partial F^3}{\partial x_1} & \frac{\partial F^3}{\partial x_2} \end{vmatrix} = \begin{vmatrix} 0 & -g_1 & -g_2 \\ -g_1 & f_{11} - \lambda g_{11} & f_{21} - \lambda g_{21} \\ -g_2 & f_{12} - \lambda g_{12} & f_{22} - \lambda g_{22} \end{vmatrix} \neq 0.$$

But the determinant of such a matrix is that of the bordered Hessian of the problem.

Thus, whenever second order conditions are satisfied:

(1) the determinant of the bordered Hessian must be different from zero, and
(2) we can totally differentiate the equations to find how the solution changes with b,

$$g_1 \frac{\partial x_1}{\partial b} + g_2 \frac{\partial x_2}{\partial b} - 1 = 0,$$
$$(f_{11} - \lambda g_{11}) \frac{\partial x_1}{\partial b} + (f_{12} - \lambda g_{12}) \frac{\partial x_2}{\partial b} - g_1 \frac{\partial \lambda}{\partial b} = 0,$$
$$(f_{21} - \lambda g_{21}) \frac{\partial x_1}{\partial b} + (f_{22} - \lambda g_{22}) \frac{\partial x_2}{\partial b} - g_2 \frac{\partial \lambda}{\partial b} = 0.$$

At the equilibrium solution, one can then solve for ∂x_1/∂b, ∂x_2/∂b, and ∂λ/∂b.

Correspondences

As equilibria in economics are generally defined as fixed points of some system of equations, proving equilibrium existence essentially relies on classical fixed point theorems. This is true in game theory and in general equilibrium theory. We state these results more generally for correspondences (and not just for functions), since the solutions of the optimization problems faced by players may not be unique.

Definition: A correspondence from a set X to a set Y, denoted h : X ⇉ Y, is a function from X to P(Y) = {A | A ⊆ Y}, the set of all subsets of Y.

Remark: A correspondence h is a set-valued function, as h(x) ⊆ Y for every x ∈ X.

Examples: For instance, the following set-valued functions are correspondences:

$$h(x) = (-2, 1);$$

$$h(x) = [x/2 + 10, x/2 + 11];$$

$$h(x) = \begin{cases} \{0\} & \text{if } x \in [0, 1) \\ [0, 1] & \text{if } x = 1 \end{cases}.$$

Properties of Correspondences

Definition: A correspondence h : X ⇉ Y is:

non-empty if h(x) ≠ ∅ for all x ∈ X;

convex-valued if y ∈ h(x) and y' ∈ h(x) imply that

$$p y + (1 - p) y' \in h(x) \text{ for any } p \in [0, 1];$$

upper-hemicontinuous if for any sequence $\{y^k, x^k\}_{k=1}^{\infty}$ such that y^k ∈ h(x^k) for all k,

$$x^k \to x \text{ and } y^k \to y \text{ imply that } y \in h(x).$$

Remark: A correspondence h is upper-hemicontinuous if and only if its graph, Gr(h) = {(y, x) ∈ Y × X | y ∈ h(x)}, is a closed set.

Remark: Any function is a convex-valued correspondence. Any continuous function is an upper-hemicontinuous correspondence.

Fixed Point Theorem

Definition: A fixed point of a correspondence h : X ⇉ X is a point x ∈ X such that x ∈ h(x).

Kakutani Fixed Point Theorem: Let X be a non-empty, compact and convex subset of R^n. Let h : X ⇉ X be a non-empty, convex-valued, upper-hemicontinuous correspondence. Then h has a fixed point.

45
Intuition: Fixed Point Theorem

Proof Intuition in One Dimension: Let X = [0, 1], which is non-empty, compact and convex.
If 0 ∈ h(0) or if 1 ∈ h(1), we have a fixed point. If not, y > 0 for all y ∈ h(0) and y < 1 for all
y ∈ h(1). If so, take the smallest value x̄ such that y ≤ x̄ for some y ∈ h(x̄). By UHC, there
exists y′ ∈ h(x̄) such that y′ ≥ x̄. To show the latter, consider any sequence {(y^k, x^k)}_{k=1}^∞
such that y^k ∈ h(x^k) and x^k < x̄ for all k, with x^k → x̄ and y^k → y′. As y^k > x^k for all k
(by the definition of x̄), we have that

y′ = lim_{k→∞} y^k ≥ lim_{k→∞} x^k = x̄,

and by UHC we further have that y′ ∈ h(x̄). But then, since by the definition of x̄ there is also
some y ∈ h(x̄) with y ≤ x̄, and h is convex-valued, we must have that x̄ ∈ h(x̄).

Two-Dimensional Plot: Consider a two-dimensional correspondence h(x) ⊆ R². Suppose
that each component depends only on the other coordinate, hi(x) = hi(xj) ⊆ R for j ≠ i. If so,
we can plot both components on the same axes and read the fixed points off the intersections
of their graphs.

More on Continuity and Correspondences

Definition: A correspondence h : X ⇉ Y is:

lower-hemicontinuous if for any (y, x) ∈ Gr(h) and for any sequence {x^k}_{k=1}^∞ such that
x^k → x, there exists a sequence {y^k}_{k=1}^∞ such that

y^k ∈ h(x^k) for all k and y^k → y;

continuous if it is both lower-hemicontinuous and upper-hemicontinuous;

compact-valued if h(x) is compact for all x ∈ X.

More on Continuity and Theorem of the Maximum

Theorem of the Maximum: Let X be a compact subset of Rn, and let B be a subset of
Rm. Let f : X × B → R be a continuous function, and let g : B ⇉ X be a continuous
correspondence. Let x* : B ⇉ X be the correspondence defined by

x*(b) = arg max_{x ∈ Rn} f(x|b) subject to x ∈ g(b).

If so, the following must hold:

(1) x* is upper-hemicontinuous;
(2) if g is non-empty and compact-valued, so is x*.

Remark: The theorem plays a key role when invoking fixed point theorems, as it delivers UHC
and non-empty compact values, but not convex values.

Remark: If one further assumes that:

f(x|b) is quasi-concave in x for all b;

g(b) is convex-valued for all b;

then x* is convex-valued.
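Remark (numerical sketch): the snippet below is an added illustration with an assumed objective f(x|b) = −(x − b)² and a constant constraint set g(b) = [0, 1], both chosen only for illustration. Discretizing the constraint set and collecting all maximizers for several values of b shows the argmax correspondence x*(b) and the value function moving continuously with the parameter, as the theorem predicts.

# A minimal sketch: f(x|b) = -(x - b)^2 on the fixed constraint set g(b) = [0, 1].
# Both f and g satisfy the hypotheses of the Theorem of the Maximum, so x*(b) and
# the value function should vary continuously with b.
import numpy as np

xs = np.linspace(0.0, 1.0, 2001)           # discretised constraint set g(b) = [0, 1]

def argmax_and_value(b):
    vals = -(xs - b) ** 2                  # f(x|b) evaluated on the grid
    best = vals.max()
    # keep every grid point attaining the maximum (x*(b) may be set-valued)
    maximisers = xs[np.isclose(vals, best)]
    return maximisers, best

for b in [-0.5, 0.0, 0.3, 1.0, 1.5]:
    m, v = argmax_and_value(b)
    print(f"b = {b:5.2f}: x*(b) ~ [{m.min():.3f}, {m.max():.3f}], value = {v:.3f}")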

Extra: Dynamic Consumption-Savings Problem

A player starts life with wealth a0 > 0. Each period t = 0, 1, 2, ..., he decides how much to
consume, c_t ≥ 0. Consumption is chosen to maximize his lifetime utility,

∑_{t=0}^∞ β^t u(c_t),

where β ∈ [0, 1) is the discount factor. Further assume that lim_{c→0} u′(c) = +∞, so that c_t > 0
in each period. Wealth at the beginning of period t is denoted by a_t. Wealth a_t is invested at a
constant interest rate r. Wealth evolves according to the law of motion

a_{t+1} = (1 + r)a_t − c_t.

Given a0 > 0, the problem is solved by choosing {c_t}_{t=0}^∞ and {a_{t+1}}_{t=0}^∞ to maximize

∑_{t=0}^∞ β^t u(c_t)

subject to the constraints that for all t ≥ 0,

a_{t+1} = (1 + r)a_t − c_t,
a_{t+1} ≥ 0.

To find the solution as a sequence {c_t, a_{t+1}}_{t=0}^∞, write the Lagrangian:

L = ∑_{t=0}^∞ [ β^t u(c_t) − λ_t (a_{t+1} − (1 + r)a_t + c_t) + μ_t a_{t+1} ],

where {λ_t, μ_t}_{t=0}^∞ are sequences of Lagrange multipliers associated with the double sequence of
constraints. The first-order conditions of the problem require, for all t ≥ 0:

∂L/∂c_t = β^t u′(c_t) − λ_t = 0,
∂L/∂a_{t+1} = (1 + r)λ_{t+1} − λ_t + μ_t = 0.

Complementary slackness requires instead, for all t ≥ 0:

μ_t a_{t+1} = 0 and λ_t (a_{t+1} − (1 + r)a_t + c_t) = 0.

Necessary conditions are also sufficient if the problem is well behaved. Wealth a_t never falls to
zero: if a_T = 0 for some T, then c_t = 0 for all t ≥ T, which cannot be optimal as u′(0) = ∞.
Therefore, we have that for all t ≥ 0:

a_{t+1} > 0, μ_t = 0, (1 + r)λ_{t+1} = λ_t.

Consequently, for all t ≥ 0, consumption c_t satisfies the Euler equation

u′(c_t) = β(1 + r) u′(c_{t+1}),

and, if so, wealth a_t satisfies for all t ≥ 0:

a_{t+1} = (1 + r)a_t − c_t.
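Remark (numerical sketch): the simulation below is an added illustration that assumes log utility, u(c) = ln c, a specification not imposed in the notes. Under that assumption the policy c_t = (1 − β)(1 + r)a_t satisfies both the Euler equation and the law of motion, so the code simply iterates the two equations above and checks the Euler equation at every date.

# A minimal simulation sketch (assumption: u(c) = ln c, not specified in the notes).
# For log utility the policy c_t = (1 - beta)*(1 + r)*a_t satisfies the Euler equation
# u'(c_t) = beta*(1 + r)*u'(c_{t+1}) and the law of motion a_{t+1} = (1 + r)*a_t - c_t,
# so wealth grows (or shrinks) at the rate beta*(1 + r).
beta, r, a0, T = 0.95, 0.04, 100.0, 5

a = a0
for t in range(T):
    c = (1 - beta) * (1 + r) * a          # candidate optimal consumption
    a_next = (1 + r) * a - c              # law of motion for wealth
    # check the Euler equation: 1/c_t should equal beta*(1+r)/c_{t+1}
    c_next = (1 - beta) * (1 + r) * a_next
    assert abs(1 / c - beta * (1 + r) / c_next) < 1e-9
    print(f"t={t}: a_t={a:8.3f}, c_t={c:7.3f}")
    a = a_next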

London School of Economics Dr Francesco Nava
Department of Economics Office: 32L.3.20

EC400 – Optimization and Fixed Points


Problem Set 1: Tools for Optimization

1. Show that the general quadratic form

a11 x1² + a12 x1 x2 + a22 x2²

can be written as

             [ a11  a12 ] [ x1 ]
[ x1   x2 ]  [  0   a22 ] [ x2 ] ;

and find its unique symmetric representation.

2. List all the principal minors of a general (3 × 3) matrix and denote which are the three
leading principal minors.

3. Determine the definiteness of the following symmetric matrices:

   (a) [ 0  0 ]    (b) [ 2  1 ]    (c) [ 3  4 ]    (d) [ 1  2  0 ]
       [ 0  c ]        [ 1  1 ]        [ 4  6 ]        [ 2  4  5 ]
                                                       [ 0  5  6 ]
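A numerical cross-check (an added sketch, not a required part of the exercise): for any numeric symmetric matrix, the leading principal minors and the eigenvalues can be computed directly, and definiteness can then be read off using the criteria from the lecture notes. The matrix used below is (d) from this question.

# A small cross-checking sketch: report the leading principal minors and eigenvalues
# of a numeric symmetric matrix, from which definiteness can be read off.
import numpy as np

def leading_principal_minors(A):
    A = np.asarray(A, dtype=float)
    return [np.linalg.det(A[:k, :k]) for k in range(1, A.shape[0] + 1)]

A = np.array([[1, 2, 0],
              [2, 4, 5],
              [0, 5, 6]])           # matrix (d) from the exercise
print("leading principal minors:", leading_principal_minors(A))
print("eigenvalues:", np.linalg.eigvalsh(A))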

4. [Harder] Consider the following quadratic form

Q(x) = a x1² + b x2² + 2ab x1 x2.

For what values of the parameters a and b is the quadratic form Q(x) indefinite?
Plot your answer in R².
5. Approximate e^x at x = 0 with a Taylor polynomial of order three and four. Then compute
the values of these approximations at h = 0.2 and at h = 1 and compare with the actual
values.

6. For each of the following functions on R, determine whether they are quasiconcave,
quasiconvex, both, or neither:

(a) e^x; (b) ln x; (c) x³ − x

7. [Harder] Let f be a function defined on a convex set U in Rn. In lecture, we have shown
that f is a quasiconcave function on U if and only if for all x, y ∈ U and t ∈ [0, 1]

f(tx + (1 − t)y) ≥ min{f(x), f(y)}.

State the corresponding theorem for quasiconvex functions and prove it.
8. Prove that a weakly increasing transformation (g : R → R such that g′ ≥ 0) of a quasi-
concave function is quasi-concave.

9. A commonly used production or utility function is f(x, y) = xy. Check whether it is
concave or convex using its Hessian. Then, check whether it is quasiconcave.

10. [Harder] Show that any continuously differentiable function f : R → R satisfying

x · ∂f(x)/∂x ≤ 0

must be quasi-concave.

London School of Economics Dr Francesco Nava
Department of Economics Office: 32L.3.20

EC400 – Optimization and Fixed Points


Problem Set 2: Optimization

1. Find critical points, local maxima and local minima for each of the following functions:
(a) x⁴ + x² − 6xy + 3y²
(b) x² − 6xy + 2y² + 10x + 2y − 5
(c) xy² + x³y − xy
Which of the critical points are also global maxima or global minima?
2. Let S ⊆ Rn be an open set and f : S → R be a twice continuously differentiable function.
Suppose that Df(x*) = 0. State the weakest sufficient conditions that the Hessian must
satisfy at the critical point x* for:
(i) x* to be a local max;
(ii) x* to be a strict local min.

3. Check whether f(x, y) = x⁴ + x²y² + y⁴ − 3x − 8y is concave or convex by using the
Hessian.

4. [Harder] Solve the following problem

max_{x,y} min{x, y} − x² − y².

5. Find the optimal solution for the following program

max_{x,y} x subject to x³ + y² = 0.

Is the Lagrange approach appropriate?

6. Solve the following problem

max_{x1,x2} x1² x2 subject to 2x1² + x2² = 3.
7. [Harder] Solve the following problem when a ∈ [1/2, 3/2]:

max_{x,y ≥ 0} x² + y² subject to ax + y = 1.

8. [Harder] Let X be a convex subset of Rn, f : X → R a concave function, g : X → Rm a
convex function, and a a vector in Rm. Consider the following problem

max_{x∈X} f(x) subject to g(x) ≤ a.

What is the Lagrangian for this problem? Prove that the Lagrangian is a concave
function of the choice variable x on X.

9. Consider the problem of maximizing xyz subject to x + y + z ≤ 1, x ≥ 0, y ≥ 0, and
z ≥ 0. Obviously, the three latter constraints do not bind, and we can concentrate only
on the first constraint, x + y + z ≤ 1. Find the solution and the Lagrange multiplier,
and show how the optimal value would change if instead the constraint was changed to
x + y + z ≤ 9/10.

10. [Harder] Consider a function f : Rn → R satisfying:

f(x) = { u(x) if g(x) ≤ 0
       { v(x) if g(x) ≥ 0.

Further, suppose that: (i) u(x) = v(x) if g(x) = 0; (ii) u and v are differentiable, strictly
concave, and possess maximizers in Rn; and (iii) g(x) is differentiable and strictly convex.
Carefully explain how you would solve the problem of maximizing f by choosing x ∈ Rn.

London School of Economics Dr Francesco Nava
Department of Economics Office: 32L.3.20

EC400 – Optimization and Fixed Points


Problem Set 3: Comparative Statics and Correspondences

1. [Harder] For a > 0, consider the problem:

max_{x,y} ax + y
s.t. x² + (x − y)² ≤ 1
     x ≤ a, y ≥ 0.

Using the Kuhn-Tucker approach, write down the necessary first order conditions that
must be satisfied by the solution of the constrained optimization problem. Are solutions to
these conditions also maximizers of the Lagrangian? Solve the constrained optimization
problem in terms of a. Find the slope of the value function with respect to a, possibly relying
on the Envelope Theorem.

2. Let a, x and y be non-negative. Consider the problem of maximizing xy subject to
x² + ay² ≤ 1. What happens to the optimal value when a increases marginally?

3. [Harder] Assume that the utility function of a consumer satisfies

u(x, y) = x + y^{1/2}.

The consumer has a positive income I > 0 and faces positive prices px > 0 and py > 0.
The consumer cannot buy negative amounts of any of the goods.

(a) Use Kuhn-Tucker to solve the consumer's problem.

(b) Show how the utility at the maximum depends on I.

(c) For the case of an interior solution, find how the endogenous variables change when
I and px change. That is, compute:

∂x/∂I, ∂y/∂I, ∂λ/∂I; ∂x/∂px, ∂y/∂px, ∂λ/∂px.

4. [Harder] Consider the system of equations:

F(x|b) = [ x1 − e^{x2}         ] = 0.
         [ e^{x1} + x2 − b²    ]

Argue whether you can invoke the implicit function theorem. Then, exploit the theorem
where possible to find the slopes of the implicit functions f(b).

5. For each of the following correspondences h : R ⇉ R show whether they are convex-
valued, upper-hemicontinuous or lower-hemicontinuous:

(a) h(x) = [5x, 10x) for x ∈ [0, 1];

(b) h(x) = {5x, 10x} for x ∈ [0, 1];

(c) h(x) = { {1/2}   for x ∈ [0, 1)
           { [0, 1]  for x = 1.

6. Consider the correspondence h : R ⇉ R where

h(x) = { [0, 1/2] if x > 1/3
       { [1/2, 1] if x < 1/3.

Argue whether Kakutani's Fixed Point Theorem applies and find all the fixed points.

7. Consider the correspondence h : R² ⇉ R², where h(x) = (h1(x), h2(x)),

h1(x) = {1} if x2 > 1/3, [0, 1] if x2 = 1/3, and {0} if x2 < 1/3;

h2(x) = {1} if x1 < 1/2, [0, 1] if x1 = 1/2, and {0} if x1 > 1/2.

Show how you can exploit Kakutani's Fixed Point Theorem despite the domain being
open, and find all the fixed points.

8. [Harder] For all values of b ≥ 0, solve the problem

x*(b) = arg max_{x ∈ [0,1]²} b x1 + x2 s.t. x1 + x2 = 1.

Show that x*(b) is non-empty, compact-valued, convex-valued, and upper-hemicontinuous.
Would this have followed also from the theorem of the maximum? Are the assumptions
of the theorem satisfied?
