LQR Lagrange


EE363 Winter 2008-09

Lecture 2
LQR via Lagrange multipliers

• useful matrix identities

• linearly constrained optimization

• LQR via constrained optimization

Some useful matrix identities

let’s start with a simple one:

$$Z(I + Z)^{-1} = I - (I + Z)^{-1}$$

(provided $I + Z$ is invertible)

to verify this identity, we start with

$$I = (I + Z)(I + Z)^{-1} = (I + Z)^{-1} + Z(I + Z)^{-1}$$

rearrange terms to get the identity



an identity that’s a bit more complicated:

$$(I + XY)^{-1} = I - X(I + YX)^{-1}Y$$

(if either inverse exists, then so does the other; in fact $\det(I + XY) = \det(I + YX)$)

to verify:

$$
\begin{aligned}
\left(I - X(I + YX)^{-1}Y\right)(I + XY) &= I + XY - X(I + YX)^{-1}Y(I + XY) \\
&= I + XY - X(I + YX)^{-1}(I + YX)Y \\
&= I + XY - XY = I
\end{aligned}
$$



another identity:
$$Y(I + XY)^{-1} = (I + YX)^{-1}Y$$

to verify this one, start with $Y(I + XY) = (I + YX)Y$

then multiply on the left by $(I + YX)^{-1}$ and on the right by $(I + XY)^{-1}$

• note dimensions of inverses not necessarily the same

• mnemonic: lefthand Y moves into inverse, pushes righthand Y out . . .
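The identities above are easy to sanity-check numerically. The following is a minimal sketch, assuming NumPy is available; the sizes `n`, `m`, the scaling factor, and the random seed are arbitrary choices made for illustration, not part of the lecture.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 4, 2                      # X is n x m, Y is m x n (illustrative sizes)
X = 0.2 * rng.standard_normal((n, m))
Y = 0.2 * rng.standard_normal((m, n))
I_n, I_m = np.eye(n), np.eye(m)

inv_n = np.linalg.inv(I_n + X @ Y)   # n x n inverse
inv_m = np.linalg.inv(I_m + Y @ X)   # m x m inverse (different dimension)

# (I + XY)^{-1} = I - X (I + YX)^{-1} Y
err_inverse = np.max(np.abs(inv_n - (I_n - X @ inv_m @ Y)))

# Y (I + XY)^{-1} = (I + YX)^{-1} Y
err_push = np.max(np.abs(Y @ inv_n - inv_m @ Y))

# det(I + XY) = det(I + YX)
err_det = abs(np.linalg.det(I_n + X @ Y) - np.linalg.det(I_m + Y @ X))

print(err_inverse, err_push, err_det)  # each should be at roundoff level
```

Note that only the smaller $m \times m$ inverse is needed to form the larger one, which is what makes these identities useful in practice.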



and one more:

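$$(I + XZ^{-1}Y)^{-1} = I - X(Z + YX)^{-1}Y$$

let’s check:

$$
\begin{aligned}
\left(I + X(Z^{-1}Y)\right)^{-1} &= I - X\left(I + Z^{-1}YX\right)^{-1}Z^{-1}Y \\
&= I - X\left(Z(I + Z^{-1}YX)\right)^{-1}Y \\
&= I - X(Z + YX)^{-1}Y
\end{aligned}
$$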



Example: rank one update

• suppose we’ve already calculated or know $A^{-1}$, where $A \in \mathbf{R}^{n \times n}$

• we need to calculate $(A + bc^T)^{-1}$, where $b, c \in \mathbf{R}^n$

($A + bc^T$ is called a rank one update of $A$)

we’ll use another identity, called the matrix inversion lemma:

$$(A + bc^T)^{-1} = A^{-1} - \frac{1}{1 + c^T A^{-1} b}\,(A^{-1}b)(c^T A^{-1})$$

note that the RHS is easy to calculate since we know $A^{-1}$
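As an illustration of the rank one update, here is a minimal sketch assuming NumPy; the function name `rank_one_update_inverse`, the singularity threshold, and the test matrices are hypothetical choices made for this example only.

```python
import numpy as np

def rank_one_update_inverse(A_inv, b, c):
    """Return (A + b c^T)^{-1} given A^{-1}, using the rank one formula."""
    Ab = A_inv @ b                 # A^{-1} b, an n-vector
    cA = c @ A_inv                 # c^T A^{-1}, an n-vector
    denom = 1.0 + c @ Ab           # scalar 1 + c^T A^{-1} b
    if abs(denom) < 1e-12:         # illustrative threshold
        raise ValueError("A + b c^T is (numerically) singular")
    return A_inv - np.outer(Ab, cA) / denom

rng = np.random.default_rng(1)
n = 5
A = 3.0 * np.eye(n) + 0.1 * rng.standard_normal((n, n))
b = 0.5 * rng.standard_normal(n)
c = 0.5 * rng.standard_normal(n)

A_inv = np.linalg.inv(A)
updated = rank_one_update_inverse(A_inv, b, c)
direct = np.linalg.inv(A + np.outer(b, c))
rank_one_err = np.max(np.abs(updated - direct))
print(rank_one_err)  # should be at roundoff level
```

The update uses only matrix-vector products and one outer product, so it costs on the order of $n^2$ operations instead of the order $n^3$ needed to invert from scratch.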



more general form of matrix inversion lemma:

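$$(A + BC)^{-1} = A^{-1} - A^{-1}B\left(I + CA^{-1}B\right)^{-1}CA^{-1}$$

let’s verify it:

$$
\begin{aligned}
(A + BC)^{-1} &= \left(A(I + A^{-1}BC)\right)^{-1} \\
&= \left(I + (A^{-1}B)C\right)^{-1}A^{-1} \\
&= \left(I - (A^{-1}B)\left(I + C(A^{-1}B)\right)^{-1}C\right)A^{-1} \\
&= A^{-1} - A^{-1}B\left(I + CA^{-1}B\right)^{-1}CA^{-1}
\end{aligned}
$$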
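The general form can be sketched the same way. This is an illustrative example assuming NumPy; `low_rank_update_inverse` and the dimensions `n`, `k` are hypothetical names and values chosen here, with $B \in \mathbf{R}^{n \times k}$ and $C \in \mathbf{R}^{k \times n}$.

```python
import numpy as np

def low_rank_update_inverse(A_inv, B, C):
    """Return (A + B C)^{-1} given A^{-1}; only a k x k system is solved."""
    k = B.shape[1]
    AB = A_inv @ B                       # A^{-1} B, n x k
    CA = C @ A_inv                       # C A^{-1}, k x n
    small = np.eye(k) + C @ AB           # I + C A^{-1} B, k x k
    return A_inv - AB @ np.linalg.solve(small, CA)

rng = np.random.default_rng(4)
n, k = 6, 2
A = 3.0 * np.eye(n) + 0.1 * rng.standard_normal((n, n))
B = 0.5 * rng.standard_normal((n, k))
C = 0.5 * rng.standard_normal((k, n))

lemma_err = np.max(np.abs(
    low_rank_update_inverse(np.linalg.inv(A), B, C) - np.linalg.inv(A + B @ C)
))
print(lemma_err)  # should be at roundoff level
```

With $k = 1$, $B = b$, and $C = c^T$, this reduces to the rank one formula.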



Another formula for the Riccati recursion

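$$
\begin{aligned}
P_{t-1} &= Q + A^T P_t A - A^T P_t B\left(R + B^T P_t B\right)^{-1}B^T P_t A \\
&= Q + A^T P_t\left(I - B\left(R + B^T P_t B\right)^{-1}B^T P_t\right)A \\
&= Q + A^T P_t\left(I - B\left(\left(I + B^T P_t B R^{-1}\right)R\right)^{-1}B^T P_t\right)A \\
&= Q + A^T P_t\left(I - BR^{-1}\left(I + B^T P_t B R^{-1}\right)^{-1}B^T P_t\right)A \\
&= Q + A^T P_t\left(I + BR^{-1}B^T P_t\right)^{-1}A \\
&= Q + A^T\left(I + P_t BR^{-1}B^T\right)^{-1}P_t A
\end{aligned}
$$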

or, in pretty, symmetric form:


$$P_{t-1} = Q + A^T P_t^{1/2}\left(I + P_t^{1/2}BR^{-1}B^T P_t^{1/2}\right)^{-1}P_t^{1/2}A$$
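The three forms of the Riccati recursion can be compared numerically. The sketch below assumes NumPy; the function names and the small system `A`, `B`, `Q`, `R`, `Qf` are illustrative choices, and the matrix square root is computed by an eigendecomposition, which is one of several possible ways to do it.

```python
import numpy as np

def riccati_standard(P, A, B, Q, R):
    """Q + A'PA - A'PB (R + B'PB)^{-1} B'PA."""
    gain = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    return Q + A.T @ P @ A - A.T @ P @ B @ gain

def riccati_alternate(P, A, B, Q, R):
    """Q + A' (I + P B R^{-1} B')^{-1} P A."""
    S = B @ np.linalg.solve(R, B.T)          # B R^{-1} B'
    return Q + A.T @ np.linalg.solve(np.eye(len(P)) + P @ S, P @ A)

def riccati_symmetric(P, A, B, Q, R):
    """Q + A' P^{1/2} (I + P^{1/2} B R^{-1} B' P^{1/2})^{-1} P^{1/2} A."""
    w, V = np.linalg.eigh((P + P.T) / 2)     # P is symmetric PSD
    Ph = (V * np.sqrt(np.clip(w, 0.0, None))) @ V.T
    S = B @ np.linalg.solve(R, B.T)
    return Q + A.T @ Ph @ np.linalg.solve(np.eye(len(P)) + Ph @ S @ Ph, Ph @ A)

# illustrative double-integrator-like system
A = np.array([[1.0, 1.0], [0.0, 1.0]])
B = np.array([[0.0], [1.0]])
Q, R, Qf = np.eye(2), np.array([[0.5]]), np.eye(2)

P = Qf
max_gap = 0.0
for _ in range(10):                          # ten backward steps
    P_std = riccati_standard(P, A, B, Q, R)
    for alt in (riccati_alternate, riccati_symmetric):
        max_gap = max(max_gap, np.max(np.abs(P_std - alt(P, A, B, Q, R))))
    P = P_std
print(max_gap)  # should be at roundoff level
```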



Linearly constrained optimization

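minimize $f(x)$
subject to $Fx = g$

• $f : \mathbf{R}^n \to \mathbf{R}$ is a smooth objective function

• $F \in \mathbf{R}^{m \times n}$ is fat (more columns than rows)

form the Lagrangian $L(x, \lambda) = f(x) + \lambda^T(g - Fx)$ ($\lambda$ is the Lagrange multiplier)

if $x$ is optimal, then

$$\nabla_x L = \nabla f(x) - F^T\lambda = 0, \qquad \nabla_\lambda L = g - Fx = 0$$

i.e., $\nabla f(x) = F^T\lambda$ for some $\lambda \in \mathbf{R}^m$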


(generalizes optimality condition ∇f (x) = 0 for unconstrained
minimization problem)
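For a quadratic objective the two optimality conditions are linear in $(x, \lambda)$ and can be solved as one block system. The sketch below assumes NumPy and a hypothetical objective $f(x) = \tfrac{1}{2}x^T H x + q^T x$ with $H$ positive definite; `H`, `q`, and the random data are illustrative and not from the lecture.

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 5, 2
M = rng.standard_normal((n, n))
H = M.T @ M + np.eye(n)           # positive definite Hessian (assumed)
q = rng.standard_normal(n)
F = rng.standard_normal((m, n))   # fat; full row rank with probability one
g = rng.standard_normal(m)

# f(x) = 0.5 x'Hx + q'x, so grad f(x) = Hx + q.
# Optimality conditions:  Hx + q - F' lam = 0,   Fx = g
KKT = np.block([[H, -F.T], [F, np.zeros((m, m))]])
sol = np.linalg.solve(KKT, np.concatenate([-q, g]))
x_opt, lam = sol[:n], sol[n:]

grad = H @ x_opt + q
stationarity_err = np.max(np.abs(grad - F.T @ lam))   # grad f(x) = F' lam
feasibility_err = np.max(np.abs(F @ x_opt - g))       # Fx = g
print(stationarity_err, feasibility_err)
```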



Picture

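[figure: the constraint set $\{x \mid Fx = g\}$, level curves $f(x) = \text{constant}$, and gradient vectors $\nabla f$]

$\nabla f(x) = F^T\lambda$ for some $\lambda \iff \nabla f(x) \in \mathcal{R}(F^T) \iff \nabla f(x) \perp \mathcal{N}(F)$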



Feasible descent direction

suppose x is current, feasible point (i.e., F x = g)

consider a small step in direction v, to x + hv (h small, positive)

when is x + hv better than x?

need $x + hv$ feasible: $F(x + hv) = g + hFv = g$, so $Fv = 0$

$v \in \mathcal{N}(F)$ is called a feasible direction

we need $x + hv$ to have smaller objective than $x$:

$$f(x + hv) \approx f(x) + h\nabla f(x)^T v < f(x)$$

so we need $\nabla f(x)^T v < 0$ (called a descent direction)

(if $\nabla f(x)^T v > 0$, $-v$ is a descent direction, so we need only $\nabla f(x)^T v \neq 0$)

x is not optimal if there exists a feasible descent direction



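if $x$ is optimal, every feasible direction satisfies $\nabla f(x)^T v = 0$

$$
\begin{aligned}
Fv = 0 \;\Rightarrow\; \nabla f(x)^T v = 0 &\iff \mathcal{N}(F) \subseteq \mathcal{N}(\nabla f(x)^T) \\
&\iff \mathcal{R}(F^T) \supseteq \mathcal{R}(\nabla f(x)) \\
&\iff \nabla f(x) \in \mathcal{R}(F^T) \\
&\iff \nabla f(x) = F^T\lambda \text{ for some } \lambda \in \mathbf{R}^m \\
&\iff \nabla f(x) \perp \mathcal{N}(F)
\end{aligned}
$$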
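The feasible descent argument can be illustrated on a small example. The sketch below assumes NumPy and a hypothetical quadratic objective $f(x) = \tfrac{1}{2}x^T H x + q^T x$; the null space basis `Z` is taken from an SVD of $F$, and the projected negative gradient is one convenient choice of feasible descent direction (not the only one).

```python
import numpy as np

rng = np.random.default_rng(3)
n, m = 5, 2
M = rng.standard_normal((n, n))
H = M.T @ M + np.eye(n)           # positive definite (assumed objective)
q = rng.standard_normal(n)
F = rng.standard_normal((m, n))
g = rng.standard_normal(m)

def f(x):
    return 0.5 * x @ H @ x + q @ x

def grad_f(x):
    return H @ x + q

# orthonormal basis of N(F): last n - m right singular vectors of F
_, _, Vt = np.linalg.svd(F)
Z = Vt[m:].T                      # n x (n - m), with F @ Z ~ 0

x = np.linalg.pinv(F) @ g         # a feasible (generally non-optimal) point
v = -Z @ (Z.T @ grad_f(x))        # projected negative gradient, lies in N(F)
slope = grad_f(x) @ v             # equals -||Z' grad f(x)||^2
h = 1e-3
feas_err = np.max(np.abs(F @ (x + h * v) - g))
f_before, f_after = f(x), f(x + h * v)

# at the optimum (from the optimality conditions) no feasible descent remains
KKT = np.block([[H, -F.T], [F, np.zeros((m, m))]])
x_opt = np.linalg.solve(KKT, np.concatenate([-q, g]))[:n]
proj_grad_at_opt = np.linalg.norm(Z.T @ grad_f(x_opt))
print(slope, f_after - f_before, proj_grad_at_opt)
```

At the non-optimal point the slope is negative and the objective decreases while feasibility is kept; at the optimum the gradient has no component in $\mathcal{N}(F)$.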



LQR as constrained minimization problem

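$$
\begin{array}{ll}
\text{minimize} & J = \frac{1}{2}\sum_{t=0}^{N-1}\left(x_t^T Q x_t + u_t^T R u_t\right) + \frac{1}{2}x_N^T Q_f x_N \\
\text{subject to} & x_{t+1} = Ax_t + Bu_t, \quad t = 0, \ldots, N-1
\end{array}
$$

• variables are $u_0, \ldots, u_{N-1}$ and $x_1, \ldots, x_N$
($x_0 = x^{\mathrm{init}}$ is given)

• objective is (convex) quadratic
(factor 1/2 in objective is for convenience)

introduce Lagrange multipliers $\lambda_1, \ldots, \lambda_N \in \mathbf{R}^n$ and form the Lagrangian

$$L = J + \sum_{t=0}^{N-1}\lambda_{t+1}^T\left(Ax_t + Bu_t - x_{t+1}\right)$$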



Optimality conditions

we have $x_{t+1} = Ax_t + Bu_t$ for $t = 0, \ldots, N-1$, $x_0 = x^{\mathrm{init}}$

for $t = 0, \ldots, N-1$, $\nabla_{u_t} L = Ru_t + B^T\lambda_{t+1} = 0$

hence, $u_t = -R^{-1}B^T\lambda_{t+1}$

for $t = 1, \ldots, N-1$, $\nabla_{x_t} L = Qx_t + A^T\lambda_{t+1} - \lambda_t = 0$

hence, $\lambda_t = A^T\lambda_{t+1} + Qx_t$

$\nabla_{x_N} L = Q_f x_N - \lambda_N = 0$, so $\lambda_N = Q_f x_N$

these are a set of linear equations in the variables

$$u_0, \ldots, u_{N-1}, \quad x_1, \ldots, x_N, \quad \lambda_1, \ldots, \lambda_N$$
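To make the "set of linear equations" concrete, the sketch below stacks the variables and solves the optimality conditions directly as one block system. It assumes NumPy; the function name `solve_lqr_kkt`, the variable ordering, and the small example system are illustrative choices, and this brute-force approach is only sensible for small $N$.

```python
import numpy as np

def solve_lqr_kkt(A, B, Q, R, Qf, x0, N):
    """Solve the LQR optimality conditions as one linear system.

    Stacked variable z = (u_0..u_{N-1}, x_1..x_N); constraints C z = d encode
    x_{t+1} - A x_t - B u_t = 0; stationarity is H z - C' lam = 0.
    Returns u[t] = u_t, x[t] = x_{t+1}, lam[t] = lambda_{t+1}.
    """
    n, m = B.shape
    nu, nx = N * m, N * n
    H = np.zeros((nu + nx, nu + nx))
    C = np.zeros((nx, nu + nx))
    d = np.zeros(nx)
    for t in range(N):
        ucols = slice(t * m, (t + 1) * m)
        xs = nu + t * n                        # column offset of x_{t+1}
        rows = slice(t * n, (t + 1) * n)
        H[ucols, ucols] = R
        H[xs:xs + n, xs:xs + n] = Qf if t == N - 1 else Q
        C[rows, ucols] = -B
        C[rows, xs:xs + n] = np.eye(n)
        if t == 0:
            d[rows] = A @ x0                   # known term A x_0
        else:
            C[rows, xs - n:xs] = -A            # -A x_t
    KKT = np.block([[H, -C.T], [C, np.zeros((nx, nx))]])
    sol = np.linalg.solve(KKT, np.concatenate([np.zeros(nu + nx), d]))
    u = sol[:nu].reshape(N, m)
    x = sol[nu:nu + nx].reshape(N, n)
    lam = sol[nu + nx:].reshape(N, n)
    return u, x, lam

# illustrative example data
A = np.array([[1.0, 1.0], [0.0, 1.0]])
B = np.array([[0.0], [1.0]])
Q, R, Qf = np.eye(2), np.array([[0.5]]), np.eye(2)
x0 = np.array([1.0, 0.0])
N = 5
u, x, lam = solve_lqr_kkt(A, B, Q, R, Qf, x0, N)

# residual of R u_t + B' lambda_{t+1} = 0
station_res = max(np.max(np.abs(R @ u[t] + B.T @ lam[t])) for t in range(N))
print(station_res, lam[-1] - Qf @ x[-1])
```

The system has $N(m + 2n)$ unknowns, so a direct solve grows quickly with the horizon.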



Co-state equations

optimality conditions break into two parts:

$$x_{t+1} = Ax_t + Bu_t, \qquad x_0 = x^{\mathrm{init}}$$

this recursion for the state $x$ runs forward in time, with an initial condition

$$\lambda_t = A^T\lambda_{t+1} + Qx_t, \qquad \lambda_N = Q_f x_N$$

this recursion for $\lambda$ runs backward in time, with a final condition

• λ is called co-state
• recursion for λ sometimes called adjoint system



Solution via Riccati recursion

we will see that $\lambda_t = P_t x_t$, where $P_t$ is the min-cost-to-go matrix defined by the Riccati recursion

thus, the Riccati recursion gives a clever way to solve this set of linear equations

it holds for $t = N$, since $P_N = Q_f$ and $\lambda_N = Q_f x_N$

now suppose it holds for $t + 1$, i.e., $\lambda_{t+1} = P_{t+1}x_{t+1}$

let’s show it holds for $t$, i.e., $\lambda_t = P_t x_t$

using $x_{t+1} = Ax_t + Bu_t$ and $u_t = -R^{-1}B^T\lambda_{t+1}$,

$$\lambda_{t+1} = P_{t+1}(Ax_t + Bu_t) = P_{t+1}\left(Ax_t - BR^{-1}B^T\lambda_{t+1}\right)$$

so

$$\lambda_{t+1} = \left(I + P_{t+1}BR^{-1}B^T\right)^{-1}P_{t+1}Ax_t$$



using $\lambda_t = A^T\lambda_{t+1} + Qx_t$, we get

$$\lambda_t = A^T\left(I + P_{t+1}BR^{-1}B^T\right)^{-1}P_{t+1}Ax_t + Qx_t = P_t x_t$$

since by the Riccati recursion

$$P_t = Q + A^T\left(I + P_{t+1}BR^{-1}B^T\right)^{-1}P_{t+1}A$$

this proves $\lambda_t = P_t x_t$
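The relation $\lambda_t = P_t x_t$ can be checked numerically by running the three recursions in turn: Riccati backward, state forward, co-state backward. This is a minimal sketch assuming NumPy; the system matrices, horizon `N`, and initial state are illustrative values.

```python
import numpy as np

# illustrative example data
A = np.array([[1.0, 1.0], [0.0, 1.0]])
B = np.array([[0.0], [1.0]])
Q, R, Qf = np.eye(2), np.array([[0.5]]), np.eye(2)
x0 = np.array([1.0, -0.5])
N = 8

# Riccati recursion, backward in time: P[N] = Qf, then P[t] from P[t+1]
P = [None] * (N + 1)
P[N] = Qf
for t in range(N - 1, -1, -1):
    Pn = P[t + 1]
    K = np.linalg.solve(R + B.T @ Pn @ B, B.T @ Pn @ A)
    P[t] = Q + A.T @ Pn @ A - A.T @ Pn @ B @ K

# state recursion, forward in time, with u_t = -(R + B'P_{t+1}B)^{-1} B'P_{t+1}A x_t
x, u = [x0], []
for t in range(N):
    Pn = P[t + 1]
    u_t = -np.linalg.solve(R + B.T @ Pn @ B, B.T @ Pn @ A @ x[t])
    u.append(u_t)
    x.append(A @ x[t] + B @ u_t)

# co-state recursion, backward in time: lam[N] = Qf x_N, lam[t] = A' lam[t+1] + Q x_t
lam = [None] * (N + 1)
lam[N] = Qf @ x[N]
for t in range(N - 1, 0, -1):
    lam[t] = A.T @ lam[t + 1] + Q @ x[t]

costate_err = max(np.max(np.abs(lam[t] - P[t] @ x[t])) for t in range(1, N + 1))
station_err = max(np.max(np.abs(R @ u[t] + B.T @ lam[t + 1])) for t in range(N))
print(costate_err, station_err)  # both should be at roundoff level
```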



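let’s check that our two formulas for $u_t$ are consistent:

$$
\begin{aligned}
u_t &= -R^{-1}B^T\lambda_{t+1} \\
&= -R^{-1}B^T\left(I + P_{t+1}BR^{-1}B^T\right)^{-1}P_{t+1}Ax_t \\
&= -R^{-1}\left(I + B^T P_{t+1}BR^{-1}\right)^{-1}B^T P_{t+1}Ax_t \\
&= -\left(\left(I + B^T P_{t+1}BR^{-1}\right)R\right)^{-1}B^T P_{t+1}Ax_t \\
&= -\left(R + B^T P_{t+1}B\right)^{-1}B^T P_{t+1}Ax_t
\end{aligned}
$$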

which is what we had before
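A quick numerical check of this consistency, as a sketch assuming NumPy: the two expressions for the feedback gain are evaluated for arbitrary illustrative data (`P` symmetric positive semidefinite, `R` positive definite) and compared.

```python
import numpy as np

rng = np.random.default_rng(5)
n, m = 3, 2
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, m))
M = rng.standard_normal((n, n))
P = M.T @ M                        # plays the role of P_{t+1} (symmetric PSD)
R = np.diag([0.5, 2.0])            # positive definite input weight

S = B @ np.linalg.solve(R, B.T)    # B R^{-1} B'

# u_t = -K x_t with K = R^{-1} B' (I + P S)^{-1} P A   (co-state form)
K_costate = np.linalg.solve(R, B.T) @ np.linalg.solve(np.eye(n) + P @ S, P @ A)

# u_t = -K x_t with K = (R + B'PB)^{-1} B'PA          (Riccati form)
K_riccati = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)

gain_gap = np.max(np.abs(K_costate - K_riccati))
print(gain_gap)  # should be at roundoff level
```

The Riccati form only requires solving an $m \times m$ system, while the co-state form involves an $n \times n$ one.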
