Notes on Adjoint Methods for MIT 18.335
Steven G. Johnson
Created Spring 2006, updated December 17, 2012.
1 Introduction

These notes describe adjoint methods: techniques for computing the gradient dg/dp of a function g(x, p) that depends on the solution x of a system of equations parameterized by p, at a cost comparable to solving the system a single additional time, no matter how many parameters P there are.

2 Linear equations

The chain rule gives

    dg/dp|_{f=0} = g_p + g_x x_p,    (1)

but computing the M × P matrix x_p explicitly would require solving the linear system with P different right-hand sides.³ Instead, we solve the single adjoint equation⁴

    Aᵀλ = g_xᵀ,    (2)

and then we multiply the single vector λᵀ by our M × P matrix f_p for only Θ(MP) work. Putting it all together, we obtain:

    dg/dp|_{f=0} = g_p − λᵀ f_p = g_p − λᵀ (A_p x − b_p),    (3)

where f(x, p) = Ax − b = 0 is the equation determining x.

Again, A(p) and b(p) are presumably specified analytically, and thus A_p and b_p can easily be computed (in some cases automatically, by automatic program differentiators such as ADIFOR). Note that the adjoint problem is of the same size as the original Ax = b system, can use the same factorization (e.g. an LU factorization A = LU immediately gives Aᵀ = UᵀLᵀ), has the same condition number, and has the same spectrum of eigenvalues (the eigenvalues of A and Aᵀ are identical), so iterative algorithms will have similar performance (and can use similar preconditioners). In every sense, solving the adjoint problem should be no harder than solving the original problem.

³ …solving P right-hand sides will be significantly more costly than solving a single right-hand side for the adjoint formulation.

⁴ Another way of looking at this, and the source of the λ notation, is to think of it as a sort of Lagrange-multiplier process: …
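To make this concrete, here is a small Matlab sketch (my own toy example, not from these notes) that computes dg/dp via eqs. (2) and (3) for A(p) = A0 + p(1)·A1 and b(p) = b0 + p(2)·b1, with g(x) = xᵀx, and checks the result against centered finite differences:

% Toy adjoint-gradient check (illustrative example, not from the notes).
% A(p) = A0 + p(1)*A1, b(p) = b0 + p(2)*b1, and g(x) = x'*x, so that
% g_p = 0 and g_x = 2*x'.
M = 5;
A0 = randn(M) + M*eye(M); A1 = randn(M); % shift keeps A(p) well-conditioned
b0 = randn(M,1); b1 = randn(M,1);
p = [0.3; -0.7];
solve_x = @(p) (A0 + p(1)*A1) \ (b0 + p(2)*b1);
x = solve_x(p);
lambda = (A0 + p(1)*A1)' \ (2*x); % adjoint equation (2): A' * lambda = g_x'
dgdp = [-lambda' * (A1 * x), lambda' * b1]; % eq. (3): g_p - lambda'*(A_p x - b_p)
fd = zeros(1,2); h = 1e-6; % centered finite differences for comparison
for k = 1:2
  e = zeros(2,1); e(k) = h;
  xp = solve_x(p + e); xm = solve_x(p - e);
  fd(k) = (xp'*xp - xm'*xm) / (2*h);
end
disp([dgdp; fd]) % the two rows should agree to several digits

Note that the adjoint solve reuses the same matrix as the forward solve, only transposed, exactly as the text promises.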
3 Nonlinear equations

The same ideas carry over if x is instead determined by a system of M nonlinear equations f(x, p) = 0: we solve the adjoint equation f_xᵀλ = g_xᵀ and again obtain dg/dp|_{f=0} = g_p − λᵀ f_p. The only difference is that the adjoint equation (2) is not simply the adjoint of the equation for x. Still, it is a single M × M linear equation for λ that should be of comparable (or lesser) difficulty to solving for x.
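The same recipe in code, for a toy nonlinear system (again my own illustration, not from the notes): solve f(x, p) = A0·x + p·x.^3 − b0 = 0 by Newton's method, then solve f_xᵀλ = g_xᵀ for the gradient of g(x) = Σₙ xₙ:

% Toy nonlinear adjoint example (illustrative, not from the notes):
% f(x,p) = A0*x + p*x.^3 - b0 = 0 with a scalar parameter p, g(x) = sum(x).
M = 4;
A0 = eye(M) + 0.1*randn(M); b0 = randn(M,1); p = 0.2;
x = A0 \ b0; % Newton starting guess: solution of the p = 0 problem
for k = 1:20
  fx = A0 + 3*p*diag(x.^2); % Jacobian f_x
  x = x - fx \ (A0*x + p*x.^3 - b0); % Newton step for f(x,p) = 0
end
fx = A0 + 3*p*diag(x.^2);
lambda = fx' \ ones(M,1); % adjoint solve: f_x' * lambda = g_x'
dgdp = -lambda' * x.^3 % dg/dp = g_p - lambda'*f_p, with g_p = 0, f_p = x.^3

A centered difference of g over p, as in the linear sketch above, confirms the value; the adjoint solve costs one linear solve with the already-computed Jacobian.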
4 Eigenproblems

As a more complicated example illustrating the use of equations (2) and (3) from the previous sections, let us suppose that we are solving a linear eigenproblem Ax = αx and looking at some function g(x, α, p). For simplicity, assume that A is real-symmetric and that α is simple (non-degenerate; i.e., x is the only eigenvector for α).⁶ In this case, we now have M + 1 unknowns described by the column vector

    x̃ = (x; α),

and M + 1 equations f = 0, where

    f = ((A − α)x; xᵀx − 1).

Since A = Aᵀ, writing the adjoint vector as λ̃ = (λ; β), the adjoint equation f_x̃ᵀ λ̃ = g_x̃ᵀ becomes:

    (A − α)λ = g_xᵀ − 2βx,    (4)
    xᵀλ = −g_α.    (5)

Multiplying (4) by xᵀ annihilates the left-hand side (xᵀ(A − α) = 0), giving β = xᵀg_xᵀ/2 and hence

    (A − α)λ = (1 − xxᵀ) g_xᵀ.    (6)

The matrix A − α is singular, but the right-hand side of (6) is orthogonal to its null vector x, so (6) is solvable; it determines λ up to an arbitrary multiple of x, and that multiple is fixed by (5), which sets the x-component of λ to −g_α. Finally, since f_p = (A_p x; 0),

    dg/dp|_{f=0} = g_p − λᵀ A_p x.    (7)

⁶ Problems involving degenerate eigenvalues occur surprisingly often in optimization of eigenvalues (e.g. when maximizing the minimum eigenvalue of some system), and must be treated with special care. In that case, a generalization of the gradient is required to determine sensitivities or the steepest-descent direction [4], a more elaborate version of what is called degenerate perturbation theory in quantum mechanics [?].
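Here is a small Matlab check of (6)-(7) (my own toy example, not from the notes): A(p) = A0 + p·A1 with A0, A1 symmetric, and g(x) = cᵀx evaluated at the unit-norm, sign-fixed ground-state eigenvector:

% Toy eigenproblem adjoint check (illustrative, not from the notes):
% A(p) = A0 + p*A1 (both symmetric), g(x) = c'*x at the ground state, p = 0.
M = 6;
S = randn(M); A0 = S + S'; T = randn(M); A1 = T + T';
c = randn(M,1);
[Vc,D] = eig(A0); [alpha,i] = min(diag(D));
x = Vc(:,i); x = x * sign(sum(x)); % unit eigenvector with the sign fixed
% eq. (6): (A - alpha) lambda = (1 - x x') g_x'; lambda is orthogonal to x
% by eq. (5) since g_alpha = 0, which is exactly the pinv solution:
lambda = pinv(A0 - alpha*eye(M)) * (c - x*(x'*c));
dgdp = -lambda' * A1 * x; % eq. (7) with g_p = 0
h = 1e-6; gpm = zeros(1,2); pts = [h, -h]; % finite-difference comparison
for k = 1:2
  [Vc,D] = eig(A0 + pts(k)*A1); [~,i] = min(diag(D));
  xk = Vc(:,i); xk = xk * sign(sum(xk)); % same normalization and sign
  gpm(k) = c' * xk;
end
disp([dgdp, (gpm(1) - gpm(2)) / (2*h)]) % the two values should agree

The dense pinv solve is fine at this size; for large sparse problems one would instead use an iterative solver on the projected singular system, as in the Schrödinger example below.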
5 Example: a Schrödinger eigenproblem

Consider the one-dimensional Schrödinger eigenproblem

    [−d²/dx² + V(x)] ψ(x) = E ψ(x)

on x ∈ [−1, 1) with periodic boundary conditions. Ordinarily, one is given the potential V(x) and solves for the eigenfunctions ψ(x). Now, however, we will specify a particular ψ₀(x) and find the V(x) that gives ψ(x) ≈ ψ₀(x) for the ground-state eigenfunction (i.e. for the smallest eigenvalue E). In particular, we will find the V(x) that minimizes

    g = ∫₋₁¹ [ψ(x) − ψ₀(x)]² dx.

To solve this numerically, we will discretize the interval x ∈ [−1, 1) with M equally-spaced points xₙ = nΔx (Δx = 2/(M+1)), and solve for the solution ψ(xₙ) at these points, denoted by the vector ψ. That is, to compare with the notation of the previous sections, we have the eigenvector x = ψ, the eigenvalue α = E, and the parameters V(xₙ), or p = V. If we discretize the eigen-operator with the usual center-difference scheme, we get Aψ = Eψ for:

    A = −(1/Δx²) [ −2   1    0   ⋯   1
                    1  −2    1        0
                    0   1   −2   ⋱
                        ⋱    ⋱   ⋱   1
                    1   0    ⋯   1  −2 ] + diag(V),

where the corner 1s implement the periodic boundaries. The discretized objective is

    g(ψ, V) = (ψ − ψ₀)ᵀ(ψ − ψ₀) Δx,

where ψ₀ is ψ₀(xₙ), our target eigenfunction.⁸ Therefore g_ψ = 2(ψ − ψ₀)ᵀΔx, and thus, by eq. (6), we find λ via:

    (A − E)λ = 2P(ψ − ψ₀)Δx,    (8)
with P = 1 − ψψᵀ (λᵀψ = 0 since g_E = 0). g_V and g_E are both 0. Moreover, A_{Vₙ} is simply the matrix with 1 at (n, n) and 0s elsewhere, and thus from (7):

    dg/dVₙ = −λₙψₙ,

or equivalently dg/dV = −λ ⊙ ψ, where ⊙ is the pointwise product (.* in Matlab).

⁸ We also have an arbitrary choice of sign, which we fix by choosing ∫ψ dx > 0.
Whew! Now how do we solve these equations numerically? This is illustrated by the Matlab function schrodinger_fd_adj given below. We set up A as a sparse matrix, then find the smallest eigenvalue and eigenvector via the eigs function (which uses an iterative Arnoldi method). Then we solve (8) for λ via the Matlab pcg function (preconditioned conjugate gradient, although we don't bother with a preconditioner).

[Figure 2: Optimized V(x) (scaled by 1/10000) and ψ(x) for ψ₀(x) = 1 − |x| for |x| < 0.5, after 5000 cg iterations.]
Then, given g and dg/dV, we just plug them into some optimization algorithm; in particular, nonlinear conjugate gradient seems to work well for this problem.
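For concreteness, here is a hypothetical driver loop (mine, not from the notes; the grid construction, target, step size, and plain gradient descent standing in for nonlinear cg are all my assumptions) showing how g and dg/dV from schrodinger_fd_adj (section 5.2 below) would feed an optimizer:

% Hypothetical driver (not from the notes): a crude fixed-step gradient
% descent stands in for the nonlinear-cg routine used in the text.
M = 100;
dx = 2 / (M + 1);
x = -1 + (1:M)' * dx; % M equally spaced points on [-1, 1)
psi0 = (1 - abs(x)) .* (abs(x) < 0.5); % the "house" target psi_0(x)
V = zeros(M, 1); % starting guess V(x) = 0
step = 1e3; % step chosen ad hoc for illustration; a line search is better
for iter = 1:500
  [g, gp] = schrodinger_fd_adj(x, V, psi0);
  V = V - step * gp; % descend along -dg/dV
end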
5.1 Optimization results
In this section, we give a few example results from running the above procedure (nonlinear cg optimization) for M = 100. As the starting guess for our optimization, we'll just use V(x) = 0. That is, we are doing a local optimization in a 100-dimensional space, using the adjoint method to get the gradient. It is somewhat remarkable that this works: in a few seconds on a PC, it converges to a very good solution!

We'll try a couple of example ψ₀(x) functions. To start with, let's do ψ₀(x) = 1 + sin[πx + cos(3πx)]. (Note that the ground state will never have any nodes, so we require ψ₀ ≥ 0 everywhere.) This ψ₀(x), along with the resulting ψ(x) and V(x) after 500 cg iterations, are shown in figure 1. The solution ψ(x) matches ψ₀(x) very well except for a couple of small ripples, and V(x) is quite complicated, not something you could easily guess!

Oh, but that ψ₀ was too easy! Let's try one with discontinuities: ψ₀(x) = 1 − |x| for |x| < 0.5 and 0 otherwise (which looks a bit like a house). This ψ₀(x), along with the resulting ψ(x) and V(x) after 5000 cg iterations, are shown in figure 2. Amazingly, it still captures ψ₀ pretty well, although it has a bit more trouble with the discontinuities than with the slope discontinuity. This time, we let it converge for 5000 cg iterations to give it a bit more time. Was this really necessary? In figure 3, we plot ψ(x) for 10, 20, 40, 80, 160, 320, and 5000 cg iterations. It gets the rough shape pretty quickly, but the discontinuous features are converging fairly slowly. (Presumably this could be improved if we found a good preconditioner, or perhaps by a different optimization method or objective function.)

[Figure 3: Optimized ψ(x) for ψ₀(x) = 1 − |x| for |x| < 0.5, after various numbers of nonlinear conjugate-gradient iterations (from 10 to 5000).]

5.2 Matlab code

The following code solves for g and dg/dV, not to mention the eigenfunction ψ and the corresponding eigenvalue E, for a given V and ψ₀.
% Usage: [g,gp,E,psi] = schrodinger_fd_adj(x, V, psi0)
%
% Given a column-vector x(:) of N equally spaced x points and a
% column-vector V of the potential V(x) at those points, solves
% Schrodinger's equation
%     [ -d^2/dx^2 + V(x) ] psi(x) = E psi(x)
% with periodic boundaries for the lowest "ground state" eigenvalue E and
% wavefunction psi.
%
% Furthermore, it computes the function g = integral |psi - psi0|^2 and
% the gradient gp = dg/dV (at each point x).
function [g,gp,E,psi] = schrodinger_fd_adj(x, V, psi0)
dx = x(2) - x(1);
N = length(x);
A = spdiags([ones(N,1), -2*ones(N,1), ones(N,1)], -1:1, N, N);
A(1,N) = 1; % periodic wrap-around entries
A(N,1) = 1;
A = -A / dx^2 + spdiags(V, 0, N, N);
opts.disp = 0;
[psi,E] = eigs(A, 1, 'sa', opts); % smallest-algebraic eigenpair
E = E(1,1);
if sum(psi) < 0
  psi = -psi; % pick sign; note that psi' * psi = 1 from eigs
end
gpsi = psi - psi0;
g = gpsi' * gpsi * dx;
gpsi = gpsi * 2*dx; % gpsi is now g_psi', the data for the rhs of eq. (8)
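The extracted listing is cut off at this point, before λ and gp are computed. The following completion is a reconstructed sketch (mine, not necessarily the original code): it solves eq. (8) with pcg after projecting the right-hand side orthogonal to ψ (since A − E is singular with null vector ψ), then applies dg/dV = −λ ⊙ ψ.

% --- reconstructed sketch of the truncated remainder (not the original) ---
gpsi = gpsi - psi * (psi' * gpsi); % project: rhs of eq. (8) is P * g_psi'
[lambda,flag] = pcg(A - E*speye(N,N), gpsi, 1e-8, N); % (A-E)*lambda = rhs
lambda = lambda - psi * (psi' * lambda); % remove roundoff drift along psi
gp = -lambda .* psi; % dg/dV_n = -lambda_n * psi_n, from eq. (7)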
6 Initial-value problems

…so that, as in eq. (3),

    dg/dp = g_p − λᵀ (A_p x − b_p).

Now, since A = e^{Bt}, one might be tempted to write A_p = B_p t A, but this is not true except in the very special case where B_p commutes with B! Unfortunately, the general expression for differentiating a matrix exponential turns out to be more complicated:

    A_p = ∫₀ᵗ e^{Bt′} B_p e^{B(t−t′)} dt′,

and so, …
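A quick numerical illustration of the non-commuting pitfall (my own example, not from the notes): compare a centered finite difference of e^{B(p)t} with the naive formula B_p t e^{Bt} for a pair B, B_p that do not commute:

% Naive-formula check (illustrative, not from the notes): d/dp expm(B(p)*t)
% vs. the tempting guess Bp*t*expm(B*t), with B(p) = B + p*Bp at p = 0.
B = [0 1; 0 0]; Bp = [0 0; 1 0]; t = 1; % here B*Bp ~= Bp*B
h = 1e-6;
Ap_fd = (expm((B + h*Bp)*t) - expm((B - h*Bp)*t)) / (2*h); % centered diff
Ap_naive = Bp * t * expm(B*t); % wrong unless Bp commutes with B
disp(norm(Ap_fd - Ap_naive)) % clearly nonzero: the naive formula fails
Ap_true = [1/2 1/6; 1 1/2]; % the integral formula in closed form for this B, Bp
disp(norm(Ap_fd - Ap_true)) % near zero: matches the integral expression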
References

[1] R. M. Errico, "What is an adjoint model?", Bulletin Am. Meteorological Soc., vol. 78, pp. 2577-2591, 1997.

[2] Y. Cao, S. Li, L. Petzold, and R. Serban, "Adjoint sensitivity analysis for differential-algebraic equations: The adjoint DAE system and its numerical solution," SIAM J. Sci. Comput., vol. 24, no. 3, pp. 1076-1089, 2003.

[3] G. Strang, Computational Science and Engineering. Wellesley, MA: Wellesley-Cambridge Press, 2007.

[4] A. P. Seyranian, E. Lund, and N. Olhoff, "Multiple eigenvalues in structural optimization problems," Structural Optimization, vol. 8, pp. 207-227, 1994.