AIAA 2012-1589
Nomenclature

n — number of variables
nf — number of output variables
nx — number of input variables
ny — number of state variables
ei = [0, ..., 1, ..., 0]^T — unit vector with a 1 in the ith position
f = [f1, ..., fnf]^T — vector of output variables
r = [r1, ..., rny]^T — vector of residuals
ri = [r1,i, ..., rny,i]^T — residuals of the ith discipline
v = [v1, ..., vn]^T — vector of all variables
x = [x1, ..., xnx]^T — vector of input variables
y = [y1, ..., yny]^T — vector of state variables
yi = [y1,i, ..., yny,i]^T — state variables of the ith discipline
F = [F1, ..., Fnf]^T — output functions
R = [R1, ..., Rny]^T — residual functions
Ri = [R1,i, ..., Rny,i]^T — residual functions of the ith discipline
Yi = [Y1,i, ..., Yny,i]^T — functions that explicitly define the states of the ith discipline
V = [V1, ..., Vn]^T — functions that compute the variables v = [v1, ..., vn]^T
df/dx = [dfi/dxj] — nf × nx Jacobian of total derivatives
∂F/∂x = [∂Fi/∂xj] — nf × nx Jacobian of partial derivatives
Dv = [dvi/dvj] — n × n matrix of total derivatives
DV = [∂Vi/∂vj] — n × n matrix of partial derivatives
Copyright 2012 by the authors. Published by the American Institute of Aeronautics and Astronautics, Inc., with permission.
I. Introduction
The computation of derivatives is part of the broader field of sensitivity analysis: the study of how the outputs of a model change in response to changes in the inputs of that model. Sensitivity analysis plays a key role in gradient-based optimization, uncertainty quantification, error analysis, model development, and computational model-assisted decision making. Various types of sensitivities can be defined. One common classification distinguishes between local and global sensitivity analysis [1]. Global sensitivity analysis aims to quantify the response with respect to inputs over a wide range of values, and it is better suited for models that have large uncertainties. Local sensitivity analysis aims to quantify the response for a fixed set of inputs, and is typically used in physics-based models, where the uncertainties tend to be lower. In the present review, we focus on the computation of local sensitivities in the form of first-order total derivatives, where the model is a numerical algorithm. The computational models are assumed to be deterministic. Although stochastic models require approaches that are beyond the scope of this paper, some of those approaches can benefit from the deterministic techniques presented herein.
Derivatives play a central role in many numerical algorithms. In many cases, such as in Newton-based methods, the computational effort of an algorithm depends heavily on the run time and memory requirements of the computation of the derivatives. Examples of such algorithms include Newton–Krylov methods applied to the solution of the Euler equations [2] and of the coupled aerostructural equations [3, 4, 5], and quasi-Newton methods used to solve optimization problems [6, 7]. Other applications of derivatives include gradient-enhanced surrogate models [8], structural topology optimization [9, 10, 11, 12], aerostructural optimization [13, 14], and aircraft stability [15, 16].
The accuracy of the derivative computation affects the convergence behavior of the solver used in the algorithm.
For instance, accurate derivatives are important in gradient-based optimization to ensure robust and efficient convergence, especially for problems with large numbers of constraints. The precision of the gradients limits that of the
optimum solution, and inaccurate gradients can cause the optimizer to halt or to take a less direct route to the optimum
that involves more iterations.
In this review, for generality, we consider the numerical models to be algorithms that solve a set of governing
equations to find the state of a system. The computational effort involved in these numerical models, or simulations,
is assumed to be significant. Examples of such simulations include computational fluid dynamics (CFD) and structural finite-element solvers. We also extend our review to consider multiple coupled simulations, which appear in
multidisciplinary design optimization (MDO) problems.
The simplest method for computing derivatives is the use of an appropriate finite-difference formula, such as a
forward finite-difference, where each input of interest is perturbed and the output reevaluated to determine its new
value. The derivative is then estimated by taking the difference in the output relative to the unperturbed one and
dividing by the value of the perturbation. Although finite differences are not known for being particularly accurate or
computationally efficient, they are extremely easy to implement and therefore widely used.
In addition to the inaccuracies inherent in finite differences, computing sensitivities with respect to a large number of
inputs using these methods is prohibitively expensive when the computational cost of the simulations is significant.
Most applications require more accuracy and efficiency than is afforded by this approach, motivating the pursuit of the
more advanced methods that we describe in this paper.
The overarching goal of this paper is to review the available methods for the sensitivity analysis of coupled systems
and to advance the understanding of these methods in a unified mathematical framework. Some of this material has
been the subject of excellent reviews and textbooks, but they have either been limited to a single discipline [17, 18,
19, 20] or limited in scope [21, 22, 23]. Spurred by the recent advances in this area, we decided to write this review
and connect some methods that are usually not explained together, leading to new insights and a broader view of the
subject. In addition to a deeper understanding of the topic, we aim to help researchers and practitioners decide which
method is suitable for their particular needs.
We start this review by defining the nomenclature and the context of the theory. Then we progress through the
theory of sensitivity analysis for single systems and connect the various methods under a unified mathematical framework. Finally, we extend this theory to the sensitivity analysis of coupled systems, and present some recent advances.
The historical literature is cited as the theory is presented.
Figure 1: Classification of methods for derivative computations: the differentiation methods (finite-difference, complex-step and symbolic) are the building blocks for the other methods, each of which considers a different level of decomposition (black box, solver, discipline or line of code).
II. Differentiation of a Function
Throughout this paper we assume that we ultimately want to compute the derivatives of a vector-valued function f with respect to a vector of independent variables x, i.e., we want the Jacobian

$$\frac{\mathrm{d} f}{\mathrm{d} x} = \begin{bmatrix} \dfrac{\mathrm{d} f_1}{\mathrm{d} x_1} & \cdots & \dfrac{\mathrm{d} f_1}{\mathrm{d} x_{n_x}} \\ \vdots & \ddots & \vdots \\ \dfrac{\mathrm{d} f_{n_f}}{\mathrm{d} x_1} & \cdots & \dfrac{\mathrm{d} f_{n_f}}{\mathrm{d} x_{n_x}} \end{bmatrix}, \qquad (1)$$

which is an nf × nx matrix.
A. Finite Differences
Finite-difference formulas are derived by combining Taylor series expansions. Using the right combinations of these expansions, it is possible to obtain finite-difference formulas that estimate a derivative of arbitrary order with any required order of truncation error. The simplest finite-difference formula can be derived directly from one Taylor series expansion, yielding

$$\frac{\mathrm{d} f}{\mathrm{d} x_j} = \frac{f(x + e_j h) - f(x)}{h} + O(h), \qquad (2)$$
which is directly related to the definition of derivative. Note that in general there are multiple functions of interest,
and thus f can be a vector that includes all the outputs of a given component. The application of this formula requires
the evaluation of a component at the reference point x, and one perturbed point x + ej h, and yields one column
of the Jacobian (1). Each additional column requires an additional evaluation of the component. Hence, the cost of
computing the complete Jacobian is proportional to the number of input variables of interest, nx .
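As an illustration, the following minimal sketch (our own, not one of the paper's listings) computes the full Jacobian by forward differences, one perturbed evaluation per column, using the test function from the appendix example:

program fd_jacobian
   ! A minimal sketch of Equation (2): one extra function evaluation per
   ! Jacobian column. The step size h trades truncation error against
   ! subtractive cancellation, as discussed below.
   implicit none
   integer, parameter :: dp = kind(1.0d0)
   real(dp) :: x(2), f0(2), fp(2), jac(2, 2), h
   integer :: j
   x = [atan(1.0_dp), 2.0_dp]        ! reference point (pi/4, 2)
   h = 1.0e-7_dp
   call calcf2(x, f0)                ! one evaluation at the reference point
   do j = 1, 2
      x(j) = x(j) + h                ! perturb one input: x + e_j h
      call calcf2(x, fp)             ! one evaluation per perturbed input
      jac(:, j) = (fp - f0) / h      ! forward-difference column, Equation (2)
      x(j) = x(j) - h                ! restore the reference point
   end do
   print *, jac
contains
   subroutine calcf2(x, f)           ! the test function from the appendix
      real(dp), intent(in)  :: x(2)
      real(dp), intent(out) :: f(2)
      f(1) = (x(1)*x(2) + sin(x(1))) * (3.0_dp*x(2)**2 + 6.0_dp)
      f(2) = x(1)*x(2) + x(2)**2
   end subroutine calcf2
end program fd_jacobian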
Finite-difference methods are widely used to compute derivatives due to their simplicity and the fact that they can
be implemented even when a given component is a black box. Most gradient-based optimization algorithms perform
finite-differences by default when the user does not provide the required gradients.
When it comes to accuracy, we can see from the forward-difference formula (2) that the truncation error is proportional to the magnitude of the perturbation, h. Thus it is desirable to decrease h as much as possible. The problem
with decreasing h is that the perturbed value of the functions of interest will approach the reference values. When
using finite-precision arithmetic, this leads to subtractive cancellation: a loss of significant digits in the subtraction
operation. In the extreme case, when h is small enough, all digits of the perturbed functions will match the reference
values, yielding zero for the derivatives. Given the opposite trends exhibited by the subtractive cancellation error and
truncation error, for each x there is a best h that minimizes the overall error.
Due to their flexibility, finite-difference formulas can always be used to compute derivatives, at any level of nesting.
They can be used to compute derivatives of a single function, composite functions, iterative functions or any system
with multiply nested components.
B. Complex Step
The complex-step derivative approximation, strangely enough, computes derivatives of real functions using complex
variables. This method originated with the work of Lyness and Moler [24] and Lyness [25]. They developed several
methods that made use of complex variables, including a reliable method for calculating the nth derivative of an
analytic function. However, only later was this theory rediscovered by Squire and Trapp [26], who derived a simple
formula for estimating the first derivative.
The complex-step derivative approximation, like finite-difference formulas, can also be derived using a Taylor
series expansion. Rather than using a real step h, we now use a pure imaginary step, ih. If f is a real function in real
variables and it is also analytic, we can expand it in a Taylor series about a real point x as follows,
$$f(x + i h e_j) = f(x) + i h \frac{\mathrm{d} f}{\mathrm{d} x_j} - \frac{h^2}{2} \frac{\mathrm{d}^2 f}{\mathrm{d} x_j^2} - \frac{i h^3}{6} \frac{\mathrm{d}^3 f}{\mathrm{d} x_j^3} + \cdots \qquad (3)$$

Taking the imaginary parts of both sides of this equation and dividing by h yields

$$\frac{\mathrm{d} f}{\mathrm{d} x_j} = \frac{\operatorname{Im}\left[f(x + i h e_j)\right]}{h} + O(h^2). \qquad (4)$$
Hence the approximation is an O(h²) estimate of the derivative. As with a finite-difference formula, each additional evaluation yields a column of the Jacobian (1), and the cost of computing the required derivatives is proportional to the number of design variables, nx.
Because there is no subtraction operation in the complex-step derivative approximation (4), the only source of numerical error is the truncation error, which is O(h²). By decreasing h to a small enough value, the truncation error can be made to be of the same order as the numerical precision of the evaluation of f.
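The following minimal sketch (our own illustration) applies the complex-step formula (4) to the same test function as above, promoted to complex arithmetic:

program complex_step_demo
   ! A minimal sketch of Equation (4): perturb one input by a pure
   ! imaginary step i*h*e_j. Because no subtraction is involved, h can
   ! be made extremely small without cancellation error.
   implicit none
   integer, parameter :: dp = kind(1.0d0)
   complex(dp) :: x(2), f(2)
   real(dp), parameter :: h = 1.0e-30_dp
   x = [cmplx(atan(1.0_dp), h, dp), cmplx(2.0_dp, 0.0_dp, dp)]  ! x + i h e_1
   f(1) = (x(1)*x(2) + sin(x(1))) * (3.0_dp*x(2)**2 + 6.0_dp)
   f(2) = x(1)*x(2) + x(2)**2
   print *, 'df1/dx1 =', aimag(f(1))/h   ! Im[f(x + i h e_j)] / h
   print *, 'df2/dx1 =', aimag(f(2))/h
end program complex_step_demo

Note that the only changes relative to a real-valued implementation are the complex variable declarations and the seeding of the imaginary part, which is the essence of the implementation procedure described next.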
The first application of this approach to an iterative solver is due to Anderson et al. [27], who used it to compute derivatives of a Navier–Stokes solver; it was later applied to multidisciplinary systems [28]. Martins et al. [29] showed that the complex-step method is generally applicable to any algorithm and described the detailed procedure for its implementation. They also presented an alternative way of deriving and understanding the complex step, and connected it to algorithmic differentiation.
The complex-step method requires access to the source code of the given component, and thus cannot be applied to
black box components without additional effort. To implement the complex-step method, the source code of the component must be modified so that all real variables and computations are replaced with complex ones. In addition, some
intrinsic functions need to be replaced, depending on the programming language. Martins et al. [29] provide a script
that facilitates the implementation of the complex-step method to Fortran codes, as well as details for implementation
in Matlab, C/C++ and Python.
Figure 2 illustrates the difference between the complex-step and finite-difference formulas. When using the
complex-step method, the differencing quotient is evaluated using the imaginary parts of the function values and
step size, and the quantity f (xj ) has no imaginary component to subtract.
Figure 2: Derivative approximations df/dx using a forward step in the real (left) and complex (right) axes. Here, f and x are scalars. In the complex-step method, there is no subtraction operation involved, because the value of the initial point, f(x), has no imaginary part.
The complex-step approach is now widely used, with applications ranging from the verification of high-fidelity
aerostructural derivatives [30, 14] to the development of immunology models [31]. In one case, the complex-step method was
implemented in an aircraft design optimization framework that involves multiple languages (Python, C, C++ and
Fortran) [32], demonstrating the flexibility of this approach.
C. Symbolic Differentiation
Symbolic differentiation is only possible for explicit functions, and can either be done by hand or by appropriate
software. For a sequence of composite functions, it is possible to use the chain rule, but symbolic differentiation
becomes impossible for general algorithms.
III. Total Derivatives of a System
The methods of differentiation presented in the previous section are limited in scope to computing the derivatives of a single function, or of a more complex system treated without regard to its internal dependencies and structure. These methods are the building blocks of the more sophisticated methods that are the focus of this section.
A. Variables, Components and System
To make sure the methods are explained for the most general case, and to show how the various methods can be derived under a single theoretical framework, it is important to characterize the computational model involved and to define the relevant terms precisely. In the most general sense, a computational model takes a series of numerical inputs and produces outputs. As previously mentioned, the computational model is assumed to be deterministic, and it is ultimately implemented as a computer program. Depending on the type of method, we might take the computational-model view or the computer-program view. In either case, we sometimes refer to the model or program as a system, since it is an interconnected series of computations.
It is particularly important to realize the nested nature of the system. The most fundamental building blocks of this system are the unary and binary operations. These operations can be combined to obtain more elaborate explicit functions, which are typically expressed in one line of computer code. A more complex computation can be performed by evaluating a sequence of explicit functions Vi, where i = 1, ..., n. In its simplest form, each function in this sequence depends only on the inputs and the functions that have been computed earlier in the sequence. Thus we can represent such a computation as

$$v_i = V_i(v_1, v_2, \ldots, v_{i-1}). \qquad (5)$$
Here we adopt the convention that the lower case represents the value of a variable, and the upper case represents
the function that computes the value. This is a distinction that will be particularly useful in developing the theory
presented herein.
In the more general case, a given function might require values that have not been previously computed, i.e.,

$$v_i = V_i(v_1, v_2, \ldots, v_i, \ldots, v_n). \qquad (6)$$
The solution of such systems requires numerical methods that can be programmed using loops. Numerical methods range from simple fixed-point iterations to sophisticated Newton-type algorithms. Note that loops are also used to repeat one or more computations over a computational grid. One concept that will be used later is that it is always possible to represent any given computation without loops and dependencies, as written in Equation (5), if we unroll all the loops and represent every value a variable might take in the iteration as a separate variable that is never overwritten.
In the context of this paper, it is useful to generalize any computation that produces an output vector of variables v_out ⊆ v for a given arbitrary set of input variables v_in ⊆ v. We write this computation as

$$v_{\text{out}} = V(v_{\text{in}}), \qquad (7)$$

and call it a component. What constitutes a component is somewhat arbitrary, but components are usually defined and organized in a way that helps us understand the overall system. The boundaries of a given component are usually determined by a number of practical factors, such as technical discipline, programming language, data dependencies or developer team.

A given component is characterized in part by the input variables it requires and by the variables that it outputs. In the process of computing the output variables, a component might also set a number of other variables in v that are neither inputs nor outputs, which we call intermediate variables.
When a component is just a series of explicit functions, we can consider the component itself to be an explicit
composite function. In cases where the computation of the outputs requires iteration, it is helpful to denote the
computation as a vector of residual equations,

$$r = R(v) = 0, \qquad (8)$$

where the algorithm changes certain components of v until all the residuals converge to zero (or, in practice, to within a small specified tolerance). The subset of v that is iterated to achieve the solution of these equations is called the set of state variables.
To relate these concepts to the usual conventions in sensitivity analysis, we now separate the subsets in v into the independent variables x, the state variables y, and the quantities of interest f. Note that these do not necessarily correspond exactly to the component inputs, intermediate variables and outputs, respectively. Using this notation, we can write the residual equations as

$$r = R(x, y(x)) = 0, \qquad (9)$$

where y(x) denotes the fact that y depends implicitly on x through the solution of the residual equations (9). It is the solution of these equations that completely determines y for a given x. The functions of interest (usually included in the set of component outputs) have the same type of variable dependence in the general case, i.e.,

$$f = F(x, y(x)). \qquad (10)$$
When we compute the values f , we assume that the state variables y have already been determined by the solution of
the residual equations (9). The dependencies involved in the computation of the functions of interest are represented
in Figure 3. For the purposes of this paper, we are ultimately interested in the total derivatives of quantities f with
respect to x.
Figure 3: Definition of the variables involved at the solver level, showing the dependency of the quantity of interest on the design variables, both directly and through the residual equations that determine the system states. (The figure shows x ∈ R^nx, the residuals R(x, y) = 0 with r, y ∈ R^ny, and the output functions F(x, y) with f ∈ R^nf.)
B. A Unified Framework
In this section, we present the mathematical framework that unifies the methods for computing total derivatives. The
methods differ in the extent to which they decompose a system, but they all come from a basic principle: a generalized
chain rule.
To arrive at this form of the chain rule, we start from the sequence of variables (v1, ..., vn), whose values are functions of earlier variables, vi = Vi(v1, ..., v_{i-1}). For brevity, Vi(v1, ..., v_{i-1}) is written as Vi(·). We define the partial derivative of a function Vi with respect to a variable vj as

$$\frac{\partial V_i}{\partial v_j} = \lim_{h \to 0} \frac{V_i(v_1, \ldots, v_{j-1}, v_j + h, v_{j+1}, \ldots, v_{i-1}) - V_i(\cdot)}{h}. \qquad (11)$$
The total variation δvk, due to a perturbation δvj, can be computed by using the sum of partial derivatives,

$$\delta v_k = \sum_{l=j}^{k-1} \frac{\partial V_k}{\partial v_l}\, \delta v_l, \qquad (12)$$
where all the intermediate variations δv between j and k are computed and used. The total derivative is defined as

$$\frac{\mathrm{d} v_i}{\mathrm{d} v_j} = \frac{\delta v_i}{\delta v_j}. \qquad (13)$$
Using the two equations above, we can derive the following equation:

$$\frac{\mathrm{d} v_i}{\mathrm{d} v_j} = \delta_{ij} + \sum_{k=j}^{i-1} \frac{\partial V_i}{\partial v_k} \frac{\mathrm{d} v_k}{\mathrm{d} v_j}, \qquad (14)$$

which expresses a total derivative in terms of the other total derivatives and the Jacobian of partial derivatives. Equation (14) represents the chain rule for a system whose variables are v.
To get a better understanding of the structure of the chain rule (14), and of the options for performing the computation it represents, we now write it in matrix form. We can write the partial derivatives of the elementary functions Vi with respect to vj as the square n × n Jacobian matrix

$$D_V = \frac{\partial V_i}{\partial v_j} = \begin{bmatrix} 0 & & & & \\ \frac{\partial V_2}{\partial v_1} & 0 & & & \\ \frac{\partial V_3}{\partial v_1} & \frac{\partial V_3}{\partial v_2} & 0 & & \\ \vdots & \vdots & \ddots & \ddots & \\ \frac{\partial V_n}{\partial v_1} & \frac{\partial V_n}{\partial v_2} & \cdots & \frac{\partial V_n}{\partial v_{n-1}} & 0 \end{bmatrix}, \qquad (15)$$
where D is a differential operator. The total derivatives of the variables vi form another Jacobian matrix of the same size, which has a unit diagonal:

$$D_v = \frac{\mathrm{d} v_i}{\mathrm{d} v_j} = \begin{bmatrix} 1 & & & & \\ \frac{\mathrm{d} v_2}{\mathrm{d} v_1} & 1 & & & \\ \frac{\mathrm{d} v_3}{\mathrm{d} v_1} & \frac{\mathrm{d} v_3}{\mathrm{d} v_2} & 1 & & \\ \vdots & \vdots & \ddots & \ddots & \\ \frac{\mathrm{d} v_n}{\mathrm{d} v_1} & \frac{\mathrm{d} v_n}{\mathrm{d} v_2} & \cdots & \frac{\mathrm{d} v_n}{\mathrm{d} v_{n-1}} & 1 \end{bmatrix}. \qquad (16)$$
Both of these matrices are lower triangular, due to our assumption that we have unrolled all the loops. Using this notation, the chain rule (14) can be written as

$$D_v = I + D_V\, D_v, \qquad (17)$$

which can be rearranged to

$$(I - D_V)\, D_v = I. \qquad (18)$$

Transposing this equation, we obtain

$$(I - D_V)^T\, D_v^T = I. \qquad (19)$$

Combining the two previous equations yields the chain rule identity

$$(I - D_V)\, D_v = I = (I - D_V)^T\, D_v^T. \qquad (20)$$

We call the left- and right-hand sides of this equation the forward and reverse chain rule equations, respectively. As we will see throughout this paper, all methods for derivative computation can be derived from one of the forms of this chain rule (20) by changing what we mean by "variables," which can be seen as choosing a level of decomposition. The various levels of decomposition were shown in Figure 1 and are summarized later, in Table 1.
C. Algorithmic Differentiation
Algorithmic differentiation (AD), also known as computational differentiation or automatic differentiation, is a well-known method based on the systematic application of the chain rule of differentiation to computer programs [33, 34]. Although this approach is as accurate as an analytic method, it is potentially much easier to implement, since the implementation can be done automatically. To explain AD, we start by describing the basic theory and how it relates to the chain rule identity (20) introduced in the previous section. We then explain how the method is implemented in practice, and show an example.
7 of 26
American Institute of Aeronautics and Astronautics
From the AD perspective, the variables v in the chain rule (20) are all the variables assigned in the computer
program, and AD applies the chain rule for every single line in the program. The computer program thus can be
considered a sequence of explicit functions Vi , where i = 1, . . . , n. In its simplest form, each function in this sequence
depends only on the inputs and the functions that have been computed earlier in the sequence, as expressed in the
functional dependence (5).
Again, for this assumption to hold, we assume that all the loops in the program are unrolled, and therefore no
variables are overwritten and each variable only depends on earlier variables in the sequence. Later, when we explain
how AD is implemented, it will become clear that this assumption is not restrictive, as programs iterate the chain rule
(and thus the total derivatives) together with the program variables, converging to the correct total derivatives.
In the AD perspective, the independent variables x and the quantities of interest f are assumed to be in the vector
of variables v. Typically, the design variables are among the vs with lower indices, and the quantities of interest are
among the last quantities. Thus, to make clear the connection to the other derivative computation methods, we group
these variables as follows,
$$v = [\underbrace{v_1, \ldots, v_{n_x}}_{x}, \ldots, v_j, \ldots, v_i, \ldots, \underbrace{v_{(n - n_f)}, \ldots, v_n}_{f}]^T. \qquad (21)$$

Figure 4: Decomposition level for algorithmic differentiation: the variables v are all the variables assigned in the computer program.
The chain rule (14) introduced in the previous section was

$$\frac{\mathrm{d} v_i}{\mathrm{d} v_j} = \delta_{ij} + \sum_{k=j}^{i-1} \frac{\partial V_i}{\partial v_k} \frac{\mathrm{d} v_k}{\mathrm{d} v_j}, \qquad (22)$$

where the Vi represent explicit functions, each defined by a single line in the computer program. The partial derivatives ∂Vi/∂vk can be automatically differentiated symbolically by applying another chain rule within the function defined by the respective line.
The chain rule (22) can be solved in two ways. In the forward mode, we choose one vj and keep j fixed. Then
we work our way forward in the index i = 1, 2, . . . , n until we get the desired total derivative. In the reverse mode,
on the other hand, we fix vi (the quantity we want to differentiate) and work our way backward in the index j = n, n−1, ..., 1, all the way to the independent variables. We now describe these two modes in more detail, and
compare the computational costs associated with each of them.
1. Forward Mode
To get a better understanding of the structure of the chain rule (14), and of the options for performing that computation, we now write it in the matrix form (18), (I − DV) Dv = I:

$$\begin{bmatrix} 1 & & & & \\ -\frac{\partial V_2}{\partial v_1} & 1 & & & \\ -\frac{\partial V_3}{\partial v_1} & -\frac{\partial V_3}{\partial v_2} & 1 & & \\ \vdots & \vdots & \ddots & \ddots & \\ -\frac{\partial V_n}{\partial v_1} & -\frac{\partial V_n}{\partial v_2} & \cdots & -\frac{\partial V_n}{\partial v_{n-1}} & 1 \end{bmatrix} \begin{bmatrix} 1 & & & & \\ \frac{\mathrm{d} v_2}{\mathrm{d} v_1} & 1 & & & \\ \frac{\mathrm{d} v_3}{\mathrm{d} v_1} & \frac{\mathrm{d} v_3}{\mathrm{d} v_2} & 1 & & \\ \vdots & \vdots & \ddots & \ddots & \\ \frac{\mathrm{d} v_n}{\mathrm{d} v_1} & \frac{\mathrm{d} v_n}{\mathrm{d} v_2} & \cdots & \frac{\mathrm{d} v_n}{\mathrm{d} v_{n-1}} & 1 \end{bmatrix} = I. \qquad (23)$$
The terms that we ultimately want to compute are the total derivatives of the quantities of interest with respect to the design variables, which correspond to a block in the lower left of the Dv matrix. Using the definition expressed in Equation (1), this block is

$$\frac{\mathrm{d} f}{\mathrm{d} x} = \begin{bmatrix} \frac{\mathrm{d} f_1}{\mathrm{d} x_1} & \cdots & \frac{\mathrm{d} f_1}{\mathrm{d} x_{n_x}} \\ \vdots & \ddots & \vdots \\ \frac{\mathrm{d} f_{n_f}}{\mathrm{d} x_1} & \cdots & \frac{\mathrm{d} f_{n_f}}{\mathrm{d} x_{n_x}} \end{bmatrix} = \begin{bmatrix} \frac{\mathrm{d} v_{(n-n_f)}}{\mathrm{d} v_1} & \cdots & \frac{\mathrm{d} v_{(n-n_f)}}{\mathrm{d} v_{n_x}} \\ \vdots & \ddots & \vdots \\ \frac{\mathrm{d} v_n}{\mathrm{d} v_1} & \cdots & \frac{\mathrm{d} v_n}{\mathrm{d} v_{n_x}} \end{bmatrix}, \qquad (24)$$

which is an nf × nx matrix.
The forward mode is equivalent to solving the linear system (23) for one column of Dv. Since (I − DV) is a lower triangular matrix, this solution can be accomplished by forward substitution. In the process, we end up computing the derivatives of all the variables with respect to the chosen variable. The cost of this procedure is similar to the cost of the procedure that computes the v's and, as we will see in the implementation part of this section, the forward AD operations are interspersed with the operations that compute the v's in the original computer code.
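In matrix terms, the forward mode amounts to nothing more than forward substitution. The following minimal sketch (our own illustration, with DV assumed dense and precomputed) makes this explicit:

subroutine forward_mode_column(n, DV, j, dv)
   ! Solve (I - D_V) dv = e_j by forward substitution, Equation (23).
   ! On exit, dv(i) = dvi/dvj: the derivatives of every variable with
   ! respect to the single chosen input v_j.
   implicit none
   integer, parameter :: dp = kind(1.0d0)
   integer, intent(in) :: n, j
   real(dp), intent(in) :: DV(n, n)   ! partial derivatives dVi/dvk
   real(dp), intent(out) :: dv(n)
   integer :: i
   do i = 1, n
      dv(i) = merge(1.0_dp, 0.0_dp, i == j)               ! identity right-hand side
      dv(i) = dv(i) + dot_product(DV(i, 1:i-1), dv(1:i-1)) ! forward substitution
   end do
end subroutine forward_mode_column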
2. Reverse Mode
The matrix representation of the reverse mode of algorithmic differentiation is given by Equation (19), (I − DV)^T Dv^T = I, which expands to

$$\begin{bmatrix} 1 & -\frac{\partial V_2}{\partial v_1} & -\frac{\partial V_3}{\partial v_1} & \cdots & -\frac{\partial V_n}{\partial v_1} \\ & 1 & -\frac{\partial V_3}{\partial v_2} & \cdots & -\frac{\partial V_n}{\partial v_2} \\ & & 1 & \ddots & \vdots \\ & & & \ddots & -\frac{\partial V_n}{\partial v_{n-1}} \\ & & & & 1 \end{bmatrix} \begin{bmatrix} 1 & \frac{\mathrm{d} v_2}{\mathrm{d} v_1} & \frac{\mathrm{d} v_3}{\mathrm{d} v_1} & \cdots & \frac{\mathrm{d} v_n}{\mathrm{d} v_1} \\ & 1 & \frac{\mathrm{d} v_3}{\mathrm{d} v_2} & \cdots & \frac{\mathrm{d} v_n}{\mathrm{d} v_2} \\ & & 1 & \ddots & \vdots \\ & & & \ddots & \frac{\mathrm{d} v_n}{\mathrm{d} v_{n-1}} \\ & & & & 1 \end{bmatrix} = I. \qquad (25)$$

The block matrix that we want to compute is now in the upper right section of Dv^T, and its size is nx × nf. As with the forward mode, we need to solve this linear system one column at a time, but now each column yields the derivatives of one chosen quantity with respect to all the other variables. Because the matrix (I − DV)^T is upper triangular, the system can be solved using back substitution.
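The counterpart of the forward-substitution sketch above, again our own illustration with DV assumed dense and precomputed, is back substitution on the transposed system:

subroutine reverse_mode_row(n, DV, i, dv)
   ! Solve (I - D_V)**T dv = e_i by back substitution, Equation (25).
   ! On exit, dv(j) = dvi/dvj: the derivatives of the single chosen
   ! output v_i with respect to every variable.
   implicit none
   integer, parameter :: dp = kind(1.0d0)
   integer, intent(in) :: n, i
   real(dp), intent(in) :: DV(n, n)   ! partial derivatives dVi/dvk
   real(dp), intent(out) :: dv(n)
   integer :: j
   do j = n, 1, -1
      dv(j) = merge(1.0_dp, 0.0_dp, j == i)                 ! identity right-hand side
      dv(j) = dv(j) + dot_product(DV(j+1:n, j), dv(j+1:n))  ! back substitution
   end do
end subroutine reverse_mode_row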
3. Implementation
For readers who are not familiar with AD, we have worked through an example in Appendix A, in which the chain rule (14) is applied in both forward and reverse modes, and the chain rule identity (20) is evaluated numerically for the same example.

The implementation of AD that intersperses lines of code that compute the derivatives with the original code is called the source code transformation approach, and is exemplified by the code listed in Figure 12. The alternative way of implementing AD is operator overloading [35, 33]. With this approach, the original code does not change, but the variable types and the operations are redefined. Each real number v is replaced by a type that includes not only the original real number, but the corresponding derivative as well, i.e., v̄ = (v, dv). All operations are then redefined such that, in addition to the result of the original operation, they yield the derivative of that operation as well [33].
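To make the operator-overloading idea concrete, the following minimal sketch (our own illustration, not code from any of the tools cited below) defines such a type and redefines a few operations:

module dual_mod
   ! A minimal sketch of forward-mode AD by operator overloading: each
   ! real v is replaced by the pair (v, dv), and every redefined
   ! operation propagates the derivative alongside the value. Only the
   ! operations needed for a small demo are defined; a real tool
   ! overloads the full set of operators and intrinsics.
   implicit none
   integer, parameter :: dp = kind(1.0d0)
   type :: dual
      real(dp) :: v    ! value
      real(dp) :: dv   ! derivative
   end type dual
   interface operator(+)
      module procedure add_dd
   end interface
   interface operator(*)
      module procedure mul_dd
   end interface
contains
   function add_dd(a, b) result(c)
      type(dual), intent(in) :: a, b
      type(dual) :: c
      c = dual(a%v + b%v, a%dv + b%dv)            ! sum rule
   end function add_dd
   function mul_dd(a, b) result(c)
      type(dual), intent(in) :: a, b
      type(dual) :: c
      c = dual(a%v*b%v, a%dv*b%v + a%v*b%dv)      ! product rule
   end function mul_dd
   function sin_d(a) result(c)                    ! overloaded intrinsic
      type(dual), intent(in) :: a
      type(dual) :: c
      c = dual(sin(a%v), cos(a%v)*a%dv)           ! chain rule for sin
   end function sin_d
end module dual_mod

Seeding the input of interest with a unit derivative, e.g. x1 = dual(0.5_dp, 1.0_dp) and x2 = dual(2.0_dp, 0.0_dp), an expression such as sin_d(x1) + x1*x2 then carries d(sin x1 + x1 x2)/dx1 = cos x1 + x2 in its dv component.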
One significant connection to make is that the complex-step method is equivalent to the forward mode of AD with
an operator overloading implementation, as explained by Martins et al. [29].
There are AD tools available for most programming languages, including Fortran, C/C++ [36, 37, 38], and Matlab. They have been extensively developed and provide the user with great functionality, including the calculation of higher-order derivatives and reverse-mode options. ADIFOR [39], TAF [40], TAMC [41] and Tapenade [42, 43] are some of the tools available for Fortran that use source transformation, where the necessary changes to the source code are made automatically. The operator overloading approach is used in AD01 [35], ADOL-F [44], IMAS [45] and OPTIMA90. Although it is in theory possible to have a script make the necessary changes to the source code automatically, none of these tools provides this facility, and the changes must be made manually.
D. Analytic Methods
Analytic methods are the most accurate and efficient methods available for computing derivatives. They are, however, much more involved than the other methods, since they require detailed knowledge of the computational model and a longer implementation time. Analytic methods are applicable when we have a quantity of interest f that depends implicitly on the independent variables of interest x, as previously described in Equation (10), which we repeat here for convenience:

$$f = F(x, y(x)). \qquad (26)$$

The implicit relationship between the state variables y and the independent variables is defined by the solution of a set of residual equations, which we also repeat here:

$$r = R(x, y(x)) = 0. \qquad (27)$$

By writing the computational model in this form, we have assumed a discrete analytic approach. This is in contrast to the continuous approach, in which the equations are not discretized until later. We do not discuss the continuous approach in this paper, but ample literature can be found on the subject [46, 47, 48, 49], including discussions comparing the two approaches [50, 51].

In this section we derive the two forms of the analytic method, the direct and the adjoint, in two ways. The first derivation follows the one typically presented in the literature, while the second derivation is based on the chain rule identity (20) and offers a new perspective that connects the analytic methods to algorithmic differentiation.
1. Traditional Derivation
As a first step toward obtaining the derivatives that we ultimately want to compute, we use the chain rule to write the total derivative Jacobian of f as

$$\frac{\mathrm{d} f}{\mathrm{d} x} = \frac{\partial F}{\partial x} + \frac{\partial F}{\partial y} \frac{\mathrm{d} y}{\mathrm{d} x}, \qquad (28)$$

where the result is an nf × nx matrix. As previously mentioned, it is important to distinguish the total from the partial derivatives and to define the context. The partial derivatives represent the variation of F(x, y) with respect to changes in x for a fixed y, while the total derivative df/dx takes into account the change in y that is required to keep the residual equations (27) equal to zero. As we have seen, this distinction depends on the context, i.e., what is considered a total or partial derivative depends on the level that is being considered in the nested system of components.

We should also mention that the partial derivatives can be computed using the methods described earlier: finite differences, the complex-step method, or algorithmic differentiation.
Since the governing equations must always be satisfied, the total derivative of the residuals (27) with respect to the design variables must also be zero. Thus, using the chain rule, we obtain

$$\frac{\mathrm{d} r}{\mathrm{d} x} = \frac{\partial R}{\partial x} + \frac{\partial R}{\partial y} \frac{\mathrm{d} y}{\mathrm{d} x} = 0. \qquad (29)$$

The computation of the total derivative matrix dy/dx in Equations (28) and (29) has a much higher computational cost than that of any of the partial derivatives, since it requires the solution of the residual equations. The partial derivatives can be computed by differentiating the function F with respect to x while keeping y constant.

The linearized residual equations (29) provide the means for computing the total sensitivity matrix dy/dx, by rewriting those equations as

$$\frac{\partial R}{\partial y} \frac{\mathrm{d} y}{\mathrm{d} x} = -\frac{\partial R}{\partial x}. \qquad (30)$$
Substituting this result into the total derivative equation (28), we obtain

$$\frac{\mathrm{d} f}{\mathrm{d} x} = \frac{\partial F}{\partial x} - \frac{\partial F}{\partial y} \underbrace{\left[\frac{\partial R}{\partial y}\right]^{-1} \frac{\partial R}{\partial x}}_{=\, -\mathrm{d} y / \mathrm{d} x}. \qquad (31)$$

The inverse of the square Jacobian matrix ∂R/∂y is not necessarily calculated explicitly; we use the inverse to denote the fact that this matrix defines a linear system that must be solved with some right-hand-side vector. Equation (31) shows that there are two ways of obtaining the total derivative matrix df/dx, depending on which right-hand side is chosen for the solution of the linear system.
2. Direct Method
The direct method consists of solving the linear system (30), with −∂R/∂x providing the right-hand-side vectors. This linear system needs to be solved for nx right-hand sides to get the full Jacobian matrix dy/dx. Then we can use dy/dx in Equation (28) to obtain the derivatives of interest, df/dx.

Since the cost of computing derivatives with the direct method is proportional to the number of design variables, nx, it does not have much of a computational cost advantage relative to finite differencing. When the computational model is a nonlinear system, however, the direct method can be advantageous: both methods require the solution of a system of the same size nx times, but the direct method solves a linear system, while the finite-difference method solves the original nonlinear one.
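As a concrete sketch (our own, assuming dense Jacobians, partial-derivative arrays assembled by one of the differentiation methods above, and LAPACK's dgesv linked in), the direct method is:

subroutine direct_method(ny, nx, nf, dRdy, dRdx, dFdx, dFdy, dfdx)
   ! Direct method: solve Equation (30) for dy/dx with nx right-hand
   ! sides, then assemble the total derivatives via Equation (28).
   implicit none
   integer, parameter :: dp = kind(1.0d0)
   integer, intent(in) :: ny, nx, nf
   real(dp), intent(in) :: dRdy(ny, ny), dRdx(ny, nx)
   real(dp), intent(in) :: dFdx(nf, nx), dFdy(nf, ny)
   real(dp), intent(out) :: dfdx(nf, nx)
   real(dp) :: a(ny, ny), dydx(ny, nx)
   integer :: ipiv(ny), info
   a = dRdy                                          ! dgesv overwrites its matrix
   dydx = -dRdx                                      ! nx right-hand sides
   call dgesv(ny, nx, a, ny, ipiv, dydx, ny, info)   ! dydx now holds dy/dx
   dfdx = dFdx + matmul(dFdy, dydx)                  ! Equation (28)
end subroutine direct_method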
3. Adjoint Method
Returning to the total sensitivity equation (31), we observe that there is an alternative option for computing the total derivatives: the linear system involving the square Jacobian matrix ∂R/∂y can be solved with ∂F/∂y forming the right-hand side. This results in the following linear system, which we call the adjoint equations,

$$\left[\frac{\partial R}{\partial y}\right]^T \psi = -\left[\frac{\partial F}{\partial y}\right]^T, \qquad (32)$$

where ψ is the adjoint matrix (of size ny × nf). Although the adjoint is usually expressed as a vector, we obtain a matrix here because of our generalization to the case where f is a vector. This linear system needs to be solved for each column of [∂F/∂y]^T, and thus the computational cost is proportional to the number of quantities of interest, nf. The adjoint matrix can then be substituted into Equation (31) to find the total sensitivity,

$$\frac{\mathrm{d} f}{\mathrm{d} x} = \frac{\partial F}{\partial x} + \psi^T \frac{\partial R}{\partial x}. \qquad (33)$$
Thus, the cost of computing the total derivative matrix using the adjoint method is independent of the number of design variables, nx, and is instead proportional to the number of quantities of interest, nf.

Note that the partial derivatives in these equations need to be computed using some other method: they can be differentiated symbolically, or computed with finite differences, the complex-step method, or AD. The use of AD for these partials has been shown to be particularly effective in the development of analytic methods for PDE solvers [52].
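Mirroring the direct-method sketch above (same assumptions: dense Jacobians, precomputed partials, LAPACK's dgesv), the adjoint method replaces the nx solves with nf transposed solves:

subroutine adjoint_method(ny, nx, nf, dRdy, dRdx, dFdx, dFdy, dfdx)
   ! Adjoint method: solve Equation (32) for psi with nf right-hand
   ! sides, then assemble the total derivatives via Equation (33).
   implicit none
   integer, parameter :: dp = kind(1.0d0)
   integer, intent(in) :: ny, nx, nf
   real(dp), intent(in) :: dRdy(ny, ny), dRdx(ny, nx)
   real(dp), intent(in) :: dFdx(nf, nx), dFdy(nf, ny)
   real(dp), intent(out) :: dfdx(nf, nx)
   real(dp) :: at(ny, ny), psi(ny, nf)
   integer :: ipiv(ny), info
   at = transpose(dRdy)                              ! transposed system
   psi = -transpose(dFdy)                            ! nf right-hand sides
   call dgesv(ny, nf, at, ny, ipiv, psi, ny, info)   ! psi now holds the adjoints
   dfdx = dFdx + matmul(transpose(psi), dRdx)        ! Equation (33)
end subroutine adjoint_method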
4. Derivation from the Chain Rule Identity
The first step in this derivation is to clearly define the level of decomposition, i.e., how the variables v are defined. Since the analytic methods apply to coupled systems of equations, the assumption that the Jacobians are lower triangular matrices no longer applies. Therefore, we first linearize the residuals (27) so that it is possible to write explicit equations for the state variables y. We linearize about the converged point [x0, r0, y0, f0]^T, and divide v into

$$v_1 = \Delta x, \quad v_2 = \Delta r, \quad v_3 = \Delta y, \quad v_4 = \Delta f. \qquad (34)$$
Figure 5: Decomposition level for the analytic methods: the variables are the variations in the design variables, residuals, states and quantities of interest,

$$v = [\underbrace{v_1, \ldots, v_{n_x}}_{\Delta x}, \underbrace{v_{(n_x+1)}, \ldots, v_{(n_x+n_y)}}_{\Delta r}, \underbrace{v_{(n_x+n_y+1)}, \ldots, v_{(n_x+2n_y)}}_{\Delta y}, \underbrace{v_{(n-n_f)}, \ldots, v_n}_{\Delta f}]^T.$$

Figure 6: Dependence of the variations in the design variables, residuals, states and quantities of interest for the linearized system.
So instead of defining them as every single variable assignment in the computer program, we defined them as variations
in the design variables, residuals, state variables and quantities of interest. This decomposition is represented in
Figure 5. The dependence of these variations about the converged states is illustrated in Figure 6.
Since x contains the only independent variables, we have an initial perturbation Δx that leads to a response Δr. However, we require that Δr = 0 be satisfied when we take a total derivative, and therefore

$$\Delta R = \frac{\partial R}{\partial x}\, \Delta x + \frac{\partial R}{\partial y}\, \Delta y = 0. \qquad (35)$$

The solution vector Δy from this linear system is used in conjunction with the original perturbation vector Δx to compute the variation in the quantities of interest,

$$\Delta f = \frac{\partial F}{\partial x}\, \Delta x + \frac{\partial F}{\partial y}\, \Delta y. \qquad (36)$$

The sequence of variations can thus be written as a set of explicit functions of the previously computed quantities:

$$v_1 = \Delta x \qquad (37)$$

$$v_2 = \Delta r = \frac{\partial R}{\partial x}\, \Delta x \qquad (38)$$

$$v_3 = \Delta y = -\left[\frac{\partial R}{\partial y}\right]^{-1} \Delta r \qquad (39)$$

$$v_4 = \Delta f = \frac{\partial F}{\partial x}\, \Delta x + \frac{\partial F}{\partial y}\, \Delta y \qquad (40)$$
At this point, all variables are functions of only the previous variables, so we can apply the forward and reverse chain rule equations (20) to the linearized system with the definitions (34). The result is the set of block matrix equations shown in Figure 7. The forward chain rule (18) yields the left column, which is the direct method. The right column represents the adjoint method, which is obtained from the reverse chain rule (19).
IV. Analytic Methods for Multidisciplinary Systems
We now extend the analytic methods derived in the previous section to multidisciplinary systems, where each discipline is seen as one component. We start with the equations in the last row of Figure 7 (also repeated in the first row of Figure 10), which represent the direct and adjoint methods, respectively, for a given system. In this form, the Jacobian matrices for the direct and adjoint methods are block lower and upper triangular, respectively, but not fully lower or upper triangular, because we eliminated the inverse of the Jacobian ∂R/∂y at the expense of creating a linear system that must now be solved. This inverse Jacobian is necessary to obtain an explicit definition for Δy, but we eliminate it because the solution of a linear system is cheaper than the inversion of a matrix of the same size.
The direct and adjoint methods for multidisciplinary systems can be derived by partitioning the various variables by discipline, as follows:

$$R = [R_1 \;\; R_2]^T, \qquad y = [y_1 \;\; y_2]^T, \qquad (41)$$

where we have assumed two different disciplines. All the design variables are included in x. Then we use these vectors in the direct and adjoint equations shown in the first row of Figure 10 to obtain the second row of Figure 10. These are the coupled versions of the direct and adjoint methods, respectively. The coupled direct method was first developed by Bloebaum and Sobieszczanski-Sobieski [53, 22, 54]. The coupled adjoint was originally developed by Martins et al. [30].
Figure 8 illustrates the level of decomposition that this involves, using the earlier notation. In Figure 10, we can see that this decomposition turns the Jacobian ∂R/∂y into a matrix with distinct blocks corresponding to the different disciplines. Figure 9(a) shows a graphical view of the two-discipline system.

Figure 9(b) and the third row in Figure 10 show another alternative for obtaining the total derivatives of multidisciplinary systems, which was first developed by Sobieszczanski-Sobieski [22] for the direct method and by Martins et al. [30] for the adjoint method. The advantage of this approach is that we do not need to know the residuals of the disciplinary solvers, but can instead use the coupling variables. To derive the direct and adjoint versions of this approach within our mathematical framework, we define the artificial residual functions

$$R_i = Y_i - y_i, \qquad (42)$$

where the yi vector contains the intermediate variables of the ith discipline, and Yi is the vector of functions that explicitly define these intermediate variables. This leads to the third row of equations in Figure 10, which we call the functional approach, in contrast with the residual approach that we used previously.

The yi vector is treated as a vector of state variables, in the terminology of systems with residuals. In general, the functions in the vector Yi can depend on all the other intermediate variables in the ith discipline, as well as on those of any other discipline, which means that this development allows for coupling to be present among the intermediate variables.
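As a minimal sketch of the functional approach for two explicit disciplines y1 = Y1(x, y2) and y2 = Y2(x, y1) (our own illustration, assuming dense blocks, precomputed partials, and LAPACK's dgesv), the artificial residuals (42) lead to the block system below:

subroutine coupled_direct(n1, n2, nx, dY1dx, dY2dx, dY1dy2, dY2dy1, dydx)
   ! Coupled direct method, functional approach: solve
   ! [ I, -dY1/dy2; -dY2/dy1, I ] dy/dx = [ dY1/dx; dY2/dx ],
   ! which follows from the artificial residuals R_i = Y_i - y_i.
   implicit none
   integer, parameter :: dp = kind(1.0d0)
   integer, intent(in) :: n1, n2, nx
   real(dp), intent(in) :: dY1dx(n1, nx), dY2dx(n2, nx)
   real(dp), intent(in) :: dY1dy2(n1, n2), dY2dy1(n2, n1)
   real(dp), intent(out) :: dydx(n1+n2, nx)
   real(dp) :: a(n1+n2, n1+n2)
   integer :: ipiv(n1+n2), info, i
   a = 0.0_dp
   do i = 1, n1 + n2
      a(i, i) = 1.0_dp                     ! identity diagonal blocks
   end do
   a(1:n1, n1+1:n1+n2) = -dY1dy2           ! off-diagonal coupling blocks
   a(n1+1:n1+n2, 1:n1) = -dY2dy1
   dydx(1:n1, :) = dY1dx                   ! stacked right-hand sides
   dydx(n1+1:n1+n2, :) = dY2dx
   call dgesv(n1+n2, nx, a, n1+n2, ipiv, dydx, n1+n2, info)
end subroutine coupled_direct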
To further generalize the computation of derivatives for multidisciplinary systems, consider a system with two disciplines: one with residuals and one without, as shown in Figure 9(c). In a practical setting, this could correspond to a problem with one discipline whose residuals are nonlinear in its state variables, and another discipline where all the intermediate variables are explicitly defined. Because of the high cost of the first discipline, it would be
Figure 7: Derivation of the direct (left) and adjoint (right) methods from the forward and reverse chain rule, respectively. The top row shows the chain rule in block form with the four variable vectors; in the second row we replace those vectors with the variables we defined; and the third row shows the equations after the solution of the block matrix, which correspond to the traditional direct and adjoint equations. In the last row, the direct and adjoint methods are presented with the inverse of the ∂R/∂y matrix eliminated, at the cost of the overall matrix no longer being lower/upper triangular.
Figure 8: Decomposition level for the coupled analytic methods: the variables are the variations in the design variables, the residuals and states of the two disciplines, and the quantities of interest,

$$v = [\underbrace{v_1, \ldots, v_{n_x}}_{\Delta x}, \underbrace{\ldots, v_{(n_x+n_{y_1})}}_{\Delta r_1}, \underbrace{\ldots, v_{(n_x+n_{y_1}+n_{y_2})}}_{\Delta r_2}, \underbrace{\ldots, v_{(n_x+2n_{y_1}+n_{y_2})}}_{\Delta y_1}, \underbrace{\ldots, v_{(n_x+2n_{y_1}+2n_{y_2})}}_{\Delta y_2}, \underbrace{v_{(n-n_f)}, \ldots, v_n}_{\Delta f}]^T.$$
valuable to be able to use the direct or adjoint method even with the second discipline added. Equations (g) and (h) in Figure 10 show that this is possible with a hybrid formulation that combines elements from the residual and functional approaches. These equations can be generalized to any number of disciplines with a combination of residuals (using the residual approach) and explicit intermediate variables (using the functional approach).
Figure 9: Graphical views of the two-discipline system for the three approaches: (a) residual, (b) functional, and (c) hybrid.
V. Conclusions
In this paper we achieved what we had set out to accomplish: we derived all known methods for the computation of
derivatives for general multiply-nested systems under a single mathematical framework. One of the keys to achieving
this unification was the realization that we must view the components of the system at various levels: the elementary
function level, the line of code level, the composite function level, the numerical solver level, and the discipline level.
We first presented finite differences, the complex-step method and symbolic differentiation, and discussed their relative advantages and disadvantages. These methods can be used at any level and are the basic building blocks for the more sophisticated methods.
Figure 10: Derivation of the coupled direct (left column) and adjoint (right column) methods. The first row shows the original single-system equations; in the second row we divide the residuals and state variables into two blocks; the third row presents the functional approach to the coupled sensitivity equations; and the last row shows how the residual and functional approaches can be combined.
Table 1: Classification of the methods for computing derivatives with respect to the level of decomposition, differentiation method, and strategy for solving the linear system.

Method                       Level of decomposition   Differentiation method   Linear solution
Monolithic                   Black box                FD/CS                    Trivial
Analytic                     Solver                   Any                      Numerical
Multidisciplinary analytic   Discipline               Any                      Numerical (block)
AD                           Line of code             Symbolic                 Forward- or back-substitution
The second key to unifying the various derivative computation methods was the derivation of the chain rule identity (20), which shows the elegant symmetry and connections between the various methods and their respective modes.
Algorithmic differentiation can be derived from this chain rule by considering the variables and respective functions
to be the lines in a given computer program.
To derive the analytic methods, we linearized the system and defined the variables to be perturbations about the
converged solution. Using this definition, we retrieved the well-known direct and adjoint equations from the forward
chain rule (18) and reverse chain rule (19), respectively.
Since we derived these from the same equations, we showed the connection between the forward mode of algorithmic differentiation and the direct method, as well as the connection between the adjoint method and the reverse
mode of algorithmic differentiation.
Finally, the analytic methods were generalized for the case of multidisciplinary systems, where multiple solvers are coupled. Two different approaches, the residual approach and the functional approach, were shown to be possible for both the coupled direct and coupled adjoint methods, resulting in four possible combinations. In addition, we showed that it is possible to combine the residual and functional approaches to create a hybrid approach. This flexibility is valuable, since it is not always possible to use one or the other, due to limitations of the disciplinary solvers.
In summary, each of the methods for computing derivatives shares a common origin, but they differ in three aspects.
Table 1 classifies each of these methods in terms of the level of decomposition at which the generalized chain rule is
applied, the differentiation method used to assemble the Jacobian of partial derivatives, and the strategy for solving
the linear system that results.
Acknowledgments
We would like to thank Graeme J. Kennedy for his valuable insights.
References

[1] Saltelli, A., Ratto, M., Andres, T., Campolongo, F., Cariboni, J., Gatelli, D., Saisana, M., and Tarantola, S., Global Sensitivity Analysis: The Primer, John Wiley & Sons Ltd., 2008.
[2] Hicken, J. and Zingg, D., "A Parallel Newton–Krylov Solver for the Euler Equations Discretized Using Simultaneous Approximation Terms," AIAA Journal, Vol. 46, No. 11, 2008.
[3] Biros, G. and Ghattas, O., "Parallel Lagrange–Newton–Krylov–Schur Methods for PDE-Constrained Optimization. Part I: The Krylov–Schur Solver," SIAM Journal on Scientific Computing, Vol. 27, No. 2, 2005, pp. 687–713.
[4] Biros, G. and Ghattas, O., "Parallel Lagrange–Newton–Krylov–Schur Methods for PDE-Constrained Optimization. Part II: The Lagrange–Newton Solver and Its Application to Optimal Control of Steady Viscous Flows," SIAM Journal on Scientific Computing, Vol. 27, No. 2, 2005, pp. 687–713.
[5] Kennedy, G. J. and Martins, J. R. R. A., "Parallel Solution Methods for Aerostructural Analysis and Design Optimization," Proceedings of the 13th AIAA/ISSMO Multidisciplinary Analysis Optimization Conference, Fort Worth, TX, September 2010, AIAA 2010-9308.
[6] Dennis, J. and Moré, J. J., "Quasi-Newton Methods, Motivation and Theory," SIAM Review, Vol. 19, No. 1, 1977, pp. 46–89.
[7] Gill, P. E., Murray, W., and Saunders, M. A., "SNOPT: An SQP Algorithm for Large-Scale Constrained Optimization," SIAM Review, Vol. 47, No. 1, 2005, pp. 99–131. doi:10.1137/S0036144504446096.
[8] Chung, H. S. and Alonso, J. J., "Using Gradients to Construct Response Surface Models for High-Dimensional Design Optimization Problems," 39th AIAA Aerospace Sciences Meeting, Reno, NV, January 2001, AIAA 2001-0922.
[9] James, K., Hansen, J. S., and Martins, J. R. R. A., "Structural Topology Optimization for Multiple Load Cases Using a Dynamic Aggregation Technique," Engineering Optimization, Vol. 41, No. 12, December 2009, pp. 1103–1118. doi:10.1080/03052150902926827.
[10] Sigmund, O., "On the Usefulness of Non-Gradient Approaches in Topology Optimization," Structural and Multidisciplinary Optimization, Vol. 43, 2011, pp. 589–596. doi:10.1007/s00158-011-0638-7.
[11] Lee, E., James, K. A., and Martins, J. R. R. A., "Stress-Constrained Topology Optimization with Design-Dependent Loading," Structural and Multidisciplinary Optimization, 2012. doi:10.1007/s00158-012-0780-x (in press).
[12] Lee, E. and Martins, J. R. R. A., "Structural Topology Optimization with Design-Dependent Pressure Loads," Computer Methods in Applied Mechanics and Engineering, 2012. doi:10.1016/j.cma.2012.04.007 (in press).
[13] Martins, J. R. R. A., Alonso, J. J., and Reuther, J. J., "High-Fidelity Aerostructural Design Optimization of a Supersonic Business Jet," Journal of Aircraft, Vol. 41, No. 3, 2004, pp. 523–530. doi:10.2514/1.11478.
[14] Kenway, G. K. W., Kennedy, G. J., and Martins, J. R. R. A., "A Scalable Parallel Approach for High-Fidelity Aerostructural Analysis and Optimization," 53rd AIAA/ASME/ASCE/AHS/ASC Structures, Structural Dynamics, and Materials Conference, Honolulu, HI, April 2012.
[15] Mader, C. A. and Martins, J. R. R. A., "An Automatic Differentiation Discrete Adjoint Approach for Time-Spectral Computational Fluid Dynamics," AIAA Journal, 2012 (in press).
[16] Mader, C. A. and Martins, J. R. R. A., "Computation of Aircraft Stability Derivatives Using an Automatic Differentiation Adjoint Approach," AIAA Journal, Vol. 49, No. 12, 2011, pp. 2737–2750. doi:10.2514/1.55678.
[17] Barthelemy, J.-F. and Sobieszczanski-Sobieski, J., "Optimum Sensitivity Derivatives of Objective Functions in Nonlinear Programming," AIAA Journal, Vol. 21, 1982, pp. 913–915.
[18] Adelman, H. M. and Haftka, R. T., "Sensitivity Analysis of Discrete Structural Systems," AIAA Journal, Vol. 24, No. 5, 1986, pp. 823–832. doi:10.2514/3.48671.
[19] van Keulen, F., Haftka, R., and Kim, N., "Review of Options for Structural Design Sensitivity Analysis. Part 1: Linear Systems," Computer Methods in Applied Mechanics and Engineering, Vol. 194, 2005, pp. 3213–3243.
[20] Haug, E. J., Choi, K. K., and Komkov, V., Design Sensitivity Analysis of Structural Systems, Vol. 177 of Mathematics in Science and Engineering, Academic Press, 1986.
[21] Sobieszczanski-Sobieski, J., "Sensitivity Analysis and Multidisciplinary Optimization for Aircraft Design: Recent Advances and Results," Journal of Aircraft, Vol. 27, No. 12, December 1990, pp. 993–1001. doi:10.2514/3.45973.
[22] Sobieszczanski-Sobieski, J., "Sensitivity of Complex, Internally Coupled Systems," AIAA Journal, Vol. 28, No. 1, 1990, pp. 153–160.
[23] Sobieszczanski-Sobieski, J., "Higher Order Sensitivity Analysis of Complex, Coupled Systems," AIAA Journal, Vol. 28, No. 4, 1990, pp. 756–758 (technical note).
[24] Lyness, J. N., "Numerical Algorithms Based on the Theory of Complex Variable," Proceedings of the ACM National Meeting, Thompson Book Co., Washington, DC, 1967, pp. 125–133.
[25] Lyness, J. N. and Moler, C. B., "Numerical Differentiation of Analytic Functions," SIAM Journal on Numerical Analysis, Vol. 4, No. 2, 1967, pp. 202–210.
[26] Squire, W. and Trapp, G., "Using Complex Variables to Estimate Derivatives of Real Functions," SIAM Review, Vol. 40, No. 1, 1998, pp. 110–112.
[27] Anderson, W. K., Newman, J. C., Whitfield, D. L., and Nielsen, E. J., "Sensitivity Analysis for the Navier–Stokes Equations on Unstructured Meshes Using Complex Variables," AIAA Paper 99-3294, 1999.
[28] Newman III, J. C., Whitfield, D. L., and Anderson, W. K., "Step-Size Independent Approach for Multidisciplinary Sensitivity Analysis," Journal of Aircraft, Vol. 40, No. 3, 2003, pp. 566–573.
[29] Martins, J. R. R. A., Sturdza, P., and Alonso, J. J., "The Complex-Step Derivative Approximation," ACM Transactions on Mathematical Software, Vol. 29, No. 3, 2003, pp. 245–262. doi:10.1145/838250.838251.
[30] Martins, J. R. R. A., Alonso, J. J., and Reuther, J. J., "A Coupled-Adjoint Sensitivity Analysis Method for High-Fidelity Aero-Structural Design," Optimization and Engineering, Vol. 6, No. 1, March 2005, pp. 33–62. doi:10.1023/B:OPTE.0000048536.47956.62.
[31] Luzyanina, T. and Bocharov, G., "Critical Issues in the Numerical Treatment of the Parameter Estimation Problems in Immunology," Journal of Computational Mathematics, Vol. 30, No. 1, January 2012, pp. 59–79.
[32] Sturdza, P., An Aerodynamic Design Method for Supersonic Natural Laminar Flow Aircraft, Ph.D. thesis, Stanford University, Stanford, CA, 2004.
[33] Griewank, A., Evaluating Derivatives, SIAM, Philadelphia, 2000.
[34] Naumann, U., The Art of Differentiating Computer Programs: An Introduction to Algorithmic Differentiation, SIAM, 2011.
[35] Pryce, J. D. and Reid, J. K., "AD01, a Fortran 90 Code for Automatic Differentiation," Report RAL-TR-1998-057, Rutherford Appleton Laboratory, Chilton, Didcot, Oxfordshire, U.K., 1998.
[36] Griewank, A., Juedes, D., and Utke, J., "Algorithm 755: ADOL-C: A Package for the Automatic Differentiation of Algorithms Written in C/C++," ACM Transactions on Mathematical Software, Vol. 22, No. 2, 1996, pp. 131–167.
[37] Bendtsen, C. and Stauning, O., "FADBAD, a Flexible C++ Package for Automatic Differentiation Using the Forward and Backward Methods," Tech. Rep. IMM-REP-1996-17, Technical University of Denmark, Lyngby, Denmark, 1996.
[38] Bischof, C. H., Roh, L., and Mauer-Oats, A. J., "ADIC: An Extensible Automatic Differentiation Tool for ANSI-C," Software: Practice and Experience, Vol. 27, No. 12, 1997, pp. 1427–1456.
[39] Carle, A. and Fagan, M., "ADIFOR 3.0 Overview," Tech. Rep. CAAM-TR-00-02, Rice University, 2000.
[40] Giering, R. and Kaminski, T., "Applying TAF to Generate Efficient Derivative Code of Fortran 77-95 Programs," Proceedings of GAMM 2002, Augsburg, Germany, 2002.
[41] Gockenbach, M. S., "Understanding Code Generated by TAMC," Tech. Rep. TR00-29, Department of Computational and Applied Mathematics, Rice University, Texas, USA, 2000.
[42] Hascoët, L. and Pascual, V., "TAPENADE 2.1 User's Guide," Technical Report 300, INRIA, 2004.
[43] Pascual, V. and Hascoët, L., "Extension of TAPENADE Towards Fortran 95," Automatic Differentiation: Applications, Theory, and Tools, edited by H. M. Bücker, G. Corliss, P. Hovland, U. Naumann, and B. Norris, Lecture Notes in Computational Science and Engineering, Springer, 2005.
[44] Shiriaev, D., "ADOL-F: Automatic Differentiation of Fortran Codes," Computational Differentiation: Techniques, Applications, and Tools, edited by M. Berz, C. H. Bischof, G. F. Corliss, and A. Griewank, SIAM, Philadelphia, PA, 1996, pp. 375–384.
[45] Rhodin, A., "IMAS: Integrated Modeling and Analysis System for the Solution of Optimal Control Problems," Computer Physics Communications, No. 107, 1997, pp. 21–38.
[46] Jameson, A., "Aerodynamic Design via Control Theory," Journal of Scientific Computing, Vol. 3, No. 3, September 1988, pp. 233–260.
[47] Jameson, A., Martinelli, L., and Pierce, N. A., "Optimum Aerodynamic Design Using the Navier–Stokes Equations," Theoretical and Computational Fluid Dynamics, Vol. 10, 1998, pp. 213–237.
[48] Anderson, W. K. and Venkatakrishnan, V., "Aerodynamic Design Optimization on Unstructured Grids with a Continuous Adjoint Formulation," Computers and Fluids, Vol. 28, No. 4, 1999, pp. 443–480.
[49] Giles, M. B. and Pierce, N. A., "An Introduction to the Adjoint Approach to Design," Flow, Turbulence and Combustion, Vol. 65, 2000, pp. 393–415.
[50] Nadarajah, S. and Jameson, A., "A Comparison of the Continuous and Discrete Adjoint Approach to Automatic Aerodynamic Optimization," Proceedings of the 38th AIAA Aerospace Sciences Meeting and Exhibit, Reno, NV, 2000, AIAA 2000-0667.
[51] Dwight, R. P. and Brezillon, J., "Effect of Approximations of the Discrete Adjoint on Gradient-Based Optimization," AIAA Journal, Vol. 44, No. 12, 2006, pp. 3022–3031.
[52] Mader, C. A., Martins, J. R. R. A., Alonso, J. J., and van der Weide, E., "ADjoint: An Approach for the Rapid Development of Discrete Adjoint Solvers," AIAA Journal, Vol. 46, No. 4, April 2008, pp. 863–873. doi:10.2514/1.29123.
[53] Bloebaum, C., "Global Sensitivity Analysis in Control-Augmented Structural Synthesis," Proceedings of the 27th AIAA Aerospace Sciences Meeting, Reno, NV, January 1989, AIAA 1989-0844.
[54] Bloebaum, C., Hajela, P., and Sobieszczanski-Sobieski, J., "Non-Hierarchic System Decomposition in Structural Optimization," Proceedings of the 3rd USAF/NASA Symposium on Recent Advances in Multidisciplinary Analysis and Optimization, San Francisco, CA, 1990.
[55] Hascoët, L., "TAPENADE: A Tool for Automatic Differentiation of Programs," Proceedings of the 4th European Congress on Computational Methods, ECCOMAS 2004, Jyväskylä, Finland, 2004.
[56] Cusdin, P. and Müller, J.-D., "On the Performance of Discrete Adjoint CFD Codes Using Automatic Differentiation," International Journal for Numerical Methods in Fluids, Vol. 47, No. 6-7, 2005, pp. 939–945.
[57] Fagan, M. and Carle, A., "Reducing Reverse-Mode Memory Requirements by Using Profile-Driven Checkpointing," Future Generation Computer Systems, Vol. 21, No. 8, 2005, pp. 1380–1390.
[58] Giering, R., Kaminski, T., and Slawig, T., "Generating Efficient Derivative Code with TAF: Adjoint and Tangent Linear Euler Flow Around an Airfoil," Future Generation Computer Systems, Vol. 21, No. 8, 2005, pp. 1345–1355.
[59] Heimbach, P., Hill, C., and Giering, R., "An Efficient Exact Adjoint of the Parallel MIT General Circulation Model, Generated via Automatic Differentiation," Future Generation Computer Systems, Vol. 21, No. 8, 2005, pp. 1356–1371.
Appendix A: Algorithmic Differentiation Example

For the second output of the example function, symbolic differentiation gives

$$\frac{\mathrm{d} f_2}{\mathrm{d} x} = \begin{bmatrix} \dfrac{\mathrm{d} f_2}{\mathrm{d} x_1} & \dfrac{\mathrm{d} f_2}{\mathrm{d} x_2} \end{bmatrix} = \begin{bmatrix} x_2 & x_1 + 2 x_2 \end{bmatrix}.$$

However, the point of this example is to show how the chain rule can be systematically applied in an automated fashion. To illustrate this more clearly, we assume that the function above is computed using a computer program that performs the following series of unary and binary operations:

$$\begin{aligned}
v_1 &= x_1 \\
v_2 &= x_2 \\
v_3 &= V_3(v_1) = \sin v_1 \\
v_4 &= V_4(v_1, v_2) = v_1 v_2 \\
v_5 &= V_5(v_2) = v_2^2 \\
v_6 &= 3 \\
v_7 &= V_7(v_3, v_4) = v_3 + v_4 \\
v_8 &= V_8(v_5, v_6) = v_5 v_6 \\
v_9 &= 6 \\
v_{10} &= V_{10}(v_8, v_9) = v_8 + v_9 \\
v_{11} &= V_{11}(v_7, v_{10}) = v_7 v_{10} \quad (= f_1) \\
v_{12} &= V_{12}(v_4, v_5) = v_4 + v_5 \quad (= f_2)
\end{aligned} \qquad (45)$$

Thus, in this case, n = 12. The Fortran source code corresponding to these computations is shown in Figure 11, Appendix B.
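For concreteness, a Fortran listing consistent with the operation sequence (45), written with one variable assignment per line as AD views it (our own sketch, assuming this is the style of the Figure 11 listing), is:

subroutine calcf(x, f)
   ! The example program written as the explicit operation sequence (45),
   ! one assignment per line, so each line defines one function V_i.
   implicit none
   integer, parameter :: dp = kind(1.0d0)
   real(dp), intent(in) :: x(2)
   real(dp), intent(out) :: f(2)
   real(dp) :: v(12)
   v(1) = x(1)
   v(2) = x(2)
   v(3) = sin(v(1))
   v(4) = v(1)*v(2)
   v(5) = v(2)**2
   v(6) = 3.0_dp
   v(7) = v(3) + v(4)
   v(8) = v(5)*v(6)
   v(9) = 6.0_dp
   v(10) = v(8) + v(9)
   v(11) = v(7)*v(10)    ! f1
   v(12) = v(4) + v(5)   ! f2
   f(1) = v(11)
   f(2) = v(12)
end subroutine calcf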
To use the chain rule (22) in forward mode to compute df1/dx1 (which is the same as dv11/dv1), we set j = 1 and then vary i = 1, 2, ..., 11. Note that in the sum in the chain rule we only include the k's for which ∂Vi/∂vk ≠ 0. Since each operation in this case has at most two arguments, the sums in the chain rule have at most two terms. The sequence given by the procedure is as follows:
$$\begin{aligned}
\frac{\mathrm{d} v_1}{\mathrm{d} v_1} &= 1 \\
\frac{\mathrm{d} v_2}{\mathrm{d} v_1} &= 0 \\
\frac{\mathrm{d} v_3}{\mathrm{d} v_1} &= \frac{\partial V_3}{\partial v_1} \frac{\mathrm{d} v_1}{\mathrm{d} v_1} = \cos v_1 \cdot 1 = \cos v_1 \\
\frac{\mathrm{d} v_4}{\mathrm{d} v_1} &= \frac{\partial V_4}{\partial v_1} \frac{\mathrm{d} v_1}{\mathrm{d} v_1} + \frac{\partial V_4}{\partial v_2} \frac{\mathrm{d} v_2}{\mathrm{d} v_1} = v_2 \cdot 1 + v_1 \cdot 0 = v_2 \\
\frac{\mathrm{d} v_5}{\mathrm{d} v_1} &= \frac{\partial V_5}{\partial v_2} \frac{\mathrm{d} v_2}{\mathrm{d} v_1} = 2 v_2 \cdot 0 = 0 \\
\frac{\mathrm{d} v_6}{\mathrm{d} v_1} &= 0 \\
\frac{\mathrm{d} v_7}{\mathrm{d} v_1} &= \frac{\partial V_7}{\partial v_3} \frac{\mathrm{d} v_3}{\mathrm{d} v_1} + \frac{\partial V_7}{\partial v_4} \frac{\mathrm{d} v_4}{\mathrm{d} v_1} = 1 \cdot \cos v_1 + 1 \cdot v_2 = \cos v_1 + v_2 \\
\frac{\mathrm{d} v_8}{\mathrm{d} v_1} &= \frac{\partial V_8}{\partial v_5} \frac{\mathrm{d} v_5}{\mathrm{d} v_1} + \frac{\partial V_8}{\partial v_6} \frac{\mathrm{d} v_6}{\mathrm{d} v_1} = v_6 \cdot 0 + v_5 \cdot 0 = 0 \\
\frac{\mathrm{d} v_9}{\mathrm{d} v_1} &= 0 \\
\frac{\mathrm{d} v_{10}}{\mathrm{d} v_1} &= \frac{\partial V_{10}}{\partial v_8} \frac{\mathrm{d} v_8}{\mathrm{d} v_1} + \frac{\partial V_{10}}{\partial v_9} \frac{\mathrm{d} v_9}{\mathrm{d} v_1} = 1 \cdot 0 + 1 \cdot 0 = 0 \\
\frac{\mathrm{d} v_{11}}{\mathrm{d} v_1} &= \frac{\partial V_{11}}{\partial v_7} \frac{\mathrm{d} v_7}{\mathrm{d} v_1} + \frac{\partial V_{11}}{\partial v_{10}} \frac{\mathrm{d} v_{10}}{\mathrm{d} v_1} = v_{10} (\cos v_1 + v_2) + v_7 \cdot 0 = (3 v_2^2 + 6)(\cos v_1 + v_2)
\end{aligned}$$
The above operations can be interspersed with the original code. Figure 12 shows the code that results from running the algorithmic differentiation tool Tapenade [55] through the original code (Figure 11).

For the reverse mode, we use the chain rule in reverse, i.e., we set i = 11 and loop backwards such that j = 11, 10, ..., 1. The sequence given by the procedure is as follows:
$$\begin{aligned}
\frac{\mathrm{d} v_{11}}{\mathrm{d} v_{11}} &= 1 \\
\frac{\mathrm{d} v_{11}}{\mathrm{d} v_{10}} &= \frac{\partial V_{11}}{\partial v_{10}} \frac{\mathrm{d} v_{11}}{\mathrm{d} v_{11}} = v_7 \cdot 1 = v_7 \\
\frac{\mathrm{d} v_{11}}{\mathrm{d} v_9} &= \frac{\partial V_{10}}{\partial v_9} \frac{\mathrm{d} v_{11}}{\mathrm{d} v_{10}} = 1 \cdot v_7 = v_7 \\
\frac{\mathrm{d} v_{11}}{\mathrm{d} v_8} &= \frac{\partial V_{10}}{\partial v_8} \frac{\mathrm{d} v_{11}}{\mathrm{d} v_{10}} = 1 \cdot v_7 = v_7 \\
\frac{\mathrm{d} v_{11}}{\mathrm{d} v_7} &= \frac{\partial V_{11}}{\partial v_7} \frac{\mathrm{d} v_{11}}{\mathrm{d} v_{11}} = v_{10} \cdot 1 = v_{10} \\
\frac{\mathrm{d} v_{11}}{\mathrm{d} v_6} &= \frac{\partial V_8}{\partial v_6} \frac{\mathrm{d} v_{11}}{\mathrm{d} v_8} = v_5 \cdot v_7 = v_7 v_5 \\
\frac{\mathrm{d} v_{11}}{\mathrm{d} v_5} &= \frac{\partial V_8}{\partial v_5} \frac{\mathrm{d} v_{11}}{\mathrm{d} v_8} = v_6 \cdot v_7 = v_7 v_6 \\
\frac{\mathrm{d} v_{11}}{\mathrm{d} v_4} &= \frac{\partial V_7}{\partial v_4} \frac{\mathrm{d} v_{11}}{\mathrm{d} v_7} = 1 \cdot v_{10} = v_{10} \\
\frac{\mathrm{d} v_{11}}{\mathrm{d} v_3} &= \frac{\partial V_7}{\partial v_3} \frac{\mathrm{d} v_{11}}{\mathrm{d} v_7} = 1 \cdot v_{10} = v_{10} \\
\frac{\mathrm{d} v_{11}}{\mathrm{d} v_2} &= \frac{\partial V_4}{\partial v_2} \frac{\mathrm{d} v_{11}}{\mathrm{d} v_4} + \frac{\partial V_5}{\partial v_2} \frac{\mathrm{d} v_{11}}{\mathrm{d} v_5} = v_1 v_{10} + 2 v_2 v_7 v_6 = (3 v_2^2 + 6)\, v_1 + 6 v_2 (\sin v_1 + v_1 v_2) \\
\frac{\mathrm{d} v_{11}}{\mathrm{d} v_1} &= \frac{\partial V_3}{\partial v_1} \frac{\mathrm{d} v_{11}}{\mathrm{d} v_3} + \frac{\partial V_4}{\partial v_1} \frac{\mathrm{d} v_{11}}{\mathrm{d} v_4} = \cos v_1 \cdot v_{10} + v_2 \cdot v_{10} = (3 v_2^2 + 6)(\cos v_1 + v_2)
\end{aligned}$$
Note that, unlike in the forward mode, the above computation cannot be interspersed with the original code, because throughout the reverse computation the values of the v's are required (e.g., v7 in the second line). Thus the original code must run first, and all the intermediate variables v must be stored for later use in the reverse-mode computations. The dependency graph of the whole code must also be stored before the reverse computation is performed. This contrasts with the forward mode, for which the only terms computed for the current derivative dvi/dvj are those for the variables that vi depends on. For the reverse mode, the terms that must be computed are those for the functions that vj affects, which is information that must be determined and stored beforehand. For an iterative code, the memory requirements of the reverse mode can therefore be prohibitive, although there has been some progress toward alleviating these requirements [56, 57, 58, 59]. Figure 15 shows the result of running the algorithmic differentiation tool Tapenade [55] in reverse mode.
To complete the example, we now form the forward chain rule equation (18), (I − DV) Dv = I, and evaluate it numerically at x^T = [π/4, 2]; this yields Equation (46). The nonzero entries of DV at this point include, for example, ∂V3/∂v1 = cos(π/4) = √2/2, ∂V4/∂v1 = v2 = 2, ∂V4/∂v2 = v1 = π/4, ∂V11/∂v7 = v10 = 18 and ∂V11/∂v10 = v7 ≈ 2.28. The corresponding reverse form (19), evaluated at the same point, is Equation (47); its unknowns are the total derivatives df1/dvj and df2/dvj for j = 1, ..., 12, of which the j = 1, 2 entries are the desired total derivatives with respect to x1 and x2.
Appendix B: Fortran Source Code Listings

Figure 12: Fortran source code for the simple function differentiated in forward mode using source code transformation.
SUBROUTINE CALCF2(x, f)
  ! Computes the two outputs of the example function:
  ! f1 = (x1*x2 + sin(x1)) * (3*x2**2 + 6) and f2 = x1*x2 + x2**2.
  REAL :: x(2), f(2)
  f(1) = (x(1)*x(2) + SIN(x(1))) * (3*x(2)**2 + 6)
  f(2) = x(1)*x(2) + x(2)**2
END SUBROUTINE CALCF2

Figure 14: Fortran source code for the simple function of the example.
Figure 15: Fortran source code for simple function differentiated in reverse mode using source code transformation
Figure 16: Fortran source code for simple function differentiated in reverse mode using source code transformation