Mosek Modeling
Mosek Modeling
Mosek Modeling
1 Introduction 1
2 Linear optimization 3
2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
2.1.1 Basic notions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
2.1.2 Geometry of linear optimization . . . . . . . . . . . . . . . . . . . . . . . 4
2.2 Linear modeling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
2.2.1 Convex piecewise-linear functions . . . . . . . . . . . . . . . . . . . . . . . 5
2.2.2 Absolute values . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.2.3 The `1 norm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.2.4 The `∞ norm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2.2.5 Avoid ill-posed problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.2.6 Avoid badly scaled problems . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.3 Duality in linear optimization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
2.3.1 The dual problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
2.3.2 Duality properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
2.4 Infeasibility in linear optimization . . . . . . . . . . . . . . . . . . . . . . . . . . 12
2.4.1 Basic concepts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
2.4.2 Locating infeasibility . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.5 Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
i
3.3 Conic quadratic optimization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
3.3.1 Quadratically constrained quadratic optimization . . . . . . . . . . . . . . 25
3.3.2 Robust optimization with ellipsoidal uncertainties . . . . . . . . . . . . . 26
3.3.3 Markowitz portfolio optimization . . . . . . . . . . . . . . . . . . . . . . . 26
3.4 Duality in conic quadratic optimization . . . . . . . . . . . . . . . . . . . . . . . 27
3.4.1 Primal versus dual form . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
3.4.2 Examples of bad duality states . . . . . . . . . . . . . . . . . . . . . . . . 29
3.5 Infeasibility in conic quadratic optimization . . . . . . . . . . . . . . . . . . . . . 30
3.6 Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
4 Semidefinite optimization 32
4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
4.1.1 Semidefinite matrices and cones . . . . . . . . . . . . . . . . . . . . . . . . 32
4.1.2 Properties of semidefinite matrices . . . . . . . . . . . . . . . . . . . . . . 34
4.1.3 Semidefinite duality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
4.2 Semidefinite modeling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
4.2.1 Linear matrix inequalities . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
4.2.2 Eigenvalue optimization . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
4.2.3 Singular value optimization . . . . . . . . . . . . . . . . . . . . . . . . . . 40
4.2.4 Matrix inequalities from Schur’s Lemma . . . . . . . . . . . . . . . . . . . 41
4.2.5 Nonnegative polynomials . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
4.2.6 Hermitian matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
4.2.7 Nonnegative trigonometric polynomials . . . . . . . . . . . . . . . . . . . 44
4.3 Semidefinite optimization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
4.3.1 Nearest correlation matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
4.3.2 Extremal ellipsoids . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
4.3.3 Polynomial curve-fitting . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
4.3.4 Filter design problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
4.3.5 Relaxations of binary optimization . . . . . . . . . . . . . . . . . . . . . . 52
4.3.6 Relaxations of boolean optimization . . . . . . . . . . . . . . . . . . . . . 54
4.3.7 Converting a problem to standard form . . . . . . . . . . . . . . . . . . . 55
4.4 Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
5 Quadratic optimization 57
5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
5.2 Convex quadratic optimization . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
5.2.1 Geometry of quadratic optimization . . . . . . . . . . . . . . . . . . . . . 57
5.2.2 Duality in quadratic optimization . . . . . . . . . . . . . . . . . . . . . . . 58
5.2.3 Infeasibility in quadratic optimization . . . . . . . . . . . . . . . . . . . . 59
5.3 Quadratically constrained optimization . . . . . . . . . . . . . . . . . . . . . . . . 59
5.3.1 Duality in quadratically constrained optimization . . . . . . . . . . . . . . 60
5.4 Separable quadratic optimization . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
ii
6.2.5 Pack constraints . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
6.2.6 Partition constraints . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
6.2.7 Continuous piecewise-linear functions . . . . . . . . . . . . . . . . . . . . 65
6.2.8 Lower semicontinuous piecewise-linear functions . . . . . . . . . . . . . . 67
6.3 Boolean primitives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
6.3.1 Complement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
6.3.2 Conjunction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
6.3.3 Disjunction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
6.3.4 Implication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
6.3.5 Exclusive disjunction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
6.4 Integer optimization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
6.5 Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
7 Practical optimization 71
7.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
7.2 The quality of a solution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
7.3 Distance to a cone . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
8 Case studies 74
8.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
8.2 A resource constrained production and inventory problem . . . . . . . . . . . . . 74
8.3 Markowitz portfolio optimization . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
8.3.1 Basic notions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
8.3.2 Slippage costs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
8.3.3 Maximizing the Sharpe ratio . . . . . . . . . . . . . . . . . . . . . . . . . 78
iii
Preface
This manual is about model building using convex optimization. It is intended as a modeling
guide for the MOSEK optimization package. However, the style is intentionally quite generic
without specific MOSEK commands or API descriptions.
There are several excellent books available on this topic, for example the recent books by
Ben-Tal and Nemirovski [BN01] and Boyd and Vandenberghe [BV04], which have both been
a great source of inspiration for this manual. The purpose of this manual is to collect the
material which we consider most relevant to our customers and to present it in a practical self-
contained manner; however, we highly recommend the books [BN01, BV04] as a supplement to
this manual.
Some textbooks on building models using optimization (or mathematical programming) in-
troduce different concept through practical examples. In this manual we have chosen a different
route, where we instead show the different sets and functions that can be modeled using convex
optimization, which can subsequently be combined into realistic examples and applications. In
other words, we present simple convex building blocks, which can then be combined into more
elaborate convex models. With the advent of more expressive and sophisticated tools like conic
optimization, we feel that this approach is better suited.
The first three chapters discuss self-dual conic optimization, namely linear optimization
(Chap. 2), conic quadratic optimization (Chap. 3) and semidefinite optimization (Chap 4), which
should be read in succession. Chap. 5 discusses quadratic optimization, and has Chap. 3 as a
prerequisite. Finally, the last chapter (Chap. 6) diverges from the path of convex optimization
and discusses mixed integer conic optimization. App. A contains details on notation used in
the manual.
iv
Chapter 1
Introduction
Many decision problems can be phrased as making the best choice subject to some constraints.
An example is the problem of choosing an optimal portfolio of assets such as stocks and bonds.
For this and many other decision problems it is hard to identify or even quantify an optimal
solution, but it might be possible to build a mathematical optimization model representing the
problem. Given such a model and appropriate software it may then be tractable to find the
optimal decision for the mathematical model.
Loosely speaking an optimization model consists of a goal (or objective) as well as some
constraints that must be satisfied. The goal could be to obtain the highest possible return and
the constraint could be not too assume too much risk. Finally, the variables or decisions is the
amount to buy of each asset. In mathematical terms we may state the problem as
minimize f (x)
(1.1)
subject to x ∈ X ,
where x is the variable, i.e., xj denotes the amount we should buy of asset j. The function
f (x) is the goal or objective function and X denotes the set of feasible solutions for the decision
variables x. For example X may specify that we can only buy a nonnegative amount of each
asset, that we only have a limited budget and do not want too much risk. Subsequently we will
denote “x ∈ X ” as the constraints.
If we want to solve problem (1.1) using a computer we must be explicit about the goal
function f and the set X . This leads to the question of tractable objective functions and
constraints. Some choices for the objective function and the constraints might be desirable, but
for those choices we cannot expect find a solution within a reasonable amount of time even with
the fastest computer. Other choices are more tractable in the sense that we are guaranteed to
find the optimal solution within a reasonable time using an suitable piece of software; those
choices leads to the field of convex optimization, which is the main subject of this manual.
Thus, when formulating an optimization model for a decision problem we are facing at
least two issues; (i) How to represent the problem using a mathematical notation. (ii) The
choice of formulation should abide by what we can actually solve. For instance we may use
an approximate model if the approximation can be solved much faster than a more accurate
model. Thus, formulating a good optimization model for given decision problems can often be
nontrivial; infact it can be somewhat of an art.
In our experience it is advantageous to view optimization models as a combination of a
number of smaller blocks that can be treated more or less independently. Ultimately the different
building blocks are put together to form a more complicated model, but by using simple tractable
building blocks we know that the overall problem will remain tractable. Moreover, the building
blocks can be reused in different models, which is less error-prone and allows us to develop
1
complicated models more rapidly. Therefore, in this manual we discuss different typical (and
tractable) building blocks, and we give examples of complete optimization models formed using
those building blocks.
2
Chapter 2
Linear optimization
2.1 Introduction
In this chapter we discuss different aspects of linear optimization. We first introduce the basic
concepts of a linear optimization and discuss the underlying geometric interpretations. We
then give examples of the most frequently used reformulations or modeling tricks used in linear
optimization, and we finally discuss duality and infeasibility theory in some detail.
x1 + x2 + x3 = 1, x1 , x2 , x3 ≥ 0.
The function we minimize is often called the objective function; in this case we have a linear
objective function. The constraints are also linear and consists of both linear equality constraints
and linear inequality constraints. We typically use a more compact notation
minimize x1 + 2x2 − x3
subject to x1 + x2 + x3 = 1 (2.1)
x1 , x2 , x3 ≥ 0,
and we call (2.1) an linear optimization problem The domain where all constraints are satisfied
is called the feasible set; the feasible set for (2.1) is shown in Fig. 2.1. For this simple problem we
see by inspection that the optimal value of the problem is −1 obtained by the optimal solution
minimize cT x
subject to Ax = b
x ≥ 0.
3
x3
x1 x2
x0
x
There are many other formulations for linear optimization problems; we can have different types
of constraints,
Ax = b, Ax ≥ b, Ax ≤ b, lc ≤ Ax ≤ uc ,
and different bounds on the variables
lx ≤ x ≤ ux
or we may have no bounds on some xi , in which case we say that xi is a free variable.
All these formulations are equivalent in the sense that by simple transformations and in-
troduction of auxiliary variables they can represent the same problem with the same optimal
solution. The important feature is that the objective function and the constraints are all linear
in x.
4
a Tx a
>γ
x0
x
a2
a3
a1 a4
a5
see Fig. 2.3. Similarly the set {x | aT x ≤ 0} forms another halfspace; in Fig. 2.3 it corresponds
to the area below the dashed line. A set of linear inequalities
Ax ≤ b
corresponds to an intersection of halfspaces and forms a polyhedron, see Fig. 2.4. The polyhedral
description of the feasible set gives us a very intuitive intepretation of linear optimization, which
is illustrated in Fig. 2.5. The dashed lines are normal to the objective c = (−1, 1), and to
minimize cT x we move as far as possible in the opposite direction of c, to a point where one the
normals intersect the polyhedron; an optimal solution is therefore always either a vertex of the
polyhedron, or an entire facet of the polyhedron may be optimal.
The polyhedron shown in the figure in bounded, but this is not always the case for polyhe-
dra coming from linear inequalities in optimization problems. In such cases the optimization
problem may be unbounded, which we will discuss in detail in Section 2.4.
5
c
a2
a3
a1 a4
x⋆
a5
Figure 2.5: Geometric interpretation of linear optimization. The optimal solution x? is at point
where the normals to c (the dashed lines) intersect the polyhedron.
a3 x + b3
a1 x + b1
a2 x + b2
Figure 2.6: A convex piecewise-linear function (solid lines) of a single variable x. The function
is defined as the maximum of 3 affine functions.
the convex piecewise-linear (or rather piecewise-affine) function illustrated in Fig. 2.6, where
the function can be described as max{a1 x + b1 , a2 x + b2 , a3 x + b3 }.
More generally, we consider convex piecewise-linear functions f : Rn 7→ R defined as
where the important notion is that f is defined as the maximum of a set of affine functions. To
model the epigraph f (x) ≤ t (see App. A), we note that t ≥ maxi {aTi x + bi } if and only if t is
larger than all the affine functions, i.e., we have an equivalent formulation with m inequalities,
aTi x + bi ≤ t, i = 1, . . . , m.
Piecewise-linear functions have many uses linear in optimization; either we have a convex
piecewise-linear formulation from the onset, or we may approximate a more complicated (non-
linear) problem using piecewise-linear approximations.
A word of caution is in order at this point; in the past the use of linear optimization was
significantly more widespread than the use of nonlinear optimation, and piecewise-linear mod-
eling was used to approximately solve nonlinear optimization problems. The field of nonlinear
optimization (especially conic optimization) has since matured, and with modern optimization
software it is both easier and more efficient to directly formulate and solve conic problems
without piecewise-linear approximations.
6
2.2.2 Absolute values
The absolute value of a scalar variable is an example of a simple convex piecewise-linear function,
|x| := max{ x, −x },
−t ≤ x ≤ t.
with additional (auxiliary) variable z ∈ Rn , and we claim that (2.2) and (2.3) are equivalent.
They are equivalent if all zi = |xi | in (2.3). Suppose that t is optimal (as small as possible) for
(2.3), but that some zi > |xi |. But then we could reduce t further by reducing zi contradicting
the assumption that t optimal, so the two formulations are equivalent. Therefore, we can model
(2.2) using linear (in)equalities
n
X
−zi ≤ xi ≤ zi , zi = t,
i=1
with auxiliary variables z. Similarly, we can describe the epigraph of the norm of an affine
function of x,
kAx − bk1 ≤ t
as
n
X
−zi ≤ aTi x − bi ≤ zi , zi = t,
i=1
minimize kxk1
(2.4)
subject to Ax = b,
7
uses the `1 norm of x as a heuristic of finding a sparse solution (one with many zero elements)
to Ax = b, i.e., it aims to represent b using few columns of A. Using the reformulation above
we can pose the problem as a linear optimization problem,
minimize eT z
subject to −z ≤ x ≤ z (2.5)
Ax = b,
kxk∞ ≤ t (2.6)
we use that t ≥ maxi=1,...,n |xi | if and only if t is greater than each term, i.e., we can model
(2.6) as
−t ≤ xi ≤ t, i = 1, . . . , n.
Again, we can also consider an affine function of x, i.e.,
kAx − bk∞ ≤ t,
To maximize xT v subject to |v1 | + · · · + |vn | ≤ 1 we identify the largest element of x, say |xk |,
The optimizer v is then given by vk = ±1 and vi = 0, i 6= k, i.e., kxk∗,1 = kxk∞ . This illustrates
a more general property of dual norms, namely that kxk∗∗ = kxk.
8
2.2.5 Avoid ill-posed problems
A problem is ill-posed if small perturbations of the problem data result in arbitrarily large
perturbations of the solution, or change feasibility of the problem. Such problem formulations
should always be avoided as even the smallest numerical perturbations (for example rounding
errors, or solving the problem on a different computer) can result in different or wrong solutions.
Additionally, from an algorithmic point of view, even computing a wrong solution is very difficult
for ill-posed problems.
A rigorous definition of the degree of ill-posedness is possible by defining a condition number
for a linear optimization, but unfortunately this is not a very practical metric, as evaluating
such a condition number requires solving several optimization problems. Therefore even though
being able to quatify the difficulty of an optimization problem from a condition number is very
attractive, we only make the modest recommendations to avoid problems
• that are nearly infeasible,
minimize cT x + 1012 x1
subject to Ax = b
x ≥ 0,
reasoning that the large penalty term will force x1 = 0. However, if kck < 1 we have the
exact same problem, namely that in finite precision the penalty term will complety dominate
the objective and render the contribution cT x insignificant or unreliable. Therefore, penality or
big-M terms should never be used explicitly as part of an optimization model.
9
As a different example, consider again a problem (2.7), but with additional (redundant)
constraints xi ≤ γ. For optimization practioners this is a common approach for stabilizing
the optimization algorithm since it improves conditioning of the linear systems solved by an
interior-point algorithm. The problem we solve is then
minimize cT x
subject to Ax = b
x≥0
x ≤ γe,
minimize cT x
subject to Ax = b (2.7)
x ≥ 0.
L(x, y, s) = cT x + y T (b − Ax) − sT x,
The variables y ∈ Rp and s ∈ Rn+ are called Lagrange multipliers or dual variables. It is easy
to verify that
L(x, y, s) ≤ cT x
for any feasible x. Indeed, we have b − Ax = 0 and xT s ≥ 0 since x, s ≥ 0, i.e., L(x, y, s) ≤ cT x.
Note the importance of nonnegativity of s; more generally of all Langrange multipliers associated
with inequality constraints. Without the nonnegativity constraint the Lagrangian function is
not a lower bound.
The dual function is defined as the minimum of L(x, y, s) over x. Thus the dual function of
(2.7) is
g(y, s) = min L(x, y, s) = min xT (c − AT y − s) + bT y.
x x
10
We see that the Langrangian function is linear in x, so it is unbounded below unless when
c − AT y − s = 0, i.e., T
b y, c − AT y − s = 0
g(y, s) =
∞, otherwise.
Finally, we get a dual problem is by maximizing g(y, s). The dual problem of (2.7) is
maximize bT y
subject to c − AT y = s (2.8)
s ≥ 0.
Example 2.1 (Dual of basis pursuit). As another example, let us derive the dual of the basis
pursuit formulation (2.5). The Lagrangian function is
L(x, z, y, u, v) = eT z + uT (x − z) − v T (x + z) + y T (b − Ax)
Example 2.2 (Dual of basis pursuit revisited). We can also derive the dual of the basis pursuit
formulation (2.4) directly. The Lagrangian is
L(x, y) = kxk1 + y T (Ax − b) = kxk1 + xT AT y − bT y
with a dual function
g(y, s) = −bT y + min(kxk1 + xT AT y).
x
The term minx (kxk1 + xT AT y) can be simplified as
min(kxk1 + xT AT y) = min min (tkzk1 + tz T AT y)
x t≥0 kzk1 =1
where we used the definition of the dual norm in the last line. Finally mint≥0 t(1 − kAT yk∞ )
is 0 if kAT yk∞ ≤ 1 and unbounded below otherwise. In other words, we get a dual function
−bT y, kAT yk∞ ≤ 1,
g(y) =
−∞, otherwise,
and a dual problem
maximize bT y
(2.10)
subject to kAT yk∞ ≤ 1.
Not surprisingly, (2.9) and (2.10) are equivalent in the sense that
e = u + v, AT y = u − v, u, v ≥ 0 ⇐⇒ kAT yk∞ ≤ 1.
11
2.3.2 Duality properties
Many important observations can be made for the dual problem (2.8), which we discuss in
details in Appendix B. We summarize the most important properties below; to that end let x?
and (y ? , s? ) denote the optimal primal and dual solutions with optimal values p? := cT x? and
d? := bT y ? , respectively.
• In both the primal and dual problem we optimize over the nonnegative orthant, i.e.,
x, s ≥ 0. The number of constraints in the primal problem becomes the number of
variables in the dual problem, and vice versa.
• The dual problem gives a lower bound on the primal problem for all (x, y, s). From the
dual problem we have c − AT y = s, so
xT (c − AT y) = xT s ≥ 0,
Fp = {x ∈ Rn | Ax = b, x ≥ 0} =
6 ∅
Fd = {y ∈ Rm | c − AT y ≥ 0} =
6 ∅
d? = cT x? = bT y ? = p? .
This remarkable fact (see Appendix B.5 for a proof) is called the strong duality property.
If strong duality holds then (x? )T s? = 0 so x? and s? are complementary; since x? , s? ≥ 0
we have that x?i > 0 ⇔ s?i = 0 and vice versa.
For linear (and conic) optimization strong duality means that we have the choice of solving
either the primal or the dual problem (or both).
Lemma 2.1 (Farkas). Given A and b, exactly one of the two statements are true:
12
Lemma 2.2 (Farkas dual variant). Given A and c, exactly one of the two statements are true:
1. There exists an x ≥ 0 such that Ax = 0 and cT x < 0.
AT y ≤ 0, bT y > 0
6 ∃x ≥ 0 : Ax = b,
c − AT (ty + y0 ) ≥ 0, ∀t ≥ 0
• If the primal problem is infeasible (p? = ∞), then from Farkas’ lemma the dual problem
is unbounded (d? = ∞) or infeasible (d? = −∞).
• If the primal problem is unbounded (p? = −∞), then from weak duality the dual problem
is infeasible (d? = −∞).
• If the dual problem is infeasible (d? = −∞), then from Farkas’ dual lemma then the
primal problem is unbounded (p? = −∞) or infeasible (p? = ∞).
Example 2.3. As an example exhibiting both primal and dual infeasibility consider the prob-
lem
minimize −x1 − x2
subject to x1 = −1
x1 , x2 ≥ 0
with a dual problem
maximize −y
1 −1
subject to y≤ .
0 −1
Both the primal and dual problems are trivially infeasible; y = −1 serves as a certificate of
primal infeasibility, and x = (0, 1) is a certificate of dual infeasiblity.
13
2.4.2 Locating infeasibility
In some cases we are interested in locating the cause of infeasibility in a model, for example if
expect the infeasibility to be caused by an error in the problem formulation. This can be difficult
in practice, but a Farkas certificate lets us reduce the dimension of the infeasible problem, which
in some cases pinpoints the cause of infeasibility.
To that end, suppose we are given a certificate of primal infeasibility,
AT y ≤ 0, bT y > 0,
AI,: x = bI , x ≥ 0.
2.5 Bibliography
The material in this chapter is very basic, and can be found in any textbook on linear optimiza-
tion. For further details, we suggest a few standard references [Chv83, BT97, PS98], which all
cover much more that discussed here. Although the first edition of [PS98] is not very recent,
it is still considered one of the best textbooks on graph and network flow algorithms. [NW06]
gives a more modern treatment of both theory and algorithmic aspects of linear optimization.
14
Chapter 3
3.1 Introduction
This chapter extends the notion of linear optimization with quadratic cones; conic quadratic
optimization is a straightforward generalization of linear optimization, in the sense that we
optimize a linear function with linear (in)equalities with variables belonging to one or more
(rotated) quadratic cones. In general we also allow some of the variables to be linear variables
as long as some of the variables belong to a quadratic cone.
We discuss the basic concept of quadratic cones, and demonstrate the surprisingly large
flexibility of conic quadratic modeling with several examples of (non-trivial) convex functions
or sets that be represented using quadratic cones. These convex sets can then be combined
arbitrarily to form different conic quadratic optimization problems.
We finally extend the duality theory and infeasibility analysis from linear to conic optimiza-
tion, and discuss infeasibility of conic quadratic optimization problems.
The geometric interpretation of a quadratic (or second-order) cone is shown in Fig. 3.1 for a
cone with three variables, and illustrates how the exterior of the cone resembles an ice-cream
cone. The 1-dimension quadratic cone simply implies standard nonnegativity x1 ≥ 0.
A set S is called a convex cone if for any x ∈ S we have αx ∈ S, ∀α ≥ 0. From the definition
(3.1) it is clear that if x ∈ Qn then obviously αx ∈ Qn , ∀α ≥ 0, which justifies the notion
quadratic cone.
15
x1
x2 x3
p
Figure 3.1: A quadratic or second-order cone satisfying x1 ≥ x22 + x23 .
x ∈ Qn ⇐⇒ (Tn x) ∈ Qnr ,
and since T is orthogonal we call Qr a rotated cone; the transformation corresponds to a rotation
of π/4 of the (x1 , x2 ) axis. For example if x ∈ Q3 and
√1 √1
√1
0 (x + x )
z1 2 2 x 1 1 2
2
z2 = √12 − √12 0 x2 = √12 (x1 − x2 )
z3 0 0 1 x3 x3
then
2z1 z2 ≥ z32 , z1 , z2 ≥ 0 =⇒ (x21 − x22 ) ≥ x23 , x1 ≥ 0,
and similarly (by interchanging roles of x and z) we see that
Thus, one could argue that we only need quadratic cones, but there are many examples of
functions where using an explicit rotated quadratic conic formulation is more natural; in Sec. 3.2
we discuss many examples involving both quadratic cones and rotated quadratic cones.
16
3.2.1 Absolute values
In Sec. 2.2.2 we saw how to model |x| ≤ t using two linear inequalities, but in fact the epigraph
of the absolute value is just the definition of a two-dimensional quadratic cone, i.e.,
|x| ≤ t ⇐⇒ (t, x) ∈ Q2 .
(1/2)xT Qx + cT x + r ≤ 0
may be rewritten as
t + cT x + r = 0,
(3.4)
xT Qx ≤ 2t.
Since Q is symmetric positive semidefinite the epigraph
xT Qx ≤ 2t (3.5)
Q = F T F, (3.6)
(see Chap. 4 for properties of semidefinite matrices); for instance F could be the Cholesky
factorization of Q. We then have an equivalent characterization of (3.5) as
F x − y = 0,
s = 1,
kyk2 ≤ 2st,
17
where I is the identity matrix, so
xT Qx = xT x + xT F T F x
and hence
Fx − y = 0,
f +h = t
kxk2 ≤ 2ef, e = 1,
kyk2 ≤ 2gh, g = 1.
In other words,
(f, 1, x) ∈ Q2+n
r , (h, 1, F x) ∈ Q2+k
r , f +h=t
is a conic quadratic representation of (3.5).
(Ax + b, cT x + d) ∈ Qm+1
or equivalently
s = Ax + b,
t = cT x + d, (3.8)
(s, t) ∈ Qm+1 .
As will be explained later, we refer to (3.7) as the dual form and (3.8) as the primal form,
respectively. An alternative characterization of (3.7) is
which shows that certain quadratic inequalities are conic quadratic representable. The next
section shows such an example.
√
(ii) (t, x) | t ≥ x1 , x ≥ 0 = (t, x) | (x, t, 2) ∈ Q3r .
(iv) (t, x) | t ≥ x5/3 , x ≥ 0 = (t, x) | (s, t, x), (1/8, z, s), (s, x, z) ∈ Q3r .
√
(v) (t, x) | t ≥ x12 , x ≥ 0 = (t, x) | (t, 1/2, s), (x, s, 2) ∈ Q3r .
n 3
o
(vi) (t, x, y) | t ≥ |x|
y2
, y ≥ 0 = (t, x, y) | (z, x) ∈ Q2 , ( y2 , s, z), ( 2t , z, s) ∈ Q3r .
18
Proof. Proposition (i) follows from
(x, 12 , t) ∈ Q3r
⇔ x ≥ t2 , x ≥ 0
√
⇔ x ≥ |t|, x ≥ 0.
Proposition (ii) follows from √
(x, t, 2) ∈ Q3r
⇔ 2xt ≥ 2, x, t ≥ 0
⇔ t ≥ x1 , x ≥ 0.
Proposition (iii) follows from
(s, t, x), (x, 18 , s) ∈ Q3r
⇔ 2st ≥ x2 , 41 x ≥ s2 , s, t, x ≥ 0
√
⇔ xt ≥ x2 , t, x ≥ 0
⇔ t ≥ x3/2 , x ≥ 0.
Proposition (iv) follows from
(s, t, x), (1/8, z, s), (s, x, z) ∈ Q3r
1 2 2 2
⇔ 4 z ≥ s , 2sx ≥ z , 2st ≥ x , s, t, x ≥ 0
⇔ 2 2 2
2sx ≥ (4s ) , 2st ≥ x , s, t, x ≥ 0
⇔ x ≥ 8s3 , 2st ≥ x2 , x, s ≥ 0
⇔ x1/3 t ≥ x2 , x ≥ 0
⇔ t ≥ x5/3 , x ≥ 0.
Proposition (v) follows from
√
(t, 21 , s), (x, s, 2) ∈ Q3r
⇔ t ≥ s2 , 2xs ≥ 2, t, x ≥ 0
⇔ t ≥ x12 .
Finally, proposition (vi) follows from
(z, x) ∈ Q2 , ( y2 , s, z), ( 2t , z, s) ∈ Q3r
⇔ z ≥ |x|, ys ≥ z 2 , zt ≥ s2 , z, y, s, t ≥ 0
4
⇔ z ≥ |x|, zt ≥ yz 2 , z, y, t ≥ 0
|x|3
⇔ t≥ y2
, y ≥ 0.
In the above proposition the terms x3/2 and x5/3 appeared. Those terms are special cases
of
2k−1
t≥x k ,x≥0 (3.10)
defined for k = 2, 3, 4, 5, .... (3.10) is identical to
tx1/k ≥ x2 , x ≥ 0
which implies
2tz ≥ x2 , x, z ≥ 0,
x1/k ≥ 2z.
The latter can be rewritten as
2tz ≥ x, x, s ≥ 0,
x ≥ 2k z k
for which it is easy to build a conic quadratic representation. Therefore we will leave it as an
exercise for reader.
19
3.2.7 Geometric mean
Closely related is the hypograph of the geometric mean of nonnegative variables x1 and x2 ,
2 √
Kgm = {(x1 , x2 , t) | x1 x2 ≥ t, x1 , x2 ≥ 0}.
Initially let us assume that n = 2l for an integer l ≥ 1. In particular, let us assume l = 3, i.e.,
we seek a conic quadratic representation of
then
2x1 x2 ≥ y12
and so forth, which is obviously equivalent to
x1 x2 ≥ (1/2)y12 .
or equivalently
1
√ (y1 y2 y3 y4 )1/4 ≤ (x1 x2 · · · x8 )1/8 .
2
Hence, by introducing 4 3-dimensional rotated quadratic cones, we have obtained a simpler
problem with 4 y variables instead of 8 x variables. Clearly, we can apply the same idea to the
reduced problem. To that end, introduce the inequalities
implying that
1
√ (z1 z2 )1/2 ≤ (y1 y2 y3 y4 )1/4 .
2
Finally, if we introduce the conic relationship
(z1 , z2 , w1 ) ∈ Q3r
we get √ √ √
w1 ≤ 2(z1 z2 )1/2 ≤ 4(y1 y2 y3 y4 )1/4 ≤ 8(x1 x2 · · · x8 )1/8 .
20
2z1 z2 ≥ w2
x1 x2 x3 x4 x5 x6 x7 x8
Therefore
(x1 , x2 , y1 ), (x3 , x4 , y2 ), (x5 , x6 , y3 ), (x7 , x8 , y4 ) ∈ Q3r ,
(y1 , y2 , z1 ), (y3 , y4 , z2 ) ∈ Q3r ,
(3.13)
(z1 , z2 , w1 ) ∈ Q3r ,
√
w1 = 8t
is a representation of the set (3.11), which is by construction conic quadratic representable.
The relaxation using l levels of auxiliary variables and rotated quadratic cones can represented
using a tree-structure as shown shown in Fig. 3.2.
Next let us assume that n is not a power of two let, for example n = 6. We then wish to
characterize the set
t ≤ (x1 x2 x3 x4 x5 x6 )1/6 , x ≥ 0.
which is obviously equivalent to
t ≤ (x1 x2 x3 x4 x5 x6 )1/8 , x7 = x8 = t, x ≥ 0.
In other words, we can reuse the result (3.13) if we add simple equality constraints
x7 = x8 = t.
Thus, if n is not a power of two, we take l = dlog2 ne and build the conic quadratic quadratic
representation for that set, and we add 2l − n simple equality constraints.
This is a convex inequality, but not conic quadratic representable. However, the reciprocal
inequality
n
1 X −1
xi ≤ y, x ≥ 0. (3.14)
n
i=1
with y = 1/t can be characterized as
n
X
xi zi ≥ 1, zi = ny.
i=1
21
In other words, the set (3.14) corresponding to the epigraph of the reciprocal harmonic mean
of x can be described using an intersection of rotated quadratic cones an affine hyperplanes
n
√ X
2xi zi ≥ wi2 , xi , zi ≥ 0, wi = 2, zi = ny,
i=1
√
or (xi , zi , 2) ∈ Q3r , eT z = ny.
x5/3 ≤ t, x ≥ 0.
We rewrite this as
x5 ≤ t3 , x ≥ 0,
which can be described (using the approach in Sec. 3.2.7) as
0 ≤ x ≤ (y1 y2 y3 y4 y5 )1/5 , y1 = y2 = y3 = t, y4 = y5 = 1,
x−5/2 ≤ t, x ≥ 0,
which we rewrite as
1 ≤ x5 t 2 , x ≥ 0,
or equivalently as
1 ≤ (y1 y2 . . . y7 )1/7 , y1 = · · · = y5 = x, y6 = y7 = t, y, t ≥ 0.
Let ek ∈ Rk denote the vector of all ones. For general p, q ∈ Z+ we then have
22
3.2.11 Power cones
The (n + 1)-dimensional power cone is defined by
n
α
Y
Kαn+1 = (x, y) ∈ Rn+ × R | |y| ≤ xj j (3.15)
j=1
For example, the intersection of the three-dimensional power cone with an affine hyperplane
|y| ≤ xα1 1 x1−α
2
1
, x2 = 1
is equivalent to
|y|1/α1 ≤ x1 ,
i.e., we can model the epigraph of convex increasing terms (1/α1 ≥ 1) using a three-dimensional
power cone. As another example, the cone
2
1− 23
|y| ≤ x13 x2
is equivalent to
|y|3
≤ x2 .
x21
Assume next that α1 = p/q where p and q are positive integers and p ≤ q. Then
p
(1− pq )
3 2 q
K( p ,1− p ) = (x, y) ∈ R+ × R | |y| ≤ x1 x2 . (3.17)
q q
Now p
(1− pq )
|y| ≤ x1q x2
is equivalent to
1
|y| ≤ (xp1 xq−p
2 ) ,
q (3.18)
so by introducing additional z variables and the constraints
z1 = z2 = . . . = zq−p = x1 , zq−p+1 = zq−p+2 = . . . = zq = x2
we can rewrite (3.18) as
1
|y| ≤ (z1 z2 . . . zq ) p (3.19)
which essentially is the geometric mean inequality discussed above.
We next consider the n + 1 dimensional power cone with rational powers,
n
p /q
Y
Kαn+1 = (x, y) ∈ Rn+ × R | |y| ≤ xj j j (3.20)
j=1
Pn
where pj , qj are integers satisfying 0 < pj ≤ qj and j=1 pj /qj = 1. To motivate the general
procedure, we consider a simple example.
23
Example 3.1. Consider the 4 dimensional power cone
1 3 3
4 3
K( 1 , 3 , 3 ) = (x, y) ∈ R+ × R | |y| ≤ (x1 x2 x3 ) .
4 8 8
4 8 8
with
z1 = z2 = x 1 , z3 = z4 = z5 = x2 , z6 = z7 = z8 = x3 .
βpj Pn βpj
where we note that qj are integers and j=1 qj = β. If we define
j
X βpk
sj = , j = 1, . . . , n − 1,
qk
k=1
xT Ax ≤ 0
is equivalent to
n
X
αj (qjT x)2 ≤ α1 (q1T x)2 . (3.22)
j=2
24
or as
√ √ √
( α1 q1T x, α2 q2T x, . . . , αn qnT x) ∈ Qn . (3.24)
The latter equivalence follows directly by using the orthogonal transformation Tn+1 on (3.23),
i.e., √ √ √
Tx Tx
p
1/√2 1/√2 α1 /2q 1 α1 q 1
α1 /2q T x
p
√ 0
1/ 2 −1/ 2
√ 1
T T
1
α2 q2 x = α2 q2 x
.
.. .. ..
. . .
√ T √ T
1 α n qn x αn qn x
E = {x ∈ Rn | x = P −1 y + c, kyk2 ≤ 1}.
Depending on the context one characterization may be more useful than the other.
Qi = FiT Fi , i = 0, . . . , p,
where Fi ∈ Rki ×n . Using the formulations in Sec. 3.2.4 we then get an equivalent conic quadratic
problem
minimize t0 + cT0 x + r0
subject to ti + cTi x + ri = 0, i = 1, . . . , p, (3.26)
(ti , 1, Fi x) ∈ Qkr i +2 , i = 0, . . . , p.
Assume next that Qi is a rank 1 matrix, i.e., Fi has 1 row and n columns. Storing Qi requires
about n2 /2 space whereas storing Fi then only requires n space. Moreover, the amount of work
required to evaluate
x T Qi x
25
is proportional to n2 whereas the work required to evaluate
is proportional to n only. In other words, if Qi have low rank, then (3.25) will require much less
space to solve than (3.25). This fact usually also translates into much faster solution times.
E = {c ∈ Rn | c = F y + g, kyk2 ≤ 1}.
A common approach is then to optimize for the worst-case realization of c, i.e., we get a robust
version
minimize supc∈E cT x
subject to Ax = b (3.27)
x ≥ 0.
The worst-case objective can be evaluated as
where we used that supkuk2 ≤1 v T u = (v T v)/kvk2 = kvk2 . Thus the robust problem (3.27) is
equivalent to
minimize g T x + kF T xk2
subject to Ax = b
x ≥ 0,
which can be posed as a conic quadratic problem
minimize g T x + t
subject to Ax = b
(3.28)
(t, F T x) ∈ Qn+1
x ≥ 0.
µ = Er
and covariance
Σ = E(r − µ)(r − µ)T .
The return of our investment is also a random variable y = rT x with mean (or expected return)
Ey = µT x
26
and variance (or risk)
(y − Ey)2 = xT Σx.
We then wish to rebalance our portfolio to achieve a compromise between risk and expected
return, e.g., we can maximize the expected return given an upper bound γ on the tolerable risk
and a constraint that our total investment is fixed,
maximize µT x
subject to xT Σx ≤ γ
eT x = 1
x ≥ 0.
where cl ∈ Rnl , cqj ∈ Rnj , Al ∈ Rm×n , Aqj ∈ Rm×nj and b ∈ Rm , i.e., a problem with both a
l
linear cone and nq quadratic cones. We can also write (3.29) more compactly as
minimize cT x
subject to Ax = b (3.30)
x ∈ C,
with
xl cl
xq1 cq1
Al Aq1 . . . Aqnq
x= .. , c= .. , A= ,
. .
xqnq cqnq
for the cone C = Rl+ × Qn1 × · · · × Qnq . We note that problem (3.30) resembles a standard
linear optimization, except for a more general cone. The Lagrangian function is
X
L(x, y, s) = cT x − y T (Ax − b) − xT s.
For linear optimization the choice of nonnegative Lagrange multipliers s ≥ 0 ensures that
xT s ≥ 0, ∀x ≥ 0 so the Lagrangian function provides a lower bound on the primal problem.
27
For conic quadratic optimization the choice of Lagrange multipliers is more involved, and we
need the notion of a dual cone
(Qn )∗ = {v ∈ Rn | uT v ≥ 0, ∀u ∈ Qn }.
Conversely, assume that v1 < kv2:n k. Then u := (1, −v2:n /kv2:n k) ∈ Qn satisfies
uT v = v1 − kv2:n k < 0,
/ (Qn )∗ .
i.e., v ∈
The notion of a dual cone allows us to treat (3.30) as a linear optimization problem, where
the dual variables belong to the dual cone, i.e., we have a dual problem
maximize bT y
subject to c − AT y = s (3.31)
s ∈ C∗,
cT x − xT AT y = cT x − bT y = xT s ≥ 0.
maximize bT y
subject to (cqj − (Aqj )T y) ∈ Qnj , j = 1, . . . , nq ,
or equivalently
maximize bT y
subject to k(cqj )2:nj − ((Aqj )2:nj ,: )T yk2 ≤ (cqj )1 − ((Aqj )1,: )T y, j = 1, . . . , nj ,
correspond to the so-called dual form conic constraints in Sec. 3.2.5. If the matrix
A = Aq1 . . . Aqnq ,
has more columns than rows then usually it is more efficient to solve the primal problem (3.30),
and if the opposite is the case, then it is usually more efficient to solve the dual problem (3.31).
28
3.4.2 Examples of bad duality states
The issue of strong duality is more complicated than for linear programming, for example the
primal (or dual) optimal values may be finite but unattained, as illustrated in the following
example.
minimize x1 − x2
subject to x3 = p
1 (3.32)
x1 ≥ x22 + x23 ,
maximize y p
(3.33)
subject to 1 ≥ 1 + y 2 ,
with feasible set {y = 0} and optimal value d? = 0, so a positive duality gap exists for all
bounded (x, y).
It is also possible to have a positive duality gap even when both the primal and dual optimal
objectives are attained.
minimize x3
subject to x1 + x2 + u1 + u2 = 0
px3 + u2 = 1 (3.34)
x22 p
+ x23 ≤ x1
u22 ≤ u1 ,
with two quadratic cones x ∈ Q3 , u ∈ Q2 . It follows from the inequality constraints that
x1 ≥ |x2 |, u1 ≥ |u2 |
minimize p 1 − u2
2
x1 + (1 − u2 )2 ≤ x1 ,
with feasible set {u2 = 1, x1 ≥ 0} and optimal value p? = 0. The dual problem of (3.34) is
maximize y2 p
subject to y1 ≥ py12 + (1 − y2 )2 (3.35)
y1 ≥ (y1 − y2 )2 ,
with feasible set {y2 = 1, y1 ≥ 21 } and the optimal value is d? = −1. For this example, both
the primal and dual optimal values are attained, but we have positive duality gap p? −d? = 1.
29
To ensure strong duality for conic quadratic optimization, we need an additional regularity
assumption. Consider the primal feasible set for (3.30),
Fp = {x = (xl , xq1 , . . . , xqnq ) | Ax = b, xl ≥ 0, xqj ∈ Qnj }.
and dual feasible set for (3.31),
Fd = {y ∈ Rm | c − AT y = (sl , sq1 , . . . , sqnq ), sl ≥ 0, sqj ∈ Qnj },
respectively. If
∃x ∈ Fp : (xqj )1 > k(xqj )2:nj k, j = 1, . . . , nq ,
or
∃y ∈ Fp : c − AT y = (sl , sq1 , . . . , sqnq ), (sqj )1 > k(sqj )2:nj k, j = 1, . . . , nq ,
then strong duality between the problems (3.30) and (3.31) holds. The additional regularity
assumptions are called a Slater constraint qualification. In other words, strong duality holds if
the conic inequalities in the primal or dual problem are strictly feasible (the proofs in App. B.5
can be adapted to handle this case).
Thus the deficiency of positive duality gap in Example 3.3 is caused by the fact that neither
the primal or dual problem is strictly feasible; the inequality in the first problem
q
u1 ≥ u22
is not strict for any u in the primal feasible set, and similarly the inequality
q
y1 ≥ y12 + (1 + y2 )2
is not strict for any y in the dual feasible set.
30
Example 3.4. Consider the conic quadratic problem
minimize x1
subject to 2x1 − x2 = 0
x1p− s = 1
x22 ≤ x1
s ≥ 0,
maximize y2
1 2y1 + y2
subject to 0 + −y1 ∈ Q3 .
0 −y2
Any point y = (0, t), t ≥ 0 is a certificate of infeasibility since (t, 0, −t) ∈ Q3 ; in fact (0, t) is
a dual feasible direction with unbounded objective.
3.6 Bibliography
The material in this chapter is based on the paper [LVBL98] and the books [BN01, BV04]. The
papers [AG03, ART03] contain additional theoretical and algorithmic aspects.
31
Chapter 4
Semidefinite optimization
4.1 Introduction
In this chapter we extend the conic optimization framework from Chap. 2 and 3 with symmetric
positive semidefinite matrix variables.
z T Xz ≥ 0, ∀z ∈ Rn .
For brevity we will often use the shorter notion semidefinite instead of symmetric positive
semidefinite, and we will write X Y (X Y ) as shorthand notation for (X − Y ) ∈ S+ n
n
((Y − X) ∈ S+ ). As inner product for semidefinite matrices, we use the standard trace inner
product for general matrices, i.e.,
X
hA, Bi := tr(AT B) = aij bij .
ij
It is easy to see that (4.1) indeed specifies a convex cone; it is pointed (with origin X = 0),
and X, Y ∈ S+ n implies that (αX + βY ) ∈ S n , α, β ≥ 0. Let us a review a few equivalent
+
definitions of S+n . It is well-known that every symmetric matrix A has a spectral factorization
n
X
A= λi qi qiT .
i=1
where qi ∈ Rn are the (orthogonal) eigenvectors and λi are eigenvalues of A. Using the spectral
factorization of A we have
Xn
T
x Ax = λi (xT qi )2 ,
i=1
32
n if and only if it is a Grammian matrix A = V T V .
Another useful characterization is that A ∈ S+
Using the Grammian representation we have
xT Ax = xT V T V x = kV xk22 ,
In a completely analogous way we define the cone of symmetric positive definite matrices as
n
S++ = {X ∈ S n | z T Xz > 0, ∀z ∈ Rn }
= {X ∈ S n | λi (X) > 0, i = 1, . . . , n}
n
= {X ∈ S+ | X = V T V for some V ∈ Rk×n , rank(V ) = n},
n ((Y − X) ∈ S n ).
and we write X Y (X ≺ Y ) as shorthand notation for (X − Y ) ∈ S++ ++
1
The one dimensional cone S+ simply corresponds to R+ . Similarly consider
x1 x3
X=
x3 x2
x1 x2 ≥ x23 , x1 , x2 ≥ 0,
S = {(x, y, z) ∈ R3 | A(x, y, z) ∈ S+
3
},
33
Figure 4.1: Plot of spectrahedron S = {(x, y, z) ∈ R3 | A(x, y, z) 0}.
(shown in Fig. 4.1) is called a spectrahedron and is perhaps the simplest bounded semidefinite
representable set, which cannot be represented using (finitely many) linear or conic quadratic
cones. To gain a geometric intuition of S, we note that
x2 + y 2 + z 2 − 2xyz = 1,
or equivalently as
T
x 1 −z x
= 1 − z2.
y −z 1 y
For z = 0 this describes a circle in the (x, y)-plane, and for −1 ≤ z ≤ 1 it characterizes an
ellipse (for a fixed z).
• The diagonal elements of A ∈ S+ n are nonnegative. Let e denote the ith standard basis
i
vector (i.e., [ei ]j = 0, j 6= i, [ei ]i = 1). Then Aii = eTi Aei , so (4.1) implies that Aii ≥ 0.
34
• Any submatrix of A ∈ S++n (A ∈ S+n ) is positive (semi)definite; this follows from the
• The inner product of positive (semi)definite matrices is positive (nonnegative). For any
n let A = U T U and B = V T V where U and V have full rank. Then
A, B ∈ S++
where strict positivity follows from the assumption that U has full column-rank, i.e.,
U V T 6= 0.
• The inverse of a positive definite matrix is positive definite. This follows from the positive
spectral factorization A = QΛQT , which gives us
A−1 = QT Λ−1 Q
A BT
X= .
B C
Let us find necessary and sufficient conditions for X 0. We know that A 0 and C 0
(since any submatrix must be positive definite). Furthermore, we can simplify the analysis
using a nonsingular transformation
I 0
L=
F I
AF T + B T
A D1 0
= .
F A + B F AF T + F B T + BF T + C 0 D2
Since det(A) 6= 0 (by assuming that A 0) we see that F = −BA−1 and direct substition
gives us
A 0 D1 0
= .
0 C − BA−1 B T 0 D2
We have thus established the following useful result.
A BT
X= .
B C
A 0, C − BA−1 B T 0.
35
4.1.3 Semidefinite duality
Semidefinite duality is largely identical to conic quadratic duality. We define a primal semidef-
inite optimization problem as
minimize hC, Xi
subject to hAi , Xi = bi , i = 1, . . . , m (4.5)
X ∈ S+ n
The primal constraints are a set of equality constraints involving inner products hAi , Xi = bi ,
i.e., an intersection of n(n + 1)/2-dimensional affine hyperplanes. If we eliminate S from the
dual constraints,
Xm
C− y i Ai 0
i=1
but (similar to conic quadratic optimization) the issue of strong duality is more complicated
than for linear optimization. In particular, it it possible that the primal or dual optimal values
are unattained, and it is also possible for a problem to have a positive duality gap , even though
both the primal and dual optimal values are attained. This is illustrated in the following
example.
36
Example 4.2. We consider a problem
minimize x
1
0 x1 0
subject to x1 x2 0 3,
∈ S+
0 0 1 + x1
with feasible set {x1 = 0, x2 ≥ 0} and optimal value p? = 0. The dual problem can be
formulated as
maximize −z 2
z1 (1 − z2 )/2 0
subject to (1 − z2 )/2 0 3,
0 ∈ S+
0 0 z2
which has a feasible set {z1 ≥ 0, z2 = 1} and dual optimal value d? = −1. Both problems are
feasible, but not strictly feasible.
Just as for conic quadratic optimization, existence of either a strictly feasible primal or dual
point (i.e., a Slater constraint qualification) ensures strong duality for the pair of primal and
dual semidefinite programs.
As discussed in the introduction, we can embed the nonnegative orthant and quadratic cone
within the semidefinite cone, and the Carthesian product of several semidefinite cones can be
modeled as a larger block-diagonal semidefinite matrix. For an optimization solver, however,
it is more efficient to explicitly include multiple cones in the problem formulation. A general
linear, conic quadratic and semidefinite problem format often used by optimization solvers is
Pnq q q
minimize hcl , xl i + j=1 hcj , xj i + nj=1
P s
hCj , Xj i
l l
Pnq q q Pns
subject to A x + j=1 Aj xj + j=1 Aj (Xj ) = b (4.7)
s
xl ∈ Rn+l , xqj ∈ Qqj , Xj ∈ S+j ∀j
with a linear variable xl , conic quadratic variables xqj , semidefinite variables Xj and a linear
mapping Aj : S sj 7→ Rm ,
hAs1j , Xjs i
Aj (Xjs ) := ..
.
.
s s
hAmj , Xj i
All the duality results we have presented for the simpler linear, conic quadratic and semidefinite
problems translate directly to (4.7); the dual problem is
maximize bT y
subject to cl − (Al )T y = sl
(4.8)
cqj − (Aqj )T y = sqj , j = 1, . . . , nq
Cj − A∗j (y) = Sj , j = 1, . . . , ns ,
where
m
X
A∗j (y) := yi Asij ,
i=1
and strong duality holds if a strictly feasible point exists for either (4.7) or (4.8). Similarly, if
there exist
s
xl ∈ Rn+l , xqj ∈ Qqj ∀j, Xj ∈ S+j ∀j
37
satisfying
nq ns nq ns
Aqj xqj q q
X X X X
l l l l
Ax + + Aj (Xj ) = 0, hc , x i + hcj , xj i + hCj , Xj i < 0,
j=1 j=1 j=1 j=1
Sum of eigenvalues
The sum of the eigenvalues corresponds to
m
X n
X
λi (A(x)) = tr(A(x)) = tr(A0 ) + xi tr(Ai ).
i=1 i=1
38
Largest eigenvalue
The largest eigenvalue can be characterized in epigraph form λ1 (A(x)) ≤ t as
To verify this, suppose we have a spectral factorization A(x) = Q(x)Λ(x)Q(x)T where Q(x) is
orthogonal and Λ(x) is diagonal. Then t is an upper bound on the largest eigenvalue if and
only if
Q(x)T (tI − A(x))Q(x) = tI − Λ(x) 0.
Thus we can minimize the largest eigenvalue of A(x).
Smallest eigenvalue
The smallest eigenvalue can be described in hypograph form λm (A(x)) ≥ t as
Eigenvalue spread
The eigenvalue spread can be modeled in epigraph form
λ1 (A(x)) − λm (A(x)) ≤ t
by combining the two linear matrix inequalities in (4.10) and (4.11), i.e.,
Spectral radius
The spectral radius ρ(A(x)) := maxi |λi (A(x))| can be modeled in epigraph form ρ(A(x)) ≤ t
using two linear matrix inequalities
tI ± A(x) 0,
i.e., the epigraph is
K = {(x, t) ∈ Rn+1 | tI A(x) −tI}. (4.13)
µI A(x) µtI,
or equivalently if and only if I µ−1 A(x) tI. In other words, if we make a change of variables
z := x/µ, ν := 1/µ we can minimize the condition number as
minimize Ptm
subject to I νA0 + i=1 zi Ai tI,
and subsequently recover the solution x = νz. In essence, we first normalize the spectrum
by the smallest eigenvalue, and then minimize the largest eigenvalue of the normalized linear
matrix inequality.
39
Roots of the determininant
The determinant of a semidefinite matrix
m
Y
det(A(x)) = λi (A(x))
i=1
is neither convex or concave, but rational powers of the determinant can be modeled using linear
matrix inequalities. For a rational power 0 ≤ q ≤ 1/m we have that
t ≤ det(X)q
if and only if
A(x) Z
0 (4.14)
Z T Diag(Z)
(Z11 Z22 · · · Zmm )q ≥ t, (4.15)
where Z ∈ Rm×m is a lower-triangular matrix, and the inequality (4.15) can be modeled using
the formulations in Sec. 3.2.7; see [BN01, p. 150] for a proof. Similarly we can model negative
powers of the determinant, i.e., for any rational q > 0 we have
t ≥ det(X)−q
if and only if
A(x) Z
T 0 (4.16)
Z Diag(Z)
(Z11 Z22 · · · Zmm )−q ≥ t (4.17)
for a lower triangular Z.
40
Sum of singular values
The trace norm or the nuclear norm of A(x) is the dual of the `2 -norm. We can characterize it
as
kXk∗ = sup tr(X T Z). (4.20)
kZk2 ≤1
It turns out that the nuclear norm corresponds to the sum of the singular values,
m
X
kXk∗ = σi (X), (4.21)
i=1
= sup tr(ΣT U T ZV )
kU T ZV k2 ≤1
= sup tr(ΣT Y ),
kY k2 ≤1
maximize tr(X T Z)
I ZT
subject to 0,
Z I
A(x) = A0 + x1 A1 + · · · + xn An
B(y) = B0 + y1 B1 + · · · + yr Br
41
where Ai ∈ Rm×p , Bi ∈ S p , x ∈ Rn and y ∈ Rr . Then
if and only if
A(x)T
C
0.
A(x) B(y)
v(t) = 1, t, . . . , t2n .
It turns out that an equivalent characterization of {x | f (t) ≥ 0, ∀t} can be given in terms of a
semidefinite variable X,
n+1
xi = hX, Hi i, i = 0, . . . , 2n, X ∈ S+ . (4.25)
When there is no ambiguity, we drop the superscript on Hi . For example, for n = 2 we have
1 0 0 0 1 0 0 0 0
H0 = 0 0 0 , H1 = 1 0 0 , . . . H4 = 0 0 0 .
0 0 0 0 0 0 0 0 1
To verify that (4.23) and (4.25) are equivalent, we first note that
2n
X
u(t)u(t)T = Hi vi (t),
i=0
i.e.,
T
tn
1 1 1 t ...
t t t t2 ... tn+1
.. .. = .. .. .. .
..
. . . . . .
tn tn tn tn+1 . . . t2n
42
Assume next that f (t) ≥ 0. Then from (4.24) we have
f (t) = (q1T u(t))2 + (q2T u(t))2
= hq1 q1T + q2 q2T , u(t)u(t)T i
2n
X
= hq1 q1T + q2 q2T , Hi ivi (t),
i=0
i.e., we have f (t) = xT v(t) with xi = hX, Hi i, X = (q1 q1T + q2 q2T ) 0. Conversely, assume that
(4.25) holds. Then
2n
X 2n
X
f (t) = hHi , Xivi (t) = hX, Hi vi (t)i = hX, u(t)u(t)T i ≥ 0
i=0 i=0
since X 0. In summary, we can characterize the cone of nonnegative polynomials over the
real axis as
n n+1
K∞ = {x ∈ Rn+1 | xi = hX, Hi i, i = 0, . . . , 2n, X ∈ S+ }. (4.26)
Checking nonnegativity of a univariate polynomial thus corresponds to a semidefinite feasibility
problem.
43
4.2.6 Hermitian matrices
Semidefinite optimization can be extended to complex-valued matrices. To that end, let Hn
denote the cone of Hermitian matrices of order n, i.e.,
X ∈ Hn ⇐⇒ X ∈ Cn×n , XH = X
n if and only
where superscript ’H’ denotes Hermitian (or complex) transposition. Then X ∈ H+
if
In other words,
n <X −=X 2n
X∈ H+ ⇐⇒ ∈ S+ . (4.29)
=X <X
Note that (4.29) implies skew-symmetry of =X, i.e., =X = −=X T .
Xn
n
K0,π = {x ∈ R × Cn | x0 + 2<( xi z −i ) ≥ 0, ∀z = ejt , t ∈ [0, π].}.
i=1
v(z) = (1, z, . . . , z n ).
The Riesz-Fejer Theorem states that a trigonometric polynomial f (z) in (4.30) is nonnegative
(i.e., x ∈ K0,π ) if and only if for some q ∈ Cn+1
44
When there is no ambiguity, we drop the superscript on Ti . For example, for n = 2 we have
1 0 0 0 0 0 0 0 0
T0 = 0 1
0 , T1 = 1 0 0 , T2 = 0 0 0 .
0 0 1 0 1 0 1 0 0
i.e.,
H
z −1 z −n
1 1 1 ...
z z z 1 ... z 1−n
.. .. = .. .. .. .
..
. . . . . .
zn zn z n z n−1 . . . 1
Next assume that (4.31) is satisfied. Then
Nonnegativity on a subinterval
We next sketch a few useful extensions. An extension of the Riesz-Fejer Theorem states that a
trigonometric polynomial f (z) of degree n is nonnegative on I(a, b) = {z | z = ejt , t ∈ [a, b] ⊆
[0, π]} if and only if it can be written as a weighted sum of squared trigonometric polynomials
45
n+1 n , i.e.,
for X1 ∈ H+ , X2 ∈ H+
n
K0,α = {x ∈ R × Cn | xi = hX1 , Tin+1 i + hX2 , Ti+1
n n
i + hX2 , Ti−1 i
n+1
− 2 cos(α)hX2 , Tin i, X1 ∈ H+ n
, X2 ∈ H+ }. (4.35)
i.e.,
n
Kα,π = {x ∈ R × Cn | xi = hX1 , Tin+1 i + hX2 , Ti+1
n n
i + hX2 , Ti−1 i
n+1
+ 2 cos(α)hX2 , Tin i, X1 ∈ H+ n
, X2 ∈ H+ }. (4.36)
i.e., the projection of A onto the set S. To pose this as a conic optimization we define the linear
operator
√ √ √ √
svec(U ) = (U11 , 2U21 , . . . , 2Un1 , U22 , 2U32 , . . . , 2Un2 , . . . , Unn ),
which extracts and scales the lower-triangular part of U . We then get a conic formulation of
the nearest correlation problem exploiting symmetry of A − X,
minimize t
subject to kzk2 ≤t
svec(A − X) = z (4.37)
diag(X) =e
X 0.
This is an example of a problem with both conic quadratic and semidefinite constraints in primal
form, i.e., it matches the generic format (4.7).
We can add different constraints to the problem, for example a bound γ on the smallest
eigenvalue
X − γI 0.
A naive way to add the eigenvalue bound would be to introduce a new variable U ,
n
U = X − γI, U ∈ S+ ,
46
which would approximately double the number of variables and constraints in the model. Instead
we should just interpret U as a change of variables leading to a problem
minimize t
subject to kzk2 ≤t
svec(A − U − λI) = z
diag(U + λI) =e
U 0.
kCai k2 + aTi d ≤ bi , i = 1, . . . , m.
maximize det(C)1/n
subject to kCai k2 + aTi d ≤ bi i = 1, . . . , m
C 0.
maximize t
kCai k2 + aTi d ≤ bi , i = 1, . . . , m
subject to
C Z
0
Z T Diag(Z)
t ≤ (Z11 Z22 · · · Znn )1/n ,
where t ≤ (Z11 Z22 · · · Znn )1/n can be modeled as the intersection of rotated quadratic cones,
see Sec. 3.2.7.
47
Figure 4.2: Example of inner and outer ellipsoidal approximations.
S 0 = conv{x1 , x2 , . . . , xm }, xi ∈ Rn .
The ellipsoid
E 0 := {x | kP (x − c)k2 ≤ 1}
has Vol(E 0 ) ≈ det(P )−1/n , so the minimum-volume enclosing ellipsoid is the solution to
maximize t
subject to kP
(xi − c)k2 ≤ 1, i = 1, . . . , m
P Z
0
Z T Diag(Z)
t ≤ (Z11 Z22 · · · Znn )1/n .
In Fig. 4.2 we illustrate the inner and outer ellipsoidal approximations of a polytope character-
ized by 5 points in R2 .
f (t) = x0 + x1 t + x2 t2 + · · · + xn tn .
Often we wish to fit such a polynomial to a given set of measurements or control points
f (tj ) ≈ yj , j = 1, . . . , m.
48
To that end, define the Vandermonde matrix
1 t1 t21 . . . tn1
1 t2 t2 . . . tn2
2
A= . . .. .. .
.
. . . . .
2
1 tm tm . . . tnm
Ax ≈ y,
i.e., as a linear expression in the coefficients x. When the degree of the polynomial equals the
number measurements, n = m, the matrix A is square and non-singular (provided there are no
duplicate rows), so we can can solve
Ax = y
to find a polynomial that passes through all the control points (ti , yi ). Similarly, if n > m there
are infinitely many solutions satisfying the underdetermined system Ax = y. A typical choice
in that case is the least-norm solution
which (assuming again there are no duplicate rows) has a simply solution
On the other hand, if n < m we generally cannot find a solution to the overdetermined system
Ax = y, and we typically resort to a least-squares solution
f (t) := x0 + x1 t + · · · + xn tn ≥ 0, ∀t ∈ [a, b]
n , see (4.27).
with a semidefinite characterization embedded in x ∈ Ka,b
• Monotonicity. We can ensure monotonicity of f (t) by requiring that f 0 (t) ≥ 0 (or f 0 (t) ≤
0), i.e.,
n−1
(x1 , 2x2 , . . . , nxn ) ∈ Ka,b ,
or
n−1
−(x1 , 2x2 , . . . , nxn ) ∈ Ka,b ,
respectively.
49
3
f2 (t)
2
1
2 f4 (t)
f8 (t)
−2 −1 0 1 2 t
Figure 4.3: Graph of univariate polynomials of degree 2, 4, and 8, respectively, passing through
{(−1, 1), (0, 0), (1, 1)}. The higher-degree polynomials are increasingly smoother on [−1, 1].
fn (t) = x0 + x1 t + · · · + xn tn
to the points {(−1, 1), (0, 0), (1, 1)}, where smoothness is implied by bounding |fn0 (t)|. More
specifically, we wish to solve the problem
minimize z
subject to |fn0 (t)| ≤ z, ∀t ∈ [−1, 1]
fn (−1) = 1, fn (0) = 0, fn (1) = 1,
or equivalently

    minimize     z
    subject to   z − fₙ′(t) ≥ 0,   ∀t ∈ [−1, 1]
                 z + fₙ′(t) ≥ 0,   ∀t ∈ [−1, 1]
                 fₙ(−1) = 1,  fₙ(0) = 0,  fₙ(1) = 1.

Finally, we use the characterizations K^n_{a,b} to get a conic problem

    minimize     z
    subject to   (z − x₁, −2x₂, …, −nxₙ) ∈ K^{n−1}_{−1,1}
                 (z + x₁, 2x₂, …, nxₙ) ∈ K^{n−1}_{−1,1}
                 fₙ(−1) = 1,  fₙ(0) = 0,  fₙ(1) = 1.
In Fig. 4.3 we show the graphs of the resulting polynomials of degree 2, 4 and 8, respectively. The second degree polynomial is uniquely determined by the three constraints f₂(−1) = 1,
Figure 4.4: Graph of univariate polynomials of degree 2, 4, and 8, respectively, passing through
{(−1, 1), (0, 0), (1, 1)}. The polynomials all have positive second derivative (i.e., they are convex)
on [−1, 1] and the higher-degree polynomials are increasingly smoother on that interval.
f₂(0) = 0, f₂(1) = 1, i.e., f₂(t) = t². Also, we obviously have a lower bound on the largest derivative, maxₜ∈[−1,1] |fₙ′(t)| ≥ 1. The computed fourth degree polynomial is given by

    f₄(t) = (3/2)t² − (1/2)t⁴

after rounding coefficients to rational numbers. Furthermore, the largest derivative is given by

    f₄′(1/√2) = √2,

and f₄″(t) < 0 on (1/√2, 1], so, although not visually apparent, the polynomial is nonconvex on [−1, 1].
In Fig. 4.4 we show the graphs of the corresponding polynomials where we added a convexity constraint fₙ″(t) ≥ 0, i.e.,

    (2x₂, 6x₃, …, (n − 1)nxₙ) ∈ K^{n−2}_{−1,1}.

In this case, we get

    f₄(t) = (6/5)t² − (1/5)t⁴,

and the largest derivative increases to 8/5.
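A quick numerical check (NumPy, hypothetical evaluation grid) confirms the two derivative bounds quoted above.

```python
import numpy as np

t = np.linspace(-1.0, 1.0, 100001)

# Smoothest degree-4 fit: f4(t) = (3/2) t^2 - (1/2) t^4.
d_smooth = 3.0 * t - 2.0 * t**3              # f4'(t)
print(np.abs(d_smooth).max(), np.sqrt(2))    # both ~1.41421

# Convexity-constrained fit: f4(t) = (6/5) t^2 - (1/5) t^4.
d_convex = 2.4 * t - 0.8 * t**3              # f4'(t)
print(np.abs(d_convex).max(), 8 / 5)         # both ~1.6
```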
[Figure: lowpass filter specification, showing H(ω) within [1 − δ, 1 + δ] on the passband [0, ωₚ] and below a level t⋆ on the stopband [ωₛ, π].]
We often wish a transfer function where H(ω) ≈ 1 for 0 ≤ ω ≤ ωₚ and H(ω) ≈ 0 for ωₛ ≤ ω ≤ π, for given constants ωₚ, ωₛ. One possible formulation for achieving this is

    minimize     t
    subject to   0 ≤ H(ω),                ∀ω ∈ [0, π]
                 1 − δ ≤ H(ω) ≤ 1 + δ,    ∀ω ∈ [0, ωₚ]
                 H(ω) ≤ t,                ∀ω ∈ [ωₛ, π],

which corresponds to minimizing H(ω) on the interval [ωₛ, π] while allowing H(ω) to depart from unity by a small amount δ on the interval [0, ωₚ]. Using the results from Sec. 4.2.7 (in particular (4.33), (4.35) and (4.36)), we can pose this as a conic optimization problem
    minimize     t
    subject to   x ∈ K^n_{0,π}
                 (x₀ − (1 − δ), x_{1:n}) ∈ K^n_{0,ωₚ}
                 −(x₀ − (1 + δ), x_{1:n}) ∈ K^n_{0,ωₚ}        (4.38)
                 −(x₀ − t, x_{1:n}) ∈ K^n_{ωₛ,π},
which is a semidefinite optimization problem. In Fig. 4.5 we show H(ω) obtained by solving
(4.38) for n = 10, δ = 0.05, ωp = π/4 and ωs = ωp + π/8.
Consider a binary linear optimization problem

    minimize     cᵀx
    subject to   Ax = b                               (4.39)
                 x ∈ {0, 1}ⁿ.

In general, problem (4.39) is a very difficult non-convex problem, where in the worst case we have to examine all 2ⁿ binary vectors. Alternatively, we can use semidefinite optimization to get a lower bound on the optimal solution with polynomial complexity. We first note that

    xᵢ ∈ {0, 1}   ⟺   xᵢ² = xᵢ,
which is, in fact, equivalent to a rank constraint on a semidefinite variable,

    X = xxᵀ,   diag(X) = x.

By relaxing the intractable constraint X = xxᵀ to X ⪰ xxᵀ we obtain a semidefinite relaxation,

    minimize     cᵀx
    subject to   Ax = b
                 diag(X) = x                          (4.40)
                 X ⪰ xxᵀ.
From Schur's Lemma (Lemma 4.1) the constraint X ⪰ xxᵀ is equivalent to a linear matrix inequality in (x, X), so (4.40) is a standard semidefinite optimization problem. Its dual is

    maximize     −yᵀb − t
    subject to   [ t                      (1/2)(c + Aᵀy − z)ᵀ ]
                 [ (1/2)(c + Aᵀy − z)     Diag(z)             ]  ⪰ 0.       (4.41)
This dual problem gives us a lower bound. To obtain an explicit relaxation of (4.39) we derive the dual once more, but this time of problem (4.41). The Lagrangian function of (4.41) is

    L(t, y, z) = −yᵀb − t + ⟨ [ ν  xᵀ ; x  X ] , [ t  (1/2)(c + Aᵀy − z)ᵀ ; (1/2)(c + Aᵀy − z)  Diag(z) ] ⟩,

which is only bounded above if ν = 1, Ax = b, diag(X) = x, i.e., we get the dual problem (4.40), which is called a Lagrangian relaxation. From strong duality, problem (4.40) has the same optimal value as problem (4.41).
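The sketch below sets up the relaxation (4.40) in CVXPY using the Schur-complement form of X ⪰ xxᵀ; the data are hypothetical and an SDP-capable solver is assumed. Its optimal value is a lower bound on (4.39).

```python
import numpy as np
import cvxpy as cp

# Hypothetical data for  min c'x  s.t.  Ax = b,  x in {0,1}^n.
n = 6
rng = np.random.default_rng(1)
c = rng.standard_normal(n)
A = np.ones((1, n))
b = np.array([3.0])                  # pick exactly three items

x = cp.Variable(n)
X = cp.Variable((n, n), symmetric=True)

# Schur-complement form of X >= x x^T.
M = cp.bmat([[np.array([[1.0]]), cp.reshape(x, (1, n))],
             [cp.reshape(x, (n, 1)), X]])

constraints = [A @ x == b,
               cp.diag(X) == x,      # relaxation of x_i^2 = x_i
               M >> 0,
               x >= 0, x <= 1]       # optional tightening constraints
prob = cp.Problem(cp.Minimize(c @ x), constraints)
prob.solve()
print("SDP lower bound:", prob.value)
```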
[Figure: example graph with vertices v₀, v₁, …, v₆.]
We can tighten (or improve) the relaxation (4.40) by adding other constraints that cut away part of the feasible set, without excluding rank-one solutions. By tightening the relaxation, we reduce the duality gap between the optimal values of the original problem and the relaxation.
For example, we can add the constraints
0 ≤ xi ≤ 1, i = 1, . . . , n
and
0 ≤ Xij ≤ 1, i = 1, . . . , n, j = 1, . . . , n.
A semidefinite matrix X with Xij ≥ 0 is called a doubly nonnegative matrix. In practice,
constraining a semidefinite matrix to be doubly nonnegative has a dramatic impact on the
solution time and memory requirements of an optimization problem, since we add n² linear
inequality constraints. We refer to [LR05] for a recent survey on semidefinite relaxations for
integer programming.
where n = |V|, and let

    xᵢ = +1 if vᵢ ∈ S,   xᵢ = −1 if vᵢ ∉ S.

Suppose vᵢ ∈ S. Then 1 − xᵢxⱼ = 0 if vⱼ ∈ S and 1 − xᵢxⱼ = 2 if vⱼ ∉ S, so we get an expression for the capacity of the cut as

    c(x) = (1/4) Σᵢⱼ (1 − xᵢxⱼ)Aᵢⱼ = (1/4)eᵀAe − (1/4)xᵀAx,
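As a sketch, the corresponding semidefinite relaxation of the maximum-cut problem, obtained by replacing xxᵀ with a variable X ⪰ 0 with diag(X) = e, can be written in CVXPY as follows (hypothetical small graph; SDP solver assumed).

```python
import numpy as np
import cvxpy as cp

# Hypothetical weighted adjacency matrix of a small graph (symmetric, zero diagonal).
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 1],
              [1, 1, 0, 1],
              [0, 1, 1, 0]], dtype=float)
n = A.shape[0]
e = np.ones(n)

# Relax x x^T (with x in {-1,+1}^n) to X >= 0 with unit diagonal:
# maximize (1/4) e'Ae - (1/4) <A, X>.
X = cp.Variable((n, n), symmetric=True)
prob = cp.Problem(cp.Maximize(0.25 * e @ A @ e - 0.25 * cp.trace(A @ X)),
                  [cp.diag(X) == 1, X >> 0])
prob.solve()
print("upper bound on the maximum cut capacity:", prob.value)
```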
A linear matrix inequality in dual form,

    A₀ + x₁A₁ + ⋯ + xₙAₙ ⪰ 0,

can be converted into primal form by introducing a slack variable,

    A₀ + x₁A₁ + ⋯ + xₙAₙ = S,   S ⪰ 0.                        (4.45)

Apart from introducing an additional semidefinite variable S ∈ S₊ᵐ, we also add m(m + 1)/2 equality constraints. On the other hand, a semidefinite variable X ∈ S₊ⁿ in a primal-form problem can be rewritten as a linear matrix inequality in n(n + 1)/2 scalar variables,

    X = Σᵢ eᵢeᵢᵀ xᵢᵢ + Σᵢ<ⱼ (eᵢeⱼᵀ + eⱼeᵢᵀ) xᵢⱼ ⪰ 0.           (4.46)

Such conversions are often necessary, but they can make the computational cost of the resulting problem prohibitively high.
Obviously we should only use the transformations (4.45) and (4.46) when necessary; if we have a problem that is more naturally interpreted in either primal or dual form, we should be careful to recognize that structure. For example, consider a problem

    minimize     eᵀz
    subject to   A + Diag(z) = X
                 X ⪰ 0,

with the variables X ∈ S₊ⁿ and z ∈ Rⁿ. This is a problem in primal form with n(n + 1)/2 equality constraints, but it is more naturally interpreted as a linear matrix inequality

    A + Σᵢ eᵢeᵢᵀ zᵢ ⪰ 0.
4.4 Bibliography
Much of the material in this chapter is based on the paper [VB96] and the books [BN01,
BKVH04]. The section on optimization over nonnegative polynomials is based on [Nes99,
Hac03]. We refer to [LR05] for a comprehensive survey on semidefinite optimization and relax-
ations in combinatorial optimization.
Chapter 5
Quadratic optimization
5.1 Introduction
In this chapter we discuss convex quadratic optimization. Our discussion is fairly brief compared to the previous chapters for three reasons: (i) quadratic optimization is a special case of conic quadratic optimization, (ii) for most problems it is actually more efficient for the solver when the problem is posed in conic form, and (iii) duality theory (including infeasibility certificates) is much simpler for conic quadratic optimization. Therefore, we generally recommend a conic quadratic formulation.
The standard form of a convex quadratic optimization problem is

    minimize     (1/2)xᵀQx + cᵀx
    subject to   Ax = b                               (5.1)
                 x ≥ 0,

i.e., a linear optimization problem with an additional quadratic term xᵀQx in the objective. Note the important requirement that Q is symmetric positive semidefinite; otherwise the objective function is not convex. As an example, consider the quadratic term given by

    Q = [ 1    1/2 ]
        [ 1/2  1   ].

There are three qualitatively different possibilities for the solution:
• The optimal solution x⋆ is at the boundary of the polyhedron (as shown in Fig. 5.1). At x⋆ one of the hyperplanes is tangential to an ellipsoidal level curve.

• The optimal solution is inside the polyhedron; this occurs if the unconstrained minimizer argminₓ (1/2)xᵀQx + cᵀx = −Q†c (i.e., the center of the ellipsoidal level curves) is inside the polyhedron.
Figure 5.1: Geometric interpretation of quadratic optimization. At the optimal point x⋆ the hyperplane {x | a₁ᵀx = b₁} is tangential to an ellipsoidal level curve.
• If the polyhedron is unbounded in the opposite direction of c, and if the ellipsoidal level curves are degenerate in that direction (i.e., Qc = 0), then the problem is unbounded. If Q ∈ S₊₊ⁿ, then the problem cannot be unbounded.

Possibly because of its simple geometric interpretation and similarity to linear optimization, quadratic optimization has been more widely adopted by optimization practitioners than conic quadratic optimization.
with Lagrange multipliers s ∈ Rⁿ₊, and from ∇ₓL(x, y, s) = 0 we get the necessary first-order optimality condition

    Qx = Aᵀy + s − c,

i.e., (Aᵀy + s − c) ∈ R(Q).
Alternatively, we can characterize the constraint (Aᵀy + s − c) ∈ R(Q) as Qw = Aᵀy + s − c with an extraneous variable w, to get an equivalent dual problem

    maximize     bᵀy − (1/2)wᵀQw
    subject to   Qw = Aᵀy − c + s                     (5.4)
                 s ≥ 0.

Note from the optimality conditions that w = x, so (5.4) is an unusual dual problem in the sense that it involves both primal and dual variables.

Weak duality follows from the inner product between x and the dual equality constraint,

    xᵀQx = bᵀy − cᵀx + xᵀs,

which shows that

    ((1/2)xᵀQx + cᵀx) − (bᵀy − (1/2)xᵀQx) = xᵀs ≥ 0.
Furthermore, strong duality holds if a Slater constraint qualification is satisfied, i.e., if either
the primal or dual problem is strictly feasible (as in the previous chapters).
intersection of ellipsoids (or affine halfspaces if some of the Qᵢ are zero). Note the important requirement Qᵢ ⪰ 0 for all i, so that the objective function is convex and the constraints

    (1/2)xᵀQᵢx + cᵢᵀx + rᵢ ≤ 0

characterize convex sets. It is important to note that neither quadratic equalities

    (1/2)xᵀQᵢx + cᵢᵀx + rᵢ = 0

nor inequalities of the form

    (1/2)xᵀQᵢx + cᵢᵀx + rᵢ ≥ 0

characterize convex sets, and therefore they cannot be included.
5.3.1 Duality in quadratically constrained optimization
The Lagrangian function for (5.5) is

    L(x, λ) = (1/2)xᵀQ₀x + c₀ᵀx + r₀ + Σᵢ₌₁ᵐ λᵢ((1/2)xᵀQᵢx + cᵢᵀx + rᵢ)
            = (1/2)xᵀQ(λ)x + c(λ)ᵀx + r(λ),

where

    Q(λ) = Q₀ + Σᵢ₌₁ᵐ λᵢQᵢ,   c(λ) = c₀ + Σᵢ₌₁ᵐ λᵢcᵢ,   r(λ) = r₀ + Σᵢ₌₁ᵐ λᵢrᵢ.
or equivalently

    maximize     −(1/2)wᵀQ(λ)w + r(λ)
    subject to   Q(λ)w = −c(λ)                        (5.8)
                 λ ≥ 0.
Weak duality is easily verified; from (5.6) we have

    xᵀQ(λ)x + c(λ)ᵀx = 0,

which implies that the dual objective bounds the primal objective from below, and strong duality holds provided the quadratic inequalities are strictly feasible, i.e.,

    (1/2)xᵀQᵢx + cᵢᵀx + rᵢ < 0,   i = 1, …, m,

for some x ∈ Rⁿ. Using a general version of Lemma 4.1 for singular matrices (see, e.g., [BV04, p. 651]), we can also write (5.7) as an equivalent semidefinite optimization problem,
    maximize     t
    subject to   [ 2(r(λ) − t)   c(λ)ᵀ ]
                 [ c(λ)          Q(λ)  ]  ⪰ 0         (5.9)
                 λ ≥ 0.
5.4 Separable quadratic optimization
Often a factorization Q = FᵀF is either known or readily available, in which case we get an alternative formulation of (5.1) as

    minimize     (1/2)zᵀz + cᵀx
    subject to   Ax = b
                 Fx = z                               (5.10)
                 x ≥ 0.

The formulation (5.10) has several advantages: convexity of the problem is obvious (which can occasionally be difficult to detect in finite-precision arithmetic), and the structure and sparsity of the separable reformulation are typically handled more efficiently by an optimization solver.
In the following example we consider a related reformulation, which can result in a significant
reduction of complexity and solution time (see also Sec. 3.3.3).
Assume, for example, that Q has the structure

    Q = Diag(w) + FᵀF,   w > 0,
which is very similar to the conic formulation (3.26). If the formulation (5.11) is sparse compared
to (5.5) (i.e., if it is described using fewer non-zeros), then it generally results in a reduction
of computation time. On the other hand, if we form (5.11), we might as well use the conic
formulation (3.26) instead.
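The following CVXPY sketch (hypothetical data) compares the direct quadratic objective with the separable reformulation (5.10); the two problems return the same optimal value.

```python
import numpy as np
import cvxpy as cp

# Hypothetical data with a known factorization Q = F'F.
rng = np.random.default_rng(2)
k, n, m = 3, 8, 2
F = rng.standard_normal((k, n))
Q = F.T @ F
c = rng.standard_normal(n)
A = rng.standard_normal((m, n))
b = A @ np.abs(rng.standard_normal(n))     # guarantees a feasible x >= 0 exists

# Direct formulation (5.1): min (1/2) x'Qx + c'x.
x1 = cp.Variable(n)
direct = cp.Problem(cp.Minimize(0.5 * cp.quad_form(x1, Q) + c @ x1),
                    [A @ x1 == b, x1 >= 0])
direct.solve()

# Separable formulation (5.10): min (1/2) z'z + c'x  s.t.  Fx = z.
x2, z = cp.Variable(n), cp.Variable(k)
separable = cp.Problem(cp.Minimize(0.5 * cp.sum_squares(z) + c @ x2),
                       [A @ x2 == b, F @ x2 == z, x2 >= 0])
separable.solve()

print(direct.value, separable.value)       # the two optimal values agree
```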
Chapter 6

Mixed integer optimization

6.1 Introduction
In the previous chapters we have considered different classes of convex problems with continuous
variables. In this chapter we consider a much wider range of problems by allowing integer
variables. In particular, we consider conic and convex quadratic optimization problems where
some of the variables are constrained to be integer valued. In principle, this allows us to model
a vastly larger range of problems, but in practice we may not be able to solve the resulting integer
optimization problems within a reasonable amount of time.
Suppose x is a continuous variable satisfying 0 ≤ x < M for a known upper bound M, and we wish to model the implication

    (x > 0) → (z = 1),                                (6.1)

where z ∈ {0, 1} is an indicator variable. Since 0 ≤ x < M we can rewrite (6.1) as a linear inequality

    x ≤ Mz,                                           (6.2)

where z ∈ {0, 1}. From the characterization (6.2) we see that (6.1) is equivalent to

    (z = 0) → (x = 0),

which is the well-known property of contraposition. In Sec. 6.3 we discuss such standard boolean expressions in more detail.
Example 6.1. Assume that production of a specific item i has a production cost of uᵢ per unit produced with an additional fixed charge of wᵢ if we produce item i, i.e., a discontinuous production cost

    cᵢ(xᵢ) = { wᵢ + uᵢxᵢ,   xᵢ > 0,
               0,           xᵢ = 0,
[Figure 6.1: total production cost as a function of the quantity produced, with fixed charge wᵢ and slope uᵢ.]
shown in Fig. 6.1. If we let M denote an upper bound on the quantities we can produce, we get an alternative expression for the production cost,

    cᵢ(xᵢ) = uᵢxᵢ + wᵢzᵢ,   xᵢ ≤ Mzᵢ,   zᵢ ∈ {0, 1},

where zᵢ is a binary variable indicating whether item i is produced or not. We can then minimize the total production cost under an affine constraint Ax = b as
minimize uT x + wT z
subject to Ax = b
xi ≤ M zi , i = 1, . . . , n
x≥0
z ∈ {0, 1}n ,
which is a linear mixed-integer optimization problem. Note that by minimizing the produc-
tion cost, we rule out the possibility that zi = 1 and xi = 0.
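As a sketch of Example 6.1, the fixed-charge model can be solved with SciPy's mixed-integer linear solver; the data below are hypothetical and scipy.optimize.milp (SciPy ≥ 1.9) is assumed to be available.

```python
import numpy as np
from scipy.optimize import milp, LinearConstraint, Bounds

# Hypothetical data: n items, unit costs u, fixed charges w, resource constraint Ax = b.
n = 4
u = np.array([1.0, 2.0, 1.5, 3.0])      # unit production costs
w = np.array([5.0, 1.0, 4.0, 2.0])      # fixed charges
A = np.ones((1, n))
b = np.array([10.0])
M = 10.0                                # upper bound on any x_i

# Stack the variables as v = (x, z); minimize u'x + w'z.
c = np.concatenate([u, w])
eq = LinearConstraint(np.hstack([A, np.zeros((1, n))]), b, b)        # A x = b
bigM = LinearConstraint(np.hstack([np.eye(n), -M * np.eye(n)]),
                        -np.inf, 0.0)                                # x_i <= M z_i
bounds = Bounds(np.zeros(2 * n), np.concatenate([np.full(n, np.inf), np.ones(n)]))
integrality = np.concatenate([np.zeros(n), np.ones(n)])              # z binary

res = milp(c, constraints=[eq, bigM], bounds=bounds, integrality=integrality)
x, z = res.x[:n], res.x[n:]
print("produce:", np.round(x, 3), "indicators:", np.round(z))
```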
On the other hand, suppose x satisfies 0 < m ≤ x for a known lower bound m. We can then
model the implication
(z = 1) → (x > 0), (6.3)
(which is the reverse implication of (6.1)) as a linear inequality
x ≥ mz, (6.4)
where z ∈ {0, 1}.
6.2.3 Constraint satisfaction
Suppose we have an upper bound M on the affine expression aT x − b where x ∈ Rn . We can
then model the implication (z = 1) → (aT x ≤ b) as
aT x ≤ b + M (1 − z) (6.7)
where z ∈ {0, 1}, i.e., z indicates whether or not the constraint aT x ≤ b is satisfied. To
model the reverse implication (aT x ≤ b) → (z = 1) we equivalently consider the contraposition
(z = 0) → (aT x > b), which we can rewrite as
aT x > b + mz (6.8)
where m < aᵀx − b is a lower bound. In practice, we relax the strict inequality using a small amount of slack ε > 0, i.e.,

    aᵀx ≥ b + (m − ε)z + ε.                           (6.9)
Collectively, we thus model (aᵀx ≤ b) ↔ (z = 1) as

    aᵀx ≤ b + M(1 − z),   aᵀx ≥ b + (m − ε)z + ε,     (6.10)

and, similarly, we can model (z = 1) → (aᵀx ≥ b) as

    aᵀx ≥ b + m(1 − z),                               (6.11)

using a lower bound m < aᵀx − b and an upper bound M > aᵀx − b. We can combine (6.10) and (6.11) to model indication of equality constraints aᵀx = b. To that end, we note that
(z = 1) → (aᵀx = b) is equivalent to both (6.7) and (6.11), i.e.,

    aᵀx ≤ b + M(1 − z),   aᵀx ≥ b + m(1 − z).         (6.12)

On the other hand, (z = 0) → (aᵀx ≠ b) (or equivalently (z = 0) → (aᵀx > b) ∨ (aᵀx < b)) can be written using (6.9) and (6.12) as

    aᵀx ≥ b + (m − ε)z₁ + ε,   aᵀx ≤ b + (M + ε)z₂ − ε,   z₁ + z₂ − z ≤ 1,
To that end, let M > aⱼᵀx − bⱼ, ∀j, be a collective upper bound. Then we can model the disjunction

    (a₁ᵀx ≤ b₁) ∨ (a₂ᵀx ≤ b₂) ∨ ⋯ ∨ (aₖᵀx ≤ bₖ)

as

    z = z₁ + ⋯ + zₖ ≥ 1,   aⱼᵀx ≤ bⱼ + M(1 − zⱼ)  ∀j,          (6.15)
where zⱼ ∈ {0, 1} are binary variables. To characterize the reverse implication (a₁ᵀx ≤ b₁) ∨ (a₂ᵀx ≤ b₂) ∨ ⋯ ∨ (aₖᵀx ≤ bₖ) → (z = 1) we consider the contrapositive

    (z = 0) → (a₁ᵀx > b₁) ∧ (a₂ᵀx > b₂) ∧ ⋯ ∧ (aₖᵀx > bₖ),     (6.16)

which we can write as

    aⱼᵀx ≥ bⱼ + (m − ε)z + ε,   j = 1, …, k,                   (6.17)

for a lower bound m < aⱼᵀx − bⱼ, ∀j.
The condition that only two adjacent variables can be nonzero is sometimes called an SOS2 constraint. If we introduce indicator variables zᵢ for each pair of adjacent variables (λᵢ, λᵢ₊₁), we can model it as

    λ₁ ≤ z₁,  λ₂ ≤ z₁ + z₂,  λ₃ ≤ z₂ + z₃,  λ₄ ≤ z₃ + z₄,  λ₅ ≤ z₄,
    z₁ + z₂ + z₃ + z₄ = 1,   z ∈ B⁴,
which satisfies (zⱼ = 1) → λᵢ = 0, i ∉ {j, j + 1}. Collectively, we can then model the epigraph f(x) ≤ t as

    x = Σⱼ₌₁ⁿ λⱼαⱼ,   Σⱼ₌₁ⁿ λⱼ f(αⱼ) ≤ t,
    λ₁ ≤ z₁,   λⱼ ≤ zⱼ₋₁ + zⱼ,  j = 2, …, n − 1,   λₙ ≤ zₙ₋₁,          (6.20)
    λ ≥ 0,   Σⱼ₌₁ⁿ λⱼ = 1,   Σⱼ₌₁ⁿ⁻¹ zⱼ = 1,   z ∈ Bⁿ⁻¹,

for a piecewise-linear function f(x) with n terms. This approach is often called the lambda-method. For the function shown in Fig. 6.2 we can reduce the number of integer variables by using a Gray encoding
[Figure 6.2: univariate piecewise-linear function with breakpoints α₁, …, α₅; the four intervals are Gray-encoded as 00, 10, 11, 01.]
of the intervals [αⱼ, αⱼ₊₁] and an indicator variable y ∈ B² to represent the four different values of the Gray code. We can then describe the constraints on λ using only two indicator variables,

    (y₁ = 0) → λ₃ = 0,
    (y₁ = 1) → λ₁ = λ₅ = 0,
    (y₂ = 0) → λ₄ = λ₅ = 0,
    (y₂ = 1) → λ₁ = λ₂ = 0,

i.e.,

    x = Σⱼ₌₁⁵ λⱼαⱼ,   Σⱼ₌₁⁵ λⱼ f(αⱼ) ≤ t,
    λ₃ ≤ y₁,   λ₁ + λ₅ ≤ 1 − y₁,   λ₄ + λ₅ ≤ y₂,   λ₁ + λ₂ ≤ 1 − y₂,
    λ ≥ 0,   Σⱼ₌₁⁵ λⱼ = 1,   y ∈ B².
The lambda-method can also be used to model multivariate continuous piecewise-linear nonconvex functions, specified on a set of polyhedra Pₖ. For example, for the function shown in Fig. 6.3 we can model the epigraph f(x) ≤ t as

    x = Σᵢ₌₁⁶ λᵢvᵢ,   Σᵢ₌₁⁶ λᵢ f(vᵢ) ≤ t,
    λ₁ ≤ z₁ + z₂,   λ₂ ≤ z₁,   λ₃ ≤ z₂ + z₃,
    λ₄ ≤ z₁ + z₂ + z₃ + z₄,   λ₅ ≤ z₃ + z₄,   λ₆ ≤ z₄,                 (6.21)
    λ ≥ 0,   Σᵢ₌₁⁶ λᵢ = 1,   Σᵢ₌₁⁴ zᵢ = 1,   z ∈ B⁴.
[Figure: univariate piecewise-linear function with breakpoints α₁, α₂, α₃.]

In (6.21) we use four binary decision variables for the different polyhedra. If we use a decision variable y ∈ B² with a Gray
encoding
(P1 , P2 , P3 , P4 ) → (00, 10, 11, 01),
we have
    (y₁ = 0) → λ₃ = 0,
    (y₁ = 1) → λ₂ = λ₆ = 0,
    (y₂ = 0) → λ₅ = λ₆ = 0,
    (y₂ = 1) → λ₁ = λ₂ = 0,
which gives the characterization of the epigraph

    x = Σᵢ₌₁⁶ λᵢvᵢ,   Σᵢ₌₁⁶ λᵢ f(vᵢ) ≤ t,
    λ₃ ≤ y₁,   λ₂ + λ₆ ≤ 1 − y₁,   λ₅ + λ₆ ≤ y₂,   λ₁ + λ₂ ≤ 1 − y₂,    (6.22)
    λ ≥ 0,   Σᵢ₌₁⁶ λᵢ = 1,   y ∈ B².
For multidimensional piecewise linear functions, a reduction of the number of binary decision
variables is only possible when the polyhedral regions have a special topology, for example a
“Union Jack” arrangement as shown in Fig. 6.3. We refer to the recent survey [VAN10] for
further references and comparisons between different mixed-integer formulations for piecewise
linear functions.
The lambda-method can also handle discontinuous piecewise-linear functions. If f₋(α₂) and f₊(α₂) denote the left and right limits of f at a discontinuity α₂, the epigraph f(x) ≤ t can be modeled as

    x = λ₁α₁ + (λ₂ + λ₃ + λ₄)α₂ + λ₅α₃,
    λ₁f(α₁) + λ₂f₋(α₂) + λ₃f(α₂) + λ₄f₊(α₂) + λ₅f(α₃) ≤ t,
    λ₁ + λ₂ ≤ z₁,   λ₃ ≤ z₂,   λ₄ + λ₅ ≤ z₃,                           (6.23)
    λ ≥ 0,   Σᵢ₌₁⁵ λᵢ = 1,   Σᵢ₌₁³ zᵢ = 1,   z ∈ B³,

where we use a different decision variable for the intervals [α₁, α₂), [α₂, α₂], and (α₂, α₃]. As a special case this gives us an alternative characterization of the fixed-charge models considered in Sec. 6.2.1.
6.3 Boolean primitives
In the previous sections we used binary decision variables to indicate whether or not a real-valued affine expression is positive. In general, we can express a multitude of boolean expressions as linear inequality constraints involving integer variables. The basic operators from boolean algebra are complement, conjunction and disjunction, which are used to construct more complicated boolean expressions. In the following we let x and y denote binary variables with values {0, 1}.
6.3.1 Complement
Logical complement ¬x is defined by the following truth table.
x ¬x
0 1
1 0
Thus in terms of an affine relationship,
¬x = 1 − x. (6.24)
6.3.2 Conjunction
Logical conjunction (x ∧ y) is defined by the following truth table.
x y x∧y
0 0 0
1 0 0
0 1 0
1 1 1
Thus z = (x ∧ y) is equivalent to
z + 1 ≥ x + y, x ≥ z, y ≥ z. (6.25)
6.3.3 Disjunction
Logical disjunction (x ∨ y) is defined by the following truth table.
x y x∨y
0 0 0
1 0 1
0 1 1
1 1 1
Thus (x ∨ y) is equivalent to
x + y ≥ 1. (6.26)
The following table shows basic properties that are easily verified using the truth tables for
complement, conjunction and disjunction.
Associativity x ∨ (y ∨ z) = (x ∨ y) ∨ z x ∧ (y ∧ z) = (x ∧ y) ∧ z
Commutativity x∨y =y∨x x∧y =y∧x
Distributivity x ∨ (y ∧ z) = (x ∨ y) ∧ (x ∨ z) x ∧ (y ∨ z) = (x ∧ y) ∨ (x ∧ z)
De Morgan ¬(x ∧ y) = ¬x ∨ ¬y ¬(x ∨ y) = ¬x ∧ ¬y
The last row in the table shows the properties called De Morgan's laws, which show that conjunction and disjunction are, in a sense, dual operators.
6.3.4 Implication
Logical implication (x → y) is defined by the following truth table.
x y x→y
0 0 1
1 0 0
0 1 1
1 1 1
It is easy to verify that (x → y) is equivalent to both (¬x ∨ y) and
¬y → ¬x. (6.27)
In terms of indicator variables we have that x → y is equivalent to
x ≤ y. (6.28)
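A short brute-force check (plain Python) confirms that the linear encodings (6.24)–(6.26) and (6.28) reproduce the truth tables above.

```python
from itertools import product

# Check the linear encodings of NOT, AND, OR and IMPLIES by enumeration.
for x, y in product((0, 1), repeat=2):
    # (6.24): complement.
    assert (1 - x) == (not x)
    # (6.25): z = x AND y is the only z in {0,1} with z + 1 >= x + y, x >= z, y >= z.
    feas_and = [z for z in (0, 1) if z + 1 >= x + y and x >= z and y >= z]
    assert feas_and == [int(x and y)]
    # (6.26): x OR y holds iff x + y >= 1.
    assert (x + y >= 1) == bool(x or y)
    # (6.28): x -> y holds iff x <= y.
    assert (x <= y) == bool((not x) or y)

print("all truth tables verified")
```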
or equivalently
¬x → (u = 0), ¬y → (v = 0).
We then model (u = 0) ∨ (v = 0) as (¬x ∨ ¬y), which can be written as
u ≤ M x, v ≤ M y, (1 − x) + (1 − y) ≥ 1.
6.5 Bibliography
This chapter is based on the books [NW88, Wil93]. Modeling of piecewise linear functions is
described in the survey paper [VAN10].
Chapter 7
Practical optimization
7.1 Introduction
In this chapter we discuss different practical aspects of creating optimization models.
As an example of the effects of finite-precision arithmetic, consider the simple problem

    minimize     −x₂
    subject to   x₁ + x₂ ≤ 2,
                 x₂ ≤ √2,                             (7.1)
                 x₁, x₂ ≥ 0.

Suppose an optimizer reports the approximate solution

    x₁ = 0.0000000000000000,   x₂ = 1.4142135623730951.
The true objective value is −√2, so the approximate objective value is wrong by the amount

    1.4142135623730951 − √2.
Most likely this difference is irrelevant for all practical purposes. Nevertheless, in general a solution obtained using floating point arithmetic is only an approximation. Most (if not all) commercial optimization software uses double precision floating point arithmetic, implying that about 16 digits are correct in the computations performed internally by the software. This also means that irrational numbers such as √2 and π can only be stored accurately to within 16 digits.

A good practice after solving an optimization problem is to evaluate whether the reported solution is indeed optimal; at the very least this process should be carried out during the initial phase of building a model, or if the reported solution is unexpected in some way. The
first step in that process is to check that it is feasible; in the case of the small example (7.1) this amounts to verifying that

    x₁ + x₂ = 0.0000000000000000 + 1.4142135623730951 ≤ 2,
    x₂ = 1.4142135623730951 ≤ √2,
    x₁ = 0.0000000000000000 ≥ 0,
    x₂ = 1.4142135623730951 ≥ 0,
which demonstrates that the solution is feasible. However, in general the constraints might be slightly violated due to computations in finite precision. It can be difficult to assess the significance of a specific violation; for example, a violation of one unit in

    x₁ + x₂ ≤ 1

is likely significant, whereas a violation of one unit in

    x₁ + x₂ ≤ 10⁹

may be less so. The right-hand side of 10⁹ may itself be the result of a computation in finite precision, and may only be known with, say, 3 digits of accuracy. Therefore, a violation of 1 unit is not significant since the true right-hand side could just as well be 1.001 · 10⁹.
Another question is how to verify that a feasible approximate solution is actually optimal, which is answered by duality theory. From the discussion of duality in Sec. 2.3, the dual problem to (7.1) is

    maximize     2y₁ + √2·y₂
    subject to   y₁ ≤ 0,                              (7.2)
                 y₁ + y₂ ≤ −1,

and weak duality mandates that the primal objective value is greater than the dual objective value for any dual feasible solution, i.e.,

    −x₂ ≥ 2y₁ + √2·y₂.
Furthermore, if the bound is tight (i.e., if the inequality holds as an equality), then the primal solution must be optimal. In other words, if there exists a dual feasible solution such that the dual objective value is identical to −√2, then we know (from duality) that the primal solution is optimal. For example, the optimization software may report

    y₁ = 0   and   y₂ = −1,

which is easily verified to be a feasible solution with dual objective value −√2, proving optimality of both the primal and dual solutions.
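The following NumPy snippet carries out exactly this feasibility and duality check for the reported solutions of (7.1) and (7.2).

```python
import numpy as np

# Reported primal and dual solutions for (7.1) and (7.2).
x = np.array([0.0, 1.4142135623730951])
y = np.array([0.0, -1.0])

# Primal feasibility.
assert x[0] + x[1] <= 2 and x[1] <= np.sqrt(2) and np.all(x >= 0)

# Dual feasibility: y1 <= 0, y1 + y2 <= -1.
assert y[0] <= 0 and y[0] + y[1] <= -1

# Objective values and duality gap.
primal_obj = -x[1]
dual_obj = 2 * y[0] + np.sqrt(2) * y[1]
print("primal:", primal_obj, "dual:", dual_obj, "gap:", primal_obj - dual_obj)
```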
Consider a conic optimization problem

    minimize     cᵀx
    subject to   Ax = b,                              (7.3)
                 x ∈ K,

where K is a convex cone. Let P(x, K) denote the projection of x onto K. The distance measure

    d(x, K) := ‖x − P(x, K)‖₂

then gives the distance from x to the nearest feasible point in K, and obviously

    x ∈ K   ⟺   d(x, K) = 0.

Thus d(x, K) is a natural measure of the cone-feasibility of a point x; if d(x, K) > 0 then x ∉ K (i.e., x is infeasible), and the smaller the value, the closer x is to K. For x ∈ R, let [x]₊ := max{x, 0}.
The following lemmas then give the projections and distance measures for the different cones. For the semidefinite cone, in particular, if λᵢ denote the eigenvalues of X then

    d(X, S₊ⁿ) = maxᵢ [−λᵢ]₊.                          (7.12)
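A sketch of (7.12) for the semidefinite cone is given below, assuming λᵢ are the eigenvalues of the symmetric matrix X; the eigenvalue-clipping projection used here is the projection in the Frobenius norm.

```python
import numpy as np

def psd_projection(X):
    """Project a symmetric matrix onto S_+^n by clipping negative eigenvalues."""
    lam, Q = np.linalg.eigh(X)
    return Q @ np.diag(np.maximum(lam, 0.0)) @ Q.T

def psd_distance(X):
    """d(X, S_+^n) = max_i [-lambda_i]_+  (cf. (7.12))."""
    lam = np.linalg.eigvalsh(X)
    return max(0.0, -lam.min())

X = np.array([[1.0, 2.0],
              [2.0, 1.0]])                        # eigenvalues 3 and -1, so X is not PSD
print(psd_distance(X))                            # 1.0
print(np.linalg.eigvalsh(psd_projection(X)))      # nonnegative eigenvalues
```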
Chapter 8
Case studies
8.1 Introduction
This chapter discusses a number of case studies and examples with details that are beyond the general character of the previous chapters.
where n denotes the number of items to be produced, b denotes the amount of common resource,
and rj is the consumption of the limited resource to produce one unit of item j. The objective
function represents production and inventory costs. Let cᵖⱼ denote the cost per unit of product j and cʳⱼ denote the rate of holding costs, respectively. Further, let

    dⱼ = cᵖⱼ cʳⱼ / 2,

so that

    dⱼ xⱼ

is the average holding cost for product j. If Dⱼ denotes the total demand for product j and cᵒⱼ the ordering cost per order of product j, then let

    eⱼ = cᵒⱼ Dⱼ,

and hence

    eⱼ / xⱼ = cᵒⱼ Dⱼ / xⱼ

is the average ordering cost for product j. It is not always possible to produce a fractional
number of items. In such cases a constraint saying xj has to be integer valued should be added.
In summary, the problem finds the optimal batch size such that the inventory and ordering
cost are minimized while satisfying the constraints on the common resource. Given dj , ej ≥ 0
problem (8.1) is equivalent to the conic quadratic problem

    minimize     Σⱼ₌₁ⁿ (dⱼxⱼ + eⱼtⱼ)
    subject to   Σⱼ₌₁ⁿ rⱼxⱼ ≤ b,
                 (tⱼ, xⱼ, √2) ∈ Q_r^3,   j = 1, …, n,    (8.2)
                 xⱼ ≥ 0,   j = 1, …, n.
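A compact CVXPY sketch of (8.2) with hypothetical data follows: the rotated-cone constraint (tⱼ, xⱼ, √2) ∈ Q_r^3 simply states tⱼxⱼ ≥ 1, i.e. tⱼ ≥ 1/xⱼ, which can be expressed with inv_pos.

```python
import numpy as np
import cvxpy as cp

# Hypothetical data for n items sharing a common resource b.
n = 3
d = np.array([0.1, 0.2, 0.15])    # holding-cost coefficients d_j
e = np.array([4.0, 2.0, 3.0])     # ordering-cost coefficients e_j
r = np.array([1.0, 0.5, 2.0])     # resource use per unit
b = 10.0

x = cp.Variable(n, pos=True)

# (t_j, x_j, sqrt(2)) in Q_r^3 means t_j >= 1/x_j, so the objective
# d'x + e't becomes d'x + sum_j e_j / x_j.
objective = cp.Minimize(d @ x + e @ cp.inv_pos(x))
prob = cp.Problem(objective, [r @ x <= b])
prob.solve()
print("optimal batch sizes:", x.value)
```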
8.3 Markowitz portfolio optimization

A basic Markowitz portfolio optimization problem maximizes the expected return subject to a budget constraint and a bound γ on the risk,

    maximize     µᵀx
    subject to   xᵀΣx ≤ γ
                 eᵀx = 1                              (8.3)
                 x ≥ 0,

where µ ∈ Rⁿ is a vector of expected returns for n different assets and Σ ∈ S₊ⁿ denotes the corresponding covariance matrix of the asset returns. Alternatively, we can minimize the risk subject to a lower bound δ on the expected return,

    minimize     xᵀΣx
    subject to   µᵀx ≥ δ
                 eᵀx = 1                              (8.4)
                 x ≥ 0.
Both problems (8.3) and (8.4) are equivalent in the sense that they describe the same Pareto-
optimal trade-off curve by varying δ and γ. In fact, given a solution to either problem, we
can easily find the parameters for the other problem, which results in the same solution. The
optimality conditions for (8.4) are
where ν ∈ R+ , λ ∈ R, z ∈ Rn+ are dual variables, and the optimality conditions for (8.3) are
where α ∈ R+ , ζ ∈ R, v ∈ Rn+ are dual variables. Furthermore, from (8.5) we see that
Thus, given a solution (x⋆, ν⋆, λ⋆, z⋆) to (8.5), we see that for γ = (x⋆)ᵀΣx⋆ = (1/2)(ν⋆δ + λ⋆),

    x = x⋆,   α = 1,   ζ = λ⋆,   v = z⋆

is a solution to (8.6).
Next consider a factorization

    Σ = VᵀV                                           (8.7)

for some V ∈ Rᵏˣⁿ. We can then rewrite both problems (8.3) and (8.4) in conic quadratic form as

    maximize     µᵀx
    subject to   (1/2, γ, Vx) ∈ Q_r^{k+2}
                 eᵀx = 1                              (8.8)
                 x ≥ 0,
and

    minimize     t
    subject to   (1/2, t, Vx) ∈ Q_r^{k+2}
                 µᵀx ≥ δ                              (8.9)
                 eᵀx = 1
                 x ≥ 0,
respectively. In practice, a suitable factorization (8.7) is either readily available, or it is easily obtained. We mention typical choices below.
• Data-matrix. Σ might be specified directly in the form (8.7), where V is a normalized
data-matrix with k observations of market data (for example daily returns) of the n assets.
When the observation horizon k is shorter than n, which is typically the case, the conic
representation is both more parsimonious and has better numerical properties.
• Cholesky. Given Σ ≻ 0, we may factor it as (8.7) where V is an upper-triangular Cholesky factor. In this case k = n, i.e., V ∈ Rⁿˣⁿ, so there is little difference in complexity between the conic and quadratic formulations.
• Factor model. For a factor model we have

      Σ = D + UᵀU,

  where D ⪰ 0 is a diagonal matrix, and U ∈ Rᵏˣⁿ is a low-rank matrix (k ≪ n). The factor model directly gives us a factor

      V = [ D^{1/2} ]
          [ U       ]                                 (8.10)

  of dimensions (n + k) × n. The dimensions of V in (8.10) are larger than the dimensions of the Cholesky factors of Σ, but V in (8.10) is very sparse, which usually results in a significant reduction of solution time.
We can also combine the formulations (8.3) and (8.4) into an alternative problem that more
explicitly computes a trade-off between risk and return,
maximize µT x − λxT Σx
subject to eT x = 1 (8.11)
x ≥ 0,
with a trade-off parameter λ ≥ 0. One advantage of (8.11) is that the problem is always feasible.
This formulation also produces the same optimal trade-off curve, i.e., for a particular (feasible)
choice of δ (or γ), there exists a value of λ for which the different problems have the same
solution x. The conic formulation of (8.11) is

    maximize     µᵀx − λt
    subject to   (1/2, t, Vx) ∈ Q_r^{k+2}
                 eᵀx = 1                              (8.12)
                 x ≥ 0.
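The following CVXPY sketch of the trade-off formulation (8.12) uses the sparse factor (8.10) from a hypothetical factor model; the rotated-cone constraint (1/2, t, Vx) ∈ Q_r^{k+2} is written as ‖Vx‖² ≤ t.

```python
import numpy as np
import cvxpy as cp

# Hypothetical factor-model data: Sigma = D + U'U with k << n.
n, k = 8, 2
rng = np.random.default_rng(3)
mu = rng.uniform(0.02, 0.10, n)           # expected returns
D = np.diag(rng.uniform(0.01, 0.05, n))   # specific (diagonal) risk
U = rng.standard_normal((k, n)) * 0.1     # factor exposures
V = np.vstack([np.sqrt(D), U])            # factor (8.10), so V'V = D + U'U
lam = 2.0                                 # risk-aversion parameter

x = cp.Variable(n)
t = cp.Variable()

# (1/2, t, Vx) in Q_r^{k+2}  <=>  ||Vx||^2 <= t.
prob = cp.Problem(cp.Maximize(mu @ x - lam * t),
                  [cp.sum_squares(V @ x) <= t,
                   cp.sum(x) == 1, x >= 0])
prob.solve()
print("optimal weights:", np.round(x.value, 3))
```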
8.3.2 Slippage costs
In a more realistic model we correct the expected return of the investment by slippage terms Tⱼ(xⱼ),

    maximize     µᵀx − Σⱼ Tⱼ(xⱼ)
    subject to   xᵀΣx ≤ γ
                 eᵀx = 1                              (8.13)
                 x ≥ 0.
Market impact
The phenomenon that the price of an asset changes when large volumes are traded is called
market impact, and for large-volume trading it is the most important slippage contribution.
Empirically it has been demonstrated that market impact, i.e., the additional cost of buying or selling an asset, is well-approximated by the conic representable function

    Tⱼ(xⱼ) = αⱼ xⱼ^{3/2},

with coefficients αⱼ ≥ 0 estimated from historical data. Using Lemma 3.1 Prop. (iv), we can write the epigraph

    xⱼ^{3/2} ≤ tⱼ

as

    (sⱼ, tⱼ, xⱼ), (1/8, xⱼ, sⱼ) ∈ Q_r^3,
i.e., we get a conic formulation of (8.13),

    maximize     µᵀx − αᵀt
    subject to   (1/2, γ, Vx) ∈ Q_r^{k+2}
                 (sⱼ, tⱼ, xⱼ) ∈ Q_r^3
                 (1/8, xⱼ, sⱼ) ∈ Q_r^3                (8.14)
                 eᵀx = 1
                 x ≥ 0.
Transaction costs
For smaller-volume trading the main slippage contribution is transaction costs, which can be modeled as a linear function

    Tⱼ(xⱼ) = αⱼ xⱼ,

or as a fixed-charge plus linear function

    Tⱼ(xⱼ) = { 0,            xⱼ = 0,
               αⱼxⱼ + βⱼ,    0 < xⱼ ≤ ρⱼ,

which can be modeled using indicator variables similar to Example 6.1. To that end, let zⱼ be an indicator variable with

    (zⱼ = 0) → (xⱼ = 0),

which we can express as

    xⱼ ≤ ρⱼzⱼ,   zⱼ ∈ B.                              (8.15)
We thus get a conic integer problem

    maximize     µᵀx − eᵀt
    subject to   (1/2, γ, Vx) ∈ Q_r^{k+2}
                 αⱼxⱼ + βⱼzⱼ ≤ tⱼ
                 xⱼ ≤ ρⱼzⱼ                            (8.16)
                 eᵀx = 1
                 x ≥ 0
                 z ∈ Bⁿ.

Since the objective function effectively minimizes each tⱼ, we exclude the possibility that xⱼ = 0 and zⱼ = 1.
Consider, finally, maximization of the Sharpe ratio (µ − r_f e)ᵀx / (xᵀΣx)^{1/2}, where r_f denotes the risk-free rate of return. To that end we introduce a scaled variable

    y = γx,   γ ≥ 0.

Since a positive γ can be chosen arbitrarily and (µ − r_f e)ᵀx > 0, we may assume without loss of generality that

    (µ − r_f e)ᵀy = 1.

Thus, we obtain the following conic problem for maximizing the Sharpe ratio,

    minimize     t
    subject to   (t, Vy) ∈ Q^{k+1}
                 eᵀy = γ
                 (µ − r_f e)ᵀy = 1
                 γ ≥ 0
                 y ≥ 0,
Appendix A
Let R denote the set of real numbers, Z the set of integers, and B the boolean set {0, 1},
respectively. Rⁿ denotes the set of n-dimensional vectors of real numbers (and similarly for Zⁿ and Bⁿ); in most cases we denote such vectors by lower-case letters, e.g., a ∈ Rⁿ. A subscripted value aᵢ then refers to the ith entry in a, i.e.,

    a = (a₁, a₂, …, aₙ).
All vectors considered in this manual are interpreted as column-vectors. For a, b ∈ Rn we use
the standard inner product,
ha, bi := a1 b1 + a2 b2 + · · · + an bn ,
which we also write as aᵀb := ⟨a, b⟩. We let Rᵐˣⁿ denote the set of m × n matrices, and we use upper-case letters to represent them, e.g., B ∈ Rᵐˣⁿ organized as

    B = [ b₁₁  b₁₂  …  b₁ₙ
          b₂₁  b₂₂  …  b₂ₙ
          ⋮     ⋮        ⋮
          bₘ₁  bₘ₂  …  bₘₙ ],

i.e., Bᵢⱼ = bᵢⱼ, and for matrices A, B ∈ Rᵐˣⁿ we use the inner product

    ⟨A, B⟩ := Σᵢ₌₁ᵐ Σⱼ₌₁ⁿ aᵢⱼbᵢⱼ.
For a vector x ∈ Rⁿ, Diag(x) denotes the square matrix with x on the diagonal and zero elsewhere. Similarly, for a square matrix X ∈ Rⁿˣⁿ we have

    diag(X) := (x₁₁, x₂₂, …, xₙₙ).
A set S ⊆ Rⁿ is convex if and only if for any x, y ∈ S we have

    θx + (1 − θ)y ∈ S
Figure A.1: The shaded region is the epigraph of the function f (x) = − log(x).
for all θ ∈ [0, 1]. A function f: Rⁿ ↦ R is convex if and only if dom(f) is convex and

    f(θx + (1 − θ)y) ≤ θf(x) + (1 − θ)f(y)

for all θ ∈ [0, 1] and x, y ∈ dom(f), where dom(f) is the domain of the function f. A function f: Rⁿ ↦ R is concave if and only if −f is convex. For example, the function f: R ↦ R given by

    f(x) = − log(x)

has dom(f) = {x | x > 0} and is convex. The epigraph of a function f: Rⁿ ↦ R is the set

    epi(f) := {(x, t) | f(x) ≤ t},

so that the problem
minimize t
subject to f (x) ≤ t
is equivalent to minimizing f (x). Furthermore, f is convex if and only if epi(f ) is a convex set.
Similarly, the hypograph of a function f: Rⁿ ↦ R is the set

    hyp(f) := {(x, t) | f(x) ≥ t},

so that the problem

    maximize     t
    subject to   f(x) ≥ t

is equivalent to maximizing f(x), and f is concave if and only if hyp(f) is a convex set.
Appendix B
B.1 Introduction
Duality theory is a rich and powerful area of convex optimization, and it is central to understanding sensitivity analysis and infeasibility issues in linear (and convex) optimization. Furthermore, it provides a simple and systematic way of obtaining non-trivial lower bounds on the optimal value for many difficult non-convex problems.
and where the Lagrangian is unbounded below in x, we assign the dual function the value −∞. For fixed x the Lagrangian is an affine function of (y, s), so we can think of g(y, s) as the pointwise infimum of infinitely many such affine functions (recall our discussion in Sec. 2.2.1 of convex piecewise-linear functions defined as the maximum of a set of affine functions); thus g(y, s) is a concave function, even when the problem (B.1) is not convex.

The Lagrange dual problem is then obtained by maximizing g(y, s), i.e., it is defined as

    maximize     g(y, s)
    subject to   s ≥ 0.                               (B.3)

For any feasible (y, s), the dual function provides a bound on the optimal value p⋆ = f₀(x⋆), obtained by minimizing the Lagrangian over x. The best such lower bound is given by the dual problem (i.e., d⋆ = sup_{y,s} g(y, s)), and the relationship

    p⋆ ≥ d⋆

is called the weak duality property, and it holds for any kind of optimization problem with a well-defined Lagrangian function (including, e.g., nonconvex problems, but excluding general integer problems).
Theorem B.1 (Separating hyperplane theorem). Let S be a closed convex set, and b ∉ S. Then there exists a strictly separating hyperplane a such that

    aᵀb > aᵀx,   ∀x ∈ S.
Figure B.1: Separating hyperplane for a closed convex set S and a point b ∉ S.
Lemma B.1 (Farkas' lemma). Given A and b, exactly one of the two propositions is true:

  1. ∃x : Ax = b, x ≥ 0,
  2. ∃y : Aᵀy ≤ 0, bᵀy > 0.

Proof. If 1. holds, then for any y satisfying Aᵀy ≤ 0 we have

    bᵀy = xᵀAᵀy ≤ 0,

so 1. and 2. cannot both be true. Conversely, if 1. is infeasible, then by Theorem B.1 the point b can be strictly separated from the closed convex set {Ax | x ≥ 0}, i.e., there exists y such that

    yᵀb > yᵀAx,   ∀x ≥ 0.
Proof. Assume that the dual optimal value of problem (2.8) is attained (i.e., d⋆ := bᵀy⋆ < ∞). Since s⋆ = c − Aᵀy⋆ ≥ 0, there can then be no z satisfying

    bᵀz > 0,   Aᵀz ≤ 0,                               (B.4)

because otherwise

    bᵀ(y⋆ + z) > d⋆,   c − Aᵀ(y⋆ + z) ≥ 0,

contradicting the optimality assumption on y⋆. From Farkas' lemma, (B.4) is infeasible if and only if there exists x such that

    Ax = b,   x ≥ 0.

For that particular choice of x we have

    cᵀx = xᵀAᵀy⋆ = bᵀy⋆,

i.e., p⋆ = d⋆. On the other hand, assume that −∞ < p⋆ < ∞. From weak duality we then have d⋆ < ∞, and from Farkas' lemma d⋆ > −∞, i.e., the dual optimal value is attained. In other words, if either the primal or the dual optimal value is attained, we have p⋆ = d⋆.
Farkas’ lemma is an example of theorems of alternatives, where exactly one of two systems
have a solution, and we can also formulate a dual variant.
Lemma B.2 (Dual variant of Farkas’ lemma). Given A and c, exactly one of the two proposi-
tions is true:
1. ∃x : Ax = 0, x ≥ 0, cT x < 0,
2. ∃y : c − AT y ≥ 0.
Proof. Assume that 1. and 2. are both true. Then, since Ax = 0 and x ≥ 0,

    cᵀx = xᵀ(c − Aᵀy) ≥ 0,

contradicting cᵀx < 0.
Next assume that 2. is not true, i.e., ∀y : c < AT y. But then for x ≥ 0 we have
cT x < xT AT y, ∀y
Bibliography
[PS98] C. H. Papadimitriou and K. Steiglitz. Combinatorial optimization: algorithms and complexity. Dover Publications, 1998.
[Wil93] H. P. Williams. Model building in mathematical programming. John Wiley and Sons,
3rd edition, 1993.
[Zie82] H. Ziegler. Solving certain singly constrained convex optimization problems in production planning. Operations Research Letters, 1(6), 1982.
Index
geometry of linear optimization, 4
Markowitz portfolio optimization, 26, 75
  case studies, 75
mixed integer optimization, 62–70
mixed integer nonlinear case studies, 74
objective function, 3
pack constraints, 65
partition constraints, 65
penalization, method of, 9
piecewise-linear function
  continuous (nonconvex), 65
  convex, 5, 7, 8
polyhedron, 5
polynomial set, 18
symmetric positive semidefinite, see semidefinite
weak duality, 12, 28, 36, 82