
MOSEK modeling manual

August 12, 2014


Contents

1 Introduction

2 Linear optimization
  2.1 Introduction
    2.1.1 Basic notions
    2.1.2 Geometry of linear optimization
  2.2 Linear modeling
    2.2.1 Convex piecewise-linear functions
    2.2.2 Absolute values
    2.2.3 The ℓ1 norm
    2.2.4 The ℓ∞ norm
    2.2.5 Avoid ill-posed problems
    2.2.6 Avoid badly scaled problems
  2.3 Duality in linear optimization
    2.3.1 The dual problem
    2.3.2 Duality properties
  2.4 Infeasibility in linear optimization
    2.4.1 Basic concepts
    2.4.2 Locating infeasibility
  2.5 Bibliography

3 Conic quadratic optimization
  3.1 Introduction
    3.1.1 Quadratic cones
    3.1.2 Rotated quadratic cones
  3.2 Conic quadratic modeling
    3.2.1 Absolute values
    3.2.2 Euclidean norms
    3.2.3 Squared Euclidean norms
    3.2.4 Convex quadratic sets
    3.2.5 Second-order cones
    3.2.6 Simple polynomial sets
    3.2.7 Geometric mean
    3.2.8 Harmonic mean
    3.2.9 Convex increasing rational powers
    3.2.10 Convex decreasing rational powers
    3.2.11 Power cones
    3.2.12 Quadratic forms with one negative eigenvalue
    3.2.13 Ellipsoidal sets
  3.3 Conic quadratic optimization
    3.3.1 Quadratically constrained quadratic optimization
    3.3.2 Robust optimization with ellipsoidal uncertainties
    3.3.3 Markowitz portfolio optimization
  3.4 Duality in conic quadratic optimization
    3.4.1 Primal versus dual form
    3.4.2 Examples of bad duality states
  3.5 Infeasibility in conic quadratic optimization
  3.6 Bibliography

4 Semidefinite optimization
  4.1 Introduction
    4.1.1 Semidefinite matrices and cones
    4.1.2 Properties of semidefinite matrices
    4.1.3 Semidefinite duality
  4.2 Semidefinite modeling
    4.2.1 Linear matrix inequalities
    4.2.2 Eigenvalue optimization
    4.2.3 Singular value optimization
    4.2.4 Matrix inequalities from Schur's Lemma
    4.2.5 Nonnegative polynomials
    4.2.6 Hermitian matrices
    4.2.7 Nonnegative trigonometric polynomials
  4.3 Semidefinite optimization
    4.3.1 Nearest correlation matrix
    4.3.2 Extremal ellipsoids
    4.3.3 Polynomial curve-fitting
    4.3.4 Filter design problems
    4.3.5 Relaxations of binary optimization
    4.3.6 Relaxations of boolean optimization
    4.3.7 Converting a problem to standard form
  4.4 Bibliography

5 Quadratic optimization
  5.1 Introduction
  5.2 Convex quadratic optimization
    5.2.1 Geometry of quadratic optimization
    5.2.2 Duality in quadratic optimization
    5.2.3 Infeasibility in quadratic optimization
  5.3 Quadratically constrained optimization
    5.3.1 Duality in quadratically constrained optimization
  5.4 Separable quadratic optimization

6 Mixed integer optimization
  6.1 Introduction
  6.2 Integer modeling
    6.2.1 Implication of positivity
    6.2.2 Semi-continuous variables
    6.2.3 Constraint satisfaction
    6.2.4 Disjunctive constraints
    6.2.5 Pack constraints
    6.2.6 Partition constraints
    6.2.7 Continuous piecewise-linear functions
    6.2.8 Lower semicontinuous piecewise-linear functions
  6.3 Boolean primitives
    6.3.1 Complement
    6.3.2 Conjunction
    6.3.3 Disjunction
    6.3.4 Implication
    6.3.5 Exclusive disjunction
  6.4 Integer optimization
  6.5 Bibliography

7 Practical optimization
  7.1 Introduction
  7.2 The quality of a solution
  7.3 Distance to a cone

8 Case studies
  8.1 Introduction
  8.2 A resource constrained production and inventory problem
  8.3 Markowitz portfolio optimization
    8.3.1 Basic notions
    8.3.2 Slippage costs
    8.3.3 Maximizing the Sharpe ratio

A Notation and definitions

B Duality in conic optimization
  B.1 Introduction
  B.2 The Lagrangian
  B.3 The dual function and problem
  B.4 Weak duality
  B.5 Strong duality
Preface

This manual is about model building using convex optimization. It is intended as a modeling
guide for the MOSEK optimization package. However, the style is intentionally quite generic
without specific MOSEK commands or API descriptions.
There are several excellent books available on this topic, for example the recent books by
Ben-Tal and Nemirovski [BN01] and Boyd and Vandenberghe [BV04], which have both been
a great source of inspiration for this manual. The purpose of this manual is to collect the
material which we consider most relevant to our customers and to present it in a practical self-
contained manner; however, we highly recommend the books [BN01, BV04] as a supplement to
this manual.
Some textbooks on building models using optimization (or mathematical programming) introduce different concepts through practical examples. In this manual we have chosen a different
route, where we instead show the different sets and functions that can be modeled using convex
optimization, which can subsequently be combined into realistic examples and applications. In
other words, we present simple convex building blocks, which can then be combined into more
elaborate convex models. With the advent of more expressive and sophisticated tools like conic
optimization, we feel that this approach is better suited.
Chapters 2–4 discuss self-dual conic optimization, namely linear optimization (Chap. 2), conic quadratic optimization (Chap. 3) and semidefinite optimization (Chap. 4), which should be read in succession. Chap. 5 discusses quadratic optimization, and has Chap. 3 as a prerequisite. Finally, Chap. 6 diverges from the path of convex optimization and discusses mixed integer conic optimization. App. A contains details on notation used in the manual.

Chapter 1

Introduction

Many decision problems can be phrased as making the best choice subject to some constraints.
An example is the problem of choosing an optimal portfolio of assets such as stocks and bonds.
For this and many other decision problems it is hard to identify or even quantify an optimal
solution, but it might be possible to build a mathematical optimization model representing the
problem. Given such a model and appropriate software it may then be tractable to find the
optimal decision for the mathematical model.
Loosely speaking an optimization model consists of a goal (or objective) as well as some
constraints that must be satisfied. The goal could be to obtain the highest possible return and the constraint could be not to assume too much risk. Finally, the variables or decisions are the amounts to buy of each asset. In mathematical terms we may state the problem as

minimize    f(x)
subject to  x ∈ X,                                            (1.1)

where x is the variable, i.e., xj denotes the amount we should buy of asset j. The function
f (x) is the goal or objective function and X denotes the set of feasible solutions for the decision
variables x. For example X may specify that we can only buy a nonnegative amount of each
asset, that we only have a limited budget and do not want too much risk. Subsequently we will
denote “x ∈ X ” as the constraints.
If we want to solve problem (1.1) using a computer we must be explicit about the goal function f and the set X. This leads to the question of tractable objective functions and constraints. Some choices for the objective function and the constraints might be desirable, but for those choices we cannot expect to find a solution within a reasonable amount of time even with the fastest computer. Other choices are more tractable in the sense that we are guaranteed to find the optimal solution within a reasonable time using a suitable piece of software; those choices lead to the field of convex optimization, which is the main subject of this manual.
Thus, when formulating an optimization model for a decision problem we are facing at least two issues: (i) how to represent the problem using mathematical notation, and (ii) how to choose a formulation that we can actually solve. For instance, we may use an approximate model if the approximation can be solved much faster than a more accurate model. Thus, formulating a good optimization model for a given decision problem can often be nontrivial; in fact it can be somewhat of an art.
In our experience it is advantageous to view optimization models as a combination of a
number of smaller blocks that can be treated more or less independently. Ultimately the different
building blocks are put together to form a more complicated model, but by using simple tractable
building blocks we know that the overall problem will remain tractable. Moreover, the building
blocks can be reused in different models, which is less error-prone and allows us to develop

complicated models more rapidly. Therefore, in this manual we discuss different typical (and
tractable) building blocks, and we give examples of complete optimization models formed using
those building blocks.

Chapter 2

Linear optimization

2.1 Introduction
In this chapter we discuss different aspects of linear optimization. We first introduce the basic
concepts of a linear optimization and discuss the underlying geometric interpretations. We
then give examples of the most frequently used reformulations or modeling tricks used in linear
optimization, and we finally discuss duality and infeasibility theory in some detail.

2.1.1 Basic notions


The most basic class of optimization is linear optimization. In linear optimization we minimize
a linear function given a set of linear constraints. For example, we may wish to minimize a
linear function
x1 + 2x2 − x3
under the constraints that

x1 + x2 + x3 = 1, x1 , x2 , x3 ≥ 0.

The function we minimize is often called the objective function; in this case we have a linear
objective function. The constraints are also linear and consist of both linear equality constraints
and linear inequality constraints. We typically use a more compact notation

minimize x1 + 2x2 − x3
subject to x1 + x2 + x3 = 1 (2.1)
x1 , x2 , x3 ≥ 0,

and we call (2.1) a linear optimization problem. The domain where all constraints are satisfied
is called the feasible set; the feasible set for (2.1) is shown in Fig. 2.1. For this simple problem we
see by inspection that the optimal value of the problem is −1 obtained by the optimal solution

(x?1 , x?2 , x?3 ) = (0, 0, 1).

Linear optimization problems are typically formulated with a notation similar to

minimize cT x
subject to Ax = b
x ≥ 0.


Figure 2.1: Feasible set for x1 + x2 + x3 = 1 and x1 , x2 , x3 ≥ 0.



Figure 2.2: The dashed line illustrates a hyperplane {x | aT x = γ}.

For example, we can pose (2.1) in this form with


   
x = (x1, x2, x3)T,   c = (1, 2, −1)T,   A = (1 1 1).

There are many other formulations for linear optimization problems; we can have different types
of constraints,
Ax = b, Ax ≥ b, Ax ≤ b, lc ≤ Ax ≤ uc ,
and different bounds on the variables

lx ≤ x ≤ ux

or we may have no bounds on some xi , in which case we say that xi is a free variable.
All these formulations are equivalent in the sense that by simple transformations and in-
troduction of auxiliary variables they can represent the same problem with the same optimal
solution. The important feature is that the objective function and the constraints are all linear
in x.

2.1.2 Geometry of linear optimization


A hyperplane is a set {x | aT (x − x0) = 0} or equivalently {x | aT x = γ} with aT x0 = γ, see
Fig. 2.2. Thus linear constraints
Ax = b
with A ∈ Rm×n represent an intersection of m hyperplanes.
Next consider a point x above the hyperplane in Fig. 2.2. Since x − x0 forms an acute angle
with a we have that aT (x − x0 ) ≥ 0, or aT x ≥ γ. The set {x | aT x ≥ γ} is called a halfspace,


Figure 2.3: The gray area is the halfspace {x | aT x ≥ γ}.


Figure 2.4: A polyhedron formed as an intersection of halfspaces.

see Fig. 2.3. Similarly the set {x | aT x ≤ γ} forms another halfspace; in Fig. 2.3 it corresponds
to the area below the dashed line. A set of linear inequalities

Ax ≤ b

corresponds to an intersection of halfspaces and forms a polyhedron, see Fig. 2.4. The polyhedral
description of the feasible set gives us a very intuitive interpretation of linear optimization, which is illustrated in Fig. 2.5. The dashed lines are normal to the objective c = (−1, 1), and to minimize cT x we move as far as possible in the opposite direction of c, to a point where one of the normals intersects the polyhedron; an optimal solution is therefore always attained at a vertex of the polyhedron, although an entire facet of the polyhedron may be optimal.
The polyhedron shown in the figure is bounded, but this is not always the case for polyhedra coming from linear inequalities in optimization problems. In such cases the optimization problem may be unbounded, which we will discuss in detail in Section 2.4.

2.2 Linear modeling


In this section we discuss both useful reformulation techniques to model different functions using
linear programming, as well as common practices that are best avoided. By modeling we mean
equivalent reformulations that lead to the same optimal solution; there is no approximation or
modeling error involved.

2.2.1 Convex piecewise-linear functions


Perhaps the most basic and frequently used reformulation for linear optimization involves mod-
eling a convex piecewise-linear function by introducing a number of linear inequalities. Consider


Figure 2.5: Geometric interpretation of linear optimization. The optimal solution x? is at the point where the normals to c (the dashed lines) intersect the polyhedron.


Figure 2.6: A convex piecewise-linear function (solid lines) of a single variable x. The function
is defined as the maximum of 3 affine functions.

the convex piecewise-linear (or rather piecewise-affine) function illustrated in Fig. 2.6, where
the function can be described as max{a1 x + b1 , a2 x + b2 , a3 x + b3 }.
More generally, we consider convex piecewise-linear functions f : Rn 7→ R defined as

f (x) := max_{i=1,...,m} {aTi x + bi},

where the important notion is that f is defined as the maximum of a set of affine functions. To
model the epigraph f (x) ≤ t (see App. A), we note that t ≥ maxi {aTi x + bi } if and only if t is
larger than all the affine functions, i.e., we have an equivalent formulation with m inequalities,

aTi x + bi ≤ t, i = 1, . . . , m.

Piecewise-linear functions have many uses in linear optimization; either we have a convex
piecewise-linear formulation from the onset, or we may approximate a more complicated (non-
linear) problem using piecewise-linear approximations.
A word of caution is in order at this point; in the past the use of linear optimization was
significantly more widespread than the use of nonlinear optimization, and piecewise-linear mod-
eling was used to approximately solve nonlinear optimization problems. The field of nonlinear
optimization (especially conic optimization) has since matured, and with modern optimization
software it is both easier and more efficient to directly formulate and solve conic problems
without piecewise-linear approximations.

2.2.2 Absolute values
The absolute value of a scalar variable is an example of a simple convex piecewise-linear function,

|x| := max{ x, −x },

so we model |x| ≤ t using two inequalities

−t ≤ x ≤ t.

2.2.3 The ℓ1 norm


All norms are convex functions, but the ℓ1 and ℓ∞ norms are of particular interest for linear optimization. The ℓ1 norm of a vector x ∈ Rn is defined as

kxk1 := |x1 | + |x2 | + · · · + |xn |.

To model the epigraph


kxk1 ≤ t, (2.2)
we first introduce the following system

|xi| ≤ zi, i = 1, . . . , n,   ∑_{i=1}^n zi = t,   (2.3)

with additional (auxiliary) variable z ∈ Rn , and we claim that (2.2) and (2.3) are equivalent.
They are equivalent if all zi = |xi| in (2.3). Suppose that t is optimal (as small as possible) for (2.3), but that some zi > |xi|. Then we could reduce t further by reducing zi, contradicting the assumption that t is optimal, so the two formulations are equivalent. Therefore, we can model
(2.2) using linear (in)equalities
−zi ≤ xi ≤ zi, i = 1, . . . , n,   ∑_{i=1}^n zi = t,

with auxiliary variables z. Similarly, we can describe the epigraph of the norm of an affine
function of x,
kAx − bk1 ≤ t
as
−zi ≤ aTi x − bi ≤ zi, i = 1, . . . , n,   ∑_{i=1}^n zi = t,

where ai is the ith row of A (taken as a column-vector).


The ℓ1 norm is overwhelmingly popular as a convex approximation of the cardinality (i.e., the number of nonzero elements) of a vector x. For example, suppose we are given an underdetermined linear system
Ax = b
where A ∈ Rm×n and m ≪ n. The basis pursuit problem

minimize kxk1
(2.4)
subject to Ax = b,

uses the ℓ1 norm of x as a heuristic for finding a sparse solution (one with many zero elements)
to Ax = b, i.e., it aims to represent b using few columns of A. Using the reformulation above
we can pose the problem as a linear optimization problem,

minimize eT z
subject to −z ≤ x ≤ z (2.5)
Ax = b,

where e = (1, . . . , 1)T .
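The LP reformulation (2.5) can be passed to any linear optimization solver. The following is a hedged, MOSEK-agnostic sketch using SciPy's linprog; the data A, b, the random seed and the sparse ground-truth vector are assumptions made purely for illustration.

```python
# Basis pursuit (2.4) solved via the LP reformulation (2.5); variable order is (x, z) in R^{2n}.
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
m, n = 10, 30                                        # underdetermined system, m << n
A = rng.standard_normal((m, n))
x_true = np.zeros(n); x_true[:3] = [1.0, -2.0, 0.5]  # sparse ground truth (illustrative)
b = A @ x_true

c = np.concatenate([np.zeros(n), np.ones(n)])        # minimize e^T z
I = np.eye(n)
A_ub = np.block([[I, -I], [-I, -I]])                 # -z <= x <= z  as  x - z <= 0, -x - z <= 0
b_ub = np.zeros(2 * n)
A_eq = np.hstack([A, np.zeros((m, n))])              # Ax = b
bounds = [(None, None)] * n + [(0, None)] * n        # x free, z >= 0

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b, bounds=bounds)
x_hat = res.x[:n]
print(res.status, np.round(x_hat[:5], 4))            # typically recovers a sparse solution
```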

2.2.4 The ℓ∞ norm


The ℓ∞ norm of a vector x ∈ Rn is defined as

kxk∞ := max_{i=1,...,n} |xi|,

which is another example of simple piecewise-linear functions. To model

kxk∞ ≤ t (2.6)

we use that t ≥ maxi=1,...,n |xi | if and only if t is greater than each term, i.e., we can model
(2.6) as
−t ≤ xi ≤ t, i = 1, . . . , n.
Again, we can also consider an affine function of x, i.e.,

kAx − bk∞ ≤ t,

which can be described as


−t ≤ aTi x − bi ≤ t, i = 1, . . . , n.
It is interesting to note that the ℓ1 and ℓ∞ norms are dual norms. For any norm k · k on Rn ,
the dual norm k · k∗
is defined as
kxk∗ = max{xT v | kvk ≤ 1}.
Let us verify that the dual of the ℓ∞ norm is the ℓ1 norm. Consider

kxk∗,∞ = max{xT v | kvk∞ ≤ 1}.

Obviously the maximum is attained for

vi = +1 if xi ≥ 0, and vi = −1 if xi < 0,

i.e., kxk∗,∞ = ∑i |xi| = kxk1. Similarly, consider the dual of the ℓ1 norm,

kxk∗,1 = max{xT v | kvk1 ≤ 1}.

To maximize xT v subject to |v1| + · · · + |vn| ≤ 1 we identify the largest element of x, say |xk|. The optimal v is then given by vk = sign(xk) and vi = 0, i ≠ k, i.e., kxk∗,1 = kxk∞. This illustrates a more general property of dual norms, namely that kxk∗∗ = kxk.
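The two dual-norm identities derived above are easy to check numerically. The following small numpy sketch (the random vector is an arbitrary illustrative choice) verifies them for one particular x.

```python
# Numerical sanity check of the dual-norm identities for the l_inf and l_1 norms.
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(4)

# dual of l_inf: max x^T v over ||v||_inf <= 1 is attained at v = sign(x), value ||x||_1
v_star = np.sign(x)
assert np.isclose(x @ v_star, np.abs(x).sum())

# dual of l_1: max x^T v over ||v||_1 <= 1 is attained at +/- e_k with k = argmax |x_k|
k = np.argmax(np.abs(x))
v_star = np.zeros_like(x); v_star[k] = np.sign(x[k])
assert np.isclose(x @ v_star, np.abs(x).max())       # value ||x||_inf
print("dual norm identities verified")
```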

2.2.5 Avoid ill-posed problems
A problem is ill-posed if small perturbations of the problem data result in arbitrarily large
perturbations of the solution, or change feasibility of the problem. Such problem formulations
should always be avoided as even the smallest numerical perturbations (for example rounding
errors, or solving the problem on a different computer) can result in different or wrong solutions.
Additionally, from an algorithmic point of view, even computing a wrong solution is very difficult
for ill-posed problems.
A rigorous definition of the degree of ill-posedness is possible by defining a condition number for a linear optimization problem, but unfortunately this is not a very practical metric, as evaluating such a condition number requires solving several optimization problems. Therefore, even though being able to quantify the difficulty of an optimization problem from a condition number is very attractive, we only make the modest recommendation to avoid problems
• that are nearly infeasible,

• with constraints that are linearly dependent, or nearly linearly dependent.


Typical examples of problems with nearly linearly dependent constraints are discretizations of
continuous processes, where the constraints invariably become more correlated as we make the
discretization finer; as such there may be nothing wrong with the discretization or problem
formulation, but we should expect numerical difficulties for sufficiently fine discretizations.

2.2.6 Avoid badly scaled problems


Another difficulty encountered in practice is with models that are badly scaled. Loosely defined,
we consider a model to be badly scaled if
• variables are measured on very different scales,

• constraints or bounds are measured on very different scales.


For example, if one variable x1 has units of molecules and another variable x2 measures temperature, we expect the coefficients c1 and c2 to be of very different magnitudes. In that case we might have an objective

10^12 x1 + x2

and if we normalize the objective by 10^12 and round coefficients to 8 digits, we see that x2 effectively disappears from the objective. In practice we do not round coefficients, but in finite precision arithmetic any contribution of x2 to the objective will be insignificant (and unreliable).
A similar situation (from a numerical point of view) is encountered with the method of
penalization or big-M strategies. For the sake of argument, assume we have a standard linear
optimization problem (2.7) with the additional constraint that x1 = 0. We may eliminate x1
completely from the model, or we might add an additional constraint, but suppose we choose
to formulate a penalized problem instead

minimize cT x + 10^12 x1
subject to Ax = b
x ≥ 0,

reasoning that the large penalty term will force x1 = 0. However, if kck < 1 we have the exact same problem, namely that in finite precision the penalty term will completely dominate the objective and render the contribution cT x insignificant or unreliable. Therefore, penalty or big-M terms should never be used explicitly as part of an optimization model.

As a different example, consider again a problem (2.7), but with additional (redundant)
constraints xi ≤ γ. For optimization practioners this is a common approach for stabilizing
the optimization algorithm since it improves conditioning of the linear systems solved by an
interior-point algorithm. The problem we solve is then

minimize cT x
subject to Ax = b
x≥0
x ≤ γe,

with a dual problem


maximize bT y − γeT z
subject to AT y + s − z = c
s, z ≥ 0.
Suppose we do not know a priori an upper bound on kxk∞, so we choose γ = 10^12, reasoning that this will not change the optimal solution. Note that the large variable bound becomes a penalty term in the dual problem; in finite precision such a large bound will effectively destroy the accuracy of the solution.

2.3 Duality in linear optimization


Duality theory is a rich and powerful area of convex optimization, and central to understanding
sensitivity analysis and infeasibility issues in linear (and convex) optimization. Furthermore,
it provides a simple and systematic way of obtaining non-trivial lower bounds on the optimal
value for many difficult non-convex problems. In this section we only discuss duality theory at a descriptive level suited for practitioners; we refer to Appendix B for a more advanced treatment.

2.3.1 The dual problem


Initially, consider the standard linear optimization problem

minimize cT x
subject to Ax = b (2.7)
x ≥ 0.

Associated with (2.7) is a so-called Lagrangian function L : Rn × Rm × Rn 7→ R that augments


the objective with a weighted combination of all the constraints,

L(x, y, s) = cT x + y T (b − Ax) − sT x.

The variables y ∈ Rm and s ∈ R+^n are called Lagrange multipliers or dual variables. It is easy
to verify that
L(x, y, s) ≤ cT x
for any feasible x. Indeed, we have b − Ax = 0 and xT s ≥ 0 since x, s ≥ 0, i.e., L(x, y, s) ≤ cT x.
Note the importance of nonnegativity of s; more generally of all Lagrange multipliers associated
with inequality constraints. Without the nonnegativity constraint the Lagrangian function is
not a lower bound.
The dual function is defined as the minimum of L(x, y, s) over x. Thus the dual function of
(2.7) is
g(y, s) = min_x L(x, y, s) = min_x xT (c − AT y − s) + bT y.

We see that the Lagrangian function is linear in x, so it is unbounded below unless c − AT y − s = 0, i.e.,

g(y, s) = bT y if c − AT y − s = 0, and g(y, s) = −∞ otherwise.
Finally, we get the dual problem by maximizing g(y, s). The dual problem of (2.7) is
maximize bT y
subject to c − AT y = s (2.8)
s ≥ 0.

Example 2.1 (Dual of basis pursuit). As another example, let us derive the dual of the basis
pursuit formulation (2.5). The Lagrangian function is

L(x, z, y, u, v) = eT z + uT (x − z) − v T (x + z) + y T (b − Ax)

with Lagrange multipliers y ∈ Rm and u, v ∈ Rn+ . The dual function

g(y, u, v) = min_{x,z} L(x, z, y, u, v) = min_{x,z} zT (e − u − v) + xT (u − v − AT y) + yT b

is linear in z and x so unbounded below unless e = u + v and AT y = u − v, i.e., the dual


problem is
maximize bT y
subject to e = u + v,
(2.9)
AT y = u − v
u, v ≥ 0.

Example 2.2 (Dual of basis pursuit revisited). We can also derive the dual of the basis pursuit
formulation (2.4) directly. The Lagrangian is
L(x, y) = kxk1 + y T (Ax − b) = kxk1 + xT AT y − bT y
with a dual function
g(y) = −bT y + min_x (kxk1 + xT AT y).
The term min_x (kxk1 + xT AT y) can be simplified as

min_x (kxk1 + xT AT y) = min_{t≥0} min_{kzk1=1} (t kzk1 + t zT AT y)
                       = min_{t≥0} t (1 − max_{kzk1=1} zT AT y)
                       = min_{t≥0} t (1 − kAT yk∞),

where we used the definition of the dual norm in the last line. Finally, min_{t≥0} t(1 − kAT yk∞) is 0 if kAT yk∞ ≤ 1 and unbounded below otherwise. In other words, we get a dual function

g(y) = −bT y if kAT yk∞ ≤ 1, and g(y) = −∞ otherwise,
and a dual problem
maximize bT y
(2.10)
subject to kAT yk∞ ≤ 1.
Not surprisingly, (2.9) and (2.10) are equivalent in the sense that
e = u + v, AT y = u − v, u, v ≥ 0 ⇐⇒ kAT yk∞ ≤ 1.

2.3.2 Duality properties
Many important observations can be made for the dual problem (2.8), which we discuss in detail in Appendix B. We summarize the most important properties below; to that end let x?
and (y ? , s? ) denote the optimal primal and dual solutions with optimal values p? := cT x? and
d? := bT y ? , respectively.

• In both the primal and dual problem we optimize over the nonnegative orthant, i.e.,
x, s ≥ 0. The number of constraints in the primal problem becomes the number of
variables in the dual problem, and vice versa.

• The dual problem gives a lower bound on the primal problem for all (x, y, s). From the
dual problem we have c − AT y = s, so

xT (c − AT y) = xT s ≥ 0,

and since Ax = b we have


cT x − bT y = xT s ≥ 0
i.e., bT y ≤ cT x. This is called the weak duality property, and the nonnegative entity xT s
is called the complementarity gap.

• If the primal feasible set is nonempty

Fp = {x ∈ Rn | Ax = b, x ≥ 0} ≠ ∅

or the dual feasible set is nonempty

Fd = {y ∈ Rm | c − AT y ≥ 0} ≠ ∅

then the optimal primal and dual values coincide, i.e.,

d? = cT x? = bT y ? = p? .

This remarkable fact (see Appendix B.5 for a proof) is called the strong duality property.
If strong duality holds then (x? )T s? = 0 so x? and s? are complementary; since x? , s? ≥ 0
we have that x?i > 0 ⇔ s?i = 0 and vice versa.
For linear (and conic) optimization strong duality means that we have the choice of solving
either the primal or the dual problem (or both).
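As a hedged numerical illustration of these properties, the following sketch solves the primal problem (2.7) and the dual problem (2.8) with SciPy's linprog and compares the optimal values; the random data is an assumption, constructed so that both problems are feasible.

```python
# Weak/strong duality for the pair (2.7)/(2.8), checked numerically on random feasible data.
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(2)
m, n = 3, 6
A = rng.standard_normal((m, n))
x0 = rng.random(n)                          # strictly positive point => primal feasible
b = A @ x0
y0 = rng.standard_normal(m)
s0 = rng.random(n)
c = A.T @ y0 + s0                           # dual feasible by construction => primal bounded

# primal: min c^T x  s.t. Ax = b, x >= 0
primal = linprog(c, A_eq=A, b_eq=b, bounds=[(0, None)] * n)

# dual: max b^T y  s.t. A^T y <= c  (linprog minimizes, so negate the objective)
dual = linprog(-b, A_ub=A.T, b_ub=c, bounds=[(None, None)] * m)

print(primal.fun, -dual.fun)                # p* and d* coincide (strong duality)
```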

2.4 Infeasibility in linear optimization


2.4.1 Basic concepts
In Sec. 2.3.2 we summarized the main duality properties, namely weak and strong duality
properties. In this section we discuss situations where strong duality does not hold. Those
situations are captured by the following two results known as (variations of) Farkas’ lemma; for
proofs see Appendix B.

Lemma 2.1 (Farkas). Given A and b, exactly one of the following two statements is true:

1. There exists an x ≥ 0 such that Ax = b.

2. There exists a y such that AT y ≤ 0 and bT y > 0.

Lemma 2.2 (Farkas dual variant). Given A and c, exactly one of the following two statements is true:
1. There exists an x ≥ 0 such that Ax = 0 and cT x < 0.

2. There exists a y such that c − AT y ≥ 0.


Farkas’ lemma (Lemma 2.1) tells us that either the primal problem (2.7) is feasible (Fp ≠ ∅) or there
exists a y such that AT y ≤ 0 and bT y > 0. In other words, any y satisfying

AT y ≤ 0, bT y > 0

is a certificate of primal infeasibility. We can also think of a Farkas certificate as an unbounded


direction for the dual problem; to that end assume that

∄ x ≥ 0 : Ax = b,

so we have a y satisfying AT y ≤ 0 and bT y > 0. If we further assume the existence of a point y0 satisfying

c − AT y0 ≥ 0
then the dual remains feasible in the direction of y,

c − AT (ty + y0 ) ≥ 0, ∀t ≥ 0

with an unbounded objective bT (ty + y0 ) → ∞ for t → ∞, i.e., d? = ∞.


Similarly, the dual variant of Farkas’ lemma (Lemma 2.2) states that either the dual problem is feasible (Fd ≠ ∅) or there exists an x ≥ 0 such that Ax = 0 and cT x < 0. In other words, any x ≥ 0 satisfying Ax = 0 and cT x < 0 is a certificate of dual infeasibility. If the primal problem is feasible, then the certificate is a feasible unbounded direction for the primal objective, i.e., p? = −∞.
Below we summarize the different cases that can occur in linear optimization:
• If either the primal or the dual problem is feasible, we have strong duality, i.e., p? = d? .

• If the primal problem is infeasible (p? = ∞), then from Farkas’ lemma the dual problem
is unbounded (d? = ∞) or infeasible (d? = −∞).

• If the primal problem is unbounded (p? = −∞), then from weak duality the dual problem
is infeasible (d? = −∞).

• If the dual problem is infeasible (d? = −∞), then from the dual variant of Farkas’ lemma the primal problem is unbounded (p? = −∞) or infeasible (p? = ∞).

Example 2.3. As an example exhibiting both primal and dual infeasibility consider the prob-
lem
minimize −x1 − x2
subject to x1 = −1
x1 , x2 ≥ 0
with a dual problem
maximize −y
   
subject to (1, 0)T y ≤ (−1, −1)T .
Both the primal and dual problems are trivially infeasible; y = −1 serves as a certificate of
primal infeasibility, and x = (0, 1) is a certificate of dual infeasibility.

2.4.2 Locating infeasibility
In some cases we are interested in locating the cause of infeasibility in a model, for example if we expect the infeasibility to be caused by an error in the problem formulation. This can be difficult
in practice, but a Farkas certificate lets us reduce the dimension of the infeasible problem, which
in some cases pinpoints the cause of infeasibility.
To that end, suppose we are given a certificate of primal infeasibility,

AT y ≤ 0, bT y > 0,

and define the index-set


I = {i | yi ≠ 0}.
Consider the reduced set of constraints

AI,: x = bI , x ≥ 0.

It is then easy to verify that


ATI,: yI ≤ 0, bTI yI > 0
is an infeasibility certificate for the reduced problem with fewer constraints. If the reduced
system is sufficiently small, it may be possible to locate the cause of infeasibility by manual
inspection.
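A minimal sketch of this reduction is shown below; the helper function, its name and the tolerance are illustrative assumptions and not part of any particular solver API.

```python
# Given a Farkas certificate y for Ax = b, x >= 0, keep only the rows with nonzero multipliers.
import numpy as np

def reduce_infeasible_rows(A, b, y, tol=1e-9):
    """Return the subsystem A[I,:] x = b[I], x >= 0 selected by the certificate y."""
    assert np.all(A.T @ y <= tol) and b @ y > tol   # sanity check: y really is a certificate
    I = np.flatnonzero(np.abs(y) > tol)
    # y[I] remains a certificate for the reduced system, which is often small enough
    # to inspect by hand.
    return A[I, :], b[I], y[I]
```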

2.5 Bibliography
The material in this chapter is very basic, and can be found in any textbook on linear optimiza-
tion. For further details, we suggest a few standard references [Chv83, BT97, PS98], which all
cover much more than discussed here. Although the first edition of [PS98] is not very recent,
it is still considered one of the best textbooks on graph and network flow algorithms. [NW06]
gives a more modern treatment of both theory and algorithmic aspects of linear optimization.

Chapter 3

Conic quadratic optimization

3.1 Introduction
This chapter extends the notion of linear optimization with quadratic cones; conic quadratic
optimization is a straightforward generalization of linear optimization, in the sense that we
optimize a linear function with linear (in)equalities with variables belonging to one or more
(rotated) quadratic cones. In general we also allow some of the variables to be linear variables
as long as some of the variables belong to a quadratic cone.
We discuss the basic concept of quadratic cones, and demonstrate the surprisingly large
flexibility of conic quadratic modeling with several examples of (non-trivial) convex functions
or sets that be represented using quadratic cones. These convex sets can then be combined
arbitrarily to form different conic quadratic optimization problems.
We finally extend the duality theory and infeasibility analysis from linear to conic optimiza-
tion, and discuss infeasibility of conic quadratic optimization problems.

3.1.1 Quadratic cones


We define an n-dimensional quadratic cone as
Qn = { x ∈ Rn | x1 ≥ √(x2² + x3² + · · · + xn²) }.   (3.1)

The geometric interpretation of a quadratic (or second-order) cone is shown in Fig. 3.1 for a
cone with three variables, and illustrates how the exterior of the cone resembles an ice-cream
cone. The 1-dimensional quadratic cone simply implies standard nonnegativity x1 ≥ 0.
A set S is called a convex cone if for any x ∈ S we have αx ∈ S, ∀α ≥ 0. From the definition
(3.1) it is clear that if x ∈ Qn then obviously αx ∈ Qn , ∀α ≥ 0, which justifies the notion
quadratic cone.

3.1.2 Rotated quadratic cones


An n-dimensional rotated quadratic cone is defined as
Q_r^n = { x ∈ Rn | 2x1x2 ≥ x3² + · · · + xn², x1, x2 ≥ 0 }.   (3.2)
As the name indicates, there is simple relationship between quadratic and rotated quadratic
cones. Define an orthogonal transformation
 √ √ 
1/√2 1/√2 0
Tn :=  1/ 2 −1/ 2 0 . (3.3)
0 0 In−2

Figure 3.1: A quadratic or second-order cone satisfying x1 ≥ √(x2² + x3²).

Then it is easy to verify that

x ∈ Qn ⇐⇒ (Tn x) ∈ Qnr ,

and since Tn is orthogonal we call Qr a rotated cone; the transformation corresponds to a rotation by π/4 in the (x1, x2) plane. For example, if x ∈ Q3 and

(z1, z2, z3) = ( (x1 + x2)/√2, (x1 − x2)/√2, x3 ),

then
2z1z2 ≥ z3², z1, z2 ≥ 0 ⟹ (x1² − x2²) ≥ x3², x1 ≥ 0,
and similarly (by interchanging the roles of x and z) we see that
x1² ≥ x2² + x3², x1 ≥ 0 ⟹ 2x1x2 ≥ x3², x1, x2 ≥ 0.

Thus, one could argue that we only need quadratic cones, but there are many examples of
functions where using an explicit rotated quadratic conic formulation is more natural; in Sec. 3.2
we discuss many examples involving both quadratic cones and rotated quadratic cones.
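The relation between Q^3 and Q_r^3 via the transformation T_3 is easy to verify numerically; the following sketch is purely illustrative (random test points, chosen only for the check).

```python
# Check x in Q^3  <=>  T_3 x in Q_r^3 on random points.
import numpy as np

def in_quadratic_cone(x):
    return x[0] >= np.linalg.norm(x[1:])

def in_rotated_cone(x):
    return x[0] >= 0 and x[1] >= 0 and 2 * x[0] * x[1] >= np.sum(x[2:] ** 2)

T3 = np.array([[1 / np.sqrt(2),  1 / np.sqrt(2), 0],
               [1 / np.sqrt(2), -1 / np.sqrt(2), 0],
               [0,               0,              1]])

rng = np.random.default_rng(3)
for _ in range(1000):
    x = rng.standard_normal(3)
    assert in_quadratic_cone(x) == in_rotated_cone(T3 @ x)
print("Q^3 and Q_r^3 are related by T_3, as claimed")
```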

3.2 Conic quadratic modeling


In the following we describe several convex sets that can be modeled using conic quadratic
formulations. We describe the convex sets in their simplest embodiment, which we interpret
as conic quadratic building blocks; it is then straightforward to combine those in arbitrary
intersections and affine mappings to model complex sets. For example we will show that the set
1/x ≤ t,   x ≥ 0
can be represented using quadratic cones. Sets that can be represented using quadratic cones
are called conic quadratic representable sets.

3.2.1 Absolute values
In Sec. 2.2.2 we saw how to model |x| ≤ t using two linear inequalities, but in fact the epigraph
of the absolute value is just the definition of a two-dimensional quadratic cone, i.e.,

|x| ≤ t ⇐⇒ (t, x) ∈ Q2 .

3.2.2 Euclidean norms


The Euclidean norm of x ∈ Rn ,
kxk2 = √(x1² + x2² + · · · + xn²)

essentially defines the quadratic cone, i.e.,

kxk2 ≤ t ⇐⇒ (t, x) ∈ Qn+1 .

3.2.3 Squared Euclidean norms


The epigraph of the squared Euclidean norm can be described as the intersection of a rotated
quadratic cone with an affine hyperplane,

kxk2² ≤ t ⇐⇒ (1/2, t, x) ∈ Q_r^{n+2}.

3.2.4 Convex quadratic sets


Assume Q ∈ Rn×n is a symmetric positive semidefinite matrix. The convex inequality

(1/2)xT Qx + cT x + r ≤ 0

may be rewritten as
t + cT x + r = 0,
(3.4)
xT Qx ≤ 2t.
Since Q is symmetric positive semidefinite the epigraph

xT Qx ≤ 2t (3.5)

is a convex set and there exists a matrix F ∈ Rk×n such that

Q = F T F, (3.6)

(see Chap. 4 for properties of semidefinite matrices); for instance F could be the Cholesky
factorization of Q. We then have an equivalent characterization of (3.5) as

F x − y = 0,
s = 1,
kyk2² ≤ 2st,

or more succinctly: if Q = F T F then

(1/2) xT Qx ≤ t ⇐⇒ (t, 1, F x) ∈ Q_r^{2+k}.

Frequently Q has the structure


Q = I + F T F

where I is the identity matrix, so

xT Qx = xT x + xT F T F x

and hence

F x − y = 0,   f + h = t,
kxk2² ≤ 2ef,   e = 1,
kyk2² ≤ 2gh,   g = 1.

In other words,

(f, 1, x) ∈ Q_r^{2+n},   (h, 1, F x) ∈ Q_r^{2+k},   f + h = t

is a conic quadratic representation of (3.5).
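The factorization step above is straightforward to carry out with a Cholesky factor. The following numpy sketch uses made-up positive definite data Q (an assumption for the example) to verify that x^T Q x = kF xk² and that the rotated-cone membership holds for the smallest feasible t.

```python
# With Q = F^T F, the constraint (1/2) x^T Q x <= t becomes (t, 1, F x) in Q_r^{2+k}.
import numpy as np

rng = np.random.default_rng(4)
n = 5
G = rng.standard_normal((n, n))
Q = G @ G.T + 1e-3 * np.eye(n)             # symmetric positive definite (illustrative)
F = np.linalg.cholesky(Q).T                # Q = F^T F

x = rng.standard_normal(n)
assert np.isclose(x @ Q @ x, np.sum((F @ x) ** 2))

def in_rotated_cone(z):                    # z = (z1, z2, z3, ..., zk): 2 z1 z2 >= ||z3:||^2
    return z[0] >= 0 and z[1] >= 0 and 2 * z[0] * z[1] >= np.sum(z[2:] ** 2)

t = 0.5 * np.sum((F @ x) ** 2)             # smallest feasible t for this x
print(in_rotated_cone(np.concatenate(([t], [1.0], F @ x))))   # True
```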

3.2.5 Second-order cones


A second-order cone is occasionally specified as

kAx + bk2 ≤ cT x + d (3.7)

where A ∈ Rm×n and c ∈ Rn . The formulation (3.7) is equivalent to

(cT x + d, Ax + b) ∈ Qm+1

or equivalently

t = cT x + d,
s = Ax + b,                      (3.8)
(t, s) ∈ Qm+1.
As will be explained later, we refer to (3.7) as the dual form and (3.8) as the primal form,
respectively. An alternative characterization of (3.7) is

kAx + bk22 − (cT x + d)2 ≤ 0, cT x + d ≥ 0 (3.9)

which shows that certain quadratic inequalities are conic quadratic representable. The next
section shows such an example.

3.2.6 Simple polynomial sets


We next show how to represent some frequently occurring convex sets described by polynomials. The result is stated in Lemma 3.1.
Lemma 3.1. The following six propositions are true.

(i) {(t, x) | |t| ≤ √x, x ≥ 0} = {(t, x) | (x, 1/2, t) ∈ Q_r^3}.

(ii) {(t, x) | t ≥ 1/x, x ≥ 0} = {(t, x) | (x, t, √2) ∈ Q_r^3}.

(iii) {(t, x) | t ≥ x^{3/2}, x ≥ 0} = {(t, x) | (s, t, x), (x, 1/8, s) ∈ Q_r^3}.

(iv) {(t, x) | t ≥ x^{5/3}, x ≥ 0} = {(t, x) | (s, t, x), (1/8, z, s), (s, x, z) ∈ Q_r^3}.

(v) {(t, x) | t ≥ 1/x², x ≥ 0} = {(t, x) | (t, 1/2, s), (x, s, √2) ∈ Q_r^3}.

(vi) {(t, x, y) | t ≥ |x|³/y², y ≥ 0} = {(t, x, y) | (z, x) ∈ Q², (y/2, s, z), (t/2, z, s) ∈ Q_r^3}.

Proof. Proposition (i) follows from
(x, 1/2, t) ∈ Q_r^3 ⇔ x ≥ t², x ≥ 0 ⇔ √x ≥ |t|, x ≥ 0.
Proposition (ii) follows from
(x, t, √2) ∈ Q_r^3 ⇔ 2xt ≥ 2, x, t ≥ 0 ⇔ t ≥ 1/x, x ≥ 0.
Proposition (iii) follows from
(s, t, x), (x, 1/8, s) ∈ Q_r^3
⇔ 2st ≥ x², (1/4)x ≥ s², s, t, x ≥ 0
⇔ √x t ≥ x², t, x ≥ 0
⇔ t ≥ x^{3/2}, x ≥ 0.
Proposition (iv) follows from
(s, t, x), (1/8, z, s), (s, x, z) ∈ Q_r^3
⇔ (1/4)z ≥ s², 2sx ≥ z², 2st ≥ x², s, t, x ≥ 0
⇔ 2sx ≥ (4s²)², 2st ≥ x², s, t, x ≥ 0
⇔ x ≥ 8s³, 2st ≥ x², x, s ≥ 0
⇔ x^{1/3} t ≥ x², x ≥ 0
⇔ t ≥ x^{5/3}, x ≥ 0.
Proposition (v) follows from
(t, 1/2, s), (x, s, √2) ∈ Q_r^3 ⇔ t ≥ s², 2xs ≥ 2, t, x ≥ 0 ⇔ t ≥ 1/x².
Finally, proposition (vi) follows from
(z, x) ∈ Q², (y/2, s, z), (t/2, z, s) ∈ Q_r^3
⇔ z ≥ |x|, ys ≥ z², zt ≥ s², z, y, s, t ≥ 0
⇔ z ≥ |x|, zt ≥ z⁴/y², z, y, t ≥ 0
⇔ t ≥ |x|³/y², y ≥ 0.

In the above propositions the terms x^{3/2} and x^{5/3} appeared. Those terms are special cases of

t ≥ x^{(2k−1)/k}, x ≥ 0   (3.10)

defined for k = 2, 3, 4, 5, . . . . (3.10) is identical to

t x^{1/k} ≥ x², x ≥ 0,

which implies

2tz ≥ x², x, z ≥ 0,
x^{1/k} ≥ 2z.

The latter can be rewritten as

2tz ≥ x², x, z ≥ 0,
x ≥ 2^k z^k,

for which it is easy to build a conic quadratic representation. Therefore we will leave it as an exercise for the reader.

3.2.7 Geometric mean
Closely related is the hypograph of the geometric mean of nonnegative variables x1 and x2,

Kgm^2 = {(x1, x2, t) | √(x1 x2) ≥ t, x1, x2 ≥ 0}.

This corresponds to a scaled rotated quadratic cone

Kgm^2 = {(x1, x2, t) | (x1, x2, √2 t) ∈ Q_r^3}.

More generally, we can obtain a conic quadratic representation of the hypograph


Kgm^n = { (x, t) ∈ Rn+1 | (x1 x2 · · · xn)^{1/n} ≥ t, x ≥ 0 }.

Initially let us assume that n = 2^l for an integer l ≥ 1. In particular, let us assume l = 3, i.e.,
we seek a conic quadratic representation of

t ≤ (x1 x2 · · · x8 )1/8 , x ≥ 0. (3.11)

If we introduce the cone relationships

(x1 , x2 , y1 ), (x3 , x4 , y2 ), (x5 , x6 , y3 ), (x7 , x8 , y4 ) ∈ Q3r , (3.12)

then
2x1x2 ≥ y1²
and so forth, which is obviously equivalent to
x1 x2 ≥ (1/2) y1².

This leads to a characterization

((1/2)y1² (1/2)y2² (1/2)y3² (1/2)y4²)^{1/8} ≤ (x1 x2 · · · x8)^{1/8},

or equivalently

(1/√2) (y1 y2 y3 y4)^{1/4} ≤ (x1 x2 · · · x8)^{1/8}.
Hence, by introducing 4 3-dimensional rotated quadratic cones, we have obtained a simpler
problem with 4 y variables instead of 8 x variables. Clearly, we can apply the same idea to the
reduced problem. To that end, introduce the inequalities

(y1 , y2 , z1 ), (y3 , y4 , z2 ) ∈ Q3r ,

implying that
(1/√2) (z1 z2)^{1/2} ≤ (y1 y2 y3 y4)^{1/4}.
Finally, if we introduce the conic relationship

(z1 , z2 , w1 ) ∈ Q3r

we get
w1 ≤ √2 (z1 z2)^{1/2} ≤ √4 (y1 y2 y3 y4)^{1/4} ≤ √8 (x1 x2 · · · x8)^{1/8}.


Figure 3.2: Tree-structure of relaxations.

Therefore

(x1, x2, y1), (x3, x4, y2), (x5, x6, y3), (x7, x8, y4) ∈ Q_r^3,
(y1, y2, z1), (y3, y4, z2) ∈ Q_r^3,                              (3.13)
(z1, z2, w1) ∈ Q_r^3,
w1 = √8 t

is a representation of the set (3.11), which is by construction conic quadratic representable. The relaxation using l levels of auxiliary variables and rotated quadratic cones can be represented using a tree-structure as shown in Fig. 3.2.
Next let us assume that n is not a power of two, for example n = 6. We then wish to characterize the set
t ≤ (x1 x2 x3 x4 x5 x6)^{1/6}, x ≥ 0,
which is obviously equivalent to
t ≤ (x1 x2 x3 x4 x5 x6 x7 x8)^{1/8}, x7 = x8 = t, x ≥ 0.
In other words, we can reuse the result (3.13) if we add the simple equality constraints x7 = x8 = t. Thus, if n is not a power of two, we take l = ⌈log2 n⌉ and build the conic quadratic representation for that set, and we add 2^l − n simple equality constraints.
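The recursive construction can be automated. The following Python sketch is illustrative only (symbolic variable names, no solver involved): it enumerates the rotated-cone triples and the final linear equality for a given n, padding with copies of t when n is not a power of two.

```python
import math

def geometric_mean_cones(n):
    """List the Q_r^3 memberships (a, b, c) (meaning 2ab >= c^2, a, b >= 0) that
    model t <= (x_1 * ... * x_n)^(1/n), x >= 0, after padding n up to a power of two."""
    depth = max(1, math.ceil(math.log2(n)))
    N = 2 ** depth
    layer = [f"x{i + 1}" for i in range(n)] + ["t"] * (N - n)    # pad with copies of t
    cones, level = [], 0
    while len(layer) > 1:
        nxt = []
        for j in range(0, len(layer), 2):
            aux = f"y{level}_{j // 2}"                 # auxiliary variable for this pair
            cones.append((layer[j], layer[j + 1], aux))
            nxt.append(aux)
        layer, level = nxt, level + 1
    # the root of the tree is tied to t by one linear equality: root = sqrt(N) * t
    return cones, (layer[0], f"sqrt({N})*t")

cones, (root, rhs) = geometric_mean_cones(6)
for cone in cones:
    print(cone)
print(f"linear constraint: {root} = {rhs}")
```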

3.2.8 Harmonic mean


We can also consider the hypograph of the harmonic mean,
( (1/n) ∑_{i=1}^n xi^{−1} )^{−1} ≥ t,   x ≥ 0.

This is a convex inequality, but not conic quadratic representable. However, the reciprocal
inequality
(1/n) ∑_{i=1}^n xi^{−1} ≤ y,   x ≥ 0,   (3.14)
with y = 1/t can be characterized as
xi zi ≥ 1, i = 1, . . . , n,   ∑_{i=1}^n zi = ny.

In other words, the set (3.14) corresponding to the epigraph of the reciprocal harmonic mean of x can be described using an intersection of rotated quadratic cones and affine hyperplanes,

2 xi zi ≥ wi²,   xi, zi ≥ 0,   wi = √2,   ∑_{i=1}^n zi = ny,

or (xi, zi, √2) ∈ Q_r^3, eT z = ny.

3.2.9 Convex increasing rational powers


We can extend the formulations in Sec. 3.2.7 to describe the epigraph

K p/q = {(x, t) | xp/q ≤ t, x ≥ 0}

for any rational convex power p/q ≥ 1, p, q ∈ Z+ . For example, consider

x5/3 ≤ t, x ≥ 0.

We rewrite this as
x5 ≤ t3 , x ≥ 0,
which can be described (using the approach in Sec. 3.2.7) as

0 ≤ x ≤ (y1 y2 y3 y4 y5 )1/5 , y1 = y2 = y3 = t, y4 = y5 = 1,

where we note that


y1 y2 y3 y4 y5 = t3 .
The inequality
0 ≤ x ≤ (y1 y2 y3 y4 y5 )1/5
is a geometric mean inequality discussed in Sec. 3.2.7. If we let ek ∈ Rk denote the vector of
all ones, then for general p/q ≥ 1, p, q ∈ Z+ we have

K^{p/q} = {(x, t) | (eq t, ep−q, x) ∈ Kgm^p}.

3.2.10 Convex decreasing rational powers


In a similar way we can describe the epigraph

K −p/q = {(x, t) | x−p/q ≤ t, x ≥ 0}

for any p, q ∈ Z+ . For example, consider

x−5/2 ≤ t, x ≥ 0,

which we rewrite as
1 ≤ x5 t 2 , x ≥ 0,
or equivalently as

1 ≤ (y1 y2 . . . y7 )1/7 , y1 = · · · = y5 = x, y6 = y7 = t, y, t ≥ 0.

Let ek ∈ Rk denote the vector of all ones. For general p, q ∈ Z+ we then have

K^{−p/q} = {(x, t) | (ep x, eq t, 1) ∈ Kgm^{p+q}}.

3.2.11 Power cones
The (n + 1)-dimensional power cone is defined by
 
Kα^{n+1} = { (x, y) ∈ R+^n × R | |y| ≤ ∏_{j=1}^n xj^{αj} }   (3.15)

where α > 0 and ∑_{j=1}^n αj = 1.
Of particular interest is the three-dimensional power cone given by
Kα^3 = { (x, y) ∈ R+^2 × R | |y| ≤ x1^{α1} x2^{1−α1} }.   (3.16)

For example, the intersection of the three-dimensional power cone with an affine hyperplane
|y| ≤ x1^{α1} x2^{1−α1},   x2 = 1
is equivalent to
|y|^{1/α1} ≤ x1,
i.e., we can model the epigraph of convex increasing terms (1/α1 ≥ 1) using a three-dimensional power cone. As another example, the cone
|y| ≤ x1^{2/3} x2^{1−2/3}
is equivalent to
|y|³ / x1² ≤ x2.
Assume next that α1 = p/q where p and q are positive integers and p ≤ q. Then
K^3_{(p/q, 1−p/q)} = { (x, y) ∈ R+^2 × R | |y| ≤ x1^{p/q} x2^{1−p/q} }.   (3.17)

Now
|y| ≤ x1^{p/q} x2^{1−p/q}
is equivalent to
|y| ≤ (x1^p x2^{q−p})^{1/q},   (3.18)
so by introducing additional z variables and the constraints
z1 = z2 = · · · = zp = x1,   zp+1 = zp+2 = · · · = zq = x2
we can rewrite (3.18) as
|y| ≤ (z1 z2 . . . zq)^{1/q},   (3.19)
which essentially is the geometric mean inequality discussed above.
We next consider the (n + 1)-dimensional power cone with rational powers,

Kα^{n+1} = { (x, y) ∈ R+^n × R | |y| ≤ ∏_{j=1}^n xj^{pj/qj} }   (3.20)

where pj, qj are integers satisfying 0 < pj ≤ qj and ∑_{j=1}^n pj/qj = 1. To motivate the general procedure, we consider a simple example.

Example 3.1. Consider the 4-dimensional power cone

K^4_{(1/4, 3/8, 3/8)} = { (x, y) ∈ R+^3 × R | |y| ≤ x1^{1/4} x2^{3/8} x3^{3/8} }.

An equivalent representation is given by the geometric mean inequality

|y| ≤ (z1 z2 . . . z8)^{1/8}

with

z1 = z2 = x1,   z3 = z4 = z5 = x2,   z6 = z7 = z8 = x3.

In the general case, let

β = lcm(q1, q2, . . . , qn)

denote the least common multiple of {qj}. Then

Kα^{n+1} = { (x, y) ∈ R+^n × R | |y| ≤ ( ∏_{j=1}^n xj^{βpj/qj} )^{1/β} },   (3.21)

where we note that the βpj/qj are integers and ∑_{j=1}^n βpj/qj = β. If we define

sj = ∑_{k=1}^j βpk/qk,   j = 1, . . . , n − 1,

we can then reformulate (3.20) as

|y| ≤ (z1 z2 . . . zβ)^{1/β},
z1 = · · · = zs1 = x1,   zs1+1 = · · · = zs2 = x2,   . . . ,   zsn−1+1 = · · · = zβ = xn.
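The bookkeeping in this construction (computing β and the number of copies of each xj) is easy to automate. The following Python helper is an illustrative sketch (its name is an assumption) and reproduces the counts of Example 3.1.

```python
from math import gcd
from functools import reduce

def lcm(nums):
    return reduce(lambda a, b: a * b // gcd(a, b), nums)

def power_cone_multiplicities(ps, qs):
    """For exponents p_j/q_j summing to one, return beta = lcm(q_j) and the number of
    copies of each x_j in the geometric-mean inequality |y| <= (z_1 ... z_beta)^(1/beta)."""
    beta = lcm(qs)
    counts = [beta * p // q for p, q in zip(ps, qs)]
    assert sum(counts) == beta                       # requires sum p_j/q_j = 1
    return beta, counts

beta, counts = power_cone_multiplicities([1, 3, 3], [4, 8, 8])
print(beta, counts)   # 8 [2, 3, 3]: z contains 2 copies of x1, 3 of x2 and 3 of x3
```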

3.2.12 Quadratic forms with one negative eigenvalue


Assume that A ∈ Rn×n is a symmetric matrix with exactly one negative eigenvalue, i.e., A has
a spectral factorization (i.e., eigenvalue decomposition)
A = QΛQT = −α1 q1 q1T + ∑_{i=2}^n αi qi qiT

where QT Q = I, Λ = diag(−α1 , α2 , . . . , αn ), αi ≥ 0, ∀i. Then

xT Ax ≤ 0

is equivalent to
∑_{j=2}^n αj (qjT x)² ≤ α1 (q1T x)².   (3.22)

Suppose q1T x ≥ 0. We can characterize (3.22) in different ways, e.g., as


( √(α1/2) q1T x, √(α1/2) q1T x, √α2 q2T x, . . . , √αn qnT x ) ∈ Q_r^{n+1}   (3.23)

or as
( √α1 q1T x, √α2 q2T x, . . . , √αn qnT x ) ∈ Qn.   (3.24)

The latter equivalence follows directly by using the orthogonal transformation Tn+1 on (3.23), i.e.,

Tn+1 ( √(α1/2) q1T x, √(α1/2) q1T x, √α2 q2T x, . . . , √αn qnT x ) = ( √α1 q1T x, 0, √α2 q2T x, . . . , √αn qnT x ),

and dropping the zero entry.

3.2.13 Ellipsoidal sets


The set
E = {x ∈ Rn | kP (x − c)k2 ≤ 1}
describes an ellipsoid centered at c. It has a natural conic quadratic representation, i.e., x ∈ E
if and only if
y = P (x − c), (t, y) ∈ Qn+1 , t = 1.
Suppose P is nonsingular. We then get an alternative characterization

E = {x ∈ Rn | x = P −1 y + c, kyk2 ≤ 1}.

Depending on the context one characterization may be more useful than the other.

3.3 Conic quadratic optimization


In this section we give different instructive examples of conic quadratic optimization problems
formed using the formulations from Sec. 3.2.

3.3.1 Quadratically constrained quadratic optimization


A general convex quadratically constrained quadratic optimization problem can be written as

minimize (1/2)xT Q0 x + cT0 x + r0


(3.25)
subject to (1/2)xT Qi x + cTi x + ri ≤ 0, i = 1, . . . , p,

where all Qi are symmetric positive semidefinite matrices. Let

Qi = FiT Fi , i = 0, . . . , p,

where Fi ∈ Rki ×n . Using the formulations in Sec. 3.2.4 we then get an equivalent conic quadratic
problem
minimize t0 + cT0 x + r0
subject to ti + cTi x + ri = 0, i = 1, . . . , p, (3.26)
(ti, 1, Fi x) ∈ Q_r^{ki+2}, i = 0, . . . , p.
Assume next that Qi is a rank 1 matrix, i.e., Fi has 1 row and n columns. Storing Qi requires about n²/2 space whereas storing Fi only requires n space. Moreover, the amount of work required to evaluate

xT Qi x

is proportional to n², whereas the work required to evaluate

xT FiT Fi x = kFi xk2²

is proportional to n only. In other words, if the Qi have low rank, then (3.26) will require much less space to solve than (3.25). This fact usually also translates into much faster solution times.

3.3.2 Robust optimization with ellipsoidal uncertainties


Often in robust optimization some of the parameters in the model are assumed to be unknown,
but we assume that the unknown parameters belong to a simple set describing the uncertainty.
For example, for a standard linear optimization problem (2.7) we may wish to find a robust
solution for all vectors c in an ellipsoid

E = {c ∈ Rn | c = F y + g, kyk2 ≤ 1}.

A common approach is then to optimize for the worst-case realization of c, i.e., we get a robust
version
minimize supc∈E cT x
subject to Ax = b (3.27)
x ≥ 0.
The worst-case objective can be evaluated as

sup_{c∈E} cT x = gT x + sup_{kyk2≤1} yT F T x = gT x + kF T xk2,

where we used that sup_{kuk2≤1} vT u = (vT v)/kvk2 = kvk2. Thus the robust problem (3.27) is
equivalent to
minimize g T x + kF T xk2
subject to Ax = b
x ≥ 0,
which can be posed as a conic quadratic problem

minimize g T x + t
subject to Ax = b
(3.28)
(t, F T x) ∈ Qn+1
x ≥ 0.
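As a hedged illustration, the robust formulation (3.28) can be prototyped directly in a modeling tool that supports second-order cone constraints. The following cvxpy sketch uses made-up data (A, b, g, F), constructed so that the problem is feasible and bounded; it is not MOSEK API code.

```python
# Robust LP with an ellipsoidal uncertainty set c = F y + g, ||y||_2 <= 1.
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(5)
m, n, k = 4, 8, 3
A = rng.standard_normal((m, n))
b = A @ rng.random(n)                      # chosen so that a feasible x >= 0 exists
g = rng.random(n)                          # nonnegative nominal objective => bounded below
F = rng.standard_normal((n, k))            # ellipsoid shape

x = cp.Variable(n, nonneg=True)
objective = cp.Minimize(g @ x + cp.norm(F.T @ x, 2))   # worst-case objective of (3.28)
prob = cp.Problem(objective, [A @ x == b])
prob.solve()
print(prob.status, prob.value)
```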

3.3.3 Markowitz portfolio optimization


In classical Markowitz portfolio optimization we consider investment in n stocks or assets held
over a period of time. Let xi denote the amount we invest in asset i, and assume a stochastic
model where the return of the assets is a random variable r with known mean

µ = Er

and covariance
Σ = E(r − µ)(r − µ)T .
The return of our investment is also a random variable y = rT x with mean (or expected return)

Ey = µT x

and variance (or risk)
E(y − Ey)² = xT Σx.
We then wish to rebalance our portfolio to achieve a compromise between risk and expected
return, e.g., we can maximize the expected return given an upper bound γ on the tolerable risk
and a constraint that our total investment is fixed,

maximize µT x
subject to xT Σx ≤ γ
eT x = 1
x ≥ 0.

Suppose we factor Σ = GGT (e.g., using a Cholesky or an eigenvalue decomposition). We then get a conic formulation

maximize    µT x
subject to  (√γ, GT x) ∈ Qn+1
            eT x = 1
            x ≥ 0.
In practice both the average return and covariance are estimated using historical data. A recent
trend is then to formulate a robust version of the portfolio optimization problem to combat the
inherent uncertainty in those estimates, e.g., we can constrain µ to an ellipsoidal uncertainty
set as in Sec. 3.3.2.
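The conic formulation above can likewise be prototyped in a few lines. The following cvxpy sketch uses made-up values of µ, G and γ purely for illustration; it is a sketch, not a recommended portfolio model.

```python
# Conic Markowitz model: maximize expected return subject to a bound on the risk.
import numpy as np
import cvxpy as cp

n = 4
mu = np.array([0.05, 0.08, 0.12, 0.02])
G = np.array([[0.15, 0.00, 0.00, 0.00],
              [0.05, 0.20, 0.00, 0.00],
              [0.02, 0.05, 0.25, 0.00],
              [0.00, 0.00, 0.00, 0.01]])
Sigma = G @ G.T                            # covariance, factored as Sigma = G G^T
gamma = 0.03                               # upper bound on the risk x^T Sigma x

x = cp.Variable(n, nonneg=True)
constraints = [cp.sum(x) == 1,
               cp.norm(G.T @ x, 2) <= np.sqrt(gamma)]   # (sqrt(gamma), G^T x) in Q^{n+1}
prob = cp.Problem(cp.Maximize(mu @ x), constraints)
prob.solve()
print(np.round(x.value, 4), prob.value)
```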

3.4 Duality in conic quadratic optimization


To discuss conic quadratic duality, we consider a primal problem
minimize    ⟨cl, xl⟩ + ∑_{j=1}^{nq} ⟨cqj, xqj⟩
subject to  Al xl + ∑_{j=1}^{nq} Aqj xqj = b                                  (3.29)
            xl ∈ R+^{nl},   xqj ∈ Q^{nj},

where cl ∈ R^{nl}, cqj ∈ R^{nj}, Al ∈ R^{m×nl}, Aqj ∈ R^{m×nj} and b ∈ Rm, i.e., a problem with both a linear cone and nq quadratic cones. We can also write (3.29) more compactly as

minimize cT x
subject to Ax = b (3.30)
x ∈ C,

with
x = (xl, xq1, . . . , xq_{nq}),   c = (cl, cq1, . . . , cq_{nq}),   A = [ Al  Aq1  · · ·  Aq_{nq} ],

for the cone C = R+^{nl} × Q^{n1} × · · · × Q^{nq}. We note that problem (3.30) resembles a standard
linear optimization, except for a more general cone. The Lagrangian function is
L(x, y, s) = cT x − yT (Ax − b) − xT s.

For linear optimization the choice of nonnegative Lagrange multipliers s ≥ 0 ensures that
xT s ≥ 0, ∀x ≥ 0 so the Lagrangian function provides a lower bound on the primal problem.

For conic quadratic optimization the choice of Lagrange multipliers is more involved, and we
need the notion of a dual cone

(Qn )∗ = {v ∈ Rn | uT v ≥ 0, ∀u ∈ Qn }.

Then for s ∈ (Qn )∗ we similarly have xT s ≥ 0, ∀x ∈ Qn , so the Lagrange function becomes a


global lower bound.
Lemma 3.2. The quadratic cone is self-dual, (Qn)∗ = Qn.
Proof. Assume first that u1 ≥ ku2:n k and v1 ≥ kv2:n k. Then

uT v = u1 v1 + uT2:n v2:n ≥ u1 v1 − ku2:n k · kv2:n k ≥ 0.

Conversely, assume that v_1 < \|v_{2:n}\|. Then u := (1, −v_{2:n}/\|v_{2:n}\|) ∈ Q^n satisfies

    u^T v = v_1 − \|v_{2:n}\| < 0,

i.e., v ∉ (Q^n)^*.
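The self-duality in Lemma 3.2 is easy to probe numerically; the following Python/NumPy sketch (an illustration added here, with made-up data) checks that inner products between cone members are never negative, and reproduces the separating u from the proof for a point outside the cone:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 6

def random_Qn(size):
    """Random points in Q^n: first entry >= norm of the remaining entries."""
    tail = rng.standard_normal((size, n - 1))
    head = np.linalg.norm(tail, axis=1) + rng.uniform(0, 1, size)
    return np.column_stack([head, tail])

U, V = random_Qn(1000), random_Qn(1000)
print(np.min(np.sum(U * V, axis=1)))       # >= 0 for members of Q^n

# A point with v_1 < ||v_{2:n}|| is separated exactly as in the proof
v = np.array([0.5, 1.0, -2.0, 0.0, 0.0, 0.0])
u = np.concatenate([[1.0], -v[1:] / np.linalg.norm(v[1:])])
print(u @ v)                               # = v_1 - ||v_{2:n}|| < 0, so v is not in (Q^n)^*
```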

The notion of a dual cone allows us to treat (3.30) as a linear optimization problem, where
the dual variables belong to the dual cone, i.e., we have a dual problem

maximize bT y
subject to c − AT y = s (3.31)
s ∈ C∗,

with a dual cone


C ∗ = C = Rl+ × Qn1 × · · · × Qnq .
Weak duality follows directly by taking inner product of x and the dual constraint,

cT x − xT AT y = cT x − bT y = xT s ≥ 0.

3.4.1 Primal versus dual form


Let us consider the dual formulation (3.31) without linear terms. If we eliminate the dual
variable s, we can rewrite (3.31) as

maximize bT y
subject to (cqj − (Aqj )T y) ∈ Qnj , j = 1, . . . , nq ,

or equivalently

maximize    b^T y
subject to  \|(c^q_j)_{2:n_j} − ((A^q_j)_{2:n_j,:})^T y\|_2 ≤ (c^q_j)_1 − ((A^q_j)_{1,:})^T y,   j = 1, . . . , n_q,

where the dual constraints

    \|(c^q_j)_{2:n_j} − ((A^q_j)_{2:n_j,:})^T y\|_2 ≤ (c^q_j)_1 − ((A^q_j)_{1,:})^T y

correspond to the so-called dual form conic constraints in Sec. 3.2.5. If the matrix

A = Aq1 . . . Aqnq ,


has more columns than rows then usually it is more efficient to solve the primal problem (3.30),
and if the opposite is the case, then it is usually more efficient to solve the dual problem (3.31).

3.4.2 Examples of bad duality states
The issue of strong duality is more complicated than for linear programming, for example the
primal (or dual) optimal values may be finite but unattained, as illustrated in the following
example.

Example 3.2. Consider the problem

minimize    x_1 − x_2
subject to  x_3 = 1                                                   (3.32)
            x_1 ≥ \sqrt{x_2^2 + x_3^2},

with feasible set {x_1 ≥ \sqrt{1 + x_2^2}, x_3 = 1}. The optimal objective value is p^* = 0 for
(x_1, x_2) → ∞, and we say that the objective is unattained. The dual problem is

maximize    y
subject to  1 ≥ \sqrt{1 + y^2},                                       (3.33)

with feasible set {y = 0} and optimal value d^* = 0, so a positive duality gap exists for all
bounded (x, y).

It is also possible to have a positive duality gap even when both the primal and dual optimal
objectives are attained.

Example 3.3. We consider the problem

minimize    x_3
subject to  x_1 + x_2 + u_1 + u_2 = 0
            x_3 + u_2 = 1                                             (3.34)
            \sqrt{x_2^2 + x_3^2} ≤ x_1
            \sqrt{u_2^2} ≤ u_1,

with two quadratic cones x ∈ Q3 , u ∈ Q2 . It follows from the inequality constraints that

x1 ≥ |x2 |, u1 ≥ |u2 |

and therefore x1 + x2 ≥ 0 and u1 + u2 ≥ 0. But then x1 + x2 = 0 and u1 + u2 = 0, which


combined with x3 = 1 − u2 gives a simplified problem

minimize    1 − u_2
subject to  \sqrt{x_1^2 + (1 − u_2)^2} ≤ x_1,

with feasible set {u_2 = 1, x_1 ≥ 0} and optimal value p^* = 0. The dual problem of (3.34) is

maximize    −y_2
subject to  y_1 ≥ \sqrt{y_1^2 + (1 − y_2)^2}                          (3.35)
            y_1 ≥ \sqrt{(y_1 − y_2)^2},

with feasible set {y_2 = 1, y_1 ≥ 1/2} and the optimal value is d^* = −1. For this example, both
the primal and dual optimal values are attained, but we have a positive duality gap p^* − d^* = 1.

To ensure strong duality for conic quadratic optimization, we need an additional regularity
assumption. Consider the primal feasible set for (3.30),
Fp = {x = (xl , xq1 , . . . , xqnq ) | Ax = b, xl ≥ 0, xqj ∈ Qnj }.
and dual feasible set for (3.31),
Fd = {y ∈ Rm | c − AT y = (sl , sq1 , . . . , sqnq ), sl ≥ 0, sqj ∈ Qnj },
respectively. If

    ∃x ∈ F_p : (x^q_j)_1 > \|(x^q_j)_{2:n_j}\|,   j = 1, . . . , n_q,

or

    ∃y ∈ F_d : c − A^T y = (s^l, s^q_1, . . . , s^q_{n_q}),   (s^q_j)_1 > \|(s^q_j)_{2:n_j}\|,   j = 1, . . . , n_q,
then strong duality between the problems (3.30) and (3.31) holds. The additional regularity
assumptions are called a Slater constraint qualification. In other words, strong duality holds if
the conic inequalities in the primal or dual problem are strictly feasible (the proofs in App. B.5
can be adapted to handle this case).
Thus the positive duality gap in Example 3.3 is caused by the fact that neither
the primal nor the dual problem is strictly feasible; the inequality in the primal problem

    u_1 ≥ \sqrt{u_2^2}

is not strict for any u in the primal feasible set, and similarly the inequality

    y_1 ≥ \sqrt{y_1^2 + (1 − y_2)^2}

is not strict for any y in the dual feasible set.

3.5 Infeasibility in conic quadratic optimization


Since duality theory for linear and conic quadratic optimization is almost identical, the same is
true for infeasibility concepts. In particular, consider a pair of primal and dual conic optimiza-
tion problems (3.30) and (3.31). We then have generalized versions of Farkas' lemma.
Lemma 3.3 (Generalized Farkas). Given A and b, exactly one of the following two statements is true:
1. There exists an x ∈ C such that Ax = b.
2. There exists a y such that −AT y ∈ C and bT y > 0.
Lemma 3.4 (Generalized Farkas, dual variant). Given A and c, exactly one of the following two
statements is true:
1. There exists an x ∈ C such that Ax = 0 and c^T x < 0.
2. There exists a y such that (c − A^T y) ∈ C.
Farkas’ lemma (3.3) tells us that either the primal problem (3.30) is feasible or there exists
a y such that −AT y ∈ C and bT y > 0. In other words, any y satisfying
−AT y ∈ C, bT y > 0
is a certificate of primal infeasibility. Similarly, the dual variant of Farkas’ lemma (3.4) states
that either the dual problem (3.31) is feasible or there exists an x ∈ C such that Ax = 0 and
cT x < 0. In other words, any x ∈ C satisfying Ax = 0 and cT x < 0 is a certificate of dual
infeasibility.
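Such certificates are straightforward to check numerically. The Python/NumPy sketch below (an added illustration with a deliberately tiny, hypothetical instance) constructs an infeasible system over Q^3 and verifies a certificate y satisfying −A^T y ∈ C and b^T y > 0:

```python
import numpy as np

# A tiny infeasible instance: x in Q^3 requires x_1 >= 0, but Ax = b forces x_1 = -1.
A = np.array([[1.0, 0.0, 0.0]])
b = np.array([-1.0])

y = np.array([-1.0])                      # candidate certificate
s = -A.T @ y                              # s = -A'y = (1, 0, 0)
in_cone = s[0] >= np.linalg.norm(s[1:])   # membership of s in Q^3 (self-dual)
print(in_cone, b @ y > 0)                 # True, True => certificate of primal infeasibility
```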

Example 3.4. Consider the conic quadratic problem

minimize    x_1
subject to  2x_1 − x_2 = 0
            x_1 − s = 1
            \sqrt{x_2^2} ≤ x_1
            s ≥ 0,

which is infeasible since x1 ≥ 2|x1 | and x1 ≥ 1. The dual problem is

maximize    y_2
subject to  (1 + 2y_1 + y_2, −y_1, −y_2) ∈ Q^3.

Any point y = (0, t), t ≥ 0 is a certificate of infeasibility since (t, 0, −t) ∈ Q3 ; in fact (0, t) is
a dual feasible direction with unbounded objective.

3.6 Bibliography
The material in this chapter is based on the paper [LVBL98] and the books [BN01, BV04]. The
papers [AG03, ART03] contain additional theoretical and algorithmic aspects.

Chapter 4

Semidefinite optimization

4.1 Introduction
In this chapter we extend the conic optimization framework from Chap. 2 and 3 with symmetric
positive semidefinite matrix variables.

4.1.1 Semidefinite matrices and cones


A symmetric matrix X ∈ S n is called symmetric positive semidefinite if

z T Xz ≥ 0, ∀z ∈ Rn .

We then define the cone of symmetric positive semidefinite matrices as


    S^n_+ = {X ∈ S^n | z^T Xz ≥ 0, ∀z ∈ R^n}.                        (4.1)

For brevity we will often use the shorter term semidefinite instead of symmetric positive
semidefinite, and we will write X ⪰ Y (X ⪯ Y) as shorthand notation for (X − Y) ∈ S^n_+
((Y − X) ∈ S^n_+). As inner product for semidefinite matrices, we use the standard trace inner
product for general matrices, i.e.,
    ⟨A, B⟩ := tr(A^T B) = Σ_{ij} a_{ij} b_{ij}.

It is easy to see that (4.1) indeed specifies a convex cone; it is pointed (with origin X = 0),
and X, Y ∈ S^n_+ implies that (αX + βY) ∈ S^n_+, α, β ≥ 0. Let us review a few equivalent
definitions of S^n_+. It is well-known that every symmetric matrix A has a spectral factorization

    A = Σ_{i=1}^{n} λ_i q_i q_i^T,

where qi ∈ Rn are the (orthogonal) eigenvectors and λi are eigenvalues of A. Using the spectral
factorization of A we have
    x^T Ax = Σ_{i=1}^{n} λ_i (x^T q_i)^2,

which shows that xT Ax ≥ 0 ⇔ λi ≥ 0, i = 1, . . . , n. In other words,


    S^n_+ = {X ∈ S^n | λ_i(X) ≥ 0, i = 1, . . . , n}.                (4.2)

Another useful characterization is that A ∈ S^n_+ if and only if it is a Grammian matrix A = V^T V.
Using the Grammian representation we have

    x^T Ax = x^T V^T Vx = \|Vx\|_2^2,

i.e., if A = V^T V then x^T Ax ≥ 0, ∀x. On the other hand, from the spectral factorization
A = QΛQ^T we have A = V^T V with V = Λ^{1/2} Q^T, where Λ^{1/2} = diag(\sqrt{λ_1}, . . . , \sqrt{λ_n}). We thus
have the equivalent characterization

    S^n_+ = {X ∈ S^n | X = V^T V for some V ∈ R^{k×n}}.              (4.3)

In a completely analogous way we define the cone of symmetric positive definite matrices as

    S^n_{++} = {X ∈ S^n | z^T Xz > 0, ∀z ∈ R^n}
             = {X ∈ S^n | λ_i(X) > 0, i = 1, . . . , n}
             = {X ∈ S^n_+ | X = V^T V for some V ∈ R^{k×n}, rank(V) = n},

and we write X ≻ Y (X ≺ Y) as shorthand notation for (X − Y) ∈ S^n_{++} ((Y − X) ∈ S^n_{++}).
The one-dimensional cone S^1_+ simply corresponds to R_+. Similarly consider

    X = [x_1, x_3; x_3, x_2]

with determinant det(X) = x_1 x_2 − x_3^2 = λ_1 λ_2 and trace tr(X) = x_1 + x_2 = λ_1 + λ_2. Therefore
X has nonnegative eigenvalues if and only if

    x_1 x_2 ≥ x_3^2,   x_1, x_2 ≥ 0,

which characterizes a three-dimensional scaled rotated cone, i.e.,

    [x_1, x_3; x_3, x_2] ∈ S^2_+   ⟺   (x_1, x_2, \sqrt{2} x_3) ∈ Q^3_r.
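This 2×2 equivalence can be checked directly by sampling; the short Python/NumPy sketch below (an added illustration, not part of the original text) compares the eigenvalue test against membership in the rotated cone 2x_1 x_2 ≥ (\sqrt{2} x_3)^2, x_1, x_2 ≥ 0:

```python
import numpy as np

rng = np.random.default_rng(3)
for _ in range(1000):
    x1, x2, x3 = rng.uniform(0, 2), rng.uniform(0, 2), rng.uniform(-2, 2)
    X = np.array([[x1, x3], [x3, x2]])
    psd = np.all(np.linalg.eigvalsh(X) >= -1e-12)
    # (x1, x2, sqrt(2)*x3) in Q_r^3  <=>  2*x1*x2 >= 2*x3^2 and x1, x2 >= 0
    in_cone = (x1 >= 0) and (x2 >= 0) and (2 * x1 * x2 >= 2 * x3**2 - 1e-12)
    assert psd == in_cone
```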

More generally we have

    x ∈ R^n_+   ⟺   diag(x) ∈ S^n_+

and

    (x, t) ∈ Q^{n+1}   ⟺   [t, x^T; x, tI] ∈ S^{n+1}_+,

where the latter equivalence follows immediately from Lemma 4.1. Thus both the linear and
conic quadratic cones are embedded in the semidefinite cone. In practice, however, linear and
conic quadratic cones should never be described using semidefinite cones, which would result in a
large performance penalty by squaring the number of variables.

Example 4.1. As a more interesting example, consider the symmetric matrix


 
    A(x, y, z) = [1, x, y; x, 1, z; y, z, 1]                          (4.4)

parametrized by (x, y, z). The set

    S = {(x, y, z) ∈ R^3 | A(x, y, z) ∈ S^3_+},

Figure 4.1: Plot of spectrahedron S = {(x, y, z) ∈ R3 | A(x, y, z)  0}.

(shown in Fig. 4.1) is called a spectrahedron and is perhaps the simplest bounded semidefinite
representable set, which cannot be represented using (finitely many) linear or conic quadratic
cones. To gain a geometric intuition of S, we note that

    det A(x, y, z) = −(x^2 + y^2 + z^2 − 2xyz − 1),

so the boundary of S can be characterized as

    x^2 + y^2 + z^2 − 2xyz = 1,

or equivalently as

    [x; y]^T [1, −z; −z, 1] [x; y] = 1 − z^2.
For z = 0 this describes a circle in the (x, y)-plane, and for −1 ≤ z ≤ 1 it characterizes an
ellipse (for a fixed z).

4.1.2 Properties of semidefinite matrices


Many useful properties of (semi)definite matrices follow directly from the definitions (4.1)-(4.3)
and the definite counterparts.

• The diagonal elements of A ∈ S^n_+ are nonnegative. Let e_i denote the ith standard basis
  vector (i.e., [e_i]_j = 0, j ≠ i, [e_i]_i = 1). Then A_{ii} = e_i^T A e_i, so (4.1) implies that A_{ii} ≥ 0.

• A block-diagonal matrix A = diag(A1 , . . . Ap ) is (semi)definite if and only if each diagonal


block Ai is (semi)definite.

• Given a quadratic transformation M := B^T AB, we have M ≻ 0 if and only if A ≻ 0
  and B has full rank. This follows directly from the Grammian characterization M =
  (V B)^T (V B). For M ⪰ 0 we only require that A ⪰ 0. As an example, if A is (semi)definite
  then so is any permutation P^T AP.

• Any principal submatrix of A ∈ S^n_{++} (A ∈ S^n_+) is positive (semi)definite; this follows from the
  Grammian characterization A = V^T V, since any subset of the columns of V is linearly
  independent.

• The inner product of positive (semi)definite matrices is positive (nonnegative). For any
  A, B ∈ S^n_{++} let A = U^T U and B = V^T V where U and V have full rank. Then

      ⟨A, B⟩ = tr(U^T U V^T V) = \|UV^T\|_F^2 > 0,

  where strict positivity follows from the assumption that U has full column rank, i.e.,
  UV^T ≠ 0.

• The inverse of a positive definite matrix is positive definite. This follows from the spectral
  factorization A = QΛQ^T, which gives us

      A^{-1} = QΛ^{-1}Q^T,

  where Λ_{ii} > 0. If A is semidefinite then the pseudo-inverse of A is semidefinite.

• Consider a matrix X ∈ S^n partitioned as

      X = [A, B^T; B, C].

  Let us find necessary and sufficient conditions for X ≻ 0. We know that A ≻ 0 and C ≻ 0
  are necessary (since any principal submatrix must be positive definite). Furthermore, we can simplify the analysis
using a nonsingular transformation
 
      L = [I, 0; F, I]

  to diagonalize X as LXL^T = D, where D is block-diagonal. Note that det(L) = 1, so L
  is indeed nonsingular. Then X ≻ 0 if and only if D ≻ 0. Expanding LXL^T = D, we get

      [A, AF^T + B^T; FA + B, FAF^T + FB^T + BF^T + C] = [D_1, 0; 0, D_2].

  Since det(A) ≠ 0 (by assuming that A ≻ 0) we see that F = −BA^{-1}, and direct substitution
  gives us

      [A, 0; 0, C − BA^{-1}B^T] = [D_1, 0; 0, D_2].
We have thus established the following useful result.

Lemma 4.1 (Schur complement lemma). A symmetric matrix

    X = [A, B^T; B, C]

is positive definite if and only if

    A ≻ 0,   C − BA^{-1}B^T ≻ 0.
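The lemma is easy to test numerically; the following Python/NumPy sketch (an added illustration with randomly generated data) confirms that the full-matrix test and the Schur-complement test always agree:

```python
import numpy as np

rng = np.random.default_rng(4)
n, m = 3, 2
A = rng.standard_normal((n, n)); A = A @ A.T + np.eye(n)    # A > 0
B = rng.standard_normal((m, n))
C = rng.standard_normal((m, m)); C = C @ C.T                 # C >= 0, may or may not pass the test

X = np.block([[A, B.T], [B, C]])
schur = C - B @ np.linalg.solve(A, B.T)

pd = lambda M: bool(np.all(np.linalg.eigvalsh(M) > 0))
print(pd(X), pd(A) and pd(schur))    # the two tests always agree
```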

4.1.3 Semidefinite duality
Semidefinite duality is largely identical to conic quadratic duality. We define a primal semidef-
inite optimization problem as

minimize hC, Xi
subject to hAi , Xi = bi , i = 1, . . . , m (4.5)
X ∈ S+ n

with a dual cone

    (S^n_+)^* = {Z ∈ R^{n×n} | ⟨X, Z⟩ ≥ 0, ∀X ∈ S^n_+}.

Lemma 4.2. The semidefinite cone is self-dual, (S^n_+)^* = S^n_+.
Proof. We have already seen that X, Z ∈ S^n_+ ⇒ ⟨X, Z⟩ ≥ 0. Conversely, assume that Z ∉ S^n_+.
Then there exists a w satisfying w^T Zw < 0, i.e., X = ww^T ∈ S^n_+ satisfies ⟨X, Z⟩ < 0, so
Z ∉ (S^n_+)^*.

We can now define a Lagrange function

    L(X, y, S) = ⟨C, X⟩ − Σ_{i=1}^{m} y_i (⟨A_i, X⟩ − b_i) − ⟨X, S⟩
               = ⟨X, C − Σ_{i=1}^{m} y_i A_i − S⟩ + b^T y

with a dual variable S ∈ S^n_+. The Lagrange function is unbounded below unless C − Σ_i y_i A_i = S,
so we get a dual problem

maximize    b^T y
subject to  C − Σ_{i=1}^{m} y_i A_i = S                               (4.6)
            S ∈ S^n_+.

The primal constraints are a set of equality constraints involving inner products hAi , Xi = bi ,
i.e., an intersection of n(n + 1)/2-dimensional affine hyperplanes. If we eliminate S from the
dual constraints,
    C − Σ_{i=1}^{m} y_i A_i ⪰ 0

we get an affine matrix-valued function with coefficients (C, Ai ) ∈ S n and variable y ∈ Rm .


Such a matrix-valued affine inequality is called a linear matrix inequality, which we discuss in
greater detail in the next section.
Weak duality follows immediately by taking the inner product between X and the dual
constraint,
    ⟨X, C − Σ_{i=1}^{m} y_i A_i⟩ = ⟨C, X⟩ − b^T y = ⟨X, S⟩ ≥ 0,

but (similar to conic quadratic optimization) the issue of strong duality is more complicated
than for linear optimization. In particular, it is possible that the primal or dual optimal values
are unattained, and it is also possible for a problem to have a positive duality gap, even though
both the primal and dual optimal values are attained. This is illustrated in the following
example.

Example 4.2. We consider a problem

minimize    x_1
subject to  [0, x_1, 0; x_1, x_2, 0; 0, 0, 1 + x_1] ∈ S^3_+,

with feasible set {x_1 = 0, x_2 ≥ 0} and optimal value p^* = 0. The dual problem can be
formulated as

maximize    −z_2
subject to  [z_1, (1 − z_2)/2, 0; (1 − z_2)/2, 0, 0; 0, 0, z_2] ∈ S^3_+,

which has a feasible set {z_1 ≥ 0, z_2 = 1} and dual optimal value d^* = −1. Both problems are
feasible, but not strictly feasible.

Just as for conic quadratic optimization, existence of either a strictly feasible primal or dual
point (i.e., a Slater constraint qualification) ensures strong duality for the pair of primal and
dual semidefinite programs.
As discussed in the introduction, we can embed the nonnegative orthant and quadratic cone
within the semidefinite cone, and the Cartesian product of several semidefinite cones can be
modeled as a larger block-diagonal semidefinite matrix. For an optimization solver, however,
it is more efficient to explicitly include multiple cones in the problem formulation. A general
linear, conic quadratic and semidefinite problem format often used by optimization solvers is

minimize    ⟨c^l, x^l⟩ + Σ_{j=1}^{n_q} ⟨c^q_j, x^q_j⟩ + Σ_{j=1}^{n_s} ⟨C_j, X_j⟩
subject to  A^l x^l + Σ_{j=1}^{n_q} A^q_j x^q_j + Σ_{j=1}^{n_s} A_j(X_j) = b          (4.7)
            x^l ∈ R^{n_l}_+,   x^q_j ∈ Q^{q_j} ∀j,   X_j ∈ S^{s_j}_+ ∀j

with a linear variable x^l, conic quadratic variables x^q_j, semidefinite variables X_j and a linear
mapping A_j : S^{s_j} → R^m,

    A_j(X_j) := (⟨A^s_{1j}, X_j⟩, . . . , ⟨A^s_{mj}, X_j⟩).
All the duality results we have presented for the simpler linear, conic quadratic and semidefinite
problems translate directly to (4.7); the dual problem is

maximize bT y
subject to cl − (Al )T y = sl
(4.8)
cqj − (Aqj )T y = sqj , j = 1, . . . , nq
Cj − A∗j (y) = Sj , j = 1, . . . , ns ,

where

    A^*_j(y) := Σ_{i=1}^{m} y_i A^s_{ij},

and strong duality holds if a strictly feasible point exists for either (4.7) or (4.8). Similarly, if
there exist
    x^l ∈ R^{n_l}_+,   x^q_j ∈ Q^{q_j} ∀j,   X_j ∈ S^{s_j}_+ ∀j

satisfying

    A^l x^l + Σ_{j=1}^{n_q} A^q_j x^q_j + Σ_{j=1}^{n_s} A_j(X_j) = 0,
    ⟨c^l, x^l⟩ + Σ_{j=1}^{n_q} ⟨c^q_j, x^q_j⟩ + Σ_{j=1}^{n_s} ⟨C_j, X_j⟩ < 0,

then the dual problem is infeasible.

then the dual problem is infeasible.

4.2 Semidefinite modeling


Having discussed different characterizations and properties of semidefinite matrices, we next
turn to different functions and sets that can be modeled using semidefinite cones and variables.
Most of those representations involve semidefinite matrix-valued affine functions, which we
discuss next.

4.2.1 Linear matrix inequalities


A linear matrix inequality is an affine matrix-valued mapping A : Rn 7→ S m ,
A(x) = A0 + x1 A1 + · · · + xn An  0, (4.9)
in the variable x ∈ Rn with symmetric coefficients Ai ∈ S m , i = 0, . . . , n. As a simple example
consider the matrix in (4.4),
    A(x, y, z) = A_0 + xA_1 + yA_2 + zA_3 ⪰ 0

with A_0 = I and

    A_1 = [0, 1, 0; 1, 0, 0; 0, 0, 0],   A_2 = [0, 0, 1; 0, 0, 0; 1, 0, 0],   A_3 = [0, 0, 0; 0, 0, 1; 0, 1, 0].
Alternatively, we can describe the linear matrix inequality A(x, y, z)  0 as
3
X ∈ S+ , x11 = x22 = x33 = 1,
i.e., as a semidefinite variable with fixed variables; these two alternative formulations illustrate
the difference between primal and dual semidefinite formulations, discussed in Sec. 4.1.3.

4.2.2 Eigenvalue optimization


Consider a symmetric matrix valued function A : Rn 7→ S m ,
A(x) = A0 + x1 A1 + · · · + xn An ,
and let the eigenvalues of A(x) be denoted by
λ1 (A(x)) ≥ λ2 (A(x)) ≥ · · · ≥ λm (A(x)).
A number of different functions of λ_i(A(x)) can then be described using simple linear matrix
inequalities.

Sum of eigenvalues
The sum of the eigenvalues corresponds to

    Σ_{i=1}^{m} λ_i(A(x)) = tr(A(x)) = tr(A_0) + Σ_{i=1}^{n} x_i tr(A_i).

Largest eigenvalue
The largest eigenvalue can be characterized in epigraph form λ1 (A(x)) ≤ t as

K = {(x, t) ∈ Rn+1 | tI − A(x)  0}. (4.10)

To verify this, suppose we have a spectral factorization A(x) = Q(x)Λ(x)Q(x)T where Q(x) is
orthogonal and Λ(x) is diagonal. Then t is an upper bound on the largest eigenvalue if and
only if
Q(x)T (tI − A(x))Q(x) = tI − Λ(x)  0.
Thus we can minimize the largest eigenvalue of A(x).

Smallest eigenvalue
The smallest eigenvalue can be described in hypograph form λm (A(x)) ≥ t as

K = {(x, t) ∈ Rn+1 | A(x)  tI}, (4.11)

i.e., we can maximize the smallest eigenvalue of A(x).

Eigenvalue spread
The eigenvalue spread can be modeled in epigraph form

λ1 (A(x)) − λm (A(x)) ≤ t

by combining the two linear matrix inequalities in (4.10) and (4.11), i.e.,

K = {(x, t) ∈ Rn+1 | zI  A(x)  sI, s − z ≤ t}. (4.12)

Spectral radius
The spectral radius ρ(A(x)) := maxi |λi (A(x))| can be modeled in epigraph form ρ(A(x)) ≤ t
using two linear matrix inequalities
tI ± A(x)  0,
i.e., the epigraph is
K = {(x, t) ∈ Rn+1 | tI  A(x)  −tI}. (4.13)

Condition number of a positive definite matrix


The condition number of a positive definite matrix can be minimized by noting that λ1 (A(x))/λm (A(x)) ≤
t if and only if there exists a µ > 0 such that

µI  A(x)  µtI,

or equivalently if and only if I  µ−1 A(x)  tI. In other words, if we make a change of variables
z := x/µ, ν := 1/µ we can minimize the condition number as

minimize    t
subject to  I ⪯ νA_0 + Σ_{i=1}^{n} z_i A_i ⪯ tI,

and subsequently recover the solution x = νz. In essence, we first normalize the spectrum
by the smallest eigenvalue, and then minimize the largest eigenvalue of the normalized linear
matrix inequality.

Roots of the determinant
The determinant of a semidefinite matrix

    det(A(x)) = Π_{i=1}^{m} λ_i(A(x))

is neither convex nor concave, but rational powers of the determinant can be modeled using linear
matrix inequalities. For a rational power 0 ≤ q ≤ 1/m we have that

    t ≤ det(A(x))^q

if and only if

    [A(x), Z; Z^T, Diag(Z)] ⪰ 0                                       (4.14)
    (Z_{11} Z_{22} · · · Z_{mm})^q ≥ t,                                (4.15)

where Z ∈ R^{m×m} is a lower-triangular matrix, and the inequality (4.15) can be modeled using
the formulations in Sec. 3.2.7; see [BN01, p. 150] for a proof. Similarly we can model negative
powers of the determinant, i.e., for any rational q > 0 we have

    t ≥ det(A(x))^{-q}

if and only if

    [A(x), Z; Z^T, Diag(Z)] ⪰ 0                                       (4.16)
    (Z_{11} Z_{22} · · · Z_{mm})^{-q} ≤ t                              (4.17)

for a lower triangular Z.

4.2.3 Singular value optimization


We next consider a nonsymmetric matrix valued function A : Rn 7→ Rm×p ,
A(x) = A0 + x1 A1 + · · · + xn An ,
where Ai ∈ Rm×p . Assume p ≤ m and denote the singular values of A(x) by
σ1 (A(x)) ≥ σ2 (A(x)) ≥ · · · ≥ σp (A(x)) ≥ 0.
The singular values are simply connected to the eigenvalues of A(x)T A(x),
    σ_i(A(x)) = \sqrt{λ_i(A(x)^T A(x))},                              (4.18)
and if A(x) is symmetric the singular values correspond to the absolute values of the eigenvalues. We
can then optimize several functions of the singular values.

Largest singular value


The epigraph σ1 (A(x)) ≤ t can be characterized using (4.18) as
    A(x)^T A(x) ⪯ t^2 I,

which from Schur's lemma is equivalent to the linear matrix inequality

    [tI, A(x); A(x)^T, tI] ⪰ 0.                                       (4.19)
The largest singular value σ1 (A(x)) is also called the spectral norm or the `2 -norm of A(x),
kA(x)k2 := σ1 (A(x)).

Sum of singular values
The trace norm or the nuclear norm of A(x) is the dual of the `2 -norm. We can characterize it
as
    \|X\|_* = sup_{\|Z\|_2 ≤ 1} tr(X^T Z).                            (4.20)

It turns out that the nuclear norm corresponds to the sum of the singular values,
    \|X\|_* = Σ_{i} σ_i(X),                                           (4.21)

which is easy to verify using the singular value decomposition X = UΣV^T. We then have

    sup_{\|Z\|_2 ≤ 1} tr(X^T Z) = sup_{\|Z\|_2 ≤ 1} tr(Σ^T U^T Z V)
                                = sup_{\|U^T Z V\|_2 ≤ 1} tr(Σ^T U^T Z V)
                                = sup_{\|Y\|_2 ≤ 1} tr(Σ^T Y),

where we used the unitary invariance of the norm \| · \|_2. We consider Y = diag(y_1, . . . , y_p), so
that

    sup_{\|Z\|_2 ≤ 1} tr(X^T Z) = sup_{|y_i| ≤ 1} Σ_{i=1}^{p} σ_i y_i = Σ_{i=1}^{p} σ_i,

which shows (4.21). Alternatively, we can express (4.20) as the solution to

maximize    tr(X^T Z)
subject to  [I, Z^T; Z, I] ⪰ 0,

with the dual problem

minimize    tr(U + V)/2
subject to  [U, X^T; X, V] ⪰ 0.

In other words, using strong duality we can characterize the epigraph \|A(x)\|_* ≤ t using a linear
matrix inequality

    [U, A(x)^T; A(x), V] ⪰ 0,   tr(U + V)/2 ≤ t.                      (4.22)
For a symmetric matrix the nuclear norm corresponds to the sum of the absolute eigenvalues,
and for a semidefinite matrix it simply corresponds to the trace of the matrix.
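The identity (4.21) and its maximizer are easy to confirm numerically; the Python/NumPy sketch below (an added illustration with random data) evaluates tr(X^T Z) at Z = UV^T from the SVD and compares it with the sum of singular values:

```python
import numpy as np

rng = np.random.default_rng(5)
X = rng.standard_normal((4, 3))
U, sigma, Vt = np.linalg.svd(X, full_matrices=False)

Z = U @ Vt                                    # ||Z||_2 = 1, the maximizer in (4.20)
print(np.trace(X.T @ Z), sigma.sum())         # both equal the nuclear norm ||X||_*
```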

4.2.4 Matrix inequalities from Schur’s Lemma


Several quadratic or quadratic-over-linear matrix inequalities follow immediately from Schur’s
lemma. Suppose A : Rn 7→ Rm×p and B : Rn 7→ Rp×p are matrix-valued affine functions

A(x) = A0 + x1 A1 + · · · + xn An
B(y) = B0 + y1 B1 + · · · + yr Br

where Ai ∈ Rm×p , Bi ∈ S p , x ∈ Rn and y ∈ Rr . Then

    A(x)^T B(y)^{-1} A(x) ⪯ C

if and only if

    [C, A(x)^T; A(x), B(y)] ⪰ 0.

4.2.5 Nonnegative polynomials


We next consider characterizations of polynomials constrained to be nonnegative on the real
axis. To that end, consider a polynomial basis function

    v(t) = (1, t, . . . , t^{2n}).


It is then well-known that a polynomial f : R 7→ R of even degree 2n is nonnegative on the


entire real axis
f (t) := xT v(t) = x0 + x1 t + · · · + x2n t2n ≥ 0, ∀t (4.23)
if and only if it can be written as a sum of squared polynomials of degree n (or less), i.e., for
some q1 , q2 ∈ Rn+1

f (t) = (q1T u(t))2 + (q2T u(t))2 , u(t) := (1, t, . . . , tn ). (4.24)

It turns out that an equivalent characterization of {x | f (t) ≥ 0, ∀t} can be given in terms of a
semidefinite variable X,

    x_i = ⟨X, H_i⟩,  i = 0, . . . , 2n,   X ∈ S^{n+1}_+,              (4.25)

where H^{n+1}_i ∈ R^{(n+1)×(n+1)} are Hankel matrices

    [H_i]_{kl} = { 1,  k + l = i
                 { 0,  otherwise.

When there is no ambiguity, we drop the superscript on Hi . For example, for n = 2 we have
     
    H_0 = [1, 0, 0; 0, 0, 0; 0, 0, 0],   H_1 = [0, 1, 0; 1, 0, 0; 0, 0, 0],   . . .   H_4 = [0, 0, 0; 0, 0, 0; 0, 0, 1].

To verify that (4.23) and (4.25) are equivalent, we first note that
    u(t)u(t)^T = Σ_{i=0}^{2n} H_i v_i(t),

i.e.,

    (1, t, . . . , t^n)(1, t, . . . , t^n)^T = [1, t, . . . , t^n; t, t^2, . . . , t^{n+1}; . . . ; t^n, t^{n+1}, . . . , t^{2n}].

42
Assume next that f(t) ≥ 0. Then from (4.24) we have

    f(t) = (q_1^T u(t))^2 + (q_2^T u(t))^2
         = ⟨q_1 q_1^T + q_2 q_2^T, u(t)u(t)^T⟩
         = Σ_{i=0}^{2n} ⟨q_1 q_1^T + q_2 q_2^T, H_i⟩ v_i(t),

i.e., we have f(t) = x^T v(t) with x_i = ⟨X, H_i⟩, X = (q_1 q_1^T + q_2 q_2^T) ⪰ 0. Conversely, assume that
(4.25) holds. Then

    f(t) = Σ_{i=0}^{2n} ⟨H_i, X⟩ v_i(t) = ⟨X, Σ_{i=0}^{2n} H_i v_i(t)⟩ = ⟨X, u(t)u(t)^T⟩ ≥ 0
since X ⪰ 0. In summary, we can characterize the cone of nonnegative polynomials over the
real axis as

    K^n_∞ = {x ∈ R^{n+1} | x_i = ⟨X, H_i⟩, i = 0, . . . , 2n, X ∈ S^{n+1}_+}.      (4.26)
Checking nonnegativity of a univariate polynomial thus corresponds to a semidefinite feasibility
problem.
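The correspondence between the sum-of-squares form and the Hankel representation is easy to verify for a small example; the Python/NumPy sketch below (an added illustration, with hypothetical q_1, q_2) builds the Hankel matrices for n = 2, recovers x_i = ⟨X, H_i⟩, and checks that the two representations of f agree on a grid of points:

```python
import numpy as np

n = 2                                     # polynomial of degree 2n = 4
H = [np.zeros((n + 1, n + 1)) for _ in range(2 * n + 1)]
for i in range(2 * n + 1):
    for k in range(n + 1):
        for l in range(n + 1):
            if k + l == i:
                H[i][k, l] = 1.0

q1, q2 = np.array([1.0, 0.0, 1.0]), np.array([0.0, 1.0, 0.0])
X = np.outer(q1, q1) + np.outer(q2, q2)      # X >= 0 by construction
x = np.array([np.sum(X * Hi) for Hi in H])   # x_i = <X, H_i>

t = np.linspace(-3, 3, 7)
u = np.vander(t, n + 1, increasing=True)     # rows u(t) = (1, t, t^2)
f_coeff = np.vander(t, 2 * n + 1, increasing=True) @ x
f_sos = (u @ q1) ** 2 + (u @ q2) ** 2
print(np.allclose(f_coeff, f_sos))           # True, and f(t) >= 0 everywhere
```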

Nonnegativity on a finite interval


As an extension we consider a basis function of degree n,
v(t) = (1, t, . . . , tn ).
A polynomial f(t) := x^T v(t) is then nonnegative on a subinterval I = [a, b] ⊂ R if and only if
f(t) can be written as a sum of weighted squares,

    f(t) = w_1(t)(q_1^T u_1(t))^2 + w_2(t)(q_2^T u_2(t))^2

where w_i(t) are nonnegative polynomials, w_i(t) ≥ 0, i = 1, 2, ∀t ∈ I. To describe the cone

    K^n_{a,b} = {x ∈ R^{n+1} | f(t) = x^T v(t) ≥ 0, ∀t ∈ [a, b]}
we distinguish between polynomials of odd and even degree.
• Even degree. Let n = 2m and denote
u1 (t) = (1, t, . . . , tm ), u2 (t) = (1, t, . . . , tm−1 ).
We choose w1 (t) = 1 and w2 (t) = (t − a)(b − t) and note that w2 (t) ≥ 0 on [a, b]. Then
f (t) ≥ 0, ∀t ∈ [a, b] if and only if
f (t) = (q1T u1 (t))2 + w2 (t)(q2T u2 (t))2
for some q1 , q2 , and an equivalent semidefinite characterization can be found as
      K^n_{a,b} = {x ∈ R^{n+1} | x_i = ⟨X_1, H^m_i⟩ + ⟨X_2, (a + b)H^{m-1}_{i-1} − abH^{m-1}_i − H^{m-1}_{i-2}⟩,
                   i = 0, . . . , n,  X_1 ∈ S^m_+,  X_2 ∈ S^{m-1}_+}.              (4.27)

• Odd degree. Let n = 2m + 1 and denote u(t) = (1, t, . . . , tm ). We choose w1 (t) = (t − a)


and w2 (t) = (b − t). We then have that f (t) = xT v(t) ≥ 0, ∀t ∈ [a, b] if and only if
f (t) = (t − a)(q1T u(t))2 + (b − t)(q2T u(t))2
  for some q_1, q_2, and an equivalent semidefinite characterization can be found as

      K^n_{a,b} = {x ∈ R^{n+1} | x_i = ⟨X_1, H^m_{i-1} − aH^m_i⟩ + ⟨X_2, bH^m_i − H^m_{i-1}⟩,
                   i = 0, . . . , n,  X_1, X_2 ∈ S^m_+}.                            (4.28)

4.2.6 Hermitian matrices
Semidefinite optimization can be extended to complex-valued matrices. To that end, let Hn
denote the cone of Hermitian matrices of order n, i.e.,

    X ∈ H^n   ⟺   X ∈ C^{n×n},  X^H = X,

where superscript 'H' denotes Hermitian (or complex) transposition. Then X ∈ H^n_+ if and only
if

    z^H Xz = (ℜz − iℑz)^T (ℜX + iℑX)(ℜz + iℑz)
           = [ℜz; ℑz]^T [ℜX, −ℑX; ℑX, ℜX] [ℜz; ℑz] ≥ 0,   ∀z ∈ C^n.

In other words,

    X ∈ H^n_+   ⟺   [ℜX, −ℑX; ℑX, ℜX] ∈ S^{2n}_+.                     (4.29)

Note that (4.29) implies skew-symmetry of ℑX, i.e., ℑX = −(ℑX)^T.

4.2.7 Nonnegative trigonometric polynomials


As a complex-valued variation of the sum-of-squares representation we consider trigonomet-
ric polynomials; optimization over cones of nonnegative trigonometric polynomials has several
important engineering applications. Consider a trigonometric polynomial evaluated on the com-
plex unit-circle
    f(z) = x_0 + 2ℜ(Σ_{i=1}^{n} x_i z^{-i}),   |z| = 1                (4.30)

parametrized by x ∈ R × C^n. We are interested in characterizing the cone of trigonometric
polynomials that are nonnegative on the angular interval [0, π],

    K^n_{0,π} = {x ∈ R × C^n | x_0 + 2ℜ(Σ_{i=1}^{n} x_i z^{-i}) ≥ 0, ∀z = e^{jt}, t ∈ [0, π]}.

Consider a complex-valued basis function

    v(z) = (1, z, . . . , z^n).

The Riesz-Fejer Theorem states that a trigonometric polynomial f (z) in (4.30) is nonnegative
(i.e., x ∈ K0,π ) if and only if for some q ∈ Cn+1

f (z) = |q H v(z)|2 . (4.31)

Analogously to Sec. 4.2.5 we have a semidefinite characterization of the sum-of-squares repre-


sentation, i.e., f(z) ≥ 0, ∀z = e^{jt} if and only if

    x_i = ⟨X, T_i⟩,  i = 0, . . . , n,   X ∈ H^{n+1}_+,               (4.32)

where T^{n+1}_i ∈ R^{(n+1)×(n+1)} are Toeplitz matrices

    [T_i]_{kl} = { 1,  k − l = i
                 { 0,  otherwise,      i = 0, . . . , n.

When there is no ambiguity, we drop the superscript on Ti . For example, for n = 2 we have
     
    T_0 = [1, 0, 0; 0, 1, 0; 0, 0, 1],   T_1 = [0, 0, 0; 1, 0, 0; 0, 1, 0],   T_2 = [0, 0, 0; 0, 0, 0; 1, 0, 0].

To prove correctness of the semidefinite characterization, we first note that


    v(z)v(z)^H = I + Σ_{i=1}^{n} T_i v_i(z) + Σ_{i=1}^{n} (T_i v_i(z))^H,

i.e.,

    (1, z, . . . , z^n)(1, z, . . . , z^n)^H = [1, z^{-1}, . . . , z^{-n}; z, 1, . . . , z^{1-n}; . . . ; z^n, z^{n-1}, . . . , 1].
Next assume that (4.31) is satisfied. Then

    f(z) = ⟨qq^H, v(z)v(z)^H⟩
         = ⟨qq^H, I⟩ + Σ_{i=1}^{n} ⟨qq^H, T_i v_i(z)⟩ + Σ_{i=1}^{n} ⟨qq^H, T_i^T v_i(z)⟩
         = ⟨qq^H, I⟩ + Σ_{i=1}^{n} ⟨qq^H, T_i⟩ v_i(z) + Σ_{i=1}^{n} ⟨qq^H, T_i^T⟩ v_i(z)
         = x_0 + 2ℜ(Σ_{i=1}^{n} x_i v_i(z))

with x_i = ⟨qq^H, T_i⟩. Conversely, assume that (4.32) holds. Then

    f(z) = ⟨X, I⟩ + Σ_{i=1}^{n} ⟨X, T_i⟩ v_i(z) + Σ_{i=1}^{n} ⟨X, T_i^T⟩ v_i(z) = ⟨X, v(z)v(z)^H⟩ ≥ 0.

In other words, we have shown that


    K^n_{0,π} = {x ∈ R × C^n | x_i = ⟨X, T_i⟩, i = 0, . . . , n, X ∈ H^{n+1}_+}.    (4.33)

Nonnegativity on a subinterval
We next sketch a few useful extensions. An extension of the Riesz-Fejer Theorem states that a
trigonometric polynomial f (z) of degree n is nonnegative on I(a, b) = {z | z = ejt , t ∈ [a, b] ⊆
[0, π]} if and only if it can be written as a weighted sum of squared trigonometric polynomials

f (z) = |f1 (z)|2 + g(z)|f2 (z)|2 (4.34)

where f_1, f_2, g are trigonometric polynomials of degree n, n − d and d, respectively, and g(z) ≥ 0
on I(a, b). For example g(z) = z + z^{-1} − 2 cos α is nonnegative on I(0, α), and it can be verified
that f(z) ≥ 0, ∀z ∈ I(0, α) if and only if

    x_i = ⟨X_1, T^{n+1}_i⟩ + ⟨X_2, T^n_{i+1}⟩ + ⟨X_2, T^n_{i-1}⟩ − 2 cos(α)⟨X_2, T^n_i⟩,

for X_1 ∈ H^{n+1}_+, X_2 ∈ H^n_+, i.e.,

    K^n_{0,α} = {x ∈ R × C^n | x_i = ⟨X_1, T^{n+1}_i⟩ + ⟨X_2, T^n_{i+1}⟩ + ⟨X_2, T^n_{i-1}⟩
                − 2 cos(α)⟨X_2, T^n_i⟩,  X_1 ∈ H^{n+1}_+, X_2 ∈ H^n_+}.             (4.35)

Similarly f(z) ≥ 0, ∀z ∈ I(α, π) if and only if

    x_i = ⟨X_1, T^{n+1}_i⟩ + ⟨X_2, T^n_{i+1}⟩ + ⟨X_2, T^n_{i-1}⟩ − 2 cos(α)⟨X_2, T^n_i⟩,

i.e.,

    K^n_{α,π} = {x ∈ R × C^n | x_i = ⟨X_1, T^{n+1}_i⟩ + ⟨X_2, T^n_{i+1}⟩ + ⟨X_2, T^n_{i-1}⟩
                + 2 cos(α)⟨X_2, T^n_i⟩,  X_1 ∈ H^{n+1}_+, X_2 ∈ H^n_+}.             (4.36)

4.3 Semidefinite optimization


In this section we give examples of semidefinite optimization problems using some of the results
from Sec. 4.2.

4.3.1 Nearest correlation matrix


We consider the set
    S = {X ∈ S^n_+ | X_{ii} = 1, i = 1, . . . , n}

(shown in Fig. 4.1 for n = 3). For A ∈ S^n the nearest correlation matrix is

    X^* = arg min_{X ∈ S} \|A − X\|_F,

i.e., the projection of A onto the set S. To pose this as a conic optimization problem we define the linear
operator
    svec(U) = (U_{11}, \sqrt{2}U_{21}, . . . , \sqrt{2}U_{n1}, U_{22}, \sqrt{2}U_{32}, . . . , \sqrt{2}U_{n2}, . . . , U_{nn}),

which extracts and scales the lower-triangular part of U . We then get a conic formulation of
the nearest correlation problem exploiting symmetry of A − X,

minimize t
subject to kzk2 ≤t
svec(A − X) = z (4.37)
diag(X) =e
X  0.

This is an example of a problem with both conic quadratic and semidefinite constraints in primal
form, i.e., it matches the generic format (4.7).
We can add different constraints to the problem, for example a bound γ on the smallest
eigenvalue
X − γI  0.
A naive way to add the eigenvalue bound would be to introduce a new variable U ,
n
U = X − γI, U ∈ S+ ,

which would approximately double the number of variables and constraints in the model. Instead
we should just interpret U as a change of variables leading to a problem

minimize t
subject to kzk2 ≤t
svec(A − U − λI) = z
diag(U + λI) =e
U  0.
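A minimal sketch of the basic nearest-correlation projection (4.37), assuming Python with the cvxpy package and an SDP-capable solver (e.g. SCS or MOSEK) as a stand-in for the solver-agnostic formulation above; the matrix A is hypothetical:

```python
import numpy as np
import cvxpy as cp    # assumed modeling layer, not part of this manual

rng = np.random.default_rng(6)
n = 4
A = rng.standard_normal((n, n)); A = (A + A.T) / 2      # a symmetric matrix to project

X = cp.Variable((n, n), symmetric=True)
prob = cp.Problem(cp.Minimize(cp.norm(A - X, "fro")),
                  [cp.diag(X) == 1, X >> 0])
prob.solve()                                            # requires an SDP-capable solver
print(np.round(X.value, 3))
```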

4.3.2 Extremal ellipsoids


Given a polytope we can find the largest ellipsoid contained in the polytope, or the smallest
ellipsoid containing the polytope (for certain representations of the polytope).

Maximal inscribed ellipsoid


Consider a polytope

    S = {x ∈ R^n | a_i^T x ≤ b_i, i = 1, . . . , m}.
The ellipsoid
E := {x | x = Cu + d, kuk ≤ 1}
is contained in S if and only if

    max_{\|u\|_2 ≤ 1} a_i^T (Cu + d) ≤ b_i,   i = 1, . . . , m,

or equivalently, if and only if

kCai k2 + aTi d ≤ bi , i = 1, . . . , m.

Since Vol(E) ≈ det(C)1/n the maximum-volume inscribed ellipsoid is the solution to

maximize det(C)1/n
subject to kCai k2 + aTi d ≤ bi i = 1, . . . , m
C  0.

If we use the semidefinite characterization of a fractional power of the determinant of a positive


definite matrix, we get an equivalent conic formulation,

maximize    t
subject to  \|Ca_i\|_2 + a_i^T d ≤ b_i,  i = 1, . . . , m
            [C, Z; Z^T, Diag(Z)] ⪰ 0
            t ≤ (Z_{11} Z_{22} · · · Z_{nn})^{1/n},

where t ≤ (Z11 Z22 · · · Znn )1/n can be modeled as the intersection of rotated quadratic cones,
see Sec. 3.2.7.

Figure 4.2: Example of inner and outer ellipsoidal approximations.

Minimal enclosing ellipsoid


Next consider a polytope given as the convex hull of a set of points,

S 0 = conv{x1 , x2 , . . . , xm }, xi ∈ Rn .

The ellipsoid
E 0 := {x | kP (x − c)k2 ≤ 1}
has Vol(E 0 ) ≈ det(P )−1/n , so the minimum-volume enclosing ellipsoid is the solution to

minimize det(P )−1/n


subject to kP (xi − c)k2 ≤ 1, i = 1, . . . , m
P  0.

Alternatively, we can maximize det(P )1/n to get an equivalent formulation

maximize    t
subject to  \|P(x_i − c)\|_2 ≤ 1,  i = 1, . . . , m
            [P, Z; Z^T, Diag(Z)] ⪰ 0
            t ≤ (Z_{11} Z_{22} · · · Z_{nn})^{1/n}.

In Fig. 4.2 we illustrate the inner and outer ellipsoidal approximations of a polytope character-
ized by 5 points in R2 .

4.3.3 Polynomial curve-fitting


Consider a univariate polynomial of degree n,

f (t) = x0 + x1 t + x2 t2 + · · · + xn tn .

Often we wish to fit such a polynomial to a given set of measurements or control points

{(t1 , y1 ), (t2 , y2 ), . . . , (tm , ym )},

i.e., we wish to determine coefficients xi , i = 0, . . . , n such that

f (tj ) ≈ yj , j = 1, . . . , m.

To that end, define the Vandermonde matrix

    A = [1, t_1, t_1^2, . . . , t_1^n;
         1, t_2, t_2^2, . . . , t_2^n;
         . . . ;
         1, t_m, t_m^2, . . . , t_m^n].

We can then express the desired curve-fit compactly as

Ax ≈ y,

i.e., as a linear expression in the coefficients x. When the number of coefficients of the polynomial
equals the number of measurements, m = n + 1, the matrix A is square and non-singular (provided
there are no duplicate rows), so we can solve
Ax = y
to find a polynomial that passes through all the control points (ti , yi ). Similarly, if n > m there
are infinitely many solutions satisfying the underdetermined system Ax = y. A typical choice
in that case is the least-norm solution

    x_{ln} = arg min_{Ax=y} \|x\|_2,

which (assuming again there are no duplicate rows) has a simple solution

    x_{ln} = A^T(AA^T)^{-1} y.

On the other hand, if n < m we generally cannot find a solution to the overdetermined system
Ax = y, and we typically resort to a least-squares solution

xls = arg min kAx − yk2

which has a simple solution


xls = (AT A)−1 AT y.
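The least-squares fit is straightforward to compute; a short Python/NumPy sketch (an added illustration with hypothetical control points) follows:

```python
import numpy as np

# Hypothetical control points and a degree-3 fit
t = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])
y = np.array([ 1.0,  0.3, 0.0, 0.2, 1.1])
n = 3

A = np.vander(t, n + 1, increasing=True)        # Vandermonde matrix [1, t, t^2, t^3]
x_ls, *_ = np.linalg.lstsq(A, y, rcond=None)    # least-squares coefficients
print(x_ls)
```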
In the following we discuss how the semidefinite characterizations of nonnegative polynomials
(see Sec. 4.2.5) lead to more advanced and useful polynomial curve-fitting constraints.

• Nonnegativity. One possible constraint is nonnegativity on an interval,

      f(t) := x_0 + x_1 t + · · · + x_n t^n ≥ 0,   ∀t ∈ [a, b],

  with a semidefinite characterization embedded in x ∈ K^n_{a,b}, see (4.27).

• Monotonicity. We can ensure monotonicity of f(t) by requiring that f'(t) ≥ 0 (or f'(t) ≤ 0), i.e.,

      (x_1, 2x_2, . . . , nx_n) ∈ K^{n-1}_{a,b},

  or

      −(x_1, 2x_2, . . . , nx_n) ∈ K^{n-1}_{a,b},

  respectively.

Figure 4.3: Graph of univariate polynomials of degree 2, 4, and 8, respectively, passing through
{(−1, 1), (0, 0), (1, 1)}. The higher-degree polynomials are increasingly smoother on [−1, 1].

• Convexity or concavity. Convexity (or concavity) of f(t) corresponds to f''(t) ≥ 0 (or f''(t) ≤ 0), i.e.,

      (2x_2, 6x_3, . . . , (n − 1)nx_n) ∈ K^{n-2}_{a,b},

  or

      −(2x_2, 6x_3, . . . , (n − 1)nx_n) ∈ K^{n-2}_{a,b},

  respectively.

As an example, we consider fitting a smooth polynomial

fn (t) = x0 + x1 t + · · · + xn tn

to the points {(−1, 1), (0, 0), (1, 1)}, where smoothness is implied by bounding |fn0 (t)|. More
specifically, we wish to solve the problem

minimize z
subject to |fn0 (t)| ≤ z, ∀t ∈ [−1, 1]
fn (−1) = 1, fn (0) = 0, fn (1) = 1,

or equivalently
minimize    z
subject to  z − f'_n(t) ≥ 0,   ∀t ∈ [−1, 1]
            f'_n(t) + z ≥ 0,   ∀t ∈ [−1, 1]
            f_n(−1) = 1,  f_n(0) = 0,  f_n(1) = 1.
Finally, we use the characterizations of K^n_{a,b} to get a conic problem

minimize    z
subject to  (z − x_1, −2x_2, . . . , −nx_n) ∈ K^{n-1}_{-1,1}
            (x_1 + z, 2x_2, . . . , nx_n) ∈ K^{n-1}_{-1,1}
            f_n(−1) = 1,  f_n(0) = 0,  f_n(1) = 1.

In Fig. 4.3 we show the graphs for the resulting polynomials of degree 2, 4 and 8, respectively.
The second degree polynomial is uniquely determined by the three constraints f2 (−1) = 1,


Figure 4.4: Graph of univariate polynomials of degree 2, 4, and 8, respectively, passing through
{(−1, 1), (0, 0), (1, 1)}. The polynomials all have positive second derivative (i.e., they are convex)
on [−1, 1] and the higher-degree polynomials are increasingly smoother on that interval.

f2 (0) = 0, f2 (1) = 1, i.e., f2 (t) = t2 . Also, we obviously have a lower bound on the largest
derivative maxt∈[−1,1] |fn0 (t)| ≥ 1. The computed fourth degree polynomial is given by

    f_4(t) = (3/2)t^2 − (1/2)t^4
after rounding coefficients to rational numbers. Furthermore, the largest derivative is given by
    f'_4(1/\sqrt{2}) = \sqrt{2},

and f''_4(t) < 0 on (1/\sqrt{2}, 1], so, although not visibly clear, the polynomial is nonconvex on [−1, 1].
In Fig. 4.4 we show the graphs of the corresponding polynomials where we added a convexity
constraint fn00 (t) ≥ 0, i.e.,
n−2
(2x2 , 6x3 , . . . , (n − 1)nxn ) ∈ K−1,1 .
In this case, we get

    f_4(t) = (6/5)t^2 − (1/5)t^4

and the largest derivative increases to 8/5.

4.3.4 Filter design problems


An important signal processing application is filter design problems over trigonometric polyno-
mials
    H(ω) = x_0 + 2ℜ(Σ_{k=1}^{n} x_k e^{-jωk})
         = a_0 + 2Σ_{k=1}^{n} (a_k cos(ωk) + b_k sin(ωk)),

where a_k := ℜ(x_k), b_k := ℑ(x_k). If the function H(ω) is nonnegative we call it a transfer


function, and it describes how different harmonic components of a periodic discrete signal are
attenuated when a filter with transfer function H(ω) is applied to the signal.

Figure 4.5: Plot of lowpass filter transfer function.

We often wish a transfer function where H(ω) ≈ 1 for 0 ≤ ω ≤ ωp and H(ω) ≈ 0 for
ωs ≤ ω ≤ π for given constants ωp , ωs . One possible formulation for achieving this is

minimize t
subject to 0 ≤ H(ω) ∀ω ∈ [0, π]
1 − δ ≤ H(ω) ≤ 1 + δ ∀ω ∈ [0, ωp ]
H(ω) ≤ t ∀ω ∈ [ωs , π],

which corresponds to minimizing H(w) on the interval [ωs , π] while allowing H(w) to depart
from unity by a small amount δ on the interval [0, ωp ]. Using the results from Sec. 4.2.7 (in
particular (4.33), (4.35) and (4.36)), we can pose this as a conic optimization problem

minimize    t
subject to  x ∈ K^n_{0,π}
            (x_0 − (1 − δ), x_{1:n}) ∈ K^n_{0,ω_p}
            −(x_0 − (1 + δ), x_{1:n}) ∈ K^n_{0,ω_p}                   (4.38)
            −(x_0 − t, x_{1:n}) ∈ K^n_{ω_s,π},

which is a semidefinite optimization problem. In Fig. 4.5 we show H(ω) obtained by solving
(4.38) for n = 10, δ = 0.05, ωp = π/4 and ωs = ωp + π/8.

4.3.5 Relaxations of binary optimization


Semidefinite optimization is also useful for computing bounds on difficult non-convex or combi-
natorial optimization problems. For example consider the binary linear optimization problem

minimize cT x
subject to Ax = b (4.39)
x ∈ {0, 1}n .

In general, problem (4.39) is a very difficult non-convex problem where we may have to explore 2^n
different combinations of the binary variables. Alternatively, we can use semidefinite optimization to get a lower bound
on the optimal solution with polynomial complexity. We first note that

xi ∈ {0, 1} ⇐⇒ x2i = xi ,

which is, in fact, equivalent to a rank constraint on a semidefinite variable,

X = xxT , diag(X) = x.

By relaxing the rank 1 constraint on X we get a semidefinite relaxation of (4.39),

minimize    c^T x
subject to  Ax = b
            diag(X) = x                                               (4.40)
            X ⪰ xx^T,

where we note that

    X ⪰ xx^T   ⟺   [1, x^T; x, X] ⪰ 0.
Since (4.40) is a semidefinite optimization problem, it can be solved very efficiently. Suppose x?
is an optimal solution for (4.39); then (x? , x? (x? )T ) is also feasible for (4.40), but the feasible
set for (4.40) is larger than the feasible set for (4.39), so in general the optimal solution of (4.40)
serves as a lower bound. However, if the optimal solution X ? of (4.40) has rank 1 we have found
a solution to (4.39) also.
The semidefinite relaxation (4.40) can be derived more systematically. The Lagrangian
function of (4.39) is
    L(x, y, z) = c^T x + y^T (Ax − b) + Σ_{i=1}^{n} z_i (x_i^2 − x_i)
               = x^T diag(z) x + x^T (c + A^T y − z) − b^T y,

which is bounded below only if diag(z) ⪰ 0 and (c + A^T y − z) ∈ R(diag(z)). Assume for
simplicity that diag(z) ≻ 0. Then the minimizer of L(x, y, z) over x is

    arg min_x L(x, y, z) = −(1/2) diag(z)^{-1} (c + A^T y − z)

and we get a dual problem

maximize −(1/4)(c + AT y − z)T diag(z)−1 (c + AT y − z) − bT y


subject to z > 0.

From Schur’s Lemma (Lemma 4.1) this is equivalent to a standard semidefinite optimization
problem
maximize    −b^T y − t
subject to  [t, (1/2)(c + A^T y − z)^T; (1/2)(c + A^T y − z), diag(z)] ⪰ 0.        (4.41)
This dual problem gives us a lower bound. To obtain an explicit relaxation of (4.39) we derive
the dual once more, but this time of problem (4.41). The Lagrangian function of (4.41) is
 
    L(t, y, z, X) = −b^T y − t + ⟨[ν, x^T; x, X], [t, (1/2)(c + A^T y − z)^T; (1/2)(c + A^T y − z), diag(z)]⟩
                  = t(ν − 1) + y^T (Ax − b) + z^T (Diag(X) − x) + c^T x,

which is only bounded above if ν = 1, Ax = b, Diag(X) = x, i.e., we get the dual problem
(4.40), which is called a Lagrangian relaxation. From strong duality, problem (4.40) has the
same optimal value as problem (4.41).

Figure 4.6: Undirected graph. The cut {v2 , v4 , v5 } has capacity 9.

We can tighten (or improve) the relaxation (4.40) by adding other constraints that cut away
part of the feasible set, without excluding rank 1 solutions. By tightening the relaxation, we
reduce the duality gap between the optimal values of the original problem and the relaxation.
For example, we can add the constraints
0 ≤ xi ≤ 1, i = 1, . . . , n
and
0 ≤ Xij ≤ 1, i = 1, . . . , n, j = 1, . . . , n.
A semidefinite matrix X with Xij ≥ 0 is called a doubly nonnegative matrix. In practice,
constraining a semidefinite matrix to be doubly nonnegative has a dramatic impact on the
solution time and memory requirements of an optimization problem, since we add n2 linear
inequality constraints. We refer to [LR05] for a recent survey on semidefinite relaxations for
integer programming.

4.3.6 Relaxations of boolean optimization


Similarly to Sec. 4.3.5 we can use semidefinite relaxations of boolean constraints x ∈ {−1, +1}n .
To that end, we note that
x ∈ {−1, +1}n ⇐⇒ X = xxT , diag(X) = e, (4.42)
with a semidefinite relaxation X  xxT of the rank-1 constraint.
As a (standard) example of a combinatorial problem with boolean constraints, let us consider
an undirected graph G described by a set of vertices V = {v1 , . . . , vn } and a set of edges
E = {(i, j) | i, j ∈ V, i 6= j}, and we wish to find the cut of maximum capacity. A cut C
partitions the nodes V into two disjoint sets S and T = V \ S, and the cut-set I is the set of
edges with one node in S and another in T , i.e.,
I = {(u, v) ∈ E | u ∈ S, v ∈ T }.
The capacity of a cut is then defined as the number of edges in the cut-set |I|; for example
Fig. 4.6 illustrates a cut {v2 , v4 , v5 } of capacity 9. To maximize the capacity of a cut we define
the symmetric adjacency matrix A ∈ S^n,

    [A]_{ij} = { 1,  (v_i, v_j) ∈ E
               { 0,  otherwise,

where n = |V|, and let

    x_i = { +1,  v_i ∈ S
          { −1,  v_i ∉ S.
Suppose v_i ∈ S. Then 1 − x_i x_j = 0 if v_j ∈ S and 1 − x_i x_j = 2 if v_j ∉ S, so we get an expression
of the capacity as

    c(x) = (1/4) Σ_{ij} (1 − x_i x_j) A_{ij} = (1/4) e^T Ae − (1/4) x^T Ax,

and discarding the constant term eT Ae gives us the MAX-CUT problem


minimize    x^T Ax
subject to  x ∈ {−1, +1}^n,                                           (4.43)

with a semidefinite relaxation

minimize    ⟨A, X⟩
subject to  diag(X) = e                                               (4.44)
            X ⪰ 0.
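A minimal sketch of the relaxation (4.44), assuming Python with the cvxpy package and an SDP-capable solver as a stand-in modeling layer; the adjacency matrix below is a small hypothetical graph (not the one in Fig. 4.6):

```python
import numpy as np
import cvxpy as cp    # assumed modeling layer, not part of this manual

A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 1],
              [1, 1, 0, 1],
              [0, 1, 1, 0]], dtype=float)

X = cp.Variable((4, 4), symmetric=True)
prob = cp.Problem(cp.Minimize(cp.trace(A @ X)),
                  [cp.diag(X) == 1, X >> 0])
prob.solve()           # lower bound on min x'Ax over x in {-1,+1}^4
print(prob.value)
```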
The MAX-CUT relaxation can also be derived using Lagrangian duality, analogous to Sec. 4.3.5.

4.3.7 Converting a problem to standard form


From a modeling perspective it does not matter whether constraints are given as linear matrix
inequalities or as an intersection of affine hyperplanes; one formulation is easily converted to the
other using extraneous variables and constraints, and this transformation is often done trans-
parently by optimization software. Nevertheless, it is instructive to study an explicit example
of how to carry out this transformation.
Optimization solvers require a problem to be posed either in primal form (4.7) or dual form
(4.8), but problems often have both linear equality constraints and linear matrix inequalities,
so one must be converted to the other. A linear matrix inequality

A0 + x1 A1 + · · · + xn An  0

where Ai ∈ S m is converted to a set of linear equality constraints using a slack variable

A0 + x1 A1 + · · · + xn An = S, S  0. (4.45)

Apart from introducing an additional semidefinite variable S ∈ S+ m , we also add m(m + 1)/2

equality constraints. On the other hand, a semidefinite variable X ∈ S+n can be rewritten as a

linear matrix inequality with n(n + 1)/2 scalar variables


    X = Σ_{i=1}^{n} e_i e_i^T x_{ii} + Σ_{i=1}^{n} Σ_{j=i+1}^{n} (e_i e_j^T + e_j e_i^T) x_{ij} ⪰ 0.        (4.46)

Such conversions are often necessary, but they can make the computational cost of the resulting
problem prohibitively high.
Obviously we should only use the transformations (4.45) and (4.46) when necessary; if
we have a problem that is more naturally interpreted in either primal or dual form, we should
be careful to recognize that structure. For example, consider a problem
minimize    e^T z
subject to  A + Diag(z) = X
            X ⪰ 0,

with the variables X ∈ S^n_+ and z ∈ R^n. This is a problem in primal form with n(n + 1)/2
equality constraints, but they are more naturally interpreted as a linear matrix inequality

    A + Σ_i e_i e_i^T z_i ⪰ 0.

The dual problem is


maximize    −⟨A, Z⟩
subject to  diag(Z) = e
            Z ⪰ 0,

in the variable Z ∈ S^n_+, which corresponds to the MAX-CUT relaxation. The dual problem has
only n equality constraints, which is a vast improvement over the n(n + 1)/2 constraints in the
primal problem.

4.4 Bibliography
Much of the material in this chapter is based on the paper [VB96] and the books [BN01,
BKVH04]. The section on optimization over nonnegative polynomials is based on [Nes99,
Hac03]. We refer to [LR05] for a comprehensive survey on semidefinite optimization and relax-
ations in combinatorial optimization.

Chapter 5

Quadratic optimization

5.1 Introduction
In this chapter we discuss convex quadratic optimization. Our discussion is fairly brief compared
to the previous chapters for three reasons: (i) quadratic optimization is a special case of conic
quadratic optimization, (ii) for most problems it is actually more efficient for the solver to pose
the problem in conic form, and (iii) duality theory (including infeasibility certificates) is much
simpler for conic quadratic optimization. Therefore, we generally recommend a conic quadratic
formulation.

5.2 Convex quadratic optimization


A standard (convex) quadratic optimization problem

minimize (1/2)xT Qx + cT x
subject to Ax = b (5.1)
x ≥ 0,

with Q ∈ S+ n is conceptually a simple extension of a linear optimization problem (2.7) with a

quadratic term xT Qx. Note the important requirement that Q is symmetric positive semidefi-
nite; otherwise the objective function is not convex.

5.2.1 Geometry of quadratic optimization


Quadratic optimization has a simple geometric interpretation; we minimize a convex quadratic
function over a polyhedron. In Fig. 5.1 we illustrate this interpretation for the same example
as in Fig. 2.5 extended with an additional quadratic term with

    Q = [1, 1/2; 1/2, 1].

It is intuitively clear that the following different cases can occur:

• The optimal solution x? is at the boundary of the polyhedron (as shown in Fig. 5.1). At
x? one of the hyperplanes is tangential to an ellipsoidal level curve.

• The optimal solution is inside the polyhedron; this occurs if the unconstrained minimizer
arg minx (1/2)xT Qx + cT x = −Q† c (i.e., the center of the ellipsoidal level curves) is inside
the polyhedron.

Figure 5.1: Geometric interpretation of quadratic optimization. At the optimal point x? the
hyperplane {x | aT1 x = b} is tangential to an ellipsoidal level curve.

• If the polyhedron is unbounded in the opposite direction of c, and if the ellipsoid level
curves are degenerate in that direction (i.e., Qc = 0), then the problem is unbounded. If
Q ∈ S++n , then the problem cannot be unbounded.

• The problem is infeasible, i.e., {x | Ax = b, x ≥ 0} = ∅.

Possibly because of its simple geometric interpretation and similarity to linear optimization,
quadratic optimization has been more widely adopted by optimization practitioners than conic
quadratic optimization.

5.2.2 Duality in quadratic optimization


The Lagrangian function for (5.1) is

L(x, y, s) = (1/2)xT Qx + xT (c − AT y − s) + bT y (5.2)

with Lagrange multipliers s ∈ Rn+ , and from ∇x L(x, y, s) = 0 we get the necessary first-order
optimality condition
Qx = AT y + s − c,
i.e., (AT y + s − c) ∈ R(Q). Then

arg min L(x, y, s) = Q† (AT y + s − c),


x

which can be substituted into (5.2) to give a dual function

    g(y, s) = { b^T y − (1/2)(A^T y + s − c)^T Q^† (A^T y + s − c),   (A^T y + s − c) ∈ R(Q)
              { −∞                                                     otherwise.

Thus we get a dual problem

maximize bT y − (1/2)(AT y + s − c)Q† (AT y + s − c)


subject to (AT y + s − c) ∈ R(Q) (5.3)
s ≥ 0.

Alternatively, we can characterize the constraint (AT y + s − c) ∈ R(Q) as Qw = AT y + s − c
with an extraneous variable w to get an equivalent dual problem
maximize bT y − (1/2)wT Qw
subject to Qw = AT y − c + s (5.4)
s ≥ 0.
Note from the optimality conditions that w = x, so (5.4) is an unusual dual problem in the
sense that it involves both primal and dual variables.
Weak duality follows from the inner product between x and the dual equality constraint,
xT Qx = bT y − cT x + xT s,
which shows that

    (1/2)x^T Qx + c^T x − (b^T y − (1/2)x^T Qx) = x^T s ≥ 0.


Furthermore, strong duality holds if a Slater constraint qualification is satisfied, i.e., if either
the primal or dual problem is strictly feasible (as in the previous chapters).

5.2.3 Infeasibility in quadratic optimization


Feasibility of the primal problem (5.1) is described by Lemma B.1; either problem (5.1) is
feasible or there exists a certificate y of primal infeasibility satisfying
AT y ≤ 0, bT y > 0.
In order to characterize feasibility of the dual problem, we need another pair of alternatives,
which follows directly from Lemma B.2 by considering
 
0 A
A := .
−Q
Lemma 5.1. Either there exists an x ≥ 0 satisfying
Ax = 0, Qx = 0, cT x < 0
or there exists (y, w) satisfying
Qw − AT y + c ≥ 0.

5.3 Quadratically constrained optimization


A general convex quadratically constrained quadratic optimization problem is
minimize (1/2)xT Q0 x + cT0 x + r0
(5.5)
subject to (1/2)xT Qi x + cTi x + ri ≤ 0, i = 1, . . . , m,
where Qi ∈ S+ n , ∀i. This corresponds to minimizing a convex quadratic function over an

intersection of ellipsoids (or affine halfspaces if some of Qi are zero). Note the important
requirement Qi  0, ∀i, so that the objective function is convex and the constraints
(1/2)xT Qi x + cTi x + ri ≤ 0
characterize convex sets. It is important to note that neither quadratic equalities

    (1/2)x^T Q_i x + c_i^T x + r_i = 0

nor inequalities of the form

    (1/2)x^T Q_i x + c_i^T x + r_i ≥ 0

characterize convex sets, and therefore cannot be included.

5.3.1 Duality in quadratically constrained optimization
The Lagrangian function for (5.5) is
    L(x, λ) = (1/2)x^T Q_0 x + c_0^T x + r_0 + Σ_{i=1}^{m} λ_i ((1/2)x^T Q_i x + c_i^T x + r_i)
            = (1/2)x^T Q(λ)x + c(λ)^T x + r(λ),

where
    Q(λ) = Q_0 + Σ_{i=1}^{m} λ_i Q_i,   c(λ) = c_0 + Σ_{i=1}^{m} λ_i c_i,   r(λ) = r_0 + Σ_{i=1}^{m} λ_i r_i.

From the Lagrangian we get the first-order optimality conditions

Q(λ)x = −c(λ), (5.6)

and similar to the case of quadratic optimization we get a dual problem

maximize −(1/2)c(λ)T Q(λ)† c(λ) + r(λ)


subject to c(λ) ∈ R (Q(λ)) (5.7)
λ ≥ 0,

or equivalently
maximize −(1/2)wT Q(λ)w + r(λ)
subject to Q(λ)w = −c(λ) (5.8)
λ ≥ 0,
Weak duality is easily verified; from (5.6) we have

    x^T Q(λ)x + c(λ)^T x = 0,

which implies

    ((1/2)x^T Q_0 x + c_0^T x + r_0) − (−(1/2)x^T Q(λ)x + r(λ)) = Σ_{i=1}^{m} λ_i ((1/2)x^T Q_i x + c_i^T x + r_i) ≥ 0,

and strong duality holds provided the quadratic inequalities are strictly feasible,

(1/2)xT Qi x + cTi x + ri < 0, i = 1, . . . , m

for some x ∈ Rn . Using a general version of Lemma 4.1 for singular matrices (see, e.g., [BV04,
p. 651]), we can also write (5.7) as an equivalent semidefinite optimization problem,

maximize    t
subject to  [2(r(λ) − t), c(λ)^T; c(λ), Q(λ)] ⪰ 0                     (5.9)
            λ ≥ 0.

5.4 Separable quadratic optimization
Often a factorization of Q = F T F is either known or readily available, in which case we get an
alternative formulation of (5.1) as

minimize (1/2)z T z + cT x
subject to Ax = b
(5.10)
Fx = z
x ≥ 0.

The formulation (5.10) has several advantages; convexity of the problem is obvious (which
can occasionally be difficult to detect in finite precision arithmetic), and the structure and
sparsity of the separable reformulation is typically more efficient for an optimization solver.
In the following example we consider a related reformulation, which can result in a significant
reduction of complexity and solution time (see also Sec. 3.3.3).

Example 5.1. Consider a standard quadratic optimization problem (5.1) where

    Q = diag(w) + F^T F,   w > 0.

We then get a separable problem

minimize    (1/2)x^T diag(w)x + (1/2)z^T z + c^T x
subject to  Ax = b
            Fx = z
            x ≥ 0.

In general we can reformulate (5.5) as

minimize (1/2)z0T z0 + cT0 x + r0


subject to (1/2)ziT zi + cTi x + ri ≤ 0, i = 1, . . . , m (5.11)
Fi x = zi , i = 0, . . . , m,

which is very similar to the conic formulation (3.26). If the formulation (5.11) is sparse compared
to (5.5) (i.e., if it is described using fewer non-zeros), then it generally results in a reduction
of computation time. On the other hand, if we form (5.11), we might as well use the conic
formulation (3.26) instead.

Chapter 6

Mixed integer optimization

6.1 Introduction
In the previous chapters we have considered different classes of convex problems with continuous
variables. In this chapter we consider a much wider range of problems by allowing integer
variables. In particular, we consider conic and convex quadratic optimization problems where
some of the variables are constrained to be integer valued. In principle, this allows us to model
a vastly larger range of problems, but in practice we may not be able to solve the resulting integer
optimization problems within a reasonable amount of time.

6.2 Integer modeling


In this section we consider different modeling constructs involving integer variables, which we think of
as building blocks for general integer problems.

6.2.1 Implication of positivity


Often we have a real-valued variable x ∈ R satisfying 0 ≤ x < M for a known upper bound M ,
and we wish to model the implication

(x > 0) → (z = 1) (6.1)

where z ∈ {0, 1} is an indicator variable. Since 0 ≤ x < M we can rewrite (6.1) as a linear
inequality
x ≤ M z, (6.2)
where z ∈ {0, 1}. From the characterization (6.2) we see that (6.1) is equivalent to

(z = 0) → (x = 0),

which is the well-known property of contraposition. In Sec. 6.3 we discuss such standard boolean
expressions in more details.

Example 6.1. Assume that production of a specific item i has a production cost of ui per
unit produced with an additional fixed charge of wi if we produce item i, i.e., a discontinuous
production cost

    c_i(x_i) = { w_i + u_i x_i,   x_i > 0
               { 0,               x_i = 0,

Figure 6.1: Production cost with fixed charge wi .

shown in Fig. 6.1. If we let M denote an upper bound on the quantities we can produce, we
get an alternative expression for the production cost,

ci (xi ) = ui xi + wi zi , xi ≤ M zi , zi ∈ {0, 1},

where zi is a binary variable indicating whether item i is produced or not. We can then
minimize the total production cost under an affine constraint Ax = b as

minimize uT x + wT z
subject to Ax = b
xi ≤ M zi , i = 1, . . . , n
x≥0
z ∈ {0, 1}n ,

which is a linear mixed-integer optimization problem. Note that by minimizing the produc-
tion cost, we rule out the possibility that zi = 1 and xi = 0.
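A minimal sketch of this fixed-charge model, assuming Python with the cvxpy package and a mixed-integer-capable solver (e.g. MOSEK, CBC or GLPK_MI) installed; the costs, fixed charges and constraint data are hypothetical:

```python
import numpy as np
import cvxpy as cp    # assumed modeling layer, not part of this manual

n = 3
u = np.array([2.0, 3.0, 1.5])      # unit production costs (hypothetical)
w = np.array([10.0, 5.0, 8.0])     # fixed charges (hypothetical)
A = np.ones((1, n)); b = np.array([7.0])
M = 100.0                          # a valid upper bound on each x_i

x = cp.Variable(n)
z = cp.Variable(n, boolean=True)
prob = cp.Problem(cp.Minimize(u @ x + w @ z),
                  [A @ x == b, x >= 0, x <= M * z])
prob.solve()                       # requires a mixed-integer solver available to cvxpy
print(x.value, z.value)
```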

On the other hand, suppose x satisfies 0 < m ≤ x for a known lower bound m. We can then
model the implication
(z = 1) → (x > 0), (6.3)
(which is the reverse implication of (6.1)) as a linear inequality
x ≥ mz, (6.4)
where z ∈ {0, 1}.

6.2.2 Semi-continuous variables


We can also model semi-continuity of a variable x ∈ R,

    x ∈ {0} ∪ [a, b],                                                 (6.5)
where 0 < a ≤ b using a double inequality
az ≤ x ≤ bz (6.6)
where z ∈ {0, 1}.

6.2.3 Constraint satisfaction
Suppose we have an upper bound M on the affine expression aT x − b where x ∈ Rn . We can
then model the implication (z = 1) → (aT x ≤ b) as

aT x ≤ b + M (1 − z) (6.7)

where z ∈ {0, 1}, i.e., z indicates whether or not the constraint aT x ≤ b is satisfied. To
model the reverse implication (aT x ≤ b) → (z = 1) we equivalently consider the contraposition
(z = 0) → (aT x > b), which we can rewrite as

    aT x > b + mz                                                             (6.8)

where m < aT x − b is a lower bound. In practice, we relax the strict inequality using a small
amount of slack ε > 0, i.e.,

    aT x ≥ b + (m − ε)z + ε.                                                  (6.9)

Collectively, we thus model (aT x ≤ b) ↔ (z = 1) as

    aT x ≤ b + M (1 − z),    aT x ≥ b + (m − ε)z + ε.                         (6.10)

In a similar fashion, we can model (z = 1) → (aT x ≥ b) as

aT x ≥ b + m(1 − z) (6.11)

and (z = 0) → (aT x < b) as


    aT x ≤ b + (M + ε)z − ε,                                                  (6.12)
so we model (z = 1) ↔ (aT x ≥ b) as

    aT x ≥ b + m(1 − z),    aT x ≤ b + (M + ε)z − ε,                          (6.13)

using a lower bound m < aT x − b and an upper bound M > aT x − b. We can combine (6.10)
and (6.13) to model indication of equality constraints aT x = b. To that end, we note that
(z = 1) → (aT x = b) is equivalent to both (6.7) and (6.11), i.e.,

aT x ≤ b + M (1 − z), aT x ≥ b + m(1 − z).

On the other hand, (z = 0) → (aT x ≠ b) (or equivalently (z = 0) → (aT x > b) ∨ (aT x < b)) can
be written using (6.9) and (6.12) as

    aT x ≥ b + (m − ε)z1 + ε,    aT x ≤ b + (M + ε)z2 − ε,    z1 + z2 − z ≤ 1,

where z1 + z2 − z ≤ 1 is equivalent to (z = 0) → (z1 = 0) ∨ (z2 = 0).
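
A quick brute-force check (not part of the original text) can confirm that the pair of inequalities (6.10) behaves as intended: for points with aT x strictly below or strictly above b, only the correct value of z remains feasible. The data a, b, m, M and ε below are made up for illustration.

    # Check of (6.10): a^T x <= b + M(1 - z) and a^T x >= b + (m - eps) z + eps.
    import numpy as np

    a, b, eps = np.array([1.0, 2.0]), 3.0, 1e-6
    M, m = 10.0, -10.0      # assumed bounds: m < a^T x - b < M on the region of interest

    def feasible(x, z):
        lhs = a @ x
        return (lhs <= b + M * (1 - z) + 1e-12
                and lhs >= b + (m - eps) * z + eps - 1e-12)

    for x in (np.array([0.5, 1.0]), np.array([2.0, 2.0])):    # a^T x = 2.5 and 6.0
        allowed = [z for z in (0, 1) if feasible(x, z)]
        print(a @ x <= b, allowed)    # True -> only z = 1 allowed, False -> only z = 0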

6.2.4 Disjunctive constraints


With disjunctive constraints we require that at least one out of a set of constraints is satisfied.
For example, we may require that

(aT1 x ≤ b1 ) ∨ (aT2 x ≤ b2 ) ∨ · · · ∨ (aTk x ≤ bk ).

To that end, let M > aTj x − bj , ∀j be a collective upper bound. Then we can model

(z = 1) → (aT1 x ≤ b1 ) ∨ (aT2 x ≤ b2 ) ∨ · · · ∨ (aTk x ≤ bk ) (6.14)

as
    z ≤ z1 + · · · + zk ,    aTj x ≤ bj + M (1 − zj ) ∀j,                     (6.15)
where zj ∈ {0, 1} are binary variables. To characterize the reverse implication (aT1 x ≤ b1 ) ∨
(aT2 x ≤ b2 ) ∨ · · · ∨ (aTk x ≤ bk ) → (z = 1) we consider the contrapositive
(z = 0) → (aT1 x > b1 ) ∧ (aT2 x > b2 ) ∧ · · · ∧ (aTk x > bk ), (6.16)
which we can write as
    aTj x ≥ bj + (m − ε)z + ε,   j = 1, . . . , k,                            (6.17)
for a lower bound m < aTj x − bj , ∀j.

6.2.5 Pack constraints


For a pack constraint we require that at most one of the constraints is satisfied. Considering
(6.15) we can formulate this as
z1 + · · · + zk ≤ 1, aTj x ≤ bj + M (1 − zj ) ∀j. (6.18)

6.2.6 Partition constraints


For a partition constraint we require that exactly one of the constraints is satisfied. Considering
(6.15) we can formulate this as
z1 + · · · + zk = 1, aTj x ≤ bj + M (1 − zj ) ∀j. (6.19)

6.2.7 Continuous piecewise-linear functions


Consider the continuous univariate piecewise-linear non-convex function f : [α1 , α5 ] → R shown
in Fig. 6.2. On the interval [αj , αj+1 ], j = 1, 2, 3, 4 we can describe the function as

    f (x) = λj f (αj ) + λj+1 f (αj+1 )

for some λj , λj+1 ≥ 0 with λj + λj+1 = 1. If we add a constraint that only two (adjacent)
variables λj , λj+1 can be nonzero, we can characterize f (x) over the entire interval [α1 , α5 ] as a
convex combination,

    f (x) = Σ_{j=1,...,5} λj f (αj ).

The condition that only two adjacent variables can be nonzero is sometimes called an SOS2
constraint. If we introduce indicator variables zi for each pair of adjacent variables (λi , λi+1 ),
we can model it as

    λ1 ≤ z1 ,   λ2 ≤ z1 + z2 ,   λ3 ≤ z2 + z3 ,   λ4 ≤ z3 + z4 ,   λ5 ≤ z4 ,
    z1 + z2 + z3 + z4 = 1,   z ∈ B4 ,

which satisfies (zj = 1) → λi = 0, i ∉ {j, j + 1}. Collectively, we can then model the epigraph
f (x) ≤ t as

    x = Σ_{j=1,...,n} λj αj ,
    Σ_{j=1,...,n} λj f (αj ) ≤ t,
    λ1 ≤ z1 ,   λj ≤ zj−1 + zj , j = 2, . . . , n − 1,   λn ≤ zn−1 ,          (6.20)
    λ ≥ 0,   Σ_{j=1,...,n} λj = 1,   Σ_{j=1,...,n−1} zj = 1,   z ∈ B^{n−1} ,
for a piecewise-linear function f (x) with n terms. This approach is often called the lambda-
method. For the function in Fig. 6.2 we can reduce the number of integer variables by using a
Gray encoding

Figure 6.2: A univariate piecewise-linear non-convex function.

Figure 6.3: A multivariate continuous piecewise-linear non-convex function.

    interval:    [α1 , α2 ]   [α2 , α3 ]   [α3 , α4 ]   [α4 , α5 ]
    Gray code:       00           10           11           01

of the intervals [αj , αj+1 ] and an indicator variable y ∈ B2 to represent the four different values
of Gray code. We can then describe the constraints on λ using only two indicator variables,

(y1 = 0) → λ3 =0
(y1 = 1) → λ1 = λ5 = 0
(y2 = 0) → λ4 = λ5 = 0
(y2 = 1) → λ1 = λ2 = 0,

which leads to a more efficient characterization of the epigraph f (x) ≤ t,

    x = Σ_{j=1,...,5} λj αj ,
    Σ_{j=1,...,5} λj f (αj ) ≤ t,
    λ3 ≤ y1 ,   λ1 + λ5 ≤ 1 − y1 ,   λ4 + λ5 ≤ y2 ,   λ1 + λ2 ≤ 1 − y2 ,
    λ ≥ 0,   Σ_{j=1,...,5} λj = 1,   y ∈ B2 ,

which uses only two binary variables.
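
To make the lambda-method concrete, the sketch below (not part of the original text) minimizes a non-convex piecewise-linear function by posing the epigraph formulation (6.20) as a mixed-integer linear problem and solving it with SciPy's milp; the breakpoints alpha and values fval are made-up data.

    import numpy as np
    from scipy.optimize import milp, LinearConstraint, Bounds

    # Hypothetical breakpoints and (non-convex) function values.
    alpha = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
    fval  = np.array([1.0, 3.0, 0.0, 2.0, 1.5])
    n = len(alpha)

    # Variable vector v = (lambda_1..lambda_n, z_1..z_{n-1}, x, t); minimize t.
    nv = n + (n - 1) + 2
    c = np.zeros(nv); c[-1] = 1.0

    A_eq = np.zeros((3, nv))
    A_eq[0, :n] = alpha; A_eq[0, -2] = -1.0      # sum_j lambda_j alpha_j - x = 0
    A_eq[1, :n] = 1.0                            # sum_j lambda_j = 1
    A_eq[2, n:n + n - 1] = 1.0                   # sum_j z_j = 1
    eq = LinearConstraint(A_eq, [0.0, 1.0, 1.0], [0.0, 1.0, 1.0])

    A_ub = np.zeros((1 + n, nv))
    A_ub[0, :n] = fval; A_ub[0, -1] = -1.0       # sum_j lambda_j f(alpha_j) - t <= 0
    for j in range(n):                           # SOS2 links: lambda_j <= z_{j-1} + z_j
        A_ub[1 + j, j] = 1.0
        if j > 0:
            A_ub[1 + j, n + j - 1] = -1.0
        if j < n - 1:
            A_ub[1 + j, n + j] = -1.0
    ub = LinearConstraint(A_ub, -np.inf, 0.0)

    integrality = np.zeros(nv); integrality[n:n + n - 1] = 1
    bounds = Bounds([0.0] * n + [0] * (n - 1) + [alpha[0], -np.inf],
                    [1.0] * n + [1] * (n - 1) + [alpha[-1], np.inf])

    res = milp(c, constraints=[eq, ub], integrality=integrality, bounds=bounds)
    print(res.x[-2:])   # (x, t); for this data the minimizer is x = 2 with f(x) = 0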

The lambda-method can also be used to model multivariate continuous piecewise-linear non-
convex functions, specified on a set of polyhedra Pk . For example, for the function shown in
Fig. 6.3 we can model the epigraph f (x) ≤ t as

    x = Σ_{i=1,...,6} λi vi ,
    Σ_{i=1,...,6} λi f (vi ) ≤ t,
    λ1 ≤ z1 + z2 ,   λ2 ≤ z1 ,   λ3 ≤ z2 + z3 ,                               (6.21)
    λ4 ≤ z1 + z2 + z3 + z4 ,   λ5 ≤ z3 + z4 ,   λ6 ≤ z4 ,
    λ ≥ 0,   Σ_{i=1,...,6} λi = 1,   Σ_{i=1,...,4} zi = 1,   z ∈ B4 .

Note, for example, that z2 = 1 implies that λ2 = λ5 = λ6 = 0 and x = λ1 v1 + λ3 v3 + λ4 v4 .


For this function we can also reduce the number of indicator variables using a Gray encoding of

Figure 6.4: A univariate lower semicontinuous piecewise-linear function.

decision variables for the different polyhedra. If we use a decision variable y ∈ B2 with a Gray
encoding
(P1 , P2 , P3 , P4 ) → (00, 10, 11, 01),
we have
(y1 = 0) → λ3 =0
(y1 = 1) → λ2 = λ6 = 0
(y2 = 0) → λ5 = λ6 = 0
(y2 = 1) → λ1 = λ2 = 0,
which gives the characterization of the epigraph f (x) ≤ t,

    x = Σ_{i=1,...,6} λi vi ,
    Σ_{i=1,...,6} λi f (vi ) ≤ t,
    λ3 ≤ y1 ,   λ2 + λ6 ≤ 1 − y1 ,   λ5 + λ6 ≤ y2 ,   λ1 + λ2 ≤ 1 − y2 ,      (6.22)
    λ ≥ 0,   Σ_{i=1,...,6} λi = 1,   y ∈ B2 .

For multidimensional piecewise linear functions, a reduction of the number of binary decision
variables is only possible when the polyhedral regions have a special topology, for example a
“Union Jack” arrangement as shown in Fig. 6.3. We refer to the recent survey [VAN10] for
further references and comparisons between different mixed-integer formulations for piecewise
linear functions.

6.2.8 Lower semicontinuous piecewise-linear functions


The ideas in Sec. 6.2.7 can be applied to lower semicontinuous piecewise-linear functions as
well. For example, consider the function shown in Fig. 6.4. If we denote the one-sided limits
by f− (c) := limx↑c f (x) and f+ (c) := limx↓c f (x), respectively, then we can
describe the epigraph f (x) ≤ t for the function in Fig. 6.4 as

    x = λ1 α1 + (λ2 + λ3 + λ4 )α2 + λ5 α3 ,
    λ1 f (α1 ) + λ2 f− (α2 ) + λ3 f (α2 ) + λ4 f+ (α2 ) + λ5 f (α3 ) ≤ t,
    λ1 + λ2 ≤ z1 ,   λ3 ≤ z2 ,   λ4 + λ5 ≤ z3 ,                               (6.23)
    λ ≥ 0,   Σ_{i=1,...,5} λi = 1,   Σ_{i=1,...,3} zi = 1,   z ∈ B3 ,

where we use a different decision variable for each of the intervals [α1 , α2 ), [α2 , α2 ], and (α2 , α3 ]. As a
special case this gives us an alternative characterization of fixed charge models considered in
Sec. 6.2.1.

6.3 Boolean primitives
In the previous sections we used binary decision variables to indicate whether or not a real-
valued affine expression was positive. In general, we can express a multitude of boolean ex-
pressions as linear inequality constraints involving integer variables. The basic operators from
boolean algebra are complement, conjunction and disjunction, which are used to construct more
complicated boolean expressions. In the following we let x and y denote binary variables with
values {0, 1}.

6.3.1 Complement
Logical complement ¬x is defined by the following truth table.
x ¬x
0 1
1 0
Thus in terms of an affine relationship,
¬x = 1 − x. (6.24)

6.3.2 Conjunction
Logical conjunction (x ∧ y) is defined by the following truth table.
x y x∧y
0 0 0
1 0 0
0 1 0
1 1 1
Thus z = (x ∧ y) is equivalent to
z + 1 ≥ x + y, x ≥ z, y ≥ z. (6.25)

6.3.3 Disjunction
Logical disjunction (x ∨ y) is defined by the following truth table.
x y x∨y
0 0 0
1 0 1
0 1 1
1 1 1
Thus (x ∨ y) is equivalent to
x + y ≥ 1. (6.26)
The following table shows basic properties that are easily verified using the truth tables for
complement, conjunction and disjunction.
Associativity x ∨ (y ∨ z) = (x ∨ y) ∨ z x ∧ (y ∧ z) = (x ∧ y) ∧ z
Commutativity x∨y =y∨x x∧y =y∧x
Distributivity x ∨ (y ∧ z) = (x ∨ y) ∧ (x ∨ z) x ∧ (y ∨ z) = (x ∧ y) ∨ (x ∧ z)
De Morgan ¬(x ∧ y) = ¬x ∨ ¬y ¬(x ∨ y) = ¬x ∧ ¬y
The last row in the table shows the properties called De Morgan’s laws, which show that
conjunction and disjunction in a sense are dual operators.

6.3.4 Implication
Logical implication (x → y) is defined by the following truth table.
x y x→y
0 0 1
1 0 0
0 1 1
1 1 1
It is easy to verify that (x → y) is equivalent to both (¬x ∨ y) and
¬y → ¬x. (6.27)
In terms of indicator variables we have that x → y is equivalent to
x ≤ y. (6.28)

Example 6.2. Consider two real-valued variables 0 ≤ u, v ≤ M . We can model the complementarity
condition uv = 0 by introducing two indicator variables x and y,

(u > 0) → (x = 1), (v > 0) → (y = 1),

or equivalently
¬x → (u = 0), ¬y → (v = 0).
We then model (u = 0) ∨ (v = 0) as (¬x ∨ ¬y), which can be written as

u ≤ M x, v ≤ M y, (1 − x) + (1 − y) ≥ 1.

6.3.5 Exclusive disjunction


Logical exclusive disjunction (x ⊕ y) is defined by the following truth table.
x y x⊕y
0 0 0
1 0 1
0 1 1
1 1 0
Thus z = x ⊕ y is equivalent to
z ≤ x + y, z ≥ x − y, z ≥ −x + y, z ≤ 2 − x − y. (6.29)
It is easy to verify from the truth table that (x ⊕ y) is equivalent to (x ∨ y) ∧ ¬(x ∧ y).
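
Since all of these encodings involve only a handful of binary values, they are easy to check exhaustively. The sketch below (not part of the original text) enumerates the truth tables and verifies the linear descriptions (6.25), (6.26), (6.28) and (6.29).

    # Brute-force verification of the boolean encodings on their truth tables.
    from itertools import product

    for x, y in product((0, 1), repeat=2):
        # conjunction: z = x AND y is the unique z satisfying (6.25)
        and_z = [z for z in (0, 1) if z + 1 >= x + y and x >= z and y >= z]
        # exclusive disjunction: z = x XOR y is the unique z satisfying (6.29)
        xor_z = [z for z in (0, 1)
                 if z <= x + y and z >= x - y and z >= -x + y and z <= 2 - x - y]
        assert and_z == [x & y] and xor_z == [x ^ y]
        assert (x + y >= 1) == bool(x | y)          # disjunction (6.26)
        assert (x <= y) == ((not x) or bool(y))     # implication (6.28)
    print("all encodings match the truth tables")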

6.4 Integer optimization


A general mixed integer conic optimization problem has the form
    minimize    cT x
    subject to  Ax = b
                x ∈ C                                                         (6.30)
                xi ∈ Z, ∀i ∈ I,

where C is a direct product of self-dual cones (discussed in Chapters 2–4) and I ⊆ {1, . . . , n} denotes
the set of variables that are constrained to be integers.

6.5 Bibliography
This chapter is based on the books [NW88, Wil93]. Modeling of piecewise linear functions is
described in the survey paper [VAN10].

Chapter 7

Practical optimization

7.1 Introduction
In this chapter we discuss different practical aspects of creating optimization models.

7.2 The quality of a solution


In this section we will discuss how to validate an obtained solution. Assume we have a conic
model with continuous variables only and that the optimization software has reported an optimal
primal and dual solution. Given such a solution, we might ask how we can verify that the
reported solution is indeed optimal.
To that end, consider the simple model

    minimize    −x2
    subject to  x1 + x2 ≤ 2,
                x2 ≤ √2,                                                      (7.1)
                x1 , x2 ≥ 0,

where MOSEK might approximate the solution as

x1 = 0.0000000000000000 and x2 = 1.4142135623730951,

and therefore the approximate optimal objective value is

−1.4142135623730951.

The true objective value is −√2, so the approximate objective value is wrong by the amount

1.4142135623730951 − √2.

Most likely this difference is irrelevant for all practical purposes. Nevertheless, in general a
solution obtained using floating point arithmetic is only an approximation. Most (if not all)
commercial optimization software uses double precision floating point arithmetic, implying that
about 16 digits are correct in the computations
performed internally by the software. This also
means that irrational numbers such as √2 and π can only be stored accurately within 16 digits.
A good practice after solving an optimization problem is to evaluate whether the reported
solution is indeed an optimal solution; at the very least this process should be carried out during
the initial phase of building a model, or if the reported solution is unexpected in some way. The

first step in that process is to check that it is feasible; in the case of the small example (7.1) this
amounts to verifying that

    x1 + x2 = 0.0000000000000000 + 1.4142135623730951 ≤ 2,
    x2 = 1.4142135623730951 ≤ √2,
    x1 = 0.0000000000000000 ≥ 0,
    x2 = 1.4142135623730951 ≥ 0,

which demonstrates that the solution is feasible. However, in general the constraints might be
violated due to computations in finite precision. It can be difficult to assess the significance of
a specific violation; for example a violation of one unit in

x1 + x2 ≤ 1

may or may not be more significant than a violation of one unit in

    x1 + x2 ≤ 10^9 .

The right-hand side of 10^9 may itself be the result of a computation in finite precision, and may
only be known with, say, 3 digits of accuracy. Therefore, a violation of one unit is not significant
since the true right-hand side could just as well be 1.001 · 10^9 .
Another question is how to verify that a feasible approximate solution is actually optimal,
which is answered by duality theory. From the discussion of duality in Sec. 2.3, the dual problem
to (7.1) is

    maximize    2y1 + √2 y2
    subject to  y1 ≤ 0,                                                       (7.2)
                y1 + y2 ≤ −1,

and weak duality mandates that the primal objective value is greater than or equal to the dual
objective value for any dual feasible solution, i.e.,

    −x2 ≥ 2y1 + √2 y2 .

Furthermore, if the bound is tight (i.e., if the inequality holds as an equality), then the primal
solution must be optimal. In other words,
if there exists a dual feasible solution such that the
dual objective value is identical to −√2, then we know (from duality) that the primal solution
is optimal. For example, the optimization software may report

y1 = 0 and y2 = −1,

which is easily verified to be a feasible solution with dual objective value −√2, proving opti-
mality of both the primal and the dual solution.
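
The same checks are easily scripted. The short sketch below (not part of the original text) verifies primal feasibility of (7.1), dual feasibility of (7.2), and that the two objective values agree up to floating point accuracy.

    import math

    x = (0.0000000000000000, 1.4142135623730951)     # reported primal solution
    y = (0.0, -1.0)                                   # reported dual solution

    primal_feasible = (x[0] + x[1] <= 2) and (x[1] <= math.sqrt(2)) and min(x) >= 0
    dual_feasible = (y[0] <= 0) and (y[0] + y[1] <= -1)

    primal_obj = -x[1]                                # minimize -x2
    dual_obj = 2 * y[0] + math.sqrt(2) * y[1]         # maximize 2 y1 + sqrt(2) y2
    print(primal_feasible, dual_feasible, primal_obj - dual_obj)   # gap ~ 0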

7.3 Distance to a cone


Assume we want to solve the conic optimization problem

minimize cT x
subject to Ax = b, (7.3)
x∈K

where K is a convex cone. Let P (x, K) denote the projection of x onto K. The distance measure

    d(x, K) := min_{y∈K} ‖x − y‖2 = ‖x − P (x, K)‖2                           (7.4)
then gives the distance from x to the nearest feasible point in K, and obviously

x∈K ⇔ d(x, K) = 0.

Thus d(x, K) is a natural measure of the cone-feasibility of a point x; if d(x, K) > 0 then x ∉ K
(i.e., x is infeasible) and the smaller the value, the closer x is to K. For x ∈ R, let

[x]+ := max{x, 0}.

The following lemmas then give the projections and distance measure for the different cones.

Lemma 7.1 (Distance to linear cones). The projection is

P (x, R+ ) = [x]+ (7.5)

and the distance is


d(x, R+ ) = [−x]+ . (7.6)

Lemma 7.2 (Distance to quadratic cones). Let x := (x1 , x2 ) ∈ R × Rn−1 . Then

    P (x, Qn ) = x                                          if x1 ≥ ‖x2 ‖,
    P (x, Qn ) = ([x1 + ‖x2 ‖]+ /(2‖x2 ‖)) (‖x2 ‖, x2 )      if x1 < ‖x2 ‖,    (7.7)

see [FLT02, Prop. 3.3], and

    d(x, Qn ) = [‖x2 ‖ − x1 ]+ / √2                         if x1 ≥ −‖x2 ‖,
    d(x, Qn ) = ‖x‖                                         if x1 < −‖x2 ‖.   (7.8)

Lemma 7.3 (Distance to rotated quadratic cones). Let x := (x1 , x2 , x3 ) ∈ R × R × Rn−2 .


Then
P (x, Qnr ) = Tn P (Tn x, Qn ) (7.9)
and
d(x, Qnr ) = d(Tn x, Qn ), (7.10)
where Tn is the orthogonal transformation defined in (3.3).

Lemma 7.4 (Distance to semidefinite cones). Let X = Σ_{i=1,...,n} λi qi qiT ∈ S n be an eigenvalue
decomposition. Then

    P (X, S+n ) = Σ_{i=1,...,n} [λi ]+ qi qiT ,                               (7.11)

and

    d(X, S+n ) = max_i [−λi ]+ .                                              (7.12)
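
As an illustration (not part of the original text), the projection and distance formulas for the quadratic cone are simple to implement and to cross-check against each other numerically.

    # Projection (7.7) and distance (7.8) for the quadratic cone Q^n.
    import numpy as np

    def proj_Q(x):
        x1, x2 = x[0], x[1:]
        nrm = np.linalg.norm(x2)
        if x1 >= nrm:                    # already in the cone
            return x.copy()
        if x1 <= -nrm:                   # in the polar cone: projects to the origin
            return np.zeros_like(x)
        a = (x1 + nrm) / (2.0 * nrm)
        return a * np.concatenate(([nrm], x2))

    def dist_Q(x):
        x1, x2 = x[0], x[1:]
        nrm = np.linalg.norm(x2)
        return max(nrm - x1, 0.0) / np.sqrt(2) if x1 >= -nrm else np.linalg.norm(x)

    x = np.array([0.5, 1.0, -2.0])
    print(dist_Q(x), np.linalg.norm(x - proj_Q(x)))   # the two values agree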

Chapter 8

Case studies

8.1 Introduction
This chapter discusses a number of case studies and examples with details that are beyond the
general character of the previous chapters.

8.2 A resource constrained production and inventory problem


The resource constrained production and inventory problem [Zie82] can be formulated as follows:
    minimize    Σ_{j=1,...,n} (dj xj + ej /xj )
    subject to  Σ_{j=1,...,n} rj xj ≤ b,                                      (8.1)
                xj ≥ 0,   j = 1, . . . , n,

where n denotes the number of items to be produced, b denotes the amount of common resource,
and rj is the consumption of the limited resource to produce one unit of item j. The objective
function represents production and inventory costs. Let cpj denote the cost per unit of product
j and crj denote the rate of holding costs, respectively. Further, let

    dj = (cpj crj )/2,

so that dj xj is the average holding cost for product j. If Dj denotes the total demand for
product j and coj the ordering cost per order of product j, then let

ej = coj Dj

and hence

    ej /xj = coj Dj /xj

is the average ordering cost for product j. It is not always possible to produce a fractional
number of items. In such cases a constraint saying xj has to be integer valued should be added.
In summary, the problem finds the optimal batch size such that the inventory and ordering
cost are minimized while satisfying the constraints on the common resource. Given dj , ej ≥ 0

problem (8.1) is equivalent to the conic quadratic problem
    minimize    Σ_{j=1,...,n} (dj xj + ej tj )
    subject to  Σ_{j=1,...,n} rj xj ≤ b,
                (tj , xj , √2) ∈ Q3r ,   j = 1, . . . , n,                    (8.2)
                xj ≥ 0,   j = 1, . . . , n.
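
The key step in (8.2) is that (tj , xj , √2) ∈ Q3r means 2 tj xj ≥ 2 with tj , xj ≥ 0, i.e., tj ≥ 1/xj . A small numeric check of this equivalence (not part of the original text) is given below.

    # Membership of (t, x, sqrt(2)) in the rotated cone Q^3_r versus t >= 1/x.
    import numpy as np

    def in_Q3r(u):            # u = (u1, u2, u3): 2 u1 u2 >= u3^2, u1, u2 >= 0
        return u[0] >= 0 and u[1] >= 0 and 2 * u[0] * u[1] >= u[2] ** 2 - 1e-12

    for x in (0.5, 1.0, 4.0):
        t = 1.0 / x                                    # the tightest admissible t
        assert in_Q3r((t, x, np.sqrt(2)))
        assert not in_Q3r((0.9 * t, x, np.sqrt(2)))    # a slightly smaller t fails
    print("t >= 1/x  <=>  (t, x, sqrt(2)) in Q^3_r")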

8.3 Markowitz portfolio optimization


8.3.1 Basic notions
Recall from Sec. 3.3.3 that the standard Markowitz portfolio optimization problem is

maximize µT x
subject to xT Σx ≤ γ
(8.3)
eT x = 1
x ≥ 0,

where µ ∈ Rn is a vector of expected returns for n different assets and Σ ∈ S+n denotes the

corresponding covariance matrix.


Problem (8.3) maximizes the expected return of investment given a budget constraint and
an upper bound γ on the allowed risk. Alternatively, we can minimize the risk given a lower
bound δ on the expected return of investment, i.e., we can solve a problem

minimize xT Σx
subject to µT x ≥ δ
(8.4)
eT x = 1
x ≥ 0.

Both problems (8.3) and (8.4) are equivalent in the sense that they describe the same Pareto-
optimal trade-off curve by varying δ and γ. In fact, given a solution to either problem, we
can easily find the parameters for the other problem, which results in the same solution. The
optimality conditions for (8.4) are

2Σx = νµ + λe + z, ν(µT x − δ) = 0, eT x = 1, x, z, ν ≥ 0, xT z = 0, (8.5)

where ν ∈ R+ , λ ∈ R, z ∈ Rn+ are dual variables, and the optimality conditions for (8.3) are

2αΣx = µ + ζe + v, α(xT Σx − γ) = 0, eT x = 1, α, v ≥ 0, xT v = 0, (8.6)

where α ∈ R+ , ζ ∈ R, v ∈ Rn+ are dual variables. Furthermore, from (8.5) we see that

xT Σx = (1/2)(νµT x + λeT x) = (1/2)(νδ + λ).

Thus, given a solution (x? , ν ? , λ? , z ? ) to (8.5), we see that for γ = (x? )T Σx? = (1/2)(ν ? δ + λ? ),

x = x? , α = 1, ζ = λ? , v = z?

is a solution to (8.6).
Next consider a factorization
    Σ = V T V                                                                 (8.7)

for some V ∈ Rk×n . We can then rewrite both problems (8.3) and (8.4) in conic quadratic form
as
    maximize    µT x
    subject to  (1/2, γ, V x) ∈ Q^{k+2}_r                                     (8.8)
                eT x = 1
                x ≥ 0,

and

    minimize    t
    subject to  (1/2, t, V x) ∈ Q^{k+2}_r
                µT x ≥ δ                                                      (8.9)
                eT x = 1
                x ≥ 0,
respectively. In practice, a suitable factorization (8.7) is either readily available, or it is easily
obtained. We mention typical choices below.
• Data-matrix. Σ might be specified directly in the form (8.7), where V is a normalized
data-matrix with k observations of market data (for example daily returns) of the n assets.
When the observation horizon k is shorter than n, which is typically the case, the conic
representation is both more parsimonious and has better numerical properties.
• Cholesky. Given Σ ≻ 0, we may factor it as (8.7) where V is an upper-triangular Cholesky
factor. In this case k = n, i.e., V ∈ Rn×n , so there is little difference in complexity between
the conic and quadratic formulations.
• Factor model. For a factor model we have

      Σ = D + UT U

  where D ≻ 0 is a diagonal matrix, and U ∈ Rk×n is a low-rank matrix (k ≪ n). The
  factor model directly gives us a factor

      V = [ D^{1/2} ]
          [   U     ]                                                         (8.10)

  of dimensions (n + k) × n; a small numerical sketch is given after this list. The dimensions
  of V in (8.10) are larger than the dimensions of the Cholesky factors of Σ, but V in (8.10)
  is very sparse, which usually results in a significant reduction of solution time.
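
The sketch below (not part of the original text) builds the factor V of (8.10) from a made-up factor model and verifies that V T V recovers Σ.

    # Build V = [D^(1/2); U] from a hypothetical factor model Sigma = D + U^T U.
    import numpy as np

    rng = np.random.default_rng(0)
    n, k = 6, 2                                   # n assets, k << n factors
    U = rng.standard_normal((k, n))               # factor exposures
    D = np.diag(rng.uniform(0.1, 0.5, n))         # diagonal (idiosyncratic) variances

    Sigma = D + U.T @ U
    V = np.vstack([np.sqrt(D), U])                # (n + k) x n, sparse in the D block
    print(np.allclose(V.T @ V, Sigma))            # True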
We can also combine the formulations (8.3) and (8.4) into an alternative problem that more
explicitly computes a trade-off between risk and return,
maximize µT x − λxT Σx
subject to eT x = 1 (8.11)
x ≥ 0,
with a trade-off parameter λ ≥ 0. One advantage of (8.11) is that the problem is always feasible.
This formulation also produces the same optimal trade-off curve, i.e., for a particular (feasible)
choice of δ (or γ), there exists a value of λ for which the different problems have the same
solution x. The conic formulation of (8.11) is
    maximize    µT x − λt
    subject to  (1/2, t, V x) ∈ Q^{k+2}_r                                     (8.12)
                eT x = 1
                x ≥ 0.

8.3.2 Slippage costs
In a more realistic model we correct the expected return of investment by slippage terms Tj (xj ),

    maximize    µT x − Σ_j Tj (xj )
    subject to  xT Σx ≤ γ                                                     (8.13)
                eT x = 1
                x ≥ 0,

which accounts for several different factors.

Market impact
The phenomenon that the price of an asset changes when large volumes are traded is called
market impact, and for large-volume trading it is the most important slippage contribution.
Empirically it has been demonstrated that market impact, i.e., the additional cost of buying or
selling an asset, is well approximated by the conic representable function

    Tj (xj ) = αj xj^{3/2}

with coefficients αj ≥ 0 estimated from historical data. Using Lemma 3.1 Prop. (iv), we can
write the epigraph

    xj^{3/2} ≤ tj

as

    (sj , tj , xj ), (1/8, xj , sj ) ∈ Q3r ,

i.e., we get a conic formulation of (8.13)

    maximize    µT x − αT t
    subject to  (1/2, γ, V x) ∈ Q^{k+2}_r
                (sj , tj , xj ) ∈ Q3r ,   j = 1, . . . , n,                   (8.14)
                (1/8, xj , sj ) ∈ Q3r ,   j = 1, . . . , n,
                eT x = 1
                x ≥ 0.
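
The pair of rotated cone constraints above indeed enforces xj^{3/2} ≤ tj : the second cone bounds sj ≤ √(xj)/2, and the first then gives tj ≥ xj^2/(2 sj ) ≥ xj^{3/2}. A numeric check of this representation (not part of the original text) is shown below.

    # Check that (s, t, x) in Q^3_r and (1/8, x, s) in Q^3_r certify x^(3/2) <= t.
    import numpy as np

    def in_Q3r(u):
        return u[0] >= -1e-12 and u[1] >= -1e-12 and 2 * u[0] * u[1] >= u[2] ** 2 - 1e-9

    for x in (0.0, 0.25, 1.0, 4.0):
        t = x ** 1.5                      # the smallest admissible t
        s = 0.5 * np.sqrt(x)              # a witness for the auxiliary variable
        assert in_Q3r((s, t, x)) and in_Q3r((1.0 / 8.0, x, s))
    print("x^(3/2) <= t is representable with two rotated quadratic cones")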

Transaction costs
For smaller volume trading the main slippage contribution is transaction costs, which can be
modeled as a linear function

    Tj (xj ) = αj xj
or as a fixed-charge plus linear function

    Tj (xj ) = { 0,              xj = 0,
               { αj xj + βj ,    0 < xj ≤ ρj ,

which can be modeled using indicator variables similarly to Example 6.1. To that end, let zj
be an indicator variable,
(zj = 0) → (xj = 0)
which we can express as
xj ≤ ρj zj , zj ∈ B. (8.15)

We thus get a conic integer problem

    maximize    µT x − eT t
    subject to  (1/2, γ, V x) ∈ Q^{k+2}_r
                αj xj + βj zj ≤ tj ,   j = 1, . . . , n,
                xj ≤ ρj zj ,           j = 1, . . . , n,                      (8.16)
                eT x = 1
                x ≥ 0
                z ∈ Bn .

Since the objective function minimizes tj we exclude the possibility that xj = 0 and zj = 1.

8.3.3 Maximizing the Sharpe ratio


The Sharpe ratio defines an efficiency metric of a portfolio as the expected return per unit risk,
i.e.,
    S(x) = (µT x − rf ) / (xT Σx)^{1/2} ,
where rf denotes the return of a risk-free asset. By assumption µT x > rf for some feasible x
(i.e., there exists a risky portfolio with a greater expected return than the risk-free asset), so
maximizing the Sharpe ratio is equivalent to minimizing 1/S(x). In other words, with Σ = V T V
we have the following problem

    minimize    ‖V x‖ / (µT x − rf )
    subject to  eT x = 1
                x ≥ 0.
We next introduce a variable transformation,

    y = γ x,   γ ≥ 0.

Since a positive γ can be chosen arbitrarily and (µ − rf e)T x > 0, we have without loss of
generality that

    (µ − rf e)T y = 1.
Thus, we obtain the following conic problem for maximizing the Sharpe ratio,

minimize t
subject to (t, V y) ∈ Qk+1
eT y = γ
(µ − rf e)T y = 1
γ ≥ 0
y ≥ 0,

and we recover x := y/γ.
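
The algebra behind this transformation is easy to verify numerically. The sketch below (not part of the original text) checks, for a made-up portfolio and factor V , that ‖V y‖ equals the inverse Sharpe ratio 1/S(x) of the recovered portfolio x = y/γ.

    # Check the Sharpe-ratio variable transformation y = gamma * x.
    import numpy as np

    rng = np.random.default_rng(1)
    n, rf = 4, 0.01
    mu = rf + rng.uniform(0.02, 0.10, n)          # expected returns above the risk-free rate
    V = rng.standard_normal((3, n))               # any factor with Sigma = V^T V

    x = rng.uniform(0.1, 1.0, n); x /= x.sum()    # a feasible portfolio, e^T x = 1
    gamma = 1.0 / ((mu - rf) @ x)                 # scaling so that (mu - rf e)^T y = 1
    y = gamma * x

    inv_sharpe = np.linalg.norm(V @ x) / ((mu - rf) @ x)
    print(np.isclose(np.linalg.norm(V @ y), inv_sharpe))   # True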

Appendix A

Notation and definitions

Let R denote the set of real numbers, Z the set of integers, and B the boolean set {0, 1},
respectively. Rn denotes the set of n-dimensional vectors of real numbers (and similarly for Zn
and Bn ); in most cases we denote such vectors by lower case letters, e.g., a ∈ Rn . A subscripted
value ai then refers to the ith entry in a, i.e.,

a = (a1 , a2 , . . . , an ).

All vectors considered in this manual are interpreted as column-vectors. For a, b ∈ Rn we use
the standard inner product,

    ⟨a, b⟩ := a1 b1 + a2 b2 + · · · + an bn ,

which we also write as aT b := ⟨a, b⟩. We let Rm×n denote the set of m × n matrices, and we
use upper case letters to represent them, e.g., B ∈ Rm×n organized as

        ⎡ b11  b12  · · ·  b1n ⎤
    B = ⎢ b21  b22  · · ·  b2n ⎥
        ⎢  ..   ..    ..    .. ⎥
        ⎣ bm1  bm2  · · ·  bmn ⎦ ,

i.e., Bij = bij , and for matrices A, B ∈ Rm×n we use the inner product

    ⟨A, B⟩ := Σ_{i=1,...,m} Σ_{j=1,...,n} aij bij .

For a vector x ∈ Rn we have

                ⎡ x1   0   · · ·   0  ⎤
    Diag(x) :=  ⎢ 0    x2  · · ·   0  ⎥
                ⎢ ..   ..    ..    .. ⎥
                ⎣ 0    0   · · ·   xn ⎦ ,

i.e., a square matrix with x on the diagonal and zero elsewhere. Similarly, for a square matrix
X ∈ Rn×n we have
diag(X) := (x11 , x22 , . . . , xnn ).
A set S ⊆ Rn is convex if and only if for any x, y ∈ S we have

θx + (1 − θ)y ∈ S

Figure A.1: The shaded region is the epigraph of the function f (x) = − log(x).

for all θ ∈ [0, 1]. A function f : Rn → R is convex if and only if dom(f ) is convex and

    f (θx + (1 − θ)y) ≤ θf (x) + (1 − θ)f (y)

for all θ ∈ [0, 1], where dom(f ) is the domain of the function f . A function f : Rn → R is
concave if and only if −f is convex. For example, the function f : R → R given by

f (x) = − log(x)

has dom(f ) = {x | x > 0} and is convex. The epigraph of a function f : Rn → R is the set

epi(f ) := {(x, t) | x ∈ dom(f ), f (x) ≤ t},

shown in Fig. A.1. Thus, minimizing over the epigraph

minimize t
subject to f (x) ≤ t

is equivalent to minimizing f (x). Furthermore, f is convex if and only if epi(f ) is a convex set.
Similarly, the hypograph of a function f : Rn → R is the set

hypo(f ) := {(x, t) | x ∈ dom(f ), f (x) ≥ t}.

Maximizing f is equivalent to maximizing over the hypograph

maximize t
subject to f (x) ≥ t,

and f is concave if and only if hypo(f ) is a convex set.

Appendix B

Duality in conic optimization

B.1 Introduction
Duality theory is a rich and powerful area of convex optimization, and central to understanding
sensitivity analysis and infeasibility issues in linear (and convex) optimization. Furthermore,
it provides a simple and systematic way of obtaining non-trivial lower bounds on the optimal
value for many difficult non-convex problem.

B.2 The Lagrangian


Duality theory is inherently connected with a (primal) optimization problem. Initially we
consider an optimization problem in a general form
minimize f0 (x)
subject to f (x) ≤ 0 (B.1)
h(x) = 0,
where f0 : Rn → R is the objective function, f : Rn → Rm encodes inequality constraints, and
h : Rn → Rp encodes equality constraints. We will denote (B.1) the primal problem, and its
optimal value by p? . A general problem such as (B.1) is well-suited for our initial discussion
of duality (especially weak duality); for our discussion of strong duality in Sec. B.5 we only
consider a particular class of linear optimization problems.
The Lagrangian for (B.1) is a function L : Rn × Rm × Rp → R that augments the objective
with a weighted combination of all the constraints,

    L(x, y, s) = f0 (x) + y T h(x) + sT f (x).                                (B.2)
The variables y ∈ Rp and s ∈ Rm+ are called Lagrange multipliers or dual variables. Readers
familiar with the method of penalization for enforcing constraints may recognize the Lagrangian
as something similar; if we fix y and s large enough, then the minimizer of L(x) may approxi-
mately satisfy h(x) = 0 and f (x) ≤ 0. That is just a (very) simplifying interpretation, however.
The main property of the Lagrangian is that it provides a lower bound on feasible solutions
to (B.1); for a feasible x we have y T h(x) = 0 and sT f (x) ≤ 0 (since s ≥ 0), so obviously
L(x, y, s) ≤ f0 (x).

B.3 The dual function and problem


The Lagrange dual function is defined as the minimum of the Lagrangian over x,
    g(y, s) = inf_x L(x, y, s),

and where the Lagrangian is unbounded below in x, we assign the dual function the value −∞.
For fixed x the Lagrangian is an affine function of (y, s), so we can think of g(y, s) as the pointwise
infimum of infinitely many such affine functions (recall our discussion in Sec. 2.2.1 of convex
piecewise-linear functions defined as the maximum of a set of affine functions); thus g(y, s) is a
concave function, even when the problem (B.1) is not convex.
The Lagrange dual problem is then obtained by maximizing g(y, s), i.e., it is defined as

maximize g(y, s)
(B.3)
subject to s ≥ 0,

which is then always a concave maximization problem.

B.4 Weak duality


We already established that
L(x, y, s) ≤ f0 (x),
which holds for any primal feasible x and any (y, s) with s ≥ 0. The Lagrange dual

    g(y, s) = inf_x L(x, y, s)

provides a bound on the optimal value p? = f0 (x? ) by minimizing over x. The best such lower
bound is given by the dual problem (i.e., d? = supy,s g(y, s)), and the relationship

p? ≥ d?

is called the weak duality property, and holds for any kind of optimization problem with a
well-defined Lagrangian function (including, e.g., nonconvex problems, but excluding general
integer problems).

B.5 Strong duality


Strong duality is one of the most important concepts in convex optimization. The problem (B.1)
is generally not convex; for that we require that the objective function f0 (x) and the inequalities
fi (x), i = 1, . . . , m are convex functions, and that the equality constraints are affine functions.
Then a sufficient condition for strong duality is known as Slater’s constraint qualification, which
basically requires the existence of a strictly feasible point for either the primal or the dual
problem.
In this section we only show strong duality for the linear optimization problem (2.7). Since
all linear optimization problems can be rewritten in this form, this incurs no loss of generality. For linear
optimization strong duality is traditionally proved using ideas from the simplex algorithm; our
proof is based on a more recent approach using separating hyperplanes and Farkas’ lemma,
which are also important for the discussion of infeasibility in Sec. 2.4.
To prove strong duality we need two intermediate results. The first (which we state without
proof) is a special case of a more general separating hyperplane theorem.

Theorem B.1 (Separating hyperplane theorem). Let S be a closed convex set, and b ∉ S.
Then there exists a strictly separating hyperplane a such that

    aT b > aT x,   ∀x ∈ S.

Figure B.1: Separating hyperplane for a closed convex set S and a point b ∉ S.

The geometric interpretation of the (point) separating hyperplane theorem is given in
Fig. B.1, and indicates that a constructive proof follows from projecting b onto S. We next
show an important result, which makes use of the separating hyperplane theorem.

Lemma B.1 (Farkas’ lemma). Given A and b, exactly one of the two propositions is true:

1. ∃x : Ax = b, x ≥ 0,

2. ∃y : AT y ≤ 0, bT y > 0.

Proof. Both cannot be true, because then

bT y = xT AT y ≤ 0,

which is a contradiction. Next assume that 1. is not true, i.e., b ∉ S = {Ax | x ≥ 0}. The set S
is closed and convex, so there exists a separating hyperplane y (separating b and S) such that

y T b > y T Ax, ∀x ≥ 0

But this implies that bT y > 0 (take x = 0) and AT y ≤ 0 (otherwise y T Ax would be unbounded above over x ≥ 0).

We can now prove strong duality.

Proof. Assume that the optimal value of the dual problem (2.8) is attained (i.e., d? := bT y ? < ∞).
Since s? = c − AT y ? ≥ 0 there can then be no z satisfying

bT z > 0, AT z ≤ 0, (B.4)

because otherwise
bT (y ? + z) > d? , c − AT (y ? + z) ≥ 0
contradicting the optimality assumption on y ? . From Farkas’ lemma, (B.4) is infeasible if and
only if there exists x such that
Ax = b, x ≥ 0.
For that particular choice of x we have

    cT x = xT AT y ? = bT y ? ,

i.e., p? = d? . On the other hand, assume that −∞ < p? < ∞. From weak duality we then have
d? < ∞, and from Farkas’ lemma d? > −∞, i.e., the dual optimal value is attained. In other
words, if either the primal or dual optimal values are attained, we have p? = d? .

Farkas’ lemma is an example of a theorem of alternatives, where exactly one of two systems
has a solution, and we can also formulate a dual variant.

Lemma B.2 (Dual variant of Farkas’ lemma). Given A and c, exactly one of the two proposi-
tions is true:

1. ∃x : Ax = 0, x ≥ 0, cT x < 0,

2. ∃y : c − AT y ≥ 0.

Proof. Both cannot be true, because then

cT x = xT (c − AT y) ≥ 0.

Next assume that 2. is not true, i.e., ∀y : c < AT y. But then for x ≥ 0 we have

cT x < xT AT y, ∀y

implying that cT x < 0 and Ax = 0.

Bibliography

[AG03] F. Alizadeh and D. Goldfarb. Second-order cone programming. Math. Programming,


95(1):3–51, 2003.
[ART03] E. D. Andersen, C. Roos, and T. Terlaky. On implementing a primal-dual interior-
point method for conic quadratic optimization. Math. Programming, 95(2), February
2003.
[BKVH04] S. Boyd, S.J. Kim, L. Vandenberghe, and A. Hassibi. A Tutorial on Geometric
Programming. Technical report, ISL, Electrical Engineering Department, Stanford
University, Stanford, CA, 2004. Available at http://www.stanford.edu/~boyd/
gp_tutorial.html.
[BN01] A. Ben-Tal and A. Nemirovski. Lectures on Modern Convex Optimization: Analysis,
Algorithms, and Engineering Applications. MPS/SIAM Series on Optimization.
SIAM, 2001.
[BT97] D. Bertsimas and J. N. Tsitsiklis. Introduction to linear optimization. Athena
Scientific, 1997.
[BV04] S. Boyd and L. Vandenberghe. Convex optimization. Cambridge University Press,
2004. http://www.stanford.edu/~boyd/cvxbook/.
[Chv83] V. Chvátal. Linear programming. W.H. Freeman and Company, 1983.
[FLT02] M. Fukushima, Z.Q. Luo, and P. Tseng. Smoothing Functions for Second-Order-
Cone Complementarity Problems. SIAM J. Optimization, 12(2):436–460, 2002.
[Hac03] Y. Hachez. Convex optimization over non-negative polynomials: structured algo-
rithms and applications. PhD thesis, Université Catholique de Louvain, 2003.
[LR05] M. Laurent and F. Rendl. Semidefinite programming and integer programming.
Handbooks in Operations Research and Management Science, 12:393–514, 2005.
[LVBL98] M. S. Lobo, L. Vandenberghe, S. Boyd, and H. Lebret. Applications of second-order
cone programming. Linear Algebra Appl., 284:193–228, November 1998.
[Nes99] Yu. Nesterov. Squared functional systems and optimization problems. In H. Frenk,
K. Roos, T. Terlaky, and S. Zhang, editors, High Performance Optimization. Kluwer
Academic Publishers, 1999.
[NW88] G. L. Nemhauser and L. A. Wolsey. Integer programming and combinatorial opti-
mization. John Wiley and Sons, New York, 1988.
[NW06] J. Nocedal and S. Wright. Numerical optimization. Springer Science, 2nd edition,
2006.

[PS98] C. H. Papadimitriou and K. Steiglitz. Combinatorial optimization: algorithms and
complexity. Dover publications, 1998.

[VAN10] J. P. Vielma, S. Ahmed, and G. Nemhauser. Mixed-integer models for nonsepara-


ble piecewise-linear optimization: unifying framework and extensions. Operations
research, 58(2):303–315, 2010.

[VB96] L. Vandenberghe and S. Boyd. Semidefinite programming. SIAM Rev., 38(1):49–95,


March 1996.

[Wil93] H. P. Williams. Model building in mathematical programming. John Wiley and Sons,
3rd edition, 1993.

[Zie82] H. Ziegler. Solving certain singly constrained convex optimization problems in pro-
duction planning. Operations Research Letters, 1(6), 1982.
