

EE523 Convex Optimization April 2, 2019

KAIST, Spring 2019

Lecture 10: Second-Order Cone Program

Recap
So far we have studied three instances of convex optimization problems: LP, Least-Squares and
QP. Last time we studied QP, investigating a special case of QP in which there is a closed-form
solution: the equality-constrained least-squares. In particular I emphasized the KKT equations
and KKT matrix that appear in the closed-form solution. I also mentioned that the KKT
equations are part of the KKT conditions that we will study in depth in Part II.

Today’s lecture
Today we are going to study a follow-up instance that includes LP, Least-Squares and QP
as special cases: the Second-Order Cone Program (SOCP). Specifically, we are going to cover three
things. First, we will study what SOCP is and verify that it is indeed a convex
optimization problem. Next, we will demonstrate that it subsumes LP and QP. Finally, we will
discuss some applications in which SOCP plays a role, so that one can appreciate the power
of this problem class.

What is Second-Order Cone Program (SOCP)?


The standard form of Second-Order Cone Program (SOCP) is as follows:
$$
\begin{aligned}
\min_x \quad & w^T x : \\
& \|A_i x - b_i\| \le c_i^T x + e_i, \quad i = 1, \dots, m, \qquad (1)\\
& F x = g
\end{aligned}
$$

where $A_i \in \mathbb{R}^{m_i \times d}$ and $F \in \mathbb{R}^{p \times d}$. The complicated-looking inequality constraint is
one that you have not seen thus far. Let us first verify that the problem is a convex
problem. To this end, we need to show that the following function is convex (why?):
$$\|A_i x - b_i\| - c_i^T x - e_i.$$
Notice that the latter term $-c_i^T x - e_i$ in the above is affine, and the argument of the
Euclidean norm is also affine. Since convexity is preserved under addition and affine composition, it
suffices to show that $\|x\|$ is convex. In the one-dimensional case, $\|x\| = |x|$ is "V"-shaped,
so it looks like a convex function. It turns out this is indeed the case. Please prove it rigorously
in PS.
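
To make this concrete, here is a minimal numerical sketch of the standard form (1) using the cvxpy modeling package (this is my own illustration, not part of the lecture). All data below are made-up placeholders, generated around a point x0 only so that the toy instance is guaranteed to be feasible.

import cvxpy as cp
import numpy as np

rng = np.random.default_rng(0)
d, m, p = 5, 3, 2
x0 = rng.standard_normal(d)                       # a point we force to be feasible

w = rng.standard_normal(d)
A_list = [rng.standard_normal((4, d)) for _ in range(m)]
b_list = [A @ x0 for A in A_list]                 # so ||A_i x0 - b_i|| = 0
c_list = [rng.standard_normal(d) for _ in range(m)]
e_list = [1.0 - c @ x0 for c in c_list]           # so c_i^T x0 + e_i = 1 > 0
F = rng.standard_normal((p, d))
g = F @ x0                                        # so F x0 = g

x = cp.Variable(d)
constraints = [cp.norm(A_list[i] @ x - b_list[i]) <= c_list[i] @ x + e_list[i]
               for i in range(m)]
constraints += [F @ x == g]
prob = cp.Problem(cp.Minimize(w @ x), constraints)
prob.solve()
print(prob.status, prob.value)                    # e.g. 'optimal' and the optimal value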

Why do we call it SOCP?
As you may guess, the naming comes from this new type of inequality constraint:
$$\|A_i x - b_i\| \le c_i^T x + e_i. \qquad (2)$$
To see the rationale behind the naming, let us consider a very simple setting which gives a hint:
$A_i = I$, $b_i = 0$, $c_i = 0$, $e_i = t$. In this case, the constraint simplifies to $\|x\| \le t$. Now consider
the set of $(x, t)$ which satisfies this constraint:
$$C := \{(x, t) : \|x\| \le t\}. \qquad (3)$$

Take a look at the shape of the set, illustrated in Fig. 1. It looks like an ice-cream cone. Also the
norm that appears in the set is the Euclidean norm, which is the $\ell_2$ norm. Hence, it is called the
second-order cone (SOC). Other names include the quadratic cone, the ice-cream cone, and the Lorentz cone.

Figure 1: Picture of the second-order cone $\{(x, t) : \|x\| \le t\}$.

Since the constraint $\|x\| \le t$ in (3) is only a special case of the original constraint (2), you may still wonder why
problem (1) is called an SOCP. Here a key observation is that the original constraint (2) simply requires
an affine transformation of $x$ to lie in the SOC:
$$(A_i x - b_i,\; c_i^T x + e_i) \in C.$$
And convexity is preserved under affine transformation. Hence, one can view the original
constraint (2) as a SOC constraint up to an affine transformation (which does not alter convexity).
So we can interpret problem (1) as a program with SOC constraints, which is simply
called SOCP.
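
As a side note, this cone-membership view is exactly how modeling tools let you write the constraint. The hedged snippet below (cvxpy again, with placeholder data A_i, b_i, c_i, e_i of my own choosing) writes the same constraint (2) once with a norm and once as an explicit SOC membership; the two encode the same set.

import cvxpy as cp
import numpy as np

rng = np.random.default_rng(1)
d, mi = 5, 4
A_i, b_i = rng.standard_normal((mi, d)), rng.standard_normal(mi)
c_i, e_i = rng.standard_normal(d), 1.0

x = cp.Variable(d)
con_norm = cp.norm(A_i @ x - b_i) <= c_i @ x + e_i   # ||A_i x - b_i|| <= c_i^T x + e_i
con_cone = cp.SOC(c_i @ x + e_i, A_i @ x - b_i)      # (A_i x - b_i, c_i^T x + e_i) in C
# Either constraint can be passed to cp.Problem; solvers treat them identically.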

Subsumes QP: QP → SOCP


Now let us show that problem (1) includes LP and QP as special cases. One can immediately
see the inclusion of LP by setting
$$A_i = 0 \quad \forall i = 1, \dots, m$$
in the original problem; each inequality constraint then reduces to the linear constraint $\|b_i\| \le c_i^T x + e_i$.
The proof of the inclusion of QP is slightly involved. To this end, we will show that QP can be
cast into the form of SOCP. So let us start with the standard form of QP:
$$
\begin{aligned}
\min_x \quad & w^T x + x^T Q x : \\
& A x - b \le 0 \qquad (4)\\
& C x - e = 0
\end{aligned}
$$

where $Q \in \mathbb{R}^{d \times d} \succeq 0$, $A \in \mathbb{R}^{m \times d}$ and $C \in \mathbb{R}^{p \times d}$. Here, what is annoying is the quadratic term
$x^T Q x$ that appears in the objective function. In an effort to translate this term into an affine
one, let us first manipulate the matrix $Q$ via the singular value decomposition (SVD). (If you are not
familiar with the SVD, please take a look at the appendix of the linear algebra book uploaded
on the course website: "Append SVD others.pdf".) Since $Q$ is
symmetric, one can always apply the SVD to get:
$$Q = U \Lambda U^T$$
where $U \in \mathbb{R}^{d \times d}$ is a unitary matrix (i.e., $U^T U = I$) and $\Lambda$ is a diagonal matrix: $\Lambda = \mathrm{diag}(\lambda_1, \dots, \lambda_d)$, where $\lambda_i$ indicates the $i$th eigenvalue of $Q$. Now we define $y := Q^{1/2} x$ where
$$Q^{1/2} := U \Lambda^{1/2} U^T$$
and $\Lambda^{1/2} := \mathrm{diag}(\sqrt{\lambda_1}, \dots, \sqrt{\lambda_d})$. This yields $y^T y = x^T Q x$. Applying this to (4), we then
get:

$$
\begin{aligned}
\min_{x,\,y} \quad & w^T x + y^T y : \\
& A x - b \le 0 \\
& C x - e = 0 \qquad (5)\\
& y = Q^{1/2} x.
\end{aligned}
$$
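
As a quick sanity check of the matrix square root used above, the following numpy sketch (my own illustration, not from the lecture) computes $Q^{1/2}$ from the spectral decomposition and verifies that $y^T y = x^T Q x$ for a random $x$.

import numpy as np

rng = np.random.default_rng(0)
d = 4
M = rng.standard_normal((d, d))
Q = M @ M.T                                  # symmetric and PSD, as required

lam, U = np.linalg.eigh(Q)                   # Q = U diag(lam) U^T for symmetric Q
Q_half = U @ np.diag(np.sqrt(np.clip(lam, 0, None))) @ U.T   # clip guards tiny negative eigenvalues

x = rng.standard_normal(d)
y = Q_half @ x
print(np.allclose(y @ y, x @ Q @ x))         # True: y^T y = x^T Q x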

While the newly introduced constraint $y = Q^{1/2} x$ is okay as it is affine, the quadratic term $y^T y$
in the objective is still problematic. To translate it into an affine term, we introduce a new
variable, say $t$, such that
$$t \ge y^T y. \qquad (6)$$
Here the key observation is that minimizing $t$ subject to (6) is equivalent to minimizing $y^T y$:
since $t$ is being minimized, the constraint (6) is tight at the optimum, i.e., $t = y^T y$. Hence, we can
replace $y^T y$ in the objective with $t$ while adding the constraint (6), thus obtaining:

$$
\begin{aligned}
\min_{x,\,y,\,t} \quad & w^T x + t : \\
& A x - b \le 0 \\
& C x - e = 0 \qquad (7)\\
& y = Q^{1/2} x \\
& y^T y \le t.
\end{aligned}
$$

Is $y^T y \le t$ a SOC constraint?
Now the question is: Is the newly introduced constraint $y^T y \le t$ a SOC constraint? At first glance, it
does not look like one, as it can be represented as $\|y\| \le \sqrt{t}$, and $\sqrt{t}$ is not affine in $t$. But
it turns out that it is, after some manipulation. To see this, let us first rewrite it as
$4\|y\|^2 \le 4t$. This is then equivalent to:

$$4\|y\|^2 + (t - 1)^2 \le (t + 1)^2,$$
which follows since $(t+1)^2 - (t-1)^2 = 4t$.

Observe that every term in the above has a square exponent. So one can represent it as:
$$\left\| \begin{bmatrix} 2y \\ t - 1 \end{bmatrix} \right\|^2 \le (t + 1)^2.$$
Now dropping the square exponents on both sides (note that the squared inequality forces $t \ge \|y\|^2 \ge 0$, so $t + 1 > 0$) and then using the definition (3) of the SOC, we
see that the pair $\left( \begin{bmatrix} 2y \\ t-1 \end{bmatrix},\, t+1 \right)$ (an affine transformation of the variables) lies in the SOC:
$$\left( \begin{bmatrix} 2y \\ t - 1 \end{bmatrix},\; t + 1 \right) \in C.$$
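
If you want to convince yourself numerically of the equivalence just derived, the following snippet (a spot-check on random points, not a proof) compares the two conditions directly.

import numpy as np

rng = np.random.default_rng(0)
for _ in range(10_000):
    y = rng.standard_normal(3)
    t = 3.0 * rng.standard_normal()
    lhs = y @ y <= t                                                        # constraint y^T y <= t
    rhs = np.linalg.norm(np.concatenate([2 * y, [t - 1.0]])) <= t + 1.0     # SOC form
    assert lhs == rhs
print("the two conditions agreed on every sample")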

Hence, we obtain the following SOCP (from QP):
$$
\begin{aligned}
\min_{x,\,y,\,t} \quad & w^T x + t : \\
& A x - b \le 0 \\
& C x - e = 0 \qquad (8)\\
& y = Q^{1/2} x \quad \text{(affine)} \\
& \left\| \begin{bmatrix} 2y \\ t - 1 \end{bmatrix} \right\| \le t + 1 \quad \text{(SOC)}.
\end{aligned}
$$
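
Putting the pieces together, here is a hedged end-to-end sketch (cvxpy again, with arbitrary toy data of my own making) that solves a small QP directly and via the SOCP reformulation (8); the two optimal values should agree up to solver tolerance.

import cvxpy as cp
import numpy as np

rng = np.random.default_rng(0)
d = 4
M = rng.standard_normal((d, d))
Q = M @ M.T
Q = 0.5 * (Q + Q.T)                                   # enforce exact symmetry
lam, U = np.linalg.eigh(Q)
Q_half = U @ np.diag(np.sqrt(np.clip(lam, 0, None))) @ U.T

w = rng.standard_normal(d)
A = rng.standard_normal((3, d)); b = rng.standard_normal(3) + 5.0   # x = 0 is feasible
C = rng.standard_normal((1, d)); e = np.zeros(1)

# (a) the QP (4) solved directly
x_qp = cp.Variable(d)
qp = cp.Problem(cp.Minimize(w @ x_qp + cp.quad_form(x_qp, Q)),
                [A @ x_qp - b <= 0, C @ x_qp - e == 0])
qp.solve()

# (b) the SOCP reformulation (8)
x, y, t = cp.Variable(d), cp.Variable(d), cp.Variable()
socp = cp.Problem(cp.Minimize(w @ x + t),
                  [A @ x - b <= 0,
                   C @ x - e == 0,
                   y == Q_half @ x,
                   cp.norm(cp.hstack([2 * y, t - 1])) <= t + 1])
socp.solve()
print(qp.value, socp.value)                           # approximately equal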

Applications
Now why do we care about SOCP? The obvious reason is that, like LP and Least-Squares, it has many applications.
In particular, it plays an important role in two specific settings.
The first setting captures very practical scenarios in which there is uncertainty
in the data and/or parameters. For example, in the legitimate-vs-spam email classification problem,
data points can be viewed as random quantities. It turns out that, taking this probabilistic
aspect into account, one can modify the original LP (that we formulated in Lecture 5) into a
SOCP. In fact, such a modified LP is categorized into a broader class of LPs, called Robust LP,
which covers all the probabilistic variations of LPs.
Another example is the least-squares problem with uncertain matrix A. For instance, in the
CT application that we investigated in Lecture 8, the matrix A contains the length information
of a beam trajectory for a small grid. Since the length is a measured quantity, it may contain
some measurement noise, thus yielding some uncertainty. It turns out that, taking this aspect into account,
one can modify the original Least-Squares problem into a SOCP.
The second setting is the case in which optimization problems are formulated with Euclidean
norms. Examples include: (i) distance-minimizing location planning in which one wants to
locate a warehouse so as to serve many service locations while minimizing the transportation
cost, which is usually proportional to the Euclidean distance; (ii) image denoising in which
one wishes to remove the noise effect on the edges of an image while incorporating a sort of
regularization term which involves a Euclidean norm; (iii) penalized Least-Squares in which
one wants to minimize a noise effect while adding a Euclidean-norm-associated term (in the
objective function) which penalizes the noise effect; a small sketch of (iii) is given below.
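
Here is a hedged sketch of how a Euclidean-norm-penalized Least-Squares problem of the kind in (iii) becomes an SOCP: introduce epigraph variables t1, t2 for the two norms, each giving one SOC constraint. The data A, b and the weight lam below are arbitrary placeholders.

import cvxpy as cp
import numpy as np

rng = np.random.default_rng(0)
n, d = 20, 5
A = rng.standard_normal((n, d))
b = rng.standard_normal(n)
lam = 0.5

x = cp.Variable(d)
t1, t2 = cp.Variable(), cp.Variable()
prob = cp.Problem(cp.Minimize(t1 + lam * t2),
                  [cp.norm(A @ x - b) <= t1,        # SOC constraint for the fit term
                   cp.norm(x) <= t2])               # SOC constraint for the penalty term
prob.solve()
print(prob.value)   # equals min ||Ax - b|| + lam * ||x|| for this toy data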
Here we cannot cover all of the above applications in the interest of time. Instead we are
going to cover one of the most important applications in this lecture: Robust LP. Some of the
other applications will be dealt with in PS.

Example of Robust LP: Chance Program (CP)


The application that I would like to emphasize is a prominent example of Robust LP, called
the Chance Program (CP). Just for illustration purposes, let us consider a very simple LP in
which there is only one inequality:

$$\min_x \; w^T x : \quad a^T x \le b. \qquad (9)$$

In the legitimate-vs-spam email classification problem, $a$ here indicates a data
point. As mentioned earlier, such a data point can be viewed as a random quantity. So in this
case, $a$ can be modeled as a random vector.
In an effort to deal with the uncertainty induced by this random vector, one may want to
instead consider a probabilistic constraint, which can be formulated as:
$$\Pr(a^T x \le b) \ge 1 - \epsilon \qquad (10)$$
for some small $\epsilon > 0$. Now the question is: How can we compute $\Pr(a^T x \le b)$?

Gaussian approximation for $\Pr(a^T x \le b)$


Actually the exact computation is almost impossible in reality, as we have no idea of the probability
distribution that the vector $a$ is subject to. One way to go is to instead approximate the
computation assuming that the vector $a$ follows a well-known distribution for which the probability
calculation is tractable. One such well-known distribution is the Gaussian distribution.
In fact, the Gaussian distribution is not only computationally tractable, but it also represents
many practical settings well, including the legitimate-vs-spam email classification problem.
So here we will use the Gaussian distribution to approximate the probability computation.
Specifically, assume that $a$ follows the Gaussian distribution:

$$a \sim \mathcal{N}(\bar{a}, K) \qquad (11)$$

where $\bar{a}$ indicates the mean $\mathbb{E}[a]$ and $K$ denotes the covariance matrix of $a$, defined as $K :=
\mathbb{E}[(a - \bar{a})(a - \bar{a})^T]$. Here the symbol "$\sim$" means "is distributed according to", and $\mathcal{N}$ denotes
the Gaussian distribution.
Now consider the linear combination of $a$ of our interest, $a^T x$. Under the Gaussian
assumption (11), $a^T x$ is also Gaussian:
$$a^T x \sim \mathcal{N}(\bar{a}^T x,\; x^T K x).$$

Why? Check in PS. Using this, we can then compute:
$$\Pr\!\left(a^T x \le b\right) = \Pr\!\left( \frac{a^T x - \bar{a}^T x}{\sqrt{x^T K x}} \le \frac{b - \bar{a}^T x}{\sqrt{x^T K x}} \right) = \Phi\!\left( \frac{b - \bar{a}^T x}{\sqrt{x^T K x}} \right) \qquad (12)$$
where $\Phi(\cdot)$ indicates the cumulative distribution function (CDF) of the standard Gaussian
distribution (with zero mean and variance 1): $\Phi(x) := \Pr(t \le x)$ where $t \sim \mathcal{N}(0, 1)$; also see
Fig. 2.
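
As a hedged numerical check of (12) (my own illustration, with arbitrary data): draw many samples of $a \sim \mathcal{N}(\bar{a}, K)$, estimate $\Pr(a^T x \le b)$ by Monte Carlo, and compare with the closed form.

import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
d = 3
abar = rng.standard_normal(d)
M = rng.standard_normal((d, d))
K = M @ M.T + 0.1 * np.eye(d)                 # a valid covariance matrix
x = rng.standard_normal(d)
b = 1.0

a_samples = rng.multivariate_normal(abar, K, size=200_000)
p_mc = np.mean(a_samples @ x <= b)            # Monte Carlo estimate of Pr(a^T x <= b)
p_cf = norm.cdf((b - abar @ x) / np.sqrt(x @ K @ x))   # closed form from (12)
print(p_mc, p_cf)                             # the two should agree to ~3 decimals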
Applying (12) to (10), we get:
$$\Phi\!\left( \frac{b - \bar{a}^T x}{\sqrt{x^T K x}} \right) \ge 1 - \epsilon.$$
Since $\Phi(\cdot)$ is a strictly increasing (hence invertible) function (again see Fig. 2), we can invert it
to get:
$$\frac{b - \bar{a}^T x}{\sqrt{x^T K x}} \ge \Phi^{-1}(1 - \epsilon),$$
which in turn yields:
$$\Phi^{-1}(1 - \epsilon)\, \sqrt{x^T K x} \le b - \bar{a}^T x. \qquad (13)$$

CP → SOCP


Figure 2: Cumulative distribution function (CDF) $\Phi(x)$ of the standard Gaussian distribution $\mathcal{N}(0, 1)$.

Applying (13) to (9), we obtain:
$$\min_x \; w^T x : \quad \Phi^{-1}(1 - \epsilon)\, \sqrt{x^T K x} \le b - \bar{a}^T x. \qquad (14)$$
Now the question is: Is the constraint above a SOC? To figure this out, let us simplify
the constraint by letting $y := K^{1/2} x$. Assuming $\epsilon < 1/2$ so that $\Phi^{-1}(1 - \epsilon) > 0$, we then get:
$$\|y\| \le \frac{b - \bar{a}^T x}{\Phi^{-1}(1 - \epsilon)}.$$
Notice that this says an affine transformation of the optimization variables lies in the SOC:
$$\left( y,\; \frac{b - \bar{a}^T x}{\Phi^{-1}(1 - \epsilon)} \right) \in C.$$

Hence, we get the following SOCP (from CP):
$$
\begin{aligned}
\min_{x,\,y} \quad & w^T x : \\
& y = K^{1/2} x \quad \text{(affine)} \qquad (15)\\
& \|y\| \le \frac{b - \bar{a}^T x}{\Phi^{-1}(1 - \epsilon)} \quad \text{(SOC)}.
\end{aligned}
$$
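
Finally, a hedged cvxpy sketch of the chance program posed as the SOCP (15). The data (w, ā, K, b, ε) are made up, $\Phi^{-1}$ is taken from scipy.stats.norm.ppf, and the extra bound $\|x\| \le 10$ is added only to keep this toy instance bounded (it is not part of (15)).

import cvxpy as cp
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
d = 3
w = rng.standard_normal(d)
abar = rng.standard_normal(d)
M = rng.standard_normal((d, d))
K = M @ M.T + 0.1 * np.eye(d)
lam, U = np.linalg.eigh(K)
K_half = U @ np.diag(np.sqrt(lam)) @ U.T       # K^{1/2}, valid since K is positive definite
b, eps = 1.0, 0.05
phi_inv = norm.ppf(1 - eps)                    # Phi^{-1}(1 - eps) > 0 since eps < 1/2

x, y = cp.Variable(d), cp.Variable(d)
prob = cp.Problem(cp.Minimize(w @ x),
                  [y == K_half @ x,
                   cp.norm(y) <= (b - abar @ x) / phi_inv,
                   cp.norm(x) <= 10])          # only to keep the toy problem bounded
prob.solve()

# Sanity check: the chance constraint Pr(a^T x <= b) >= 1 - eps holds at the solution
x_opt = x.value
p = norm.cdf((b - abar @ x_opt) / np.sqrt(x_opt @ K @ x_opt))
print(prob.value, p >= 1 - eps)                # p should be at least 0.95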

How to solve SOCP?


Like QP, there is no closed-form solution for SOCP in general. So, as mentioned earlier, we
should rely on strong duality to gain insights into algorithms. Hence, we will study this in
depth in Part II.

Look ahead
So far, we have studied four instances of convex optimization problems: LP, Least-Squares, QP
and SOCP. Next time, we will study the final (from this course's point of view) instance that
subsumes all of the prior problems as special cases: the Semi-Definite Program (SDP).

Figure 3: Hierarchy of convex optimization problems: Least-Squares (1800s), Linear Program (LP, 1939), Quadratic Program (QP, 1956), Second-Order Cone Program (SOCP, 1994), Semi-Definite Program (SDP, 1994), ...
