Practical Random Projections for Linear Programming
Leo Liberti
LIX CNRS, Ecole Polytechnique, Institut Polytechnique de Paris, 91128 Palaiseau, France
Benedetto Manca
Department of Mathematics and Informatics, University of Cagliari, Italy
Pierre-Louis Poirion
RIKEN Center for Advanced Intelligence Project, Tokyo, Japan
Abstract
The use of random projections in mathematical programming allows standard solution algorithms
to solve instances of much larger sizes, at least approximately. Approximation results have been
derived in the relevant literature for many specific problems, as well as for several mathematical
programming subclasses. Despite the theoretical developments, it is not always clear that random
projections are actually useful in solving mathematical programs in practice. In this paper we
provide a computational assessment of the application of random projections to linear programming.
Funding Benedetto Manca: Partly supported by grant STAGE, Fondazione Sardegna 2018.
1 Introduction
This paper is about applying Random Projections (RP) to Linear Programming (LP)
formulations. RPs are dimensional reduction operators that usually apply to data. The
point of applying RPs to LPs is to obtain an approximate solution of the high-dimensional
formulation by solving a related lower-dimensional one. The main goal of this paper is to
discuss the pros and cons of this technique from a computational (practical) point of view.
The Johnson-Lindenstrauss Lemma (JLL) [13] directly applies to all problems involving the Euclidean distance between points
in a Euclidean space of high dimension, e.g. the design of an efficient nearest-neighbor data
structure (i.e. given X ⊂ Rm and q ∈ Rm quickly return x ∈ X closest to q) [12].
More in general, the JLL shows that RPs can transform the point set X to a lower
dimensional set T X such that X and T X are “approximately congruent”: the pairwise
distances in X are approximately the same (multiplicatively) as the corresponding pairwise
distances in T X, even if X has m dimensions and T X only k (proportional to ϵ^{−2} ln |X|).
Since “approximate congruence” means “almost the same, aside from translations, rotations,
and reflections”, it is reasonable to hope that RPs might apply to other constructs than just
sets of points, and still deliver a theoretically quantifiable approximation. In this paper we
consider LP.
These issues pose nontrivial theoretical challenges, and the proof techniques vary consid-
erably depending on the MP subclass being considered. The first issue mentioned above is
addressed by applying RPs to the problem parameters (the input data); in the LP case, we
project the linear system Ax = b. We speak of the original formulation P and the projected
formulation T P . This yields a fourth issue: the solution of T P may be infeasible in P : in
such cases, a solution retrieval phase is necessary in order to construct a feasible solution of
P from that of T P .
The second and third issues are addressed in [25], leading to statements similar to the
JLL, but concerning approximate LP feasibility and optimality. If E(P, T ) is a statement
about the feasibility or optimality error between the LP formulations P and T P , the general
structure of these results is similar to the probabilistic version of the JLL:
Prob(E(P, T)) ≥ 1 − δ,        (2)
where δ usually depends on ϵ, k and possibly even the solution of P . We shall recall the
statements of these results more precisely in Sect. 2.
LP structure. Other works applying RPs to different MP subclasses include [24] (quadratic programs with a ball constraint), [4] (general quadratic programs), and [18] (conic programs including second-order cone and semidefinite programs).
where δ = 4n e^{−C(ϵ^2 − ϵ^3)k}, ϵ = O(γ/(θ^2 ∥y*∥_2)), and y* is an optimal dual solution of (LP).
Like other approximate optimality results in this field, some quantities in the probabilistic
statement depend on the norm of a dual optimal solution. This adds a further difficulty to
computational evaluations, since they cannot be computed prior to solving the problem.
Let x̄ be a projected solution, i.e. an optimal solution of the projected formulation. In [25, Prop. 3], it is proved that x̄ is feasible in the original formulation with probability zero. We therefore need to provide a solution retrieval method. A couple of methods were proposed in [25], but the one found in [18, Eq. (6)] comes with an approximation guarantee and good practical performance. The retrieved solution x̃ is defined as the projection of x̄ onto the affine subspace Ax = b, computed using the pseudoinverse:
x̃ = x̄ + A^+(b − Ax̄),
where A^+ denotes the Moore-Penrose pseudoinverse of A.
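As an illustration, here is a minimal sketch of this retrieval step in Python/numpy, assuming a dense constraint matrix A; the function name is ours:

```python
import numpy as np

def retrieve(A, b, x_bar):
    """Project the projected-LP solution x_bar onto the affine subspace
    {x : Ax = b} of the original LP: x_tilde = x_bar + A^+ (b - A x_bar)."""
    r = b - A @ x_bar                          # residual of the original equations
    # np.linalg.lstsq returns the minimum-norm solution of A y = r, i.e. A^+ r,
    # without forming the pseudoinverse explicitly
    y, *_ = np.linalg.lstsq(A, r, rcond=None)
    return x_bar + y
```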
The fact that we only project on Ax = b without enforcing x ≥ 0 is necessary, since otherwise
we would need to solve the whole high-dimensional LP. On the other hand, it causes potential
infeasibility errors w.r.t. x ≥ 0. A probabilistic bound on this error is cast in general terms
for conic programs in [25]. Let κ(A) be the condition number of A; applying [25, Thm. 4.4] to
LP, we obtain the following result, which bounds the (negativity of the) smallest component
of x̃ in terms of that of x̄.
The proof is based on an improvement of [18, Eq. (7)], obtained by computing the Gaussian
width and diameter of {x ≥ 0 | ⟨1, x⟩ ≤ 1}. As a corollary, we also have the following
result about the difference between objective function values of the retrieved and projected
solutions.
▶ Corollary 2. Let f̃ be the objective function value of the retrieved solution x̃, and f̄ be the optimal objective function value of the projected formulation. There is a universal constant C_2 such that, for any u ≥ 0, we have:
Prob( |f̃ − f̄| ≤ ϵ θ κ(A) ∥c∥_2 (C_2 + u √(2/ln(n))) ) ≥ 1 − 2e^{−u^2}.
All componentwise sampled sub-Gaussian distributions [7] can be used to ensure the results
cited in this paper. Some sparse variants also exist, along the lines of [1, 15]. We use the
sparse RPs described in [4, §5.1]: for a given density σ ∈ (0, 1), with probability σ we sample each component of the k × m RP T from the distribution N(0, 1/(kσ)) (which has standard deviation √(1/(kσ))), and set it to zero with probability 1 − σ. In our computational study, we set
σ = dA /2, where dA is the density of the constraint matrix A.
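A minimal sketch of this sampling scheme in Python/numpy/scipy (function name and defaults are ours):

```python
import numpy as np
import scipy.sparse as sp

def sample_sparse_rp(k, m, sigma, rng=None):
    """Sample a k x m sparse RP: each entry is N(0, 1/(k*sigma)) with
    probability sigma and zero with probability 1 - sigma."""
    rng = rng or np.random.default_rng()
    mask = rng.random((k, m)) < sigma                        # nonzero pattern
    vals = rng.normal(0.0, np.sqrt(1.0 / (k * sigma)), (k, m))
    return sp.csr_matrix(np.where(mask, vals, 0.0))

# as in the computational study described above: sigma = d_A / 2
# T = sample_sparse_rp(k, m, sigma=d_A / 2)
```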
3.2 LP structures
We consider randomly generated LPs of the following four classes: Max Flow problems [8],
Diet problems [6], Quantile Regression problems [16], and Basis Pursuit problems from
sparse coding [3]. This choice yields a set of LP problems going from extremely sparse (Max
Flow) to completely dense (Basis Pursuit), with the Diet and Quantile Regression
providing cases of various intermediate densities. These four test cases arise from a diverse
range of application settings: combinatorial optimization, continuous optimization, statistics,
data science.
The Max Flow formulation is defined on a weighted digraph G = (N, A, u) with a source
node s ∈ N , a target node t ∈ N (with s ̸= t) and u : A → R+ , as follows:
                     max_{x ∈ R^{|A|}_+}   Σ_{i∈N∖{s}, (s,i)∈A} x_{si} − Σ_{i∈N∖{s}, (i,s)∈A} x_{is}
∀i ∈ N∖{s,t}         Σ_{j∈N, (i,j)∈A} x_{ij} = Σ_{j∈N, (j,i)∈A} x_{ji}                        (MF)
∀(i,j) ∈ A           0 ≤ x_{ij} ≤ u_{ij}.
We generate random weighted digraphs G = (N, A, u) with the property that a single
(randomly chosen) node s is connected (through paths) to all of the other nodes: we first
generate a random tree on N ∖ {t}, orient it so that s is the root, add a node t with the same
indegree as the outdegree of s, and then proceed to enrich this digraph with arcs generated at
random using the Erdős-Rényi model with probability 0.05. We then generate the capacities
u uniformly from [0, 1]. Finally, we compute the digraph’s incidence matrix A, which has
m = |N | − 2 rows and |A| columns. Instances are feasible because the graph always has a
path from s to t by construction, and the zero flow is always feasible.
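A sketch of one way to implement this generator in Python/numpy (no graph library; node labelling and other details are our own choices):

```python
import numpy as np

def random_maxflow_instance(n_nodes, p=0.05, rng=None):
    """Random weighted digraph with source s = 0 and target t = n_nodes - 1:
    a random tree on N ∖ {t} oriented away from s, arcs from the children
    of s to t, extra Erdos-Renyi arcs with probability p, capacities in [0,1]."""
    rng = rng or np.random.default_rng()
    s, t = 0, n_nodes - 1
    arcs = set()
    for v in range(1, n_nodes - 1):            # random tree on N ∖ {t}, rooted at s
        arcs.add((int(rng.integers(0, v)), v))
    for (u, v) in list(arcs):                  # t gets the same indegree as outdeg(s)
        if u == s:
            arcs.add((v, t))
    for u in range(n_nodes):                   # Erdos-Renyi enrichment
        for v in range(n_nodes):
            if u != v and rng.random() < p:
                arcs.add((u, v))
    arcs = sorted(arcs)
    cap = rng.random(len(arcs))                # capacities u ~ U[0,1]
    A = np.zeros((n_nodes - 2, len(arcs)))     # flow matrix (rows: nodes != s, t)
    for col, (u, v) in enumerate(arcs):
        if u not in (s, t):
            A[u - 1, col] = 1.0                # arc leaves node u
        if v not in (s, t):
            A[v - 1, col] = -1.0               # arc enters node v
    return arcs, cap, A
```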
Although (MF) is an LP, it is not in standard form, because of the upper bounding
constraints x ≤ u. But, by [25, §4.2], we can devise a block-structured RP matrix that only
projects the equations Ax = b, leaving the inequalities x ≤ u alone. In this case, A is a flow
matrix with two nonzeros per column, one set to 1 the other to −1, aside from columns
referring to source and target nodes s, t that only have one nonzero; and b = 0. The density
of A is d_A = (2|A| − 2)/((m − 2)|A|) ≈ 2/m.
For our random (MF) instances, θ = |A| is a valid upper bound to Σ_{(i,j)∈A} x*_{ij}, since all capacities u_{ij} lie in [0, 1].
The Diet Problem formulation [6] is:

min_{q ∈ R^n_+}   c^⊤ q
                  Dq ≥ b.                                                                      (DP)
We sample c, D, b uniformly componentwise in [0, 1], and set the density of D to dD = 0.5.
Instances are feasible because one can always buy enough food to satisfy all nutrient
requirements. If ∥D_i∥_0 denotes the number of nonzeros of row D_i, then the point q̂ with components q̂_j = max_{i≤m} b_i/(∥D_i∥_0 D_{ij}) (for j ≤ n, maximum over rows i with D_{ij} > 0) is a feasible solution.
Again, (DP) is not in standard form, but the transformation is immediate using slack
variables ri ≥ 0 for i ≤ m. We let A = (D | −I), where I is m × m. The decision variable
vector is x = (q, r). The density of A is dA = (dD mn + m)/(m(n + m)) = (dD n + 1)/(n + m).
For (DP), the upper bounding solution q̂ yields slack values r̂_i = D_i q̂ − b_i for all i ≤ m, where D_i is the i-th row of D. So we let θ = Σ_j q̂_j + Σ_i r̂_i be an upper bound for Σ_j x*_j.
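A sketch of the instance generator and of the computation of q̂ and θ just described, in Python/numpy (names ours):

```python
import numpy as np

def random_diet_instance(m, n, d_D=0.5, rng=None):
    """Sample c, D, b componentwise from U[0,1], with D of density d_D."""
    rng = rng or np.random.default_rng()
    c, b = rng.random(n), rng.random(m)
    D = rng.random((m, n)) * (rng.random((m, n)) < d_D)
    return c, D, b

def diet_theta(D, b):
    """Feasible point q_hat (componentwise max of b_i / (|D_i|_0 D_ij) over
    rows i with D_ij > 0) and the bound theta = sum_j q_hat_j + sum_i r_hat_i."""
    nnz = np.count_nonzero(D, axis=1)                 # |D_i|_0 for each row i
    with np.errstate(divide="ignore"):
        ratios = np.where(D > 0, b[:, None] / (nnz[:, None] * D), 0.0)
    q_hat = ratios.max(axis=0)
    r_hat = D @ q_hat - b                             # nonnegative slacks
    return q_hat.sum() + r_hat.sum()
```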
The Quantile Regression problem [16] fits a linear model β to the data b, D so that errors from the τ-quantile are minimized. Instances may only have nonzero optimal value if m > p, as is clear from the constraints of the formulation below:
min_{β ∈ R^p, u^+, u^- ∈ R^m_+}   τ 1^⊤ u^+ + (1 − τ) 1^⊤ u^-
                                  Dβ + I u^+ − I u^- = b,                                      (QR)
where the constraint system Ax = b has A = (D | I | −I), x = (β, u^+, u^-), and τ (the quantile
level) is given, and fixed at 0.2 in our experiments. The data matrix (D, b) is sampled
uniformly componentwise from [−1, 1], with dD = 0.8. Instances are all feasible because
the problem reduces to solving the overconstrained linear system Dβ = b with a “skewed”
version of an ℓ1 error function.
We note that (QR) is not in standard form, since the components of β are unconstrained;
but this is not an issue, insofar as the problem is bounded (since it is feasible and it minimizes
a weighted sum of non-negative variables), and this is enough to have the results in [25] hold.
On the contrary, the lack of non-negativity constraints on β is an advantage, since we need not worry about negativity errors in the β components of the retrieved solution (Prop. 1). The density of A is d_A = (d_D mp + 2m)/(mp + 2m^2) = (d_D p + 2)/(p + 2m).
For (QR), given that all data is sampled uniformly from [−1, 1], no optimum can ever
have |βj | > 1. As for u+ , u− , we note that any feasible β yields an upper bound to the
optimal objective function value, which only depends on u+ , u− : we can therefore choose
β = 0, and obtain u^+_i − u^-_i = b_i for all i ≤ m; we then let u^+_i = b_i ∧ u^-_i = 0 if b_i > 0, and u^+_i = 0 ∧ u^-_i = −b_i otherwise. This yields an upper bound estimate θ = p + Σ_i |b_i| to Σ_j x*_j.
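The corresponding sketch for (QR), sampling the data and computing θ = p + Σ_i |b_i| (Python/numpy, names ours):

```python
import numpy as np

def random_qr_instance(m, p, d_D=0.8, rng=None):
    """Sample the data (D, b) componentwise from U[-1,1], with D of density d_D."""
    rng = rng or np.random.default_rng()
    D = rng.uniform(-1.0, 1.0, (m, p)) * (rng.random((m, p)) < d_D)
    b = rng.uniform(-1.0, 1.0, m)
    return D, b

def qr_theta(D, b):
    # beta = 0 gives u+ - u- = b, split into positive and negative parts,
    # so sum(u+) + sum(u-) = sum_i |b_i|; the p accounts for |beta_j| <= 1
    return D.shape[1] + np.abs(b).sum()
```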
The Basis Pursuit formulation [3] is:

min_{x, s ∈ R^n}   1^⊤ s
                   Ax = b                                                                      (BP)
∀j ≤ n             −s_j ≤ x_j ≤ s_j.
According to sparse coding theory [5], we work with a fully dense m × n matrix A sampled componentwise from N(0, 1) (with density d_A = 1), a random message obtained as z/Z from a sparse z ∈ (ℤ ∩ [−Z, Z])^n (with density 0.2) and Z = 10, and compute the encoded message b = Az. We then solve (BP) in order to recover the sparsest solution of the underconstrained system Ax = b, which should provide an approximation of z. Basis Pursuit problems undergo a phase transition as m decreases from n down to zero [2], so it should not really make sense to decrease m by using RPs; and yet some mileage can unexpectedly be extracted from this operation [17].
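A sketch of the (BP) instance generator as we read the description above (Python/numpy, names ours; in particular we encode the scaled message z/Z):

```python
import numpy as np

def random_bp_instance(m, n, density=0.2, Z=10, rng=None):
    """Dense Gaussian A, sparse integer message z in [-Z, Z]^n, encoded message b."""
    rng = rng or np.random.default_rng()
    A = rng.normal(0.0, 1.0, (m, n))                     # fully dense, d_A = 1
    z = rng.integers(-Z, Z + 1, n) * (rng.random(n) < density)
    x0 = z / Z                                           # the random message
    b = A @ x0                                           # encoded message
    return A, b, x0
```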
Similarly to (MF), in (BP) we can partition the constraints into equations Ax = b and
inequalities −s ≤ x ≤ s. Again by [25, §4.2], we devise a block-structured RP matrix which
only projects the equations.
As in Sect. 3.2.3, (BP) is not in standard form, since none of the variables are non-negative. In this case, moreover, it is not easy to establish a bound θ on Σ_j (x*_j + s*_j), since A is sampled from a normal distribution. On the other hand, for A_{ij} ∼ N(0, 1) we have Prob(A_{ij} ∈ [−3, 3]) ≈ 0.997. By construction, we have b ∈ [−3n, 3n]^m, which implies a defining interval [−n, n] on the components of optimal solutions, yielding θ = 2n^2 with probability 0.997.
3.3 Methodology
The goal of this paper is to provide a computational assessment of RPs applied to LP.
As discussed at the beginning of Sect. 3, the actual determination of all relevant parameters
is theoretically hopeless. We can certainly simplify the task a little by noting that the
coefficient C can be removed since it suffices to decide a value for ϵ in order to decide k.
Ideally we would like to decide γ first (see Eq. (4)), then compute ϵ as O(γ/(θ^2 ∥y*∥_2)), and sample an appropriate RP. Unfortunately, estimating θ and ∥y*∥_2 prior to solving the original LP leads to tiny values for ϵ (e.g. 10^{−i} for i ∈ {2, . . . , 11} in some preliminary tests), which would require the number of rows of A to be at least O(10^{2i}) in order to yield a useful projection.
Since we are interested in applying RPs to LPs with O(102 ) and O(103 ) rows, this “ideal”
approach is inapplicable.
Instead, we repeatedly solve sets of instances of each LP structure. Each projected
instance is solved with different values of ϵ ∈ E = {0.15, 0.2, 0.25, 0.3, 0.35, 0.4} (these values
have been found to be the most relevant in preliminary computational experiments performed
over several years). Moreover, to mitigate the effect of randomness, we solve each instance
with each ϵ multiple times. For each instance and ϵ we collect performance measures on
objective function values, infeasibility errors, and CPU time. This allows us to illustrate the
co-variability of ϵ and instance size with the performance measures.
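To make the protocol concrete, here is a condensed sketch of a single run for a standard-form LP min{c^⊤x : Ax = b, x ≥ 0}, using scipy.optimize.linprog as a stand-in for the CPLEX barrier solver used in the actual experiments; the choice k = ⌈ln(n)/ϵ^2⌉ (with the constant dropped, as discussed above) and all names are ours:

```python
import numpy as np
from scipy.optimize import linprog

def one_run(c, A, b, eps, rng=None):
    """Project Ax = b with a sparse RP, solve the projected LP, retrieve a
    solution of the original equations, and report performance measures."""
    rng = rng or np.random.default_rng()
    m, n = A.shape
    k = int(np.ceil(np.log(n) / eps ** 2))                 # projected number of rows
    sigma = np.count_nonzero(A) / A.size / 2               # sigma = d_A / 2
    T = rng.normal(0.0, np.sqrt(1.0 / (k * sigma)), (k, m)) * (rng.random((k, m)) < sigma)
    proj = linprog(c, A_eq=T @ A, b_eq=T @ b, bounds=(0, None), method="highs")
    x_bar = proj.x                                          # projected solution
    y, *_ = np.linalg.lstsq(A, b - A @ x_bar, rcond=None)   # retrieval via pseudoinverse
    x_tilde = x_bar + y
    return {"f_bar": float(c @ x_bar),                      # objective of projected solution
            "f_tilde": float(c @ x_tilde),                  # objective of retrieved solution
            "avg_neg": float(np.mean(np.maximum(-x_tilde, 0.0))),  # negativity error
            "avg_eq": float(np.mean(np.abs(A @ x_tilde - b))),     # equation error
            "k_over_m": k / m}
```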
4 The benchmark
The solution pipeline is based on Python 3 [21] and the libraries scipy [14] and amplpy [9]
(besides other standard python libraries). For each problem type, we loop over instances
(based on row size of the equality constraint system, varying in S, see below), over ϵ ∈ E,
and over 5 different runs for each instance and ϵ, in order to average out the randomness of the results due to the choice of T. We solve all of the original and projected instances using
CPLEX 20.1 [10]. We use the barrier solver, because we found this to be more efficient with
large dense LPs than the simplex-based solvers in CPLEX. Our code can be downloaded
here.1 All tests have been carried out on a MacBook 2017 with a 1.4GHz dual-core Intel Core
i7 with 16GB RAM.
1 The URL is https://mega.nz/file/p8MQhbpT#0TJBUVgaBf4KPVk2fu_5k05cMy2VozJk-0fQ1PZdJ0U.
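For reference, solving one formulation through amplpy with the CPLEX barrier algorithm looks roughly as follows; the model and data file names and the objective name are placeholders, and the baropt directive is our assumption about the AMPL/CPLEX driver:

```python
from amplpy import AMPL

ampl = AMPL()
ampl.read("projected_lp.mod")               # placeholder AMPL model file
ampl.readData("instance.dat")               # placeholder data file
ampl.setOption("solver", "cplex")
ampl.setOption("cplex_options", "baropt")   # barrier algorithm (assumed directive name)
ampl.solve()
f_bar = ampl.getObjective("cost").value()   # "cost" is a placeholder objective name
```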
Figure 1 Max Flow plots (increasing ϵ on abscissae): instances of growing size on rows, objective
function ratios on the first column, feasibility errors on the second, k/m and CPU time ratio on the
third.
We can see that the objective function ratios of this instance provide a definite improvement
with respect to the three largest instances in Fig. 4 (m ∈ {1500, 2000, 2500}). The negativity
error is, however, of the same magnitude as before.
Figure 2 Diet plots (increasing ϵ on abscissae): instances of growing size on rows, objective
function ratios on the first column, feasibility errors on the second, k/m and CPU time ratio on the
third.
(for Ax = b and x ≥ 0), which happens because the variables x are unbounded. The CPU
time ratio is not as regular as for the other structures, but still denotes a remarkable time
saving when solving projected formulations.
To see whether increasing sizes and decreasing ϵ improve performance, we solved an
instance with m = 5000 and n = 6000 with ϵ = 0.1, obtaining the following results.
instance             ϵ      f̄/f*     f̃/f*     avgin    avgeq    k/m     t̄/t*
basispursuit-5000    0.10   0.4925   1.5395   0.0000   0.0000   0.17    0.09
There is an improvement with respect to the three largest instances in Fig. 5 (m ∈ {1500, 2000, 2500}), which points in the right direction, although it is not substantial.
Figure 3 Diet plots with modified objective attempting to drive the slack variables to zero.
5 Conclusion
In this paper we have pursued a computational study of the application of random projections
to linear program data, based on solving original and projected formulations of linear program
instances of various structures and sizes. We found that original formulations involving only
inequalities are particularly challenging, while those that natively involve equations behave
better. The sparsity of the constraint matrix does not appear to pose issues, as long as
sparse RPs are used. Lastly, the sizes we considered here are possibly at the lower end of the
range allowed by RPs: better results should be obtained with larger sizes and smaller values
of ϵ, which in turn imply larger CPU times.
Figure 5 Basis Pursuit plots (increasing ϵ on abscissae): instances of growing size on rows,
objective function ratios on the first column, feasibility errors on the second, k/m and CPU time
ratio on the third.
References
4 C. D’Ambrosio, L. Liberti, P.-L. Poirion, and K. Vu. Random projections for quadratic
programs. Mathematical Programming B, 183:619–647, 2020.
5 S. Damelin and W. Miller. The mathematics of signal processing. CUP, Cambridge, 2012.
6 G. Dantzig. The Diet Problem. Interfaces, 20(4):43–47, 1990.
7 S. Dirksen. Dimensionality reduction with subgaussian matrices: A unified theory. Foundations
of Computational Mathematics, 16:1367–1396, 2016.
8 L. Ford and D. Fulkerson. Flows in Networks. Princeton University Press, Princeton, NJ,
1962.
9 R. Fourer and D. Gay. The AMPL Book. Duxbury Press, Pacific Grove, 2002.
10 IBM. ILOG CPLEX 20.1 User’s Manual. IBM, 2020.
11 P. Indyk. Algorithmic applications of low-distortion geometric embeddings. In Foundations of
Computer Science, volume 42 of FOCS, pages 10–33, Washington, DC, 2001. IEEE.
12 P. Indyk and A. Naor. Nearest neighbor preserving embeddings. ACM Transactions on
Algorithms, 3(3):Art. 31, 2007.
13 W. Johnson and J. Lindenstrauss. Extensions of Lipschitz mappings into a Hilbert space. In
G. Hedlund, editor, Conference in Modern Analysis and Probability, volume 26 of Contemporary
Mathematics, pages 189–206, Providence, RI, 1984. AMS.
14 E. Jones, T. Oliphant, and P. Peterson. SciPy: Open source scientific tools for Python, 2001.
[Online; accessed 2016-03-01]. URL: http://www.scipy.org/.
15 D. Kane and J. Nelson. Sparser Johnson-Lindenstrauss transforms. Journal of the ACM,
61(1):4, 2014.
16 R. Koenker. Quantile regression. CUP, Cambridge, 2005.
17 L. Liberti. Decoding noisy messages: a method that just shouldn’t work. In A. Deza, S. Gupta,
and S. Pokutta, editors, Data Science and Optimization. Fields Institute, Toronto, pending
minor revisions.
18 L. Liberti, P.-L. Poirion, and K. Vu. Random projections for conic programs. Linear Algebra
and its Applications, 626:204–220, 2021.
19 L. Liberti and K. Vu. Barvinok’s naive algorithm in distance geometry. Operations Research
Letters, 46:476–481, 2018.
20 D. Pucci de Farias and B. Van Roy. On constraint sampling in the Linear Programming
approach to approximate Dynamic Programming. Mathematics of Operations Research,
29(3):462–478, 2004.
21 G. van Rossum and et al. Python Language Reference, version 3. Python Software Foundation,
2019.
22 S. Vempala. The Random Projection Method. Number 65 in DIMACS Series in Discrete
Mathematics and Theoretical Computer Science. AMS, Providence, RI, 2004.
23 S. Venkatasubramanian and Q. Wang. The Johnson-Lindenstrauss transform: An empirical
study. In Algorithm Engineering and Experiments, volume 13 of ALENEX, pages 164–173,
Providence, RI, 2011. SIAM.
24 K. Vu, P.-L. Poirion, C. D’Ambrosio, and L. Liberti. Random projections for quadratic
programs over a Euclidean ball. In A. Lodi and et al., editors, Integer Programming and
Combinatorial Optimization (IPCO), volume 11480 of LNCS, pages 442–452, New York, 2019.
Springer.
25 K. Vu, P.-L. Poirion, and L. Liberti. Random projections for linear programming. Mathematics
of Operations Research, 43(4):1051–1071, 2018.
26 J. Yang, X. Meng, and M. Mahoney. Quantile regression for large-scale applications. SIAM
Journal on Scientific Computing, 36(5):S78–S110, 2014.