Report
II-B.Tech-CSE-AI (Semester 4)
Batch (2020-24)
End Semester Project
Fast and Flexible ADMM Algorithms for Trend Filtering
Submitted by:
Anirudh Edpuganti (CB.EN.U4AIE20005)
Jyothis Viruthi Santhosh (CB.EN.U4AIE20025)
Onteddu Chaitanya Reddy (CB.EN.U4AIE20045)
Pillalamarri Akshaya (CB.EN.U4AIE20049)
Pingali Sathvika (CB.EN.U4AIE20050)
We would like to express our sincere gratitude to our teacher, Dr. Sowmya, who gave us the opportunity to work on this project on the topic Fast and Flexible ADMM Algorithms for Trend Filtering, which also helped us do a great deal of research and learn about many new things. We are thankful for the opportunity given.
ABSTRACT
It is often desirable to find the underlying trends in time series data. This is a well-known signal processing problem with many applications in areas such as financial data analysis, climatology, and the biological and medical sciences. Mean filtering finds a piece-wise constant trend in the data, while trend filtering finds a piece-wise linear trend. When the signal is noisy, the main difficulty is finding the changing points in the data: the points where the mean or the trend changes. We focus on a quadratic cost function with a penalty term on the number of changing points. We use the l1 norm for the penalty term, as it leads to a sparse solution. This is attractive because the problem is convex in the unknown parameters and well-known optimization algorithms exist for it. We investigate the Specialised Alternating Direction Method of Multipliers (ADMM).
In this report, the performance of the standard ADMM algorithm for trend filtering is compared with that of the specialised ADMM algorithm based on a dynamic programming approach.
Contents
CHAPTER 1: INTRODUCTION
1.1 Overview
1.2 Problem Formulation
1.3 Trend Filtering
1.3.1 Objectives Of Trend Filtering
CHAPTER 5: Appendix
5.1 Code for Standard ADMM Algorithm in Matlab
5.2 Code for Specialized ADMM Algorithm in Matlab
5.3 Code for variation of a denoised signal with λ
5.4 References
Chapter 1
INTRODUCTION
1.1 Overview
The main aim of this project is to become familiar with different forms of ADMM, primarily focusing on the scaled-form ADMM for trend filtering.
We will start by understanding the objectives of trend filtering and then rewrite the ADMM equations accordingly to obtain effective results.
Finding the underlying trend in time series data has many applications in areas such as financial data analysis, biological and medical sciences, and climatology, to name a few. Mean filtering is the process of filtering data to find constant trends, which promotes a piece-wise constant solution. Trend filtering is the process of filtering data to find linear trends, which promotes a piece-wise linear solution. This is beneficial for fitting a simple linear model to the observed data.
The main challenge in mean and trend filtering is locating the trend's so-called changing points. These are the points where the data abruptly changes. Consider a vector indicating the changing points that is the same length as the data. The majority of its entries will be zero; such a vector is known as a sparse vector. The non-zero entries correspond to the abrupt changes in the data, known as changing points. To find trends in data, we therefore assume a sparse vector of changing points.
The l1 norm promotes a sparse solution. In our problem, this leads to a piece-wise constant or a piece-wise linear solution, depending on the exact constraint we set. The constraint we set is used to penalize variations in the solution, which is the changing-point vector. If we choose to penalize the first-order difference, we promote a piece-wise constant trend. If we instead choose to penalize the second-order difference, we promote a piece-wise linear signal.
An early trend filtering method called Hodrick-Prescott (H-P) filtering uses a quadratic cost function, which means that the minimization can be expressed as the solution to a linear set of equations in closed form. However, the solution is not sparse and therefore does not promote a piece-wise constant or linear solution. It was proposed that l1 trend filtering, a variation on H-P filtering, substitute a sum of absolute values (i.e., an l1 norm) for the sum of squares used in H-P filtering to penalize variations in the estimated trend. This problem proved to be convex, and several well-known algorithms exist for such optimization problems. A well-studied method is the Alternating Direction Method of Multipliers (ADMM), an iterative algorithm that can reach an approximate solution quickly.
Even faster algorithms based on the so-called taut string method have been developed to solve the mean filtering problem. A similar fast algorithm has been introduced based on the Karush-Kuhn-Tucker (KKT) conditions. These taut string algorithms reach an exact solution to the mean filtering problem very efficiently. However, no such fast method exists for the trend filtering problem. We present a novel fast algorithm, similar to the taut string method, that approximates the solution to the l1 trend filtering problem.
The proposed l1 trend filter method shares many properties with the H-P filter and has
the same (linear) computational complexity. The principal difference is that the l1 trend
filter produces trend estimates that are smooth in the sense of being piece-wise linear. The
l1 trend filter is thus well suited to analyzing time series with an underlying piece-wise
linear trend. The kinks, knots, or changes in slope of the estimated trend can be interpreted
as abrupt changes or events in the underlying dynamics of the time series; the l1 trend filter
can be interpreted as detecting or estimating changes in an underlying linear trend.
The goal of trend filtering is to smooth out a time series by filtering out the 'noise'. The trend filtering algorithm faces a trade-off between two objectives. Firstly, it wants to minimise the residual 'noise' between the actual and smoothed series; think of this as the algorithm wanting to stay as true as possible to the original series. Secondly, it wants to maximise the smoothness of the filtered series, which tends to be at odds with the first objective.
Mathematically, we aim to minimise the following objective function:
\min_{x} \; \frac{1}{2}\|y - x\|_2^2 + \lambda \|Dx\|_1 \qquad (1.1)
Here y is the actual time series, while x is the estimated filtered time series. The first part of the loss function represents the objective of minimising the sum of squared residuals between the actual and fitted series. The second part of the loss function represents the desire for smoothness: Dx captures the difference between every pair of consecutive points.
D = \begin{bmatrix} -1 & 1 & 0 & \cdots & 0 & 0 \\ 0 & -1 & 1 & \cdots & 0 & 0 \\ & & & \ddots & & \\ 0 & 0 & 0 & \cdots & -1 & 1 \end{bmatrix}
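As a small illustration of (1.1), the following is a minimal sketch (assuming NumPy and CVXPY are available; the toy signal and the value of λ are made up for the example) that builds the first-order difference matrix D and solves the resulting problem:

import numpy as np
import cvxpy

# Toy piece-wise constant signal with additive noise
n = 50
rng = np.random.default_rng(0)
y = np.concatenate([np.full(25, 1.0), np.full(25, 3.0)]) + 0.1*rng.standard_normal(n)

# First-order difference matrix D of size (n-1) x n: (D @ x)[i] = x[i+1] - x[i]
D = np.zeros((n - 1, n))
D[np.arange(n - 1), np.arange(n - 1)] = -1.0
D[np.arange(n - 1), np.arange(1, n)] = 1.0

# Objective (1.1): 0.5*||y - x||_2^2 + lambda*||D x||_1
lam = 1.0
x = cvxpy.Variable(n)
cvxpy.Problem(cvxpy.Minimize(0.5*cvxpy.sum_squares(y - x) + lam*cvxpy.norm1(D @ x))).solve()
print(x.value)   # approximately piece-wise constant estimate of the underlying trend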
Chapter 2
Now the update equations are obtained by removing the constant terms, differentiating and equating to zero:

x^{k+1} := \operatorname{prox}_{\lambda f}\!\left(z^{k} - u^{k}\right), \qquad z^{k+1} := \operatorname{prox}_{\lambda g}\!\left(x^{k+1} + u^{k}\right) \qquad (2.2)

The augmented Lagrangian in scaled form is

L_{\rho}(x, z, w) = f(x) + g(z) + \frac{\rho}{2}\left\|Ax + Bz - c + w\right\|_2^2 - \frac{\rho}{2}\|w\|_2^2 \qquad (2.3)

and the scaled-form ADMM updates are

x^{(k)} = \operatorname*{argmin}_{x} \; f(x) + \frac{\rho}{2}\left\|Ax + Bz^{(k-1)} - c + w^{(k-1)}\right\|_2^2 \qquad (2.4)

z^{(k)} = \operatorname*{argmin}_{z} \; g(z) + \frac{\rho}{2}\left\|Ax^{(k)} + Bz - c + w^{(k-1)}\right\|_2^2 \qquad (2.5)

w^{(k)} = w^{(k-1)} + Ax^{(k)} + Bz^{(k)} - c \qquad (2.6)

Unrolling the dual update shows that w is a running sum of the residuals:

w^{(k)} = w^{(0)} + \sum_{i=1}^{k}\left(Ax^{(i)} + Bz^{(i)} - c\right) \qquad (2.7)
Lasso regression performs l1 regularization, which adds a penalty equal to the absolute value of the magnitude of the coefficients. This type of regularization can result in sparse models with few coefficients: some coefficients can become exactly zero and be eliminated from the model. Larger penalties result in coefficient values closer to zero, which is ideal for producing simpler models. On the other hand, l2 regularization (e.g. ridge regression) does not result in the elimination of coefficients or in sparse models. This makes the lasso far easier to interpret than ridge.
\min_{\beta} \; \frac{1}{2}\|y - X\beta\|_2^2 + \lambda\|\beta\|_1 \qquad (2.8)

\min_{\beta, \alpha} \; \frac{1}{2}\|y - X\beta\|_2^2 + \lambda\|\alpha\|_1 \quad \text{subject to} \quad \beta - \alpha = 0 \qquad (2.9)

ADMM Steps:

\beta^{(k)} = \left(X^{\top}X + \rho I\right)^{-1}\left(X^{\top}y + \rho\left(\alpha^{(k-1)} - w^{(k-1)}\right)\right) \qquad (2.10)

\alpha^{(k)} = S_{\lambda/\rho}\left(\beta^{(k)} + w^{(k-1)}\right) \qquad (2.11)

w^{(k)} = w^{(k-1)} + \beta^{(k)} - \alpha^{(k)} \qquad (2.12)

The matrix X^{\top}X + \rho I is always invertible, regardless of X. The \alpha update applies the soft-thresholding operator S_t, which is defined element-wise as

[S_t(x)]_j = \begin{cases} x_j - t & x_j > t \\ 0 & -t \le x_j \le t \\ x_j + t & x_j < -t \end{cases}, \qquad j = 1, \dots, p \qquad (2.13)
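To make the updates (2.10)-(2.13) concrete, here is a minimal NumPy sketch of the scaled-form lasso iteration; the function names, the fixed step size ρ and the iteration count are illustrative assumptions rather than part of the original report:

import numpy as np

def soft_threshold(x, t):
    # Element-wise soft-thresholding operator S_t from (2.13)
    return np.sign(x)*np.maximum(np.abs(x) - t, 0.0)

def lasso_admm(X, y, lam, rho=1.0, n_iter=200):
    # Scaled-form ADMM for the lasso (2.9); w is the scaled dual variable
    n, p = X.shape
    alpha = np.zeros(p)
    w = np.zeros(p)
    # X^T X + rho*I is always invertible, so it can be formed (or factored) once
    M = X.T @ X + rho*np.eye(p)
    Xty = X.T @ y
    for _ in range(n_iter):
        beta = np.linalg.solve(M, Xty + rho*(alpha - w))    # (2.10)
        alpha = soft_threshold(beta + w, lam/rho)           # (2.11)
        w = w + beta - alpha                                # (2.12)
    return beta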
\hat{\beta} = \operatorname*{argmin}_{\beta \in \mathbb{R}^{n}} \; \frac{1}{2}\|y - \beta\|_2^2 + \lambda\left\|D^{(k+1)}\beta\right\|_1 \qquad (2.14)

where D^{(k+1)} = D^{(1)} D^{(k)}.
The standard ADMM approach is based on rewriting the above equation as

\min_{\beta \in \mathbb{R}^{n}, \, \alpha \in \mathbb{R}^{n-k-1}} \; \frac{1}{2}\|y - \beta\|_2^2 + \lambda\|\alpha\|_1 \quad \text{subject to} \quad \alpha = D^{(k+1)}\beta. \qquad (2.15)
The augmented Lagrangian can be written in scaled form as

L(\beta, \alpha, u) = \frac{1}{2}\|y - \beta\|_2^2 + \lambda\|\alpha\|_1 + \frac{\rho}{2}\left\|\alpha - D^{(k+1)}\beta + u\right\|_2^2 - \frac{\rho}{2}\|u\|_2^2 \qquad (2.16)

or, equivalently, in unscaled form with dual variable z as

L(\beta, \alpha, z) = \frac{1}{2}\|y - \beta\|_2^2 + \lambda\|\alpha\|_1 + z^{\top}\left(\alpha - D^{(k+1)}\beta\right) + \frac{\rho}{2}\left\|\alpha - D^{(k+1)}\beta\right\|_2^2 \qquad (2.17)

To see that the two forms are equivalent, let r denote the primal residual and u the scaled dual variable,

r = \alpha - D^{(k+1)}\beta, \qquad u = \frac{z}{\rho} \qquad (2.18)

The dual terms of (2.17) can then be rewritten as

z^{\top}r + \frac{\rho}{2}\|r\|_2^2 \qquad (2.19)

= \frac{\rho}{2}\left[\frac{2}{\rho}z^{\top}r + \|r\|_2^2\right] \qquad (2.20)

= \frac{\rho}{2}\left[\frac{2}{\rho}z^{\top}r + r^{\top}r\right] \qquad (2.21)

= \frac{\rho}{2}\left[\frac{2}{\rho}z^{\top}r + r^{\top}r + \frac{1}{\rho^2}z^{\top}z - \frac{1}{\rho^2}z^{\top}z\right] \qquad (2.22)

= \frac{\rho}{2}\left[\frac{2}{\rho}z^{\top}r + r^{\top}r + u^{\top}u - u^{\top}u\right] \qquad (2.23)

= \frac{\rho}{2}\left[\frac{2}{\rho}z^{\top}r + r^{\top}r + u^{\top}u\right] - \frac{\rho}{2}u^{\top}u \qquad (2.24)

= \frac{\rho}{2}\left[u^{\top}r + r^{\top}r + u^{\top}r + u^{\top}u\right] - \frac{\rho}{2}\|u\|_2^2 \qquad (2.25)

= \frac{\rho}{2}\left[\left(u^{\top} + r^{\top}\right)r + u^{\top}(r + u)\right] - \frac{\rho}{2}\|u\|_2^2 \qquad (2.26)

= \frac{\rho}{2}\left[(u + r)^{\top}r + u^{\top}(r + u)\right] - \frac{\rho}{2}\|u\|_2^2 \qquad (2.27)

= \frac{\rho}{2}\left[r^{\top}(u + r) + u^{\top}(r + u)\right] - \frac{\rho}{2}\|u\|_2^2 \qquad (2.28)

= \frac{\rho}{2}\left[(r + u)^{\top}u + (r + u)^{\top}r\right] - \frac{\rho}{2}\|u\|_2^2 \qquad (2.29)

= \frac{\rho}{2}(r + u)^{\top}(u + r) - \frac{\rho}{2}\|u\|_2^2 \qquad (2.30)

= \frac{\rho}{2}\|r + u\|_2^2 - \frac{\rho}{2}\|u\|_2^2 \qquad (2.31)

Finally, substituting back, the augmented Lagrangian is obtained in scaled form:

L(\beta, \alpha, u) = \frac{1}{2}\|y - \beta\|_2^2 + \lambda\|\alpha\|_1 + \frac{\rho}{2}\left\|\alpha - D^{(k+1)}\beta + u\right\|_2^2 - \frac{\rho}{2}\|u\|_2^2 \qquad (2.32)
Now the standard ADMM updates can be derived in the following way. Removing the constant terms that do not depend on \beta:

\beta^{k+1} = \operatorname*{argmin}_{\beta} \; \frac{1}{2}\|y - \beta\|_2^2 + \frac{\rho}{2}\left\|-D^{(k+1)}\beta + \alpha + u\right\|_2^2 \qquad (2.33)

Now, differentiating and equating to zero:

\frac{1}{2} \cdot 2 \cdot (-I)(y - \beta) + \rho\left(-D^{(k+1)}\right)^{\top}\left(-D^{(k+1)}\beta + \alpha + u\right) = 0 \qquad (2.34)

-y + \beta + \rho\left(D^{(k+1)}\right)^{\top}D^{(k+1)}\beta = \rho\left(D^{(k+1)}\right)^{\top}(\alpha + u) \qquad (2.35)

\beta + \rho\left(D^{(k+1)}\right)^{\top}D^{(k+1)}\beta = y + \rho\left(D^{(k+1)}\right)^{\top}(\alpha + u) \qquad (2.36)

\left(I + \rho\left(D^{(k+1)}\right)^{\top}D^{(k+1)}\right)\beta = y + \rho\left(D^{(k+1)}\right)^{\top}(\alpha + u) \qquad (2.37)

Solving for \beta gives the closed-form update

\beta = \left(I + \rho\left(D^{(k+1)}\right)^{\top}D^{(k+1)}\right)^{-1}\left(y + \rho\left(D^{(k+1)}\right)^{\top}(\alpha + u)\right) \qquad (2.39)
Now finding the update for \alpha, removing the terms that do not depend on \alpha:

\alpha = \operatorname*{argmin}_{\alpha} \; \lambda\|\alpha\|_1 + \frac{\rho}{2}\left\|\alpha - D^{(k+1)}\beta + u\right\|_2^2 \qquad (2.40)

\alpha = \operatorname*{argmin}_{\alpha} \; \|\alpha\|_1 + \frac{1}{2(\lambda/\rho)}\left\|\alpha - \left(D^{(k+1)}\beta - u\right)\right\|_2^2 \qquad (2.41)

\alpha = S_{\lambda/\rho}\left(D^{(k+1)}\beta - u\right) \qquad (2.42)

where S_{\lambda}(v) = \operatorname*{argmin}_{\alpha} \; \|\alpha\|_1 + \frac{1}{2\lambda}\|\alpha - v\|_2^2.
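Putting the updates (2.39) and (2.42) together with the scaled dual step, a minimal NumPy sketch of the standard ADMM iteration for trend filtering might look as follows (the function name, step size ρ and iteration count are illustrative assumptions):

import numpy as np

def soft_threshold(v, t):
    # Element-wise soft-thresholding operator S_t
    return np.sign(v)*np.maximum(np.abs(v) - t, 0.0)

def standard_admm_trend_filter(y, D, lam, rho=1.0, n_iter=500):
    # Standard ADMM for min 0.5*||y - beta||_2^2 + lam*||D beta||_1,
    # where D plays the role of D^(k+1); u is the scaled dual variable.
    n = y.size
    m = D.shape[0]
    alpha = np.zeros(m)
    u = np.zeros(m)
    # beta update (2.39) solves (I + rho*D^T D) beta = y + rho*D^T (alpha + u)
    M = np.eye(n) + rho*(D.T @ D)
    for _ in range(n_iter):
        beta = np.linalg.solve(M, y + rho*(D.T @ (alpha + u)))   # (2.39)
        alpha = soft_threshold(D @ beta - u, lam/rho)            # (2.42)
        u = u + alpha - D @ beta                                 # scaled dual update
    return beta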
In the specialized ADMM approach, the problem is instead rewritten as

\min_{\beta \in \mathbb{R}^{n}, \, \alpha \in \mathbb{R}^{n-k}} \; \frac{1}{2}\|y - \beta\|_2^2 + \lambda\left\|D^{(1)}\alpha\right\|_1 \quad \text{subject to} \quad \alpha = D^{(k)}\beta \qquad (2.44)

where the recursive property D^{(k+1)} = D^{(1)} D^{(k)} comes into the picture.

The augmented Lagrangian now becomes

L(\beta, \alpha, u) = \frac{1}{2}\|y - \beta\|_2^2 + \lambda\left\|D^{(1)}\alpha\right\|_1 + \frac{\rho}{2}\left\|\alpha - D^{(k)}\beta + u\right\|_2^2 - \frac{\rho}{2}\|u\|_2^2 \qquad (2.45)

and the scaled dual update is u \leftarrow u + \alpha - D^{(k)}\beta.
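A rough sketch of the resulting specialized iteration is given below. In the actual algorithm the α-subproblem, a one-dimensional fused-lasso (total-variation) problem, is solved by a fast dynamic programming routine; here CVXPY is used only as a simple stand-in for that solver so the sketch stays self-contained, and all names and parameters are illustrative:

import numpy as np
import cvxpy

def specialized_admm_trend_filter(y, D1, Dk, lam, rho=1.0, n_iter=100):
    # Sketch of the specialized ADMM for (2.44)-(2.45).
    # Dk is D^(k) of size (n-k) x n, and D1 is the first-order difference
    # operator sized to act on alpha, i.e. of size (n-k-1) x (n-k).
    n = y.size
    m = Dk.shape[0]
    alpha = np.zeros(m)
    u = np.zeros(m)
    M = np.eye(n) + rho*(Dk.T @ Dk)
    for _ in range(n_iter):
        # beta update: same quadratic subproblem as before, with D^(k) in place of D^(k+1)
        beta = np.linalg.solve(M, y + rho*(Dk.T @ (alpha + u)))
        # alpha update: fused-lasso subproblem
        #   min_a lam*||D1 a||_1 + (rho/2)*||a - (Dk beta - u)||_2^2
        v = Dk @ beta - u
        a = cvxpy.Variable(m)
        cvxpy.Problem(cvxpy.Minimize(lam*cvxpy.norm1(D1 @ a)
                                     + 0.5*rho*cvxpy.sum_squares(a - v))).solve()
        alpha = a.value
        u = u + alpha - Dk @ beta   # scaled dual update
    return beta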
Chapter 3
Then a for loop is initialized to compute the discrete difference operator of higher order using its recursive nature. The dimensions of the matrix are chosen so that they are suitable for matrix multiplication.
Figure 3.2: Initializing for loop
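For reference, a hypothetical NumPy rendering of the same recursive construction (mirroring the MATLAB loop shown in the figure; the variable names follow the appendix code) is:

import numpy as np
from scipy.linalg import toeplitz

n = 100
k = 2
# First-order difference operator D^(1), of size (n-1) x n
D_1 = toeplitz(np.array([-1] + [0]*(n - 2)), np.array([-1, 1] + [0]*(n - 2)))
# Recursive construction: D^(i+1) = D^(1)[0:n-i-1, 0:n-i] @ D^(i)
D_k = D_1
for i in range(1, k + 1):
    D_k = D_1[0:n - i - 1, 0:n - i] @ D_k
# D_k is now the (n-k-1) x n operator D^(k+1) used in the constraint alpha == D_k @ x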
This is followed by specifying the objective function along with the constraints in the CVX optimization tool.
For the implementation of the specialized ADMM, the only difference is in the initialization of the for loop.
Here the constraint is obtained from the recursive nature of the discrete difference operator. The constraint here is \alpha = D^{(k)}\beta.
Now the lambda_list variable consists of different values of λ varying from 0 to 10000 with a non-uniform step size.
We minimize our objective function for each value of λ in lambda_list and plot the corresponding denoised signal.
Chapter 4
4.2 Variation of λ in Python
When λ = 0, the denoised signal is the same as the noisy signal and the optimal objective value is zero. The smoothness of the denoised signal increases with increasing values of λ, as can be seen in Figure 4.2.
4.3 Conclusion
From the results obtained, it can be concluded that the specialised ADMM algorithm converges faster than the standard ADMM algorithm, since it uses a dynamic programming approach for the α update.
Chapter 5
Appendix
5.1 Code for Standard ADMM Algorithm in Matlab

% y is assumed to be a column of a table T loaded earlier (e.g. via readtable)
y = T.first_bidding;
y = y/norm(y);
n = size(y,1);

% First-order discrete difference operator D^(1), of size (n-1) x n
D_1 = toeplitz([-1,zeros(1,n-2)],[-1,1,zeros(1,n-2)]);

% Build the higher-order operator recursively: D^(i+1) = D^(1)(1:n-i-1,1:n-i)*D^(i)
k = 2;
for i = 1:k
    D_1_k = D_1(1:n-i-1,1:n-i);
    if i == 1
        D_k = D_1_k*D_1;
    else
        D_k = D_1_k*D_k;
    end
end

% Solve the standard formulation (2.15) with CVX and time the solve
lambda = 0.1;
t_start = tic;
cvx_begin
    variables x(n) alpha(n-k-1)
    objective = 0.5*sum_square(y-x) + lambda*sum(abs(alpha));
    minimize(objective)
    subject to
        alpha == D_k*x
cvx_end
TIME = toc(t_start)

scatter(1:n,y)
hold on
grid on
plot(1:n,x)
5.2 Code for Specialized ADMM Algorithm in Matlab

% y is assumed to be a column of a table T loaded earlier (e.g. via readtable)
y = T.first_bidding;
y = y/norm(y);
n = size(y,1);

% First-order discrete difference operator D^(1), of size (n-1) x n
D_1 = toeplitz([-1,zeros(1,n-2)],[-1,1,zeros(1,n-2)]);

% Build D^(k) recursively; the remaining first-order difference stays in the objective
k = 3;
for i = 1:k-1
    D_1_k = D_1(1:n-i-1,1:n-i);
    if i == 1
        D_k = D_1_k*D_1;
    else
        D_k = D_1_k*D_k;
    end
end

% Solve the specialized formulation (2.44) with CVX and time the solve
lambda = 0.1;
t_start = tic;
cvx_begin
    variables x(n) alpha(n-k)
    objective = 0.5*sum_square(y-x) + lambda*sum(abs(D_1(1:n-k-1,1:n-k)*alpha));
    minimize(objective)
    subject to
        alpha == D_k*x
cvx_end
TIME = toc(t_start)

scatter(1:n,y)
hold on
grid on
plot(1:n,x)
5.3 Code for variation of a denoised signal with λ
import pandas as pd
import numpy as np
import cvxpy
import matplotlib.pyplot as plt
from scipy.linalg import toeplitz

y = pd.read_csv('coe-2021.csv')
y = y['second_bidding'].to_numpy()
y = np.log(y)
n = y.size

# First-order discrete difference operator D^(1), of size (n-1) x n
c = np.array([-1] + [0]*(n - 2))
r = np.array([-1, 1] + [0]*(n - 2))
D_1 = toeplitz(c, r)

# Build the higher-order operator recursively, as in the Matlab code
k = 1
D_k = np.zeros(np.shape(D_1))
for i in range(1, k + 1):
    D_1_k = D_1[0:n-i-1, 0:n-i]
    if i == 1:
        D_k = D_1_k @ D_1
    else:
        D_k = D_1_k @ D_k

lambda_list = [0, 0.1, 0.5, 1, 2, 5, 10, 50, 200,
               500, 1000, 2000, 5000, 10000, 100000]
solver = cvxpy.CVXOPT
reg_norm = 1

fig, ax = plt.subplots(len(lambda_list)//3, 3, figsize=(20, 20))
ax = ax.ravel()
ii = 0
for lambda_value in lambda_list:
    x = cvxpy.Variable(shape=n)
    # Trend filtering objective (1.1): 0.5*||y - x||^2 + lambda*||D_k x||_1 (reg_norm = 1)
    objective = cvxpy.Minimize(0.5*cvxpy.sum_squares(y - x)
                               + lambda_value*cvxpy.norm(D_k @ x, reg_norm))
    problem = cvxpy.Problem(objective)
    problem.solve(solver=solver, verbose=False)
    ax[ii].plot(np.arange(1, n+1), y, c='b')
    ax[ii].plot(np.arange(1, n+1), np.array(x.value),
                'b-', linewidth=1.0, c='r')
    ax[ii].set_xlabel('Time')
    ax[ii].set_ylabel('Log Premium')
    ax[ii].set_title('Lambda: {}\nStatus: {}\nObjective Value: {}'
                     .format(lambda_value, problem.status, round(objective.value, 3)))
    ii += 1
plt.show()
5.4 References
1. Fast and Flexible ADMM Algorithms for Trend Filtering
2. Introduction to Trend Filtering with Applications in Python
3. Scaled form ADMM
4. l1 Trend Filtering
5. A Case for using Trend Filtering over Splines