
21MAT212(21-22(Even))

II-B.Tech-CSE-AI (Semester 4)
Batch (2020-24)
End Semester Project
Fast and Flexible ADMM Algorithms for Trend Filtering

Submitted by:
Anirudh Edpuganti (CB.EN.U4AIE20005)
Jyothis Viruthi Santhosh (CB.EN.U4AIE20025)
Onteddu Chaitanya Reddy (CB.EN.U4AIE20045)
Pillalamarri Akshaya (CB.EN.U4AIE20049)
Pingali Sathvika (CB.EN.U4AIE20050)

Under the supervision of


Prof. Dr. V. Sowmya

DEPT. OF COMPUTATIONAL ENGINEERING AND NETWORKING


AMRITA SCHOOL OF ENGINEERING
5th July 2022
Acknowledgement

We would like to express our sincere gratitude to our teacher, Dr. V. Sowmya, who gave us the golden opportunity to do this wonderful project on the topic Fast and Flexible ADMM Algorithms for Trend Filtering, which also helped us do a great deal of research and learn about many new things. We are thankful for the opportunity given.

ABSTRACT

Keywords - Trend filtering, tuning parameter, fused lasso, piece-wise, augmented Lagrangian, convergence, L2 norm, L1 norm

It is often desirable to find the underlying trends in time series data. This is a well
known signal processing problem that has many applications in areas such as financial
data analysis, climatology, biological and medical sciences etc. Mean filtering finds a
piece-wise constant trend in the data while trend filtering finds a piece-wise linear trend.
When the signal is noisy, the main difficulty is finding the changing points in the data.
These are the points where the mean or the trend changes. We focus on a quadratic cost
function with a penalty term on the number of changing points. We use the l1 norm for
the penalty term as it leads to a sparse solution. This is attractive because the problem
is convex in the unknown parameters and well known optimization algorithms exist for
this problem. We investigate the Specialised Alternating Direction Method of Multipliers
(ADMM).
In this report, the performance of the standard ADMM algorithm for trend filtering is compared with that of the specialised ADMM algorithm based on a dynamic programming approach.

Contents

CHAPTER 1: INTRODUCTION
1.1 Overview
1.2 Problem Formulation
1.3 Trend Filtering
1.3.1 Objectives Of Trend Filtering

CHAPTER 2: Algorithm & its Derivation
2.1 Standard ADMM
2.2 Scaled Form ADMM
2.2.1 Example: Lasso Regression
2.3 Specialised ADMM Algorithm

CHAPTER 3: Coding & Implementation
3.1 Implementation in Matlab
3.2 Implementation in Python

CHAPTER 4: Results & Conclusion
4.1 Plots in Matlab
4.2 Variation of λ in python
4.3 Conclusion

CHAPTER 5: Appendix
5.1 Code for Standard ADMM Algorithm in Matlab
5.2 Code for Specialized ADMM Algorithm in Matlab
5.3 Code for variation of a denoised signal with λ
5.4 References

Chapter 1

INTRODUCTION

1.1 Overview
The main aim of this project is to become familiar with different forms of ADMM, primarily focusing on scaled form ADMM for trend filtering.
We start by understanding the objectives of trend filtering and then rewrite the ADMM equations accordingly to obtain effective results.

1.2 Problem Formulation


Initially, the concept and objectives of trend filtering should be understood. Then the ADMM problem should be formulated corresponding to those objectives, giving utmost importance to the dimensions of each of the variables, and the ADMM update equations should be derived subject to the constraints. The performance of the standard ADMM and the specialised ADMM is then compared; the specialised ADMM is formulated using a dynamic programming approach. The primal-dual interior point method is also examined for its performance in comparison to the specialised ADMM algorithm.

1.3 Trend Filtering


Estimating underlying trends in time series data is a problem that arises in a variety of
disciplines. Due to noise and imperfections in the data, this could be a difficult estimation
problem. As a result, several techniques for filtering and removing disturbances in order
to find the underlying trend in the data have been developed. This is a broad research
area with many applications not only in signal processing but also in financial time series

analysis, biological and medical sciences, and climatology, to name a few. Mean filtering is the process of filtering data to find constant trends, which promotes a piece-wise constant solution. Trend filtering is the process of filtering data to find linear trends, which promotes a piece-wise linear solution. This is useful for fitting a simple linear model to the observed data.
The main challenge in mean and trend filtering is locating the trend's so-called changing points. These are the points where the data abruptly changes. Consider a vector indicating the change points that is the same length as the data. The majority of its entries will be zero; such a vector is known as a sparse vector. The non-zero entries correspond to the abrupt changes in the data, known as changing points. To find trends in the data, we assume a sparse vector of changing points.
The l1 norm promotes a sparse solution. In our problem, this leads to a piece-wise constant or a piece-wise linear solution depending on the exact penalty we set. The penalty is used to discourage variations in the solution, i.e. in the changing point vector. If we choose to penalize the first order difference, we promote a piece-wise constant trend. If we instead choose to penalize the second order difference, we promote a piece-wise linear signal.
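As a small illustration (a minimal numpy sketch added here for clarity, not part of the original implementation), the first and second order differences of a piece-wise linear signal show why penalizing the second order difference favours such signals: the second order difference is already sparse, while the first order difference is not.

import numpy as np

# A piece-wise linear signal: slope 1 for the first half, slope -2 afterwards.
t = np.arange(10)
x = np.where(t < 5, t, 5 - 2 * (t - 5))

d1 = np.diff(x, n=1)   # first order differences: piece-wise constant, many non-zeros
d2 = np.diff(x, n=2)   # second order differences: non-zero only at the kink

print(d1)   # [ 1  1  1  1  1 -2 -2 -2 -2]
print(d2)   # [ 0  0  0  0 -3  0  0  0]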
An early trend filtering method, Hodrick-Prescott (H-P) filtering, uses a quadratic cost function, which means that the minimization can be expressed as the solution to a linear set of equations in closed form. However, the solution is not sparse and therefore does not promote a piece-wise constant or linear solution. It was proposed that l1 trend filtering, a variation on H-P filtering, substitute a sum of absolute values (i.e., an l1 norm) for the sum of squares used in H-P filtering to penalize variations in the estimated trend. This problem proved to be convex, and several well known algorithms exist for such optimization problems. A well studied method is the Alternating Direction Method of Multipliers (ADMM) algorithm, an iterative algorithm that can reach an approximate solution quickly.
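For comparison (a minimal numpy sketch under the usual statement of the H-P filter, added for illustration and not taken from the report), the quadratic penalty means the H-P trend estimate for the objective (1/2)∥y − x∥₂² + λ∥Dx∥₂² is the solution of a single linear system, x = (I + 2λ DᵀD)⁻¹ y, with D the second order difference matrix; no such closed form exists once the penalty is an l1 norm.

import numpy as np

def hp_filter(y, lam):
    # Hodrick-Prescott trend estimate via its closed-form linear system.
    n = len(y)
    D = np.diff(np.eye(n), n=2, axis=0)   # second order difference matrix, (n-2) x n
    return np.linalg.solve(np.eye(n) + 2 * lam * D.T @ D, y)

y = np.cumsum(np.random.randn(200))   # a noisy random-walk series as a stand-in
trend = hp_filter(y, lam=1600.0)      # 1600 is the conventional choice for quarterly data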
Even faster algorithms based on the so called taut string method have been developed
to solve the mean filtering problem. A similar fast algorithm is introduced based on
the Karush-Kuhn-Tucker (KKT) conditions. These taut string algorithms reach an exact
solution to the mean filtering problem very efficiently. However, no fast method exists
for the trend filtering problem. We present a novel fast algorithm, similar to the taut string method, that approximates the solution to the l1 trend filtering problem.
The proposed l1 trend filter method shares many properties with the H-P filter and has
the same (linear) computational complexity. The principal difference is that the l1 trend
filter produces trend estimates that are smooth in the sense of being piece-wise linear. The
l1 trend filter is thus well suited to analyzing time series with an underlying piece-wise
linear trend. The kinks, knots, or changes in slope of the estimated trend can be interpreted
as abrupt changes or events in the underlying dynamics of the time series; the l1 trend filter can be interpreted as detecting or estimating changes in an underlying linear trend.

1.3.1 Objectives Of Trend Filtering

The goal of trend filtering is to smooth out a time series by filtering out the ‘noise’.
The trend filtering algorithm faces a trade-off between two objectives. Firstly, it wants to minimise the residual ‘noise’ between the actual and smooth series. Think of this as the
algorithm wanting to stay as true as possible to the original series. Secondly, it wants to
maximise the smoothness of the filtered series, which tends to be at odds with the first
objective.
Mathematically, we aim to minimise the following objective function:

min_x  (1/2)∥y − x∥₂² + λ∥Dx∥₁        (1.1)
y is the actual time series, while x is the estimated filtered time series. The first part
of the loss function represents the objective of minimising the sum of squared residuals
between the actual and fitted series. The second part of the loss function represents the
desire for smoothness. Dx captures the smoothness between every set of two points.

 
D = [ −1   1   0  ⋯   0   0
       0  −1   1  ⋯   0   0
       ⋮                  ⋮
       0   0   0  ⋯  −1   1 ]
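The following is a minimal Python sketch of this objective (an illustration added here; the report's own implementations appear in Chapter 3 and the Appendix). It builds the first order difference matrix D and hands (1.1) to a generic convex solver.

import numpy as np
import cvxpy

y = np.cumsum(np.random.randn(100))   # stand-in noisy series
n = y.size
D = np.diff(np.eye(n), n=1, axis=0)   # first order difference matrix, (n-1) x n
lam = 5.0

x = cvxpy.Variable(n)
objective = cvxpy.Minimize(0.5 * cvxpy.sum_squares(y - x) + lam * cvxpy.norm(D @ x, 1))
cvxpy.Problem(objective).solve()
x_hat = x.value                        # piece-wise constant trend estimate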

Chapter 2

Algorithm & its Derivation

2.1 Standard ADMM


min_{x,z}  f(x) + g(z)  subject to  x − z = 0

The augmented Lagrangian is written as follows:

Lρ(x, y, z) = f(x) + g(z) + yᵀ(x − z) + (ρ/2)∥x − z∥₂²   (2.1)

Now the update equations are obtained by minimising the augmented Lagrangian over each variable in turn and then taking a dual ascent step; in scaled form (with scaled dual variable u and proximal parameter λ = 1/ρ) they are

x^(k+1) := prox_{λf}(z^k − u^k)
z^(k+1) := prox_{λg}(x^(k+1) + u^k)        (2.2)
u^(k+1) := u^k + x^(k+1) − z^(k+1)
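As a concrete illustration of this iteration (a minimal numpy sketch under the assumption f(x) = (1/2)∥y − x∥₂² and g(z) = λ∥z∥₁, which makes both proximal operators available in closed form; this is not code from the report):

import numpy as np

def soft_threshold(v, t):
    # proximal operator of t * ||.||_1
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def admm_consensus(y, lam, rho=1.0, n_iter=200):
    # ADMM for: min 0.5*||y - x||^2 + lam*||z||_1  subject to  x - z = 0
    x = np.zeros_like(y, dtype=float)
    z = np.zeros_like(y, dtype=float)
    u = np.zeros_like(y, dtype=float)          # scaled dual variable
    for _ in range(n_iter):
        x = (y + rho * (z - u)) / (1.0 + rho)  # prox of 0.5*||y - .||^2
        z = soft_threshold(x + u, lam / rho)   # prox of (lam/rho)*||.||_1
        u = u + x - z                          # dual step (running sum of residuals)
    return x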

2.2 Scaled Form ADMM

Scaled form: denote w = u/ρ, so the augmented Lagrangian becomes

Lρ(x, z, w) = f(x) + g(z) + (ρ/2)∥Ax + Bz − c + w∥₂² − (ρ/2)∥w∥₂²   (2.3)

and ADMM updates become

x^(k) = argmin_x  f(x) + (ρ/2)∥Ax + Bz^(k−1) − c + w^(k−1)∥₂²   (2.4)

z^(k) = argmin_z  g(z) + (ρ/2)∥Ax^(k) + Bz − c + w^(k−1)∥₂²   (2.5)

w^(k) = w^(k−1) + Ax^(k) + Bz^(k) − c   (2.6)

Note that the kth iterate w^(k) is just a running sum of residuals:

w^(k) = w^(0) + Σ_{i=1}^{k} (Ax^(i) + Bz^(i) − c)   (2.7)

2.2.1 Example: Lasso Regression

Lasso regression performs l1 regularization, which adds a penalty equal to the absolute value of the magnitude of the coefficients. This type of regularization can result in sparse models with few coefficients; some coefficients can become zero and be eliminated from the model. Larger penalties result in coefficient values closer to zero, which is ideal for producing simpler models. On the other hand, l2 regularization (e.g. ridge regression) does not result in elimination of coefficients or sparse models. This makes the lasso far easier to interpret than ridge.

min_β  (1/2)∥y − Xβ∥₂² + λ∥β∥₁   (2.8)

We can rewrite this as:

min_{β,α}  (1/2)∥y − Xβ∥₂² + λ∥α∥₁  subject to  β − α = 0   (2.9)

ADMM Steps:

β^(k) = (XᵀX + ρI)⁻¹ (Xᵀy + ρ(α^(k−1) − w^(k−1)))   (2.10)

α^(k) = S_{λ/ρ}(β^(k) + w^(k−1))   (2.11)

w^(k) = w^(k−1) + β^(k) − α^(k)   (2.12)

The matrix XᵀX + ρI is always invertible regardless of X, since ρ > 0. The α-update applies the soft-thresholding operator S_t, which is defined as

[S_t(x)]_j =  x_j − t   if x_j > t
              0         if −t ≤ x_j ≤ t ,     j = 1, . . . , p   (2.13)
              x_j + t   if x_j < −t
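A minimal numpy sketch of these three updates (an illustration rather than the report's code; the factorisation of XᵀX + ρI is computed once and reused across iterations):

import numpy as np
from scipy.linalg import cho_factor, cho_solve

def soft_threshold(v, t):
    # coordinate-wise soft-thresholding operator S_t from (2.13)
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def admm_lasso(X, y, lam, rho=1.0, n_iter=200):
    # ADMM for: min 0.5*||y - X beta||^2 + lam*||beta||_1
    n, p = X.shape
    alpha = np.zeros(p)
    w = np.zeros(p)
    factor = cho_factor(X.T @ X + rho * np.eye(p))   # factorise once, reuse every iteration
    Xty = X.T @ y
    for _ in range(n_iter):
        beta = cho_solve(factor, Xty + rho * (alpha - w))    # (2.10)
        alpha = soft_threshold(beta + w, lam / rho)          # (2.11)
        w = w + beta - alpha                                 # (2.12)
    return alpha

# Usage: beta_hat = admm_lasso(X, y, lam=0.1)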

2.3 Specialised ADMM Algorithm


We will arrive at the specialised ADMM by applying the recursion relation and breaking the original D^(k+1) down into D^(1) D^(k). The actual specialised ADMM formulation appears in equation (2.44).
Suppose that we are given output points y = (y1, . . . , yn) ∈ Rⁿ, observed across evenly spaced input points x = (x1, . . . , xn) ∈ Rⁿ, say x1 = 1, . . . , xn = n for simplicity. The trend filtering estimate β̂ = (β̂1, . . . , β̂n) ∈ Rⁿ of a specified order k ≥ 0 is defined as

β̂ = argmin_{β ∈ Rⁿ}  (1/2)∥y − β∥₂² + λ∥D^(k+1) β∥₁   (2.14)

where D^(k+1) = D^(1) D^(k), with D^(1) here denoting the first order difference matrix of the appropriate (reduced) size.
The standard ADMM approach is based on rewriting the above equation as

min_{β ∈ Rⁿ, α ∈ R^(n−k−1)}  (1/2)∥y − β∥₂² + λ∥α∥₁  subject to  α = D^(k+1) β.   (2.15)
The augmented Lagrangian can be written as

L(β, α, u) = (1/2)∥y − β∥₂² + λ∥α∥₁ + (ρ/2)∥α − D^(k+1) β + u∥₂² − (ρ/2)∥u∥₂²   (2.16)

where u = z/ρ is the scaled dual variable.


The derivation for writing the Lagrangian equation

1 ρ 2
L(β , α, k) = ||y − β ||22 + λ ||α||1 + zT (α − Dk+1 β ) + ||α − Dk+1 β ||2 (2.17)
2 2

let r be α − Dk+1 β (2.18)

ρ
zT r + ||r||22 (2.19)
2

7
 
ρ 2 T
z r + ||r||22 (2.20)
2 ρ
 
ρ 2 T
z r + rrT (2.21)
2 ρ
 
ρ 2 T 1 1
z r + rrT + 2 zT z − 2 zT z (2.22)
2 ρ ρ ρ
 
ρ 2 ⊤ ⊤ ⊤ ⊤
z r+r r+u u−u u (2.23)
2 ρ
 
ρ 2 ⊤ ⊤ ⊤ ρh ⊤ i
z r+r r+u u − u u (2.24)
2 ρ 2

ph ⊤ ⊤ ⊤ ⊤
i p
u r + r r + u r + u u − ∥u∥22 (2.25)
2 2

ρ h ⊤ ⊤


i ρ
u + r r + u (r + u) − ||u||22 . (2.26)
2 2

ρh ⊤ ⊤
i ρ
(u + r) r + u (r + u) − ∥u∥22 (2.27)
2 2

ph ⊤ ⊤
i ρ
r (u + r) + u (r + u) − ∥u∥22 (2.28)
2 2

ph 
⊤ ⊤
i ρ
(r + u) u + r − ∥u∥22 (2.29)
2 2

ρh ⊤
i ρ
(r + u)(u + r) − ∥u∥22 (2.30)
2 2

ρ ρ
∥r + u∥22 − ∥u∥22 (2.31)
2 2
Finally the scaled augmented Lagrangian equation is obtained:

L(β, α, u) = (1/2)∥y − β∥₂² + λ∥α∥₁ + (ρ/2)∥α − D^(k+1) β + u∥₂² − (ρ/2)∥u∥₂²   (2.32)
Now the standard ADMM updates can be derived in the following way. Removing the constant terms that do not depend on β,

β = argmin_β  (1/2)∥y − β∥₂² + (ρ/2)∥α − D^(k+1) β + u∥₂²   (2.33)

Now, differentiating and equating to zero:

−(y − β) + ρ (D^(k+1))ᵀ (D^(k+1) β − α − u) = 0   (2.34)

β + ρ (D^(k+1))ᵀ D^(k+1) β = y + ρ (D^(k+1))ᵀ (α + u)   (2.35)

(I + ρ (D^(k+1))ᵀ D^(k+1)) β = y + ρ (D^(k+1))ᵀ (α + u)   (2.36)

β = (I + ρ (D^(k+1))ᵀ D^(k+1))⁻¹ (y + ρ (D^(k+1))ᵀ (α + u))   (2.37)

Now, finding the update for α by removing the terms that do not depend on α:

α = argmin_α  λ∥α∥₁ + (ρ/2)∥α − (D^(k+1) β − u)∥₂²   (2.40)

Dividing the equation by λ, it can be rewritten as

α = argmin_α  ∥α∥₁ + (1/(2(λ/ρ)))∥α − (D^(k+1) β − u)∥₂²   (2.41)

Now the above equation can be written as

α = S_{λ/ρ}(D^(k+1) β − u)   (2.42)

where S_λ(v) = argmin_α ∥α∥₁ + (1/(2λ))∥α − v∥₂². This is the α-update, where S_{λ/ρ} denotes coordinate-wise soft-thresholding at the level λ/ρ.


Now the dual update is obtained in the same way as in standard ADMM:

u^(k+1) = u^k + α − D^(k+1) β.   (2.43)
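Putting the β-update (2.37), the α-update (2.42) and the dual update (2.43) together gives the standard ADMM loop for trend filtering. The following is a minimal numpy sketch added for illustration (dense matrices and a fixed iteration count; it is not the report's implementation, which uses CVX, nor the optimised solver of the original paper):

import numpy as np

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def diff_matrix(n, order):
    # discrete difference operator of the given order, size (n-order) x n
    return np.diff(np.eye(n), n=order, axis=0)

def standard_admm_trend_filter(y, lam, k=1, rho=1.0, n_iter=500):
    # Standard ADMM for: min 0.5*||y - beta||^2 + lam*||D^(k+1) beta||_1
    n = y.size
    D = diff_matrix(n, k + 1)
    alpha = np.zeros(D.shape[0])
    u = np.zeros(D.shape[0])
    # The system matrix is fixed, so invert (or factorise) it once.
    A_inv = np.linalg.inv(np.eye(n) + rho * D.T @ D)   # a banded solver would be faster
    for _ in range(n_iter):
        beta = A_inv @ (y + rho * D.T @ (alpha + u))         # beta-update (2.37)
        alpha = soft_threshold(D @ beta - u, lam / rho)      # alpha-update (2.42)
        u = u + alpha - D @ beta                             # dual update (2.43)
    return beta

# Usage: y = np.cumsum(np.random.randn(300)); beta_hat = standard_admm_trend_filter(y, lam=10.0)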

The specialised ADMM begins by writing

min_{β ∈ Rⁿ, α ∈ R^(n−k)}  (1/2)∥y − β∥₂² + λ∥D^(1) α∥₁  subject to  α = D^(k) β   (2.44)

where the recursive property D^(k+1) = D^(1) D^(k) comes into the picture.
The augmented Lagrangian now becomes

L(β, α, u) = (1/2)∥y − β∥₂² + λ∥D^(1) α∥₁ + (ρ/2)∥α − D^(k) β + u∥₂² − (ρ/2)∥u∥₂²   (2.45)

The corresponding ADMM updates change accordingly:

β ← (I + ρ (D^(k))ᵀ D^(k))⁻¹ (y + ρ (D^(k))ᵀ (α + u)),

α ← argmin_{α ∈ R^(n−k)}  (1/2)∥D^(k) β − u − α∥₂² + (λ/ρ)∥D^(1) α∥₁,   (2.46)

u ← u + α − D^(k) β.
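The benefit of this form is that the α-subproblem in (2.46) is itself a one-dimensional fused lasso (total variation denoising) problem, which is what allows a fast dynamic programming solver to be used for that step. The sketch below (added for illustration) shows the structure of the iteration only; the α-subproblem is solved generically with cvxpy here rather than with the specialised dynamic programming routine, so it illustrates the decomposition but not the speed advantage.

import numpy as np
import cvxpy

def diff_matrix(n, order):
    return np.diff(np.eye(n), n=order, axis=0)

def fused_lasso_prox(v, t):
    # 1-d fused lasso: argmin_a 0.5*||v - a||^2 + t*||D^(1) a||_1
    # (a dynamic programming solver would replace this step in a fast implementation)
    a = cvxpy.Variable(v.size)
    D1 = diff_matrix(v.size, 1)
    cvxpy.Problem(cvxpy.Minimize(0.5 * cvxpy.sum_squares(v - a)
                                 + t * cvxpy.norm(D1 @ a, 1))).solve()
    return np.asarray(a.value).ravel()

def specialised_admm_trend_filter(y, lam, k=1, rho=1.0, n_iter=100):
    n = y.size
    Dk = diff_matrix(n, k)                      # note: D^(k), not D^(k+1)
    alpha = np.zeros(Dk.shape[0])
    u = np.zeros(Dk.shape[0])
    A_inv = np.linalg.inv(np.eye(n) + rho * Dk.T @ Dk)
    for _ in range(n_iter):
        beta = A_inv @ (y + rho * Dk.T @ (alpha + u))
        alpha = fused_lasso_prox(Dk @ beta - u, lam / rho)   # the fused lasso subproblem
        u = u + alpha - Dk @ beta
    return beta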

Chapter 3

Coding & Implementation

3.1 Implementation in Matlab


Initially the dataset is processed for analysis. One of the columns from the dataset is taken into consideration for trend filtering. Then a discrete difference operator matrix of order 1 is created using the toeplitz command.

Figure 3.1: Reading the dataset & creating D matrix

Then a for loop is initialized to build the higher order discrete difference operator recursively. The dimensions of the matrix are taken in such a way that they are suitable for matrix multiplication.

Figure 3.2: Initializing for loop

This is followed by specifying the objective function along with the constraints in the
cvx optimization tool.

Figure 3.3: Specifying objective & Constraints

For the implementation of the specialized ADMM, the only difference lies in the initialization of the for loop.

Figure 3.4: Specialized ADMM

Here the constraint is obtained from the recursive nature of the discrete difference operator. The constraint here is α = D^(k) β.

Figure 3.5: Specialized ADMM

3.2 Implementation in Python


In this section we will see how the change in the value of λ affects our denoised signal in standard ADMM. As the first step we import all the required libraries and read the dataset. We only consider one column of the dataset, second_bidding.

Figure 3.6: Variation of λ in python

Here the variable D_k holds the (k+1)-th order discrete difference operator matrix.

Now the lambda_list variable consists of different values of λ varying from 0 to 100,000 with a non-uniform step size.

Figure 3.7: Variation of λ in python

We will minimize our objective function for each value of λ in lambda_list and plot the corresponding denoised signal.

Figure 3.8: Variation of λ in python

Chapter 4

Results & Conclusion

4.1 Plots in Matlab


The standard ADMM algorithm took 2.03 s to reach the optimal value, whereas our specialized ADMM algorithm took 1.23 s. The corresponding denoised signals are shown in Figure 4.1.

Figure 4.1: Standard ADMM and Specialized ADMM

4.2 Variation of λ in python
When λ = 0 we can see that our denoised signal is the same as the noisy signal and the optimal value is zero. The smoothness of the denoised signal increases with increasing values of λ, as can be seen in Figure 4.2.

Figure 4.2: Plots for various λ in python

4.3 Conclusion
From the results obtained it can be concluded that the specialised ADMM algorithm converges faster than the standard ADMM algorithm, since it uses a dynamic programming approach for the α update.

Chapter 5

Appendix

5.1 Code for Standard ADMM Algorithm in Matlab


clc; clear all
T = readtable('coe-2021.csv');

y = T.first_bidding;
y = y/norm(y);
n = size(y,1);

% First order discrete difference operator
D_1 = toeplitz([-1,zeros(1,n-2)],[-1,1,zeros(1,n-2)]);

% Build the (k+1)-th order difference operator recursively: D^(k+1) = D^(1)*D^(k)
k = 2;
for i = 1:k
    D_1_k = D_1(1:n-i-1,1:n-i);
    if i == 1
        D_k = D_1_k*D_1;
    else
        D_k = D_1_k*D_k;
    end
end

lambda = 0.1;
tic
cvx_begin
    variables x(n) alpha(n-k-1)
    objective = 0.5*sum_square(y-x) + lambda*sum(abs(alpha));
    minimize(objective)
    subject to
        alpha == D_k*x
cvx_end
TIME = toc

scatter(1:n,y)
hold on
grid on
plot(1:n,x)

5.2 Code for Specialized ADMM Algorithm in Matlab


clc; clear all
T = readtable('coe-2021.csv');

y = T.first_bidding;
y = y/norm(y);
n = size(y,1);

% First order discrete difference operator
D_1 = toeplitz([-1,zeros(1,n-2)],[-1,1,zeros(1,n-2)]);

% Build the k-th order difference operator recursively: D^(k) = D^(1)*D^(k-1)
k = 3;
for i = 1:k-1
    D_1_k = D_1(1:n-i-1,1:n-i);
    if i == 1
        D_k = D_1_k*D_1;
    else
        D_k = D_1_k*D_k;
    end
end

lambda = 0.1;
tic
cvx_begin
    variables x(n) alpha(n-k)
    % Specialised form: penalise D^(1)*alpha subject to alpha == D^(k)*x
    objective = 0.5*sum_square(y-x) + lambda*sum(abs(D_1(1:n-k-1,1:n-k)*alpha));
    minimize(objective)
    subject to
        alpha == D_k*x
cvx_end
TIME = toc

scatter(1:n,y)
hold on
grid on
plot(1:n,x)

5.3 Code for variation of a denoised signal with λ

import pandas as pd
import numpy as np
import cvxpy
import matplotlib.pyplot as plt
from scipy.linalg import toeplitz

# Read the dataset and take the log of the second bidding column
y = pd.read_csv('coe-2021.csv')
y = y['second_bidding'].to_numpy()
y = np.log(y)
n = y.size

# First order discrete difference operator via a Toeplitz matrix
c = np.array([-1] + [0]*(n-2))
r = np.array([-1, 1] + [0]*(n-2))
D_1 = toeplitz(c, r)

# Build the (k+1)-th order difference operator recursively
k = 1
D_k = np.zeros(np.shape(D_1))
for i in range(1, k+1):
    D_1_k = D_1[0:n-i-1, 0:n-i]
    if i == 1:
        D_k = D_1_k @ D_1
    else:
        D_k = D_1_k @ D_k

lambda_list = [0, 0.1, 0.5, 1, 2, 5, 10, 50, 200,
               500, 1000, 2000, 5000, 10000, 100000]

solver = cvxpy.CVXOPT
reg_norm = 1
fig, ax = plt.subplots(len(lambda_list)//3, 3, figsize=(20, 20))
ax = ax.ravel()
ii = 0

# Solve the trend filtering problem for each value of lambda and plot the result
for lambda_value in lambda_list:
    x = cvxpy.Variable(shape=n)
    objective = cvxpy.Minimize(0.5*cvxpy.sum_squares(y - x)
                               + lambda_value*cvxpy.norm(D_k @ x, reg_norm))
    problem = cvxpy.Problem(objective)
    problem.solve(solver=solver, verbose=False)
    ax[ii].plot(np.arange(1, n+1), y, c='b')
    ax[ii].plot(np.arange(1, n+1), np.array(x.value),
                'b-', linewidth=1.0, c='r')
    ax[ii].set_xlabel('Time')
    ax[ii].set_ylabel('Log Premium')
    ax[ii].set_title('Lambda: {}\nStatus: {}\nObjective Value: {}'
                     .format(lambda_value, problem.status, round(objective.value, 3)))
    ii += 1

plt.tight_layout()
plt.show()

5.4 References
1. Fast and Flexible ADMM Algorithms for Trend Filtering
2. Introduction to Trend Filtering with Applications in Python
3. Scaled form ADMM
4. l1 Trend Filtering
5. A Case for using Trend Filtering over Splines
