Homework 9 Due: March 13, 2020, 11:59PM PT

Download as pdf or txt
Download as pdf or txt
You are on page 1of 2

EE 510: Mathematical Foundations of Machine Learning Winter 2020

Homework 9
Due: March 13, 2020, 11:59PM PT
Student Name: Instructor Name: John Lipor

Problem 1 (5 pts each)

In Homework 8 we focused on solving sparse regression problems by incorporating an `1 -norm regularizer. If


we want to encourage a different type of behavior, we can consider different regularizers. Moreover, there are
structures in signals besides sparsity that still allow recovery in the case of highly under-determined systems.
One popular regularizer is called the total variation (TV), which has applications in image denoising and
inpainting. For a signal w ∈ RD , the TV is defined as
D−1
X
TV(w) = |wi+1 − wi | ,
i=1

where wi denotes the ith element of w. In this problem, you will implement TV-regularized regression, which
is useful for estimating signals that are known to be piecewise constant, as shown in Fig. 1 below.

4.5

3.5

2.5

1.5

0.5

0
0 100 200 300 400 500

Figure 1: Example piecewise constant function for total variation-penalized regression.

(a) Define a matrix C ∈ RD−1×D such that


D−1
X
TV(w) = |wi+1 − wi | = kCwk1 .
i=1

1
Homework 9 2

(b) The TV-regularized regression problem can be written as


1 2
min kXw − yk2 + λ kCwk1 (1)
w∈RD 2
or in summation form as
N D−1
1X T 2 X
min xi w − yi + λ |wi+1 − wi | .
w∈RD 2 i=1 i=1

Determine the ADMM updates to solve Eq. (1). You may need to apply the splitting technique and
introduce a new optimization variable.

(c) Implement your devised ADMM algorithm by completing tvADMM in the files for download and test your
algorithm on the script prob1. Turn in:
• Your tvADMM code
• A plot of the true and estimated functions
• A plot of the relative error between iterations as a function of the iteration number, where the error
is defined as (k)
w − w(k−1)
errk =
w(k) ,

where w(k) ∈ RD denotes the estimate of w at the kth iteration.

Problem 2 (5 pts)

In Demo 2, we used the matrix inversion lemma to make ridge regression amenable to the kernel trick. Go
through the description of how Eq. (1) in Demo 2 is derived and write out the steps explicitly here.

Problem 3 (5 pts, 5 pts)

On previous assignments, you developed classifiers based on ridge regression/least squares as well as through
minimization of both the hinge and logistic loss. One drawback to all of these approaches is that they result
in linear decision boundaries. However, many datasets are not linearly separable, and hence our goal in
this problem is to develop a classifier that allows for nonlinear decision boundaries. To do this, you will use
kernel ridge regression (KRR, see demo code) to form a kernelized least squares classifier, which you will
test on the MNIST dataset.

(a) Use either my KRR function or your own to perform classification on MNIST digits 1 and 2 after reducing
the dimensionality to 2 using PCA by completing the prob3 script, taking λ = 0.01 in KRR. Turn in
plots showing the separator and the traning data for σ ∈ {0.1, 1, 5}. Provide the training and test error
for each combination and comment on each plot.
(b) Describe in words how the provided code plots the decision boundary formed by KRR.

Problem 4 (10 pts)

In this problem, your task is to complete a concept map of the material covered in the course this year. The
goal of a concept map is to organize the topics covered in a way that is meaningful to you, helping you gain
a high-level perspective on the tools you have learned throughout the quarter. Follow the instructions given
at this website and create a concept map with roughly 25 items. Note: You may draw this by hand and
still get credit for typing this assignment.

You might also like