Homework Exercise 3: Statistical Learning, Fall 2020
Due date: 22 December in class
1. ESL 4.2: Similarity of LDA and linear regression for two classes
In this problem you will show that for two classes, linear regression leads to the same discriminating
direction as LDA, but not to the exact same classification rule in general.
The derivations for this problem are rather lengthy. Consider part (b) (finding the linear regression
direction) to be extra credit. If you cannot prove a given step, comment on its geometric interpretation
instead and move on to the next step.
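Before diving into the algebra, it can help to see the claim numerically. Below is a minimal check (not a proof, and not part of the required solution), assuming two Gaussian classes with a shared covariance: the least-squares coefficient vector for 0-1 coded targets is proportional to the LDA direction $\Sigma^{-1}(\mu_2 - \mu_1)$. All names and parameter values are illustrative.

import numpy as np

rng = np.random.default_rng(0)
n, p = 500, 3
mu1, mu2 = np.zeros(p), np.array([1.0, 2.0, -1.0])
A = rng.normal(size=(p, p))
Sigma = A @ A.T + np.eye(p)                      # shared within-class covariance
L = np.linalg.cholesky(Sigma)

X1 = rng.normal(size=(n, p)) @ L.T + mu1         # class 1 sample
X2 = rng.normal(size=(n, p)) @ L.T + mu2         # class 2 sample
X = np.vstack([X1, X2])
y = np.concatenate([np.zeros(n), np.ones(n)])    # 0-1 coding of the two classes

# Least-squares fit with an intercept; keep only the slope coefficients.
Xd = np.hstack([np.ones((2 * n, 1)), X])
beta_ls = np.linalg.lstsq(Xd, y, rcond=None)[0][1:]

# LDA direction from the pooled covariance estimate.
Sigma_hat = 0.5 * (np.cov(X1.T) + np.cov(X2.T))
beta_lda = np.linalg.solve(Sigma_hat, X2.mean(axis=0) - X1.mean(axis=0))

# The two directions agree up to scale: the normalized vectors coincide.
print(beta_ls / np.linalg.norm(beta_ls))
print(beta_lda / np.linalg.norm(beta_lda))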
2. Short intuition problems
Choose one answer and explain briefly. If you need additional assumptions to reach your conclusion, state them.
(a) Which of the following is not an advantage of using logistic loss over squared error loss with 0-1
coding for 2-class classification?
i. That the expected prediction error is minimized by correctly predicting $P(Y \mid X)$.
ii. That it has a natural probabilistic generalization to K > 2 classes.
iii. That its predictions are always legal probabilities in the range (0, 1).
(b) In the generative 2-class classification models LDA and QDA, what type of distribution does
$P(Y \mid X = x)$ have?
i. Unknown
ii. Gaussian
iii. Bernoulli
(c) We mentioned in class that Naive Bayes assumes $P(x \mid Y = g) = \prod_{j=1}^{p} P_j(x_j \mid Y = g)$
(a small code sketch of this factorization follows the choices below). In what situation would you
expect this simplifying assumption to be most useful?
i. Small number of predictors, not highly correlated.
ii. Small number of predictors, highly correlated with each other.
iii. Large number of predictors, not highly correlated.
iv. Large number of predictors, many of them highly correlated with each other.
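For concreteness, here is a minimal sketch of the factorization in (c), assuming Gaussian features; the function name and parameter values are illustrative and not part of the exercise.

import numpy as np
from scipy.stats import norm

def naive_bayes_density(x, means_g, sds_g):
    # P(x | Y = g) under the naive assumption: a product of one-dimensional
    # densities, one per coordinate (here each P_j is taken to be Gaussian).
    return np.prod(norm.pdf(x, loc=means_g, scale=sds_g))

# Hypothetical per-class marginal parameters; in practice each would be
# estimated from the training observations of class g, one feature at a time.
x = np.array([0.2, -1.0, 3.1])
print(naive_bayes_density(x, means_g=np.zeros(3), sds_g=np.ones(3)))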
3. Multiclass logistic regression: equivalence of parameterizations
The multiclass logistic regression model specifies the log-odds of each class relative to a reference class,
here class $K$:
$$\log \frac{P(G = k \mid X)}{P(G = K \mid X)} = X^T \beta_k, \qquad k < K,$$
with resulting probabilities:
$$P(G = k \mid X) = \frac{\exp\{X^T \beta_k\}}{1 + \sum_{l < K} \exp\{X^T \beta_l\}}, \qquad k < K,$$
$$P(G = K \mid X) = \frac{1}{1 + \sum_{l < K} \exp\{X^T \beta_l\}}.$$
Show that if we choose a different class for the denominator (i.e., a different reference class), we can
obtain the same set of probabilities with a different set of coefficient vectors $\beta_k$. Hence the two
representations are equivalent in the probabilities they yield.
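A quick numerical sanity check of this claim (not a substitute for the derivation): writing $\beta_K = 0$ for the reference class, replacing every $\beta_k$ by $\beta_k - \beta_m$ makes class $m$ the new reference while leaving all probabilities unchanged. The code below assumes the softmax form given above; names are illustrative.

import numpy as np

rng = np.random.default_rng(1)
K, p = 4, 3
x = rng.normal(size=p)
# Rows are beta_1, ..., beta_{K-1}; the reference class K has beta_K = 0.
beta = np.vstack([rng.normal(size=(K - 1, p)), np.zeros((1, p))])

def probs(beta, x):
    # Softmax probabilities P(G = k | X = x); with a zero row this matches
    # the 1 + sum_{l<K} exp{X^T beta_l} denominator above.
    z = np.exp(beta @ x)
    return z / z.sum()

m = 1                          # choose a different reference class
beta_new = beta - beta[m]      # shifted coefficients; row m is now zero
print(np.allclose(probs(beta, x), probs(beta_new, x)))   # True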
4. Separability and optimal separators
ESL 4.5: Show that the maximum-likelihood solution of logistic regression is undefined (the likelihood has no finite maximizer) if the data are linearly separable.
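To build intuition before the proof, the following small illustration (an assumed setup, not part of the required solution) runs gradient ascent on the logistic log-likelihood for perfectly separable one-dimensional data: the coefficient norm grows without bound, since scaling up any separating $\beta$ always increases the likelihood.

import numpy as np

X = np.array([[-2.0], [-1.0], [1.0], [2.0]])   # perfectly separable at x = 0
y = np.array([0.0, 0.0, 1.0, 1.0])
beta = np.zeros(1)
for t in range(20001):
    pr = 1.0 / (1.0 + np.exp(-X @ beta))       # fitted probabilities
    beta += 0.1 * X.T @ (y - pr)               # log-likelihood gradient step
    if t % 5000 == 0:
        print(t, np.linalg.norm(beta))         # the norm keeps growing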
5. (* A real challenge¹)
In the separable case, consider adding a small amount of ridge-type regularization to the likelihood:
$$\hat{\beta}(\lambda) = \arg\min_{\beta} \left\{ -\ell(\beta; X, y) + \lambda \sum_{j} \beta_j^2 \right\}$$
¹ +50 points extra credit for an original solution; +20 points for finding a solution in the literature and explaining it clearly; +5
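As a starting point, one can verify numerically that the penalized criterion has a finite minimizer for every $\lambda > 0$, even on separable data. The sketch below uses scikit-learn's LogisticRegression with an L2 penalty, whose parameter C corresponds to $1/\lambda$ up to scaling; this is an illustration under those assumptions, not part of the required derivation.

import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.array([[-2.0], [-1.0], [1.0], [2.0]])   # separable data, as in problem 4
y = np.array([0, 0, 1, 1])

for lam in [1.0, 0.1, 0.01, 0.001]:
    clf = LogisticRegression(penalty="l2", C=1.0 / lam,
                             fit_intercept=False, max_iter=10_000)
    clf.fit(X, y)
    # A finite minimizer exists for every lambda > 0; its norm grows
    # as the penalty weight shrinks.
    print(lam, np.linalg.norm(clf.coef_))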