Assignment 2
2. Suppose that we have n independent variables (X1, X2, ..., Xn) and the dependent variable
is Y. Now imagine that you are applying linear regression by fitting the best-fit line using
the least-squares error on this data. You find that the correlation coefficient of one of the
variables (say, X1) with Y is -0.005.
Sol. (a)
The absolute value of the correlation coefficient indicates the strength of the (linear) relationship.
Since the absolute correlation here is very small, regressing Y on X1 alone explains almost none of
the variation in Y.
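As a quick illustration (synthetic data, not from the assignment): for simple linear regression of Y on a single variable, R^2 equals the squared correlation coefficient, so r = -0.005 corresponds to R^2 = 0.000025, i.e., X1 alone explains essentially none of the variance in Y.

import numpy as np

rng = np.random.default_rng(0)
x1 = rng.normal(size=1000)
y = rng.normal(size=1000)          # essentially uncorrelated with x1

r = np.corrcoef(x1, y)[0, 1]       # sample correlation coefficient
a, b = np.polyfit(x1, y, deg=1)    # least-squares fit y ~ a*x1 + b
y_hat = a * x1 + b
r2 = 1 - np.sum((y - y_hat) ** 2) / np.sum((y - np.mean(y)) ** 2)

print(r, r**2, r2)                 # r^2 and R^2 coincide (up to floating-point error)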
3. (2 marks) Consider the following five training examples
x y
2 7
3 8.6
4 3
5 11.5
6 22
We want to learn a function f (x) of the form f (x) = ax + b which is parameterised by (a, b).
Using mean squared error as the loss function, which of the following parameters would you
use to model this function to get a solution with the minimum loss?
(a) (4, 3)
(b) (1, 4)
(c) (4, 1)
(d) (3, 4)
Sol. (b)
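A quick check of the four candidates (data taken from the table above):

import numpy as np

x = np.array([2, 3, 4, 5, 6])
y = np.array([7, 8.6, 3, 11.5, 22])

# Mean squared error of f(x) = a*x + b for each candidate (a, b).
for a, b in [(4, 3), (1, 4), (4, 1), (3, 4)]:
    mse = np.mean((y - (a * x + b)) ** 2)
    print((a, b), mse)

# (1, 4) gives the smallest MSE (about 35.76), so option (b) is correct.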
4. The relation between studying time (in hours) and grade on the final examination (0-100) in
a random sample of students in the Introduction to Machine Learning Class was found to be:
Grade = 30.5 + 15.2 h, where h is the number of hours studied.
How will a student’s grade be affected if she studies for four hours?
Sol. (c)
The slope of the regression line gives the average increase in grade for each additional hour of
studying. So, if studying time increases by four hours, the grade will increase by 4 × 15.2 = 60.8 points.
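As a quick check of the arithmetic (the predicted grade for h = 4 is an extra illustration, not part of the stated solution):

increase = 4 * 15.2           # 60.8 point increase for four additional hours of study
grade_at_4 = 30.5 + 15.2 * 4  # 91.3, the predicted grade after four hours of study
print(increase, grade_at_4)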
5. Which of the statements is/are True?
(a) Ridge has sparsity constraint, and it will drive coefficients with low values to 0.
(b) Lasso has a closed form solution for the optimization problem, but this is not the case
for Ridge.
(c) Ridge regression does not reduce the number of variables since it never leads a coefficient
to zero but only minimizes it.
(d) If there are two or more highly collinear variables, Lasso will select one of them randomly.
Sol. (c),(d)
Refer to the lecture. Ridge uses an L2 penalty: it has a closed-form solution and shrinks coefficients
towards zero but never sets them exactly to zero, so it does not reduce the number of variables, which
makes (c) true and (a), (b) false. Lasso uses an L1 penalty: it has no closed-form solution in general,
can drive coefficients exactly to zero, and with highly collinear variables it tends to pick one of
them essentially arbitrarily, which makes (d) true.
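The following is a small illustrative sketch (not from the lecture) using scikit-learn's Ridge and Lasso on synthetic data with two nearly collinear columns; the data and penalty strengths are made up for illustration. Ridge keeps all coefficients nonzero, while Lasso typically keeps one of the two collinear columns and drives the other (and the weak third variable) to exactly zero; which collinear column survives can depend on the data and solver.

import numpy as np
from sklearn.linear_model import Ridge, Lasso

rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + 0.01 * rng.normal(size=n)   # highly collinear with x1
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])
y = 3 * x1 + 3 * x2 + 0.1 * x3 + rng.normal(scale=0.5, size=n)

ridge = Ridge(alpha=1.0).fit(X, y)
lasso = Lasso(alpha=0.5).fit(X, y)

print("Ridge coefficients:", ridge.coef_)  # all shrunk, none exactly zero
print("Lasso coefficients:", lasso.coef_)  # some coefficients driven exactly to zero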
6. Consider the following statements:
Assertion(A): Orthogonalization is applied to the dimensions in linear regression.
Reason(R): Orthogonalization makes univariate regression possible in each orthogonal dimension
separately to produce the coefficients.
Sol. (a)
Refer to the lecture. Successive orthogonalization (Gram-Schmidt) converts multiple linear regression
into a sequence of simple univariate regressions on the orthogonalized inputs, so the assertion is
true and the reason correctly explains it.
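As a rough numerical illustration (synthetic data, not from the lecture), the sketch below orthogonalizes the columns of a made-up design matrix by Gram-Schmidt and then runs a separate univariate regression of y on each orthogonal column; the fitted values match ordinary least squares, and the coefficient on the last orthogonalized column matches the corresponding multiple-regression coefficient.

import numpy as np

rng = np.random.default_rng(0)
N, p = 50, 3
X = np.column_stack([np.ones(N), rng.normal(size=(N, p))])  # augmented with 1's
y = X @ np.array([2.0, 1.0, -0.5, 3.0]) + rng.normal(scale=0.1, size=N)

# Gram-Schmidt: orthogonalize the columns of X successively.
Z = np.zeros_like(X)
for j in range(X.shape[1]):
    z = X[:, j].copy()
    for l in range(j):
        z -= (Z[:, l] @ X[:, j]) / (Z[:, l] @ Z[:, l]) * Z[:, l]
    Z[:, j] = z

# Univariate regression of y on each orthogonal column.
gamma = np.array([(Z[:, j] @ y) / (Z[:, j] @ Z[:, j]) for j in range(Z.shape[1])])

beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]

# Fitted values agree, and the last univariate coefficient equals
# the last multiple-regression coefficient.
print(np.allclose(Z @ gamma, X @ beta_ols))   # True
print(np.isclose(gamma[-1], beta_ols[-1]))    # True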
7. Consider the following statements:
Statement A: In Forward stepwise selection, in each step, that variable is chosen which has the
maximum correlation with the residual, then the residual is regressed on that variable, and it
is added to the predictor.
Statement B: In Forward stagewise selection, the variables are added one by one to the previously
selected variables to produce the best fit up to that point.
(a) Both the statements are True.
(b) Statement A is True, and Statement B is False.
(c) Statement A is False, and Statement B is True.
(d) Both the statements are False.
Sol. (d)
Refer to the lecture. Choosing the variable most correlated with the current residual and regressing
the residual on it describes forward stagewise selection, while adding variables one at a time to the
previously selected set and refitting describes forward stepwise selection; the two descriptions are
swapped, so both statements are false.
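For concreteness, here is a rough sketch (synthetic data, illustrative only) of the residual-correlation procedure described in Statement A, i.e., what the solution identifies as forward stagewise selection. With enough steps the accumulated coefficients approach the least-squares fit on this data.

import numpy as np

def forward_stagewise(X, y, n_steps=50):
    # Repeatedly pick the column most correlated with the current residual,
    # regress the residual on that column, and add the univariate coefficient
    # to that column's accumulated coefficient.
    n, p = X.shape
    beta = np.zeros(p)
    residual = y - y.mean()                     # work with a centred response
    for _ in range(n_steps):
        corr = np.array([np.corrcoef(X[:, j], residual)[0, 1] for j in range(p)])
        j = np.argmax(np.abs(corr))                         # most correlated column
        coef = (X[:, j] @ residual) / (X[:, j] @ X[:, j])   # univariate regression
        beta[j] += coef
        residual = residual - coef * X[:, j]
    return beta

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
y = X @ np.array([2.0, 0.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=100)
print(forward_stagewise(X - X.mean(axis=0), y))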
8. (2 Marks) The linear regression model y = a0 + a1 x1 + a2 x2 + ... + ap xp is to be fitted to a
set of N training data points having p attributes each. Let X be the N × (p + 1) matrix of input
values (augmented with 1's), Y the N × 1 vector of target values, and θ the (p + 1) × 1 vector of
parameter values (a0, a1, a2, ..., ap). If the sum of squared errors is minimized to obtain the
optimal regression model, which of the following equations holds?
(a) X^T X = XY
(b) Xθ = X^T Y
(c) X^T Xθ = Y
(d) X^T Xθ = X^T Y
Sol. (d)
This comes from minimizing the residual sum of squares (in matrix form):
RSS(θ) = (Y − Xθ)^T (Y − Xθ)
Setting the derivative with respect to θ equal to 0 gives
X^T (Y − Xθ) = 0
So,
X^T Xθ = X^T Y, and hence θ = (X^T X)^{-1} X^T Y.
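As a quick numerical sanity check (synthetic data, illustrative only), the normal equations can be solved directly with NumPy and compared against a generic least-squares solver:

import numpy as np

rng = np.random.default_rng(0)
N, p = 100, 3
X = np.column_stack([np.ones(N), rng.normal(size=(N, p))])  # N x (p+1), augmented with 1's
Y = X @ np.array([1.0, 2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=N)

# Solve the normal equations X^T X theta = X^T Y.
theta = np.linalg.solve(X.T @ X, X.T @ Y)

# Same answer from a generic least-squares solver.
theta_lstsq = np.linalg.lstsq(X, Y, rcond=None)[0]
print(np.allclose(theta, theta_lstsq))  # True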