End-Term Solutions
End-Term Examination
Machine Learning: MSD527
(Academic Year 2021-22)
Solution 3c.): For a new user who comes in, we have little information about them, and thus
the matrix factorization method cannot learn many associations between the new user and the
existing users. ……(2 Marks).
We should use the user's demographic information to bridge their associations with existing
users. Many ideas can be applied here. Perhaps the most intuitive is to do regression on the
demographic features, compute a similarity between the new user and existing users, and
then approximate 𝑣𝑢 with a linear combination of the existing users' vectors. ……(2 Marks).
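As an illustration, here is a minimal sketch of the linear-combination idea in Python, assuming we already have the learned user matrix and a demographic feature matrix (all names and data below are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)
V = rng.normal(size=(100, 8))   # learned latent vectors of 100 existing users (hypothetical)
D = rng.normal(size=(100, 5))   # demographic features of the existing users (hypothetical)
d_new = rng.normal(size=5)      # demographic features of the new user

# Cosine similarity between the new user and every existing user in demographic space.
sims = D @ d_new / (np.linalg.norm(D, axis=1) * np.linalg.norm(d_new))

# Approximate v_u as a similarity-weighted combination of the k most similar users.
k = 10
top = np.argsort(sims)[-k:]
weights = np.clip(sims[top], 0, None)
v_new = (weights / weights.sum()) @ V[top]
```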
Solution 3d.): Use the metadata of the movie as additional information to encode similarity.
(1 Mark)
Perhaps approximate the corresponding 𝑤𝑚 as a linear combination of existing movies' vectors,
based on their similarity in terms of metadata. (1 Mark)
This can be encoded in the objective function. (1 Mark)
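For instance (a sketch only; 𝜇 is a hypothetical regularization weight and 𝑠𝑚𝑚′ a hypothetical metadata-based similarity between movies 𝑚 and 𝑚′), the factorization objective could gain an extra term that pulls each movie vector towards the similarity-weighted combination of its metadata neighbours:

min𝑉,𝑊 Σ(𝑢,𝑚) (𝑟𝑢𝑚 − 𝑣𝑢ᵀ𝑤𝑚)² + 𝜆(‖𝑉‖² + ‖𝑊‖²) + 𝜇 Σ𝑚 ‖𝑤𝑚 − Σ𝑚′ 𝑠𝑚𝑚′ 𝑤𝑚′‖²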
Solution 3e.): If we want to map sample points to a very high-dimensional feature space, the
kernel trick can save us from having to compute those features explicitly, thereby saving a lot
of time. (2 Marks)
The kernel trick enables the use of infinite-dimensional feature spaces. (1 Mark)
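A small numerical check of this point (a sketch with an arbitrary 2-D example and the degree-2 polynomial kernel): the kernel value (𝑥ᵀ𝑧)² equals the inner product of the explicit feature maps, so the feature space never has to be materialized.

```python
import numpy as np

def phi(x):
    # Explicit degree-2 feature map for a 2-D input.
    return np.array([x[0]**2, np.sqrt(2) * x[0] * x[1], x[1]**2])

x = np.array([1.0, 2.0])
z = np.array([3.0, 4.0])

explicit = phi(x) @ phi(z)  # inner product in the explicit feature space
kernel = (x @ z) ** 2       # same value via the kernel, no features computed
print(explicit, kernel)     # both print 121.0
```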
Solution 3f.): With such a large number of centroids, most centroids will likely end up
identifying individual data points. (1.5 Marks)
In a real sense there is no learning, as the whole dataset is memorized with no generalization.
(1.5 Marks)
Processing of new data will therefore likely be unreliable. (1 Mark)
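A minimal NumPy sketch of the degenerate case (hypothetical data): with as many centroids as points, the optimal solution places one centroid on every point, so the within-cluster error is exactly zero and nothing is learned beyond the data itself.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))  # 50 hypothetical data points

# k = n: the best k-means solution puts one centroid on each data point.
centroids = X.copy()

# Within-cluster sum of squares is then exactly zero -- pure memorization.
dists = np.linalg.norm(X[:, None] - centroids[None], axis=2)
inertia = (dists.min(axis=1) ** 2).sum()
print(inertia)  # 0.0
```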
Solution 6a.): The data points can be separated linearly, e.g. by the line 𝑥1 + 𝑥2 = 2.5 (2
marks)
Solution 6b.): For A, we get 𝑧 = 𝑤0𝑥0 + 𝑤1𝑥1 + 𝑤2𝑥2 = 0(−1) − 1(1) + 1(2) = 1. Since 𝑧 > 0,
we get the prediction 𝑦 = 1. Since 𝑦 = 𝑡, there is no change to 𝒘. (2 marks)
For B, we get 𝑧 = 𝑤0𝑥0 + 𝑤1𝑥1 + 𝑤2𝑥2 = 0(−1) − 1(2) + 1(1) = −1. Since 𝑧 < 0, we get the
prediction 𝑦 = 0. (2 marks)
The update 𝒘 = 𝒘 − 𝜂(𝑦 − 𝑡)𝒙 = (0, −1,1) − 0.1(0 − 1)(−1,2,1) = (−0.1, −0.8,1.1) (2 marks)
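A quick check of this arithmetic (a sketch of the perceptron step with the values from the question; the variable names are ours):

```python
import numpy as np

w, eta = np.array([0.0, -1.0, 1.0]), 0.1

def perceptron_step(w, x, t):
    # Threshold activation: predict y = 1 if w.x > 0, else 0, then apply the update rule.
    y = 1 if w @ x > 0 else 0
    return w - eta * (y - t) * x

A, B = np.array([-1.0, 1.0, 2.0]), np.array([-1.0, 2.0, 1.0])
w = perceptron_step(w, A, t=1)  # z = 1 > 0, so y = t = 1: no change
w = perceptron_step(w, B, t=1)  # z = -1 < 0, so y = 0 != t: update
print(w)                        # [-0.1 -0.8  1.1]
```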
Solution 6c.): Point A: since 𝑦 = 𝑧 = 1 = 𝑡, there is no update.
Data point B: since 𝑦 = 𝑧 = −1, the update is 𝒘 = 𝒘 − 𝜂(𝑦 − 𝑡)𝒙 = (0, −1,1) − 0.1(−1 − 1)(−1,2,1)
= (−0.2, −0.6,1.2) (2 marks)
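The same check for this linear-activation (delta rule) variant, where 𝑦 = 𝑧 (again a sketch with our own variable names):

```python
import numpy as np

w, eta = np.array([0.0, -1.0, 1.0]), 0.1

def delta_step(w, x, t):
    # Linear activation: y = z = w.x, with the same update rule.
    y = w @ x
    return w - eta * (y - t) * x

A, B = np.array([-1.0, 1.0, 2.0]), np.array([-1.0, 2.0, 1.0])
w = delta_step(w, A, t=1)  # y = 1 = t: no change
w = delta_step(w, B, t=1)  # y = -1, t = 1: update
print(w)                   # [-0.2 -0.6  1.2]
```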
Solution 7a.): Read the values off the graph and give the answer; for example, we assume that
P(1 | x=0) = 0.2 and P(0 | x=0) = 0.8. (3 Marks)
Solution 7b.): You can never be 100% sure with a logistic regression model, since the sigmoid
output is strictly between 0 and 1 and never reaches either value exactly. (3 Marks)
Solution 7c.): This is normally done by choosing class 1 if P(1 | x) > 0.5. Based on the
assumption made in part a.), the logistic classifier classifies 4 spams correctly and 3 spams
incorrectly, and 5 no-spams correctly and 2 incorrectly. Altogether 9 out of 14 points are
classified correctly, yielding an accuracy of 9/14. (3 Marks)
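As an illustration (a sketch: the probabilities below are hypothetical values chosen only to match the counts above, since the actual values come from the graph):

```python
import numpy as np

# Hypothetical P(1|x) values consistent with the counts above, not the graph values.
p_spam = np.array([0.75, 0.80, 0.85, 0.90, 0.30, 0.40, 0.45])     # true class 1
p_no_spam = np.array([0.10, 0.20, 0.30, 0.40, 0.45, 0.55, 0.65])  # true class 0

probs = np.concatenate([p_spam, p_no_spam])
labels = np.array([1] * 7 + [0] * 7)

preds = (probs > 0.5).astype(int)
print((preds == labels).mean())  # 9/14 ≈ 0.643
```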
Solution 7d.): We could either say that the goal is to get a good precision for spam, or a good
recall for no-spam. This can be achieved by raising the threshold above 0.5. For the training
data, a threshold of 0.7 would suffice. If we want to be prepared for more variation in the test
data, we could set the threshold even higher. (3 Marks)
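Continuing the sketch from part c.) with the same hypothetical probabilities, raising the threshold to 0.7 removes both false positives and gives perfect spam precision on the training data:

```python
import numpy as np

p_spam = np.array([0.75, 0.80, 0.85, 0.90, 0.30, 0.40, 0.45])     # true class 1 (hypothetical)
p_no_spam = np.array([0.10, 0.20, 0.30, 0.40, 0.45, 0.55, 0.65])  # true class 0 (hypothetical)

for threshold in (0.5, 0.7):
    tp = (p_spam > threshold).sum()     # spams correctly flagged
    fp = (p_no_spam > threshold).sum()  # no-spams wrongly flagged
    print(threshold, tp / (tp + fp))    # precision: 4/6 at 0.5, 4/4 = 1.0 at 0.7
```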