Topic 4
Q1. In the context of k-NN, for k=1, consider the following labeled data points,
where x1 and x2 are the input variables. If the Euclidean distance is used as a
similarity measure, a new data point (-1,2) will be classified as ________ with the
minimum distance equal to
(a) “–1“, 1.41
(b) “–1“, 1
(c) “+1“, 1.41
(d) “+1“, 2
(e) “+1”, 1
Solution:
The Euclidean distance from x=(-1,2) to every other point y can be calculated
using $d(x, y) = \sqrt{\sum_{i=1}^{n} (y_i - x_i)^2}$
The following table shows the resulting distances, with point (0,2) having the
shortest distance of 1 and a class of “-1”. Ans = (b).
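As a rough illustration of the 1-NN rule used in this solution, here is a minimal Python sketch. Only the point (0,2) with class -1 is taken from the solution; the other two training points are hypothetical placeholders because the data table is not reproduced, and the helper names `euclidean` and `nearest_neighbor` are assumptions made for this example.

```python
import math

def euclidean(x, y):
    """Euclidean distance between two points given as equal-length tuples."""
    return math.sqrt(sum((yi - xi) ** 2 for xi, yi in zip(x, y)))

def nearest_neighbor(query, labeled_points):
    """Return (label, distance) of the training point closest to query (k = 1)."""
    point, label = min(labeled_points, key=lambda pl: euclidean(query, pl[0]))
    return label, euclidean(query, point)

# (0, 2) with class -1 comes from the solution; the other two points are
# hypothetical placeholders, since the original data table is not shown.
training = [((0, 2), -1), ((1, 4), +1), ((-3, 0), +1)]

label, dist = nearest_neighbor((-1, 2), training)
print(label, dist)
```

With k=1 the prediction is simply the label of the single closest point, so one nearby point of the other class is enough to change the result.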
Q2. For the data points shown below, select the classifier from the list below that
can be used to classify the data:
(a) Perceptron
(b) SVM
(c) K-Means, K=2
(d) Q-Learning
(e) All of the given answers are correct.
Solution:
Since the data is not linearly separable, the perceptron cannot be used. SVM
with the kernel trick is the most suitable for this set of data. k-Means will
not work either, as the data is not linearly separable. Ans = (b).
Q3. Given the labeled data set shown in the table, and using the Perceptron
learning algorithm with learning rate = 1 and an initial w = (1, 1, 1), the final
vector w that classifies all points correctly is given by:
(a) w = (−3, 1, 0)
(b) w = (−1, −1, 1)
(c) w = (−3, 0, 1)
(d) w = (−5, 1, 1)
(e) None of the given answers is correct
Solution:
If the points are plotted, it is visually clear that they are not linearly separable.
Hence, the perceptron learning algorithm will not converge. Ans = (e).
Solution:
w = (−4, 2, 0, −1), x = (1, 3, 2, 2) ⇒ w · x = (−4)(1) + (2)(3) + (0)(2) + (−1)(2) = −4 + 6 + 0 − 2 = 0.
Since w · x >= 0 ⇒ hw(x) = 1. Ans = (a).
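The threshold rule applied in this solution can be sketched in a few lines of Python. The function names and the output value used for the negative case (0 here) are assumptions for illustration; the solution only states that the output is 1 when the dot product is at least 0.

```python
def dot(w, x):
    """Dot product of two equal-length vectors."""
    return sum(wi * xi for wi, xi in zip(w, x))

def h(w, x, negative_label=0):
    """Perceptron output: 1 if w . x >= 0, otherwise negative_label.

    negative_label is an assumption; the source only specifies the >= 0 case.
    """
    return 1 if dot(w, x) >= 0 else negative_label

w = (-4, 2, 0, -1)
x = (1, 3, 2, 2)
print(dot(w, x), h(w, x))
```

Because the comparison is "greater than or equal to", a dot product of exactly 0 falls on the positive side.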
Q5. Consider the confusion matrix of the testing of 1000 data points. The obtained values of TP=260,
FN=300, and FP= 130. The accuracy of the model is:
(a) 0.13
(b) 0.26
(c) 0.3
(d) 0.56
(e) 0.57
Solution:
There are 1000 data points, therefore TN = 1000 − 260 − 300 − 130 = 310, so
accuracy = (TP + TN)/1000 = (260 + 310)/1000 = 570/1000 = 0.57. Ans = (e).
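The same arithmetic can be checked with a short Python sketch. The counts are the ones given in the question; the function name `accuracy` is just a label chosen for this example.

```python
def accuracy(tp, tn, fp, fn):
    """Fraction of all samples that were classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)

total, tp, fn, fp = 1000, 260, 300, 130
tn = total - tp - fn - fp  # the remaining samples are true negatives

acc = accuracy(tp, tn, fp, fn)
print(tn, acc)
```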
Q6. Consider the two separate classes of data shown in the figure below, where the first class of data points
is shown as circles and the other class is shown as triangles.
If point c3 and point t3 are support vectors, what can be said about points c4 and t2:
a) Both points also identify support vectors.
b) Both points do not identify support vectors.
c) Point c4 identifies a support vector while point t2 does not.
d) Point c4 does not identify a support vector while point t2 does.
e) None of the given answers is correct.
Solution:
Since points c3 and t3 identify support vectors for the two classes, the Maximal Margin Classifier Line (MMCL) will
bisect the line connecting any two support vectors from different classes. Thus point (1,3)
lies on the MMCL. Let us assume that both c4 and t2 identify support vectors. Since the MMCL bisects
the lines connecting any two support vectors, the distance from c4 to point (1,3),
$\sqrt{(1 - (-1))^2 + (3 - 2)^2} = \sqrt{5}$,
should be equal to the distance between t2 and point (1,3),
$\sqrt{(3 - 1)^2 + (4 - 3)^2} = \sqrt{5}$.
Since point (1,3) is on H0 (i.e.,
Q7. Assume we have two models (X and Y) that were trained using the same dataset. Model X was trained
using k-NN algorithm, while model Y was trained using SVM. When tested using 100 data points, the
“0-1 loss function” for the two models is shown in the table below
Solution:
a. Correct: the accuracy of model Y is 75% and that of model X is 68%
b. Correct: accuracy is (TP+TN)/(TP+TN+FP+FN)
c. Correct: the 0-1 loss function = (FP+FN)
d. Correct: we only know (FP+FN), (TP+TN) and (TP+TN+FP+FN)
e. Incorrect: we only know (TP+TN) but not the exact values of TP or TN
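The relationship between the 0-1 loss and accuracy used in this solution can be sketched as follows. The loss totals of 32 and 25 are inferred from the accuracies stated in the solution (68% and 75% on 100 test points), since the original table is not reproduced; the function names are assumptions for this example.

```python
def zero_one_loss(y_true, y_pred):
    """Number of misclassified samples (FP + FN for a binary problem)."""
    return sum(1 for t, p in zip(y_true, y_pred) if t != p)

def accuracy_from_loss(loss, n_samples):
    """Accuracy implied by a total 0-1 loss over n_samples test points."""
    return (n_samples - loss) / n_samples

# Totals inferred from the accuracies quoted in the solution.
acc_x = accuracy_from_loss(32, 100)
acc_y = accuracy_from_loss(25, 100)
print(acc_x, acc_y)

# Tiny made-up example of counting the loss directly from predictions.
print(zero_one_loss([1, 0, 1, 1], [1, 1, 1, 0]))
```

The loss total fixes FP+FN and therefore TP+TN, but not how either sum splits into its two parts.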
Q8. Consider two separate classes of data shown in the figure below, where the first class of data points
is shown as circles and the other class is shown as triangles.
If the circle data points are classified as label “+1” while the triangle data points are classified as label “-1”,
what is a possible unit vector:
Solution:
We need to find the H0 line. Using the support vectors, we find 3 points that bisect lines connecting known
support vectors. We know that point (1,3) is on H0. Using c3 and t2, and c4 and t3, we can identify points (2,4)
and (0,2) respectively, which lie on H0. Therefore, the correct answer is (e), since the unit vector must
point towards the +1 label and be perpendicular to H0.
Solution:
(a) Incorrect: larger k => better estimate of generalization performance
(b) Incorrect: larger k => more computation (more training cycles)
(c) Incorrect: the model will be trained 10 times, each time using 9 subsets for training and 1 subset for
testing
(d) Correct: Slide 71 in Topic 4
(e) Incorrect: (d) is correct
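The k-fold procedure referred to in this solution can be sketched in plain Python. This is a simplified illustration: it splits the indices in order without shuffling, and the function name `k_fold_indices` is an assumption for this example rather than anything from the course material.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for each of the k folds."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size

folds = list(k_fold_indices(100, 10))
for train, test in folds:
    print(len(train), len(test))
```

Each sample is used for testing exactly once and for training in the remaining k-1 rounds, which is why a larger k means more training runs.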
Q10. For the following data points and using the 2-means algorithm with initial cluster centers c1 = (2, 1) and c2
= (4, 6), the final two cluster centers, respectively, must be:
(a) (2, 1) and (4, 6)
(b) (1.5, 2) and (2.5, 5)
(c) (2, 1.5) and (5, 2.5)
(d) (√2, √2) and (2√2, 3√2)
(e) None of the given statements is correct.
Solution:
Inspecting the points and the initial guesses for c1 and c2 visually, c1 is the right cluster center for p1 and p2,
while c2 is the right cluster center for p3 and p4. No calculations are needed. Ans = (a).
Alternatively:
Iteration 1:
Cluster centers: 1 = (2.000, 1.000), 2 = (4.000, 6.000)
Cluster 1: no. of points = 2, updated center = (2.000, 1.000)
Cluster 2: no. of points = 2, updated center = (4.000, 6.000)
No change in cluster centers; the procedure converges.
No. of iterations = 1
Q11. The cluster center for the following 3 unlabeled points (1, 1), (3, 3), and (4, 3) is equal to
(a) (1.333, 2.667)
(b) (2.333, 1.333)
(c) (2.667, 2.333)
(d) (2.666, 1.333)
(e) None of the given statements is correct.
Solution:
The cluster center is given by ((1+3+4)/3, (1+3+3)/3) = (2.667, 2.333). Ans = (c).
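A cluster center is the coordinate-wise mean of the points in the cluster. The short Python sketch below computes it for the three points in the question; the function name `centroid` is an assumption made for this example.

```python
def centroid(points):
    """Coordinate-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    dims = len(points[0])
    return tuple(sum(p[d] for p in points) / n for d in range(dims))

center = centroid([(1, 1), (3, 3), (4, 3)])
print(tuple(round(c, 3) for c in center))  # (2.667, 2.333)
```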
Consider a game with 5 states {A, B, C, D, E}. At each state, there are five possible actions, each
corresponding to moving to a state. The reward of each action is provided in the reward matrix R. The (*)
symbol in the matrix indicates that it is not possible to go from one state to another state. You can start at
any state and the game ends when reaching state E, for which the player receives a reward of +10. The
reward matrix for each action in the game is shown below.
Q12. The symbol (A → B) denotes the action to move from state A to state B. Which of the following sets of
actions gives the maximum cumulative reward when starting from state A?
(a) ( → ), ( → ), ( → )
(b) ( → ), ( → ), ( → )
(c) ( → ), ( → )
(d) ( → ), ( → )
(e) ( → ), ( → ), ( → )
Solution:
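By checking matrix R, the cumulative reward of each candidate path is:
(A→B),(B→D),(D→E) => reward = -1
(A→C),(C→E) => reward = 3
(A→D),(D→E) => reward = 2
(A→B),(B→C),(C→E) => reward = 2
(A→D),(D→C),(C→E) => reward = 6
The maximum cumulative reward is 6, obtained with (A→D),(D→C),(C→E).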
Q13. Using the reward matrix R, we executed 11 iterations of the Q-learning algorithm and the resulting Q
matrix is shown below.
Using Q11, assume that we want to update Q(B,C) with α = 1 and γ = 0.5. What would be the new value for
Q(B,C)?
(a) -6
(b) -4.5
(c) -4
(d) -1.5
(e) 4
Sol
Q14. While updating Q(s, a) in Q11, the maximum future reward estimate (max_a’ Q(s’,a’)) is:
(a) -2
(b) -1
(c) 1
(d) 3
(e) 4
Solution:
From Q11, the possible actions from state D have the values Q(D,A) = 0, Q(D,C) = 3, and Q(D,E) = 1, so the maximum is 3. Ans = (d).
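The "maximum future reward" term is the largest Q-value over the actions available in the next state. The sketch below takes the three Q-values for state D from the solution and plugs the maximum into the standard tabular Q-learning update. The update formula is the textbook form and is assumed here, since the source does not show it; the reward and old Q-value in the final call are hypothetical numbers used only to exercise the function.

```python
def q_update(q_old, reward, max_next_q, alpha, gamma):
    """Standard tabular Q-learning update (assumed form, not shown in the source)."""
    return q_old + alpha * (reward + gamma * max_next_q - q_old)

# Q-values for the actions available from state D, as listed in the solution.
q_row_d = {"A": 0, "C": 3, "E": 1}
max_next_q = max(q_row_d.values())
print(max_next_q)

# Hypothetical old value and reward, purely to show the update in action.
new_q = q_update(q_old=0.0, reward=-1.0, max_next_q=max_next_q, alpha=1.0, gamma=0.5)
print(new_q)
```

With a learning rate of 1, the old Q-value drops out and the new value is just the reward plus the discounted maximum.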
Q15. Classification accuracy alone can be misleading if the data set contains ____ number of observations
in each class.
(a) a correct
(b) a balanced
(c) an equal
(d) an unbalanced
(e) All of the above
Q16. Which of the following statements is true about the k-NN classifier?
(a) The decision boundary is smoother with very small values of k.
(b) The decision boundary is linear.
(c) For small values of k, the algorithm is less sensitive to noise.
(d) k-NN does not require an explicit training step.
(e) The classification accuracy is better with very large values of k.
Q17. The sum of the false positives and false negatives represents the total number of samples ____ .
(choose the best answer below that completes the sentence)
(a) detected as belonging to the same class
(b) that are misclassified
(c) predicted negative and are actually positive
(d) correctly predicted as multiple class
(e) detected as belonging to their correct classes
Q20. In the above figure, which of the following are the support vectors found by the maximum margin
SVM boundary?
(a) points:(4,2), (6,4), (4,6), (6,8) only
(b) points:(4,2), (4,6), (2,8) only
(c) points:(4,2), (6,4), (2,8) only
(d) points:(4,2), (6,4), (6,1), (6,8) only
(e) points:(4,2), (6,1), (6,1), (4,6) only
Q21. Using Q0 as a starting point, what is the updated Q value after taking the action (B, right)?
(a) 3.75
(b) 3.5
(c) 2.75
(d) 2.5
(e) None of the above
Q22. Using Q0 as a starting point and a greedy decision-policy, which action is supposed to be taken from
state D?
(a) up
(b) right
(c) down
(d) left
(e) Any of the possible actions
Q23. Using Q0 as a starting point and an ε-greedy decision policy, which action is supposed to be taken from
state D?
(a) down
(b) up
(c) right
(d) left
(e) Any of the possible actions
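The difference between the greedy and ε-greedy policies in the two preceding questions can be sketched in Python. The Q-values below are hypothetical, because the Q0 table is not reproduced here, and the function names are assumptions for this example.

```python
import random

def greedy_action(q_values):
    """Pick the action with the highest Q-value (first one wins on ties)."""
    return max(q_values, key=q_values.get)

def epsilon_greedy_action(q_values, epsilon, rng=random):
    """With probability epsilon explore at random, otherwise act greedily."""
    if rng.random() < epsilon:
        return rng.choice(list(q_values))
    return greedy_action(q_values)

# Hypothetical Q-values for a single state; not taken from the Q0 table.
q_state = {"up": 0.5, "right": 2.0, "down": -1.0, "left": 0.0}

print(greedy_action(q_state))
print(epsilon_greedy_action(q_state, epsilon=0.1, rng=random.Random(0)))
```

A greedy policy always returns the same action for a given table, whereas an ε-greedy policy can return any available action because of the exploration step.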
Q24. Given the following confusion matrix, which of the following is true?
(a) accuracy = 0.78
(b) accuracy = 0.90
(c) accuracy = 0.83
(d) accuracy = 0.80
(e) accuracy = 0.89
Q26. Based on the ”0-1 loss function”, which boundary line is considered to be better?
(a) Boundary line 2
(b) The two boundary lines can not be compared
(c) Boundary line 1
(d) The two boundary lines are the same
(e) None of the above
Q27. Consider the following sample data points shown in the below table that represents two classes,
namely class A and class B. The same data points are plotted in the associated graph, where class A is
represented by the symbol (⚫ ) while class B is represented by the symbol (♦ ).
We want to classify any new point using the k-NN classifier based on the Manhattan distance instead
of the Euclidean distance as a distance metric, where the Manhattan distance (D) between two points (x1, y1)
and (x2, y2) is defined as follows:
D = |x1 − x2|+|y1 − y2|
What would be the class of the new point (−1,+1) for the following values of k: 3, 5, and 10?
(a) B, B, A
(b) B, B, B
(c) A, B, A
(d) A, A, A
(e) B, A, A
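A k-NN vote with the Manhattan distance defined above can be sketched as follows. The training points are hypothetical, since the original table and graph are not reproduced, so the printed predictions do not answer the question; the function names and the tie-breaking behaviour (earlier points win) are assumptions of this sketch.

```python
from collections import Counter

def manhattan(p, q):
    """Manhattan distance D = |x1 - x2| + |y1 - y2| between two 2-D points."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def knn_predict(query, labeled_points, k, distance=manhattan):
    """Majority label among the k training points closest to query."""
    neighbors = sorted(labeled_points, key=lambda pl: distance(query, pl[0]))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Hypothetical training data; not the table from the question.
data = [((0, 1), "A"), ((-1, 3), "B"), ((-2, 1), "B"), ((3, 3), "A"), ((1, -1), "A")]

for k in (1, 3, 5):
    print(k, knn_predict((-1, 1), data, k))
```

When k equals the size of the training set, every point votes and the prediction is simply the majority class of the whole data set.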
Use the following figure for question 28:
Q28. Assuming the use of a single simple perceptron where the output is equal to 1 if $\sum_i w_i x_i + b \geq 0$, what
will be the correct values of the weights for this perceptron that is classifying any point to the left of line AB
to 1:
(a) w2 =−1,w1 =1,b=0
(b) w2 =0,w1 =−1,b=1
(c) w2 =1,w1 =1,b=0
(d) w2 =1,w1 =0,b=1
(e) None of the above
Q29. A machine learning model was trained on some training data and tested on testing data. The error
rate on training set is low and that on testing set was high. Which of the following statements is TRUE:
(a) The model is overfitting on testing data
(b) The model is underfitting on testing data
(c) The model is overfitting on training data
(d) The model is underfitting on training data
(e) None of the options are correct
Q30. In terms of the bias-variance tradeoff, a 1-Nearest Neighbor classifier has ____ a 25-Nearest Neighbor
classifier.
(a) higher variance and lower bias than
(b) lower variance and higher bias than
(c) lower variance and lower bias than
(d) higher variance and higher bias than
(e) the same variance and bias as
Q31. Which of the following statements is TRUE when considering the perceptron algorithm?
(a) A single perceptron can compute the XOR function.
(b) If while running the perceptron algorithm we make one pass through the data and make a single
classification mistake, the algorithm has converged.
(c) Given a linearly separable dataset, the perceptron algorithm is guaranteed to find a max-margin
hyperplane.
(d) A perceptron is guaranteed to learn a separating decision boundary for a separable dataset within a
finite number of training steps.
(e) All of the above statements are true.
Q32. Consider a k-NN classifier. Based on the graph below, which shows the error rate vs. the value of k, at what
value of k does k-NN perform optimally?
(a) 4
(b) 40
(c) 3
(d) 10
(e) 20
Q33. Given the following training data samples with two classes (circles and triangles), which of the
following classifiers can we use to separate the two classes with 100% training accuracy?
Q35. Which of the following machine learning applications is considered unsupervised?
(a) Grouping similar documents together according to their content.
(b) Identifying types of traffic signs by an autonomous driving car.
(c) Diagnosing cancer patients based on example patients, including both malignant and benign cases.
(d) Predicting the stock market based on historical stock market data.
(e) Predicting customers’ service time in a bank based on historical data of service times in this bank.
Q36. Given the dataset below, where 5 samples are labeled as squares and diamonds, which of the given
figures shows the correct decision boundary for a 1-Nearest Neighbor classifier trained on this dataset?
(a) Figure 2
(b) Figure 4
(c) Figure 1
(d) Figure 5
(e) Figure 3
Q38. The table below provides a training dataset containing six examples with their labels (Red and Green).
These examples are used to train a k-Nearest Neighbor classifier. Using the Euclidean distance, identify
which class this classifier will predict for the new point “T” when the value of k = 1 and k = 3, respectively.
(a) Green, Red
(b) The test point cannot be predicted using these values of k.
(c) Red, Red
(d) Red, Green
(e) Green, Green
Q39. In k-fold cross-validation, the dataset is randomly divided into k equal sets, where___ .
(a) 2k sets are used for training and 1 set is used for testing
(b) 1 random set is used for training and the same random set is used for testing
(c) 1 random set is used for training and a different 1 random set is used for testing
(d) k-1 sets are used for training and 1 set is used for testing
(e) 1 set is used for training and k-1 sets are used for testing
Q40. Suppose, in order to train and test a machine learning model, you split the dataset into a training and
testing sets. You observed that the model yields 30% accuracy on the training set and 30% accuracy on the
testing set. Which of the following statements is TRUE about this model?
(a) The model is overfitting.
(b) The model achieves the best fit possible like all machine learning models.
(c) The model has high variance.
(d) The model is underfitting.
(e) None of the above statements is true.
Q41. Which of the following statements is TRUE when considering k-Means Clustering algorithm?
(a) It takes a long time to train the algorithm compared to other AI algorithms.
(b) It is an example of unsupervised machine learning algorithms.
(c) It is equivalent to the K-Nearest Neighbor algorithm.
(d) Increasing the value of k always leads to a better result.
(e) It is an example of supervised machine learning algorithms.
Q42. Suppose we trained a k-Nearest Neighbor classifier where k = 3. The dataset is divided into a training
set of 90 points and a testing set of 10 points. What is the total number of distance calculations carried out
during the testing phase?
(a) 1000
(b) 30
(c) 900
(d) 190
(e) 270
Q43. Suppose we tested a binary classifier which predicts two classes, namely “+” and “-”. The testing result
is given in the confusion matrix below. What are the values of precision, recall and accuracy computed
based on the given confusion matrix?
(a) Precision = 0.75, recall = 0.80, accuracy = 1.0
(b) Precision = 0.80, recall = 0.75, accuracy = 0.90
(c) Precision = 0.80, recall = 0.90, accuracy = 0.75
(d) Precision = 0.90, recall = 0.80, accuracy = 0.75
(e) Precision = 0.75, recall = 0.90, accuracy = 0.80
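The three metrics asked for here follow directly from the four cells of a binary confusion matrix. The sketch below uses hypothetical counts, since the confusion matrix from the question is not reproduced, so its output is not the answer to the question; the function names are assumptions for this example.

```python
def precision(tp, fp):
    """Of all samples predicted positive, the fraction that are truly positive."""
    return tp / (tp + fp)

def recall(tp, fn):
    """Of all truly positive samples, the fraction that were predicted positive."""
    return tp / (tp + fn)

def accuracy(tp, tn, fp, fn):
    """Fraction of all samples that were classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)

# Hypothetical counts; not the confusion matrix from the question.
tp, fp, fn, tn = 35, 15, 35, 15

print(precision(tp, fp), recall(tp, fn), accuracy(tp, tn, fp, fn))
```

Precision and recall share the same numerator (TP) and differ only in whether false positives or false negatives appear in the denominator.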
Q44. Which of the following statements is TRUE about AI (Artificial Intelligence) and ML (Machine
Learning)?
(a) AI and ML are not related to each other.
(b) Like AI, ML has to be explicitly programmed to produce the expected results.
(c) AI has a very wide range and scope while ML has limited scope.
(d) AI and ML are subfields of Deep Learning.
(e) ML cannot deal with structured data.
Q45. In the figure below, there are 10 samples from a 1-dimensional dataset (i.e., 1 feature only, labeled p).
Assume that all the samples belong to two classes only, namely “X” and “O”. These samples are used to
train a K-Nearest Neighbor classifier. In the case of ties, we prefer class “X”. Suppose we define the
training error rate as the fraction of data points incorrectly classified.
What is the training error rate of a 1-Nearest-Neighbor classifier trained on this data?
(a) 0.3
(b) 1
(c) 0.1
(d) 0
(e) 0.5
Q46. Referring to question 45 (previous question), what is the training error rate of a 2-Nearest-Neighbor
classifier trained on the same data?
(a) 0.1
(b) 0.3
(c) 1
(d) 0.5
(e) 0
Q47. In 10-fold cross validation, the dataset is randomly divided into 10 equal sets, where
(a) 5 sets are used for training and 5 sets are used for testing.
(b) 9 sets are used for training and 1 set is used for testing.
(c) 1 random set is used for training and a different 1 random set is used for testing.
(d) 1 set is used for training and 9 sets are used for testing.
(e) 1 random set is used for training and the same random set is used for testing.
Q48. Consider a k-NN classifier. Based on the graph below, which shows the error rate vs. the value of k, at what
value of k does k-NN perform optimally?
(a) k=10
(b) k=29
(c) k=9
(d) k=22
(e) k=20
Q49. Which of the following statements is most accurate when considering the k-NN algorithm:
(a) A very large value of k always leads to a more accurate classification.
(b) A very small value of k makes the algorithm highly sensitive to noisy data (e.g., outliers).
(c) A very small value of k makes the algorithm less sensitive to noisy data (e.g., outliers).
(d) A very large value of k makes the algorithm highly sensitive to noisy data (e.g., outliers).
(e) None of the above.
Q50. Suppose we have two machine learning models, named Model1 and Model2 respectively. These
models are trained and tested using some data. The error rate of Model1 on the training set is low and that
on the testing set is high. The error rate of Model2 is high on both the training set and the testing set. Which of
the following statements is most likely to be TRUE:
(a) Model1 is underfitting on training data, Model2 is overfitting on training data.
(b) Model1 is overfitting on testing data, Model2 is overfitting on testing data.
(c) Model1 is overfitting on training data, Model2 is underfitting on training data.
(d) Model1 is underfitting on testing data, Model2 is underfitting on testing data.
(e) None of the above.
Q51. Using k-means algorithm and Euclidean distance for the clustering, some training data were
grouped into three clusters. Suppose the centroids for the three learnt clusters are C1, C2 and C3. Let the
current cluster centroids be C1 = (2,10), C2 = (6,6), C3 = (1.5,3.5). If there are three new test points A1 =
(2.5, 10), A2 = (2,4), A3 = (4,9), which cluster centroid would each of the given test points be assigned to?
(a) A1 to C2, A2 to C3, A3 to C1
(b) A1 to C3, A2 to C2, A3 to C3
(c) A1 to C2, A2 to C3, A3 to C3
(d) A1 to C1, A2 to C3, A3 to C1
(e) A1 to C1, A2 to C1, A3 to C2
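Assigning a test point to a k-means cluster means picking the centroid with the smallest Euclidean distance. The sketch below applies that rule to the centroids and test points given in the question; the function name `assign_to_centroid` is an assumption for this example, and `math.dist` requires Python 3.8 or later.

```python
import math

def assign_to_centroid(point, centroids):
    """Return the name of the centroid closest to point (Euclidean distance)."""
    return min(centroids, key=lambda name: math.dist(point, centroids[name]))

centroids = {"C1": (2, 10), "C2": (6, 6), "C3": (1.5, 3.5)}
test_points = {"A1": (2.5, 10), "A2": (2, 4), "A3": (4, 9)}

assignments = {name: assign_to_centroid(p, centroids) for name, p in test_points.items()}
for name, cluster in assignments.items():
    print(name, "->", cluster)
```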
Q52. Which of the following statement(s) are true about SVM kernels?
i. Kernel functions map low dimensional data to a higher dimensional space
ii. Kernel functions stretch the space to separate different classes
iii. Kernel functions always cause SVM overfitting
(a) Statement (ii) is correct only
(b) Statements (i) and (ii) are correct
(c) Statements (i), (ii) and (iii) are correct
(d) Statement (i) is correct only
(e) Statements (ii) and (iii) are correct
Q53. When are you more likely to consider using SVM over other classifiers?
(a) When there is a need to increase data points (more training data).
(b) When there is a need to decrease data points (remove some training data).
(c) When there is a need to calculate more variables that are related to the training data to have a better
classification.
(d) When other classifiers fail due to the lack of hidden structure in the training data.
(e) None of the above.
Q55. Identify the type of learning in which labeled training data is used:
(a) Supervised Learning
(b) Unsupervised Learning
(c) Discovery based learning
(d) Learning by induction
(e) None of the above
Q56. Consider performing K-Means Clustering on a one-dimensional dataset containing five sample points:
p1=5, p2=7, p3=10, p4=12 and p5=13, using k=2 and initial centroids c1 = 3.0 and c2 = 15.0.
What are the initial cluster assignments? (Which sample points are in cluster c1 and which sample points
are in cluster c2?)
(a) C1 = {P1}, C2 = {P2, P3, P4, P5}
(b) C1 = {P1, P2, P3}, C2 = {P4, P5}
(c) C1 = {P1, P2}, C2 = {P3, P4, P5}
(d) C1={},C2={P1,P2,P3,P4,P5}
(e) None of the above
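The assignment step asked about here, and the update step that follows it in k-means, can be sketched for one-dimensional data as below. The points and initial centroids come from the question; the function names and the choice to keep an empty cluster's old centroid are assumptions of this sketch.

```python
def assign(points, centroids):
    """Map each centroid index to the list of points nearest to it."""
    clusters = {i: [] for i in range(len(centroids))}
    for p in points:
        nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
        clusters[nearest].append(p)
    return clusters

def update(clusters, centroids):
    """New centroid = mean of its cluster; an empty cluster keeps its old centroid."""
    return [sum(c) / len(c) if c else centroids[i] for i, c in clusters.items()]

def k_means_1d(points, centroids, max_iter=100):
    """Alternate assignment and update until the centroids stop changing."""
    for _ in range(max_iter):
        clusters = assign(points, centroids)
        new_centroids = update(clusters, centroids)
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters

points = [5, 7, 10, 12, 13]
initial = assign(points, [3.0, 15.0])
final_centroids, final_clusters = k_means_1d(points, [3.0, 15.0])
print(initial)
print(final_centroids, final_clusters)
```

The initial assignment depends only on the starting centroids; the later iterations then move each centroid to the mean of the points assigned to it.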
Q57. In the figure below there are 12 samples from a 1-dimensional dataset (i.e., 1 feature only, labeled p).
Assume that all the samples belong to two classes only, namely “X” and “O”. In the figure, if more than one
sample in the training data has the same feature value, the samples are drawn vertically stacked as shown in the
figure. For example, two samples in the training data have each of the values 4, 6 and 10, as shown in the
figure. Using the given figure as the training data, indicate the correct predictions resulting from the k-NN
classification of a new test point at p=7 (not shown in the figure) when we use k = 1, k = 3 and k = 7.
(a) “O” (if k=1), “O” (if k=3), “O” (if k=7)
(b) “X” (if k=1), “O” (if k=3), “X” (if k=7)
(c) “X” (if k=1), “X” (if k=3), “O” (if k=7)
(d) “X” (if k=1), “X” (if k=3), “X” (if k=7)
(e) “O” (if k=1), “X” (if k=3), “X” (if k=7)
Q58. Which of the following statements is true about the k-Nearest Neighbors classifier:
(a) k-NN requires similar computation time in testing as in training.
(b) k-NN requires higher computation time in testing than in training.
(c) k-NN requires higher computation time in training than in testing.
(d) k-NN is not a classifier but rather an efficient clustering technique.
(e) None of the above.
Q59. Which of the following statements is TRUE about AI (Artificial Intelligence) and ML (Machine Learning)?
a. AI is considered to be a subfield of ML
b. ML is considered to be a subfield of AI
c. ML cannot deal with structured data.
d. AI cannot deal with structured data, but ML can.
e. AI and ML are not related to each other.
Q63. In Support Vector Machine (SVM), transforming data to a higher dimension is done using a:
a. maximal-margin classifier.
b. soft margin.
c. Kernel function.
d. perceptron algorithm.
e. None of the given answer is correct.
Q64. Consider the data shown below, in which there are two classes of data indicated by O and X. A new
data point (0,0) is to be classified using the K-NN algorithm. Which of the following is true?
Q65. If we use the Euclidean distance to find the error in a perceptron learning algorithm, the error function
minimized by the gradient descent is given by:
A: $E = \frac{1}{2} \sum_{i=1}^{n} (t_i - O_i)$
B: $E = \frac{1}{2} \sum_{i=1}^{n} (t_i + O_i)$
C: $E = \frac{1}{2} \sum_{i=1}^{n} (t_i - O_i)^2$
D: $E = \frac{1}{2} \sum_{i=1}^{n} (t_i + O_i)^2$
Q68. In the figure below, the curve labeled “Training Error” shows how the error behaves when
an AI model is trained using gradient descent. The curve labeled “Test Error” shows the corresponding error on
the testing set. The dotted line shows the “Best Fit” point. Which of the following statements is true given
the figure?
a. Underfitting occurs if we move from the “Best Fit” point in the direction of B and overfitting occurs if we
move from the “Best Fit” point in the direction of A.
b. Both underfitting and overfitting occur as we move in the direction of A.
c. Both underfitting and overfitting occur as we move in the direction of B.
d. Underfitting occurs if we move from the “Best Fit” point in the direction of A and overfitting occurs if we
move from the “Best Fit” point in the direction of B.
e. None of the statements shown is true.
Q69. Given the following data points and labels shown below:
a) Ignoring the point “T” with class labeled as “New”, draw on the graph above the decision boundaries
for 1-Nearest Neighbors
Answer:
b) Identify which class a k-NN classifier will predict if the point labeled “New” is classified using the values of
k that are shown in the table below:
c) Indicate the effect of removing the following points (only one at a time) on the 1-NN decision boundary
determined in (a):
Q70. In this problem, we will be dealing with one-dimensional data represented by a set of points. Each
point has only one feature. The label has two classes, namely 0 or 1. We will show you the points
on the axis, labeled by their class values; we also give you a table of values.
a. In the figure below, draw the output of a 1-Nearest-Neighbour (1-NN) classifier over the range indicated
in the field. (Show the output of the classifier by indicating at which feature values the classifier output changes and
what the classifier output is.)
b. In the figure below, draw the output of a 5-Nearest-Neighbour (5-NN) classifier over the range indicated
in the field. (Show the output of the classifier by indicating at which feature values the classifier output changes
and what the classifier output is.)
Old Quizzes Questions
Old HWs Questions