Topic 4

Old Finals / Midterms Questions
Q1. In the context of k-NN, for k=1, consider the following labeled data points,
where x1 and x2 are the input variables. If the Euclidean distance is used as a
similarity measure, a new data point (-1,2) will be classified as ________ with a
minimum distance equal to
(a) “–1”, 1.41
(b) “–1”, 1
(c) “+1”, 1.41
(d) “+1”, 2
(e) “+1”, 1
Solution:
The Euclidean distance from x = (-1,2) to every other point y can be calculated using
$d(x, y) = \sqrt{\sum_{i=1}^{n} (y_i - x_i)^2}$
The following table shows the resulting distances: point (0,2) has the shortest distance, 1, and class “-1”. Ans = (b).
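A minimal Python sketch of the 1-NN rule with the Euclidean distance above. The exam's data table is not reproduced here, so apart from the point (0,2) labeled “-1” (taken from the solution), the point (1,1) labeled “+1” in the usage note is hypothetical.

```python
import math


def euclidean(x, y):
    """d(x, y) = sqrt(sum_i (y_i - x_i)^2)."""
    return math.sqrt(sum((yi - xi) ** 2 for xi, yi in zip(x, y)))


def one_nn(query, data):
    """1-NN: return (label, distance) of the training point nearest to query.

    data is a list of (point, label) pairs.
    """
    point, label = min(data, key=lambda item: euclidean(query, item[0]))
    return label, euclidean(query, point)
```

For example, `one_nn((-1, 2), [((0, 2), "-1"), ((1, 1), "+1")])` returns `("-1", 1.0)`, since the distance to (1,1) is √5 ≈ 2.236.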
Q2. For the data points shown below, select the classifier from the list below that
can be used to classify the data:
(a) Perceptron
(b) SVM
(c) K-Means, K=2
(d) Q-Learning
(e) All of the given answers are correct.
Solution:
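Since the data is not linearly separable, the perceptron cannot be used. SVM
with the kernel trick is the most suitable for this data. k-Means (K=2) is a clustering method whose two
clusters are split by a straight line, so it cannot recover classes that are not linearly separable, and
Q-Learning is a reinforcement-learning method rather than a classifier. Ans = (b).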
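To illustrate why a kernel helps, here is a Python sketch that uses an explicit feature map rather than a kernel function (an SVM with, for example, a polynomial or RBF kernel works in such a lifted space implicitly, through inner products). The points are hypothetical, since the exam figure is not reproduced: one class sits near the origin and the other on a ring around it, so no line separates them in 2-D, but the added coordinate x1² + x2² makes them separable by a plane.

```python
def lift(point):
    """Explicit feature map phi(x1, x2) = (x1, x2, x1^2 + x2^2)."""
    x1, x2 = point
    return (x1, x2, x1 ** 2 + x2 ** 2)


# Hypothetical data: class +1 clustered at the origin, class -1 on a ring around it.
# No straight line in the (x1, x2) plane separates these two groups.
inner = [(0.5, 0.0), (-0.5, 0.0), (0.0, 0.5), (0.0, -0.5)]
outer = [(2.0, 0.0), (-2.0, 0.0), (0.0, 2.0), (0.0, -2.0)]


def classify(point, threshold=2.0):
    """After lifting, the plane z = threshold separates the classes linearly."""
    return +1 if lift(point)[2] < threshold else -1
```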
Q3. Given the labeled data set shown in the table, and using the Perceptron
learning algorithm with a learning rate of 1 and an initial weight vector w = (1, 1, 1), the final
weight vector that classifies all points correctly is given by:
(a) w = (−3, 1, 0)
(b) w = (−1, −1, 1)
(c) w = (−3, 0, 1)
(d) w = (−5, 1, 1)
(e) None of the given answers is correct
Solution:
If the points are plotted, it is visually clear that they are not linearly separable.
Hence, the perceptron learning algorithm will not converge. Ans = (e).
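The non-convergence argument can be checked with a small Python sketch of the perceptron learning rule, capped at a fixed number of epochs. The exam's table is not reproduced, so the sketch assumes labels t ∈ {0, 1}, a leading bias input of 1, and the hypothetical AND (separable) and XOR (not separable) data sets.

```python
def predict(w, x):
    """Hard threshold: 1 if w . x >= 0 else 0 (x includes the bias input 1)."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) >= 0 else 0


def perceptron(data, w, lr=1.0, max_epochs=100):
    """Perceptron learning rule; returns (weights, converged)."""
    w = list(w)
    for _ in range(max_epochs):
        mistakes = 0
        for x, t in data:
            y = predict(w, x)
            if y != t:
                mistakes += 1
                # w <- w + lr * (t - y) * x
                w = [wi + lr * (t - y) * xi for wi, xi in zip(w, x)]
        if mistakes == 0:          # a full pass with no mistakes: converged
            return w, True
    return w, False                # gave up, e.g. data not linearly separable


# Hypothetical data sets, each x = (1, x1, x2) with the bias input first.
AND = [((1, 0, 0), 0), ((1, 0, 1), 0), ((1, 1, 0), 0), ((1, 1, 1), 1)]
XOR = [((1, 0, 0), 0), ((1, 0, 1), 1), ((1, 1, 0), 1), ((1, 1, 1), 0)]
```

With this data order, `perceptron(AND, (1, 1, 1))` converges to w = (−3, 2, 1), while `perceptron(XOR, (1, 1, 1))` reports `False` however large `max_epochs` is, because no weight vector separates XOR.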
Q4. Using the Perceptron learning algorithm with w = (w0, w1, w2, w3) = (−4, 2, 0, −1) and the
3-dimensional data point x = (x1, x2, x3) = (3, 2, 2), the function hw(x) evaluates to
(a) 0
(b) −2
(c) 1
(d) 2
(e) None of the given answers is correct.
Solution:
w = (−4, 2, 0, −1); augmenting x with the bias input x0 = 1 gives x = (1, 3, 2, 2) ⇒ w·x = (−4)(1) + (2)(3) + (0)(2) + (−1)(2) = −4 + 6 + 0 − 2 = 0.
Since w·x ≥ 0 ⇒ hw(x) = 1. Ans = (c).
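The same computation as a small Python sketch. Returning 0 for a negative weighted sum is an assumption; the solution only uses the w·x ≥ 0 case.

```python
def h_w(w, x):
    """Perceptron output with bias input x0 = 1: 1 if w . (1, x) >= 0, else 0."""
    x_aug = (1,) + tuple(x)
    s = sum(wi * xi for wi, xi in zip(w, x_aug))
    return 1 if s >= 0 else 0
```

Here `h_w((-4, 2, 0, -1), (3, 2, 2))` returns 1, because the weighted sum is exactly 0.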
Q5. Consider the confusion matrix from testing 1000 data points. The obtained values are TP = 260,
FN = 300, and FP = 130. The accuracy of the model is:
(a) 0.13
(b) 0.26
(c) 0.3
(d) 0.56
(e) 0.57
Solution:
There are 1000 data points, therefore TN = 1000 − 260 − 300 − 130 = 310 => accuracy = (TP + TN)/1000 = (260 + 310)/1000
= 570/1000 = 0.57. Ans = (e).
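The arithmetic in Python; precision and recall (asked about in Q43) come from the same four counts.

```python
def confusion_metrics(tp, fn, fp, tn):
    """Accuracy, precision and recall from the four confusion-matrix counts."""
    total = tp + fn + fp + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp)   # of the predicted positives, how many are right
    recall = tp / (tp + fn)      # of the actual positives, how many were found
    return accuracy, precision, recall


tp, fn, fp = 260, 300, 130
tn = 1000 - tp - fn - fp          # 310
accuracy, precision, recall = confusion_metrics(tp, fn, fp, tn)
```

For these counts accuracy is 0.57, precision is 260/390 ≈ 0.667 and recall is 260/560 ≈ 0.464.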
Q6. Consider the two separate classes of data shown in the figure below, where the first class of data points
is shown as circles and the other class as triangles.
If points c3 and t3 are support vectors, what can be said about points c4 and t2?
a) Both points also identify support vectors.
b) Both points do not identify support vectors.
c) Point c4 identifies a support vector while point t2 does not.
d) Point c4 does not identify a support vector while point t2 does.
e) None of the given answers is correct.
Solution:
Since points c3 and t3 are support vectors for the two classes, the Maximal Margin Classifier Line (MMCL)
bisects any line segment connecting two support vectors from different classes. Thus point (1,3)
lies on the MMCL. Let us assume that both c4 and t2 are support vectors. Since the MMCL bisects
the line connecting any two such support vectors, the distance from c4 to point (1,3),
$\sqrt{(1 - (-1))^2 + (3 - 2)^2} = \sqrt{5}$, should equal the distance from t2 to point (1,3),
$\sqrt{(3 - 1)^2 + (4 - 3)^2} = \sqrt{5}$. The two distances are equal and point (1,3) is on H0 (i.e., the
MMCL line), so c4 and t2 are support vectors. Ans = (a).
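A quick numeric check in Python. The coordinates c4 = (−1, 2) and t2 = (3, 4) are inferred from the distance expressions in the solution, since the figure is not reproduced; `math.dist` requires Python 3.8 or later.

```python
import math

c4 = (-1, 2)   # circle, inferred from sqrt((1 - (-1))^2 + (3 - 2)^2)
t2 = (3, 4)    # triangle, inferred from sqrt((3 - 1)^2 + (4 - 3)^2)
p = (1, 3)     # point on the MMCL (H0)

midpoint = ((c4[0] + t2[0]) / 2, (c4[1] + t2[1]) / 2)
d_c4 = math.dist(c4, p)
d_t2 = math.dist(t2, p)
```

The midpoint of c4 and t2 is exactly (1.0, 3.0), and both distances equal √5 ≈ 2.236.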
Q7. Assume we have two models (X and Y) that were trained using the same dataset. Model X was trained
using the k-NN algorithm, while model Y was trained using SVM. When tested on 100 data points, the
“0-1 loss function” values for the two models are shown in the table below.
Which of the following statements is NOT correct?

(a) Model Y has better accuracy than model X
(b) There are more TPs and TNs in model Y than in model X
(c) There are more FPs and FNs in Model X than in Model Y
(d) We cannot tell the exact value of FNs in both models
(e) The number of TPs is more than the number of TNs in both models
Solution:
a. Correct: model Y accuracy is 75% and model X accuracy is 68%.
b. Correct, because accuracy is (TP+TN)/(TP+TN+FP+FN).
c. Correct, because the 0-1 loss function = (FP+FN).
d. Correct: we only know (FP+FN), (TP+TN) and (TP+TN+FP+FN).
e. Incorrect: we only know (TP+TN), not the individual values of TP and TN. Ans = (e).
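A Python sketch of the 0-1 loss and its link to accuracy, using the 75% and 68% accuracies stated in the solution (the table itself is not reproduced).

```python
def zero_one_loss(y_true, y_pred):
    """0-1 loss summed over a test set: the number of misclassified samples,
    which for a binary classifier equals FP + FN."""
    return sum(1 for t, p in zip(y_true, y_pred) if t != p)


def accuracy_from_loss(loss, n):
    """Accuracy = (TP + TN) / n = (n - (FP + FN)) / n."""
    return (n - loss) / n
```

With n = 100, model Y's 75% accuracy corresponds to a 0-1 loss of 25 and model X's 68% to a loss of 32. The loss alone does not reveal how the errors split between FP and FN, or how the correct predictions split between TP and TN.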
Q8. Consider the two separate classes of data shown in the figure below, where the first class of data points
is shown as circles and the other class as triangles.
If the circle data points are labeled “+1” while the triangle data points are labeled “-1”,
what is a possible unit vector?
Solution:
We need to find the H0 line. Using the support vectors, we find three points that bisect the lines connecting known
support vectors. We know that point (1,3) is on H0. Using c3 and t2, and c4 and t3, we can identify points (2,4)
and (0,2), respectively, that lie on H0. These three points lie on the line x2 = x1 + 2. Therefore, the correct
answer is (e), since the unit vector must point towards the +1 label and be perpendicular to H0.
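A Python sketch of this construction, using the three H0 points from the solution and assuming the same figure as Q6, so that c4 = (−1, 2) is a circle (+1) point. It takes the direction of H0, rotates it by 90 degrees, normalizes it, and flips it if needed so it points toward the +1 side.

```python
import math

# Points on H0 found in the solution above.
H0_POINTS = [(0, 2), (1, 3), (2, 4)]
C4 = (-1, 2)  # assumed circle (+1) point, as in Q6


def unit_normal_toward(points, positive_point):
    """Unit vector perpendicular to the line through points, oriented so that
    it points toward positive_point's side."""
    (x0, y0), (x1, y1) = points[0], points[-1]
    dx, dy = x1 - x0, y1 - y0
    nx, ny = -dy, dx                      # rotate the direction by 90 degrees
    length = math.hypot(nx, ny)
    nx, ny = nx / length, ny / length
    # Flip if the +1 point lies on the negative side.
    if nx * (positive_point[0] - x0) + ny * (positive_point[1] - y0) < 0:
        nx, ny = -nx, -ny
    return nx, ny


w = unit_normal_toward(H0_POINTS, C4)   # (-1/sqrt(2), 1/sqrt(2)) ≈ (-0.707, 0.707)
```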
Q9. Which of the following statements about cross-validation is correct?

(a) 4-fold cross validation gives a better estimate of the model’s generalization performance than 10-fold cross
validation
(b) 5-fold cross validation requires more computations than 6-fold cross validation
(c) Using 10-fold cross validation, the model will be trained 9 times and tested 1 time only
(d) The overall performance of the model is obtained by averaging the error over k validation sets
(e) None of the given statements is correct.
Solution:
(a) Incorrect: a larger k gives a better estimate of generalization performance
(b) Incorrect: a larger k requires more computation (more training cycles)
(c) Incorrect: the model will be trained 10 times; each time it is trained on 9 subsets and tested on the
remaining subset
(d) Correct: Slide 71 in Topic 4
(e) Incorrect: (d) is correct
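A Python sketch of k-fold cross-validation matching (c) and (d): with k = 10 the model is trained 10 times, each time on 9 folds, and the k validation errors are averaged. The shuffling seed and the `fold_error` callback are illustrative assumptions.

```python
import random


def k_fold_splits(n, k, seed=0):
    """Shuffle indices 0..n-1, cut them into k folds, and yield (train, val)
    index lists: each fold is used once for validation, the other k-1 for training."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        val = idx[start:start + size]
        train = idx[:start] + idx[start + size:]
        yield train, val
        start += size


def cross_val_error(n, k, fold_error):
    """Average the validation error over the k folds.

    fold_error(train, val) is a hypothetical callback that trains a model on
    the train indices and returns its error on the val indices.
    """
    errors = [fold_error(train, val) for train, val in k_fold_splits(n, k)]
    return sum(errors) / k
```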
Q10. For the following data points, using the 2-means algorithm with initial cluster centers c1 = (2, 1) and
c2 = (4, 6), the final two cluster centers, respectively, must be:
(a) (2, 1) and (4, 6)
(b) (1.5, 2) and (2.5, 5)
(c) (2, 1.5) and (5, 2.5)
(d) (√2, √2) and (2√2, 3√2)
Solution:
Inspecting the points and the initial guesses c1 and c2, c1 is visually the right cluster center for p1 and p2,
while c2 is the right cluster center for p3 and p4. No calculations are needed. Ans = (a).
Alternatively:
Iteration 1:
Cluster centers: c1 = (2.000, 1.000), c2 = (4.000, 6.000)
Cluster 1: number of points = 2, updated center = (2.000, 1.000)
Cluster 2: number of points = 2, updated center = (4.000, 6.000)
No change in the cluster centers; the procedure converges.
Number of iterations = 1
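The iteration above follows Lloyd's k-means loop, sketched here in Python. Q10's data points are not reproduced, so the usage note uses the hypothetical points (1,1), (3,1), (4,5), (4,7), chosen so that the initial centers are already the cluster means.

```python
import math


def k_means(points, centers, max_iter=100):
    """Lloyd's k-means: assign each point to its nearest center, then move each
    center to the mean of its points; stop when the centers no longer change.
    Returns (centers, number_of_iterations)."""
    centers = [tuple(c) for c in centers]
    for iteration in range(1, max_iter + 1):
        clusters = [[] for _ in centers]
        for p in points:
            j = min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
            clusters[j].append(p)
        new_centers = [
            tuple(sum(coord) / len(cl) for coord in zip(*cl)) if cl else centers[i]
            for i, cl in enumerate(clusters)
        ]
        if new_centers == centers:
            return centers, iteration
        centers = new_centers
    return centers, max_iter
```

`k_means([(1, 1), (3, 1), (4, 5), (4, 7)], [(2, 1), (4, 6)])` returns `([(2, 1), (4, 6)], 1)`: the centers do not move, so the loop stops after one iteration, as in the trace above.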
Q11. The cluster center for the following 3 unlabeled points (1, 1), (3, 3), and (4, 3) is equal to
(a) (1.333, 2.667)
(b) (2.333, 1.333)
(c) (2.667, 2.333)
(d) (2.666, 1.333)
Solution:
Cluster center = ((1+3+4)/3, (1+3+3)/3) = (8/3, 7/3) = (2.667, 2.333). Ans = (c).
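The centroid is the per-coordinate mean; in Python:

```python
def centroid(points):
    """Per-coordinate mean of a list of equal-length points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


center = centroid([(1, 1), (3, 3), (4, 3)])  # (8/3, 7/3) ≈ (2.667, 2.333)
```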
Consider a game with 5 states {A, B, C, D, E}. At each state, there are five possible actions, each
corresponding to moving to a state. The reward of each action is provided in the reward matrix R. The (*)
symbol in the matrix indicates that it is not possible to go from one state to the other. You can start at
any state, and the game ends when reaching state E, for which the player receives a reward of +10. The
reward matrix for each action in the game is shown below.
Q12. The symbol (A → B) denotes the action to move from state A to state B. Which of the following sets of
actions gives the maximum cumulative reward when starting from state A?
(a) ( → ), ( → ), ( → )
(b) ( → ), ( → ), ( → )
(c) ( → ), ( → )
(d) ( → ), ( → )
(e) ( → ), ( → ), ( → )
Solution:
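By checking matrix R, the cumulative rewards of the candidate paths are:
(A→B), (B→D), (D→E) => reward = -1
(A→C), (C→E) => reward = 3
(A→D), (D→E) => reward = 2
(A→B), (B→C), (C→E) => reward = 2
(A→D), (D→C), (C→E) => reward = 6
The maximum cumulative reward, 6, is obtained with (A→D), (D→C), (C→E).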
Q13. Using the reward matrix R, we executed 11 iterations of the Q-learning algorithm, and the resulting Q
matrix (Q11) is shown below.
Using Q11, assume that we want to update Q(B,C) with α = 1 and γ = 0.5. What would be the new value of
Q(B,C)?
(a) -6
(b) -4.5
(c) -4
(d) -1.5
(e) 4
Sol
Q14. While updating Q( , ) in Q11, the maximum future reward estimate (maxa’ Q(s’,a’)) is:
(a) -2
(b) -1
(c) 1
(d) 3
(e) 4
Solution:
From Q11, the possible actions from state D are Q(D,A), Q(D,C), and Q(D,E), with values {0, 3, 1} => max = 3. Ans = (d).
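A Python sketch of the Q-learning update behind Q13 and Q14, Q(s,a) ← Q(s,a) + α[r + γ maxa’ Q(s’,a’) − Q(s,a)]. The Q table is stored as a dict of dicts holding only the allowed moves, and an action is named by the state it moves to. Only the Q(D,·) values {0, 3, 1} come from the solution above; the starting value Q(A,D) = 0 and the reward in the usage note are hypothetical.

```python
def q_update(Q, s, a, r, s_next, alpha, gamma):
    """One Q-learning step on a dict-of-dicts table Q[state][action]."""
    # Estimate of the best future reward from s_next (0 if s_next has no moves,
    # e.g. a terminal state).
    future = max(Q[s_next].values()) if Q[s_next] else 0.0
    Q[s][a] += alpha * (r + gamma * future - Q[s][a])
    return Q[s][a]


# Q(D, .) values from the solution; Q(A, D) = 0 is a hypothetical starting value.
Q = {"A": {"D": 0.0}, "D": {"A": 0.0, "C": 3.0, "E": 1.0}}
```

`q_update(Q, "A", "D", r=-1, s_next="D", alpha=1, gamma=0.5)` uses max(0, 3, 1) = 3 and returns −1 + 0.5 × 3 = 0.5; with α = 1 the old Q value drops out of the update entirely.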
Q15. Classification accuracy alone can be misleading if the data set contains ____ number of observations
in each class.
(a) a correct
(b) a balanced
(c) an equal
(d) an unbalanced
(e) All of the above
Q16. Which of the following statements is true about the k-NN classifier?
(a) The decision boundary is smoother with very small values of k.
(b) The decision boundary is linear.
(c) For small values of k, the algorithm is less sensitive to noise.
(d) k-NN does not require an explicit training step.
(e) The classification accuracy is better with very large values of k.
Q17. The sum of the false positives and false negatives represents the total number of samples ____ .
(choose the best answer below that completes the sentence)
(a) detected as belonging to the same class
(b) that are misclassified
(c) predicted negative and are actually positive
(d) correctly predicted as multiple class
(e) detected as belonging to their correct classes
Use the following text for questions 18 to 20:

SVM maximizes the margin around the separating hyperplane. Suppose that you train an SVM on a dataset
with 6 points as shown in the following figure. This dataset contains three samples with class label -1, and
three samples with class label +1.
Q18. Which of the following options is the correct equation corresponding to the maximum margin decision boundary?
(a) $\frac{1}{2}x_2 - \frac{1}{2}x_1 = 0$
(b) $\frac{1}{2}x_2 + \frac{1}{2}x_1 = 0$
(c) $\frac{1}{2}x_2 - \frac{1}{2}x_1 = -1$
(d) $\frac{1}{2}x_2 + \frac{1}{2}x_1 = 1$
(e) None of the answers
Q19. In the above figure, what would be the width of the maximum margin decision boundary
(d = d− + d+) learnt by SVM?
(a) 4
(b) 2
(c) 2.82
(d) 1.41
(e) 1
Q20. In the above figure, which of the following are the support vectors found by the maximum margin
SVM boundary?
(a) points:(4,2), (6,4), (4,6), (6,8) only
(b) points:(4,2), (4,6), (2,8) only
(c) points:(4,2), (6,4), (2,8) only
(d) points:(4,2), (6,4), (6,1), (6,8) only
(e) points:(4,2), (6,1), (6,1), (4,6) only

Use the following text for questions 21 to 23:

Consider the shown (3x2) game world that has 6 states A, B, C, D, E, F and four actions (right, left, up,
down). In every new episode, the game starts by choosing a random state and ends when state F is
reached, for which the player receives a reward of +10. For all other actions that do not lead to state F, the
reward is -1. Q0, shown below, is the Q function after initial training with the Q-learning algorithm.
Q21. Using Q0 as a starting point, what is the updated Q value after taking the action (B, right)?
(a) 3.75
(b) 3.5
(c) 2.75
(d) 2.5
(e) None of the above
Q22. Using Q0 as a starting point and a greedy decision-policy, which action is supposed to be taken from
state D?
(a) up
(b) right
(c) down
(d) left
(e) Any of the possible actions
Q23. Using Q0 as a starting point and an ε-greedy decision-policy, which action is supposed to be taken from
state D?
(a) down
(b) up
(c) right
(d) left
(e) Any of the possible actions
Q24. Given the following confusion matrix, which of the following is true?
(a) accuracy = 0.78
(b) accuracy = 0.90
(c) accuracy = 0.83
(d) accuracy = 0.80
(e) accuracy = 0.89

Use the following text for questions 25 to 26:

Given the shown data points comprising two classes represented as diamonds and circles, two different
classification algorithms produced the two shown boundary lines, “Boundary 1” and “Boundary 2”, for which
the diamond class is above the lines and the circle class is below the lines.
Q25. What is the total “0-1 loss function” for boundary 1?

(a) 8
(b) 7
(c) 6
(d) 4
(e) 5
Q26. Based on the “0-1 loss function”, which boundary line is considered to be better?
(a) Boundary line 2
(b) The two boundary lines cannot be compared
(c) Boundary line 1
(d) The two boundary lines are the same
Q27. Consider the following sample data points shown in the table below, which represent two classes,
namely class A and class B. The same data points are plotted in the associated graph, where class A is
represented by the symbol (⚫) while class B is represented by the symbol (♦).
We want to classify any new point using the k-NN classifier with the Manhattan distance instead
of the Euclidean distance as the distance metric, where the Manhattan distance (D) between two points (x1, y1)
and (x2, y2) is defined as follows:
D = |x1 − x2|+|y1 − y2|
What would be the class of the new point (−1,+1) for the following values of k: 3, 5, and 10?
(a) B, B, A
(b) B, B, B
(c) A, B, A
(d) A, A, A
(e) B, A, A
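A Python sketch of k-NN voting with the Manhattan distance. The exam's table is not reproduced, so the five training points below are hypothetical. `sorted()` is stable, so points at equal distance keep their list order, and `Counter.most_common` breaks vote ties in favor of the label encountered first (here, the nearer one).

```python
from collections import Counter


def manhattan(p, q):
    """D = |x1 - x2| + |y1 - y2| (generalized to any number of coordinates)."""
    return sum(abs(a - b) for a, b in zip(p, q))


def knn_predict(query, data, k, dist=manhattan):
    """Majority label among the k training points nearest to query.

    data is a list of (point, label) pairs.
    """
    neighbours = sorted(data, key=lambda item: dist(query, item[0]))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Hypothetical training data (not the exam's table).
DATA = [((0, 1), "A"), ((-1, 0), "B"), ((-2, 2), "B"), ((3, 3), "A"), ((4, 4), "A")]
```

For the query (−1, 1) the distances are 1, 1, 2, 6 and 8, so k = 1, 3, 5 give "A", "B", "A": the predicted class can flip as k grows.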
Use the following figure for question 28:
Q28. Assuming the use of a single simple perceptron whose output is equal to 1 if $\sum_i w_i x_i + b \ge 0$, what
will be the correct values of the weights for this perceptron if it classifies any point to the left of line AB
as 1?
(a) w2 = −1, w1 = 1, b = 0
(b) w2 = 0, w1 = −1, b = 1
(c) w2 = 1, w1 = 1, b = 0
(d) w2 = 1, w1 = 0, b = 1
Q29. A machine learning model was trained on some training data and tested on testing data. The error
rate on training set is low and that on testing set was high. Which of the following statements is TRUE:
(a) The model is overfitting on testing data
(b) The model is underfitting on testing data
(c) The model is overfitting on training data
(d) The model is underfitting on training data
(e) None of the options are correct
Q30. In terms of the bias-variance trade-off, a 1-Nearest Neighbor classifier has ____ a 25-Nearest Neighbor
classifier.
(a) higher variance and lower bias than
(b) lower variance and higher bias than
(c) lower variance and lower bias than
(d) higher variance and higher bias than
(e) the same variance and bias as
Q31. Which of the following statements is TRUE when considering the perceptron algorithm?
(a) A single perceptron can compute the XOR function.
(b) If while running the perceptron algorithm we make one pass through the data and make a single
classification mistake, the algorithm has converged.
(c) Given a linearly separable dataset, the perceptron algorithm is guaranteed to find a max-margin
hyperplane.
(d) A perceptron is guaranteed to learn a separating decision boundary for a separable dataset within a
finite number of training steps.
(e) All of the above statements are true.
Q32. Consider a k-NN classifier. Based on the graph below, which shows the error rate vs. the value of k, at what
value of k does k-NN perform optimally?
(a) 4
(b) 40
(c) 3
(d) 10
(e) 20
Q33. Given the following training data samples with two classes (circles and triangles), which of the
following classifiers can we use to separate the two classes with 100% training accuracy?
(a) SVM with a polynomial kernel (polynomial degree is 2)

(b) The two classes cannot be separated using any of the given classifiers.
(c) The Perceptron algorithm (single perceptron)
(d) K-Nearest Neighbor classifier (K = 20)
(e) SVM with a polynomial kernel (polynomial degree is 1)
Q34. In the figure below, there are 5 samples from a 1-dimensional dataset (i.e., 1 feature only, labeled x).
Assume that all the samples belong to two classes only, “triangle” and “square”. The samples are used to
train a Support Vector Classifier (SVC). Where does the decision boundary obtained by the hard-margin
SVC lie?
(a) x = -0.75
(b) x = 0
(c) x = -0.5
(d) x = 1
(e) x=-1
Q35. Which of the following machine learning applications is considered as unsupervised?
(a) Grouping similar documents together according to their content.
(b) Identifying types of traffic signs by an autonomous driving car.
(c) Diagnosing cancer patients based on example patients, including both malignant and benign cases.
(d) Predicting stock market based on historical stock market data.
(e) Predicting customers’ service time in a bank based on historical data of service times in this bank.
Q36. Given the dataset below, where 5 samples are labeled as squares and diamonds, which of the given
figures shows the correct decision boundary for a 1-Nearest Neighbor classifier trained on this dataset?
(a) Figure 2
(b) Figure 4
(c) Figure 1
(d) Figure 5
(e) Figure 3
Q37. Machine learning approaches can be categorized as:

(a) Reinforcement and conductive learning
(b) Supervised, unsupervised and reinforcement learning
(c) Learning by search and learning by doing
(d) Deductive and inductive learning
(e) Rote learning and deep learning
Q38. The table below provides a training dataset containing six examples with their labels (Red and Green).
These examples are used to train a k-Nearest Neighbor classifier. Using the Euclidean distance, identify
which class this classifier will predict for the new point “T” when k = 1 and k = 3, respectively.
(a) Green, Red
(b) The test point cannot be predicted using these values of k.
(c) Red, Red
(d) Red, Green
(e) Green, Green
Q39. In k-fold cross-validation, the dataset is randomly divided into k equal sets, where ___.
(a) 2k sets are used for training and 1 set is used for testing
(b) 1 random set is used for training and the same random set is used for testing
(c) 1 random set is used for training and a different 1 random set is used for testing
(d) k-1 sets are used for training and 1 set is used for testing
(e) 1 set is used for training and k-1 sets are used for testing
Q40. Suppose, in order to train and test a machine learning model, you split the dataset into training and
testing sets. You observe that the model yields 30% accuracy on the training set and 30% accuracy on the
testing set. Which of the following statements is TRUE about this model?
(a) The model is overfitting.
(b) The model achieves the best fit possible like all machine learning models.
(c) The model has high variance.
(d) The model is underfitting.
(e) None of the above statements is true.
Q41. Which of the following statements is TRUE when considering k-Means Clustering algorithm?
(a) It takes a long time to train the algorithm compared to other AI algorithms.
(b) It is an example of unsupervised machine learning algorithms.
(c) It is equivalent to the K-Nearest Neighbor algorithm.
(d) Increasing the value of k always leads to a better result.
(e) It is an example of supervised machine learning algorithms.
Q42. Suppose we trained a k-Nearest Neighbor classifier where k = 3. The dataset is divided into a training
set of 90 points and a testing set of 10 points. What is the total number of distance calculations carried out
during the testing phase?
(a) 1000
(b) 30
(c) 900
(d) 190
(e) 270
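Brute-force k-NN computes the distance from every test point to every training point, whatever the value of k. A one-line Python sketch, assuming no search index such as a k-d tree is used:

```python
def knn_distance_count(n_train, n_test):
    """Distances computed by brute-force k-NN at test time (independent of k)."""
    return n_train * n_test


count = knn_distance_count(90, 10)  # 900
```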
Q43. Suppose we tested a binary classifier which predicts two classes, namely “+” and “-”. The testing result
is given in the confusion matrix below. What are the values of precision, recall and accuracy computed
based on the given confusion matrix?
(a) Precision = 0.75, recall = 0.80, accuracy = 1.0
(b) Precision = 0.80, recall = 0.75, accuracy = 0.90
(c) Precision = 0.80, recall = 0.90, accuracy = 0.75
(d) Precision = 0.90, recall = 0.80, accuracy = 0.75
(e) Precision = 0.75, recall = 0.90, accuracy = 0.80
Q44. Which of the following statements is TRUE about AI (Artificial Intelligence) and ML (Machine
Learning)?
(a) AI and ML are not related to each other.
(b) Like AI, ML has to be explicitly programmed to produce the expected results.
(c) AI has a very wide range and scope while ML has limited scope.
(d) AI and ML are subfields of Deep Learning.
(e) ML cannot deal with structured data.
Q45. In the figure below, there are 10 samples from a 1-dimensional dataset (i.e., 1 feature only, labeled p).
Assume that all the samples belong to two classes only, namely “X” and “O”. These samples are used to
train a K-Nearest Neighbor classifier. In the case of ties, we prefer class “X”. Suppose we define the
training error rate as the fraction of data points incorrectly classified.
What is the training error rate of a 1-Nearest-Neighbor classifier trained on this data?
(a) 0.3
(b) 1
(c) 0.1
(d) 0
(e) 0.5
Q46. Referring to question 45 (previous question), what is the training error rate of a 2-Nearest-Neighbor
classifier trained on the same data?
(a) 0.1
(b) 0.3
(c) 1
(d) 0.5
(e) 0
Q47. In 10-fold cross validation, the dataset is randomly divided into 10 equal sets, where
(a) 5 sets are used for training and 5 sets are used for testing.
(b) 9 sets are used for training and 1 set is used for testing.
(c) 1 random set is used for training and a different 1 random set is used for testing.
(d) 1 set is used for training and 9 sets are used for testing.
(e) 1 random set is used for training and the same random set is used for testing.
Q48. Consider a k-NN classifier. Based on the graph below, which shows the error rate vs. the value of k, at what
value of k does k-NN perform optimally?
(a) k=10
(b) k=29
(c) k=9
(d) k=22
(e) k=20
Q49. Which of the following statements is most accurate when considering the k-NN algorithm?
(a) A very large value of k always leads to a more accurate classification.
(b) A very small value of k makes the algorithm highly sensitive to noisy data (e.g., outliers).
(c) A very small value of k makes the algorithm less sensitive to noisy data (e.g., outliers).
(d) A very large value of k makes the algorithm highly sensitive to noisy data (e.g., outliers).
(e) None of the above.
Q50. Suppose we have two machine learning models, named Model1 and Model2 respectively. These
models are trained and tested using some data. The error rate of Model1 on training set is low and that
on testing set is high. The error rate of Model 2 is high on both the training set and testing set. Which of
the following statements is most likely to be TRUE:
(a) Model1 is underfitting on training data, Model2 is overfitting on training data.
(b) Model1 is overfitting on testing data, Model2 is overfitting on testing data.
(c) Model1 is overfitting on training data, Model2 is underfitting on training data.
(d) Model1 is underfitting on testing data, Model2 is underfitting on testing data.
Q51. Using k-means algorithm and Euclidean distance for the clustering, some training data were
grouped into three clusters. Suppose the centroids for the three learnt clusters are C1, C2 and C3. Let the
current cluster centroids be C1 = (2,10), C2 = (6,6), C3 = (1.5,3.5). If there are three new test points A1 =
(2.5, 10), A2 = (2,4), A3 = (4,9), which cluster centroid would each of the given test points be assigned to?
(a) A1 to C2, A2 to C3, A3 to C1
(b) A1 to C3, A2 to C2, A3 to C3
(c) A1 to C2, A2 to C3, A3 to C3
(d) A1 to C1, A2 to C3, A3 to C1
(e) A1 to C1, A2 to C1, A3 to C2
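The assignment step can be checked in Python: each test point goes to the centroid at the smallest Euclidean distance (`math.dist`, Python 3.8 or later). This sketch gives A1 → C1, A2 → C3, A3 → C1.

```python
import math

CENTROIDS = {"C1": (2, 10), "C2": (6, 6), "C3": (1.5, 3.5)}
TEST_POINTS = {"A1": (2.5, 10), "A2": (2, 4), "A3": (4, 9)}


def nearest_centroid(point, centroids):
    """Name of the centroid at the smallest Euclidean distance from point."""
    return min(centroids, key=lambda name: math.dist(point, centroids[name]))


assignment = {name: nearest_centroid(p, CENTROIDS) for name, p in TEST_POINTS.items()}
# {'A1': 'C1', 'A2': 'C3', 'A3': 'C1'}
```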
Q52. Which of the following statement(s) are true about SVM kernels?
i. Kernel functions map low dimensional data to a higher dimensional space
ii. Kernel functions stretch the space to separate different classes
iii. Kernel functions always cause SVM overfitting
(a) Statement (ii) is correct only
(b) Statements (i) and (ii) are correct
(c) Statements (i), (ii) and (iii) are correct
(d) Statement (i) is correct only
(e) Statements (ii) and (iii) are correct
Q53. When are you more likely to consider using SVM over other classifiers?
(a) When there is a need to increase data points (more training data).
(b) When there is a need to decrease data points (remove some training data).
(c) When there is a need to calculate more variables that are related to the training data to have a better
classification.
(d) When other classi ers fail due to the lack of hidden structure in the training data.
Q54. The Maximum Margin Classifier is described as:
(a) The classifier that finds the smallest distance among support vectors of different classes.
(b) The classifier that finds some arbitrary line that can classify different sets of data.
(c) The classifier that uses a kernel to find a good line that separates different classes.
(d) The classifier that finds the widest distance among support vectors of different classes.
Q55. Identify the type of learning in which labeled training data is used:
(a) Supervised Learning
(b) Unsupervised Learning
(c) Discovery based learning
(d) Learning by induction
Q56. Consider performing K-Means Clustering on a one-dimensional dataset containing five sample points:
p1 = 5, p2 = 7, p3 = 10, p4 = 12 and p5 = 13. Using k = 2, the initial centroids are c1 = 3.0 and c2 = 15.0.
What are the initial cluster assignments? (Which sample points are in cluster c1 and which sample points
are in cluster c2?)
(a) C1 = {P1}, C2 = {P2, P3, P4, P5}
(b) C1 = {P1, P2, P3}, C2 = {P4, P5}
(c) C1 = {P1, P2}, C2 = {P3, P4, P5}
(d) C1={},C2={P1,P2,P3,P4,P5}
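In one dimension the Euclidean distance is just |p − c|. The initial assignment step in Python (ties, which do not occur here, go to c1 in this sketch):

```python
points = {"P1": 5, "P2": 7, "P3": 10, "P4": 12, "P5": 13}
c1, c2 = 3.0, 15.0

# Assign each point to the nearer initial centroid (ties go to c1).
C1 = [name for name, p in points.items() if abs(p - c1) <= abs(p - c2)]
C2 = [name for name, p in points.items() if abs(p - c1) > abs(p - c2)]
# C1 == ['P1', 'P2'], C2 == ['P3', 'P4', 'P5']
```

For example, P3 = 10 is 7 away from c1 but only 5 away from c2, so it starts in C2.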
Q57. In the figure below, there are 12 samples from a 1-dimensional dataset (i.e., 1 feature only, labeled p).
Assume that all the samples belong to two classes only, namely “X” and “O”. In the figure, if more than one
sample in the training data has the same feature value, the samples are drawn vertically stacked. For example,
there are two training samples at each of the values 4, 6 and 10, as shown in the figure. Using the given
figure as the training data, indicate the correct predictions resulting from the k-NN classification of a new
test point at p=7 (not shown in the figure) when we use k = 1, k = 3 and k = 7.
(a) “O” (if k=1), “O” (if k=3), “O” (if k=7)
(b) “X” (if k=1), “O” (if k=3), “X” (if k=7)
(c) “X” (if k=1), “X” (if k=3), “O” (if k=7)
(d) “X” (if k=1), “X” (if k=3), “X” (if k=7)
(e) “O” (if k=1), “X” (if k=3), “X” (if k=7)
Q58. Which of the following statements is true about the k-Nearest Neighbors classifier?
(a) k-NN requires similar computation time in testing as in training.
(b) k-NN requires higher computation time in testing than in training.
(c) k-NN requires higher computation time in training than in testing.
(d) k-NN is not a classifier but rather an efficient clustering technique.
Q59. Which of the following statements is True about AI (Artificial Intelligence) and ML (Machine Learning)?
a. AI is considered to be a subfield of ML
b. ML is considered to be a subfield of AI
c. ML cannot deal with structured data.
d. AI cannot deal with structured data, but ML can.
e. AI and ML are not related to each other.
Q60. Machine learning can be divided into two main classes:

a. Deep learning and neural network
b. Supervised learning and unsupervised learning
c. K-means and k- Nearest Neighbor
d. Overfitting learning and underfitting learning
e. Learning by search and learning by doing.
Q61. Which statement is true when considering the k-Nearest Neighbor algorithm?
a. It can be used for classification.
b. It is an example of unsupervised learning algorithms.
c. It takes a long time to train the algorithm compared to other AI algorithms.
d. It is equivalent to the K-means algorithm.
e. Increasing the value of k always leads to a better result.
Q62. In four-fold cross validation, the dataset is divided into 4 equal sets, where
a. 2 sets are used for training and 2 sets are used for testing.
b. 4 sets are used for both the training and testing.
c. 1 set is used for training and 3 sets are used for testing.
d. 3 sets are used for training and 1 set is used for testing.
e. None of the given answers is correct.
Q63. In Support Vector Machine (SVM), transforming data to a higher dimension is done using a:
a. maximal-margin classifier.
b. soft margin.
c. Kernel function.
d. perceptron algorithm.
e. None of the given answers is correct.
Q64. Consider the data shown below, in which there are two classes of data indicated by O and X. A new
data point (0,0) is to be classified using the K-NN algorithm. Which of the following is true?
a. 1-NN classifies the new data as X
b. 5-NN classifies the new data as O
c. 3-NN classifies the new data as X
d. 1-NN cannot be used to classify the data since there are lots of points.
e. 1-NN classifies the new data as O
Q65. If we use the Euclidean distance to find the error in a perceptron learning algorithm, the error function
minimized by gradient descent is given by:
A: $E = \frac{1}{2}\sum_{i=1}^{n} (t_i - O_i)$
B: $E = \frac{1}{2}\sum_{i=1}^{n} (t_i + O_i)$
C: $E = \frac{1}{2}\sum_{i=1}^{n} (t_i - O_i)^2$
D: $E = \frac{1}{2}\sum_{i=1}^{n} (t_i + O_i)^2$
E: None of the above
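A Python sketch of batch gradient descent on the squared-error form E = ½ Σ (t_i − O_i)², the form listed as option C. It assumes a linear output O_i = w · x_i, as in the delta rule, because a hard-threshold output has no useful gradient; the data are hypothetical points on t = 1 + 2x.

```python
def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def sse(w, data):
    """E = 1/2 * sum_i (t_i - O_i)^2 with linear output O_i = w . x_i."""
    return 0.5 * sum((t - dot(w, x)) ** 2 for x, t in data)


def gradient_descent(data, w, lr=0.1, epochs=200):
    """Batch gradient descent; dE/dw_j = -sum_i (t_i - O_i) * x_ij."""
    w = list(w)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, t in data:
            err = t - dot(w, x)
            for j, xj in enumerate(x):
                grad[j] -= err * xj
        w = [wj - lr * gj for wj, gj in zip(w, grad)]   # step downhill
    return w


# Hypothetical data on t = 1 + 2x, each input written as (1, x) for the bias.
DATA = [((1, 0), 1), ((1, 1), 3), ((1, 2), 5)]
```

Starting from w = (0, 0), where E = 17.5, `gradient_descent(DATA, (0, 0))` ends very close to w = (1, 2), where E is essentially 0. Squaring makes every error count positively, so positive and negative errors cannot cancel as they would in option A.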

Q66. Assume an AI model is not complex enough to accurately capture the relationships between
a dataset’s features and a target variable (underfitting). What can be concluded when we compare
the training set error and the testing set error?
a. The training error is expected to be larger than the testing error.
b. The training error is expected to be less than the testing error.
c. The training error is expected to be equal to the testing error.
d. The relation between the training error and the testing error cannot be concluded.
Q67. If a perceptron has two inputs and that are associated with weights and respectively.
Moreover, assume that the bias input to the perceptron is and the activation function is given by the
function ( ).Then, the output of the perceptron is given by:
a. =h( 1+ 2+ 0)
b. = 1 1 + 2 2 + 0
c. = h( 1 1 + 2 2 + 0)
d. =h( 1 1+ 2 2− 0)
e. = h( 1 1) + h( 2 2) + h( 0)
Q68. In the figure below, the curve labeled “Training Error” shows how the error behaves when
training an AI model using gradient descent. The curve labeled “Test Error” shows the corresponding error on
the testing set. The dotted line shows the “Best Fit” point. Which of the following statements is true given
the figure?
a. Underfitting occurs if we move from the “Best Fit” point in the direction of B, and overfitting occurs if we
move from the “Best Fit” point in the direction of A.
b. Both underfitting and overfitting occur as we move in the direction of A.
c. Both underfitting and overfitting occur as we move in the direction of B.
d. Underfitting occurs if we move from the “Best Fit” point in the direction of A, and overfitting occurs if we
move from the “Best Fit” point in the direction of B.
e. None of the statements shown is true.
Q69. Given the following data points and labels shown below:
a) Ignoring the point “T” with class labeled as “New”, draw on the graph above the decision boundaries
for 1-Nearest Neighbors
Answer:
b) Identify which class a k-NN classifier will predict if the point labeled “New” is classified using the values of
k that are shown in the table below:
c) Indicate the effect of removing the following points (only one at a time) on the 1-NN decision boundary
determined in (a):
Q70. In this problem, we will be dealing with one-dimensional data represented by a set of points. Each
point has only one feature, and its label has two classes, namely 0 or 1. We show you the points
on the axis, labeled by their class values; we also give you a table of values.
a. In the figure below, draw the output of a 1-Nearest-Neighbour (1-NN) classifier over the range indicated
in the field. (Show the output of the classifier by indicating the values at which the classifier output changes
and what the output is.)
b. In the figure below, draw the output of a 5-Nearest-Neighbour (5-NN) classifier over the range indicated
in the field. (Show the output of the classifier by indicating the values at which the classifier output changes
and what the output is.)
Old Quizzes Questions
Old HWs Questions