
ML Final 2020 Answer (V2.0)
By Dr. Hanaa Bayoumi
Solved by Ahmed Sallam. If you find any mistakes, please contact me.
Please remember us in your prayers.
Question 1:
You will find the mistakes in red.

The question gives you several plots of classifier decision boundaries representing Logistic Regression, KNN, and Decision Tree. Next to each plot, write the name of the algorithm and the number of data points the model classified incorrectly (wrong classifications).

Decision tree: number of mistakes = 2
Logistic regression: number of mistakes = 6
KNN: no mistakes
Question 2: CNN is outside the course scope.
Question 3: In SVM, show how to express the margin in terms of the weight vector (W).
Simply write the proof of the margin with respect to the weight (W).
• The margin is the distance between the decision boundary (hyperplane) and the nearest data point from either class. The margin is typically expressed in terms of the weight vector (W) and the bias term (b) of the hyperplane.
• Hyperplane: W · X + b = 0

Keep in mind: you must draw the hyperplane and mark the points x1 and x2.
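For reference, a minimal sketch of the standard derivation, assuming x1 and x2 are support vectors lying on the two margin hyperplanes W · x + b = ±1:

```latex
% Standard SVM margin derivation; x_1 and x_2 are support vectors on
% the positive and negative margin hyperplanes respectively.
\begin{align*}
W \cdot x_1 + b &= +1 \\
W \cdot x_2 + b &= -1 \\
W \cdot (x_1 - x_2) &= 2 && \text{(subtracting)} \\
\text{margin} &= \frac{W}{\lVert W \rVert} \cdot (x_1 - x_2)
  = \frac{2}{\lVert W \rVert} && \text{(projection onto the unit normal)}
\end{align*}
```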


Question 4:
1) Give a one-sentence reason:
a) Why we might prefer Decision Tree learning over Logistic Regression for a particular learning task.
1) When the relationship between input features and the target variable is nonlinear and involves complex interactions, as Decision Trees can capture such nonlinearity and interactions more effectively (see the sketch below).
2) If we want our model to produce results and rules easily interpreted by humans.
1- When the relationship between the features and the target is nonlinear with many complex interactions, a DT will be more effective at handling that data.
2- If we want the model to produce results and rules that a human can easily understand.
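A minimal runnable sketch of point 1, assuming scikit-learn; the XOR-style data here is illustrative, not taken from the exam:

```python
# A decision tree handles XOR-like (nonlinear) labels that a linear
# logistic regression cannot separate.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(400, 2))
y = (X[:, 0] * X[:, 1] > 0).astype(int)   # XOR-style labels: nonlinear

print(LogisticRegression().fit(X, y).score(X, y))      # ~0.5, chance level
print(DecisionTreeClassifier().fit(X, y).score(X, y))  # ~1.0
```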
b) Why we might prefer Logistic Regression over Naive Bayes for a particular learning task.
• If we have lots of training data, Logistic Regression will be better; likewise if we know that the conditional independence assumptions made by Naive Bayes are not true for our problem.
1 - If we have a lot of data, Logistic Regression will handle it better.
2 - If conditional independence does not hold, Naive Bayes may get it wrong, so we resort to Logistic Regression.
c) Why we need to re-estimate probabilities (smoothing) in the Naive Bayes classifier.
• To prevent zero probabilities and improve the model's accuracy: if a feature value in the test data was never observed in the training data, its estimated probability would be zero, and that single factor would zero out the entire class probability.
- We need to re-estimate probabilities (smoothing) to prevent any probability from coming out as zero and to improve the model's accuracy, avoiding the situation where a feature value appears in the test data but not in the training data, which would produce zero probabilities.
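A common way to do this is Laplace (add-one) smoothing; a minimal sketch of the re-estimated conditional probability:

```latex
% Laplace (add-one) smoothing for a Naive Bayes conditional probability.
% count(x_i, c): training examples of class c with feature value x_i;
% N_c: number of training examples of class c;
% k: number of possible values of the feature.
\[
P(x_i \mid c) = \frac{\mathrm{count}(x_i, c) + 1}{N_c + k}
\]
```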

3) CNN is outside the course scope.

- Logistic regression: logistic regression has linear decision boundaries, so it may not be able to
correctly separate the training data because the data is not linearly separable.
- SVM with kernel: it can be a non-linear decision boundary. It can capture more complex
relationships in this data, making it suitable for the classes that are not linearly separable.
- Decision tree: can model complex decision boundaries by making splits along different features.
Decision trees can accommodate non-linear relationships in the data.
- 3-nearest-neighbor classifier: may not be able to correctly separate this training data because it relies
on the local neighborhood of points. In this case where different classes are mixed closely, it might
make errors in classification.
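A runnable sketch of this comparison, assuming scikit-learn; make_moons is an illustrative stand-in for the exam's dataset, which is not reproduced here:

```python
# Compare the four classifiers on non-linearly-separable toy data.
from sklearn.datasets import make_moons
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier

X, y = make_moons(n_samples=200, noise=0.25, random_state=0)

models = {
    "Logistic regression": LogisticRegression(),
    "SVM with RBF kernel": SVC(kernel="rbf"),
    "Decision tree": DecisionTreeClassifier(),
    "3-nearest-neighbor": KNeighborsClassifier(n_neighbors=3),
}
for name, model in models.items():
    print(name, model.fit(X, y).score(X, y))  # training accuracy
```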
5) Describe the difference between parametric methods and nonparametric methods.
Parametric methods make assumptions about the functional form of the underlying data distribution
and have a fixed number of parameters, while nonparametric methods do not make explicit
assumptions about the distribution and can adapt to more complex patterns without a predetermined
number of parameters.
6) What is the similarity and difference between feature selection and dimensionality reduction?
Similarity: both reduce the number of input dimensions the model works with. Difference: feature selection involves choosing a subset of relevant features from the original feature set, while dimensionality reduction transforms the data into a lower-dimensional space, preserving essential information by combining or projecting the original features.
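A minimal sketch of the contrast, assuming scikit-learn (the dataset and k values are illustrative):

```python
# Feature selection keeps a subset of the original columns; dimensionality
# reduction (PCA) builds new columns from combinations of all of them.
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.decomposition import PCA

X, y = load_iris(return_X_y=True)                         # shape (150, 4)
X_sel = SelectKBest(f_classif, k=2).fit_transform(X, y)   # 2 original features
X_pca = PCA(n_components=2).fit_transform(X)              # 2 new combined axes
print(X_sel.shape, X_pca.shape)                           # (150, 2) (150, 2)
```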

It is not well separated because some points in each cluster are closer to points in another cluster than
to points in the same cluster.
Notice:
Parametric model:
- The form of the model is fixed in advance.
- These models have a fixed set of parameters that the model tries to find and calculate exactly, like Linear Regression: W · X + b = 0.
Non-parametric model:
- The data tell you what the fitted function looks like.
- They still have parameters, but we don't know how many in advance; the data tell the model how many it needs.
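A small sketch of this contrast, assuming scikit-learn (the data is synthetic and illustrative):

```python
# A parametric model has a fixed number of parameters regardless of how
# much data we have; a nonparametric model like KNN memorizes the data.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=500)

lin = LinearRegression().fit(X, y)
print(lin.coef_, lin.intercept_)   # always 3 weights + 1 bias

knn = KNeighborsRegressor(n_neighbors=3).fit(X, y)
# No weight vector here: KNN stores all 500 training points, so its
# effective capacity grows with the dataset.
print(knn.predict(X[:2]))
```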
10) Most machine learning approaches use training sets, test sets and validation sets to derive
models. Describe the role each of the three sets plays!
- Explain them in whatever way works best for you and that you understand.
Training Set: is used to train the machine learning model. It consists of a labeled dataset where the
algorithm learns the patterns, relationships, and features present in the data. The model adjusts its
parameters during training to minimize the difference between its predictions and the actual labels in
the training set.
Validation Set: is used to fine-tune the hyperparameters of the model and to assess its performance
during training. The model is not directly trained on the validation set, but its performance on this set
helps in selecting the best model architecture, tuning parameters, and preventing overfitting. It
provides an unbiased evaluation before testing on unseen data.
Test Set: is a completely independent dataset that the model has not seen during training or validation.
It is used to evaluate the final performance of the trained model. It simulates real-world scenarios
where the model encounters new, unseen data. Evaluating on it provides an unbiased estimate of the
model's generalization performance and helps assess its ability to make accurate predictions on new,
unseen examples.
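A minimal sketch of carving out the three sets, assuming scikit-learn (the 60/20/20 ratios are illustrative):

```python
# Split a dataset into 60% train, 20% validation, 20% test.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_tmp, y_train, y_tmp = train_test_split(
    X, y, test_size=0.4, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(
    X_tmp, y_tmp, test_size=0.5, random_state=0)
print(len(X_train), len(X_val), len(X_test))  # 90 30 30
```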

Question 5: Use complete-link agglomerative clustering to group the data described by the following distance matrix. Show the dendrogram.

Notice: in complete-link agglomerative clustering, the distance between two clusters equals the greatest distance from any member of one cluster to any member of the other cluster.

        A   B   C   D
    A   0   1   4   5
    B       0   2   6
    C           0   3
    D               0

Solution:
First, we choose the minimum distance to form the first cluster: (A, B) at distance 1.
Distance between cluster (A, B) and C = max(AC, BC) = max(4, 2) = 4
Distance between cluster (A, B) and D = max(AD, BD) = max(5, 6) = 6

            (A, B)   C   D
    (A, B)    0      4   6
    C                0   3
    D                    0

Second, we choose the minimum distance to form the next cluster: (C, D) at distance 3.
Distance between cluster (C, D) and (A, B) = max(d(C, (A, B)), d(D, (A, B))) = max(4, 6) = 6

Finally, we merge (A, B) and (C, D) at distance 6, giving the single cluster ((A, B), (C, D)).
Dendrogram: [figure] A and B merge at height 1, C and D merge at height 3, and the two clusters merge at height 6.
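A sketch that reproduces this clustering and the dendrogram, assuming SciPy and matplotlib:

```python
# Complete-link clustering of the 4-point distance matrix with SciPy.
# The condensed distance vector lists the upper triangle row by row:
# AB, AC, AD, BC, BD, CD.
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram

dists = [1, 4, 5, 2, 6, 3]                # AB, AC, AD, BC, BD, CD
Z = linkage(dists, method="complete")
print(Z)  # merges at heights 1.0 (A,B), 3.0 (C,D), 6.0 (all four)

dendrogram(Z, labels=["A", "B", "C", "D"])
plt.show()
```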

Question 6:
You are a robot in an animal shelter, and must learn to
discriminate Dogs from Cats. You choose to learn a
Naive Bayes classifier. You are given the following
examples:
a) Construct a classifier using Naive Bayes to discriminate Dogs from Cats.
P(Dog) = 4/8        P(Cat) = 4/8

    Sound   Dog   Cat
    Meow    1/4   3/4
    Bark    3/4   1/4

    Fur     Dog   Cat
    Coarse  3/4   1/4
    Fine    1/4   3/4

    Color   Dog   Cat
    Brown   2/4   2/4
    Black   2/4   2/4

b) Consider a new example (Sound=Bark ∧ Fur=Coarse ∧ Color=Brown). Which class does it belong to?
P(Class=Dog) × P(Sound=Bark|Class=Dog) × P(Fur=Coarse|Class=Dog) × P(Color=Brown|Class=Dog)
= (4/8) × (3/4) × (3/4) × (2/4) = 9/64

P(Class=Cat) × P(Sound=Bark|Class=Cat) × P(Fur=Coarse|Class=Cat) × P(Color=Brown|Class=Cat)
= (4/8) × (1/4) × (1/4) × (2/4) = 1/64

Normalize the probabilities:
P(Class=Dog | Sound=Bark ∧ Fur=Coarse ∧ Color=Brown) = (9/64) / (9/64 + 1/64) = 9/10
P(Class=Cat | Sound=Bark ∧ Fur=Coarse ∧ Color=Brown) = (1/64) / (9/64 + 1/64) = 1/10
So, we label (Sound=Bark ∧ Fur=Coarse ∧ Color=Brown) as Dog.
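A quick sketch to verify the arithmetic with exact fractions in Python:

```python
# Verify the Naive Bayes hand computation with exact fractions.
from fractions import Fraction as F

p_dog = F(4, 8) * F(3, 4) * F(3, 4) * F(2, 4)   # prior * Bark * Coarse * Brown
p_cat = F(4, 8) * F(1, 4) * F(1, 4) * F(2, 4)
print(p_dog, p_cat)                  # 9/64 1/64
print(p_dog / (p_dog + p_cat))       # 9/10 -> classify as Dog
```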

Done
