ML Unit 3
Unlike regression, the output variable of classification is a category rather than a numeric value, such as
"green or blue", "fruit or animal", etc. Since classification is a supervised
learning technique, it takes labelled input data, meaning each input comes with its
corresponding output.
The main goal of a classification algorithm is to identify the category of new observations,
and these algorithms are mainly used to predict outputs for categorical data.
The algorithm which implements classification on a dataset is known as a classifier. There
are two types of classification:
● Binary Classifier: If the classification problem has only two possible outcomes, it
is called binary classification. Examples: YES or NO, MALE or FEMALE, SPAM or
NOT SPAM, CAT or DOG, etc.
● Multi-class Classifier: If a classification problem has more than two outcomes, it
is called multi-class classification. Examples: classification of types of crops,
classification of types of music.
Learners in Classification Problems:
In classification problems, there are two types of learners:
● Lazy Learners: A lazy learner first stores the training dataset and waits until it
receives the test dataset. Classification is then done on the basis of
the most closely related data stored in the training dataset. Lazy learners take less time in
training but more time in prediction. Examples: K-NN algorithm, case-based reasoning.
● Eager Learners: Eager learners develop a classification model from the training
dataset before receiving a test dataset. Opposite to lazy learners, eager learners take
more time in learning and less time in prediction. Examples: Decision Trees, Naïve
Bayes, ANN.
KNN :
● The K-Nearest Neighbors (KNN) algorithm is a supervised machine learning method
employed to tackle classification and regression problems.
● KNN is one of the most basic yet essential classification algorithms in machine
learning. It belongs to the supervised learning domain and finds wide application in
pattern recognition, data mining, and intrusion detection.
● K-NN is a non-parametric algorithm, which means it does not make any assumptions
about the underlying data.
● It is also called a lazy learner algorithm because it does not learn from the training set
immediately; instead it stores the dataset and, at the time of classification, performs
an action on the dataset.
● At the training phase, the KNN algorithm just stores the dataset; when it gets new data,
it classifies that data into the category most similar to it.
Manhattan Distance
Manhattan Distance metric is generally used when we are interested in the total distance
travelled by the object instead of the displacement. This metric is calculated by summing the
absolute difference between the coordinates of the points in n-dimensions.
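In symbols, for points x and y in n dimensions, this is the standard formula
d(x, y) = \sum_{i=1}^{n} |x_i - y_i|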
Minkowski Distance
We can say that the Euclidean, as well as the Manhattan distance, are special cases of the
Minkowski distance.
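For points x and y in n dimensions, the Minkowski distance is given by the standard formula
d(x, y) = \left( \sum_{i=1}^{n} |x_i - y_i|^p \right)^{1/p}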
From the formula above, we can see that when p = 2 it reduces to the formula for the
Euclidean distance, and when p = 1 we obtain the formula for the Manhattan distance.
● There is no particular way to determine the best value for "K", so we need to try some
values to find the best among them. The most preferred value for K is 5.
● A very low value of K, such as K=1 or K=2, can be noisy and make the model
sensitive to the effects of outliers.
Steps :
● Step 1: Selecting the optimal value of K
K represents the number of nearest neighbors that need to be considered while
making the prediction.
● Step 2: Calculating distance
To measure the similarity between the target and training data points, Euclidean distance
is used: the distance is calculated between each data point in the dataset and the target
point.
● Step 3: Finding Nearest Neighbors
The k data points with the smallest distances to the target point are the nearest
neighbors.
● Step 4: Voting for Classification or Taking Average for Regression
In a classification problem, the class label is determined by majority voting: the
class with the most occurrences among the K neighbors becomes the predicted class
for the target data point. In a regression problem, the prediction is calculated by
taking the average of the target values of the K nearest neighbors; this average
becomes the predicted output for the target data point. (A minimal sketch of these
steps follows below.)
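Here is a minimal from-scratch sketch of the four steps in Python (the function name knn_predict, the NumPy usage, and the toy data are illustrative assumptions, not part of the notes):

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_query, k=5):
    # Step 2: Euclidean distance from the query point to every training point
    distances = np.linalg.norm(X_train - x_query, axis=1)
    # Step 3: indices of the k training points nearest to the query
    nearest = np.argsort(distances)[:k]
    # Step 4: majority vote among the neighbors' class labels
    return Counter(y_train[nearest]).most_common(1)[0][0]

# Hypothetical toy data: two classes in two dimensions
X_train = np.array([[1.0, 2.0], [2.0, 3.0], [8.0, 8.0], [9.0, 10.0]])
y_train = np.array([0, 0, 1, 1])
print(knn_predict(X_train, y_train, np.array([1.5, 2.5]), k=3))  # prints 0
```

For regression, the last line of knn_predict would instead return y_train[nearest].mean(), the average of the neighbors' target values.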
Advantages of the KNN Algorithm
● Easy to implement – the complexity of the algorithm is not that high.
● Adapts easily – because KNN stores all the training data in memory, whenever a new
example or data point is added, the algorithm simply incorporates it, and it contributes
to future predictions as well.
● Few hyperparameters – the only values required when training a KNN algorithm are
k and the choice of distance metric.
Disadvantages of the KNN Algorithm
● Does not scale – KNN is considered a lazy algorithm: it defers all the work to
prediction time, which takes a lot of computing power as well as data storage. This
makes the algorithm both time-consuming and resource-exhausting.
● Curse of dimensionality – KNN suffers from the so-called peaking phenomenon: it is
affected by the curse of dimensionality, which means the algorithm has a hard time
classifying data points properly when the dimensionality is too high.
● Prone to overfitting – since the algorithm is affected by the curse of dimensionality,
it is prone to overfitting as well. Feature selection and dimensionality reduction
techniques are generally applied to deal with this problem.
LOGISTIC REGRESSION :
● Logistic regression is one of the most popular Machine Learning algorithms, which
comes under the Supervised Learning technique. It is used for predicting the
categorical dependent variable using a given set of independent variables.
● Logistic regression predicts the output of a categorical dependent variable. Therefore
the outcome must be a categorical or discrete value: Yes or No, 0 or 1,
True or False, etc. However, instead of giving the exact values 0 and 1, it gives
probabilistic values which lie between 0 and 1.
● Logistic Regression is much like Linear Regression except in how it is
used: Linear Regression is used for solving regression problems, whereas Logistic
Regression is used for solving classification problems.
● In Logistic Regression, instead of fitting a straight regression line, we fit an "S"-shaped
logistic function, which predicts two maximum values (0 and 1).
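For reference, the logistic (sigmoid) function behind this "S" shape is the standard
\sigma(z) = \frac{1}{1 + e^{-z}}, \quad z = b_0 + b_1 x_1 + \dots + b_n x_n
so the model outputs P(y = 1 | x) = \sigma(z), and a threshold (commonly 0.5) converts that probability into the 0/1 class label. A minimal sketch with scikit-learn (the one-feature toy data is a hypothetical example, not from the notes):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical toy data: hours studied -> pass (1) / fail (0)
X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]])
y = np.array([0, 0, 0, 1, 1, 1])

model = LogisticRegression().fit(X, y)
print(model.predict_proba([[3.5]]))  # probabilities for class 0 and class 1
print(model.predict([[3.5]]))        # hard 0/1 label after 0.5 thresholding
```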
Hyperparameters :
Parameters are the variables that are used by the Machine Learning algorithm for predicting
results based on the input historical data. They are estimated by an optimization
algorithm run by the Machine Learning algorithm itself; thus, these variables are not set or
hardcoded by the user. They are learned as part of model training.
Examples of parameters: the coefficients of the independent variables in Linear Regression
and Logistic Regression.
Hyperparameters are the variables that the user specifies while building the Machine
Learning model. Thus, hyperparameters are specified before the parameters are learned, or we
can say that hyperparameters are used to arrive at the optimal parameters of the model. The best
part about hyperparameters is that their values are decided by the user who is building the
model. Examples: max_depth in the Random Forest algorithm, k in the KNN classifier.
Grid Search tries every combination of the specified hyperparameters and their values,
calculates the performance of each combination, and selects the best values for the
hyperparameters. This can make the process time-consuming and expensive, depending on the
number of hyperparameters involved; a sketch is shown below.
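As an illustration, a minimal grid search over the two KNN hyperparameters mentioned above might look like this (assumes scikit-learn; the random toy data is a hypothetical stand-in for a real labelled training set):

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical toy data; in practice use your own labelled training set
X_train = np.random.rand(40, 2)
y_train = np.random.randint(0, 2, size=40)

# Every combination of k and distance metric below is trained and scored
param_grid = {
    "n_neighbors": [3, 5, 7, 9],
    "metric": ["euclidean", "manhattan"],
}
search = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)
print(search.best_params_, search.best_score_)
```

With 4 values of k and 2 metrics, the grid evaluates 4 × 2 = 8 combinations, each scored with 5-fold cross-validation.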
Evaluation Metrics for Classification :
1. Accuracy : Accuracy works great if there are an equal number of samples for each
class. For example, suppose we have 90% samples of class A and 10% samples of
class B in our training set. Then our model can reach 90% training accuracy simply
by predicting that every sample belongs to class A. If we test the same model on a
test set with 60% samples from class A and 40% from class B, the accuracy falls
to 60%.
Classification accuracy is good, but it can give a false sense of achieving high
performance: the problem arises because the probability of misclassifying the
minority-class samples is very high.
2. Confusion Matrix : A confusion matrix summarizes correct and incorrect predictions
for each class. For example, suppose a model evaluated on 1,000 data points gives:
● True Positive (TP) = 560, meaning the model correctly classified 560 positive
class data points.
● True Negative (TN) = 330, meaning the model correctly classified 330
negative class data points.
● False Positive (FP) = 60, meaning the model incorrectly classified 60 negative
class data points as belonging to the positive class.
● False Negative (FN) = 50, meaning the model incorrectly classified 50
positive class data points as belonging to the negative class.
3. Precision : Precision tells us how many of the cases predicted as positive actually
turned out to be positive: Precision = TP / (TP + FP).
4. Recall (sensitivity) : Recall tells us how many of the actual positive cases we were
able to predict correctly with our model: Recall = TP / (TP + FN).
5. F1 score : The F1 score gives a combined idea of the Precision and Recall metrics:
it is the harmonic mean of the two, F1 = 2 × (Precision × Recall) / (Precision + Recall),
and it is maximized when Precision equals Recall.
6. ROC-AUC : The Receiver Operating Characteristic (ROC) is a probability curve that
plots the TPR (True Positive Rate) against the FPR (False Positive Rate) at various
threshold values, separating the 'signal' from the 'noise'.
The Area Under the Curve (AUC) measures the ability of a classifier to
distinguish between classes; geometrically, it is the area enclosed between the ROC
curve and the X-axis.
● True Positive Rate : Also termed sensitivity. The True Positive Rate is the
proportion of positive data points that are correctly classified as positive,
with respect to all data points that are positive: TPR = TP / (TP + FN).
● True Negative Rate : Also termed specificity. The True Negative Rate is the
proportion of negative data points that are correctly classified as negative,
with respect to all data points that are negative: TNR = TN / (TN + FP).
● False Positive Rate : The False Positive Rate is the proportion of negative
data points that are mistakenly classified as positive, with respect to all data
points that are negative: FPR = FP / (FP + TN).
● False Negative Rate : The False Negative Rate is the proportion of positive
data points that are mistakenly classified as negative, with respect to all data
points that are positive: FNR = FN / (FN + TP).
● A true positive is an outcome where the model correctly predicts the positive
class. Similarly, a true negative is an outcome where the model correctly
predicts the negative class.
● A false positive is an outcome where the model incorrectly predicts the
positive class. And a false negative is an outcome where the model incorrectly
predicts the negative class.
7. Log loss : Log loss (logistic loss) or cross-entropy loss is one of the major metrics
for assessing the performance of a classification model. For a single sample with true
label y∈{0,1} and a probability estimate p=Pr(y=1), the log loss is:
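L(y, p) = -\left( y \log(p) + (1 - y) \log(1 - p) \right)
The loss is 0 for a perfect prediction and grows without bound as the predicted probability diverges from the true label.
As a worked check on the confusion-matrix example above (TP = 560, TN = 330, FP = 60, FN = 50), the derived metrics can be computed directly; the short script below is an illustrative sketch, not part of the original notes:

```python
TP, TN, FP, FN = 560, 330, 60, 50

accuracy  = (TP + TN) / (TP + TN + FP + FN)         # 890 / 1000 = 0.890
precision = TP / (TP + FP)                          # 560 / 620 ≈ 0.903
recall    = TP / (TP + FN)                          # 560 / 610 ≈ 0.918
f1 = 2 * precision * recall / (precision + recall)  # ≈ 0.911

print(f"accuracy={accuracy:.3f} precision={precision:.3f} "
      f"recall={recall:.3f} f1={f1:.3f}")
```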
ROC
The Receiver Operating Characteristic (ROC) curve is an evaluation metric for binary
classification problems. It is a probability curve that plots the TPR against the FPR at various
threshold values, essentially separating the 'signal' from the 'noise.' In other words, it
shows the performance of a classification model at all classification thresholds. The Area
Under the Curve (AUC) measures the ability of a binary classifier to distinguish
between classes and is used as a single-number summary of the ROC curve.
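A minimal way to compute the ROC curve and AUC (assumes scikit-learn; the labels and scores below are hypothetical):

```python
from sklearn.metrics import roc_curve, roc_auc_score

# Hypothetical true labels and predicted class-1 probabilities
y_true  = [0, 0, 1, 1, 0, 1, 1, 0]
y_score = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.6, 0.3]

fpr, tpr, thresholds = roc_curve(y_true, y_score)  # one (FPR, TPR) pair per threshold
print(f"AUC = {roc_auc_score(y_true, y_score):.3f}")
```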