1 - KNN
Prepared By: Dr. Sara Sweidan
Lazy Learning – Classification Using Nearest Neighbors
Objectives
• The key concepts that define nearest neighbor classifiers, and why they are considered "lazy" learners
• How to measure the similarity of two examples using distance
• How to apply a popular nearest neighbor classifier called k-NN
Nearest Neighbor Classification
➢ In general, nearest neighbor classifiers are well suited for classification tasks where the relationships among the features and the target classes are numerous, complicated, or extremely difficult to understand, yet items of the same class tend to be fairly homogeneous.
➢ On the other hand, if the data is noisy and no clear distinction exists among the groups, nearest neighbor algorithms may struggle to identify the class boundaries.
The k-NN algorithm
• The nearest neighbors approach to classification is exemplified by the k-nearest neighbors algorithm (k-NN).
• The k-NN algorithm stores all of the available data and classifies a new data point based on its similarity to the stored examples. This means that when new data appears, it can be readily assigned to a well-suited category using k-NN (a minimal sketch follows the table below).
• The strengths and weaknesses of this algorithm are as follows:
Strengths:
• Simple and effective.
• Makes no assumptions about the underlying data distribution.
• Fast training phase.
Weaknesses:
• Does not produce a model, limiting the ability to understand how the features are related to the class.
• Requires selection of an appropriate k.
• Slow classification phase.
• Nominal features and missing data require additional processing.
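
As a concrete illustration of the "fast training phase, slow classification phase" trade-off above, here is a minimal sketch (assumed, not from the slides) using scikit-learn's KNeighborsClassifier with invented toy data:

# fit() merely stores the training examples, so training is fast;
# the neighbor search happens only when predict() is called.
from sklearn.neighbors import KNeighborsClassifier

X_train = [[1.0, 1.0], [1.2, 0.8], [8.0, 9.0], [9.1, 8.7]]  # toy features
y_train = ["A", "A", "B", "B"]                              # toy class labels

knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)         # "training" = storing the examples
print(knn.predict([[1.1, 0.9]]))  # neighbor search happens here -> ['A']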
The k-NN algorithm
• The k-NN algorithm gets its name from the fact that it uses information about an
example's k-nearest neighbors to classify unlabeled examples.
• The letter k is a variable term implying that any number of nearest neighbors could be
used.
• After choosing k, the algorithm requires a training dataset made up of examples that have
been classified into several categories, as labeled by a nominal variable.
• Then, for each unlabeled record in the test dataset, k-NN identifies k records in the
training data that are the "nearest" in similarity.
• The unlabeled test instance is assigned the class of the majority of the k nearest neighbors.
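
As a reminder (the formula is standard and implied by Step-2 below, though not printed on these slides), the Euclidean distance between two examples p and q described by n features is:

dist(p, q) = \sqrt{(p_1 - q_1)^2 + (p_2 - q_2)^2 + \cdots + (p_n - q_n)^2}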
The working of k-NN can be explained using the following steps (a from-scratch sketch follows the list):
• Step-1: Select the number k of neighbors.
• Step-2: Calculate the Euclidean distance between the new data point and every training example.
• Step-3: Take the k nearest neighbors as per the calculated Euclidean distances.
• Step-4: Among these k neighbors, count the number of data points in each category.
• Step-5: Assign the new data point to the category for which the number of neighbors is maximum.
• Step-6: Our model is ready.
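
The six steps above can be written as a short from-scratch sketch in Python (an illustration with invented toy data, not the slides' own code):

import math
from collections import Counter

def knn_predict(train_X, train_y, new_point, k=3):
    # Step-2: Euclidean distance from the new point to every training example
    dists = [(math.dist(x, new_point), label) for x, label in zip(train_X, train_y)]
    # Step-3: take the k nearest neighbors
    neighbors = sorted(dists, key=lambda d: d[0])[:k]
    # Step-4: count the data points in each category
    votes = Counter(label for _, label in neighbors)
    # Step-5: assign the category with the most neighbors
    return votes.most_common(1)[0][0]

train_X = [[25, 40000], [27, 45000], [45, 120000], [48, 130000]]  # toy data
train_y = ["no", "no", "yes", "yes"]
print(knn_predict(train_X, train_y, [46, 125000], k=3))  # -> "yes"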
• Example: A car manufacturer has produced a new SUV, and the company wants to show ads to the users who are interested in buying it. For this problem, we have a dataset that contains information about multiple users, gathered through a social network.
[Figure: two scatter plots of the user data (x-axis: 0–60, y-axis: 0–160,000)]
[Diagram: Training Set → Learning Algorithm]
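
A hedged sketch of how this example might be coded with scikit-learn; the file name social_network_ads.csv and the column names Age, EstimatedSalary, and Purchased are assumptions, since the slides show only the plots:

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier

data = pd.read_csv("social_network_ads.csv")   # assumed file name
X = data[["Age", "EstimatedSalary"]].values    # assumed feature columns
y = data["Purchased"].values                   # assumed target column

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# salary values dwarf ages, so rescale features before computing distances
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

knn = KNeighborsClassifier(n_neighbors=5)  # k = 5, the slides' preferred default
knn.fit(X_train, y_train)
print("test accuracy:", knn.score(X_test, y_test))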
1. In case of a very large value of k, we may include points from other classes in the neighborhood.
2. In case of a too-small value of k, the algorithm is very sensitive to noise.
• There is no particular way to determine the best value for k, so we need to try several values and find the best of them (see the sketch below). The most preferred value for k is 5.
• A very low value for k, such as k=1 or k=2, can be noisy and expose the model to the effects of outliers.
• Large values for k are good at smoothing out noise, but the neighborhood may then include points from other classes.
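
One way to "try some values" of k is cross-validation; a sketch (assuming X_train and y_train are defined as in the SUV example above):

from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

scores = {}
for k in range(1, 21):
    knn = KNeighborsClassifier(n_neighbors=k)
    # mean accuracy over 5 cross-validation folds
    scores[k] = cross_val_score(knn, X_train, y_train, cv=5).mean()

best_k = max(scores, key=scores.get)
print("best k:", best_k, "cv accuracy:", round(scores[best_k], 3))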
• Quiz: The k-NN algorithm does more computation at test time than at train time.
(a) True (b) False