Lecture 14 and 15


K NEAREST NEIGHBORS ALGORITHM
Dr. Abid Ali
Apples vs. Bananas

  Weight   Color    Label
  4        Red      Apple
  5        Yellow   Apple
  6        Yellow   Banana
  3        Red      Apple
  7        Yellow   Banana
  8        Yellow   Banana
  6        Yellow   Apple

Can we visualize this data?
Apples vs. Bananas
Turn features into numerical values

  Weight   Color   Label
  4        0       Apple
  5        1       Apple
  6        1       Banana
  3        0       Apple
  7        1       Banana
  8        1       Banana
  6        1       Apple

[Scatter plot: Weight (0–10) on the x-axis, Color (0 or 1) on the y-axis; apples (A) and bananas (B) shown as labeled points]
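A small sketch of this encoding in plain Python (the 0/1 mapping mirrors the table above and is chosen only for illustration):

```python
# Map the categorical Color feature to a number so that every example
# becomes a point (weight, color) in a 2-dimensional feature space.
color_to_number = {"Red": 0, "Yellow": 1}

examples = [
    (4, "Red", "Apple"),
    (5, "Yellow", "Apple"),
    (6, "Yellow", "Banana"),
    (3, "Red", "Apple"),
    (7, "Yellow", "Banana"),
    (8, "Yellow", "Banana"),
    (6, "Yellow", "Apple"),
]

points = [((weight, color_to_number[color]), label)
          for weight, color, label in examples]
print(points[0])  # ((4, 0), 'Apple')
```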

We can view examples as points in an n-dimensional space, where n is the number of features, called the feature space.

Examples in a feature space

[Scatter plot: feature 1 vs. feature 2, with training examples from three classes: label 1, label 2, label 3]
Test example: what class?

[Same feature-space plot with an unlabeled test example added]
Test example: what class?

[Same plot: the test example is closest to a label 3 (red) training example]
Another classification algorithm?

To classify an example d:
  Label d with the label of the closest example to d in the training set
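A minimal sketch of this rule (1-nearest neighbor) in Python, assuming the training set is stored as (feature vector, label) pairs:

```python
import math

def classify_1nn(d, training_set):
    """Label d with the label of the closest training example."""
    def euclidean(a, b):
        return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

    # Find the training example whose features are closest to d.
    _, label = min(training_set, key=lambda example: euclidean(example[0], d))
    return label

training = [((1.0, 1.0), "label 1"), ((5.0, 5.0), "label 2"), ((9.0, 1.0), "label 3")]
print(classify_1nn((8.0, 2.0), training))  # label 3
```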
What about this example?

[Feature-space plot with a different test example]
What about this example?

[Plot: the test example is closest to a label 3 (red) example, but…]
What about this example?

[Plot: most of the next closest examples are blue (label 2)]
k-Nearest Neighbor (k-NN)

To classify an example d:
 Find the k nearest neighbors of d
 Choose as the label the majority label within the k nearest neighbors


k-Nearest Neighbor (k-NN)

To classify an example d:
 Find the k nearest neighbors of d
 Choose as the label the majority label within the k nearest neighbors

How do we measure “nearest”?


Euclidean distance
Euclidean distance! (or L1 or cosine or …)

For two points a = (a1, a2, …, an) and b = (b1, b2, …, bn):

  d(a, b) = sqrt( (a1 − b1)² + (a2 − b2)² + … + (an − bn)² )
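A direct translation of the formula into Python (a small sketch):

```python
import math

def euclidean_distance(a, b):
    """Straight-line distance between two points a and b of equal length."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

print(euclidean_distance((0, 0), (3, 4)))  # 5.0
```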


Manhattan distance
 This is another popular distance metric, which measures the sum of the absolute differences between the coordinates of two points.
 It is also referred to as taxicab distance or city block distance, as it is commonly visualized with a grid, illustrating how one might navigate from one address to another via city streets.
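The same idea in code (a small sketch):

```python
def manhattan_distance(a, b):
    """Sum of absolute coordinate differences (taxicab / city block distance)."""
    return sum(abs(ai - bi) for ai, bi in zip(a, b))

print(manhattan_distance((0, 0), (3, 4)))  # 7
```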
Minkowski distance
 This distance measure is the generalized form of the Euclidean and Manhattan distance metrics.
 The parameter p in the formula below allows for the creation of other distance metrics:

  d(a, b) = ( |a1 − b1|^p + |a2 − b2|^p + … + |an − bn|^p )^(1/p)

 Euclidean distance is represented by this formula when p is equal to two, and Manhattan distance is denoted with p equal to one.
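A sketch of the generalized formula; p = 1 reproduces Manhattan distance and p = 2 reproduces Euclidean distance:

```python
def minkowski_distance(a, b, p=2):
    """Generalized distance: p = 1 gives Manhattan, p = 2 gives Euclidean."""
    return sum(abs(ai - bi) ** p for ai, bi in zip(a, b)) ** (1 / p)

print(minkowski_distance((0, 0), (3, 4), p=1))  # 7.0 (Manhattan)
print(minkowski_distance((0, 0), (3, 4), p=2))  # 5.0 (Euclidean)
```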
Hamming distance
 This technique is typically used with Boolean or string vectors, identifying the positions where the vectors do not match.
 As a result, it has also been referred to as the overlap metric. It can be represented with the following formula:

  d(a, b) = number of positions i at which ai ≠ bi
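A sketch that counts mismatching positions, for either strings or Boolean vectors of equal length:

```python
def hamming_distance(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(ai != bi for ai, bi in zip(a, b))

print(hamming_distance("karolin", "kathrin"))        # 3
print(hamming_distance([1, 0, 1, 1], [1, 1, 1, 0]))  # 2
```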
Working of K-NN
 Step-1: Select the number K of neighbors.
 Step-2: Calculate the Euclidean distance from the new data point to every point in the training data.
 Step-3: Take the K nearest neighbors as per the calculated Euclidean distance.
 Step-4: Among these K neighbors, count the number of data points in each category.
 Step-5: Assign the new data point to the category for which the number of neighbors is maximum.
 Step-6: Our model is ready.
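A minimal Python sketch that follows Steps 1–6 above (the training data and the value of K here are placeholders for illustration):

```python
import math
from collections import Counter

def knn_classify(new_point, training_data, k=5):
    """Classify new_point by a majority vote among its k nearest neighbors.

    training_data is a list of (feature_vector, category) pairs.
    """
    # Step 1: K is the chosen number of neighbors (passed in as k).
    # Step 2: calculate the Euclidean distance to every training point.
    def euclidean(a, b):
        return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

    distances = [(euclidean(features, new_point), category)
                 for features, category in training_data]

    # Step 3: take the K nearest neighbors.
    neighbors = sorted(distances, key=lambda pair: pair[0])[:k]

    # Steps 4-5: count the data points in each category and pick the maximum.
    votes = Counter(category for _, category in neighbors)
    return votes.most_common(1)[0][0]

# Step 6: the "model" is simply the stored training data.
training = [((1, 1), "A"), ((2, 1), "A"), ((1, 2), "A"),
            ((6, 6), "B"), ((7, 5), "B")]
print(knn_classify((2, 2), training, k=3))  # A
```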
Working of K-NN
 Suppose we have a new data point and we need to put it in the required category. Consider the image below:

[Image: a new data point plotted among existing points of Category A and Category B]
Working of K-NN
 Firstly, we will choose the number of neighbors; here we choose K = 5.
 Next, we will calculate the Euclidean distance between the new data point and the existing data points. The Euclidean distance is the distance between two points, which we have already studied in geometry; it can be calculated with the formula given earlier.
Working of K-NN
 By calculating the Euclidean distance we find the nearest neighbors: three nearest neighbors in Category A and two nearest neighbors in Category B. Consider the image below:

[Image: the five nearest neighbors of the new point, three from Category A and two from Category B]

As 3 of the 5 nearest neighbors are from Category A, this new data point must belong to Category A.
Selecting the Value of K in the K-NN Algorithm

 There is no particular way to determine the best value for K, so we need to try several values to find the best among them. The most preferred value for K is 5.
 A very low value of K, such as K = 1 or K = 2, can be noisy and make the model sensitive to the effects of outliers.
 Larger values of K are more robust to noise, but if K is too large the neighborhood may include many points from other categories and the model can miss local structure.
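One common way to try several values of K in practice is cross-validation (not covered on the slide above). A hedged sketch using scikit-learn; the dataset is synthetic and purely illustrative:

```python
# Try a few values of K and report cross-validated accuracy for each
# (assumes scikit-learn is installed).
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=200, n_features=5, random_state=0)

for k in (1, 3, 5, 7, 11):
    scores = cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=5)
    print(f"K={k}: mean accuracy = {scores.mean():.3f}")
```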
K-NN Regression and Classification

[Diagram: K-NN branches into Regression and Classification]
K-NN Regression
 KNN (K-Nearest Neighbors) regression is a type of instance-based learning, or non-parametric, regression algorithm.
 It uses the training data directly to make predictions; it does not learn an explicit model that is then used to make predictions for new data points.
 The KNN algorithm doesn’t make any assumptions about the training dataset.
 It is a simple and straightforward approach that can be used for both regression and classification problems.
 In KNN regression, we predict the value of a dependent variable for a data point based on the average (mean) of the target values of its K nearest neighbors in the training data.
 The value of K is a user-defined hyperparameter. It determines the number of nearest neighbors used to make the prediction.
K-NN Regression
 KNN regression is different from linear or
polynomial regression as it does not make any
assumptions about the underlying relationship
between the features and the target variables.
 For instance, linear regression and multiple
regression work on the assumption that the
dependent variables and the independent
variables in a dataset are linearly related.
 The KNN regression algorithm doesn’t make any
such assumptions. Instead, it relies on the
patterns in the data to make predictions.
KNN Regression Algorithm
 Choose a value for K: We first choose a value for K. This
determines the number of nearest neighbors used to make the
prediction.
 Calculate the distance: After choosing K, we calculate the
distance between each data point in the training set and the
target data point for which a prediction is being made. For this,
we can use a variety of distance metrics, including Euclidean
distance, Manhattan distance, or Minkowski distance.
 Find the K nearest neighbors: After calculating the distance
between the existing data points and the new data point, we
identify K nearest neighbors by selecting the K data points
nearest to the new data point.
 Calculate the prediction: After finding the neighbors, we
calculate the value of the dependent variable for the new data
point. For this, we take the average of the target values of the
K nearest neighbors.
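A minimal sketch of these four steps (Euclidean distance, prediction as the mean of the neighbors' targets; the data is illustrative only):

```python
import math

def knn_regress(new_point, training_data, k=3):
    """Predict a numeric target as the mean of the k nearest neighbors' targets.

    training_data is a list of (feature_vector, target_value) pairs.
    """
    def euclidean(a, b):
        return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

    # Steps 2-3: distance to every training point, keep the k closest.
    neighbors = sorted(training_data,
                       key=lambda pair: euclidean(pair[0], new_point))[:k]

    # Step 4: the prediction is the average of the neighbors' target values.
    return sum(target for _, target in neighbors) / k

training = [((1,), 10.0), ((2,), 12.0), ((3,), 14.0), ((8,), 40.0)]
print(knn_regress((2.5,), training, k=3))  # 12.0
```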
Decision boundaries
The decision boundaries are the places in the feature space where the classification of a point/example changes.

[Plot: three class regions (label 1, label 2, label 3) in the feature space]

Where are the decision boundaries for k-NN?


k-NN decision boundaries

[Plot: locally defined boundaries separating the label 1, label 2, and label 3 regions]

k-NN gives locally defined decision boundaries between the classes.


K Nearest Neighbour (k-NN) Classifier

K = 1

What is the decision boundary for k-NN for this one?

K Nearest Neighbour (k-NN) Classifier

K = 1

[Plot: the resulting decision boundary for K = 1]
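One way to see these locally defined boundaries is to classify every point of a fine grid and colour the result. A sketch with scikit-learn and matplotlib; the data is synthetic and purely illustrative:

```python
# Plot k-NN decision regions on a 2-D synthetic dataset
# (assumes numpy, matplotlib, and scikit-learn are installed).
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs
from sklearn.neighbors import KNeighborsClassifier

X, y = make_blobs(n_samples=150, centers=3, random_state=0)
clf = KNeighborsClassifier(n_neighbors=1).fit(X, y)   # try k=1 vs. k=5

# Classify every point of a grid covering the feature space.
xx, yy = np.meshgrid(np.linspace(X[:, 0].min() - 1, X[:, 0].max() + 1, 300),
                     np.linspace(X[:, 1].min() - 1, X[:, 1].max() + 1, 300))
zz = clf.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)

plt.contourf(xx, yy, zz, alpha=0.3)        # decision regions
plt.scatter(X[:, 0], X[:, 1], c=y, s=15)   # training points
plt.xlabel("feature 1")
plt.ylabel("feature 2")
plt.show()
```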
Machine learning models
Some machine learning approaches make strong assumptions about the data
 If the assumptions are true, this can often lead to better performance
 If the assumptions aren’t true, they can fail miserably

Other approaches don’t make many assumptions about the data
 This can allow us to learn from more varied data
 But they are more prone to overfitting (biasing too much to the training data)
 and generally require more training data
What is the data generating distribution?

[Series of plots: several plausible guesses at the data generating distribution, followed by the actual model]
Applications Of K-NN in Machine Learning
 Data preprocessing: Datasets frequently have missing values, and the KNN algorithm can estimate those values in a process known as missing data imputation.
 Recommendation Engines: Using clickstream data from websites, the KNN algorithm has been used to provide automatic recommendations to users on additional content. Research shows that a user is assigned to a particular group, and based on that group’s user behavior, they are given a recommendation. However, given the scaling issues with KNN, this approach may not be optimal for larger datasets.
 Finance: It has also been used in a variety of finance and economic use cases. For example, one paper shows how using KNN on credit data can help banks assess the risk of a loan to an organization or individual; it is used to determine the credit-worthiness of a loan applicant. Another journal highlights its use in stock market forecasting, currency exchange rates, trading futures, and money laundering analyses.
Applications Of K-NN in Machine Learning
 Healthcare: KNN has also had application within the healthcare industry, making predictions on the risk of heart attacks and prostate cancer. The algorithm works by calculating the most likely gene expressions.
 Pattern Recognition: KNN has also assisted in identifying patterns, such as in text and digit classification. This has been particularly helpful in identifying handwritten numbers that you might find on forms or mailing envelopes.
Advantages of K-NN
 Easy to implement: Given the algorithm’s simplicity and accuracy, it is one of the first classifiers that a new data scientist will learn.

 Adapts easily: As new training samples are added, the algorithm adjusts to account for the new data, since all training data is stored in memory.

 Few hyperparameters: KNN only requires a k value and a distance metric, which is a small number compared to other machine learning algorithms.
Disadvantages of K-NN
 Does not scale well: Since KNN is a lazy algorithm, it
takes up more memory and data storage compared to
other classifiers. This can be costly from both a time and
money perspective.
 Curse of dimensionality: The KNN algorithm tends to
fall victim to the curse of dimensionality, which means
that it doesn’t perform well with high-dimensional data
inputs.
 Prone to overfitting: Due to the “curse of
dimensionality”, KNN is also more prone to overfitting.
While feature selection and dimensionality reduction
techniques are leveraged to prevent this from occurring,
the value of k can also impact the model’s behavior.
Other successful classifiers in NLP
Perceptron algorithm
 Linear classifier
 Trains “online”
 Fast and easy to implement
 Often used for tuning parameters (not necessarily for classifying)

Logistic regression classifier (aka Maximum entropy classifier)
 Probabilistic classifier
 Doesn’t have the Naive Bayes (NB) independence constraints
 Performs very well
 More computationally intensive to train than NB
