Lecture 14 and 15
ALGORITHM
Dr. Abid Ali
Apples vs. Bananas
Weight  Color   Label
5       Yellow  Apple
6       Yellow  Banana
3       Red     Apple
7       Yellow  Banana
8       Yellow  Banana
6       Yellow  Apple

Can we visualize this data?
Apples vs. Bananas
Turn features into numerical values
Weight  Color (Yellow = 1, Red = 0)   Label
5       1                             Apple
6       1                             Banana
3       0                             Apple
7       1                             Banana
8       1                             Banana
6       1                             Apple

[Figure: scatter plot of the examples, Weight (0 to 10) on the x-axis, Color on the y-axis, points marked by class]
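A minimal sketch of this encoding step in Python (the Yellow = 1 / Red = 0 mapping comes from the table above; the variable names are illustrative):

    # Map the categorical color feature to a number so each fruit
    # becomes a point in a 2-D feature space: (weight, color).
    color_to_number = {"Yellow": 1, "Red": 0}

    data = [
        (5, "Yellow", "Apple"),
        (6, "Yellow", "Banana"),
        (3, "Red", "Apple"),
        (7, "Yellow", "Banana"),
        (8, "Yellow", "Banana"),
        (6, "Yellow", "Apple"),
    ]

    # Each example becomes ((weight, color), label).
    examples = [((w, color_to_number[c]), label) for w, c, label in data]
    print(examples)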
[Figure: scatter plot of feature 1 vs. feature 2 with training examples from label 1, label 2, and label 3]
Test example: what class?
[Figure: the same feature 1 vs. feature 2 plot with an unlabeled test example added]
Test example: what class?
[Figure: the test example is closest to a label 3 point, shown in red, so it is classified as label 3]
Another classification algorithm?
To classify an example d:
Label d with the label of the closest example to d in the training set
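A minimal sketch of this nearest-neighbor rule in Python (the Euclidean distance and all names here are illustrative assumptions, since the slide does not fix a distance metric):

    import math

    def euclidean(a, b):
        # Distance between two feature vectors of equal length.
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    def classify_1nn(d, training_set):
        # training_set is a list of (feature_vector, label) pairs.
        # Label d with the label of its single closest training example.
        _, closest_label = min(training_set, key=lambda pair: euclidean(d, pair[0]))
        return closest_label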
What about this example?
[Figure: the same three classes with a different test example]
What about this example?
[Figure: the closest training point to this test example is label 3 (red), but…]
What about this example?
[Figure: the single closest point is red, but most of the surrounding points are blue (label 1)]
k-Nearest Neighbor (k-NN)
To classify an example d:
Find the k nearest neighbors of d
Choose the majority label of those neighbors as the label of d
Each example is represented as a feature vector (b1, b2, …, bn).
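A minimal sketch of the k-NN rule in Python (the Euclidean distance and the Counter-based majority vote are illustrative choices):

    import math
    from collections import Counter

    def classify_knn(d, training_set, k):
        # training_set is a list of (feature_vector, label) pairs.
        def euclidean(a, b):
            return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
        # Keep the k training examples closest to d.
        neighbors = sorted(training_set, key=lambda pair: euclidean(d, pair[0]))[:k]
        # Majority vote over the neighbors' labels.
        labels = [label for _, label in neighbors]
        return Counter(labels).most_common(1)[0][0]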
K-NN is used for two tasks: regression and classification.
K-NN Regression
KNN (K-Nearest Neighbors) regression is a type of instance-based learning, or non-parametric, regression algorithm.
It uses the training data directly to make predictions; it does not learn an explicit model that is then applied to new data points.
The KNN algorithm doesn't make any assumptions about the training dataset.
It is a simple and straightforward method that can be used for both regression and classification problems.
In KNN regression, we predict the value of the dependent variable for a data point as the average (mean) of the target values of its K nearest neighbors in the training data.
The value of K is a user-defined hyperparameter; it determines the number of nearest neighbors used to make the prediction.
K-NN Regression
KNN regression is different from linear or polynomial regression, as it does not make any assumptions about the underlying relationship between the features and the target variables.
For instance, linear regression and multiple regression work on the assumption that the dependent variable and the independent variables in a dataset are linearly related.
The KNN regression algorithm doesn't make any such assumptions. Instead, it relies on the patterns in the data to make predictions.
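As a sketch of this contrast (assuming scikit-learn and NumPy are available; the sine-shaped toy data is purely illustrative), a linear model and a KNN regressor can be fit to the same nonlinear data:

    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.neighbors import KNeighborsRegressor

    # Toy data with a nonlinear (sine-shaped) relationship.
    rng = np.random.default_rng(0)
    X = np.sort(rng.uniform(0, 6, 80)).reshape(-1, 1)
    y = np.sin(X).ravel() + rng.normal(0, 0.1, 80)

    linear = LinearRegression().fit(X, y)               # assumes a linear relationship
    knn = KNeighborsRegressor(n_neighbors=5).fit(X, y)  # makes no such assumption

    X_test = np.array([[1.5], [3.0], [4.5]])
    print("linear:", linear.predict(X_test))
    print("knn:   ", knn.predict(X_test))

On data like this, the linear fit is constrained to a straight line, while the KNN predictions follow the local pattern of the nearby points.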
KNN Regression Algorithm
Choose a value for K: We first choose a value for K. This determines the number of nearest neighbors used to make the prediction.
Calculate the distance: After choosing K, we calculate the distance between each data point in the training set and the target data point for which a prediction is being made. For this, we can use a variety of distance metrics, including Euclidean distance, Manhattan distance, or Minkowski distance.
Find the K nearest neighbors: After calculating the distances between the existing data points and the new data point, we identify the K nearest neighbors by selecting the K data points closest to the new data point.
Calculate the prediction: After finding the neighbors, we calculate the value of the dependent variable for the new data point by taking the average of the target values of the K nearest neighbors (a from-scratch sketch of these steps follows this list).
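A minimal from-scratch sketch of these four steps in Python (Euclidean distance and all names are illustrative assumptions):

    import math

    def euclidean(a, b):
        # Step 2: distance between two feature vectors.
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    def knn_regress(new_point, training_set, k):
        # training_set is a list of (feature_vector, target_value) pairs;
        # k is the user-chosen hyperparameter (step 1).
        # Step 3: select the k training points closest to new_point.
        neighbors = sorted(training_set, key=lambda pair: euclidean(new_point, pair[0]))[:k]
        # Step 4: the prediction is the mean of the neighbors' targets.
        return sum(target for _, target in neighbors) / k

    train = [((1.0,), 1.2), ((2.0,), 1.9), ((3.0,), 3.1), ((4.0,), 4.2)]
    print(knn_regress((2.5,), train, k=3))  # mean of the 3 closest targets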
Decision boundaries
The decision boundaries are the places in feature space where the classification of a point/example changes.
[Two plots: decision boundaries between the label 1, label 2, and label 3 regions for K = 1]
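A minimal sketch of how such a K = 1 boundary can be traced (assuming NumPy; the tiny training set and all names are illustrative, and the grid is classified with the 1-NN rule from earlier):

    import numpy as np

    # Tiny labeled training set: 2-D feature vectors and class labels.
    X_train = np.array([[1.0, 1.0], [2.0, 3.0], [6.0, 5.0], [7.0, 7.0], [4.0, 8.0]])
    y_train = np.array([0, 0, 1, 1, 2])

    # Classify every point on a dense grid with the 1-NN rule; decision
    # boundaries are where the predicted label changes between grid cells.
    xs, ys = np.meshgrid(np.linspace(0, 8, 200), np.linspace(0, 9, 200))
    grid = np.c_[xs.ravel(), ys.ravel()]                      # (40000, 2)
    dists = np.linalg.norm(grid[:, None, :] - X_train[None, :, :], axis=2)
    labels = y_train[dists.argmin(axis=1)].reshape(xs.shape)  # per-cell label map

Drawing labels with a filled contour plot (e.g. matplotlib's contourf) shows the colored class regions and the K = 1 boundaries between them.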
Machine learning models
Some machine learning approaches make strong assumptions about the data.
If the assumptions are true, this can often lead to better performance.
If the assumptions aren't true, they can fail miserably.