19 - K-Nearest Neighbor Learning (22-08-2024)
The training data from the laboratory tests:

X1 = Acid Durability   X2 = Strength          Y = Classification
(seconds)              (kg/square meter)
7                      7                      Bad
7                      4                      Bad
3                      4                      Good
1                      4                      Good
Now the factory produces a new paper tissue that passes the
laboratory test with X1 = 3 and X2 = 7. Guess the classification of
this new tissue.
Step 1: Initialize and define k.
Let's say k = 3.
(Choose k as an odd number when the number of classes is even, to avoid a tie in the class prediction.)
Step 2: Compute the distance between the input sample and each training sample.
- The co-ordinates of the input sample are (3, 7).
- Instead of the Euclidean distance, we calculate the squared Euclidean distance: the square root is monotonic, so ranking by squared distance picks the same nearest neighbours while avoiding the square-root computation.
X1 = Acid Durability   X2 = Strength          Squared Euclidean distance
(seconds)              (kg/square meter)
7                      7                      (7-3)² + (7-7)² = 16
7                      4                      (7-3)² + (4-7)² = 25
3                      4                      (3-3)² + (4-7)² = 9
1                      4                      (1-3)² + (4-7)² = 13

Step 3: Rank the distances and determine the nearest neighbours based on the k-th minimum distance (here, the 3rd minimum).

X1   X2   Squared Euclidean distance   Rank   Included in 3-NN?
7    7    16                           3      Yes
7    4    25                           4      No
3    4    9                            1      Yes
1    4    13                           2      Yes
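As a sanity check on the two tables above, here is a minimal Python sketch of Steps 2 and 3. It is an illustration, not part of the original notes; the query point (3, 7) and the four training samples are taken from the tables.

    # Minimal sketch of Steps 2 and 3: squared Euclidean distances
    # from the query (3, 7) to each training sample, then their ranks.

    query = (3, 7)
    training = [(7, 7), (7, 4), (3, 4), (1, 4)]

    def squared_distance(a, b):
        # No square root needed: ranking by d^2 gives the same order as d.
        return sum((x - y) ** 2 for x, y in zip(a, b))

    distances = [squared_distance(s, query) for s in training]

    # Rank 1 = smallest distance.
    order = sorted(range(len(training)), key=lambda i: distances[i])
    ranks = {i: r + 1 for r, i in enumerate(order)}

    for i, sample in enumerate(training):
        print(sample, distances[i], "rank", ranks[i])
    # (7, 7) 16 rank 3
    # (7, 4) 25 rank 4
    # (3, 4) 9 rank 1
    # (1, 4) 13 rank 2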
Step 4: Take the 3 nearest neighbours and gather their category Y.

X1   X2   Squared Euclidean distance   Rank   Included in 3-NN?   Y = Category
7    7    16                           3      Yes                 Bad
7    4    25                           4      No                  -
3    4    9                            1      Yes                 Good
1    4    13                           2      Yes                 Good
Step 5: Apply the simple majority.
Two of the three nearest neighbours are Good, so the new paper tissue with X1 = 3 and X2 = 7 is classified as Good.
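Putting all five steps together, the following self-contained sketch (again an illustration rather than part of the original notes) classifies the new tissue by simple majority among its k = 3 nearest neighbours:

    # Classify a query point by majority vote among its k nearest
    # neighbours, using squared Euclidean distance.

    from collections import Counter

    def knn_classify(query, samples, labels, k=3):
        # Pair each training sample's squared distance with its label,
        # sort by distance, and vote among the k closest labels.
        dist = sorted(
            (sum((x - y) ** 2 for x, y in zip(s, query)), label)
            for s, label in zip(samples, labels)
        )
        nearest = [label for _, label in dist[:k]]
        return Counter(nearest).most_common(1)[0][0]

    samples = [(7, 7), (7, 4), (3, 4), (1, 4)]
    labels = ["Bad", "Bad", "Good", "Good"]
    print(knn_classify((3, 7), samples))  # raises TypeError; see below
    print(knn_classify((3, 7), samples, labels))  # Good

Because there are two classes and k = 3 is odd, the majority vote cannot end in a tie, which is the point made in Step 1.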
Applications of KNN:
- Used in classification
- Used to impute missing values
- Used in pattern recognition
- Used in gene expression analysis
- Used in protein-protein interaction prediction
- Used to predict the 3D structure of proteins
- Used to measure document similarity
Comparison of various classifiers

C4.5 Algorithm
Features:
- Models built can be easily interpreted
- Easy to implement
- Can use both discrete and continuous values
- Deals with noise
Limitations:
- Small variations in data can lead to different decision trees
- Does not work very well on small training datasets
- Prone to over-fitting

ID3 Algorithm
Features:
- Produces higher accuracy than C4.5
- Detection rate is increased and space consumption is reduced
Limitations:
- Requires a large searching time
- May generate very long rules that are difficult to prune
- Requires a large amount of memory to store the tree

K-Nearest Neighbour Algorithm
Features:
- Classes need not be linearly separable
- Zero cost of the learning process
- Sometimes robust with regard to noisy training data
- Well suited for multi-modal classes
Limitations:
- Time to find the nearest neighbours in a large training dataset can be excessive
- Sensitive to noisy or irrelevant attributes
- Performance depends on the number of dimensions used

Naïve Bayes Algorithm
Features:
- Simple to implement
- Great computational efficiency and classification rate
- Predicts accurate results for most classification and prediction problems
Limitations:
- Precision decreases when the amount of training data is small
- Requires a very large number of records to obtain good results