

Why is KNN a poor choice for a spam filter?

What is KNN?

 KNN is a very simple algorithm used to solve classification problems. KNN stands for K-Nearest Neighbors; K is the number of neighbors considered when classifying a new point.
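A minimal from-scratch sketch of the idea (the toy data and the function name knn_predict are invented for illustration, not taken from the slides): classify a query point by majority vote among its k nearest training points.

```python
from collections import Counter
import numpy as np

def knn_predict(X_train, y_train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every training point
    distances = np.linalg.norm(X_train - query, axis=1)
    # Indices of the k closest points
    nearest = np.argsort(distances)[:k]
    # Majority vote among their labels
    return Counter(y_train[nearest]).most_common(1)[0][0]

# Toy data: two classes in 2-D
X_train = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
                    [5.0, 5.0], [5.2, 4.8], [4.9, 5.1]])
y_train = np.array(["A", "A", "A", "B", "B", "B"])

print(knn_predict(X_train, y_train, np.array([1.1, 1.0])))  # -> "A"
```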
Why KNN is a poor choice as a spam filter
 KNN classifiers are good only when there is a really meaningful distance metric. In the spam case, a KNN classifier will label as spam anything that is "close" to known spam, where "close" is defined by your distance metric (which will likely be poor), as the sketch below illustrates.
Therefore, a KNN classifier will only filter spam that is very similar to spam it has already seen; it won't generalize properly.
Also, you have to train on non-spam examples too, and KNN suffers from the same problem there: it will only confidently say something is non-spam if it is written very similarly to a non-spam email it was trained on.
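To illustrate the weak-metric problem, here is a hedged sketch (the example emails and the plain bag-of-words representation are invented for illustration): a reworded spam message shares no words with a known spam, so in Euclidean bag-of-words space it ends up no closer to the known spam than an ordinary ham message does.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import euclidean_distances

known_spam = "win a free prize claim your free money now"
new_spam   = "you have been selected for a cash reward respond today"
ham        = "meeting moved to 3pm please bring the quarterly report"

# Bag-of-words vectors: one dimension per distinct word
X = CountVectorizer().fit_transform([known_spam, new_spam, ham])

d = euclidean_distances(X)
print(d[0, 1])  # known spam vs. reworded spam: no shared words, so far apart
print(d[0, 2])  # known spam vs. ham: about the same distance, so the
                # metric cannot tell the reworded spam from the ham
```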
 Limitations of KNN as a spam filter

1. Doesn't work well with a large dataset:
Since KNN is a distance-based algorithm, every prediction requires calculating the distance between the new point and each existing point. On a large dataset this cost is very high, which degrades the performance of the algorithm.
2. Doesn't work well with a high number of dimensions:
For the same reason as above, the cost of calculating distances grows with the number of dimensions; in addition, in high-dimensional spaces distances between points become less informative, which further hurts performance.
[Figure: distribution of the e-mails data set]
 3. Sensitive to outliers and missing values:
KNN is sensitive to outliers and missing values, so we first need to impute the missing values and remove the outliers before applying the KNN algorithm.
 4. Needs feature scaling: We need to apply feature scaling (standardization or normalization) before using the KNN algorithm on any dataset. If we don't, features with large numeric ranges dominate the distance, and KNN may generate wrong predictions (see the first sketch after this list).
 5. Predictions depend on the value of k: For different values of k, the prediction for the same data point may vary, so accuracy may be poor (see the second sketch after this list).
 For example, with respect to the given data, if k = 3 the query point belongs to class B,
 but if k = 7 it belongs to class A.
 So, for different values of k, the predictions may vary.
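First, a minimal sketch of the feature-scaling problem, using scikit-learn (the features and numbers are invented for illustration): an unscaled feature with a large range, such as message length in characters, swamps a small-range feature, and standardizing the data flips the prediction.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

# Invented features: [message length in characters, fraction of spammy words]
X = np.array([[120.0, 0.90], [150.0, 0.85], [130.0, 0.95],    # spam
              [125.0, 0.05], [155.0, 0.10], [135.0, 0.02]])   # ham
y = np.array(["spam", "spam", "spam", "ham", "ham", "ham"])
query = np.array([[128.0, 0.88]])  # spammy content, ordinary length

# Unscaled: the length column dominates the Euclidean distance
print(KNeighborsClassifier(n_neighbors=3).fit(X, y).predict(query))  # -> ['ham']

# Scaled: both features contribute comparably
scaler = StandardScaler().fit(X)
knn = KNeighborsClassifier(n_neighbors=3).fit(scaler.transform(X), y)
print(knn.predict(scaler.transform(query)))  # -> ['spam']
```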
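Second, a sketch of the sensitivity to k (again with invented toy data), mirroring the k = 3 vs. k = 7 example above: the same query point flips class as k grows.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Three class-B points huddle near the query; four class-A points sit farther out
X = np.array([[0.0, 1.0], [1.0, 0.0], [1.0, 2.0],                    # class B
              [3.0, 3.0], [3.0, -1.0], [-2.0, 3.0], [-2.0, -1.0]])   # class A
y = np.array(["B", "B", "B", "A", "A", "A", "A"])
query = [[0.5, 0.5]]

for k in (3, 7):
    print(k, KNeighborsClassifier(n_neighbors=k).fit(X, y).predict(query))
# 3 ['B']  <- the three nearest neighbors are all class B
# 7 ['A']  <- widening to seven neighbors lets the four class-A points outvote them
```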
 Failure cases of KNN
CASE 1
In this case, the data is grouped in clusters, but the query point (the yellow point) lies far away from all of them. We can still use the k nearest neighbors to assign a class, but it doesn't make much sense: because the query point is so far from every data point, we can't be very sure about its classification.
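One common safeguard for this case (a pattern assumed here, not something the slides prescribe) is to inspect the actual neighbor distances and refuse to classify when even the closest neighbor is too far away:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

X = np.array([[1.0, 1.0], [1.1, 0.9], [5.0, 5.0], [5.1, 4.9]])
y = np.array(["A", "A", "B", "B"])
knn = KNeighborsClassifier(n_neighbors=3).fit(X, y)

query = np.array([[40.0, -30.0]])  # far from both clusters, as in CASE 1
distances, _ = knn.kneighbors(query)  # distances to the 3 nearest neighbors

if distances.min() > 10.0:  # illustrative threshold; would be tuned per dataset
    print("Query is too far from the training data; classification is unreliable.")
else:
    print(knn.predict(query))
```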
CASE 2
In this case, the data is randomly spread, so no useful information can be obtained from it. Given a query point (the yellow point) in such a scenario, the KNN algorithm will still find the k nearest neighbors, but since the data points are jumbled, the accuracy is questionable.
