Lecture 19


Data Science

CSE-4075
(k-means)

Partitional Clustering
• Output a single partition of the data into clusters

• Good for large data sets

• Determining the number of clusters is a major challenge
K-Means

• Predetermined number of clusters

• Start with seed clusters of one element

1. Seeds
2. Assign instances to clusters
3. Find new centroids
4. New clusters
Example

Repeat for the next iteration and check whether the clusters remain the same or not.
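The loop above (seed, assign instances, find new centroids, repeat until the clusters stay the same) can be sketched in Python with NumPy; the function name, the random seeding strategy, and the empty-cluster guard are illustrative choices, not from the slides:

```python
import numpy as np

def k_means(points, k, max_iters=100, seed=0):
    """Minimal k-means sketch: seed, assign, recompute, repeat."""
    rng = np.random.default_rng(seed)
    # Seeds: start from k randomly chosen data points (one-element clusters).
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    labels = np.full(len(points), -1)
    for _ in range(max_iters):
        # Assign each instance to its nearest centroid.
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        # Stop when the clusters remain the same as in the last iteration.
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # Find new centroids as the mean of each cluster
        # (keeping the old centroid if a cluster goes empty).
        centroids = np.array([points[labels == j].mean(axis=0)
                              if np.any(labels == j) else centroids[j]
                              for j in range(k)])
    return labels, centroids
```

On two well-separated groups of points, the assignments stop changing after a couple of iterations and the loop exits early.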
Choosing the Value of k
There is no easy way to choose the value of k.
One way is the elbow method:

• First, compute the sum of squared errors (SSE) for some values of k (for instance 2, 4, 6, 8, etc.)

• The SSE is defined as the sum of the squared distances between each member of the cluster and its centroid
If you plot k against SSE, you will see that the error decreases as k gets larger; this is because when the number of clusters increases, the clusters should be smaller, so the distortion is also smaller.

The idea of the elbow method is to choose the k at which the SSE decreases abruptly.
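As a sketch of the SSE computation above, the snippet below evaluates the error for increasing k; the data points and the hand-picked centroids are hypothetical, standing in for the centroids k-means would return:

```python
import numpy as np

def sse(points, centroids):
    """SSE: assign each point to its nearest centroid, then sum the
    squared distances between each member and that centroid."""
    dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    return float((dists.min(axis=1) ** 2).sum())

# Hypothetical data with three natural groups along the x-axis.
pts = np.array([[0., 0.], [1., 0.], [10., 0.], [11., 0.], [20., 0.], [21., 0.]])

# SSE for increasing k (centroids picked by hand to mimic k-means output).
errors = {
    1: sse(pts, np.array([[10.5, 0.]])),
    2: sse(pts, np.array([[0.5, 0.], [15.5, 0.]])),
    3: sse(pts, np.array([[0.5, 0.], [10.5, 0.], [20.5, 0.]])),
}
# The error drops sharply up to k = 3 (the "elbow"), then would level off.
```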
Choosing the Value of k
Strengths of k-means
• Strengths:
– Simple: easy to understand and to implement
– Efficient: time complexity is O(tkn),
where n is the number of data points,
k is the number of clusters, and
t is the number of iterations.
– Since both k and t are small, k-means is considered a linear
algorithm.
• K-means is the most popular clustering algorithm.
• Note that it terminates at a local optimum if SSE is used.
The global optimum is hard to find due to complexity.
Weaknesses of k-means
• The algorithm is only applicable if the mean is
defined.
– For categorical data, use k-modes: the centroid is
represented by the most frequent values.
• The user needs to specify k.
• The algorithm is sensitive to outliers.
– Outliers are data points that are very far away from
other data points.
– Outliers could be errors in the data recording or
special data points with very different values.
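The k-modes idea mentioned above, taking the most frequent value of each attribute as the centroid, can be sketched as follows (the helper name and the toy records are illustrative):

```python
from collections import Counter

def mode_centroid(rows):
    """k-modes-style centroid: the most frequent value of each attribute."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*rows))

# A hypothetical cluster of categorical records (color, size).
cluster = [("red", "S"), ("red", "M"), ("blue", "M")]
# mode_centroid(cluster) → ("red", "M")
```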
Weaknesses of k-means: Problems with outliers
Weaknesses of k-means: To deal with outliers
• One method is to remove, during the clustering process, data
points that are much farther away from the centroids than
other data points.
– To be safe, we may want to monitor these possible outliers over
a few iterations and then decide whether to remove them.
• Another method is to perform random sampling. Since in
sampling we only choose a small subset of the data
points, the chance of selecting an outlier is very small.
– Assign the rest of the data points to the clusters by distance or
similarity comparison, or by classification.
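The first method above, dropping points that are much farther from their centroid than the rest, might be sketched like this; the `factor` threshold is a made-up heuristic, not something the slides prescribe:

```python
import numpy as np

def filter_outliers(points, labels, centroids, factor=3.0):
    """Keep points whose distance to their own centroid is at most
    `factor` times the mean such distance within the cluster; points
    beyond that are flagged as possible outliers (heuristic cutoff)."""
    dists = np.linalg.norm(points - centroids[labels], axis=1)
    keep = np.ones(len(points), dtype=bool)
    for j in np.unique(labels):
        in_j = labels == j
        keep[in_j] = dists[in_j] <= factor * dists[in_j].mean()
    return keep
```

In practice one would re-run the assignment step after dropping flagged points and, as the slide suggests, monitor candidates over a few iterations before actually removing them.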
Weaknesses of k-means: Sensitivity to initial seeds
Weaknesses of k-means: Can’t handle all types of data
