2 ADA Cluster Analysis


What is Cluster Analysis?

• Cluster: a collection of data objects
– Similar to one another within the same cluster
– Dissimilar to the objects in other clusters

• Cluster analysis
– Grouping a set of data objects into natural clusters

• Clustering is unsupervised classification: no predefined classes
Objective of clustering
algorithms for categorical data
• Partition the objects into groups.
– Objects with similar categorical attribute
values are placed in the same group.
– Objects in different groups contain
dissimilar categorical attribute values.
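A minimal Python sketch of this objective, using the simple matching
dissimilarity (the fraction of attributes on which two objects disagree);
the attribute values below are made up for illustration.

# Simple matching dissimilarity for categorical objects:
# the fraction of attributes on which two objects disagree.
def simple_matching_dissimilarity(a, b):
    assert len(a) == len(b)
    mismatches = sum(1 for x, y in zip(a, b) if x != y)
    return mismatches / len(a)

obj1 = ("red", "small", "round")
obj2 = ("red", "large", "round")
obj3 = ("blue", "large", "square")

print(simple_matching_dissimilarity(obj1, obj2))  # 0.33 -> likely same group
print(simple_matching_dissimilarity(obj1, obj3))  # 1.00 -> different groups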
General Applications of Clustering
• Pattern Recognition
• Spatial Data Analysis
– create thematic maps in GIS by clustering feature
spaces
– detect spatial clusters and explain them in spatial data
mining
• Image Processing
• Economic Science (especially market research)
• WWW
– Document classification
– Cluster Weblog data to discover groups of similar
access patterns
What Is Good Clustering?
• A good clustering method will produce high quality clusters with
– high intra-class similarity
– low inter-class similarity

• The quality of a clustering depends on:
– The appropriateness of the method for the dataset
– The (dis)similarity measure used
– Its implementation

• The quality of a clustering method is also measured by its ability to
discover some or all of the hidden patterns.
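A small sketch of the intra- vs. inter-class idea, assuming Euclidean distance;
the two toy clusters below are made up. A good clustering keeps the mean
within-cluster distance small and the mean between-cluster distance large.

# Compare mean intra-cluster distance with mean inter-cluster distance
# for a toy 2-D clustering (points and labels are illustrative only).
import numpy as np

points = np.array([[1.0, 1.0], [1.2, 0.9], [0.8, 1.1],   # cluster 0
                   [5.0, 5.0], [5.1, 4.8], [4.9, 5.2]])  # cluster 1
labels = np.array([0, 0, 0, 1, 1, 1])

dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)

intra = dist[labels[:, None] == labels[None, :]]   # same-cluster pairs
inter = dist[labels[:, None] != labels[None, :]]   # different-cluster pairs

print("mean intra-cluster distance:", intra[intra > 0].mean())  # small
print("mean inter-cluster distance:", inter.mean())             # large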
Data Structures
• Data matrix (n objects, p attributes):

      | x_11 ... x_1f ... x_1p |
      | ...  ... ...  ... ...  |
  X = | x_i1 ... x_if ... x_ip |
      | ...  ... ...  ... ...  |
      | x_n1 ... x_nf ... x_np |

• Dissimilarity matrix (d(i,j) is the dissimilarity between objects i and j;
  only the lower triangle is stored):

      | 0                           |
      | d(2,1)  0                   |
  D = | d(3,1)  d(3,2)  0           |
      | :       :       :           |
      | d(n,1)  d(n,2)  ...  ...  0 |
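A minimal sketch, assuming Euclidean distance and made-up values, of how the
n-by-n dissimilarity matrix is derived from an n-by-p data matrix:

# Build the lower-triangular dissimilarity matrix D from data matrix X.
import numpy as np

X = np.array([[1.0, 2.0],    # object 1
              [2.0, 4.0],    # object 2
              [5.0, 1.0]])   # object 3

n = X.shape[0]
D = np.zeros((n, n))
for i in range(n):
    for j in range(i):                            # lower triangle only
        D[i, j] = np.linalg.norm(X[i] - X[j])     # d(i,j) = d(j,i), d(i,i) = 0

print(D)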
Similarity and Dissimilarity
Between Objects
• Distances are normally used to measure the similarity or
dissimilarity between two data objects
• Some popular ones include: Minkowski distance:
d(i, j) = ( |x_i1 - x_j1|^q + |x_i2 - x_j2|^q + ... + |x_ip - x_jp|^q )^(1/q)

where i = (x_i1, x_i2, ..., x_ip) and j = (x_j1, x_j2, ..., x_jp) are two
p-dimensional data objects, and q is a positive integer
• If q = 1, d is Manhattan distance
d (i, j) | x  x |  | x  x | ... | x  x |
i1 j1 i2 j 2 ip jp
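A small sketch of the Minkowski distance for two p-dimensional objects; q = 1
gives the Manhattan distance and q = 2 the Euclidean distance (the example
vectors are made up).

def minkowski(x, y, q):
    # (sum over attributes of |x_f - y_f|^q) ^ (1/q)
    return sum(abs(a - b) ** q for a, b in zip(x, y)) ** (1.0 / q)

i = (1.0, 2.0, 3.0)
j = (4.0, 0.0, 3.0)

print(minkowski(i, j, 1))   # Manhattan: |1-4| + |2-0| + |3-3| = 5.0
print(minkowski(i, j, 2))   # Euclidean: sqrt(9 + 4 + 0) ~= 3.61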
Similarity and Dissimilarity
Between Objects (Cont.)
• If q = 2, d is Euclidean distance:
d(i, j) = sqrt( |x_i1 - x_j1|^2 + |x_i2 - x_j2|^2 + ... + |x_ip - x_jp|^2 )
– Properties
• d(i,j) ≥ 0
• d(i,i) = 0
• d(i,j) = d(j,i)
• d(i,j) ≤ d(i,k) + d(k,j)
• One can also use weighted distance, parametric
Pearson product-moment correlation, or other
dissimilarity measures.
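A sketch that checks the four metric properties numerically on a few made-up
points and shows a weighted Euclidean variant (the weight vector is an
illustrative assumption, not part of the slides):

import itertools
import numpy as np

def euclidean(x, y, w=None):
    # Optionally weight each attribute's squared difference.
    w = np.ones_like(x) if w is None else np.asarray(w)
    return float(np.sqrt(np.sum(w * (np.asarray(x) - np.asarray(y)) ** 2)))

pts = [np.array([0.0, 0.0]), np.array([3.0, 4.0]), np.array([6.0, 0.0])]

for a, b, c in itertools.permutations(pts, 3):
    assert euclidean(a, b) >= 0                                   # non-negativity
    assert euclidean(a, a) == 0                                   # identity
    assert euclidean(a, b) == euclidean(b, a)                     # symmetry
    assert euclidean(a, c) <= euclidean(a, b) + euclidean(b, c)   # triangle inequality

print(euclidean(pts[0], pts[1]))                 # 5.0
print(euclidean(pts[0], pts[1], w=[1.0, 0.25]))  # down-weight the 2nd attribute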
Clustering of genomic data sets
Hierarchical Clustering
• Uses the distance matrix as the clustering criterion. This
method does not require the number of clusters
k as an input, but needs a termination condition
[Figure: AGNES (agglomerative nesting) proceeds from Step 0 to Step 4, merging
objects a, b, c, d, e bottom-up (a, b -> ab; d, e -> de; c + de -> cde;
ab + cde -> abcde); DIANA (divisive analysis) builds the same hierarchy
top-down, splitting from Step 4 back to Step 0.]
AGNES (Agglomerative
Nesting)
• Introduced in Kaufmann and Rousseeuw (1990)
• Merge nodes that have the least dissimilarity
• Go on in a non-descending fashion
• Eventually all nodes belong to the same cluster

[Figure: three scatter plots (axes 0–10) showing the data objects being merged
into progressively larger clusters as AGNES proceeds.]
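One way to reproduce AGNES-style behaviour is SciPy's agglomerative linkage;
a minimal sketch with made-up 2-D points (single linkage is an assumption here,
other linkage criteria work the same way):

# Agglomerative clustering: repeatedly merge the pair of clusters with the
# least dissimilarity until all objects belong to one cluster.
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import pdist

X = np.array([[1.0, 1.0], [1.5, 1.0], [5.0, 5.0],
              [5.5, 5.5], [9.0, 1.0]])

Z = linkage(pdist(X), method="single")   # merge closest clusters first
print(Z)  # each row: the two clusters merged, their distance, new cluster size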
A Dendrogram Shows How the
Clusters are Merged Hierarchically

Decompose data objects into several levels of nested


partitioning (tree of clusters), called a dendrogram.

A clustering of the data objects is obtained by cutting the


dendrogram at the desired level, then each connected
component forms a cluster.
Dendrogram
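A minimal sketch, continuing the SciPy example above, of cutting the dendrogram
at a chosen height (the cut level of 2.0 is an arbitrary illustration):

# Cut the dendrogram at height 2.0; every connected component below the cut
# becomes one cluster.
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist

X = np.array([[1.0, 1.0], [1.5, 1.0], [5.0, 5.0],
              [5.5, 5.5], [9.0, 1.0]])
Z = linkage(pdist(X), method="single")

labels = fcluster(Z, t=2.0, criterion="distance")
print(labels)   # e.g. [1 1 2 2 3] -> three clusters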
