UNIT 5
Course Content
• Inter-cluster distances are maximized
• Intra-cluster distances are minimized
Applications of Cluster Analysis
• Understanding
• Group related documents for browsing, group genes and proteins that have similar functionality, or group stocks with similar price fluctuations
• Summarization
• Reduce the size of large data sets
Clustering precipitation in Australia
What is not Cluster Analysis?
• Supervised classification
• Have class label information
• Simple segmentation
• Dividing students into different registration groups alphabetically, by last name
• Results of a query
• Groupings are a result of an external specification
• Graph partitioning
• Some mutual relevance and synergy, but areas are not identical
Notion of a Cluster can be Ambiguous
Types of Clusterings
• Hierarchical clustering
• A set of nested clusters organized as a hierarchical tree
• Partitional clustering
• A division of data objects into non-overlapping subsets (clusters) such that each data object is in exactly one subset
Types of Clusters
• Well-separated clusters
• Center-based clusters
• Contiguous clusters
• Density-based clusters
• Property or conceptual clusters
• Clusters described by an objective function
Types of Clusters: Well-Separated
• Well-separated clusters
• A cluster is a set of points such that any point in a cluster is closer (or more similar) to every other point in the cluster than to any point not in the cluster.
3 well-separated clusters
Types of Clusters: Center-Based
• Center-based
• A cluster is a set of objects such that an object in a cluster is closer (more similar) to the “center” of a cluster than to the center of any other cluster
• The center of a cluster is often a centroid, the average of all the points in the cluster, or a medoid, the most “representative” point of a cluster
4 center-based clusters
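To make the centroid/medoid distinction concrete, here is a minimal NumPy sketch; the toy cluster and both helper functions are illustrative assumptions, not part of the slides:

```python
import numpy as np

def centroid(points):
    """Centroid: the coordinate-wise mean of all points in the cluster."""
    return points.mean(axis=0)

def medoid(points):
    """Medoid: the actual cluster point whose total distance to all other
    points in the cluster is smallest (the most 'representative' point)."""
    dists = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    return points[dists.sum(axis=1).argmin()]

cluster = np.array([[1.0, 1.0], [2.0, 1.5], [1.5, 8.0]])  # toy cluster
print(centroid(cluster))  # may not coincide with any actual data point
print(medoid(cluster))    # always one of the cluster's own points
```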
Types of Clusters: Contiguity-Based
• Contiguous clusters (nearest neighbor or transitive)
• A cluster is a set of points such that a point in a cluster is closer (or more similar) to one or more other points in the cluster than to any point not in the cluster
8 contiguous clusters
Types of Clusters: Density-Based
• Density-based
• A cluster is a dense region of points, which is separated by low-density regions from other regions of high density.
• Used when the clusters are irregular or intertwined, and when noise and outliers are present.
6 density-based clusters
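Density-based clusters of this kind are typically found by an algorithm such as DBSCAN (not named on the slide). A minimal sketch, assuming scikit-learn is available and using made-up crescent-shaped data:

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Two intertwined crescent-shaped dense regions plus a few scattered noise points (toy data).
rng = np.random.default_rng(0)
t = rng.uniform(0, np.pi, 200)
X = np.vstack([
    np.c_[np.cos(t), np.sin(t)] + rng.normal(scale=0.05, size=(200, 2)),
    np.c_[1 - np.cos(t), 0.5 - np.sin(t)] + rng.normal(scale=0.05, size=(200, 2)),
    rng.uniform(-2, 3, size=(10, 2)),
])

# eps: neighborhood radius; min_samples: points needed to call a region "dense".
labels = DBSCAN(eps=0.2, min_samples=5).fit_predict(X)
print(set(labels))  # cluster ids; -1 marks noise/outlier points
```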
Types of Clusters: Conceptual Clusters
• Shared property or conceptual clusters
• Finds clusters that share some common property or represent a particular concept
2 overlapping circles
Types of Clusters: Objective Function
• Clusters can be defined by an objective function; for example, when the data are viewed as a weighted graph, we want to minimize the edge weight between clusters and maximize the edge weight within clusters
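A minimal sketch of this graph-based objective; the weighted adjacency matrix `W` and the two-cluster assignment `labels` below are made up for illustration:

```python
import numpy as np

# Toy weighted graph (symmetric adjacency matrix) over 4 points and a
# candidate assignment of those points to 2 clusters.
W = np.array([
    [0.0, 0.9, 0.1, 0.0],
    [0.9, 0.0, 0.2, 0.1],
    [0.1, 0.2, 0.0, 0.8],
    [0.0, 0.1, 0.8, 0.0],
])
labels = np.array([0, 0, 1, 1])

same = labels[:, None] == labels[None, :]
within = W[same].sum() / 2    # edge weight inside clusters (want this large)
between = W[~same].sum() / 2  # edge weight across clusters (want this small)
print(within, between)        # within-cluster weight dominates for a good clustering
```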
Characteristics of the Input Data Are Important
Clustering Algorithms
• K-means and its variants
• Hierarchical clustering
• Density-based clustering
K-means Clustering
• Partitional clustering approach
• Each cluster is associated with a centroid (center point)
• Each point is assigned to the cluster with the closest centroid
• Number of clusters, K, must be specified
Original points and the K-means clustering after iterations 1–6 (x vs. y)
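Not part of the slides: a minimal NumPy sketch of the basic K-means iteration shown in the figure; the toy data, K = 3, and the fixed iteration count are illustrative assumptions:

```python
import numpy as np

def kmeans(X, k, n_iter=10, seed=0):
    """Basic K-means: alternately assign points to the nearest centroid
    and recompute each centroid as the mean of its assigned points."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]  # random initial centroids
    for _ in range(n_iter):
        # Assignment step: label each point with its closest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=-1)
        labels = dists.argmin(axis=1)
        # Update step: move each centroid to the mean of its points
        # (keep the old centroid if a cluster ends up empty).
        centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
    return labels, centroids

X = np.random.default_rng(1).normal(size=(300, 2))  # toy 2-D data
labels, centroids = kmeans(X, k=3)
```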
Evaluating K-means Clusters
• Most common measure is Sum of Squared Error (SSE)
• For each point, the error is the distance to the nearest cluster
• To get SSE, we square these errors and sum them.
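A minimal sketch of the SSE computation, reusing the `labels` and `centroids` arrays from a K-means run such as the sketch above (the names are assumptions, not from the slides):

```python
import numpy as np

def sse(X, labels, centroids):
    """Sum of Squared Error: for each point, take the distance to the
    centroid of its assigned (nearest) cluster, square it, and sum over all points."""
    diffs = X - centroids[labels]   # per-point error vectors
    return float(np.sum(diffs ** 2))

# Note: increasing K always drives SSE down, so SSE is best compared
# between clusterings that use the same number of clusters.
```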
Hierarchical Clustering
• Produces a set of nested clusters organized as a hierarchical tree
• Can be visualized as a dendrogram, a tree-like diagram that records the sequences of merges or splits
Example dendrogram for six points (leaves ordered 1, 3, 2, 5, 4, 6)
Strengths of Hierarchical Clustering
• Do not have to assume any particular number of clusters
• Any desired number of clusters can be obtained by ‘cutting’ the dendrogram at the proper level (see the sketch after this list)
• Two main types of hierarchical clustering
• Agglomerative:
• Start with the points as individual clusters
• At each step, merge the closest pair of clusters until only one cluster (or k clusters) is left
• Divisive:
• Start with one, all-inclusive cluster
• At each step, split a cluster until each cluster contains a single point (or there are k clusters)
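A minimal sketch of ‘cutting’ a dendrogram to obtain a chosen number of clusters, assuming SciPy and made-up data; the single-link choice here is just one of the criteria discussed later:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

X = np.random.default_rng(0).normal(size=(20, 2))   # toy 2-D points

# Agglomerative clustering: linkage() records the full merge history,
# which is exactly the information a dendrogram plots.
Z = linkage(X, method="single")

# "Cut" the dendrogram so that exactly 3 clusters remain; cutting at a
# different level (a different t) yields a different number of clusters.
labels = fcluster(Z, t=3, criterion="maxclust")
```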
Intermediate Situation
• We want to merge the two closest clusters (C2 and C5) and update the proximity matrix.
Proximity matrix over clusters C1, C2, C3, C4, C5
After Merging
• The question is “How do we update the proximity matrix?”
Proximity matrix over clusters C1, C2 ∪ C5, C3, C4, with entries involving the merged cluster C2 ∪ C5 marked “?”
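Not shown on the slides: a minimal NumPy sketch of one possible update rule. Under the single-link (MIN) definition introduced next, the distance from the merged cluster C2 ∪ C5 to any other cluster is the minimum of the two old distances; the matrix values below are made up:

```python
import numpy as np

def merge_clusters_min(D, i, j):
    """Single-link update: the distance from the merged cluster (i ∪ j)
    to any other cluster k is min(D[i, k], D[j, k])."""
    merged = np.minimum(D[i], D[j])                     # new row for i ∪ j
    D = np.delete(np.delete(D, j, axis=0), j, axis=1)   # drop cluster j
    idx = i if i < j else i - 1                         # position of i after deletion
    D[idx, :] = np.delete(merged, j)
    D[:, idx] = np.delete(merged, j)
    D[idx, idx] = 0.0
    return D

# Toy 5x5 proximity (distance) matrix for clusters C1..C5.
D = np.array([
    [0.0, 0.8, 0.6, 0.9, 0.7],
    [0.8, 0.0, 0.5, 0.4, 0.1],
    [0.6, 0.5, 0.0, 0.3, 0.6],
    [0.9, 0.4, 0.3, 0.0, 0.5],
    [0.7, 0.1, 0.6, 0.5, 0.0],
])
D = merge_clusters_min(D, 1, 4)   # merge C2 (index 1) and C5 (index 4)
```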
How to Define Inter-Cluster Similarity
• MIN
• MAX
• Group Average
• Distance Between Centroids
• Other methods driven by an objective function
• Ward’s Method uses squared error
Proximity matrix over points p1, p2, p3, p4, p5, …
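The four geometric definitions can be read directly off the block of the proximity matrix between two clusters. A minimal sketch, assuming two made-up clusters of 2-D points and SciPy’s `cdist`:

```python
import numpy as np
from scipy.spatial.distance import cdist

A = np.array([[0.0, 0.0], [1.0, 0.0]])                  # cluster A (toy points)
B = np.array([[4.0, 0.0], [5.0, 1.0], [6.0, 0.0]])      # cluster B (toy points)

D = cdist(A, B)   # block of the proximity matrix between A and B

d_min = D.min()   # MIN (single link): distance of the closest pair
d_max = D.max()   # MAX (complete link): distance of the farthest pair
d_avg = D.mean()  # Group Average: mean of all pairwise distances
d_cen = np.linalg.norm(A.mean(axis=0) - B.mean(axis=0))  # distance between centroids
```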
Cluster Similarity: MIN or Single Link
• Similarity of two clusters is based on the two most similar (closest)
points in the different clusters
• Determined by one pair of points, i.e., by one link in the proximity graph.
Hierarchical Clustering: MIN
Nested clusters and the corresponding single-link dendrogram for six points (leaves ordered 3, 6, 2, 5, 4, 1)
Hierarchical Clustering: MAX
Nested clusters and the corresponding complete-link dendrogram for six points (leaves ordered 3, 6, 4, 1, 2, 5)
Hierarchical Clustering: Group Average
Nested clusters and the corresponding group-average dendrogram for six points (leaves ordered 3, 6, 4, 1, 2, 5)
• Strengths
• Less susceptible to noise and outliers
• Limitations
• Biased towards globular clusters
Cluster Similarity: Ward’s Method
• Similarity of two clusters is based on the increase in squared error when two clusters are merged
• Similar to group average if distance between points is distance squared
Nested clusters produced by Ward’s Method and by Group Average on the same six points
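A minimal sketch that runs the linkage criteria discussed above (MIN, MAX, Group Average, Ward’s) on the same made-up data via SciPy; only the `method` argument changes:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

X = np.random.default_rng(0).normal(size=(30, 2))   # toy 2-D points

for method in ["single", "complete", "average", "ward"]:
    Z = linkage(X, method=method)                     # merge history under this criterion
    labels = fcluster(Z, t=3, criterion="maxclust")   # cut each tree into 3 clusters
    print(method, np.bincount(labels)[1:])            # cluster sizes (labels start at 1)
```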
Hierarchical Clustering: Time and Space Requirements
• O(N²) space, since it uses the proximity matrix (N is the number of points)
• O(N³) time in many cases: there are N steps, and at each step the proximity matrix of size N² must be searched and updated
• Complexity can be reduced to O(N² log N) time for some approaches