Ex 9

Download as pdf or txt
Download as pdf or txt
You are on page 1of 2

CMSC5724: Exercise List 9

Answer Problems 1-2 based on the following dataset:

10
9
8
7
6 d

5 a
b c
4
3
2
1

1 2 3 4 5 6 7 8 9 10

Problem 1. Recall that, in discussing hierarchical clustering, we introduced 3 distance metrics on


two sets of points: min, max, and mean. Let S1 = {a, c} and S2 = {b, d}. What is the distance
between S1 and S2 under those three metrics, respectively (assuming that the distance of two points
is calculated by Euclidean distance)?

Problem 2. Show the dendrogram returned by the Agglomerative algorithm under the min and
max metrics, respectively.

Problem 3. Suppose that we use dmin to define the similarity of two clusters C1 , C2 . Give an
algorithm to compute the dendrogram on n points in O(n2 log n) time.

Problem 4. Suppose that we use dmeanP to define the similarity of two clusters C1 , C2 . As discussed
1
in the lecture, dmean (C1 , C2 ) = |C1 ||C2 | (p1 ,p2 )∈C1 ×C2 dist(p1 , p2 ). Give an algorithm to compute
the dendrogram on n points in O(n2 log n) time.

Problem 5. Consider the set P of points below:

10
9
8
a
7
6
5
4 b
3
c
2
1

1 2 3 4 5 6 7 8 9 10

1
Set  = 1 and minpts = 3. Show the clusters output by DBSCAN, assuming that the distance
metric is Euclidean distance.

Problem 6. Given a pair of parameters  and minpts, describe an algorithm to compute the
DBSCAN clusters in O(n2 ) time, assuming that the distance metric is Euclidean distance, and
that the dimensionality of the data space is a constant.

You might also like