Advantages & Disadvantages of k-Means and Hierarchical Clustering (Unsupervised Learning)
Machine Learning for Language Technology ML4LT (2016)
Marina Santini
Department of Linguistics and Philology
Uppsala University
2016 Advantages & Disadvantages of k-Means and Hierarchical Clustering
Outline
• k-Means: Advantages and Disadvantages
• Hierarchical Clustering: Advantages and Disadvantages
k-Means: Advantages and Disadvantages
Advantages
• Easy to implement.
• With a large number of variables, k-Means may be computationally faster than hierarchical clustering (if K is small).
• k-Means may produce tighter clusters than hierarchical clustering.
• An instance can change cluster (move to another cluster) when the centroids are re-computed.
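The last point can be seen in a minimal sketch of batch k-means. This is an illustration, not code from the course: the function names (`sq_dist`, `kmeans`), the toy 1-D data and the choice of the first two points as seeds are all assumptions made for the example.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, seeds, max_iter=100):
    """Batch k-means; returns final centroids and the label list of each pass."""
    centroids, history = list(seeds), []
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        labels = [min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))
                  for p in points]
        if history and labels == history[-1]:
            break  # no instance moved, so the algorithm has converged
        history.append(labels)
        # Update step: each centroid becomes the mean of its members.
        for j in range(len(centroids)):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:  # an empty cluster keeps its old centroid
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, history

points = [(0,), (1,), (2,), (9,), (10,)]  # toy 1-D data (assumed for illustration)
centroids, history = kmeans(points, seeds=[(0,), (1,)])
for step, labels in enumerate(history, start=1):
    print(step, labels)
```

With these seeds, the instances 1 and 2 start in the second cluster and move to the first one after the centroids are re-computed.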
Disadvantages
• Difficult to predict the number of clusters (K-value).
• Initial seeds have a strong impact on the final results.
• The order of the data has an impact on the final results.
• Sensitive to scale: rescaling your datasets (normalization or standardization) can completely change the results. While this in itself is not bad, failing to realize that you have to pay extra attention to scaling your data might be.
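The effect of seeds and of scale can be reproduced on a toy dataset. The sketch below is an assumption-laden illustration: the helper names (`nearest`, `kmeans_labels`), the four data points and the factor 10 (standing in for a change of units) are invented for the example, and a unit change is used instead of normalization to keep the arithmetic easy to follow.

```python
def nearest(p, centroids):
    """Index of the centroid closest to p (squared Euclidean distance)."""
    return min(range(len(centroids)),
               key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])))

def kmeans_labels(points, seeds, max_iter=100):
    """Plain batch k-means; returns the final cluster index of every point."""
    centroids, labels = list(seeds), None
    for _ in range(max_iter):
        new_labels = [nearest(p, centroids) for p in points]
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
        for j in range(len(centroids)):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:  # an empty cluster keeps its old centroid
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels

# Four corners of a 4 x 3 rectangle (toy data, not from the slides).
raw = [(0, 0), (4, 0), (0, 3), (4, 3)]
# Same data with the second feature multiplied by 10, e.g. a change of units.
rescaled = [(x, 10 * y) for x, y in raw]

# Scale: same seeds (first and last point), different partition.
by_x = kmeans_labels(raw, [raw[0], raw[3]])
by_y = kmeans_labels(rescaled, [rescaled[0], rescaled[3]])

# Seeds: same rescaled data, but seeded with the first two points.
other_seeds = kmeans_labels(rescaled, [rescaled[0], rescaled[1]])

print(by_x, by_y, other_seeds)
```

On the raw data the points are grouped by their first feature; after rescaling, the same seeds group them by the second feature. Changing only the seeds on the rescaled data brings back the first grouping, which is a local optimum that k-means cannot leave.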
Hierarchical Clustering: Advantages and Disadvantages
Advantages
• Hierarchical clustering outputs a hierarchy, i.e. a structure that is more informative than the unstructured set of flat clusters returned by k-means. Therefore, it is easier to decide on the number of clusters by looking at the dendrogram (see the suggestion on how to cut a dendrogram in lab8).
• Easy to implement.
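To make the idea of a hierarchy and of cutting it concrete, here is a minimal sketch of agglomerative clustering on 1-D values. It is not the procedure from lab8: single linkage, the function names (`single_link`, `agglomerate`, `cut`), the toy values and the cut heights are all assumptions chosen for illustration, and the values are assumed to be distinct.

```python
def single_link(a, b):
    """Single-linkage distance between two clusters of 1-D values."""
    return min(abs(x - y) for x in a for y in b)

def agglomerate(values):
    """Merge the two closest clusters until one remains; return the merge log."""
    clusters = [[v] for v in values]
    merges = []  # (distance, merged cluster): the dendrogram read bottom-up
    while len(clusters) > 1:
        d, i, j = min((single_link(clusters[i], clusters[j]), i, j)
                      for i in range(len(clusters))
                      for j in range(i + 1, len(clusters)))
        merged = sorted(clusters[i] + clusters[j])
        merges.append((d, merged))
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges

def cut(values, merges, height):
    """Flat clusters obtained by keeping only the merges below `height`."""
    clusters = [[v] for v in values]
    for d, merged in merges:
        if d >= height:
            break  # single-linkage merge distances never decrease
        # Replace the parts of the merged cluster by the merged cluster itself.
        clusters = [c for c in clusters if not set(c) <= set(merged)] + [merged]
    return sorted(clusters)

values = [1, 2, 6, 7, 15]  # toy data, assumed distinct
merges = agglomerate(values)
for d, merged in merges:
    print(d, merged)
print(cut(values, merges, height=3))
print(cut(values, merges, height=5))
```

The merge log plays the role of the dendrogram: cutting at height 3 gives three clusters, cutting at height 5 gives two, so the number of clusters is chosen after the hierarchy has been built and does not have to be fixed in advance.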
Disadvantages
• It is not possible to undo the previous step: once the instances have been assigned to a cluster, they can no longer be moved around.
• Time complexity: not suitable for large datasets.
• Initial seeds have a strong impact on the final results.
• The order of the data has an impact on the final results.
• Very sensitive to outliers.
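A rough way to see the time-complexity point: a standard agglomerative procedure starts from the distances between all pairs of instances, and the number of pairs grows quadratically with the size of the dataset. The helper name `pairwise_distance_count` and the dataset sizes below are assumptions for illustration only.

```python
def pairwise_distance_count(n):
    """Number of distinct pairs among n instances: n * (n - 1) / 2."""
    return n * (n - 1) // 2

for n in (1_000, 100_000):
    print(n, pairwise_distance_count(n))
```

Going from one thousand to one hundred thousand instances multiplies the number of initial distances by roughly ten thousand, before any merging has taken place.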
The end