19 - Session PPT - Clustering Algorithms
AGENDA PART 1
• Different Types of Clustering Algorithms
• Partitioning or Similarity-Based Approach – the K-means Algorithm
• What Is K-means Clustering?
• Implementation of K-means Clustering
• WCSS and the Elbow Method to Find the Number of Clusters
• Python Implementation of K-means Clustering
Types of Clustering Algorithms
• Partitioning or Similarity-Based Approach
• Hierarchical Clustering – Divisive and Agglomerative Approaches
• Density-Based
• Graph-Based
• Probabilistic Topic Modelling using LDA (Latent Dirichlet Allocation) – see the code sketch after this list
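• Each of these families has an off-the-shelf estimator in scikit-learn. Below is a minimal sketch, assuming scikit-learn is installed; the pairing of family to estimator is my illustration, not from the slides:
```python
# One off-the-shelf scikit-learn estimator per family (the pairing is my
# illustration, not from the slides).
from sklearn.cluster import KMeans, AgglomerativeClustering, DBSCAN, SpectralClustering
from sklearn.decomposition import LatentDirichletAllocation

families = {
    'partitioning / similarity-based': KMeans(n_clusters=3),
    'hierarchical (agglomerative)': AgglomerativeClustering(n_clusters=3),
    'density-based': DBSCAN(eps=0.5, min_samples=5),
    'graph-based': SpectralClustering(n_clusters=3),
    'probabilistic topic modelling': LatentDirichletAllocation(n_components=3),
}
for family, estimator in families.items():
    print(f'{family}: {type(estimator).__name__}')
```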
Overview – K Means Clustering
• K-means is one of the most popular unsupervised machine learning algorithms, used for solving clustering problems.
• K-means segregates unlabeled data into groups, called clusters, based on similar features and common patterns.
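• As a minimal sketch of the agenda items above (using scikit-learn’s KMeans, which is an assumption, since the deck’s own Python implementation is not shown here), the WCSS used by the elbow method is available as the fitted model’s inertia_ attribute:
```python
import numpy as np
from sklearn.cluster import KMeans

# Toy 2-D data with three blobs; any unlabeled feature matrix works the same way.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=c, scale=0.5, size=(50, 2)) for c in (0, 5, 10)])

# WCSS (within-cluster sum of squares) for k = 1..8; the "elbow" where the
# curve flattens suggests a good number of clusters.
wcss = [KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_
        for k in range(1, 9)]
print(wcss)

# Fit the final model with the chosen k and read off labels and centroids.
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print(km.labels_[:10])
print(km.cluster_centers_)
```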
What Is the K-means Algorithm?
• K-means partitions the data into k clusters by repeatedly assigning each point to the nearest cluster centroid and recomputing the centroids until the assignments stop changing.
Steps to Perform Hierarchical Clustering
• First, we calculate the Euclidean distance between every pair of points and fill the proximity matrix:
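• A short sketch of building the proximity matrix. Only the marks 7 and 10 appear in these slides; the remaining values are assumed for illustration:
```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

# Marks of the 5 students; 10 and 7 come from the slides, the rest are assumed.
marks = np.array([[10.0], [7.0], [28.0], [20.0], [35.0]])

# Proximity matrix: Euclidean distance between every pair of points.
proximity = squareform(pdist(marks, metric='euclidean'))
print(proximity)  # proximity[0, 1] == 3.0 is the distance between points 1 and 2
```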
Steps to Perform Hierarchical Clustering
• Step 1: First, we assign each point to an individual cluster. Different colors here represent different clusters. You can see that we have 5 different clusters for the 5 points in our data.
Steps to Perform Hierarchical Clustering
• Step 2: Next, we look at the smallest distance in the proximity matrix and merge the points with that distance. We then update the proximity matrix.
• Here, the smallest distance is 3, so we merge points 1 and 2:
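• A sketch of this step: mask the zero diagonal and take the argmin of the proximity matrix (same assumed marks as above):
```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

marks = np.array([[10.0], [7.0], [28.0], [20.0], [35.0]])  # assumed, as above
proximity = squareform(pdist(marks))

# Mask the zero diagonal so it cannot be picked, then find the closest pair.
np.fill_diagonal(proximity, np.inf)
i, j = np.unravel_index(np.argmin(proximity), proximity.shape)
print(i + 1, j + 1, proximity[i, j])  # -> 1 2 3.0: merge points 1 and 2
```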
Steps to Perform Hierarchical Clustering
• Let’s look at the updated clusters and update the proximity matrix accordingly.
• Here, we have taken the maximum of the two marks (7 and 10) as the mark for the merged cluster. Instead of the maximum, we could also take the minimum or the average. In standard terminology these choices roughly correspond to complete, single, and average linkage. Now, we again calculate the proximity matrix for these clusters:
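• SciPy’s linkage implements these choices as the 'complete', 'single', and 'average' methods. Note that true complete linkage takes the maximum over all pairwise distances, so its merge heights can differ slightly from the simplified "replace the mark" scheme in these slides. A sketch with the assumed marks:
```python
import numpy as np
from scipy.cluster.hierarchy import linkage

marks = np.array([[10.0], [7.0], [28.0], [20.0], [35.0]])  # assumed, as above

# 'complete' = maximum pairwise distance, 'single' = minimum, 'average' = mean.
for method in ('complete', 'single', 'average'):
    Z = linkage(marks, method=method)
    print(method)
    print(Z)  # each row: [cluster_a, cluster_b, merge_distance, new_cluster_size]
```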
Steps to Perform Hierarchical Clustering
• Step 3: We repeat Step 2 until only a single cluster is left.
• At each iteration we look for the minimum distance in the proximity matrix and merge the closest pair of clusters. After repeating these steps, we get the merged clusters shown below:
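• A compact sketch of the slides’ own merge rule (replace a merged cluster’s mark with the maximum), again assuming the marks 10, 7, 28, 20, and 35. It prints merges at distances 3, 7, 10, and 15:
```python
# Each cluster: (set of point ids, representative mark).
# Marks other than 7 and 10 are assumed for illustration.
clusters = [({1}, 10.0), ({2}, 7.0), ({3}, 28.0), ({4}, 20.0), ({5}, 35.0)]

while len(clusters) > 1:
    # Find the closest pair of clusters by their representative marks.
    i, j = min(
        ((a, b) for a in range(len(clusters)) for b in range(a + 1, len(clusters))),
        key=lambda ab: abs(clusters[ab[0]][1] - clusters[ab[1]][1]),
    )
    dist = abs(clusters[i][1] - clusters[j][1])
    # Merge, keeping the maximum mark as the new representative (as in the slides).
    merged = (clusters[i][0] | clusters[j][0], max(clusters[i][1], clusters[j][1]))
    print(f"merge {sorted(clusters[i][0])} + {sorted(clusters[j][0])} at distance {dist}")
    clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
```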
Finally
• We have the samples of the dataset on the x-axis and the distance on the y-axis. Whenever two clusters are merged, we join them in this dendrogram, and the height of the join is the distance between those clusters. Let’s build the dendrogram for our example:
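• Assuming the merge sequence sketched above, the four merges can be written directly as a SciPy linkage matrix and drawn with scipy.cluster.hierarchy.dendrogram:
```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import dendrogram

# Linkage matrix for the merges above (0-indexed points; merged clusters get
# ids 5, 6, ...): {1,2} at 3, {3,5} at 7, {1,2,4} at 10, everything at 15.
Z = np.array([
    [0, 1, 3.0, 2],   # points 1 and 2
    [2, 4, 7.0, 2],   # points 3 and 5
    [5, 3, 10.0, 3],  # cluster {1,2} with point 4
    [7, 6, 15.0, 5],  # cluster {1,2,4} with cluster {3,5}
])

dendrogram(Z, labels=[1, 2, 3, 4, 5])  # samples on the x-axis, distance on the y-axis
plt.ylabel('distance')
plt.show()
```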
Dendrogram
• Take a moment to process the above image. We started by merging samples 1 and 2, and the distance between these two samples was 3 (refer to the first proximity matrix in the previous section). Let’s plot this in the dendrogram.
• Here, we can see that we have merged samples 1 and 2. The vertical line represents the distance between these samples.
Dendrogram
• We can clearly visualize the steps of hierarchical clustering here. The longer a vertical line in the dendrogram, the greater the distance between the clusters it joins.
• Now, we can set a threshold distance and draw a horizontal line. (Generally, we try to set the threshold so that it cuts the tallest vertical line.) Let’s set this threshold to 12 and draw a horizontal line:
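• The threshold can be drawn as a horizontal line over the same dendrogram with matplotlib’s axhline, reusing the linkage matrix sketched earlier:
```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import dendrogram

# Same linkage matrix as before: merges at distances 3, 7, 10, and 15.
Z = np.array([[0, 1, 3.0, 2], [2, 4, 7.0, 2], [5, 3, 10.0, 3], [7, 6, 15.0, 5]])

dendrogram(Z, labels=[1, 2, 3, 4, 5])
plt.axhline(y=12, color='red', linestyle='--')  # threshold line at distance 12
plt.ylabel('distance')
plt.show()
```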
Dendrogram
• The number of clusters is the number of vertical lines intersected by the horizontal line drawn at the threshold. In the above example, since the red line intersects 2 vertical lines, we get 2 clusters. One cluster contains samples (1, 2, 4) and the other contains samples (3, 5).
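• Programmatically, cutting the dendrogram at the threshold is what scipy.cluster.hierarchy.fcluster does with criterion='distance'; with the linkage matrix sketched above it recovers the same two clusters:
```python
import numpy as np
from scipy.cluster.hierarchy import fcluster

# Same linkage matrix as before: merges at distances 3, 7, 10, and 15.
Z = np.array([[0, 1, 3.0, 2], [2, 4, 7.0, 2], [5, 3, 10.0, 3], [7, 6, 15.0, 5]])

# Cut the dendrogram at distance 12: every merge above the line is undone.
labels = fcluster(Z, t=12, criterion='distance')
print(labels)  # e.g. [1 1 2 1 2]: samples 1, 2, 4 together and samples 3, 5 together
```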