Clustering X
Factor analysis reduces the number of variables and is generally known as dimension reduction. Cluster analysis, on the other hand, is used to reduce the number of records or cases, and is commonly known as segmentation.
Clustering creates groups of similar cases; each such group of cases is called a cluster.
Factor analysis is based on the correlation between variables. Clustering, by contrast, is based on the similarity between cases: the cases within a cluster should be highly similar to each other and quite dissimilar to the cases in other clusters.
How is similarity calculated, and which algorithms are commonly used?
To calculate the similarity matrix, different kinds of distance measures can be used in R. The popular ones are:
1) Euclidean distance. The distance between cases a and b is $d(a, b) = \sqrt{(a_1 - b_1)^2 + (a_2 - b_2)^2 + \cdots + (a_n - b_n)^2}$, where n is the number of variables in the study.
2) Chebyshev distance is also used. It takes the absolute differences $|a_1 - b_1|, |a_2 - b_2|, \ldots, |a_n - b_n|$ and uses the largest of them as the distance: $d(a, b) = \max_i |a_i - b_i|$.
3) Manhattan distance. This method takes the absolute differences and adds them: $d(a, b) = \sum_{i=1}^{n} |a_i - b_i|$.
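The three distance measures can be sketched in a few lines. These notes work in R; the sketch below uses plain Python only as a language-neutral illustration, and the function names euclidean, chebyshev and manhattan (and the two toy cases) are hypothetical.

```python
import math
from typing import Sequence


def _check(a: Sequence[float], b: Sequence[float]) -> None:
    # Both cases must be measured on the same n variables.
    if len(a) != len(b):
        raise ValueError("cases must have the same number of variables")


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    # Square root of the sum of squared differences.
    _check(a, b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def chebyshev(a: Sequence[float], b: Sequence[float]) -> float:
    # Largest absolute difference over all variables.
    _check(a, b)
    return max(abs(x - y) for x, y in zip(a, b))


def manhattan(a: Sequence[float], b: Sequence[float]) -> float:
    # Sum of absolute differences over all variables.
    _check(a, b)
    return sum(abs(x - y) for x, y in zip(a, b))


# Hypothetical toy cases measured on three variables.
a = [1.0, 2.0, 3.0]
b = [4.0, 6.0, 3.0]
print(euclidean(a, b))  # differences 3, 4, 0 -> sqrt(9 + 16 + 0) = 5.0
print(chebyshev(a, b))  # largest absolute difference -> 4.0
print(manhattan(a, b))  # 3 + 4 + 0 -> 7.0
```

For the same pair of cases, the Chebyshev distance is never larger than the Euclidean distance, which in turn is never larger than the Manhattan distance.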
In R, we will use the Euclidean method to calculate the distance.
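The similarity matrix mentioned above holds the distance between every pair of cases. A minimal Python sketch follows, with a hypothetical helper name (distance_matrix) and made-up toy data; in R, the built-in stats function dist() computes the same matrix, with method = "euclidean" as its default and "maximum" (Chebyshev) and "manhattan" as alternatives.

```python
import math


def euclidean(a, b):
    # sqrt((a1 - b1)^2 + (a2 - b2)^2 + ... + (an - bn)^2)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def distance_matrix(cases):
    """Return the symmetric n x n matrix of pairwise Euclidean distances."""
    n = len(cases)
    d = [[0.0] * n for _ in range(n)]  # diagonal stays 0: a case is at distance 0 from itself
    for i in range(n):
        for j in range(i + 1, n):
            # Distance is symmetric, so fill both halves at once.
            d[i][j] = d[j][i] = euclidean(cases[i], cases[j])
    return d


# Hypothetical toy data: three cases on two variables.
cases = [[0, 0], [3, 4], [6, 8]]
for row in distance_matrix(cases):
    print(row)
# [0.0, 5.0, 10.0]
# [5.0, 0.0, 5.0]
# [10.0, 5.0, 0.0]
```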
Process for clustering
Step 1: Identify the variables for clustering. Generally, the more relevant variables you include, the better the clustering will be.
Step 2: Decide the clustering procedure: hierarchical or non-hierarchical clustering. Hierarchical clustering is preferred here.
Step 3: Calculate the similarity (dissimilarity) matrix using Euclidean distance.
Step 4: Select the clustering method.
Step 5: Decide the number of clusters, which generally comes from the business context.
Step 6: Create the cluster profile and check which cases fall under which cluster.
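Steps 2 to 6 can be sketched end to end on toy data. This is a minimal Python illustration under stated assumptions, not the R workflow these notes use: it performs agglomerative (bottom-up) hierarchical clustering on Euclidean distances, uses complete linkage as the clustering method of Step 4 (an assumption here; complete linkage is also the default of R's hclust()), takes the number of clusters k as given (Step 5), and profiles each cluster by the mean of each variable (Step 6). The helper names agglomerate and profile and the five toy cases are hypothetical.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def complete_linkage(c1, c2, cases):
    # Assumption: complete linkage, i.e. the distance between two clusters
    # is the largest distance between a case in one and a case in the other.
    return max(euclidean(cases[i], cases[j]) for i in c1 for j in c2)


def agglomerate(cases, k):
    """Merge the two closest clusters until only k remain.

    Returns a list of clusters, each a sorted list of case indices.
    """
    if not 1 <= k <= len(cases):
        raise ValueError("k must be between 1 and the number of cases")
    clusters = [[i] for i in range(len(cases))]  # start: every case on its own
    while len(clusters) > k:
        # Find the closest pair of clusters under complete linkage.
        best = None
        for p in range(len(clusters)):
            for q in range(p + 1, len(clusters)):
                d = complete_linkage(clusters[p], clusters[q], cases)
                if best is None or d < best[0]:
                    best = (d, p, q)
        _, p, q = best
        merged = sorted(clusters[p] + clusters[q])
        del clusters[q]  # q > p, so index p is still valid afterwards
        clusters[p] = merged
    return clusters


def profile(cases, clusters):
    """Cluster profile: the mean of each variable within each cluster."""
    out = []
    for members in clusters:
        n_vars = len(cases[members[0]])
        out.append([sum(cases[i][v] for i in members) / len(members)
                    for v in range(n_vars)])
    return out


# Hypothetical toy data: five cases measured on two (already scaled) variables.
cases = [[1.0, 1.0], [1.2, 0.9], [5.0, 5.1], [5.2, 4.9], [9.0, 0.5]]
clusters = agglomerate(cases, k=3)
for label, (members, means) in enumerate(zip(clusters, profile(cases, clusters)), start=1):
    print(f"cluster {label}: cases {members}, profile {means}")
```

In R, hclust() builds the full merge tree and cutree() cuts it into k clusters. The brute-force search above recomputes distances at every merge and is only meant for a handful of cases.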
The objective with the dataset cust.csv is to cluster customers based on the following variables:
o Monthly average spending
o Number of visits to the department store
o Number of apparel purchases
o Number of high-value item purchases
o Number of staple items purchased
Libraries used: NbClust, fpc and cluster.
The structure of the file Cust.csv shows there are 10 observations with 7 variables.
Before running the clustering, bring all the variables to the same scale. Scale only the variables that are input to the clustering, that is, the variables on which you want to cluster. To scale them, we will use R's built-in function scale().
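Scaling matters because Euclidean distance adds squared differences directly: a variable measured in thousands, such as monthly spending, would otherwise swamp one measured in single digits, such as number of visits. Below is a minimal Python sketch of the standardization that R's scale() performs by default (subtract each column's mean, then divide by that column's sample standard deviation); the function name and the customer rows are hypothetical.

```python
import statistics


def scale(rows):
    """Standardize each column: (x - column mean) / column sample standard deviation.

    Intended to mirror R's scale(x) with its defaults (center = TRUE, scale = TRUE),
    which divides the centered columns by their standard deviations (n - 1 denominator).
    """
    columns = list(zip(*rows))
    means = [statistics.mean(col) for col in columns]
    sds = [statistics.stdev(col) for col in columns]  # sample sd, n - 1 denominator
    if any(sd == 0 for sd in sds):
        raise ValueError("a constant column cannot be scaled")
    return [[(x - m) / s for x, m, s in zip(row, means, sds)] for row in rows]


# Hypothetical customers: monthly spending, number of visits.
customers = [[2000, 2], [4000, 4], [6000, 6]]
for row in scale(customers):
    print(row)
# [-1.0, -1.0]
# [0.0, 0.0]
# [1.0, 1.0]
```

After scaling, both columns have mean 0 and standard deviation 1, so each contributes on an equal footing to the distance calculation.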