Jaggia BA 1e Chap011 PPT


Chapter 11
Unsupervised Data Mining

Business Analytics, 1e
By Sanjiv Jaggia, Alison Kelly, Kevin Lertwachara, and Leida Chen

Copyright © 2021 McGraw-Hill Education. All rights reserved. No reproduction or distribution without the prior written consent of McGraw-Hill Education.
Chapter 11 Learning Objectives (LOs)

LO 11.1 Conduct hierarchical cluster analysis.
LO 11.2 Conduct k-means cluster analysis.
LO 11.3 Conduct association rule analysis.

Introductory Case: Nutritional Facts of Candy Bars
• Aliyah is an honors student at a prestigious business school in Southern California. She is also a fledgling entrepreneur and owns a vending machine business. Aliyah is aware that California consumers are becoming increasingly health conscious when it comes to food purchases. Aliyah wants to come up with a better selection of candy bars and strategically group and display them in her vending machines.

• Aliyah wants to use the information to accomplish the following tasks.
1. Analyze the nutritional facts data and group candy products according to their nutritional content.
2. Select a variety of candy bars from each group to better meet the taste of today’s consumers.
3. Display the candy bars in her vending machines according to the resulting groups.
11.1: Hierarchical Cluster Analysis (1/14)
• Unsupervised data mining requires no
knowledge of the target variable.
• The algorithms allow the computer to identify
complex processes and patterns without any
specific guidance from the analyst.
• It is an important part of exploratory data analysis because it makes no distinction between the target variable and the predictor variables.
• Uses similarity measures: Euclidean, Manhattan, Jaccard’s (see the distance sketch after this list).
• We explore two core unsupervised data mining techniques: cluster analysis and association rule analysis.
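As a quick illustration of these measures, here is a minimal R sketch using base R's dist() function on a small made-up data set (all values below are hypothetical):

    # Hypothetical numeric data: three observations, two variables
    x <- data.frame(income = c(50, 60, 90), crime = c(4, 6, 2))

    # Euclidean and Manhattan distances between observations
    dist(x, method = "euclidean")
    dist(x, method = "manhattan")

    # For binary (categorical) data, the "binary" method gives the
    # asymmetric binary (Jaccard-type) dissimilarity
    b <- matrix(c(1, 0, 1,
                  1, 1, 0), nrow = 2, byrow = TRUE)
    dist(b, method = "binary")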
11.1: Hierarchical Cluster Analysis (2/14)
• Cluster analysis is an unsupervised data mining
technique that groups data into categories that
share some similar characteristic or trait.
– Similar within a cluster, dissimilar across clusters
– Uses similarity measures
• Allows useful exploratory analysis by
summarizing a large number of observations in
a data set into a small number of clusters.
• The cluster characteristics or profiles help us
understand and describe the different groups.
• A popular application of cluster analysis is
called customer or market segmentation.
• Two common clustering techniques:
hierarchical clustering and k-means clustering.
11.1: Hierarchical Cluster Analysis (3/14)
• Hierarchical clustering is a technique that
uses an iterative process to group data into a
hierarchy of clusters.
– Agglomerative clustering (AGNES): bottom-up; starts with each observation as its own cluster and iteratively merges the most similar clusters, moving up the hierarchy
– Divisive clustering (DIANA): top-down; starts with a single cluster containing all observations and iteratively splits off the most dissimilar observations, moving down the hierarchy
• We focus on agglomerative clustering, which
is the most commonly used approach.
• The methods can be adapted to implement
divisive clustering.
11.1: Hierarchical Cluster Analysis (4/14)
• With AGNES, each observation in the data initially forms its own cluster.
• The algorithm then successively merges these clusters into
larger clusters based on their similarity until all observations
are merged into one final cluster, referred to as a root.
• Uses (dis)similarity measures.
– Numeric: Euclidean distance or Manhattan distance
– Categorical: matching, Jaccard’s coefficient
• Uses the z-score standardization.
• Linkage methods to evaluate (dis)similarity between clusters.
– Single: nearest distance between a pair of observations not in the same
cluster
– Complete: farthest distance between a pair of observations not in the same
cluster
– Centroid: distance between the center/centroid or mean values of the
clusters
– Average: average distance between all pairs of observations not in the same
cluster
– Ward’s: uses the error sum of squares (ESS), the squared difference between individual observations and the cluster mean; measures the loss of information that occurs when observations are clustered
11.1: Hierarchical Cluster Analysis (5/14)

11.1: Hierarchical Cluster Analysis (6/14)
• Once AGNES completes the clustering process,
data are usually represented in a treelike
structure.
– Called a dendrogram
– Branches are clusters
– An observation is a “leaf”
– Visually inspect the clustering result and determine the
appropriate number of clusters
• The height of each branch (cluster) or sub-branch
(sub-cluster) indicates how dissimilar it is from
the other branches or sub-branches with which it
is merged.
• The greater the height, the more distinctive the
cluster is from the other clusters.
11.1: Hierarchical Cluster Analysis (7/14)

11.1: Hierarchical Cluster Analysis (8/14)
• Relying solely on the height of a dendrogram tree branch
may lead to statistically distinctive clusters that have little
or no practical meaning.
• We often take into account both quantitative measures
(such as a dendrogram) and practical considerations to
determine the number of clusters.
• We should also review the profile of each cluster using
descriptive statistics.
• Another common approach to profiling clusters is to incorporate variables that were not used in clustering but are of interest to the decision maker.
• The ability of a clustering method to discover useful hidden patterns in the data depends on how it is implemented: data transformations, distance measures, algorithm, and linkage method.
• Try several techniques and use the one that makes the most sense.
11.1: Hierarchical Cluster Analysis (9/14)

• Example: Consider the crime rate, median income, and poverty rate for 41 cities.

11.1: Hierarchical Cluster Analysis (10/14)
• With Excel

11.1: Hierarchical Cluster Analysis (11/14)
• With Excel

11.1: Hierarchical Cluster Analysis (12/14)
• With R
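The slide shows a screenshot of the R commands; here is a minimal sketch of the hierarchical clustering steps, assuming the 41-city data are in a data frame named cities with columns CrimeRate, MedianIncome, and PovertyRate (file and column names are hypothetical):

    cities <- read.csv("cities.csv")    # hypothetical file name

    # z-score standardization of the numeric variables
    z <- scale(cities[, c("CrimeRate", "MedianIncome", "PovertyRate")])

    # Dissimilarity matrix (Euclidean; use method = "manhattan" for Manhattan distance)
    d <- dist(z, method = "euclidean")

    # Agglomerative clustering; other linkage choices include "single",
    # "complete", "average", and "centroid"
    hc <- hclust(d, method = "ward.D2")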

11.1: Hierarchical Cluster Analysis (13/14)
• With R
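Continuing the hypothetical sketch from the previous slide, the dendrogram can be drawn to help choose the number of clusters:

    # Plot the dendrogram; branch height reflects dissimilarity at each merge
    plot(hc, hang = -1, main = "Cluster Dendrogram")

    # Outline a candidate solution, e.g., four clusters (illustrative choice)
    rect.hclust(hc, k = 4, border = "red")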

11.1: Hierarchical Cluster Analysis (14/14)
• With R
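A sketch of how cluster memberships might be extracted and profiled with descriptive statistics (the four-cluster cut is only illustrative):

    cities$Cluster <- cutree(hc, k = 4)   # assign each city to a cluster

    # Cluster sizes and profiles (mean of each variable by cluster)
    table(cities$Cluster)
    aggregate(cbind(CrimeRate, MedianIncome, PovertyRate) ~ Cluster,
              data = cities, FUN = mean)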

11.2: k-Means Cluster Analysis (1/6)
• The objective is to divide the sample into a prespecified number k of nonoverlapping clusters so that each of these k clusters is as homogeneous as possible.
• The number of clusters k needs to be specified prior to
performing the analysis.
• We may experiment with different values of k until we obtain a desired result, or use hierarchical clustering methods to help determine the appropriate k (an elbow-curve sketch follows this list).
• In addition, we may have prior knowledge or theories
about the subjects under study and can determine the
appropriate number of clusters based on domain
knowledge.
• The k-means clustering method can only be applied to
data with numerical variables.
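One common way to experiment with k is an elbow curve: plot the total within-cluster dispersion against k and look for the point where the curve flattens. A minimal sketch, assuming the standardized candy-bar nutrition data are in a numeric matrix z (hypothetical name):

    set.seed(1)   # k-means depends on randomly chosen initial centers

    # Total within-cluster sum of squares for k = 1, ..., 10
    wss <- sapply(1:10, function(k) kmeans(z, centers = k, nstart = 25)$tot.withinss)

    plot(1:10, wss, type = "b",
         xlab = "Number of clusters k",
         ylab = "Total within-cluster sum of squares")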

11.2: k-Means Cluster Analysis (2/6)
• The k-means algorithm is based on the choice of the initial cluster centers. The general process is outlined below (a from-scratch R sketch follows this list).
1. Specify the k value
2. Randomly assign k observations as cluster centers
3. Assign each observation to its nearest cluster center
4. Calculate cluster centroids
5. Reassign each observation to a cluster with the nearest
centroid
6. Recalculate the cluster centroids, and repeat step 5
7. Stop when reassigning observations can no longer improve
within-cluster dispersion.
• Dispersion is defined as the sum of Euclidean distances of observations from their respective cluster centers.
• Results from k-means clustering are highly sensitive to the random process for finding the initial cluster centers, as well as to the specific algorithm implemented.
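A from-scratch R sketch of steps 1–7, written only to illustrate the logic; it assumes a standardized numeric matrix z and that no cluster becomes empty (in practice the built-in kmeans() function is used):

    k <- 3                                    # step 1: specify k
    set.seed(1)
    centers <- z[sample(nrow(z), k), ]        # step 2: random observations as centers

    repeat {
      # steps 3 and 5: assign each observation to the nearest center
      d <- as.matrix(dist(rbind(centers, z)))[-(1:k), 1:k]
      cluster <- max.col(-d)                  # column index of the smallest distance

      # steps 4 and 6: recompute the centroid of each cluster
      new_centers <- apply(z, 2, function(col) tapply(col, cluster, mean))

      # step 7: stop when the centroids no longer change
      if (all(abs(new_centers - centers) < 1e-9)) break
      centers <- new_centers
    }

    # Dispersion: sum of Euclidean distances from observations to their centers
    sum(sqrt(rowSums((z - centers[cluster, ])^2)))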
11.2: k-Means Cluster Analysis (3/6)

• Example: Introductory case—group candy bars into meaningful clusters.

11.2: k-Means Cluster Analysis (4/6)
• With Excel

11.2: k-Means Cluster Analysis (5/6)
• With R
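The slide shows an R screenshot; a minimal sketch, assuming the standardized candy-bar nutrition variables are in a numeric matrix z and that four clusters are requested (both choices are illustrative):

    set.seed(1)                                # results depend on the random initial centers
    km <- kmeans(z, centers = 4, nstart = 25)  # nstart tries several random starts

    km$size      # number of candy bars in each cluster
    km$centers   # cluster centroids (in standardized units)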

11.2: k-Means Cluster Analysis (6/6)
• With R
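A sketch of how the k-means clusters might be profiled against the original, unstandardized data, assuming a data frame named candy with the (hypothetical) column names shown:

    candy$Cluster <- km$cluster

    # Average nutritional content by cluster
    aggregate(cbind(Calories, Sugar, Fat, Protein) ~ Cluster,
              data = candy, FUN = mean)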

11.3: Association Rule Analysis (1/9)
• Association rule analysis is essentially a “what goes with
what” study.
– Designed to identify events that tend to occur together
– Also known as affinity analysis or market basket analysis
• Classic application of market basket analysis: retail
companies seek to identify products that consumers tend to
purchase together.
– Display products next to each other on a shelf
– Develop promotional campaigns to cross-sell or up-sell
• Other examples
– Improve sales and customer service
– Help diagnose illnesses based on different symptoms that occur together
• Association rules are If-Then logical statements that
represent relationships among different items or item sets.
– Designed to identify hidden patterns and co-occurring events in data
– The “if” part is the antecedent; the “then” part is the consequent
– Antecedents and consequents can comprise a single product or a combination of products
– A single product or a combination of products is referred to as an item or an item set
11.3: Association Rule Analysis (2/9)
• One inherent problem with searching for hidden
relationships between items or item sets is dealing with the
extremely large number of possible combinations.
• Let n be the number of items. The number of possible rule combinations increases exponentially: 3^n − 2^(n+1) + 1.
– Example: 100 items gives 3^100 − 2^101 + 1 ≈ 5.15378E+47 possible combinations
– The search problem becomes extremely computationally intensive and time-consuming.
• There are several algorithms that can be used to perform
association rule analysis in a more efficient manner. They all
focus on the frequency of item sets.
• One of the most widely used algorithms is called the Apriori
method.
– Designed to recursively generate item sets that exceed a predetermined
frequency threshold: the support of the item or item set.
– Set a minimum support value, below which an item or item set is
excluded, thus making the analysis more computationally feasible.
– Eliminates infrequent items that are below the support value, makes it
easier to analyze relevant information in a large data set.
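A one-line R check of the rule count above for n = 100 items:

    n <- 100
    3^n - 2^(n + 1) + 1    # approximately 5.15378e+47, matching the example above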
11.3: Association Rule Analysis (3/9)
• With enough data, we can propose many of these If-Then
association rules.
– We need a way to evaluate the effectiveness of these rules
– Only the strong associations that occur frequently have the potential to
reappear consistently in the future
• Support: the probability of the If-Then statement, i.e., the probability that the antecedent and the consequent occur together.
– Support = (number of transactions containing both antecedent and consequent) / (total number of transactions)
• Confidence of the association rule: the probability that the antecedent and the consequent occur together, given that the antecedent occurs.
– Confidence = Support({antecedent, consequent}) / Support({antecedent})
• Both of these can be misleading if the antecedent and consequent are common yet unrelated.
• The lift ratio evaluates the strength of the association (a worked R sketch follows this list):
– Expected confidence = Support({consequent}) = (number of transactions containing the consequent) / (total number of transactions)
– Lift = Confidence / Expected confidence; compares the confidence of the association rule with the overall unconditional probability of the consequent
– Lift = 1: level of association is the same as no rule at all (random guessing)
– Lift > 1: strong (positive) association
– Lift between 0 and 1: negative association
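A worked R sketch of these measures for a hypothetical rule {itemA} => {itemB}; the ten transactions below are made up purely for illustration:

    # 1 = item purchased in the transaction (hypothetical data)
    trans <- data.frame(
      itemA = c(1, 1, 0, 1, 0, 1, 0, 1, 0, 0),   # antecedent
      itemB = c(1, 0, 0, 1, 0, 1, 1, 1, 0, 0)    # consequent
    )
    n <- nrow(trans)

    support    <- sum(trans$itemA == 1 & trans$itemB == 1) / n   # P(antecedent and consequent)
    confidence <- support / (sum(trans$itemA == 1) / n)          # P(consequent | antecedent)
    expected   <- sum(trans$itemB == 1) / n                      # unconditional P(consequent)
    lift       <- confidence / expected

    c(support = support, confidence = confidence, lift = lift)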

11.3: Association Rule Analysis (4/9)
• Example: Consider a table of transactions (table not shown).
• For the association rule {mascara} => {eyeliner}, compute the support, confidence, and lift ratio.
11.3: Association Rule Analysis (5/9)

• The lift ratio is greater than 1, indicating a strong association between the purchase of mascara and eyeliner.
• The association is 19% stronger than guessing at random.

11.3: Association Rule Analysis (6/9)
• Example: The store manager at an electronics
store collects data on the last 100 transactions.
Five possible products were purchased: a
keyboard, an SD card, a mouse, a USB drive,
and/or a headphone.

11.3: Association Rule Analysis (7/9)
• With Excel

11.3: Association Rule Analysis (8/9)
• With R
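The slide shows an R screenshot; a minimal sketch using the arules package, assuming the 100 transactions are stored as a 0/1 purchase matrix in a CSV file (the file name and thresholds are illustrative):

    library(arules)

    store <- read.csv("Electronics.csv")                # hypothetical file name
    trans <- as(as.matrix(store) == 1, "transactions")  # convert the 0/1 matrix

    # Apriori with minimum support and confidence thresholds
    rules <- apriori(trans, parameter = list(supp = 0.05, conf = 0.25))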

11.3: Association Rule Analysis (9/9)
• With R
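A sketch of how the resulting rules might be inspected and ranked by lift:

    # Strongest rules first
    inspect(head(sort(rules, by = "lift"), 10))

    # Keep only rules with lift above 1 (stronger than random guessing)
    inspect(subset(rules, lift > 1))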

