On Local Pruning of Association Rules Using Directed Hypergraphs
Sanjay Chawla, Joseph Davis
University of Sydney
Knowledge Management Research Group
Sydney, NSW, Australia 2006
{chawla,jdavis}@it.usyd.edu.au

Gaurav Pandey
IIT Kanpur
CSE Department
IIT, Kanpur, India 208016
[email protected]
Abstract

In this paper we propose an adaptive local pruning method for association rules. Our method exploits the exact mapping between a certain class of association rules, namely those whose consequents are singletons, and backward directed hypergraphs. The hypergraph which represents the association rules is called an Association Rules Network (ARN). We propose two operations on this network for pruning rules, prove several properties of the ARN, and apply our approach to two popular data sets.

Keywords: Association Rules Pruning, Interestingness, Directed Hypergraphs
1 Introduction and Motivation
It is widely recognized that the support-confidence framework for association rule (AR) generation typically results in a large number of rules. Many of these rules turn out to be obvious, uninteresting from the user/client perspective, or redundant. This can hamper the knowledge discovery process.

The data mining research community has addressed this problem by proposing a number of approaches for producing a parsimonious rule set. The approach based on closed sets is applied during itemset generation [21, 17]. Additional constraints based on interestingness measures are introduced during rule generation [12]. Finally, approaches based on clustering of ARs are applied at the post-mining stage [13, 9]. Each of these methods can be described as "global" in the sense that they compress the rule base R into a smaller one R' with the assurance that very little useful information is lost.
However, in practice, the data mining process is rarely carried out in isolation, without any reference to specific user goals or a focus on target items of interest. This calls for an interactive strategy by which the pruning of ARs is carried out in the context of precise goals, as the following example will make clear.

Our goal is to understand the itemsets that are frequently associated with the target item income=below50K. Let us start with the following rule, discovered from the Census Data of Elderly People [12]:

r1: immigrant=no → income=below50K

By itself this rule appears to contradict our general perception, at least in the United States. However, combining this pattern with the rule
r2: sex=female ∧ age < 75 → immigrant=no
provides a context which helps in interpreting the first
rule.
This forms a network which flows into the goal (income=below50K) and provides a better explanation of the goal than when the rule r1 is viewed in isolation.
Now consider a third rule:

r3: immigrant=no → sex=female
Clearly this rule "flows in the opposite direction": it is redundant and does not help explain the goal. Graphically, this rule participates in a cycle with rule r2.
Finally, consider two more rules:

r4: urban=no → income=below50K

r5: urban=no → sex=female

The rule r5, in conjunction with rules r2 and r1, leads to another (redundant) path to the goal. By pruning rule r5, the goal item remains reachable from the item urban=no.
Figure 1 captures the preceding discussion. The directed hypergraph obtained after the removal of edges r3 and r5 is called an Association Rules Network (henceforth referred to as ARN). We introduced ARNs in the context of a specific application in [19]. Here we develop this concept further in its full generality.
Figure 1. A B-graph representing the rules r1-r5. After the removal of rules r3 and r5, this graph is called an Association Rules Network.
Our approach is based on formalising the intuition behind these common-sense observations. It consists of mapping a set of association rules into a directed hypergraph and systematically removing circular paths, redundant edges and backward edges that may obscure the relationship between the target and the other frequent items. This offers the following advantages:
1. The pruning process is adaptive with respect to the
choice of the goal node made by the user.
2. Pruning is reduced to cycle and reverse edge detection
in the hypergraph.
3. The resulting hypergraph can be transformed into a
reasoning network to explain the goal node.
The remainder of this paper is organized as follows. In Section 2 we provide a brief overview of related research, followed by an outline of the proposed approach in Section 3. Section 4 provides background material on association rules and directed hypergraphs. The details of the association rules network (ARN) and the associated algorithms are presented in Sections 5 and 6. We outline the strengths of ARNs in Section 7, followed by experimental results in Section 8.
1.1 Problem Definition
The problem that we are addressing in this paper can be
succinctly stated as follows:
Given: A set of association rules and a target item.
Find: A parsimonious set of rules which explain the frequency of the target item.
Complexity: The set of rules can be large and may contain many redundant rules which obscure the relationship
between the target and other frequent items.
2 Related Work

Association rules mining is considered a cornerstone of data mining research [2, 5]. As mentioned in the introduction, much of the research in association rule mining has focused on algorithms for discovery and, more recently, on pruning [12, 21, 17].
Liu, Hsu and Ma [14] were the first to propose an integration of classification and association rules. The resulting rules are called class association rules (CARs). The key operation in their approach finds all ruleitems of the form ⟨condset, y⟩, where condset is a set of items and y is a class label.
Han et al. [10] proposed a method for integrating association rules and clustering in an undirected hypergraph. The frequent itemsets were modeled as hyperedges, and a min-cut hypergraph partitioning algorithm was used to cluster items.
There has been some theoretical work relating hypergraphs to association rules [20, 8], where the relationship between frequent itemset discovery and the undirected hypergraph transversal problem has been noted.
Directed hypergraphs [7, 6] extend directed graphs and have been used to model many-to-one, one-to-many and many-to-many relationships in theoretical computer science and operations research. Directed hypergraphs have also appeared under different names, including "labeled graphs" and "And-Or graphs".
Association rules can be considered a data-induced rule system. Ramaswamy, Sarkar and Chen [15] model such a rule base as a directed hypergraph in order to detect structural errors like circularity, unreachable goals, dead ends, redundancy and contradiction.
It is well known that the standard measures of support and confidence generate many redundant rules. In general there are three approaches to pruning redundant rules. The first approach exploits the concept of closed sets [21]: a closed itemset is one which has no proper superset with the same support, and the set of all frequent closed itemsets spans the set of all frequent itemsets. The second approach is based on the satisfaction of an additional interestingness measure, beyond support and confidence. For example, Piatetsky-Shapiro and Matheus [16] proposed that a rule A → B is interesting only if P(A, B) deviates from P(A)P(B), i.e., only if A and B are not statistically independent. Several other statistical measures of interestingness have been proposed. The third approach, which will be used to justify the removal of cycles from ARNs, is based on clustering of rules using a suitable distance measure [9]. Theoretical work on distance measures for categorical data has been presented in [18].
3 Outline of Proposed Approach

We briefly describe our method for structuring association rules as a backward directed hypergraph (henceforth referred to as a B-graph), pruning it to generate the ARN, and transforming it for reasoning. The method consists of four steps, which will be expanded in subsequent sections; a small code sketch of Step A follows the list.

Step A Given a database and the minimum support and confidence, we first extract all association rules using a standard algorithm such as Apriori [1] or FP-Growth [11].

Step B Choose a frequent item z which appears as a singleton consequent in the rule set and build a leveled B-graph which recursively flows into the goal z.

Step C Prune the B-graph generated in Step B of hypercycles and reverse hyperedges. The resulting B-graph is called an Association Rules Network (ARN). In Section 6 we give a formal justification for this step.

Step D Find shortest paths between the goal node and the nodes at maximal level in the ARN. The set of these paths represents the reasoning network for the goal node.
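To make Step A concrete, the following is a minimal brute-force Python sketch that mines exactly the rules an ARN needs, namely rules with singleton consequents, from a small list of transactions. The function name and rule representation are ours, for illustration only; a real system would call a standard miner such as Apriori [1] or FP-Growth [11]. Steps B and C are sketched in code in Section 5.

from itertools import combinations

# Toy version of Step A: brute-force mining of singleton-consequent rules.
# Each rule is returned as (antecedent frozenset, consequent item, confidence).
def mine_singleton_rules(transactions, min_sup, min_conf):
    items = sorted(set().union(*transactions))
    n = len(transactions)
    def sup(s):
        return sum(1 for t in transactions if s <= t) / n
    rules = []
    for k in range(1, len(items)):
        for ante in combinations(items, k):
            A = frozenset(ante)
            if sup(A) < min_sup:
                continue  # an infrequent antecedent cannot yield a rule
            for c in items:
                if c in A:
                    continue
                s_ac = sup(A | {c})
                if s_ac >= min_sup and s_ac / sup(A) >= min_conf:
                    rules.append((A, c, s_ac / sup(A)))
    return rules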
4 Preliminaries
In this section we provide background material on association rules and directed hypergraphs [7].
4.1 Association Rules

Association rules are generally described in the framework of market basket analysis. Given a set of items I and a set of transactions T consisting of subsets t ⊆ I, an association rule is a relationship of the form X →(s,c) Y, where X and Y are subsets of I and s and c are the minimum support and confidence of the rule. X is called the antecedent and Y the consequent of the rule. The support s(X) of a subset X of I is defined as the percentage of transactions t ∈ T which contain X, and the confidence of a rule X → Y is s(X ∪ Y)/s(X).

Most algorithms for association rule discovery take advantage of the anti-monotonicity property exhibited by the support level: if X ⊆ Y then s(X) ≥ s(Y).

Our focus is to discover association rules in a more structured and dense relational table. For example, suppose we are given a relation R(A_1, A_2, ..., A_n) where the domain of each attribute A_i, dom(A_i) = {a_i1, ..., a_ik}, is discrete-valued. Then an item is an attribute-value pair {A_i = a}. The ARN will be constructed using rules of the form

{A_j1 = a_j1, ..., A_jm = a_jm} → {A_q = a_q}, where q ∉ {j1, ..., jm}.
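As a concrete illustration of these definitions, the following minimal Python sketch computes support and confidence from a list of transactions; the function names and the toy data, loosely modeled on the introduction's example, are ours:

# Support of an itemset: fraction of transactions containing all its items.
def support(itemset, transactions):
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

# Confidence of X -> Y: s(X u Y) / s(X).
def confidence(X, Y, transactions):
    return support(X | Y, transactions) / support(X, transactions)

transactions = [frozenset(t) for t in (
    {"immigrant=no", "income=below50K"},
    {"immigrant=no", "income=below50K", "urban=no"},
    {"immigrant=no", "sex=female"},
    {"urban=no", "sex=female"},
)]
X = frozenset({"immigrant=no"})
Y = frozenset({"income=below50K"})
print(support(X, transactions))        # 0.75
print(confidence(X, Y, transactions))  # 0.666...

Note that the anti-monotonicity property is immediate from this definition: enlarging an itemset can only shrink the set of transactions that contain it.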
4.2 Directed Hypergraphs

A hypergraph is a pair H = (V, E) where V = {v_1, v_2, ..., v_n} is a set of nodes and E ⊆ 2^V is the set of hyperedges. Thus each hyperedge e can potentially span more than two nodes. Contrast this with a directed graph, where the edge set E ⊆ V × V.

In the context of association rules, each node corresponds to a frequent item and frequent itemsets are mapped to hyperedges.

In a directed hypergraph, the nodes spanned by a hyperedge e are partitioned into two parts, the head and the tail, denoted by H(e) and T(e) respectively. A hyperedge e is called backward if |H(e)| = 1. Similarly, an edge is called forward if |T(e)| = 1. A directed hypergraph is called a B-directed hypergraph if all its hyperedges are backward. In the rest of the paper we will refer to these as B-graphs. Thus the set of association rules whose consequents are singletons maps neatly into a B-graph: each rule is represented by a hyperedge e, its antecedent by T(e) and its consequent by H(e).

We will also consider the antecedent of a rule as a single entity. For this we define the notion of a hypernode. Given a B-graph G with hyperedges {e1, ..., em}, the hypernodes induced by the hyperedge ei are the tail T(ei) and the head H(ei), each considered as a single entity. The set of all hypernodes is denoted by N.
Figure 2. An example B-graph with all the major features
Example: As can be seen in Figure 2, V = {A, B, C, D, E, F, G}, E = {e1, ..., e6} and N = {{A}, {B}, {C}, {D}, {C, D}, {E}, {E, F}, {G}}. Thus F is a node but not a hypernode.
We now define a hyperpath and a hypercycle for a B-graph. A hyperpath is a sequence P = ⟨h1, e1, h2, e2, ..., e(n-1), hn⟩ where each hi is a hypernode and each ei is a hyperedge; for 1 ≤ i ≤ n - 1, hi = T(ei) and the head H(ei) is contained in h(i+1). A hyperpath is a hypercycle if the final hypernode hn is contained in the initial hypernode h1.
Again, as can be seen in Figure 2, P = ⟨{A}, ..., {C, D}, ..., {G}⟩ is a hyperpath into G, and C = ⟨{B}, e1, {C, D}, ..., {B}⟩ is a hypercycle. Finally, we define the size of a hyperpath |P| as the total number of hypernodes appearing on P. Continuing with the previous example, |P| = 4 and |C| = 3.
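Such a B-graph has a direct computational representation. The sketch below (our own naming) stores each backward hyperedge as a (tail, head) pair and derives the induced hypernodes, using the five rules of Figure 1 as data:

# A B-graph as a list of backward hyperedges (tail itemset, singleton head).
edges = [
    (frozenset({"immigrant=no"}), "income=below50K"),       # r1
    (frozenset({"sex=female", "age<75"}), "immigrant=no"),  # r2
    (frozenset({"immigrant=no"}), "sex=female"),            # r3
    (frozenset({"urban=no"}), "income=below50K"),           # r4
    (frozenset({"urban=no"}), "sex=female"),                # r5
]

# Hypernodes: each tail T(e) and each head H(e), treated as single entities.
def hypernodes(edges):
    nodes = set()
    for tail, head in edges:
        nodes.add(tail)
        nodes.add(frozenset({head}))
    return nodes

print(len(hypernodes(edges)))  # 5 distinct hypernodes for the Figure 1 rules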
5 Association Rules Network
In this section we will formally define an Association Rules Network (ARN), present an algorithm to generate it from a set of association rules, and prove some important properties of the ARN.
Figure 3. An example ARN. The hyperedge e6 is not part of the ARN because it is a reverse hyperedge and also participates in a hypercycle.
5.1 Definition and Related Concepts

Here we give the definition of an ARN in terms of B-graphs.
Definition 1 Given a set of association rules R and a frequent goal item z which appears as a singleton in the consequent of some rule r ∈ R, an association rules network, ARN(R, z), is a weighted B-graph such that

1. There is a hyperedge which corresponds to a rule whose consequent is the singleton item z.

2. Each hyperedge in ARN(R, z) corresponds to a rule in R whose consequent is a singleton. The weight on the hyperedge is the confidence of the rule.

3. Any node p ≠ z in the ARN is not reachable from z.

We now define the notion of a level, which will be used to exclude hypercycles from the ARN while retaining reachability to the goal node.

Definition 2
A. The level of the goal node is zero.
B. The level of a non-goal node v is defined as
level(v) = min{ level(h) | there exists e ∈ E such that v ∈ T(e) and h = H(e) } + 1.
C. The level of a hypernode n is defined as
level(n) = min{ level(u) | u ∈ n }.

Example: For the ARN in Figure 3, the goal node G has level(G) = 0; the levels of the remaining nodes, and of the hypernodes {C, D} and {E, F}, follow from Definition 2.

Definition 3 A hyperedge e in an ARN is called a reverse hyperedge if level(T(e)) < level(H(e)).

Example: In Figure 3, e6 is a reverse hyperedge.

Note: We now introduce two conditions which will prevent redundancies (hypercycles and reverse hyperedges) from appearing between hypernodes at different levels while preserving reachability to the goal node in an ARN.

Condition 1 A node which has served as a consequent during ARN generation cannot be an antecedent, across levels, for a rule whose consequent is at a higher level.

Condition 2 The goal attribute can appear with only one value in the ARN, namely the one in the goal node.
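As a worked example of these definitions, consider the rules of Figure 1 with goal item income=below50K. Then level(income=below50K) = 0; immigrant=no and urban=no lie in the tails of rules flowing directly into the goal, so both have level 1; and sex=female and age < 75 lie in the tail of r2, whose head has level 1, so both have level 2. Consequently the edges r3 and r5 have tails at level 1 and heads at level 2, i.e., level(T(e)) < level(H(e)), so both are reverse hyperedges. This is exactly why they are pruned in Figure 1.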
5.2 ARN Construction Algorithm
We now describe a breadth-first-like algorithm to construct an ARN from a rule set and a goal node z.

Algorithm ARNConst (shown below as Algorithm 1) takes as input the rule set R and a goal node z which appears as a singleton consequent in R. The function rules.getRules(R, p, z) returns all rules in R whose consequent is p and whose antecedent does not contain the goal z. For each of these rules, rules.getAntecedents(r) returns the set of all antecedents. The level of each of these antecedent elements is determined on the basis of Condition 1 and Definition 2. For all the rules satisfying Condition 1, a hyperedge is added to the ARN. Finally, hypercycles between hypernodes which are on the same level are removed using H.removeLevelCycles(). The algorithm returns the generated ARN.
Data: a rule set R and a goal consequent z
Result: a directed hypergraph H representing an AR network which flows into z

visited[i] := false for every item i;
level[z] := 0; visited[z] := true;
add z to queue Q;
while Q is not empty do
    p := head of Q;
    relRules := rules.getRules(R, p, z);
        /* all rules whose consequent is p and whose antecedent does not contain z */
    foreach rule r in relRules do
        A := rules.getAntecedents(r);
        level[A] := min{ level[a] : a in A and visited[a] }
            (taken as infinity if no element of A has been visited);
        if level[A] >= level[p] then
            /* Condition 1 is satisfied */
            foreach element a in A do
                if visited[a] = false then
                    visited[a] := true;
                    level[a] := level[p] + 1;
                    add a to Q;
                end
            end
            H.addHyperedge(A, p); /* directed hyperedge flowing into p */
        end
    end
    delete p from Q;
end
if H is a singleton then
    return the empty hypergraph;
else
    H.removeLevelCycles(); /* remove same-level hypercycles based on confidence */
    return H;
end

Algorithm 1: Association Rules Network (ARN) construction from a rule set R, flowing into consequent z, using a breadth-first-like strategy.
5.3 Results about ARN Construction Algorithm

Theorem 1 The time complexity of ARN generation is O(nr), where n is the number of frequent items and r is the number of rules whose consequents are singletons.
Proof: For each node in the ARN, the hyperedges flowing into it are found by searching the set of all rules whose consequents are singletons. Furthermore, the number of nodes appearing in the ARN is bounded by the number of frequent items. Hence the complexity is O(nr). Note also that r is bounded by the sum over i from 2 to l of i·n_i, where l is the length of the longest frequent itemset and n_i is the number of frequent itemsets of size i, since a frequent itemset of size i yields at most i rules with a singleton consequent.
Theorem 2 The ARN generated is unique, i.e., it does not depend upon the sequence in which the nodes are explored.

Proof: Let u and v be two nodes at the same level. We want to show that by exploring u and v in different orders we do not lose an edge whose consequent is one of them and whose antecedent contains the other.

WLOG assume u is explored before v. Let {e1, ..., ek} be the set of edges for which u is the consequent and v is one of the antecedents. Then, by the definition of level, level(T(ei)) = level(v) for all 1 ≤ i ≤ k. Note that there cannot be any edge whose consequent is u and any of whose antecedents has a level less than that of u (which is the same as the level of v).

Now, once we explore v, all the edges whose consequent is v and whose antecedents contain u are free to flow into v because of Condition 1. Thus we have not lost any hyperedges between u and v, since they are at the same level. A similar argument holds when v is explored before u; the set of hyperedges and hypernodes remains the same in both cases.

Hence the same ARN will be generated irrespective of the order in which the nodes at the same level are explored.
Theorem 3 The goal node is reachable from any node in the ARN.

Proof: By induction on the level. By definition the level of the goal node is zero, and it is trivially reachable from itself. Assume that the goal node is reachable from any node at level i. For each node u at level i + 1 there exists at least one hyperedge e such that level(H(e)) = i and u ∈ T(e). Thus the result holds for all nodes at level i + 1, and hence for all levels up to max{ level(u) : u ∈ V }.
Theorem 4 The ARN generated by the algorithm is free of cycles across levels.

Proof: By contradiction. Let C = ⟨h1, e1, ..., e(n-1), hn⟩ be a hypercycle across levels, i.e., hn is contained in h1 and not all hypernodes on C are at the same level. By the construction of the ARN, every hyperedge satisfies level(T(e)) ≥ level(H(e)), so the levels are non-increasing along C; since C crosses levels, at least one inequality is strict, whence level(hn) < level(h1). But hn is a singleton contained in h1, so level(hn) ≥ level(h1). Therefore some hyperedge on C must run from an antecedent at a lower level to a consequent at a higher level, which contradicts Condition 1.
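The following compact Python sketch mirrors Algorithm 1, under the simplifying assumptions that rules arrive as (antecedent frozenset, consequent item, confidence) triples with singleton consequents and that same-level hypercycle removal is left as a post-processing step; the names are illustrative rather than a published API.

from collections import deque
from math import inf

def arn_const(rules, goal):
    level = {goal: 0}            # a node is "visited" once it has a level
    hyperedges = []              # entries are (tail, head, confidence)
    queue = deque([goal])
    while queue:
        p = queue.popleft()
        for tail, head, conf in rules:
            if head != p or goal in tail:
                continue         # wrong consequent, or goal in the antecedent
            # Condition 1: no already-leveled antecedent may lie below p
            if min((level[a] for a in tail if a in level), default=inf) < level[p]:
                continue
            for a in tail:       # assign levels breadth-first
                if a not in level:
                    level[a] = level[p] + 1
                    queue.append(a)
            hyperedges.append((tail, p, conf))
    # Same-level hypercycles would be broken here by dropping, from each
    # cycle, the hyperedge with the lower confidence (Section 6, CASE 1).
    return hyperedges, level

On the five rules of Figure 1 with goal income=below50K, this sketch keeps r1, r2 and r4 and rejects r3 and r5, reproducing the ARN of Figure 1.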
6 Cycle removal and rule pruning
In this section we provide a justification for the removal of hypercycles and reverse hyperedges. We will show that in either case the information provided by the pruned hyperedge, or the interestingness of the corresponding rule, is small given the rest of the ARN.

Figure 4. An example which distinguishes reverse hyperedges (e2) and hypercycles (e7).

Consider the example in Figure 4. Given the goal node A, during ARN construction the two hyperedges e2 and e7 are removed. The removal of hyperedge e7 is more "crucial" than that of e2 because e7 is part of a hypercycle; otherwise A would never be reachable from F, violating the definition of the ARN. On the other hand, the removal of the reverse hyperedge e2 is justified because it is part of a redundant path from B to A. Hence removing e2 does not destroy the reachability condition of the ARN. This illustrates the difference between the two kinds of pruning operations. We provide separate arguments for each of the two operations.
6.1 Removal of Hypercycles

We first note that a hypercycle in a B-graph cannot be of size less than three (see Section 4.2). We will provide the desired justification for two cases, one in which the size of the hypercycle is three and one in which it is four. From transitivity, it follows that the removal of any hypercycle of size bigger than four can be justified using the arguments presented for the second case. We now consider the two cases.

CASE 1: Consider two rules of the form A → B and B → A. If A and B are at the same level then, WLOG, remove the hyperedge with the lower confidence to break the hypercycle. Otherwise, WLOG, assume level(A) > level(B). Since we are exclusively dealing with rules whose antecedents are a conjunction of items and whose consequent is a singleton item, for the vast majority of these rules s(A) < s(B). In such a scenario

conf(A → B) = s(A ∪ B)/s(A) > s(A ∪ B)/s(B) = conf(B → A).

This justifies the removal of the hyperedge representing B → A.

CASE 2: Consider three rules of the form A → B, B → C and C → A. Our argument is based on the concept of information gain. Assume we have an information channel S which is a sequence of transactions. Given that the pairs of itemsets (A, B) and (B, C) are frequent, i.e., they appear close to each other with high probability, the information that (A, C) is frequent is not surprising. We formalize this argument as follows.

Definition 4 Let F be the set of all itemsets. Let d : F × F → [0, 1] be defined as

d(A, B) = 1 − s(A ∪ B) / (s(A) + s(B) − s(A ∪ B))

where A, B ∈ F and s(A) is the support of the itemset A.

Theorem 5 The function d is a metric on the space F.

Proof: The function d is identical to the distance measure defined in [18]. The proof follows directly from the proof of Lemma 3.2 in [18].

Corollary 1 For a small ε > 0, the information gain from the observation that the pair (A, C) are close to each other is small, given that d(A, B) < ε and d(B, C) < ε.

Proof: Follows directly from the triangle inequality of d.

Now, WLOG, assume that level(A) > level(C). The fact that d(A, C) is small, given that d(A, B) and d(B, C) are small, indicates that the rules A → C and C → A can be derived from the rules A → B and B → C, which are already in the ARN. Thus C → A can be safely pruned without violating the reachability constraint of the ARN.
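A minimal sketch of the pseudometric of Definition 4, reusing the support function from Section 4.1 (the names are ours): it is the Jaccard distance between the sets of transactions covered by the two itemsets, and the triangle inequality of Theorem 5 gives d(A, C) ≤ d(A, B) + d(B, C) < 2ε, which is the quantitative content of Corollary 1.

# d(A, B) = 1 - s(A u B) / (s(A) + s(B) - s(A u B)), as in Definition 4.
def d(A, B, transactions):
    sa = support(A, transactions)
    sb = support(B, transactions)
    sab = support(A | B, transactions)
    return 1.0 - sab / (sa + sb - sab)

A = frozenset({"immigrant=no"})
B = frozenset({"income=below50K"})
print(round(d(A, B, transactions), 3))  # 0.333 on the toy data of Section 4.1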
6.2 Removal of Reverse Hyperedges

The removal of reverse hyperedges in an ARN can be justified on the basis of the fact that they generate redundant paths from a node to the goal. This can be formally proved as follows.

Theorem 6 Let a be a node in the ARN from which a reverse hyperedge e originates. Then there exists a path P from a to the goal node such that e is not on P, and the size of P is smaller than the size of any path from a to the goal node in which e participates.

Proof: Let level(a) = l1 and level(H(e)) = l2. Since e is a reverse hyperedge, l1 < l2. We observe that the path of smallest size from any node at level l in the ARN to the goal node is of size l + 1 (this follows from the definition of level). Let this path of smallest size from a be P1, and the one from H(e) be P2. Clearly, the size of P1 is l1 + 1, while that of the new path formed by e followed by P2 is l2 + 1 > l1 + 1. Thus P1 is the required path.

This theorem establishes the redundancy of the rule represented by a reverse hyperedge and hence justifies its removal.
7 Benefits of ARN
In the introduction we raised the question of the utility of association rules beyond simple exploratory analysis. ARNs are also a tool for exploratory analysis, but they provide a context for understanding and relating the discovered rules to each other. An ARN offers the following benefits.
Local Pruning ARNs provide a graphical method to prune
rules by associating redundant rules with hypercycles
and reverse hyperedges. Furthermore the pruning takes
place in the context of a goal node. Thus a rule which
is redundant for a particular goal node may become relevant for another goal. This is more flexible than pruning based on statistical measures of interestingness.
Figure 5. ARN with goal node G. The edge e6 is not part of the ARN because it participates in a hypercycle.

Figure 6. ARN for a different goal node. The edge e6 is now part of the ARN. This illustrates the adaptive nature of local pruning.
Consider the B-graphs in Figure 5 and Figure 6. When the goal node is G, the edge e6 between E and B represents a redundant rule which may be eliminated. On the other hand, for the goal node of Figure 6, the same edge is relevant. Thus the pruning of a rule, under our notion, becomes dependent upon the context of the goal node. We refer to this kind of pruning as local pruning, and it may be more flexible than global pruning based simply on measures of interestingness.
Reasoning using Path Traversal An ARN is a weighted, hypercycle-free B-graph. Hyperpaths which lead to the goal node can be interpreted as providing an explanation for the goal node.

Formally, let N_max be the set of all maximum-level nodes in the ARN. For each v ∈ N_max, let P_v be the set of all hyperpaths from v to the goal node z. We can define two cost measures on each hyperpath p:

1. dist(p) = − sum over hyperedges e_i on p of log(conf(e_i))

2. ent(p) = − sum over hyperedges e_i on p of conf(e_i) log(conf(e_i))

where conf(e_i) is the confidence of the rule represented by e_i.

The reason for introducing two cost functions is that they provide different kinds of information depending upon the context. For example, dist(p) can be interpreted as measuring the strength of the correlation between the source and the goal node, since minimizing it maximizes the product of the confidences along the path. Similarly, ent(p) can be interpreted as the total information gain along the path from the source to the goal node.

Now the optimal path in P_v under dist(p) or ent(p) is the best explanation for the dependence of the goal node on v. Computing these optimal hyperpaths for all v ∈ N_max provides a reasoning network for the goal node z. The problem of optimal hyperpaths in B-graphs has been studied in [3], where an algorithm of time complexity O(|H| + n log n) is reported; here |H|, the size of the hypergraph, is the sum over all hyperedges e_i of (|T(e_i)| + 1).
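Both cost measures reduce to simple sums over the edge confidences along a hyperpath, so they can be computed in one pass. A minimal sketch (the function names are ours, and the confidences are illustrative):

from math import log

# dist(p) = -sum log conf(e_i): low when every rule on the path is strong.
def dist(confidences):
    return -sum(log(c) for c in confidences)

# ent(p) = -sum conf(e_i) * log conf(e_i): total information gain on the path.
def ent(confidences):
    return -sum(c * log(c) for c in confidences)

path = [0.9, 0.8, 0.75]      # confidences of three hyperedges on a path
print(round(dist(path), 3))  # 0.616
print(round(ent(path), 3))   # 0.489

Because dist is additive over edges, minimizing it is equivalent to maximizing the product of the confidences along the path, so shortest-hyperpath algorithms such as the one in [3] apply directly.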
8 Applications
We now show the advantage of our approach by constructing ARNs for two well known data sets. The results,
as we will describe below, vindicate our original thesis that
a network of association rules provides a context for interpreting the rules in a more coherent fashion than if they
were viewed in isolation. The two data sets that we considered were the Lens and the Mushroom data bases which
are part of the UCI machine learning repository [4].
8.1 Lens Database
The Lens database has twenty-four rows, where each row has five attributes which are either binary or ternary-valued. Our goal attribute was contact-lenses, which denotes whether a person should be fitted with one of two kinds of lenses, hard or soft, or neither.
Figure 7. ARN with goal node contact-lenses=none. (Items in this ARN include astigmatism=yes, astigmatism=no, spectacle prescription=hypermetrope and tear-prod-rate=reduced.)

Figure 8. ARN with goal node contact-lenses=soft. (Items include astigmatism=no, tear-prod-rate=normal and age=pre-presbyopic.)

Figure 9. ARN with goal node contact-lenses=hard. (Items include spectacle prescription=myope, astigmatic=yes, tear-prod-rate=normal and age=young.)

We make the following observations based on the three ARNs in Figures 7, 8 and 9.

1. As we had mentioned in the introduction, an ARN provides a context for interpreting the rules. A change in the value of the goal attribute is correlated with changes in several antecedents. The ARNs clearly capture these simultaneous changes.

2. Also notice that the structure of the ARN does not change drastically when the value of the goal attribute changes. This suggests that the actual network is at the variable (type) level rather than at the instance level.

3. We finally note that when the goal value is contact-lenses=none, spectacle prescription=hypermetrope and tear-production-rate=reduced participate, with different partners, in two distinct three-itemsets. This suggests a strong correlation between them and the goal value.
The observations made are consistent with the expected
benefits of using an ARN. This justifies the applicability of
an ARN for synthesizing association rules.
8.2 Mushroom Database
We also constructed ARNs for the Mushroom database. This data set has approximately eight thousand instances and twenty-three attributes. We chose the binary attribute class = {edible, poisonous} as our goal attribute. The ARNs generated are shown in Figures 10 and 11 for the classes poisonous and edible respectively. To save space, we have abbreviated the attribute values by their first letters; for example, p = poisonous and e = edible.
Figure 10. The ARN when the goal attribute is class=poisonous. (Items include veil-type=p, bruises?=t, ring-type=p, stalk-surface-above-ring=s, stalk-surface-below-ring=s, veil-color=w, gill-spacing=c, ring-number=o, gill-attachment=f and population=v.)

Figure 11. The ARN when the goal attribute is class=edible. (Items include ring-number=o, ring-type=p, gill-spacing=c, veil-color=w, gill-attachment=f, gill-size=b, veil-type=p and stalk-surface-above-ring=s.)
The observations that we made for the Lens database are applicable here as well. However, since this database is bigger, we were able to observe more interesting patterns, which we now discuss.

1. The sets of contributing attributes for the two goal values are mostly the same, but some attributes contribute to one and not to the other. In particular, gill-size is present in the ARN for class=edible but not in the one for class=poisonous. Similarly, bruises?, population and stalk-surface-above-ring appear in the ARN for class=poisonous but not in the one for class=edible.
2. Notice how the item bruises?=true appears as a level-two item in the ARN for class=poisonous. This clearly highlights the advantage of an ARN: it reveals that even though bruises?=true does not appear as an antecedent of any rule whose consequent is class=poisonous, it seems to have an important influence in determining the value of the class attribute. In other words, this shows the transitivity in association rules which ARNs are able to capture.

3. The level of certain items changes depending upon the value of the goal attribute. For example, ring-number=one appears as a second-level node in the ARN for class=edible but as a first-level node in that for class=poisonous. This change in the level of the item reflects the change in its correlation with the value of the goal attribute.

4. Also notice that there are several paths from the higher-level nodes to the goal node in both ARNs. Using the measures described in Section 7, we can choose the optimal paths between the maximal-level nodes and the goal node. This creates the reasoning network and sparsifies the graph.

5. Finally, these ARNs reveal the utility of local pruning, as the resulting rules are semantically meaningful. Thus the two operations, hypercycle removal and reverse hyperedge removal, result in a meaningful network.
The observations that we have highlighted above illustrate the benefits of an ARN.
9 Conclusions and Future Work

Association Rules Network provides a mechanism for synthesizing association rules in a structured manner. The important features of an ARN are (1) the ability to prune rules in the context of a goal, (2) a pruning mechanism based on simple graph operations, and (3) the ability to serve as a basis for reasoning with discovered rules.

For future work we would like to convert our intuition about reasoning using ARNs into a more theoretical framework. We would also like to design a layout algorithm specifically for ARNs. We are also working on a cycle-detection algorithm for general B-graphs which can be applied to ARNs.
References
[1] Rakesh Agrawal, Tomasz Imielinski, and Arun N. Swami. Mining association rules between sets of items in large databases. In Proceedings of the 1993 ACM SIGMOD International Conference on Management of Data, pages 207–216, Washington, D.C., 1993.
[2] Rakesh Agrawal and Ramakrishnan Srikant. Fast algorithms for mining association rules. In Proceedings of the 20th International Conference on Very Large Data Bases (VLDB), pages 487–499. Morgan Kaufmann, 1994.
[3] Giorgio Ausiello, Giuseppe F. Italiano, and Umberto
Nanni. Hypergraph traversal revisited: Cost measures
and dynamic algorithms. Lecture Notes in Computer
Science, 1450, 1998.
[4] C.L. Blake and C.J. Merz. UCI repository of machine
learning databases, 1998.
[5] L. Feng, J. Yu, H. Lu, and J. Han. A template model
for multi-dimensional, inter-transactional association
rules. VLDB Journal, 11(2):153–175, 2002.
[6] Giorgio Ausiello, Giuseppe F. Italiano, and Umberto Nanni. Dynamic maintenance of directed hypergraphs. Theoretical Computer Science, 72(2-3):97–117, 1990.
[7] Giorgio Gallo, Giustino Longo, and Stefano Pallottino. Directed hypergraphs and applications. Discrete
Applied Mathematics, 42(2):177–201, 1993.
[8] Dimitrios Gunopulos, Heikki Mannila, Roni Khardon,
and Hannu Toivonen. Data mining, hypergraph
transversals, and machine learning (extended abstract). In Proc. PODS 1997, pages 209–216, 1997.
[9] G. Gupta, A. Strehl, and J. Ghosh. Distance based clustering of association rules. In Intelligent Engineering Systems Through Artificial Neural Networks (Proceedings of ANNIE 1999), volume 9, pages 759–764. ASME Press, November 1999.
[10] Eui-Hong Han, George Karypis, Vipin Kumar, and Bamshad Mobasher. Clustering based on association rule hypergraphs. In Proceedings of the SIGMOD Workshop on Research Issues on Data Mining and Knowledge Discovery (DMKD '97), 1997.
[11] Jiawei Han, Jian Pei, and Yiwen Yin. Mining frequent patterns without candidate generation. In Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data, pages 1–12. ACM Press, 2000.
[12] S. Jaroszewicz and D. A. Simovici. Pruning redundant
association rules using maximum entropy principle. In
Advances in Knowledge Discovery and Data Mining,
6th Pacific-Asia Conference, PAKDD’02, pages 135–
147, Taipei, Taiwan, May 2002.
[13] Brian Lent, Arun N. Swami, and Jennifer Widom.
Clustering association rules. In ICDE, pages 220–231,
1997.
[14] Bing Liu, Wynne Hsu, and Yiming Ma. Integrating
classification and association rule mining. In Knowledge Discovery and Data Mining, pages 80–86, 1998.
[15] M. Ramaswamy, S. Sarkar, and Y. Chen. Using directed hypergraphs to verify rule-based expert systems. IEEE Transactions on Knowledge and Data Engineering, 9(2):221–237, 1997.
[16] G. Piatetsky-Shapiro and C. Matheus. The interestingness of deviations, 1994.
[17] V. Pudi and J.R. Haritsa. Reducing rule covers with
deterministic error bounds. In Proceedings of the 7th
Pacific-Asia Conference on Advances in Knowledge
Discovery and Data Mining, pages 313–324. Springer,
2003.
[18] M.D. Rice and M. Siff. Clusters, concepts, and pseudometrics. In Electronic Notes in Theoretical Computer Science, volume 40. Elsevier, 2002.
[19] S. Chawla, B. Arunasalam, and J. Davis. Mining open source software (OSS) data using association rules network. In Advances in Knowledge Discovery and Data Mining, 7th Pacific-Asia Conference, PAKDD'03, pages 461–466. Springer, 2003.
[20] M. Zaki and M. Ogihara. Theoretical foundations of association rules. In Proceedings of the 3rd SIGMOD'98 Workshop on Research Issues in Data Mining and Knowledge Discovery (DMKD'98), Seattle, Washington, USA, June 1998.
[21] Mohammed J. Zaki. Generating non-redundant association rules. In Proceedings of the sixth ACM
SIGKDD international conference on Knowledge discovery and data mining, pages 34–43. ACM Press,
2000.