
On local pruning of association rules using directed hypergraphs

2004, Proceedings. 20th International Conference on Data Engineering


On Local Pruning of Association Rules Using Directed Hypergraphs

Sanjay Chawla, Joseph Davis
University of Sydney, Knowledge Management Research Group, Sydney, NSW, Australia 2006
{chawla, jdavis}@it.usyd.edu.au

Gaurav Pandey
IIT Kanpur, CSE Department, Kanpur, India 208016
[email protected]

Abstract

In this paper we propose an adaptive local pruning method for association rules. Our method exploits the exact mapping between a certain class of association rules, namely those whose consequents are singletons, and backward directed hypergraphs. The hypergraph which represents the association rules is called an Association Rules Network (ARN). We propose two operations on this network for pruning rules, prove several properties of the ARN, and apply the results of our approach to two popular data sets.

Keywords: Association Rules Pruning, Interestingness, Directed Hypergraphs

1 Introduction and Motivation

It is widely recognized that the support-confidence framework for association rule (AR) generation typically results in a large number of rules. Many of these rules turn out to be too obvious, uninteresting from the user/client perspective, or redundant. This can often hamper the knowledge discovery process. The data mining research community has addressed this problem by proposing a number of approaches for producing a parsimonious rule set. The approach based on closed sets is applied during itemset generation [21, 17]. Additional constraints based on interestingness measures are introduced during rule generation [12]. Finally, approaches based on clustering of ARs are applied at the post-mining stage [13, 9]. Each of these methods can be described as "global" in the sense that they compress the rule base R into a smaller one R' with the assurance that very little useful information is lost.

However, in practice, the data mining process is rarely carried out in isolation, without any reference to specific user goals or a focus on target items of interest. This calls for an interactive strategy by which the pruning of ARs is carried out in the context of precise goals, as the following example makes clear. Suppose our goal is to understand the itemsets that are frequently associated with the target item income=below50K. Let us start with the following rule, discovered from the Census Data of Elderly People [12]:

r1: immigrant=no -> income=below50K

By itself this rule appears to contradict our general perception, at least in the United States. However, combining this pattern with the rule

r2: sex=female AND age<75 -> immigrant=no

provides a context which helps in interpreting the first rule. This forms a network which flows into the goal (income=below50K) and provides a better explanation of the goal than when the rule r1 is viewed in isolation. Now consider a third rule:

r3: immigrant=no -> sex=female

Clearly this rule "flows in the opposite direction", is redundant, and does not help explain the goal. Graphically, this rule participates in a cycle with rule r2. Finally, consider two more rules:

r4: urban=no -> income=below50K
r5: urban=no -> sex=female

The rule r5, in conjunction with rules r2 and r1, leads to another (redundant) path to the goal. By pruning rule r5, the goal item remains reachable from the item urban=no. Figure 1 captures the preceding discussion. The directed hypergraph obtained after the removal of edges r3 and r5 is called an Association Rules Network (henceforth referred to as an ARN). We introduced ARNs in the context of a specific application in [19]. Here we develop this concept further in its full generality.
[Figure 1. A B-graph representing the rules r1-r5: age<75 and sex=female feed rule r2 into immigrant=no; immigrant=no flows into income=below50K via r1; urban=no participates via r4 and r5. After the removal of rules r3 and r5, this graph is called an Association Rules Network.]

Our approach is based on formalising the intuition behind these common-sense observations. It consists of mapping a set of association rules into a directed hypergraph and systematically removing circular paths and redundant backward edges that may obscure the relationship between the target and other frequent items. This offers the following advantages:

1. The pruning process is adaptive with respect to the choice of the goal node made by the user.
2. Pruning is reduced to cycle and reverse-edge detection in the hypergraph.
3. The resulting hypergraph can be transformed into a reasoning network to explain the goal node.

The remainder of this paper is organized as follows. In Section 2 we provide a brief overview of related research, followed by an outline of the proposed approach in Section 3. Section 4 provides background material on directed hypergraphs, and the details of the association rules network (ARN) and the associated algorithms are presented in Sections 5 and 6. We outline the strengths of ARNs in Section 7, followed by some experimental results in Section 8.

1.1 Problem Definition

The problem that we are addressing in this paper can be succinctly stated as follows:

Given: A set of association rules and a target item.
Find: A parsimonious set of rules which explain the frequency of the target item.
Complexity: The set of rules can be large and may contain many redundant rules which obscure the relationship between the target and other frequent items.

2 Related Work

Association rules mining is considered a cornerstone of data mining research [2, 5]. As mentioned in the introduction, much of the research in association rule mining has focused on algorithms for discovery and, more recently, on pruning [12, 21, 17]. Liu, Hsu and Ma [14] were the first to propose an integration of classification and association rules. The resulting rules are called class association rules (CARs). The key operation in their approach finds all ruleitems of the form <condset, y>, where condset is a set of items and y is a class label. Han, Karypis and Kumar [10] proposed a method of integrating association rules and clustering in an undirected hypergraph. The frequent itemsets were modeled as hyperedges, and a min-cut hypergraph partitioning algorithm was used to cluster items. There has also been some theoretical work relating hypergraphs with association rules [20, 8], where the relationship between frequent itemset discovery and the undirected hypergraph transversal problem has been noted.

Directed hypergraphs [7, 6] extend directed graphs and have been used to model many-to-one, one-to-many and many-to-many relationships in theoretical computer science and operations research. Directed hypergraphs have also appeared under different names, including "labeled graphs" and "And-Or" graphs. Association rules can be considered as a data-induced rule system; to detect structural errors like circularity, unreachable goals, dead ends, redundancy and contradiction, [15] modeled the rule base as a directed hypergraph.

It is well known that the standard measures of support and confidence generate many redundant rules. In general there are three approaches for pruning redundant rules.
The first approach exploits the concept of closed sets [21]. These are maximal itemsets all of whose subsets have the same support. It turns out that the set of all frequent closed itemsets spans the set of all frequent itemsets. The second approach is based on the satisfaction of an additional interestingness measure, beyond support and confidence. For example, [16] proposed that a rule A -> B is interesting if s(A u B)/(s(A) * s(B)) > 1. Several other statistical measures of interestingness have been proposed. The third approach, which we will use to justify the removal of cycles from ARNs, is based on clustering of rules using a suitable distance measure [9]. Theoretical work on distance measures for categorical data has been presented in [18].

3 Outline of Proposed Approach

We briefly describe our method for structuring association rules as a backward directed hypergraph (henceforth referred to as a B-graph), pruning it to generate the ARN, and transforming it for reasoning. The method consists of four steps, which will be expanded in subsequent sections.

Step A: Given a database and the minimum support and confidence, we first extract all association rules using a standard algorithm like Apriori [1] or FP-Growth [11].

Step B: Choose a frequent item z which appears as a singleton consequent in the rule set, and build a leveled B-graph which recursively flows into the goal z.

Step C: Prune the B-graph generated in Step B of cycles and reverse edges. The resultant B-graph is called an Association Rules Network (ARN). In Section 6 we give a formal justification for this step.

Step D: Find the shortest paths between the goal node and the nodes at maximal level in the ARN. The set of these paths represents the reasoning network for the goal node.

A sketch of how these four steps fit together is given below.
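The following is a minimal sketch, not the authors' implementation: it assumes the open-source mlxtend package for Step A, while to_triples, arn_const and optimal_hyperpaths are hypothetical helpers standing in for the algorithms developed in Sections 5 and 7.

```python
# Sketch of Steps A-D. Step A uses mlxtend's Apriori implementation;
# the remaining helpers are hypothetical stand-ins for Sections 5 and 7.
import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules

def mine_singleton_rules(onehot_df: pd.DataFrame, min_sup=0.2, min_conf=0.7):
    """Step A: mine association rules, keeping only those with singleton
    consequents, the class of rules that maps exactly onto B-graphs."""
    frequent = apriori(onehot_df, min_support=min_sup, use_colnames=True)
    rules = association_rules(frequent, metric="confidence",
                              min_threshold=min_conf)
    return rules[rules["consequents"].apply(len) == 1]

# Steps B-D (see Algorithm 1 in Section 5.2 and the measures of Section 7):
# rules = mine_singleton_rules(onehot_df)
# arn, levels = arn_const(to_triples(rules), goal="income=below50K")  # B, C
# network = optimal_hyperpaths(arn, levels)                           # D
```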
4 Preliminaries

In this section we provide background material on association rules and directed hypergraphs [7].

4.1 Association Rules

Association rules are generally described in the framework of market basket analysis. Given a set of items I and a set of transactions T consisting of subsets t of I, an association rule is a relationship of the form A ->(s,c) B, where A and B are subsets of I, while s and c are the minimum support and confidence of the rule. A is called the antecedent and B the consequent of the rule. The support s(A) of a subset A of I is defined as the percentage of transactions t which contain A, and the confidence of a rule A -> B is s(A u B)/s(A). Most algorithms for association rule discovery take advantage of the anti-monotonicity property exhibited by the support level: if A is a subset of B, then s(A) >= s(B).

Our focus is to discover association rules in a more structured and dense relational table. For example, suppose we are given a relation R(A1, A2, ..., An) where the domain of each attribute Ai, dom(Ai) = {a_i1, ..., a_ik}, is discrete-valued. Then an item is an attribute-value pair {Ai = a}. The ARN will be constructed using rules of the form

{A_m1 = a_m1, ..., A_mq = a_mq} -> {A_g = a_g}, where {m1, ..., mq} is a subset of {1, ..., n}.

4.2 Directed Hypergraphs

A hypergraph is a pair H = (V, E), where V is a set of nodes {v1, v2, ..., vn} and E, a subset of the power set of V, is the set of hyperedges. Thus each hyperedge e can potentially span more than two nodes. Contrast this with a directed graph, where the edge set E is a subset of V x V. In the context of association rules, each node corresponds to a frequent item and frequent itemsets are mapped to hyperedges.

In a directed hypergraph, the nodes spanned by a hyperedge e are partitioned into two parts, the head and the tail, denoted by H(e) and T(e) respectively. A hyperedge e is called backward if |H(e)| = 1. Similarly, an edge is called forward if |T(e)| = 1. A directed hypergraph is called a B-directed hypergraph if all its hyperedges are backward; in the rest of the paper we will refer to these as B-graphs. Thus the set of association rules whose consequents are singletons maps neatly into a B-graph: each rule is represented by a hyperedge e, its antecedents by T(e) and its consequent by H(e).

We will also consider the antecedent of a rule as a single entity. For that we define the notion of a hypernode. Given a B-graph G with hyperedges {e1, ..., em}, the hypernodes induced by a hyperedge e_i are the tail T(e_i) and the head H(e_i), each considered as a single entity. The set of all hypernodes is denoted by N.

[Figure 2. An example B-graph with all the major features: nodes A through G and hyperedges e1 through e7.]

Example: As can be seen in Figure 2, V = {A, B, C, D, E, F, G}, E = {e1, ..., e7} and N = {{A}, {B}, {C}, {D}, {C, D}, {E}, {E, F}, {G}}. Thus F is a node but not a hypernode.

We now define a hyperpath and a hypercycle for a B-graph. A hyperpath is a sequence P = (n1, e1, n2, e2, ..., e_{k-1}, n_k) where each n_i is a hypernode and each e_i is a hyperedge. Furthermore, for 1 <= i < k-1, n_i = T(e_i) and H(e_i) is an element of n_{i+1}, while n_{k-1} = T(e_{k-1}) and n_k = H(e_{k-1}). A hyperpath is a hypercycle if n_k is an element of n1. Again, as can be seen in Figure 2, there is a hyperpath P1 leading from {A} to {G}, and a hyperpath P2 which starts and ends at {B} and is therefore a hypercycle. Finally, we define the size |P| of a hyperpath as the total number of hypernodes appearing on P.
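These definitions are compact enough to encode directly. The sketch below is a minimal rendering in Python; the names are ours, not from the paper.

```python
# Minimal encoding of the Section 4.2 definitions.
from dataclasses import dataclass

@dataclass(frozen=True)
class Hyperedge:
    tail: frozenset  # T(e): antecedent items, treated as a single hypernode
    head: str        # H(e): the single consequent item (|H(e)| = 1: backward)

def hypernodes(edges):
    """N: every tail as one entity, plus every head as a singleton."""
    return {e.tail for e in edges} | {frozenset([e.head]) for e in edges}

def path_size(path):
    """|P|: total number of hypernodes on an alternating
    (hypernode, hyperedge, ..., hypernode) sequence."""
    return sum(1 for step in path if isinstance(step, frozenset))

# The rule r2 from the introduction as a backward hyperedge:
r2 = Hyperedge(frozenset({"sex=female", "age<75"}), "immigrant=no")
```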
5 Association Rules Network

In this section we will formally define an Association Rules Network (ARN), present an algorithm to generate it from a set of association rules, and prove some important properties of the ARN.

5.1 Definition and Related Concepts

Here we give the definition of an ARN in terms of B-graphs.

Definition 1: Given a set of association rules R and a frequent goal item z which appears as a singleton in the consequent of some rule r in R, an association rules network, ARN(R, z), is a weighted B-graph such that:

1. There is a hyperedge which corresponds to a rule whose consequent is the singleton item z.
2. Each hyperedge in ARN(R, z) corresponds to a rule in R whose consequent is a singleton. The weight on the hyperedge is the confidence of the rule.
3. Any node v != z in the ARN is not reachable from z.

We now define the notion of a level, which will be used to exclude hypercycles from the ARN while retaining reachability to the goal node.

Definition 2:
A. The level of the goal node is zero.
B. The level of a non-goal node v is defined as lev(v) = min{ lev(p) | there exists a hyperedge e such that v is in T(e) and p = H(e) }.
C. The level of a hypernode n is defined as lev(n) = min{ lev(q) | q is in n }.

Example: For the ARN in Figure 3, lev(G) = 0; the levels of the remaining nodes (E, C, D, F, B and A) and of the hypernodes {C, D} and {E, F} follow by applying Definition 2 to the hyperedges of Figure 3.

[Figure 3. An example ARN, with edge confidences e1 (0.6), e2 (0.7), e3 (0.9), e4 (0.8), e5 (0.75) and e6 (0.65). e7 is not a part of the ARN because it is a reverse hyperedge and also participates in a hypercycle.]

Definition 3: A hyperedge e in an ARN is called a reverse hyperedge if lev(T(e)) < lev(H(e)).

Example: In Figure 3, e7 is a reverse hyperedge.

Note: We now introduce two conditions which will prevent redundancies (hypercycles and reverse hyperedges) from appearing between hypernodes at different levels, while preserving the reachability of the goal node in an ARN.

Condition 1: A node which has served as a consequent during ARN generation cannot be an antecedent, across levels, for a rule whose consequent is at a higher level.

Condition 2: The goal attribute can appear with only one value in the ARN, namely the one in the goal node.

5.2 ARN Construction Algorithm

We now describe a breadth-first-like algorithm to construct an ARN from a rule set R and a goal node z. Algorithm ARNConst (shown below) takes as input the rule set R and a goal node z which appears as a singleton consequent in R. Rules.GetRules(R, p, q) is a function that returns all rules in R whose consequent is p but whose antecedent does not contain q. For each of these rules r, Rules.GetAntecedents(r) returns the set of all antecedents. The level of each of these antecedent elements is determined on the basis of Condition 1 and Definition 2. For each rule satisfying Condition 1, a hyperedge is added to the ARN. Finally, hypercycles between hypernodes which are on the same level are removed using G.RemoveLevelHypercycles(). The algorithm returns the generated ARN.

Algorithm 1: Association Rules Network (ARN) constructed from a rule set R and flowing into consequent z, using a breadth-first-like strategy.

Data: rules R, consequent z
Result: a directed hypergraph G representing an ARN which flows into z

  visited[i] := false for every item i;
  p := z; q := z;
  add p to queue Q; visited[p] := true; lev(p) := 0;
  repeat
      candRules := Rules.GetRules(R, p, q);
          /* all rules whose consequent is p but whose antecedent does not contain q */
      for each rule r in candRules do
          X := Rules.GetAntecedents(r);
          minLevel(X) := infinity;
          for each element x in X do
              if lev(x) < minLevel(X) then minLevel(X) := lev(x);
          end
          if minLevel(X) >= lev(p) then            /* Condition 1 */
              for each element x in X do
                  if visited[x] = false then
                      add x to Q; visited[x] := true;
                      lev(x) := lev(p) + 1;
                  end
              end
              G.AddEdge(X, p);                     /* directed hyperedge flowing into p */
          end
      end
      delete p from Q; p := head of Q;
      if Q is empty and G is a singleton then
          return G := empty;
      else if Q is empty then
          G.RemoveLevelHypercycles();              /* remove same-level hypercycles based on confidence */
          return G;
      end
  until false;
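For concreteness, the following Python transcription of Algorithm 1 is a sketch under the assumption that rules arrive as (antecedent set, consequent, confidence) triples; same-level hypercycle removal is only indicated by a comment.

```python
# A direct transcription of Algorithm 1 (sketch, not the authors' code).
from collections import deque

def arn_const(rules, goal):
    """Breadth-first ARN construction flowing into `goal`.
    rules: iterable of (antecedent frozenset, consequent item, confidence)."""
    level = {goal: 0}   # doubles as the visited-as-consequent marker
    arn = []            # accepted hyperedges: (tail, head, confidence)
    queue = deque([goal])
    while queue:
        p = queue.popleft()
        # Rules.GetRules(R, p, goal): consequent is p, antecedent avoids goal.
        for tail, head, conf in rules:
            if head != p or goal in tail:
                continue
            # Condition 1: reject if any antecedent already sits at a level
            # below that of p (it has served as a consequent nearer the goal).
            if min(level.get(x, float("inf")) for x in tail) < level[p]:
                continue
            for x in tail:
                if x not in level:          # first visit: assign level
                    level[x] = level[p] + 1
                    queue.append(x)
            arn.append((frozenset(tail), p, conf))
    # G.RemoveLevelHypercycles(): break each same-level hypercycle by
    # dropping its lowest-confidence hyperedge (omitted in this sketch).
    return arn, level
```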
5.3 Results about the ARN Construction Algorithm

Theorem 1: The time complexity of ARN generation is O(n * r), where n is the number of frequent items and r is the number of rules whose consequents are singletons.

Proof: For each node in the ARN, the hyperedges flowing into it are found by searching the set of all rules whose consequents are singletons. Furthermore, the number of nodes appearing in the ARN is bounded by the number of frequent items. Hence the complexity is O(n * r). Note also that r is bounded by the sum over i of (i * f_i), where the sum runs from i = 2 up to the length of the longest frequent itemset and f_i is the number of frequent itemsets of size i.

Theorem 2: The ARN generated is unique, i.e., it does not depend upon the sequence in which the nodes are explored.

Proof: Let p and v be two nodes at the same level. We want to show that by exploring p and v in different orders we do not lose an edge whose consequent is one of them and whose antecedents contain the other. WLOG assume p is explored before v. Let {e1, ..., ek} be the set of edges for which p is the consequent and v is one of the antecedents. Then, by the definition of level, lev(T(e_i)) = lev(v) for all 1 <= i <= k. Note that there cannot be any edge whose consequent is p and any of whose antecedents has a level less than that of p (which is the same as the level of v). Now, once we explore v, all the edges whose consequent is v and whose antecedents contain p are free to flow into v because of Condition 1. Thus we have not lost any hyperedges between p and v, since they are at the same level. A similar argument holds when v is explored before p. Thus the set of hyperedges and hypernodes remains the same in both cases, and hence the same ARN is generated irrespective of the order in which nodes at the same level are explored.

Theorem 3: The goal node is reachable from any node in the ARN.

Proof: By induction on level. By definition, the level of the goal node is zero and the goal node is trivially reachable from itself. Assume that the goal node is reachable from any node at level i. For each node p at level i + 1 there exists at least one hyperedge e such that lev(H(e)) = i and p is in T(e). Thus the result holds for level i + 1, and hence for all levels up to max{ lev(p) | p in V }.

Theorem 4: The ARN generated by the algorithm is free of cycles across levels.

Proof: By contradiction. Let C = (n1, e1, ..., e_{k-1}, n_k) be a hypercycle across levels, i.e., n_k is an element of n1. Let n_i be the first hypernode on C for which lev(n_i) < lev(n1). By the construction of the ARN, once the level has dropped below lev(n1) it cannot rise again along C, so lev(n_{k-1}) < lev(n1). Now lev(n_k) >= lev(n1), because n_k is a singleton and n_k is an element of n1 (the level of a hypernode is the minimum over its members). Therefore e_{k-1} is a hyperedge from an antecedent at a lower level to a consequent at a higher level. This contradicts Condition 1.
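Theorems 3 and 4 are easy to sanity-check against the construction sketch above. The toy rule set below echoes the introduction (rules r1-r5); the confidence values are illustrative, not from the paper.

```python
# Sanity-checking Theorems 3 and 4 with arn_const on the rules r1-r5.
rules = [
    (frozenset({"immigrant=no"}), "income=below50K", 0.80),       # r1
    (frozenset({"sex=female", "age<75"}), "immigrant=no", 0.70),  # r2
    (frozenset({"immigrant=no"}), "sex=female", 0.60),            # r3: reverse
    (frozenset({"urban=no"}), "income=below50K", 0.75),           # r4
    (frozenset({"urban=no"}), "sex=female", 0.65),                # r5: redundant
]
arn, level = arn_const(rules, goal="income=below50K")
# r3 and r5 are rejected by Condition 1; r1, r2 and r4 survive.
assert len(arn) == 3
# Theorem 4: every accepted hyperedge here flows from level l+1 into level l,
# so no cycle across levels can exist.
assert all(level[x] == level[head] + 1 for tail, head, _ in arn for x in tail)
# Theorem 3: the goal stays reachable from every remaining node; urban=no
# still reaches income=below50K through r4 even though r5 was pruned.
```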
6 Cycle Removal and Rule Pruning

In this section we provide a justification for the removal of hypercycles and reverse hyperedges. We will show that in either case the information provided by the pruned hyperedge, or the interestingness of the corresponding rule, is small, given the rest of the ARN.

[Figure 4. An example B-graph which distinguishes reverse hyperedges and hypercycles.]

Consider the example in Figure 4. Given the goal node A, during ARN construction two hyperedges are removed: one that participates in a hypercycle and one that is a reverse hyperedge. The removal of the hypercycle edge is the more "crucial" of the two; otherwise A would never be reachable from F, violating the definition of the ARN. On the other hand, the removal of the reverse hyperedge is justified because it is part of a redundant path from B to A; hence its removal will not destroy the reachability condition of the ARN. This illustrates the difference between the two kinds of pruning operations. We provide separate arguments for each of the two operations.

6.1 Removal of Hypercycles

We first note that a hypercycle in a B-graph cannot be of size less than three (see Section 4.2). We will provide the desired justification for two cases, one in which the size of the hypercycle is three and the other in which it is four. From transitivity, it follows that the removal of any hypercycle of size bigger than four can be justified using the arguments presented for the second case. We now consider the two cases.

CASE 1: Consider two rules of the form A -> B and B -> A. If A and B are at the same level, then WLOG remove the hyperedge with the lower confidence to break the hypercycle. Otherwise, WLOG assume lev(A) >= lev(B). Since we are exclusively dealing with rules whose antecedents are a conjunction of items and whose consequent is a singleton item, for the vast majority of these rules s(A) < s(B). In such a scenario, conf(A -> B) > conf(B -> A). This justifies the removal of the hyperedge representing B -> A.

CASE 2: Consider three rules of the form A -> B, B -> C and C -> A. Our argument is based on the concept of information gain. Assume we have an information channel which is a sequence of transactions. Given that the pairs of itemsets (A, B) and (B, C) are frequent, i.e., they appear close to each other with high probability, the information that (A, C) is frequent is not surprising. We formalize this argument as follows.

Definition 4: Let F be the set of all itemsets. Let d: F x F -> [0, 1] be defined as

d(A, B) = 1 - s(A u B) / (s(A) + s(B) - s(A u B)),

where A and B are in F and s(A) is the support of the itemset A.

Theorem 5: The function d is a metric on the space F.

Proof: The function d is identical to the distance measure defined in [18]; the result follows directly from the proof of Lemma 3.2 in [18].

Corollary 1: For epsilon < 1/2, the information gained from the observation that the pair (A, C) is close is small, given that d(A, B) < epsilon and d(B, C) < epsilon.

Proof: Follows directly from the triangle inequality for d.

Now WLOG assume that lev(A) >= lev(C). The fact that d(A, C) is small whenever d(A, B) and d(B, C) are small indicates that the rules A -> C and C -> A can be derived from the rules A -> B and B -> C, which are already in the ARN. Thus C -> A can be safely pruned without violating the reachability constraint of the ARN.

6.2 Removal of Reverse Hyperedges

The removal of reverse hyperedges in an ARN can be justified by the fact that they generate redundant paths from a node to the goal. This can be formally proved as follows.

Theorem 6: Let x be a node in the ARN from which a reverse edge e originates. Then there exists a path P from x to the goal node such that e is not on P and the size of P is smaller than the size of any path from x to the goal node in which e participates.

Proof: Let lev(x) = l1 and lev(H(e)) = l2. Since e is a reverse hyperedge, l1 < l2. We observe that the path of smallest size from any node at level l in the ARN to the goal node has size l + 1 (this follows from the definition of level). Let the path of smallest size from x be P1 and the one from H(e) be P2. Clearly, the size of P1 is l1 + 1, while the size of the new path formed by e followed by P2 is at least l2 + 1 > l1 + 1. Thus P1 is the required path.

This theorem establishes the redundancy of the rule represented by a reverse hyperedge and hence justifies its removal.
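The argument of CASE 2 can be checked numerically. The sketch below implements the distance of Definition 4 and the bound behind Corollary 1; the support values are illustrative, not taken from the paper.

```python
# Definition 4 plus the triangle inequality (Theorem 5) bounding d(A, C).
def d(s_a, s_b, s_ab):
    """d(A,B) = 1 - s(A u B) / (s(A) + s(B) - s(A u B))."""
    return 1.0 - s_ab / (s_a + s_b - s_ab)

s_A, s_B, s_C = 0.40, 0.38, 0.35   # itemset supports (illustrative)
s_AB, s_BC = 0.33, 0.31            # supports of the unions A u B and B u C
eps = max(d(s_A, s_B, s_AB), d(s_B, s_C, s_BC))
# Corollary 1: d(A,C) <= d(A,B) + d(B,C) < 2*eps, so observing that (A, C)
# is also "close" carries little new information; C -> A may be pruned.
print(f"d(A,B), d(B,C) <= {eps:.3f}  =>  d(A,C) < {2 * eps:.3f}")
```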
7 Benefits of ARN

In the introduction we raised the question of the utility of association rules beyond simple exploratory analysis. ARNs are also a tool for exploratory analysis, but they provide a context for understanding the discovered rules and relating them to each other. An ARN offers the following benefits.

Local Pruning: ARNs provide a graphical method to prune rules by associating redundant rules with hypercycles and reverse hyperedges. Furthermore, the pruning takes place in the context of a goal node. Thus a rule which is redundant for a particular goal node may become relevant for another goal. This is more flexible than pruning based on statistical measures of interestingness.

[Figure 5. ARN with goal node G, with edge confidences e1 (0.6), e2 (0.7), e3 (0.9), e4 (0.8), e5 (0.75) and e6 (0.65). The edge e7 is not part of the ARN because it participates in a hypercycle.]

[Figure 6. The ARN for a different goal node. The edge e7 is now part of the ARN; this illustrates the adaptive nature of local pruning.]

Consider the B-graphs in Figure 5 and Figure 6. When the goal node is G, the edge e7 represents a redundant rule which may be eliminated. On the other hand, when the goal node changes, the same edge becomes relevant. Thus the pruning of a rule, in our approach, depends upon the context of the goal node. We refer to this kind of pruning as local pruning, and it may be more flexible than global pruning based simply on measures of interestingness.

Reasoning using Path Traversal: An ARN is a weighted, hypercycle-free B-graph. Hyperpaths which lead to the goal node can be interpreted as providing an explanation for the goal node. Formally, let N_max be the set of all maximum-level nodes in the ARN. For each v in N_max, let P_v be the set of all hyperpaths from v to the goal node g. We can define two cost measures on each hyperpath Psi:

1. wght(Psi) = - sum over e_i in Psi of log(conf(e_i))
2. cost(Psi) = - sum over e_i in Psi of conf(e_i) * log(conf(e_i))

where e_i is a hyperedge and conf(e_i) is the confidence of the rule represented by e_i. The reason for introducing two cost functions is that they provide different kinds of information depending upon the context. For example, wght(Psi) can be interpreted as the strength of the correlation between the source and the goal node, while cost(Psi) can be interpreted as the total information gain along the path from the source to the goal node. The optimal path in P_v under wght(Psi) or cost(Psi) is then the best explanation for the dependence of the goal node on v. Computing these optimal hyperpaths for all v in N_max provides a reasoning network for the goal node g. The problem of optimal hyperpaths in B-graphs has been studied in [3], which reports an algorithm of time complexity O(|H| + n log n), where the size of the hypergraph is |H| = sum over e_i in E of (|T(e_i)| + 1).
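As a concrete reading of the two measures, the sketch below evaluates both on a toy hyperpath; the confidence values and the function names are ours, not from the paper.

```python
# The two hyperpath measures of Section 7 on a toy two-edge hyperpath.
import math

def wght(confs):
    """-sum(log conf): small when every rule on the path is strong, so the
    minimizing path gives the strongest end-to-end correlation."""
    return -sum(math.log(c) for c in confs)

def cost(confs):
    """-sum(conf * log conf): an entropy-style total information gain."""
    return -sum(c * math.log(c) for c in confs)

path_confidences = [0.9, 0.7]   # illustrative confidences along one hyperpath
print(wght(path_confidences), cost(path_confidences))
```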
8 Applications

We now show the advantages of our approach by constructing ARNs for two well-known data sets. The results, as we describe below, vindicate our original thesis that a network of association rules provides a context for interpreting the rules in a more coherent fashion than if they were viewed in isolation. The two data sets that we considered were the Lens and the Mushroom databases, which are part of the UCI machine learning repository [4].

8.1 Lens Database

The Lens database has twenty-four rows, where each row has five attributes which are either binary or ternary-valued. Our goal attribute was contact-lenses, which denotes whether a person should be fitted with one of two kinds of lenses (hard or soft) or neither.

[Figure 7. ARN with contact-lenses=none as the goal node; the antecedent nodes are astigmatism=yes, astigmatism=no, spectacle-prescription=hypermetrope and tear-prod-rate=reduced.]

[Figure 8. ARN with contact-lenses=soft as the goal node; the antecedent nodes are spectacle-prescription=hypermetrope, astigmatism=no, tear-prod-rate=normal and age=pre-presbyopic.]

[Figure 9. ARN with contact-lenses=hard as the goal node; the antecedent nodes are spectacle-prescription=myope, astigmatism=yes, tear-prod-rate=normal and age=young.]

We make the following observations based on the three ARNs in Figures 7, 8 and 9.

1. As we mentioned in the introduction, an ARN provides a context for interpreting the rules. A change in the value of the goal attribute is correlated with changes in several antecedents, and the ARNs clearly capture these simultaneous changes.

2. Notice also that the structure of the ARN does not change drastically when the value of the goal attribute changes. This suggests that the actual network is at the variable (type) level rather than at the instance level.

3. We finally note that when the goal value is contact-lenses=none, spectacle-prescription=hypermetrope and tear-production-rate=reduced participate, with different partners, in two distinct three-itemsets. This suggests a strong correlation between them and the goal value.

The observations made are consistent with the expected benefits of using an ARN. This justifies the applicability of an ARN for synthesizing association rules.
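For illustration, the Lens ARNs could be reproduced along the following lines, reusing the sketches from Sections 3 and 5.2. The file layout, column names and one-hot item encoding are assumptions about the UCI file, not details from the paper.

```python
# Hypothetical end-to-end run on the UCI Lens data (layout assumed).
import pandas as pd

lens = pd.read_csv("lenses.data", sep=r"\s+", header=None,
                   names=["id", "age", "prescription", "astigmatic",
                          "tear_rate", "lenses"]).drop(columns="id")
onehot = pd.get_dummies(lens.astype(str)).astype(bool)  # attribute=value items
rules = mine_singleton_rules(onehot, min_sup=0.2, min_conf=0.7)
triples = [(frozenset(a), next(iter(c)), conf)
           for a, c, conf in zip(rules["antecedents"], rules["consequents"],
                                 rules["confidence"])]
arn, levels = arn_const(triples, goal="lenses_3")  # UCI code 3 = no lenses
```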
8.2 Mushroom Database

We also constructed ARNs for the Mushroom database. This data set has approximately eight thousand instances and twenty-three attributes. We chose the binary attribute class, with values edible and poisonous, as our goal attribute. The ARNs generated are shown in Figures 10 and 11 for the classes poisonous and edible, respectively. To save space, we have abbreviated the attribute values by their first letters; for example, p = poisonous and e = edible.

[Figure 10. The ARN when the goal attribute is class=poisonous; the nodes include veil-type=p, bruises?=t, ring-type=p, stalk-surface-above-ring=s, stalk-surface-below-ring=s, veil-color=w, gill-spacing=c, ring-number=o, gill-attachment=f and population=v.]

[Figure 11. The ARN when the goal attribute is class=edible; the nodes include ring-number=o, ring-type=p, gill-spacing=c, veil-color=w, gill-attachment=f, gill-size=b, veil-type=p and stalk-surface-above-ring=s.]

The observations that we made for the Lens database are applicable here as well. However, since this database is bigger, we were able to observe more interesting patterns, which we now discuss.

1. The sets of contributing attributes for the two goal values are mostly the same, but some attributes contribute to one and not to the other. In particular, gill-size is present in the ARN for class=edible but not in the one for class=poisonous. Similarly, bruises?, population and stalk-surface-above-ring appear in the ARN for class=poisonous, but not in the one for class=edible.

2. Notice how the item bruises?=true appears as a level-two item in the ARN for class=poisonous. This clearly highlights the advantage of an ARN: it reveals that even though bruises?=true does not appear as an antecedent in any rule whose consequent is class=poisonous, it seems to have an important influence in determining the value of the class attribute. In other words, this shows the transitivity in association rules which ARNs are able to capture.

3. The level of certain items changes depending upon the value of the goal attribute. For example, ring-number=one appears as a second-level node in the ARN for class=edible but as a first-level node in that for class=poisonous. This change in the level of the item reflects the change in its correlation with the value of the goal attribute.

4. Notice also that there are several paths from the higher-level nodes to the goal node in both ARNs. By using the measures described in Section 7 we can choose the optimal paths between the maximal-level nodes and the goal node. This will create the reasoning network and sparsify the graph.

5. Finally, these ARNs reveal the utility of local pruning, as the resulting rules are semantically meaningful. Thus the two operations, hypercycle removal and reverse-hyperedge removal, result in a meaningful network.

The observations that we have highlighted above illustrate the benefits of an ARN.

9 Conclusions and Future Work

The Association Rules Network provides a mechanism for synthesizing association rules in a structured manner. The important features of an ARN are (1) the ability to prune rules in the context of a goal, (2) a pruning mechanism based on simple graph operations, and (3) the fact that the ARN can serve as a basis for reasoning with discovered rules. For future work we would like to convert our intuition about reasoning using ARNs into a more theoretical framework. We would also like to design a layout algorithm specifically for ARNs. We are also working on a cycle-detection algorithm for general B-graphs which can be applied to ARNs.

References

[1] Rakesh Agrawal, Tomasz Imielinski, and Arun N. Swami. Mining association rules between sets of items in large databases. In Proceedings of the 1993 ACM SIGMOD International Conference on Management of Data, pages 207-216, Washington, D.C., May 1993.

[2] Rakesh Agrawal and Ramakrishnan Srikant. Fast algorithms for mining association rules. In Proceedings of the 20th International Conference on Very Large Data Bases (VLDB), pages 487-499. Morgan Kaufmann, 1994.

[3] Giorgio Ausiello, Giuseppe F. Italiano, and Umberto Nanni. Hypergraph traversal revisited: Cost measures and dynamic algorithms. Lecture Notes in Computer Science, 1450, 1998.

[4] C.L. Blake and C.J. Merz. UCI repository of machine learning databases, 1998.

[5] L. Feng, J. Yu, H. Lu, and J. Han. A template model for multi-dimensional, inter-transactional association rules. VLDB Journal, 11(2):153-175, 2002.

[6] G. Ausiello, G.F. Italiano, and U. Nanni. Dynamic maintenance of directed hypergraphs. Theoretical Computer Science, 72(2-3):97-117, 1990.

[7] Giorgio Gallo, Giustino Longo, and Stefano Pallottino. Directed hypergraphs and applications. Discrete Applied Mathematics, 42(2):177-201, 1993.

[8] Dimitrios Gunopulos, Heikki Mannila, Roni Khardon, and Hannu Toivonen. Data mining, hypergraph transversals, and machine learning (extended abstract). In Proceedings of PODS 1997, pages 209-216, 1997.

[9] G. Gupta, A. Strehl, and J. Ghosh. Distance based clustering of association rules. In Intelligent Engineering Systems Through Artificial Neural Networks (Proceedings of ANNIE 1999), volume 9, pages 759-764. ASME Press, November 1999.

[10] Eui-Hong Han, George Karypis, Vipin Kumar, and Bamshad Mobasher. Clustering based on association rule hypergraphs. In Proceedings of the SIGMOD Workshop on Research Issues on Data Mining and Knowledge Discovery (DMKD '97), 1997.
[11] Jiawei Han, Jian Pei, and Yiwen Yin. Mining frequent patterns without candidate generation. In Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data, pages 1-12. ACM Press, May 2000.

[12] S. Jaroszewicz and D. A. Simovici. Pruning redundant association rules using maximum entropy principle. In Advances in Knowledge Discovery and Data Mining, 6th Pacific-Asia Conference (PAKDD '02), pages 135-147, Taipei, Taiwan, May 2002.

[13] Brian Lent, Arun N. Swami, and Jennifer Widom. Clustering association rules. In Proceedings of ICDE 1997, pages 220-231, 1997.

[14] Bing Liu, Wynne Hsu, and Yiming Ma. Integrating classification and association rule mining. In Proceedings of Knowledge Discovery and Data Mining (KDD '98), pages 80-86, 1998.

[15] M. Ramaswamy, S. Sarkar, and Y. Chen. Using directed hypergraphs to verify rule-based expert systems. IEEE Transactions on Knowledge and Data Engineering, 9(2):221-237, 1997.

[16] G. Piatetsky-Shapiro and C. Matheus. The interestingness of deviations, 1994.

[17] V. Pudi and J.R. Haritsa. Reducing rule covers with deterministic error bounds. In Proceedings of the 7th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining, pages 313-324. Springer, 2003.

[18] M.D. Rice and M. Siff. Clusters, concepts, and pseudometrics. In Electronic Notes in Theoretical Computer Science, volume 40. Elsevier, 2002.

[19] S. Chawla, B. Arunasalam, and J. Davis. Mining open source software (OSS) data using association rules network. In Advances in Knowledge Discovery and Data Mining, 7th Pacific-Asia Conference (PAKDD '03), pages 461-466. Springer, 2003.

[20] M. Zaki and M. Ogihara. Theoretical foundations of association rules. In Proceedings of the 3rd SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery (DMKD '98), Seattle, Washington, June 1998.

[21] Mohammed J. Zaki. Generating non-redundant association rules. In Proceedings of the Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 34-43. ACM Press, 2000.