Mining Closed Gradual Patterns

Sadok Ben Yahia

2010, Artificial Intelligence and Soft Computing


Sarra Ayouni¹, Anne Laurent², Sadok Ben Yahia¹, and Pascal Poncelet²

¹ Faculty of Sciences of Tunis, 1060, Campus Universitaire, Tunis, Tunisie
² LIRMM-CNRS, 161 rue Ada, Montpellier, France

1 Introduction

With the steady development of computing tools, the last three decades have seen a considerable increase in the quantity of data stored in databases. Extracting knowledge from this data is therefore of paramount importance, and data mining has become an indispensable tool for reaching this goal. Association rule extraction is one of the important tasks in data mining. This powerful technique has a wide range of applications in many areas of business practice and research. A scrutiny of the related work shows that another type of rules, called gradual rules, has also received attention within the data mining community. Gradual rules of the form "the more A, the more B" have mainly attracted interest in the fields of recommendation and command systems [2]. Several approaches and semantics dealing with this kind of rules have been proposed in the literature. However, the relevance and usefulness of the mined knowledge do not seem to be the main concern of these approaches. In fact, an overwhelming quantity of gradual rules can be expected even from small contexts.

The main thrust of this paper is a lossless reduction of the mined knowledge. To reach this goal, a possible solution consists in using results from Formal Concept Analysis (FCA), which has been shown to provide useful seeds for tackling such knowledge extraction problems. However, no work has addressed the use of the FCA framework for gradual patterns. Hence, we introduce a novel Galois connection, which is a sine qua non for extracting closed gradual patterns. These patterns act as a lossless, reduced-size nucleus of patterns.

The remainder of the paper is organized as follows.
Section 2 reviews the related work on mining gradual rules and some basic notions of the FCA framework. Section 3 introduces our novel Galois connection and shows its validity and soundness. Section 4 shows, through experiments carried out over synthetic datasets, how much our approach reduces the huge number of extracted gradual patterns by keeping only the closed ones. Section 5 presents concluding remarks and sketches our future perspectives.

2 Related work

In this section, we present an overview of approaches dealing with gradual rules and some basic notions related to Formal Concept Analysis.

2.1 Gradual rules

Gradual rules are applied on data sets with m attributes (X1, ..., Xm) defined on numeric domains dom(Xi). A data set D is a set of rows (m-tuples) of dom(X1) × ... × dom(Xm). In this scope, a gradual item is defined as a pair of an attribute and a variation ∗ ∈ {≤, ≥}. Let A be an attribute; then the gradual item A^≥ means that the attribute A is increasing. It can be interpreted as "the more A". A gradual itemset, or gradual tendency, is then defined as a non-empty list of gradual items. For instance, the gradual itemset M = A^≥ B^≤ is interpreted as "the more A and the less B".

Example 1. (Salary)^≥ is a gradual item meaning that "Salary" is increasing. ((Age)^≥, (Salary)^≥) is a gradual itemset.

A gradual rule, denoted by M ⇒ M′, is defined as a pair of gradual itemsets on which a causality relationship is imposed. Different measures and semantics have been proposed to extract and assess this kind of rules. In what follows, we review the related works that focused on mining gradual rules.

Gradual dependencies were introduced in [4], where they are called tendency rules and denoted by A ⇀t B. Hüllermeier proposed to perform a linear regression analysis on the contingency diagram derived from the data set.
The validity of the rule is assessed on the basis of the regression coefficients α, β of the line that approximates the points in the contingency diagram. The quality of the regression is given by the R² coefficient. A tendency rule contains one or more attributes in the condition part and only one in the conclusion part. When it contains several attributes in the condition part, the author proposes to use a logical conjunction modeled by means of a so-called t-norm (footnote 3). This method leads to a high computational cost for the linear regression and the identification of the relevant item combinations.

Footnote 3: A triangular norm (t-norm) is a function ⊤ : [0, 1] × [0, 1] → [0, 1] verifying, ∀ x, y, z, t ∈ [0, 1], the following properties: ⊤ is commutative, ⊤(x, y) = ⊤(y, x); ⊤ is associative, ⊤(x, ⊤(y, z)) = ⊤(⊤(x, y), z); ⊤ is increasing, ⊤(x, y) ≤ ⊤(z, t) if x ≤ z and y ≤ t; and ⊤(x, 1) = x.

Another definition has been proposed in [1]. The semantics of a gradual dependence is quite different, since the authors only consider whether the variation is fulfilled. The authors define a gradual dependence as being similar to a functional dependence by considering the variation of degrees between two objects. According to [1], the gradual dependence A ⇒ B holds in a database D if ∀ o = (x, y) and o′ = (x′, y′) ∈ D, A(x) < A(x′) implies B(y) < B(y′).

A new definition of gradual dependence was proposed in [8] using fuzzy association rules. The authors take into account the variation strength in the degree of fulfilment of an imprecise property by different objects. Hence, a gradual dependence holds in a database D if ∀ o = (x, y) and o′ = (x′, y′) ∈ D, v_∗1(A(x), A(x′)) implies v_∗2(B(y), B(y′)), where v_∗ is a variation degree of an attribute between two different objects. In both propositions [1] and [8], the authors propose to build a modified data set D′ that contains as many rows as there are pairs of distinct objects in the initial data set D.
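The crisp definition of [1] can be illustrated with a minimal sketch that checks the implication A(x) < A(x′) ⇒ B(y) < B(y′) over every ordered pair of distinct rows. The function name `holds_gradual_dependence` and the dictionary-based row layout are assumptions made for illustration only; the Age and Salary values mirror those of Table 1.

```python
from itertools import permutations


def holds_gradual_dependence(rows, a, b):
    """Return True if, for every ordered pair of distinct rows (o, o'),
    o[a] < o'[a] implies o[b] < o'[b] (crisp definition recalled from [1])."""
    return all(
        o1[b] < o2[b]
        for o1, o2 in permutations(rows, 2)  # all ordered pairs of distinct rows
        if o1[a] < o2[a]                     # only pairs where the premise holds
    )


# Illustrative rows (Age and Salary columns of Table 1).
rows = [
    {"Age": 22, "Salary": 1200},
    {"Age": 24, "Salary": 1850},
    {"Age": 30, "Salary": 2200},
    {"Age": 28, "Salary": 3400},
]

# The pair (Age 28, Salary 3400) / (Age 30, Salary 2200) violates the implication.
print(holds_gradual_dependence(rows, "Age", "Salary"))
# Restricted to the first three rows, the dependence holds.
print(holds_gradual_dependence(rows[:3], "Age", "Salary"))
```

The check enumerates all pairs, so its cost grows quadratically with the number of rows, which matches the size of the modified data set D′ mentioned above.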
Another definition of the support and confidence of a gradual itemset, as defined above, was proposed in [5]. The support of a gradual itemset $A_1^{\ast_1}, \ldots, A_p^{\ast_p}$ is defined as the maximal number of rows {r1, ..., rl} for which there exists a permutation π such that ∀ j ∈ [1, l − 1], ∀ k ∈ [1, p], it holds that $A_k(r_{\pi_j}) \ast_k A_k(r_{\pi_{j+1}})$. More formally, denoting by $\mathcal{L}$ the set of all such sets of rows, the support of a gradual itemset is defined as follows.

Definition 1. Let $s = A_1^{\ast_1}, \ldots, A_p^{\ast_p}$ be a gradual itemset. We have:

$$\mathrm{supp}(s) = \frac{\max_{L_i \in \mathcal{L}} |L_i|}{|D|}$$

The authors propose a heuristic to compute this support for gradual itemsets, in a level-wise process that considers itemsets of increasing lengths. Recently, Di Jorio et al. [6] considered the same definition as the one proposed within the conflict-set-based approach, and proposed an efficient method based on the precedence graph. In this method, named Grite (GRadual ITemset Extraction), the data is represented through a graph whose nodes are the objects in the data, and whose edges express the precedence relationships derived from the considered attributes. In [7], the authors propose to calculate the support in a different way by using the Kendall tau ranking correlation coefficient. This coefficient counts the pairs of n-tuples that can be ordered in the database according to the considered gradual pattern. Unfortunately, in all surveyed approaches, reducing the quantity of mined patterns was not the main concern. In the following subsection, we recall some key notions from the FCA framework, which provide the groundwork for defining a concise representation of gradual patterns.

2.2 Formal concept analysis

Formal Concept Analysis (FCA) is based on the mathematical theory of complete lattices (for further information please refer to [3]).
This theory provides a powerful mathematical framework that has been used in many fields of computer science. Indeed, it has been employed in data mining to extract a representative set of itemsets that can be used to extract association rules in an efficient manner. To do so, a formal context must be defined as a triplet (ℛ, 𝒪, ℐ) where 𝒪 is a set of transactions or objects, ℐ is a finite set of items and ℛ is a binary relation ℛ ⊆ 𝒪 × ℐ. The first step to construct the lattice is to define a Galois connection between two derivation operators: one mapping a set of objects into a set of items and the other one mapping a set of items into a set of objects. For a set O ⊆ 𝒪 and a set I ⊆ ℐ, the two mapping operators, denoted by f and g, are defined, respectively, as follows:

– f : 𝒫(𝒪) → 𝒫(ℐ), f(O) = {i ∈ ℐ | (o, i) ∈ ℛ, ∀ o ∈ O}
– g : 𝒫(ℐ) → 𝒫(𝒪), g(I) = {o ∈ 𝒪 | (o, i) ∈ ℛ, ∀ i ∈ I}

These two mapping operators f and g induce a Galois connection between the powerset of objects and the powerset of items. This Galois connection means that f and g are dually adjoint, i.e., O ⊆ g(I) ⇔ I ⊆ f(O) for a set of objects O and a set of items I. The two composite operators f∘g and g∘f are closure operators (i.e., they satisfy the properties of monotonicity, extensivity and idempotency). A formal concept is a pair (O, I) of a set of objects O ⊆ 𝒪 and a set of items I ⊆ ℐ, where f(O) = I and g(I) = O. The set of all concepts that can be extracted from a formal context forms a complete lattice provided with a partial order relation ≤ such that, for two concepts c1 = (O1, I1) and c2 = (O2, I2), c1 ≤ c2 ⇔ I2 ⊆ I1 (⇔ O1 ⊆ O2).

3 Defining new Galois mapping operators for gradual patterns

To the best of our knowledge, no previous study in the literature has applied the Galois connection to the gradual rule extraction problem.
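As a point of reference for the classical binary setting recalled in Section 2.2, the derivation operators f and g can be sketched as follows. This is only an illustrative sketch: the three-object context (`OBJECTS`, `ITEMS`, `RELATION`) is hypothetical and does not come from the paper.

```python
# Hypothetical binary formal context, for illustration only.
OBJECTS = {"o1", "o2", "o3"}
ITEMS = {"a", "b", "c"}
RELATION = {
    ("o1", "a"), ("o1", "b"),
    ("o2", "a"), ("o2", "b"), ("o2", "c"),
    ("o3", "a"),
}


def f(objs):
    """Items shared by every object in objs."""
    return {i for i in ITEMS if all((o, i) in RELATION for o in objs)}


def g(items):
    """Objects that possess every item in items."""
    return {o for o in OBJECTS if all((o, i) in RELATION for i in items)}


def closure(items):
    """Closure operator f∘g on itemsets."""
    return f(g(items))


# {b} is not closed: every object having b also has a.
print(sorted(closure({"b"})))                     # ['a', 'b']
# ({o1, o2}, {a, b}) is a formal concept: f and g map each part onto the other.
print(f({"o1", "o2"}) == {"a", "b"}, g({"a", "b"}) == {"o1", "o2"})
```

On this toy context one can also check extensivity and idempotency of `closure` by direct evaluation; the gradual operators introduced below replace sets of objects by sets of object sequences.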
Given that classical FCA was developed for binary relationships, adapting the former results to the gradual case turns out to be an interesting formalization problem. Within this work we aim at using FCA theory in order to formalize a new closure system characterizing gradual data. A gradual rule is defined as a special kind of association rule reflecting a variation in the degree of membership of itemsets in a sequence of objects. A gradual rule is formulated as "The more/less X, the more/less Y", where X and Y are gradual patterns. An example of a gradual rule is "The higher the Age, the higher the Salary". In order to satisfy the graduation property, we must consider a set of object sequences. Indeed, in the classical FCA case the domain of an itemset is a set of objects. In our case the domain of a gradual itemset is a set of sequences satisfying this itemset. The set of sequences will be ordered by a properly defined relation ⊑. In what follows, we define the notion of sequence and the related mathematical operations that can be applied over sets of sequences.

3.1 Handling object sequences

Let 𝒪 = {o1, ..., on} be a set of objects. We consider a sequence to be an ordered list of objects described over attributes (items). A sequence can be represented as ⟨o1, ..., om⟩. This means that objects are sorted and each object oi has a rank in the sequence.

Definition 2. A sequence S = ⟨o1, ..., op⟩ is included in another sequence S′ = ⟨o′1, ..., o′m⟩, denoted by S ⊆ S′, if there exist integers 1 ≤ i1 < i2 < ... < ip ≤ m such that o1 = o′i1, ..., op = o′ip.

Example 2. Let S1 = ⟨o1, o4, o6⟩, S2 = ⟨o2, o1, o3, o4, o5, o6⟩, and S3 = ⟨o2, o1, o6, o3, o4⟩ be three sequences. We have S1 ⊆ S2 but S1 ⊄ S3.

Definition 3. Let 𝒮 be a collection (i.e., a set) of sequences. S ∈ 𝒮 is said to be maximal if ∄ S′ ∈ 𝒮, S′ ≠ S, such that S ⊂ S′.

Definition 4.
The intersection of two sequences S1 and S2 is the set of all maximal subsequences of both S1 and S2: S1 ∩ S2 = {si | si ⊆ S1, si ⊆ S2 and ∄ s′i ⊃ si such that s′i ⊆ S1 and s′i ⊆ S2}.

Example 3. Let S1 = ⟨o1, o2, o4, o7⟩ and S2 = ⟨o2, o1, o4, o6, o8, o7⟩ be two sequences. S1 ∩ S2 = {⟨o2, o4, o7⟩, ⟨o1, o4, o7⟩}.

Definition 5. A set of sequences 𝒮 is included in another set 𝒮′, denoted by 𝒮 ⊑ 𝒮′, if ∀ S ∈ 𝒮, ∃ S′ ∈ 𝒮′ s.t. S ⊆ S′.

Example 4. The set of sequences 𝒮1 = {⟨o5, o6, o7⟩, ⟨o2, o4, o7⟩} is included in the set of sequences 𝒮2 = {⟨o5, o6, o8, o7⟩, ⟨o1, o2, o4, o7⟩}.

Based on the binary inclusion relation between sets of sequences introduced in Definition 5, the following proposition holds:

Proposition 1. Let 𝒮 be a set of maximal sequences. 𝒫(𝒮) provided with the binary relation ⊑ is a partially ordered set (or poset).

Proof. The binary relation ⊑ over the set 𝒫(𝒮) is reflexive, antisymmetric, and transitive, i.e., for all S1, S2, and S3 in 𝒫(𝒮), we have:
– Reflexivity: S1 ⊑ S1.
– Antisymmetry: let us consider S1 and S2 such that S1 ⊑ S2 (1) and S2 ⊑ S1 (2). According to (1) and (2), respectively, we have ∀ s1 ∈ S1, ∃ s2 ∈ S2 s.t. s1 ⊆ s2, and ∀ s2 ∈ S2, ∃ s1 ∈ S1 s.t. s2 ⊆ s1. As maximality prevents a set from containing two sequences s and s′ such that s ⊂ s′, we get S1 = S2 and the antisymmetry property holds.
– Transitivity: let us consider S1, S2 and S3 such that S1 ⊑ S2 (3) and S2 ⊑ S3 (4). According to (3) and (4) we have: ∀ s1 ∈ S1, ∃ s2 ∈ S2 s.t. s1 ⊆ s2 (5), and ∀ s2 ∈ S2, ∃ s3 ∈ S3 s.t. s2 ⊆ s3 (6). According to (5) and (6) we have ∀ s1 ∈ S1, ∃ s3 ∈ S3 s.t. s1 ⊆ s3. Hence, we have S1 ⊑ S3 and the transitivity property is satisfied. □

3.2 Gradual Galois Connection

In this paper, we propose a new definition of Galois connection taking into account the graduality aspect. Hence, we first define the notion of a gradual formal context.

Definition 6.
Gradual formal context. A gradual formal context is defined as the quadruplet 𝒦 = (𝒪, ℐ, 𝒬, ℛ) describing a set of objects 𝒪, a finite set ℐ of attributes (or items), a finite set of quantities or values 𝒬 and a binary relation ℛ (i.e., ℛ ⊆ 𝒪 × ℐ). Each pair (o, i_q) ∈ ℛ means that the value of the attribute (item) i belonging to ℐ in the object o belonging to 𝒪 is q.

Example 5. An example of a gradual formal context 𝒦 = (𝒪, ℐ, 𝒬, ℛ) is sketched in Table 1. We have (o1, Age_22) ∈ ℛ and (o1, Salary_1200) ∈ ℛ.

        Age   Salary   Loan
  o1    22    1200     4
  o2    24    1850     2
  o3    30    2200     3
  o4    28    3400     1

Table 1. Formal gradual context

Let 𝒦 = (𝒪, ℐ, 𝒬, ℛ) be a gradual formal context. We define below the two mapping operators f and g, where 𝒮 denotes the set of object sequences:

f : 𝒫(𝒮) → 𝒫(ℐ)
f(S) = {i^∗ | ∀ s ∈ S, ∀ o_l, o_k ∈ s s.t. (o_l, i_q1), (o_k, i_q2) ∈ ℛ and k < l, we have q1 ∗ q2}

The mapping function f returns all gradual items, with their respective variations, that are respected by all sequences in S.

g : 𝒫(ℐ) → 𝒫(𝒮)
g(I) = {s ∈ 𝒮 | s is maximal in 𝒮 and ∀ o_l, o_k ∈ s s.t. k < l and (o_l, i_q1), (o_k, i_q2) ∈ ℛ, ∀ i^∗ ∈ I, we have q1 ∗ q2}

The mapping function g returns the set of maximal sequences respecting the variations of all items in I.

The two mapping functions g and f are respectively defined over the power set of ℐ and the power set of the sequences of 𝒮. Given that the intersection of a set of object sequences may result in more than one sequence, we consider the power set of sequences. The function f is applied on a set of sequences whereas g is applied on a set of gradual attributes. On the one hand, f(S) returns the gradual itemset I such that every gradual item i ∈ I keeps its variation ∗ through all the sequences of S. On the other hand, g(I) looks for all sequences verifying the variation of each item in I. The set of gradual itemsets can be ordered by the standard inclusion relation ⊆, whereas the set of sequences is ordered by the binary relation ⊑.

Example 6. Let us consider the context illustrated in Table 1.
Thus, we have for example f({⟨o1, o2, o4⟩, ⟨o1, o2, o3⟩}) = {Age^≥ Salary^≥} and g({Age^≥ Loan^≤}) = {⟨o1, o2, o4⟩, ⟨o1, o3⟩}.

Based on the definitions and propositions introduced above, we can now demonstrate that we have constructed a Galois connection framework for the gradual case. As stated above, this will allow us to mine concise representations, thus reducing the size of the results presented to end-users. It should be noted that this reduction is of great importance, as users are often drowned in the resulting patterns.

Proposition 2. For sets of sequences S and S′ ⊆ 𝒮, and gradual itemsets I and I′, the following properties hold:
1) S ⊑ S′ ⇒ f(S′) ⊆ f(S)
2) S ⊑ g(f(S))
1′) I ⊆ I′ ⇒ g(I′) ⊑ g(I)
2′) I ⊆ f(g(I))

Proof. Each property is respectively proved as follows:
1) S ⊑ S′ means that every sequence of S is a subsequence of some sequence of S′. Every gradual item i^∗ belonging to f(S′) holds on all sequences of S′, hence on all their subsequences, and thus on S. Hence, f(S) includes all gradual items i^∗ included in f(S′), and may include some other ones. Therefore f(S′) ⊆ f(S).
2) We have:
– f(S) = {i^∗ | ∀ s ∈ S, ∀ o_l, o_k ∈ s s.t. (o_l, i_q1), (o_k, i_q2) ∈ ℛ and k < l, we have q1 ∗ q2}
– g(f(S)) = {s ∈ 𝒮 | s is maximal in 𝒮 and ∀ o_l, o_k ∈ s s.t. k < l and (o_l, i_q1), (o_k, i_q2) ∈ ℛ, ∀ i^∗ ∈ f(S), we have q1 ∗ q2}
Every s ∈ S respects all the gradual items of f(S), so s is included in some maximal sequence of g(f(S)). Hence, S ⊑ g(f(S)).
1′) For every s′ ∈ g(I′), every i′^∗ ∈ I′ has a uniform variation ∗ in s′. This holds in particular for every i^∗ ∈ I, as I ⊆ I′. Thus, i^∗ also has the same uniform variation in s′. Therefore, s′ is included in some sequence of g(I), which means that g(I′) ⊑ g(I).
2′) We have:
– g(I) = {s ∈ 𝒮 | s is maximal in 𝒮 and ∀ o_l, o_k ∈ s s.t. k < l and (o_l, i_q1), (o_k, i_q2) ∈ ℛ, ∀ i^∗ ∈ I, we have q1 ∗ q2}
– f(g(I)) = {i^∗ | ∀ s ∈ g(I), ∀ o_l, o_k ∈ s s.t. (o_l, i_q1), (o_k, i_q2) ∈ ℛ and k < l, we have q1 ∗ q2}
Obviously, if i^∗ ∈ I then i^∗ ∈ f(g(I)). Hence, I ⊆ f(g(I)).

Proposition 3.
The composite operators f∘g and g∘f form two closure operators, respectively defined on the set of itemsets and on the sets of sequences.

Proof. Considering both functions f and g as previously defined, the following properties hold for f∘g:
– Monotonicity: I ⊆ I′ ⇒ f∘g(I) ⊆ f∘g(I′). We have I ⊆ I′; property 1′) yields g(I′) ⊑ g(I), and by property 1) we get f∘g(I) ⊆ f∘g(I′).
– Extensivity: I ⊆ f∘g(I). Proved in property 2′).
– Idempotency: f∘g(I) = f∘g(f∘g(I)). According to property 2′) we have f∘g(I) ⊆ f∘g(f∘g(I)). With S = g(I) and according to property 2) we have g(I) ⊑ g∘f∘g(I), and property 1) yields f∘g(I) ⊇ f∘g∘f∘g(I). Hence we can conclude that f∘g(I) = f∘g(f∘g(I)).

These properties are valid in a dual manner for the closure operator g∘f. Indeed:
– Monotonicity: S ⊑ S′ ⇒ g∘f(S) ⊑ g∘f(S′). We have S ⊑ S′; property 1) yields f(S′) ⊆ f(S), and by property 1′) we get g∘f(S) ⊑ g∘f(S′).
– Extensivity: S ⊑ g∘f(S). Proved in 2).
– Idempotency: g∘f(S) = g∘f(g∘f(S)). According to property 2) we have S ⊑ g∘f(S). With 1), it follows that f(S) ⊇ f∘g∘f(S), and according to property 1′), g∘f(S) ⊑ g∘f∘g∘f(S). With I = f(S) and according to property 2′) we have f(S) ⊆ f∘g∘f(S), and property 1′) yields g∘f∘g∘f(S) ⊑ g∘f(S). Equality follows from the fact that g, by definition, always gives a set where no sequence is a subsequence of another.

Together, these propositions define a new framework for gradual closed patterns, which are a concise representation of gradual frequent patterns. In fact, let us consider the definitions below:

Definition 7. Gradual formal concept. The pair (S, I), such that S ⊆ 𝒮 and I ⊆ ℐ, is called a gradual concept if f(S) = I and g(I) = S. S is called the extension and I the intension of the gradual concept.

Definition 8.
Gradual closed itemset. Let us consider the formal context 𝒦 = (𝒪, ℐ, 𝒬, ℛ) and a gradual itemset I ⊆ ℐ. I is called a gradual closed itemset if and only if it is equal to its closure, i.e., f∘g(I) = I.

Definition 9. Minimal gradual generator. A gradual itemset h ⊆ I is called a minimal gradual generator of a gradual closed itemset I if f∘g(h) = I and there is no h′ ⊂ h such that f∘g(h′) = I. The set GGM of all minimal gradual generators of a gradual closed itemset I is defined as below:

GGM = {h ⊆ I | f∘g(h) = I ∧ ∄ h′ ⊂ h such that f∘g(h′) = I}

The closure operator as defined above induces an equivalence relation over the power set of ℐ (i.e., 𝒫(ℐ)). Hence, 𝒫(ℐ) is partitioned into several disjoint subsets called equivalence classes. In one class all elements have the same support, and the minimal gradual generators are the smallest incomparable elements in the class, whereas the gradual closed itemset is the largest element in that class [9].

Proposition 4. The set of gradual formal concepts GC_𝒦 extracted from a formal context 𝒦, ordered using the set inclusion relation, forms a complete lattice L_𝒦 = (GC_𝒦, ⊆), called the gradual Galois lattice.

Example 7. Let us consider the gradual formal context given by Table 1. A part of its related gradual Galois lattice is depicted in Figure 1.

Fig. 1. A part of the gradual Galois lattice related to the gradual formal context depicted in Table 1.

Remark 1. When only considering gradual closed itemsets with the inclusion relation ⊆, the resulting structure only preserves the join operator. It is a suborder of the gradual itemset lattice and is thus often much smaller. This structure is represented in bold in Figure 1. The gradual closed itemset lattice can be used as a formal framework for discovering gradual itemsets, given the basic property that the support of a gradual itemset is equal to the support of its closure.
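The partition into equivalence classes described above can be illustrated with a small sketch that, given itemsets and their supports, recovers closed itemsets and their minimal generators. It rests on two simplifying assumptions: itemsets with equal support that are comparable by inclusion are taken to share the same closure, and a closed itemset with no smaller equivalent itemset is treated as its own generator. The function name `closed_and_generators` and the support values are hypothetical.

```python
from collections import defaultdict


def closed_and_generators(supports):
    """Map each closed itemset to the list of its minimal generators.

    supports: dict mapping frozenset of gradual items -> support value.
    """
    # Group itemsets sharing the same support.
    groups = defaultdict(list)
    for itemset, supp in supports.items():
        groups[supp].append(itemset)

    result = {}
    for group in groups.values():
        # Closed itemsets: maximal elements of the group w.r.t. inclusion.
        closed = [c for c in group if not any(c < other for other in group)]
        for c in closed:
            members = [e for e in group if e <= c]
            # Minimal generators: minimal elements among the members of the class.
            result[c] = [e for e in members if not any(e2 < e for e2 in members)]
    return result


A, S, L = "Age>=", "Salary>=", "Loan<="
# Made-up supports, chosen only to exercise the function.
supports = {
    frozenset({A}): 0.75, frozenset({S}): 0.75, frozenset({A, S}): 0.75,
    frozenset({L}): 0.5, frozenset({A, L}): 0.5, frozenset({S, L}): 0.5,
    frozenset({A, S, L}): 0.5,
}

for closed, gens in closed_and_generators(supports).items():
    print(sorted(closed), "<-", [sorted(g) for g in gens])
```

On this toy input, {Age>=, Salary>=} is closed with generators {Age>=} and {Salary>=}, and {Age>=, Salary>=, Loan<=} is closed with the single generator {Loan<=}.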
In addition, using the gradual itemset lattice, we can directly generate a reduced set of gradual rules without loss of information.

3.3 Implementation

In this paper, we aim at showing how much our approach reduces the number of patterns, by comparing the number of gradual closed patterns that can be extracted from a data set with the number of all gradual patterns. To reach this goal, we have adopted a post-processing method that extracts equivalence classes (i.e., the set of gradual closed itemsets and their respective minimal gradual generators) from the set of frequent gradual itemsets provided by the approach of Di Jorio et al. [6]. To do so, we group together frequent gradual itemsets by their support. Hence, we build the set of equivalence class groups GCE, in which every group E may contain one or several equivalence classes. This can be explained by the double variation {≤, ≥} taken into account in our extraction process. Indeed, we can find in the same group two (or more) gradual closed itemsets having the same items but with different variations. Algorithm 1 illustrates the equivalence class extraction process. In this algorithm, we call the function RechGen, which determines the set of minimal gradual generators for every maximal gradual pattern C in E (corresponding to a closed gradual pattern). Taking as inputs a gradual closed pattern C and the group E it belongs to, this function returns the set of gradual generators of C. The pseudo-code and the notations used in our approach are presented in Algorithm 1 and Table 2, respectively. The RechGen function is given by Algorithm 2.

  IGF : Gradual frequent patterns.
  GCE : Set of equivalence class groups.
  IGC : Gradual closed patterns.
  GGM : Minimal gradual generators of a gradual closed pattern.

Table 2. Notations used in the algorithms

Algorithm 1: Equivalence classes extraction algorithm
Input: Gradual frequent patterns (IGF) and their respective supports.
Output: Gradual closed patterns IGC and their respective minimal gradual generators.
begin
    Insert into GCE Select ∗ From IGF Group By support;
    foreach (E ∈ GCE) do
        C = Max(E)
        while (C ≠ ∅) do
            E = E − {C}
            C.GGM = RechGen(C, E)
            IGC = IGC ∪ {C}
            C = Max(E)
end

Algorithm 2: RechGen
Input: A gradual closed pattern C and a group of gradual frequent patterns E.
Output: Minimal gradual generators set GGM of C.
begin
    foreach (e ∈ E) do
        if e ⊂ C and ∄ e′ ⊂ e such that (e′ ∈ E) and (e′ ⊂ C) then
            GGM = GGM ∪ {e}
    return GGM
end

4 Experiments

This section validates the interest of our approach. In fact, the number of gradual closed patterns that can be extracted from a dataset is much smaller than the total number of frequent gradual patterns. Moreover, the set of gradual closed patterns and their respective gradual generators can be used to define an irreducible compact nucleus (i.e., a generic basis) of gradual rules (this point will be discussed in more detail in a further paper).

To assess our approach, we ran experiments on synthetic datasets. These datasets were generated by a modified version of the IBM Synthetic Data Generation Code for Associations and Sequential Patterns (footnote 4). Let us note that these datasets are very dense and that, even for a high value of the minimal support, a huge number of gradual patterns can be extracted. In most cases, techniques for obtaining gradual knowledge are applied to databases containing a small number of objects and attributes. In our experiments, we focus on how the number of gradual closed patterns varies compared with that of frequent ones according to the minimal support (minsup) value and to the number of attributes and objects.

Footnote 4: www.almaden.ibm.com/software/projects/hdb/resources.shtml

[Fig. 2. Gradual closed vs. gradual frequent patterns with the variation of the number of objects. Axes: number of rows (20 attributes) vs. number of gradual patterns (MinSupp = 50%); series: frequent, closed.]

[Fig. 3. Gradual closed patterns vs. gradual frequent patterns with the variation of the number of attributes. Axes: number of attributes (100 rows) vs. number of gradual patterns; series: frequent, closed.]

Figure 2 shows, for 20 attributes and a minsup equal to 0.5, the number of closed patterns and frequent patterns according to the number of rows. The values reported in this figure show that the numbers of both closed and frequent gradual patterns vary linearly with the number of rows. Note that the scale is logarithmic and that the difference between the two is very large. However, this number varies exponentially with the number of attributes and with the minsup value as shown, respectively, in Figures 3 and 4 for a dataset of 100 rows and 40 columns.

[Fig. 4. Number of gradual closed patterns vs. frequent ones with minsup variation. Axes: MinSupp (in %) vs. number of gradual patterns; series: frequent, closed.]

[Fig. 5. Evolution of the computation time for discovering gradual closed and frequent patterns vs. the variation of the minsup value. Axes: MinSupp (in %) vs. execution time (in s); series: generation of frequent patterns, generation of closed patterns.]

In this paper, we aim at showing how much smaller the number of extracted closed patterns is in comparison with that of frequent ones. As mentioned above, our approach is a post-processing of [6]. The run times are thus a little longer, as shown in Figure 5.

5 Conclusion and future issues

In this paper, we propose an approach for mining a concise representation of gradual patterns. Indeed, we proposed a novel closure system in order to extract gradual closed patterns.
As expected, these gradual closed patterns provide a high compactness rate with respect to the total number of gradual patterns, which eases both their extraction and their handling by the end-user. The dedicated algorithms and experiments show this fact.

Further work includes the study of other optimizations in order to improve the efficiency of our algorithms. In addition, a compelling issue that we have to tackle is the definition of a cover of the gradual association rules. This task is of paramount importance since it makes it possible to present to end-users only a manageable, reduced quantity of gradual association rules. A connected theoretical task will be the proof of the soundness and completeness of a derivation mechanism from the cover of gradual association rules.

References

1. F. Berzal, J. Cubero, D. Sánchez, M. Vila, and J. Serrano. An alternative approach to discover gradual dependencies. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, 15(5):559–570, 2007.
2. S. Galichet, D. Dubois, and H. Prade. Imprecise specification of ill-known functions using gradual rules. International Journal of Approximate Reasoning, 35:205–222, 2004.
3. B. Ganter and R. Wille. Formal Concept Analysis. Springer-Verlag, 1999.
4. E. Hüllermeier. Association rules for expressing gradual dependencies. In Proceedings of the International Conference PKDD'2002, Helsinki, Finland, pages 200–211, August 19-23, 2002.
5. L. Di Jorio, A. Laurent, and M. Teisseire. Fast extraction of gradual association rules: a heuristic based method. In Proceedings of the 5th International Conference on Soft Computing as Transdisciplinary Science and Technology (CSTST '08), pages 205–210, Cergy-Pontoise, France, 2008.
6. L. Di Jorio, A. Laurent, and M. Teisseire. Mining frequent gradual itemsets from large databases. In Proceedings of the International Conference on Intelligent Data Analysis (IDA'09), to appear, Lyon, France, 2009.
7. A. Laurent, M.-J. Lesot, and M. Rifqi. Graank: Exploiting rank correlations for extracting gradual dependencies. In Proc. of FQAS'09, 2009.
8. C. Molina, J. Serrano, D. Sánchez, and M. Vila. Measuring variation strength in gradual dependencies. In Proceedings of the International Conference EUSFLAT'2007, Ostrava, Czech Republic, pages 337–344, 2007.
9. S. B. Yahia and E. M. Nguifo. Approches d'extraction de règles d'association basées sur la correspondance de Galois. Ingénierie des Systèmes d'Information, 9(3-4):23–55, 2004.
