Mining Closed Gradual Patterns
Sarra Ayouni1, Anne Laurent2, Sadok Ben Yahia1, and Pascal Poncelet2
1 Faculty of Sciences of Tunis, 1060, Campus Universitaire, Tunis, Tunisia,
[email protected],
[email protected]
2 LIRMM − CNRS, 161 rue Ada, Montpellier, France
[email protected],
[email protected]
1
Introduction
With the steady development of computing tools, the last three decades have
seen a considerable increase in the quantity of data stored in databases.
Extracting knowledge from this data is therefore of paramount importance, and
data mining has become an indispensable tool for reaching this goal. Association
rule extraction is one of the important tasks in data mining. This powerful
technique has a wide range of applications in many areas of business practice
and research. A scrutiny of the related work shows that another type of rules,
called gradual rules, has also received attention within the data mining
community. Gradual rules of the form “the more A, the more B” have mainly
attracted interest in the recommendation and command system fields [2]. Several
approaches and semantics dealing with this kind of rules have been proposed in
the literature. However, the relevance and the usefulness of the mined knowledge
do not seem to be the main concern of these approaches. In fact, an overwhelming
quantity of gradual rules can be expected even from small contexts. The main
thrust of this paper is to address the lossless reduction of the mined
knowledge. To reach this goal, a possible solution consists in using results
from Formal Concept Analysis (FCA), which has been shown to provide useful seeds
for tackling such knowledge extraction problems. However, no work has addressed
the use of the FCA framework for gradual patterns. Hence, we introduce a novel
Galois connection, which is a sine qua non condition for extracting closed
gradual patterns. These patterns act as a lossless reduced-size nucleus of
patterns.
The remainder of the paper is organized as follows. Section 2 reviews the
related work focused on mining gradual rules and some basic notions of the
FCA framework. Section 3 introduces our novel Galois connection definition
and shows its validity and soundness. Section 4 shows, through experiments
carried out over synthetic datasets, how much our approach reduces the huge
number of extracted gradual patterns by keeping only the closed ones. Section 5
sketches our future perspectives and presents concluding remarks.
2
Related work
In this section, we present an overview of approaches dealing with gradual rules
and some basic notions related to Formal Concept Analysis.
2.1
Gradual rules
Gradual rules are applied on data sets with m attributes (X1, . . ., Xm) defined on
numeric domains dom(Xi). A data set D is a set of rows (m-tuples) of
dom(X1) × . . . × dom(Xm). In this scope, a gradual item is defined as a pair of
an attribute and a variation ∗ ∈ {≤, ≥}. Let A be an attribute; then the gradual
item A≥ means that the attribute A is increasing. It can be interpreted as “the
more A”. A gradual itemset, or gradual tendency, is then defined as a non-empty
list of several gradual items. For instance, the gradual itemset M = A≥ B≤ is
interpreted as “the more A and the less B”.
Example 1. (Salary)≥ is a gradual item meaning that the “Salary” is increasing.
((Age)≥ , (Salary)≥ ) is a gradual itemset.
A gradual rule, denoted by M ⇒ M ′ , is defined as a pair of gradual itemsets
on which a causality relationship is imposed. Different measures and semantics
have been proposed to extract and assess this kind of rules. In what follows, we
review the related works that focused on mining gradual rules.
Gradual dependencies were introduced in [4], where they are called tendency
rules and denoted by A ⇀t B. Hüllermeier proposed to perform a linear regression
analysis on the contingency diagram built from the data set. The validity of the
rule is assessed on the basis of the regression coefficients α, β of the line
that approximates the points in the contingency diagram. The quality of the
regression is given by the R² coefficient. A tendency rule contains one or more
attributes in the condition part and only one in the conclusion part. When it
contains several attributes in the condition part, the author proposes to use a
logical conjunction modeled by means of a so-called t-norm (footnote 3). This
method leads to a high computational cost for the linear regression and the
identification of the relevant item combinations.
Another definition has been proposed in [1]. The semantics of a gradual
dependence is quite different, since the authors only consider whether the
variation is fulfilled. The authors define a gradual dependence as being similar
to a functional dependence by considering the variation of degrees between two
objects. According to [1], the gradual dependence A ⇒ B holds in a database D if
∀ o=(x, y) and o′=(x′, y′) ∈ D, A(x) < A(x′) implies B(y) < B(y′).
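The condition attributed to [1] can be checked directly on a small data set. The sketch below is only an illustration, not the method of [1]: the function name `holds_gradual_dependence`, the dictionary representation of rows and the sample values are assumptions made here.

```python
from itertools import combinations

def holds_gradual_dependence(rows, a, b):
    """Return True if A(x) < A(x') implies B(y) < B(y') for every pair of rows."""
    for r1, r2 in combinations(rows, 2):
        # Orient the pair so that `lo` has the smaller (or equal) value on a.
        lo, hi = (r1, r2) if r1[a] <= r2[a] else (r2, r1)
        if lo[a] < hi[a] and not lo[b] < hi[b]:
            return False
    return True

# Hypothetical rows, used only for illustration.
rows = [
    {"Age": 22, "Salary": 1200},
    {"Age": 24, "Salary": 1850},
    {"Age": 28, "Salary": 3400},
]
print(holds_gradual_dependence(rows, "Age", "Salary"))
```

Since every pair of rows is inspected, the check is quadratic in the number of rows, which is in line with the modified data set D′ of pairs mentioned for [1] and [8].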
A new definition of gradual dependence was proposed in [8] using fuzzy
association rules. The authors take into account the variation strength in the
degree of fulfilment of an imprecise property by different objects. Hence, a
gradual dependence holds in a database D if ∀ o=(x, y) and o′=(x′, y′) ∈ D,
v∗1(A(x), A(x′)) implies v∗2(B(y), B(y′)), where v∗ is a variation degree of an
attribute between two different objects. In both propositions [1] and [8], the
authors propose to build a modified data set D′ that contains as many rows as
there are pairs of distinct objects in the initial data set D.
3 A triangular norm (t-norm) is a function ⊤ : [0, 1] × [0, 1] → [0, 1] verifying,
∀ x, y, z, t ∈ [0, 1], the following properties:
⊤ is commutative: ⊤(x, y) = ⊤(y, x),
⊤ is associative: ⊤(x, ⊤(y, z)) = ⊤(⊤(x, y), z),
⊤ is increasing: ⊤(x, y) ≤ ⊤(z, t) if x ≤ z and y ≤ t,
⊤ has 1 as neutral element: ⊤(x, 1) = x.
Another definition of the support and confidence of a gradual itemset, as
defined above, was proposed in [5]. The support of a gradual itemset
$A_1^{*_1}, \ldots, A_p^{*_p}$ is defined as the maximal number of rows
$\{r_1, \ldots, r_l\}$ for which there exists a permutation $\pi$ such that
$\forall j \in [1, l-1]$, $\forall k \in [1, p]$, it holds that
$A_k(r_{\pi_j}) \mathrel{*_k} A_k(r_{\pi_{j+1}})$. More formally, denoting by
$\mathcal{L}$ the set of all such sets of rows, the support of a gradual itemset
is defined as follows.
Definition 1. Let $s = A_1^{*_1}, \ldots, A_p^{*_p}$ be a gradual itemset. We have:
$$\mathrm{supp}(s) = \frac{\max_{L_i \in \mathcal{L}} |L_i|}{|D|}.$$
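To make Definition 1 concrete, the following Python sketch computes the support as the length of the longest chain of rows that can be ordered consistently with every gradual item, divided by |D|. It is a plain quadratic dynamic program, not the heuristic of [5]. The name `gradual_support`, the encoding of a gradual item as an `(attribute, variation)` pair, the reading of `">="` as "values do not decrease along the ordering" and the sample rows are all assumptions made for illustration.

```python
def gradual_support(rows, itemset):
    """Support of a gradual itemset given as [(attribute, ">=" or "<="), ...]."""
    if not rows:
        return 0.0

    def key(row):
        # Negate "<=" attributes so that every item reads as "non-decreasing".
        return tuple(row[a] if v == ">=" else -row[a] for a, v in itemset)

    def precedes(r1, r2):
        return all(x <= y for x, y in zip(key(r1), key(r2)))

    # Lexicographic sorting is a linear extension of the componentwise order,
    # so a longest chain can be found with a longest-increasing-subsequence DP.
    ordered = sorted(rows, key=key)
    best = []
    for j, rj in enumerate(ordered):
        prev = (best[i] for i in range(j) if precedes(ordered[i], rj))
        best.append(1 + max(prev, default=0))
    return max(best) / len(rows)

# Hypothetical rows, used only for illustration.
rows = [
    {"Age": 22, "Salary": 1200, "Loan": 4},
    {"Age": 24, "Salary": 1850, "Loan": 2},
    {"Age": 30, "Salary": 2200, "Loan": 3},
    {"Age": 28, "Salary": 3400, "Loan": 1},
]
print(gradual_support(rows, [("Age", ">="), ("Salary", ">=")]))
```

On these four rows at most three can be ordered so that both Age and Salary increase, hence a support of 3/4.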
The authors propose a heuristic to compute this support for gradual itemsets,
in a level-wise process that considers itemsets of increasing lengths. Recently,
Di Jorio et al. [6] considered the same definition as the one proposed within
the conflict set based approach, and proposed an efficient method based on the
precedence graph. In this method, named Grite (GRadual ITemset Extraction), the
data is represented through a graph whose nodes are the objects in the data and
whose edges express the precedence relationships derived from the considered
attributes.
In [7], the authors propose to calculate the support in a different way, by
using the Kendall tau rank correlation coefficient. This coefficient is based on
the number of pairs of tuples that can be ordered in the database according to
the considered gradual pattern.
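A pair-counting support in the spirit of this idea can be sketched as follows. This is an illustrative reading, not the exact measure of [7]: the strict comparisons, the normalization by the number of unordered pairs n(n−1)/2, the name `concordant_pair_support` and the sample rows are assumptions.

```python
from itertools import combinations

def concordant_pair_support(rows, itemset):
    """Fraction of unordered row pairs that can be ordered to respect the itemset."""
    n = len(rows)
    if n < 2:
        return 0.0

    def ordered(r1, r2):
        # r1 comes before r2: ">=" attributes strictly grow, "<=" strictly shrink.
        return all(
            r1[a] < r2[a] if v == ">=" else r1[a] > r2[a]
            for a, v in itemset
        )

    concordant = sum(
        1 for r1, r2 in combinations(rows, 2)
        if ordered(r1, r2) or ordered(r2, r1)
    )
    return concordant / (n * (n - 1) / 2)

# Hypothetical rows, used only for illustration.
rows = [
    {"Age": 22, "Salary": 1200},
    {"Age": 24, "Salary": 1850},
    {"Age": 30, "Salary": 2200},
    {"Age": 28, "Salary": 3400},
]
print(concordant_pair_support(rows, [("Age", ">="), ("Salary", ">=")]))
```

Here five of the six pairs are concordant; only the last two rows disagree, since Age decreases while Salary increases.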
Unfortunately, in all surveyed approaches, reducing the quantity of mined
patterns was not the main concern. In the following subsection, we recall some
key settings from the FCA framework presenting some pioneering results towards
defining a concise representation of gradual patterns.
2.2
Formal concept analysis
Formal Concept Analysis (FCA) is based on the mathematical theory of complete
lattices (for further information please refer to [3]). This theory provides
a powerful mathematical framework that has been used in many fields of computer
science. Indeed, it has been employed in data mining to extract a representative
set of itemsets from which association rules can be derived efficiently. To do
so, a formal context must be defined as a triplet (R, O, I), where O is a set of
transactions or objects, I is a finite set of items and R is a binary relation
R ⊆ O × I. The first step to construct the lattice is to define a Galois
connection between two derivation operators: one mapping a set of objects into a
set of items and the other one mapping a set of items into a set of objects.
For a set of objects $O \subseteq \mathcal{O}$ and a set of items
$I \subseteq \mathcal{I}$, the two mapping operators, denoted by f and g, are
defined, respectively, as follows:
– $f : \mathcal{P}(\mathcal{O}) \to \mathcal{P}(\mathcal{I})$, $f(O) = \{\, i \in \mathcal{I} \mid (o, i) \in R,\ \forall o \in O \,\}$
– $g : \mathcal{P}(\mathcal{I}) \to \mathcal{P}(\mathcal{O})$, $g(I) = \{\, o \in \mathcal{O} \mid (o, i) \in R,\ \forall i \in I \,\}$
These two mapping operators f and g induce a Galois connection between the
powerset of objects and the powerset of items. This Galois connection means that
f and g are dually adjoint, i.e., O ⊆ g(I) ⇔ I ⊆ f(O) for a set of objects O and
a set of items I. The two composite operators f◦g and g◦f are closure operators
(i.e., they satisfy the properties of monotonicity, extensivity and idempotency).
A formal concept is a pair (O, I) of a set of objects O and a set of items I,
where f(O) = I and g(I) = O. The set of all concepts that can be extracted from
a formal context forms a complete lattice provided with a partial order relation
≤ such that, for two concepts c1 = (O1, I1) and c2 = (O2, I2),
c1 ≤ c2 ⇔ I2 ⊆ I1 (⇔ O1 ⊆ O2).
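The two derivation operators and the closure f◦g can be written in a few lines for a binary context. The sketch below is a minimal illustration of the classical (non-gradual) case; the toy context, the set-of-pairs encoding of R and the helper name `closure` are assumptions made here.

```python
def f(objects, relation, all_items):
    """Items shared by every object of `objects`."""
    return frozenset(
        i for i in all_items if all((o, i) in relation for o in objects)
    )

def g(items, relation, all_objects):
    """Objects owning every item of `items`."""
    return frozenset(
        o for o in all_objects if all((o, i) in relation for i in items)
    )

def closure(items, relation, all_objects, all_items):
    """The closure operator f∘g applied to a set of items."""
    return f(g(items, relation, all_objects), relation, all_items)

# Hypothetical binary context, used only for illustration.
OBJECTS = {1, 2, 3}
ITEMS = {"a", "b", "c"}
R = {(1, "a"), (1, "b"), (2, "a"), (2, "b"), (2, "c"), (3, "a")}

print(sorted(g({"b"}, R, OBJECTS)))               # objects containing b
print(sorted(closure({"b"}, R, OBJECTS, ITEMS)))  # closure of {b}
```

In this toy context every object containing b also contains a, so {b} is not closed and its closure is {a, b}, whereas {a} is closed.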
3
Defining new Galois mapping operators for gradual
patterns
To the best of our knowledge, no previous study in the literature has applied
the Galois connection to the gradual rule extraction problem. Given that
classical FCA was developed for binary relationships, adapting the former
results to the gradual case turns out to be an interesting formalization problem.
Within this work, we aim at using FCA theory in order to formalize a new
closure system characterizing gradual data.
A gradual rule is defined as a special kind of association rule reflecting a
variation in the degree of membership of itemsets in a sequence of objects. A
gradual rule is formulated as “The more/less X, the more/less Y”, where X and
Y are gradual patterns. An example of a gradual rule is “The higher the Age, the
higher the Salary”. In order to satisfy the graduality property, we must consider
a set of object sequences. Indeed, in the classical FCA case the domain of an
itemset is a set of objects. In our case the domain of a gradual itemset is a set
of sequences satisfying this itemset. The set of sequences is ordered by the
relation ⊑ defined in what follows. We first define the notion of sequence and
the related mathematical operations that can be applied over sets of these
sequences.
3.1
Handling object sequences
Let O = {o1, . . . , on} be a set of objects. We consider a sequence to be an
ordered list of objects described over attributes (items). This sequence can be
represented as ⟨o1, . . . , om⟩. This means that objects are sorted and each
object oi has a position in the sequence.
Definition 2. A sequence S = ⟨o1, . . . , op⟩ is included in another sequence
S′ = ⟨o′1, . . . , o′m⟩, denoted by S ⊆ S′, if there exist integers
1 ≤ i1 < i2 < . . . < ip ≤ m such that o1 = o′i1, . . ., op = o′ip.
Example 2. Let S1 = ⟨o1, o4, o6⟩, S2 = ⟨o2, o1, o3, o4, o5, o6⟩, and
S3 = ⟨o2, o1, o6, o3, o4⟩ be three sequences. We have S1 ⊆ S2 but S1 ⊈ S3.
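Sequence inclusion as in Definition 2 amounts to an order-preserving subsequence test. A minimal Python sketch follows; the tuple representation of sequences and the name `is_subsequence` are assumptions made for illustration.

```python
def is_subsequence(s, t):
    """True if s is included in t, i.e. s can be read in t from left to right."""
    remaining = iter(t)
    # `o in remaining` consumes the iterator up to the first match, so the
    # objects of s must be found in t in the same relative order.
    return all(o in remaining for o in s)

# The three sequences of Example 2.
S1 = ("o1", "o4", "o6")
S2 = ("o2", "o1", "o3", "o4", "o5", "o6")
S3 = ("o2", "o1", "o6", "o3", "o4")

print(is_subsequence(S1, S2))  # True
print(is_subsequence(S1, S3))  # False
```

In S3 the object o6 occurs before o4, so the relative order required by S1 cannot be found.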
Definition 3. Let $\mathcal{S}$ be a collection (i.e., a set) of sequences. A
sequence $S \in \mathcal{S}$ is said to be maximal if
$\nexists\, S' \in \mathcal{S}$, $S' \neq S$, such that $S \subset S'$.
Definition 4. The intersection of two sequences S1 and S2 is the set of all
maximal subsequences of both S1 and S2:
S1 ∩ S2 = {si | si ⊆ S1, si ⊆ S2 and ∄ s′i ⊃ si such that s′i ⊆ S1 and s′i ⊆ S2}.
Example 3. Let S1 = ⟨o1, o2, o4, o7⟩ and S2 = ⟨o2, o5, o4, o6, o8, o7⟩ be two
sequences. Since o1 does not occur in S2, the only maximal common subsequence is
⟨o2, o4, o7⟩, i.e., S1 ∩ S2 = {⟨o2, o4, o7⟩}.
Definition 5. A set of sequences S is included in another set S′, denoted by
S ⊑ S′, if ∀ s ∈ S, ∃ s′ ∈ S′ s.t. s ⊆ s′.
Example 4. The set of sequences S1 = {⟨o5, o6, o7⟩, ⟨o2, o4, o7⟩} is included in
the set of sequences S2 = {⟨o5, o6, o8, o7⟩, ⟨o1, o2, o4, o7⟩}.
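The maximality filter, the intersection of two sequences and the inclusion between sets of sequences can be sketched together. The code below is a brute-force illustration that enumerates all subsequences of the first argument, so it is only meant for very short sequences; the function names and the pair of sequences used for the intersection are assumptions made here.

```python
from itertools import combinations

def is_subsequence(s, t):
    remaining = iter(t)
    return all(o in remaining for o in s)

def maximal(seqs):
    """Keep the sequences that are not strictly included in another one."""
    seqs = set(seqs)
    return {s for s in seqs
            if not any(s != t and is_subsequence(s, t) for t in seqs)}

def intersection(s1, s2):
    """Maximal common subsequences of s1 and s2 (exponential brute force)."""
    common = {
        sub
        for k in range(1, len(s1) + 1)
        for sub in combinations(s1, k)  # combinations keep the order of s1
        if is_subsequence(sub, s2)
    }
    return maximal(common)

def included(set1, set2):
    """Inclusion between sets of sequences in the sense of Definition 5."""
    return all(any(is_subsequence(s, t) for t in set2) for s in set1)

# Hypothetical sequences showing that an intersection may hold several results.
print(intersection(("o1", "o2", "o3"), ("o2", "o1", "o3")))

# The two sets of Example 4.
A = {("o5", "o6", "o7"), ("o2", "o4", "o7")}
B = {("o5", "o6", "o8", "o7"), ("o1", "o2", "o4", "o7")}
print(included(A, B), included(B, A))
```

The first call returns the two sequences ⟨o1, o3⟩ and ⟨o2, o3⟩, which is why the operators of the next subsection work on sets of sequences rather than on single sequences.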
Based on the binary inclusion relation over sets of sequences given in
Definition 5, the following proposition holds:
Proposition 1. Let S be a set of maximal sequences. P(S) provided with the
binary relation ⊑ is a partially ordered set (or poset).
Proof. The binary relation ⊑ over the set P(S) is reflexive, antisymmetric, and
transitive, i.e., for all S1, S2, and S3 in P(S), we have:
– S1 ⊑ S1. (Reflexivity)
– Let us consider S1 and S2 such that S1 ⊑ S2 (1) and S2 ⊑ S1 (2). According
to, respectively, (1) and (2) we have
∀ s1 ∈ S1, ∃ s2 ∈ S2 s.t. s1 ⊆ s2,
∀ s2 ∈ S2, ∃ s1 ∈ S1 s.t. s2 ⊆ s1.
Hence, every s1 ∈ S1 satisfies s1 ⊆ s2 ⊆ s′1 for some s2 ∈ S2 and s′1 ∈ S1. As
the maximality prevents from having two distinct sequences s and s′ in the same
set such that s ⊂ s′, we get s1 = s2 = s′1, so S1 ⊆ S2; symmetrically, S2 ⊆ S1.
Hence S1 = S2 and the antisymmetry property holds.
– Let us consider S1, S2 and S3 such that S1 ⊑ S2 (3) and S2 ⊑ S3 (4).
According to (3) and (4) we have:
∀ s1 ∈ S1, ∃ s2 ∈ S2 s.t. s1 ⊆ s2 (5)
∀ s2 ∈ S2, ∃ s3 ∈ S3 s.t. s2 ⊆ s3 (6)
According to (5) and (6) we have ∀ s1 ∈ S1, ∃ s3 ∈ S3 s.t. s1 ⊆ s3. Hence,
we have S1 ⊑ S3 and the transitivity property is satisfied.
3.2
Gradual Galois Connection
In this paper, we propose a new definition of Galois connection taking into
account the graduality aspect. Hence, we first define the notion of a gradual
formal context.
Definition 6. Gradual formal context A gradual formal context is defined
as the quadruplet K= (O, I, Q, R) describing a set of objects O, a finite set
I of attributes (or items), a finite set of quantities or values Q and a binary
relation R (i.e., R ⊆ O × I). Each pair (o, iq ) ∈ R, means that the value of the
attribute (item) i belonging to I in the object o belonging to O is q.
Example 5. An example of a gradual formal context K= (O, I, Q, R) is sketched
in Table 1. We have (o1 , Age22 , Salary 1200 ) ∈ R.
      Age   Salary   Loan
o1    22    1200     4
o2    24    1850     2
o3    30    2200     3
o4    28    3400     1
Table 1. Formal gradual context
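Let K = (O, I, Q, R) be a gradual formal context. We define below the two
mapping operators f and g:
$$f : \mathcal{P}(\mathcal{S}) \to \mathcal{P}(\mathcal{I}), \qquad f(S) = \{\, i^{*} \mid \forall s \in S,\ \forall o_l, o_k \in s \text{ s.t. } (o_l, i_{q_1}), (o_k, i_{q_2}) \in R \text{ and } k < l, \text{ we have } q_1 * q_2 \,\}$$
The mapping function f returns all gradual items with their respective variation
respecting all sequences in S.
$$g : \mathcal{P}(\mathcal{I}) \to \mathcal{P}(\mathcal{S}), \qquad g(I) = \{\, s \in \mathcal{S} \mid s \text{ is maximal in } \mathcal{S} \text{ and } \forall o_l, o_k \in s \text{ s.t. } k < l \text{ and } (o_l, i_{q_1}), (o_k, i_{q_2}) \in R,\ \forall i^{*} \in I, \text{ we have } q_1 * q_2 \,\}$$
The mapping function g returns the set of maximal sequences respecting the
variations of all items in I.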
The two mapping functions g and f are respectively defined over the power
set of I and the power set of S. Given that the intersection of a set of object
sequences may result in more than one sequence, we consider the power set of
sequences. The function f is applied to a set of sequences whereas g is applied
to a set of gradual items. On the one hand, f(S) returns the gradual itemset I
such that every gradual item i ∈ I keeps its corresponding variation ∗
throughout the sequences of S. On the other hand, g(I) looks for all maximal
sequences verifying the variation of each item in I.
The set of gradual itemsets can be ordered by the standard inclusion binary
relation ⊆, whereas the set of sets of sequences is ordered by the binary
relation ⊑.
Example 6. Let us consider the context illustrated in Table 1. We have, for
example, f({⟨o1, o2, o4⟩, ⟨o1, o2, o3⟩}) = {Age≥, Salary≥} and
g({Age≥, Loan≤}) = {⟨o1, o2, o4⟩, ⟨o1, o3⟩}.
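The two gradual operators can be tried out on the context of Table 1 with a brute-force Python sketch. It enumerates every ordered selection of objects, which is only feasible for a handful of objects and is not the extraction method used in this paper. The encoding of a gradual item as an `(attribute, variation)` pair, the non-strict reading of ≥ and ≤, and the names `TABLE`, `respects`, `f` and `g` are assumptions made for illustration.

```python
from itertools import permutations

# The gradual formal context of Table 1.
TABLE = {
    "o1": {"Age": 22, "Salary": 1200, "Loan": 4},
    "o2": {"Age": 24, "Salary": 1850, "Loan": 2},
    "o3": {"Age": 30, "Salary": 2200, "Loan": 3},
    "o4": {"Age": 28, "Salary": 3400, "Loan": 1},
}
ATTRIBUTES = ("Age", "Salary", "Loan")

def respects(seq, item):
    """True if every pair (earlier, later) of seq follows the item's variation."""
    attr, variation = item
    vals = [TABLE[o][attr] for o in seq]
    pairs = [(vals[k], vals[l])
             for k in range(len(vals)) for l in range(k + 1, len(vals))]
    if variation == ">=":
        return all(later >= earlier for earlier, later in pairs)
    return all(later <= earlier for earlier, later in pairs)

def f(seqs):
    """Gradual items whose variation holds on all sequences of seqs."""
    return {(a, v) for a in ATTRIBUTES for v in (">=", "<=")
            if all(respects(s, (a, v)) for s in seqs)}

def is_subsequence(s, t):
    remaining = iter(t)
    return all(o in remaining for o in s)

def g(items):
    """Maximal sequences respecting every gradual item of items."""
    candidates = [p for k in range(1, len(TABLE) + 1)
                  for p in permutations(TABLE, k)
                  if all(respects(p, i) for i in items)]
    return {s for s in candidates
            if not any(s != t and is_subsequence(s, t) for t in candidates)}

S = {("o1", "o2", "o4"), ("o1", "o2", "o3")}
print(f(S))
print(g({("Age", ">="), ("Salary", ">=")}))
```

Under these assumptions, f(S) yields Age≥ and Salary≥, and applying g to that itemset gives back the two sequences of S, so the pair behaves like a gradual concept in the sense defined later in this section.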
Based on the definitions and propositions introduced above, we can now
demonstrate that we have constructed a Galois connection framework for the
gradual case. As stated above, this will allow us to mine concise
representations, thus reducing the size of the results presented to end-users.
It should be noted that this reduction is of great importance, as users are
often drowned in the resulting patterns.
Proposition 2. For sets of sequences S and S′, and gradual itemsets I and I′,
the following properties hold:
1) S ⊑ S′ ⇒ f(S′) ⊆ f(S)
2) S ⊑ g(f(S))
1′) I ⊆ I′ ⇒ g(I′) ⊑ g(I)
2′) I ⊆ f(g(I))
Proof. Each property is respectively proved as follows:
1) S ⊑ S′ means that every sequence s ∈ S is included in some sequence s′ ∈ S′.
Every gradual item i∗ belonging to f(S′) holds on every sequence of S′, hence on
every subsequence of such a sequence, and thus on every s ∈ S. Hence, f(S)
includes all gradual items i∗ included in f(S′), and may include some other
ones. Therefore f(S′) ⊆ f(S).
2) We have:
– $f(S) = \{\, i^{*} \mid \forall s \in S,\ \forall o_l, o_k \in s \text{ s.t. } (o_l, i_{q_1}), (o_k, i_{q_2}) \in R \text{ and } k < l, \text{ we have } q_1 * q_2 \,\}$
– $g(f(S)) = \{\, s \in \mathcal{S} \mid s \text{ is maximal in } \mathcal{S} \text{ and } \forall o_l, o_k \in s \text{ s.t. } k < l \text{ and } (o_l, i_{q_1}), (o_k, i_{q_2}) \in R,\ \forall i^{*} \in f(S), \text{ we have } q_1 * q_2 \,\}$
Every s ∈ S respects the variation of all gradual items of f(S); hence s is
included in some maximal sequence of g(f(S)). Hence, S ⊑ g(f(S)).
1′) For every s′ ∈ g(I′), every i′∗ ∈ I′ has a uniform variation ∗ in s′. In
particular, this holds for every i∗ ∈ I, as I ⊆ I′. Therefore, s′ is included in
some maximal sequence of g(I), which means that g(I′) ⊑ g(I).
2′) We have:
– $g(I) = \{\, s \in \mathcal{S} \mid s \text{ is maximal in } \mathcal{S} \text{ and } \forall o_l, o_k \in s \text{ s.t. } k < l \text{ and } (o_l, i_{q_1}), (o_k, i_{q_2}) \in R,\ \forall i^{*} \in I, \text{ we have } q_1 * q_2 \,\}$
– $f(g(I)) = \{\, i^{*} \mid \forall s \in g(I),\ \forall o_l, o_k \in s \text{ s.t. } (o_l, i_{q_1}), (o_k, i_{q_2}) \in R \text{ and } k < l, \text{ we have } q_1 * q_2 \,\}$
By definition of g, every i∗ ∈ I holds on every sequence of g(I); hence
i∗ ∈ f(g(I)). Hence, I ⊆ f(g(I)).
Proposition 3. The composite operators f◦g and g◦f form two closure operators,
respectively defined on the set of gradual itemsets and the set of sets of
sequences.
Proof. Considering both functions f and g as previously defined, the following
properties hold:
– Monotonicity: I ⊆ I′ ⇒ f◦g(I) ⊆ f◦g(I′)
We have I ⊆ I′; property 1′) yields g(I′) ⊑ g(I), and by property 1) we
get f◦g(I) ⊆ f◦g(I′).
– Extensivity: I ⊆ f◦g(I)
Proved in property 2′).
– Idempotency: f◦g(I) = f◦g(f◦g(I))
According to property 2′) applied to f◦g(I), we have f◦g(I) ⊆ f◦g(f◦g(I)). With
S = g(I) and according to property 2) we have g(I) ⊑ g◦f◦g(I), and property 1)
yields f◦g(I) ⊇ f◦g◦f◦g(I).
Hence we can conclude that f◦g(I) = f◦g(f◦g(I)).
These properties hold in a dual manner for the closure operator g◦f. Indeed:
– Monotonicity: S ⊑ S′ ⇒ g◦f(S) ⊑ g◦f(S′)
We have S ⊑ S′; property 1) yields f(S′) ⊆ f(S), and by property 1′)
we get g◦f(S) ⊑ g◦f(S′).
– Extensivity: S ⊑ g◦f(S)
Proved in 2).
– Idempotency: g◦f(S) = g◦f(g◦f(S))
According to property 2) we have S ⊑ g◦f(S). With 1), it yields f(S) ⊇ f◦g◦f(S),
and according to property 1′), g◦f(S) ⊑ g◦f◦g◦f(S). With I = f(S) and according
to property 2′) we have f(S) ⊆ f◦g◦f(S), and property 1′) yields
g◦f◦g◦f(S) ⊑ g◦f(S). Equality follows from the antisymmetry of ⊑
(Proposition 1), which applies because the closure operator, by definition of g,
always gives a set where no sequence is a subsequence of another.
Together, these propositions define a new framework for gradual closed
patterns, which are a concise representation of gradual frequent patterns. In
fact, let us consider the definitions below:
Definition 7. Gradual formal concept The pair (S, I), such that S ∈ S and
I ∈ I, is called a gradual concept if f(S) = I and g(I) = S. S is called the
extension and I the intension of the gradual concept.
Definition 8. Gradual closed itemset Let us consider the formal context K
= (O, I, Q, R). A gradual itemset I ⊆ I is called a gradual closed itemset if
and only if it is equal to its closure, i.e., f◦g(I) = I.
Definition 9. Minimal gradual generator A gradual itemset h ⊆ I is called a
minimal gradual generator of a gradual closed itemset I if f◦g(h) = I and there
exists no h′ ⊂ h such that f◦g(h′) = I. The set GGM of all minimal gradual
generators of a gradual closed itemset I is defined as below:
GGM = {h ⊆ I | f◦g(h) = I ∧ ∄ h′ ⊂ h such that f◦g(h′) = I}
The closure operator as defined above induces an equivalence relation over the
power set of I (i.e., P(I)). Hence, P(I) is partitioned into several disjoint
subsets called equivalence classes. In one class, all elements have the same
support; the minimal gradual generators are the smallest incomparable elements
of the class, whereas the gradual closed itemset is its largest element ([9]).
Proposition 4. The set of gradual formal concepts GCK extracted from a formal
context K, ordered using the set inclusion relation, forms a complete lattice LK
= (GCK, ⊆), called the gradual Galois lattice.
Example 7. Let us consider the gradual formal context given by Table 1, a part
of its related gradual Galois lattice is depicted in Figure 1.
Fig. 1. A part of the gradual Galois lattice related to the gradual formal context
depicted in Table 1.
Remark 1. When only considering gradual closed itemsets with the inclusion
relation ⊆, the resulting structure only preserves the join operator. It is a
sub-order of the gradual itemset lattice and is thus often much smaller. This
structure is represented in bold in Figure 1. The gradual closed itemset lattice
can be used as a formal framework for discovering gradual itemsets, given the
basic property that the support of a gradual itemset is equal to the support of
its closure. On the other hand, using the gradual itemset lattice, we can
directly generate a reduced set of gradual rules without loss of information.
3.3
Implementation
In this paper, we aim at showing how our approach reduces the number of
patterns: the number of gradual closed patterns that can be extracted from a
data set is compared to the number of all gradual patterns. To reach this goal,
we have adopted a post-processing method to extract equivalence classes (i.e.,
the set of gradual closed itemsets and their respective minimal gradual
generators) from the set of frequent gradual itemsets provided by the approach
of Di Jorio et al. [6]. To do so, we group together frequent gradual itemsets by
their support. Hence, we build the set of equivalence class groups GCE, in which
every group E may contain one or several equivalence class(es). This can be
explained by the double variation {≤, ≥} taken into account in our extraction
process. Indeed, we can find in the same group two (or more) gradual closed
itemsets having the same items but with different variations. Algorithm 1
illustrates the equivalence classes extraction process. In this algorithm, we
call the function RechGen, which determines the set of minimal gradual
generators for every maximal gradual pattern C in E (corresponding to a closed
gradual pattern). For every gradual closed pattern C belonging to a group E,
both taken as inputs, this function returns the set of gradual generators of C.
The pseudo-code and the notations used in our approach are, respectively,
presented in Algorithm 1 and Table 2. The RechGen function is given by
Algorithm 2.
IGF : Gradual frequent patterns.
GCE : Set of equivalence class groups.
IGC : Gradual closed patterns.
GGM : Minimal gradual generators of a gradual closed pattern.
Table 2. Notations used in the algorithm
Algorithm 1: Equivalence classes extraction algorithm
Input: Gradual frequent patterns (IGF) and their respective support.
Output: Gradual closed patterns IGC and their respective minimal gradual
generators.
begin
    Insert into GCE Select ∗ From IGF Group By support;
    foreach (E ∈ GCE) do
        C = Max(E)
        while (C ≠ ∅) do
            E = E − {C}
            C.GGM = RechGen(C, E)
            IGC = IGC ∪ {C}
            C = Max(E)
end
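The post-processing idea of Algorithm 1 and of the RechGen function it calls can be sketched in Python as follows. This is a simplified reading, not the authors' implementation: within each support group, the maximal itemsets are taken as closed patterns, and the minimal itemsets of the group included in a closed pattern are taken as its generators. Two choices are assumptions made here: a closed pattern with no proper subset in its group is reported as its own generator (following Definition 9), and supports are compared by plain equality. The function name, the string encoding of gradual items and the sample supports are hypothetical.

```python
from collections import defaultdict

def extract_equivalence_classes(frequent):
    """Map each closed pattern to the list of its minimal generators.

    `frequent` maps frozensets of gradual items to their support.
    """
    # Group the frequent patterns by support (the "Group By" of Algorithm 1).
    groups = defaultdict(list)
    for pattern, support in frequent.items():
        groups[support].append(pattern)

    classes = {}
    for group in groups.values():
        # Closed patterns: maximal elements of the group for set inclusion.
        closed = [c for c in group if not any(c < other for other in group)]
        for c in closed:
            members = [e for e in group if e <= c]
            # Minimal generators: minimal members of the class of c.
            classes[c] = [e for e in members
                          if not any(h < e for h in members)]
    return classes

# Hypothetical frequent gradual patterns and supports, for illustration only.
frequent = {
    frozenset({"A>="}): 0.75,
    frozenset({"B>="}): 0.75,
    frozenset({"A>=", "B>="}): 0.75,
    frozenset({"C<="}): 0.5,
}
for closed, generators in extract_equivalence_classes(frequent).items():
    print(sorted(closed), [sorted(h) for h in generators])
```

With these inputs the sketch reports {A≥, B≥} as a closed pattern with generators {A≥} and {B≥}, and {C≤} as a closed pattern that is its own generator. In practice, grouping on an integer count of rows rather than on a floating-point ratio would avoid spurious splits between equal supports.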
Algorithm 2: RechGen.
Input: A gradual closed pattern (C) and a group of gradual frequent patterns
(E).
Output: Minimal gradual generators set GGM of C.
begin
    foreach (e ∈ E) do
        if e ⊂ C and ∄ e′ ⊂ e such that (e′ ∈ E) and (e′ ⊂ C) then
            GGM = GGM ∪ {e}
    Return GGM
end
4
Experiments
This section validates the interest of our approach. In fact, the number of gradual
closed patterns that can be extracted from a dataset is much smaller than the
total number of frequent gradual patterns. On the other hand, the set of gradual
closed patterns and their respective gradual generators can be used to define an
irreducible compact nucleus (i.e., generic basis) of gradual rules (this point
will be discussed in more detail in a further paper). To assess the importance
of our approach, we ran experiments on synthetic datasets. These datasets were
generated by a modified version of the IBM Synthetic Data Generation Code for
Associations and Sequential Patterns (footnote 4). Note that these datasets are
very dense and, even for a high value of the minimal support, a huge number of
gradual patterns can be extracted. In most cases, techniques for obtaining
gradual knowledge are run on databases containing a small number of objects and
attributes. In our experiments, we focus on the variation of the number of
gradual closed patterns with regard to frequent ones according to the minimal
support (minsup) value and to the number of attributes and objects. Figure 2
shows, for 20 attributes and a minsup equal to 0.5, the number of closed
patterns and frequent patterns according to the number of rows. The reported
values show that the numbers of both closed and frequent gradual patterns vary
linearly with the number of rows. Note that the scale is logarithmic, so the
difference is very large. However, this number varies exponentially with the
number of attributes and with the minsup value as, respectively, shown in
Figures 3 and 4 for a dataset of 100 rows and 40 columns.
4 www.almaden.ibm.com/software/projects/hdb/resources.shtml
Fig. 2. Gradual closed vs. gradual frequent patterns with the variation of the
number of objects (x-axis: number of rows, 20 attributes, MinSupp = 50%; y-axis:
number of gradual patterns).
Fig. 3. Gradual closed patterns vs. gradual frequent patterns with the variation
of the number of attributes (x-axis: number of attributes, 100 rows).
Fig. 4. Number of gradual closed patterns vs. frequent ones with minsup variation
(x-axis: MinSupp in %).
Fig. 5. Evolution of the computation time for discovering gradual closed and
frequent patterns vs. the variation of the minsup value (x-axis: MinSupp in %;
y-axis: run time in seconds; series: generation of frequent patterns, generation
of closed patterns).
In this paper, we aim at showing how much smaller the number of extracted closed
patterns is compared with that of frequent ones. As mentioned above, our
approach is a post-processing of [6], so the run times are a little longer, as
shown in Figure 5.
5
Conclusion and future issues
In this paper, we proposed an approach for mining a concise representation of
gradual patterns. To this end, we introduced a novel closure system in order to
extract gradual closed patterns. As expected, these gradual closed patterns
provide a high rate of compactness with respect to the total number of gradual
patterns, which eases both their extraction and their manageability by the
end-user. The dedicated algorithms and experiments confirm this. Further work
includes the study of other optimizations in order to improve the efficiency of
our algorithms. In addition, a compelling issue that we have to tackle is the
definition of a cover of the gradual association rules. This task is of
paramount importance since it makes it possible to present to end-users only a
manageable, reduced quantity of gradual association rules. A related theoretical
task will be the proof of the soundness and the completeness of a derivation
mechanism from the cover of gradual association rules.
References
1. F. Berzal, J. Cubero, D. Sánchez, M. Vila, and J. Serrano. An alternative approach
to discover gradual dependencies. International Journal of Uncertainty, Fuzziness
and Knowledge-Based Systems, 15(5):559–570, 2007.
2. S. Galichet, D. Dubois, and H. Prade. Imprecise specification of ill-known functions
using gradual rules. International Journal of Approximate Reasoning, 35:205–222,
2004.
3. B. Ganter and R. Wille. Formal Concept Analysis. Springer-Verlag, 1999.
4. E. Hüllermeier. Association rules for expressing gradual dependencies. In Proceedings of the International Conference PKDD’2002, Helsinki, Finland, pages 200–211,
August 19-23, 2002.
5. L. Di Jorio, A. Laurent, and M. Teisseire. Fast extraction of gradual association
rules: a heuristic based method. In Proceedings of the 5th international conference
on Soft computing as transdisciplinary science and technology (CSTST ’08), pages
205–210, Cergy-Pontoise, France, 2008.
6. L. Di Jorio, A. Laurent, and M. Teisseire. Mining frequent gradual itemsets from
large databases. In Proceedings of the International Conference Intelligent Data
Analysis (IDA’09), to appear, Lyon, France, 2009.
7. A. Laurent, M.-J. Lesot, and M. Rifqi. Graank: Exploiting rank correlations for
extracting gradual dependencies. In Proc. of FQAS’09, 2009.
8. C. Molina, J. Serrano, D. Sánchez, and M. Vila. Measuring variation strength in
gradual dependencies. In Proceedings of the International Conference EUSFLAT’2007,
Ostrava, Czech Republic, pages 337–344, 2007.
9. S. B. Yahia and E. M. Nguifo. Approches d’extraction de règles d’association basées
sur la correspondance de Galois. Ingénierie des Systèmes d’Information, 9(3-4):23–
55, 2004.