An Intelligent Approach of Rough Set in Knowledge Discovery Databases
World Academy of Science, Engineering and Technology 35 2007

I. INTRODUCTION
process, like attribute selection, attribute extraction, data reduction, decision rule generation and pattern extraction (templates, association rules) (Komorowski et al., 1999). Furthermore, recent extensions of rough set theory (rough mereology) have brought new methods of decomposition of large data sets, data mining in distributed and multi-agent based environments, and granular computing (Polkowski and Skowron, 1996; Polkowski and Skowron, 1999; Yao and Zhong, 1999; Zhong et al., 1999) [3].

Rough set theory includes mechanisms for defining partial membership of sets, but does not introduce additional measures of probability or degrees of membership. The basic assumption is that there is some information (data) associated with each object in the universe of discourse. Based on this information, it is possible to tell some of the objects apart, while others are impossible to distinguish. The latter objects are indiscernible from each other and form a set. Each set of indiscernible objects is a knowledge granule (atom), and these granules form the building blocks for knowledge about the universe. The rough set community has been very active since its inception in the eighties, and a large number of rough set methods for knowledge discovery and data mining have been developed. The entire knowledge discovery process has been subject to research, and a wide range of contributions has been made. Data mining technology provides a new way of organizing and managing tremendous volumes of data, and rough set theory is one of the important methods for knowledge discovery. It can analyze data as they are, extract uncertain knowledge, and offer an effective tool for reasoning. The main features of rough set data analysis are that it is non-invasive and that it can handle qualitative data, which fits most real-life applications nicely [4].

II. PROCESS BEHIND KDD

Knowledge discovery in databases (KDD) is defined as the nontrivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data [8], [14]. Data is a set of facts F, and a pattern is an expression E in a language L describing the facts in a subset FE of F. E is called a pattern if it is simpler than the enumeration of all facts in FE. A measure of certainty, measuring the validity of discovered patterns, is a function C mapping expressions in L to a partially or totally ordered measure space MC. An expression E in L about a subset FE ⊂ F can be assigned a certainty measure c = C(E,F). Novelty of patterns can be measured by a function N(E,F) with respect to changes in data or knowledge. Patterns should potentially lead to some useful actions, as measured by a utility function u = U(E,F) mapping expressions in L to a partially or totally ordered measure space MU. The goal of KDD is to make patterns understandable to humans; this is measured by a function s = S(E,F) mapping expressions E in L to a partially or totally ordered measure space MS.

According to the widely accepted description of [5], the (iterative) process of knowledge discovery in databases (KDD) consists of the following steps:
• Developing an understanding of the application domain, the relevant prior knowledge, and the goal(s) of the end user.
• Creating or selecting a target data set.
• Data cleaning and preprocessing: this step includes, among other tasks, removing noise or accounting for noise, and imputation of missing values.
• Data reduction: finding useful features to represent the data depending on the goal of the task. This may include dimensionality reduction or transformation.
• Matching the goals to a particular data mining method such as classification, regression, clustering, etc., including model and hypothesis selection and choosing the data mining algorithm(s) and methods to be used for searching for data patterns.
• Data mining.
• Interpreting mined patterns.
• Acting on discovered knowledge.

III. ROUGH SET ANALYSIS IN KDD

The most common representation of initial knowledge in rough set theory is a tabular form, similar to a relational table. Each column in the table represents an attribute, and each row represents an object. There are two different kinds of knowledge representations, namely information systems and decision systems.

A. Information Systems

An information system is the most basic kind of knowledge. It consists of a set of tuples, where each tuple is a collection of attribute values. Rough set analysis in KDD is based on the viewpoint that objects are known only up to their description by attribute vectors: an information system Ι consists of a set U of objects and a set Ω of attributes; the latter are functions a : U → Va which assign to each object x a value a(x) in the set Va of values which x can take under a.

If ∅ ≠ Q ⊆ Ω, we denote the feature vector of x with respect to the attributes in Q by x→Q. This operationalisation by Object → Attribute data tables assumes the “nominal scale restriction”, which postulates that each object has exactly one value of each attribute at a given time, and that the observation of this value is without error. Data reduction is a major feature of rough set analysis. Each Q ⊆ Ω determines an equivalence relation θQ on U by setting

x ≡θQ y ⇔ (∀a ∈ Q) a(x) = a(y).

The finest equivalence obtained in this way is θΩ. If Q ⊆ Ω and θQ = θΩ, then the attributes in Q are sufficient to describe the classification induced by Ω, and thus one can project Ω to Q. Note that only information supplied by the data is used for attribute reduction. A set Q of attributes which is minimal with respect to the above equation is called a reduct of Ι [6].

B. Decision Systems

A decision system is similar to an information system, but a distinction is made between condition and decision attributes.
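The tabular representation and the indiscernibility relation θQ described above can be sketched in a few lines of Python. The table below is a hypothetical toy example with two condition attributes and one decision attribute; all names and values are illustrative, not data from the paper:

```python
# Toy decision table: condition attributes {"Headache", "Temp"} and a
# decision attribute "Flu" (all names and values are illustrative).
rows = [
    {"Headache": "yes", "Temp": "high",   "Flu": "yes"},   # object 0
    {"Headache": "yes", "Temp": "high",   "Flu": "yes"},   # object 1
    {"Headache": "no",  "Temp": "high",   "Flu": "no"},    # object 2
    {"Headache": "no",  "Temp": "normal", "Flu": "no"},    # object 3
    {"Headache": "no",  "Temp": "normal", "Flu": "yes"},   # object 4
]

def indiscernibility_classes(rows, attributes):
    """Partition object indices into the equivalence classes of
    x ≡ y  <=>  a(x) == a(y) for every a in `attributes`."""
    classes = {}
    for i, row in enumerate(rows):
        signature = tuple(row[a] for a in attributes)  # feature vector w.r.t. Q
        classes.setdefault(signature, set()).add(i)
    return list(classes.values())

# Granules (atoms) induced by the condition attributes:
print(indiscernibility_classes(rows, ["Headache", "Temp"]))
# [{0, 1}, {2}, {3, 4}] -- objects 3 and 4 are indiscernible by the
# condition attributes, yet they disagree on the decision "Flu".
```

Under the nominal scale restriction each object contributes exactly one signature, so the partition is well defined; dropping attributes from Q can only merge, never split, these granules.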
In an information system, the information is not interpreted. However, an expert may classify the different objects according to some semantic criteria, thus assigning an expert classification attribute to each object. Adding a decision attribute d to an information system creates a decision system, where the attributes A form the condition attributes. Using a single decision attribute can be done without any loss of generality, as it is possible to represent any k-size attribute set D by a single decision attribute d: any combination of the values for the decision attributes in D may be represented (coded) by a distinct value for d. Hence, it is sound to assume that D = {d}. A decision system (DS) A = (U,A,{d}) is an information system for which the attributes are separated into disjoint sets of condition attributes A and a decision attribute d (A ∩ {d} = ∅). Now, it should be apparent that from any given DS A = (U,A,{d}) it is possible to construct an information system by simply removing the decision attribute d from the system, giving us an information system A’ = (U, A). In the same manner as a decision system is a specialized kind of information system, decision rules are a special kind of pattern. A decision rule represents a probabilistic relationship between a set of conditions and a decision. Given a decision system A = (U,A,{d}), let α denote a pattern that only involves attributes in A, and let β denote a descriptor d = v, where v ∈ Vd. The decision rule is then read as “if α then β” and is denoted α → β; α is called the rule’s antecedent and β the rule’s consequent [4].

In practice, generating a decision rule from a reduct (or a reduct equivalent) means overlaying the attributes in the reduct over an object x and reading off the values of a(x) for every a in the reduct. This means that the decision rules will always be conjunctions of descriptors (or a single descriptor, in the event that the reduct consists of a single attribute). Rules of this type are said to represent positive knowledge, defined as follows: given a DS A = (U,A,{d}), the decision rule α → β is said to be a positive decision rule if α is a conjunction of descriptors that only involve attributes in A.

IV. UPPER AND LOWER APPROXIMATION

Let A = (U, A) be an information system and let B ⊆ A and X ⊆ U. We can approximate X using only the information contained in B by constructing the B-lower and B-upper approximations of X, denoted B̲X and B̄X respectively, where B̲X = {x : [x]B ⊆ X} and B̄X = {x : [x]B ∩ X ≠ ∅}. The lower approximation corresponds to certain rules, while the upper approximation corresponds to possible rules (rules with confidence greater than 0) for X. The B-lower approximation of X is the set of all objects which can be classified into X with certainty using attributes from B. The set U − B̄X is called the B-outside region of X and consists of those objects which can be classified with certainty as not belonging to X using attributes from B. The set BNB(X) = B̄X − B̲X is called the B-boundary region of X, consisting of those objects that, on the basis of the attributes from B, cannot be unambiguously classified into X. A set is said to be rough (respectively crisp) if the boundary region is non-empty (respectively empty). Consequently, each rough set has boundary-line cases, i.e., objects which can be classified with certainty neither as members of the set nor of its complement. Obviously, crisp sets have no boundary-line elements at all; boundary-line cases cannot be properly classified by employing the available knowledge. The size of the boundary region can be used as a measure of the quality of set approximation (in U). It can easily be seen that the lower and upper approximations of a set are, respectively, the interior and the closure of this set in the topology generated by the indiscernibility relation [12].

A. Accuracy of Approximation

A rough set X can be characterized numerically by the coefficient

αB(X) = |B̲X| / |B̄X|,

called the accuracy of approximation, where |X| denotes the cardinality of X ≠ ∅ and B is a set of attributes. Obviously, 0 ≤ αB(X) ≤ 1. If αB(X) = 1, X is crisp with respect to B (X is exact with respect to B); otherwise, if αB(X) < 1, X is rough with respect to B (X is vague with respect to B).

B. Rough Membership Function

In classical set theory, an element either belongs to a set or it does not, and the corresponding membership function is the characteristic function of the set, i.e., the function takes the values 1 and 0, respectively. In the case of rough sets, the notion of membership is different. The rough membership function quantifies the degree of relative overlap between the set X and the equivalence class to which x belongs. It is defined as follows:

μ_X^B : U → [0,1] and μ_X^B(x) = |[x]B ∩ X| / |[x]B|.

The rough membership function can be interpreted as a frequency-based estimate of Pr(y ∈ X | u), the conditional probability that object y belongs to set X, given the information signature u = InfB(x) of object x with respect to attributes B. The value μ_X^B(x) measures the degree of inclusion of {y ∈ U : InfB(x) = InfB(y)} in X.

V. COMPUTATIONAL ASPECTS OF ROUGH SET ON KDD

As noted in the literature [7], there has long been a lack of time complexity analysis for frequently used rough set operations. The time complexity of constructing an equivalence relation is shown to be O(lm²), where l and m are the number of attributes and objects, respectively [8]. This result corresponds to the analysis of an algorithm, reported in [9], whose goal is to obtain the equivalence relation according to the values of a single attribute. For a given functional dependency X ⇒ Y that holds in an information table S, we say that x ∈ X is a superfluous (or non-significant) attribute for Y in S if and only if X − {x} ⇒ Y still holds in S. A reduct of X for Y in S is a subset P of X such that P does not contain any superfluous attribute. If we have a metric to measure the degree of dependency, then we have a way to explore a reduct of X with a degree of θ, where 0 ≤ θ ≤ 1 [10]. It is shown that finding a reduct of X for Y in S is computationally bounded by O(l²m²), where l and m are the length of X and the number of objects in S, respectively. The time complexity of finding all reducts of X is O(2^l J), where J is the computational cost of finding one reduct and l is the number of attributes in X [11].

Hrudaya Ku. Tripatthy is presently working in the Department of Computer Science & Engineering at the Institute of Advanced Computer & Research. He received his M.Tech in Computer Science & Engineering from the Indian Institute of Technology Guwahati in 2006 and is pursuing a PhD in Computer Science under Berhampur University. He has 10 years of teaching experience in both graduate and undergraduate technical degree courses.
VI. CONCLUSION
In this paper, basic concepts of data mining/KDD and of rough set theory were discussed. Rough set theory has been widely used in KDD since it was first put forward. Because it provides important tools for representing, learning from, and drawing conclusions about uncertain knowledge, it is a powerful basis for building intelligent decision systems. The main focus has been to show how rough set techniques can be employed as an approach to the problem of data mining and knowledge extraction.
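As a compact illustration of the constructions discussed above, the following Python sketch computes the B-lower and B-upper approximations, the boundary region, the accuracy αB(X), and the rough membership values over a hypothetical five-object table (all names and data are illustrative, not taken from the paper):

```python
# Hypothetical toy table; B is the set of condition attributes and X is the
# target set of objects whose decision value is "yes".
rows = [
    {"Headache": "yes", "Temp": "high",   "Flu": "yes"},   # object 0
    {"Headache": "yes", "Temp": "high",   "Flu": "yes"},   # object 1
    {"Headache": "no",  "Temp": "high",   "Flu": "no"},    # object 2
    {"Headache": "no",  "Temp": "normal", "Flu": "no"},    # object 3
    {"Headache": "no",  "Temp": "normal", "Flu": "yes"},   # object 4
]
B = ["Headache", "Temp"]
U = set(range(len(rows)))
X = {i for i in U if rows[i]["Flu"] == "yes"}              # X = {0, 1, 4}

def eq_class(i):
    """[x]_B: all objects indiscernible from object i on the attributes B."""
    sig = tuple(rows[i][a] for a in B)
    return {j for j in U if tuple(rows[j][a] for a in B) == sig}

lower = {i for i in U if eq_class(i) <= X}        # B-lower: [x]_B subset of X
upper = {i for i in U if eq_class(i) & X}         # B-upper: [x]_B meets X
boundary = upper - lower                          # BN_B(X)
accuracy = len(lower) / len(upper)                # alpha_B(X) = |lower|/|upper|
membership = {i: len(eq_class(i) & X) / len(eq_class(i)) for i in U}

print(lower)       # {0, 1}
print(upper)       # {0, 1, 3, 4}
print(boundary)    # {3, 4}: X is rough, since the boundary is non-empty
print(accuracy)    # 0.5
```

Each eq_class call scans all m objects over l attributes, so computing the approximations this way costs O(lm²), consistent with the complexity cited in Section V; grouping objects by their attribute signature in a hash table is the usual practical shortcut.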
REFERENCES
[1] R. S. Michalski and K. A. Kaufman, “Data Mining and Knowledge Discovery: A Review of Issues and a Multistrategy Approach”, Machine Learning and Data Mining: Methods and Applications, 1997.
[2] J. N. Kok and W. A. Kosters, “Natural Data Mining Techniques”, European Association for Theoretical Computer Science, Vol. 71, June 2000, pp. 133-142.
[3] N. Zhong and A. Skowron, “A Rough Set-Based Knowledge Discovery Process”, International Journal of Applied Mathematics and Computer Science, Vol. 11, No. 3, 2001, pp. 603-619.
[4] T. Løken, “Rough Modeling: Extracting Compact Models from Large Databases”, thesis, Knowledge Systems Group, IDI, Norwegian University of Science and Technology, 1999.
[5] U. Fayyad, G. Piatetsky-Shapiro, and P. Smyth, “From data mining to knowledge discovery in databases”, Artificial Intelligence Magazine, Vol. 17, 1996, pp. 37-54.
[6] I. Düntsch, G. Gediga, and H. S. Nguyen, “Rough set data analysis in the KDD process”, in Proceedings of IPMU 2000, pp. 220-226.
[7] H. Sever, “The Status of Research on Rough Sets for Knowledge Discovery in Databases”, www.cuadra.cr.usgs.gov/pubs/srj98.pdf
[8] J. Deogun, S. Choubey, V. Raghavan, and H. Sever, “Feature selection and effective classifiers”, Journal of ASIS, Vol. 49, No. 5, 1998, pp. 423-434.
[9] D. Bell and J. Guan, “Computational methods for rough classification and discovery”, Journal of ASIS, Vol. 49, No. 5, 1998, pp. 403-414.
[10] J. S. Deogun, V. V. Raghavan, and H. Sever, “Exploiting upper approximations in the rough set methodology”, in The First International Conference on Knowledge Discovery and Data Mining (Montreal, Quebec, Canada, Aug. 1995), U. Fayyad and R. Uthurusamy, Eds., pp. 69-74.
[11] R. E. Kent, “Rough concept analysis”, in Proceedings of the International Workshop on Rough Sets and Knowledge Discovery (Banff, Alberta, Canada, 1993), pp. 245-253.
[12] A. Skowron, “Rough Sets in KDD”, Institute of Mathematics, Warsaw University, Banacha 2, 02-095 Warsaw, Poland.