Association Rule-A Tool For Data Mining: Praveen Ranjan Srivastava
By
Praveen Ranjan Srivastava
Lecturer (Computer Science)
Banasthali Vidyapith (Deemed University)
Banasthali-304002 (Rajasthan)
Phone : 01438-228787
E-mail : [email protected] / [email protected]
ABSTRACT:
Data mining is the process of extracting knowledge from large databases, whether the
database is relational, temporal, spatial, multimedia, or of another kind. Data mining is a
popular technology, and companies such as IBM, ORACLE, and TCS are working on it. A data
mining engine supports different approaches, such as classification, clustering,
association, and outlier analysis. Each approach relies on a different technique: for
example, classification may be based on decision tree induction, whereas association is
based on the generation of interesting patterns. Data mining is often regarded as one part
of knowledge discovery in databases (KDD). Data mining tools perform data analysis and may
uncover important data patterns. This paper focuses on association rules. An association
rule is an important tool in the mining process, and it has two characteristic measures:
support and confidence. Support measures how often the items of a rule occur together in
the transactions of a dataset, while confidence measures the strength of the rule. In other
words, for a rule A ⇒ B, support is the probability that a transaction contains both A and
B, while confidence is the conditional probability of B given A. Association rules are
based on these two measures. Many algorithms mine association rules, but this paper
presents only two popular approaches: Apriori and the FP-tree method. Both find frequent
patterns, that is, item sets that satisfy a minimum support value; Apriori does so through
candidate generation, whereas the FP-tree method avoids generating large numbers of
candidates. In short, association rules yield "interesting patterns", and this paper
describes how such patterns are generated from large databases.
INTRODUCTION:
The major reason that data mining has attracted a great deal of attention in the information
industry in recent years is the wide availability of huge amounts of data and the
imminent need for turning such data into useful information and knowledge. The
information and knowledge gained can be used for applications ranging from business
management, production control, and market analysis to engineering design and science
exploration. Data mining can be viewed as a result of the natural evolution of information
technology. Data mining tools can answer business questions that traditionally were too
time consuming to resolve. They scour databases for hidden patterns, finding predictive
information that experts may miss because it lies outside their expectations. Data mining
uses different types of engines, such as classification, clustering, association, and
outlier analysis. In classification the class label is known, whereas in clustering the
class label is unknown; association reveals interesting patterns in the data, while outlier
analysis is used for tasks such as fraud detection.
3. Data selection: where data relevant to the analysis task are retrieved from the
database.
5. Data mining: an essential process where intelligent methods are applied in order to
extract data patterns.
Architecture of a data mining system: A typical data mining architecture may have the
following major components.
The data mining components are
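An association rule A ⇒ B is evaluated by two measures:
Support: support(A ⇒ B) = P(A ∪ B), the probability that a transaction contains every
item in A and in B.
Confidence: confidence(A ⇒ B) = P(B | A) = Support_count(A ∪ B) / Support_count(A)
where Support_count(A ∪ B) is the number of transactions containing the item set A ∪ B,
and Support_count(A) is the number of transactions containing the item set A.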
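The two measures can be computed directly from transaction counts. The following is a minimal Python sketch, assuming transactions are held in memory as sets; the item names and the helper names (support_count, support, confidence) are illustrative and not part of the original paper.

```python
from typing import FrozenSet, Iterable, List

# Hypothetical toy transactions; the item names are illustrative only.
TRANSACTIONS: List[FrozenSet[str]] = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"milk"}),
]


def support_count(itemset: Iterable[str], transactions: List[FrozenSet[str]]) -> int:
    """Number of transactions that contain every item of `itemset`."""
    items = frozenset(itemset)
    return sum(1 for t in transactions if items <= t)


def support(a: Iterable[str], b: Iterable[str], transactions: List[FrozenSet[str]]) -> float:
    """support(A => B): fraction of transactions containing all items of A and B."""
    both = frozenset(a) | frozenset(b)
    return support_count(both, transactions) / len(transactions)


def confidence(a: Iterable[str], b: Iterable[str], transactions: List[FrozenSet[str]]) -> float:
    """confidence(A => B) = support_count(A U B) / support_count(A)."""
    both = frozenset(a) | frozenset(b)
    return support_count(both, transactions) / support_count(a, transactions)
```

On this toy data, the rule bread ⇒ milk has support 2/4 (two of the four transactions contain both items) and confidence 2/3 (bread appears in three transactions, two of which also contain milk).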
"How are association rules mined from large databases?" Association rule mining is a two
–step process:
1. Find all frequent item sets: By definition, each of these item sets will occur at
least as frequently as a predetermined minimum support count.
2. Generate strong association rules from the frequent item sets: By
definition, these rules must satisfy minimum support and minimum
confidence.
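The second step can be sketched as follows. This is an illustrative Python sketch, not the paper's own procedure: it assumes the first step has already produced a map from every frequent item set to its support count (so every subset of a frequent item set is present in the map), and the function name generate_rules is hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple

Rule = Tuple[FrozenSet[str], FrozenSet[str], float]


def generate_rules(support_counts: Dict[FrozenSet[str], int], min_conf: float) -> List[Rule]:
    """Return (antecedent, consequent, confidence) for every strong rule.

    Assumes `support_counts` holds all frequent item sets, so each proper
    subset of a frequent item set can be looked up directly.
    """
    rules: List[Rule] = []
    for itemset, count in support_counts.items():
        if len(itemset) < 2:
            continue  # a rule needs a nonempty antecedent and consequent
        for r in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), r):
                a = frozenset(antecedent)
                conf = count / support_counts[a]
                if conf >= min_conf:
                    rules.append((a, itemset - a, conf))
    return rules
```

Because the input contains only frequent item sets, every generated rule already satisfies minimum support; only the confidence threshold has to be checked here.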
Apriori is an influential algorithm for mining frequent item sets. The name of the algorithm
is based on the fact that it uses prior knowledge of frequent item set
properties. Apriori employs an iterative approach known as a level-wise search, in which
frequent k-item sets are used to explore (k+1)-item sets.
To improve the efficiency of the level-wise generation of frequent item sets, an important
property called the Apriori property is used to reduce the search space: "all nonempty
subsets of a frequent item set must also be frequent."
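The property can be checked on a small dataset. The sketch below is a hedged Python illustration; the function names and the toy transactions used in the usage note are assumptions made for the example.

```python
from itertools import combinations
from typing import FrozenSet, List


def support_count(itemset: FrozenSet[str], transactions: List[FrozenSet[str]]) -> int:
    """Number of transactions containing every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def all_subsets_frequent(itemset: FrozenSet[str],
                         transactions: List[FrozenSet[str]],
                         min_count: int) -> bool:
    """True if every nonempty proper subset of `itemset` meets `min_count`."""
    return all(
        support_count(frozenset(sub), transactions) >= min_count
        for r in range(1, len(itemset))
        for sub in combinations(itemset, r)
    )
```

For the transactions {a, b, c}, {a, b}, {a, c}, {b} and a minimum count of 3, the item c occurs only twice, so all_subsets_frequent returns False for {a, c}. By the Apriori property {a, c} cannot be frequent either, which is why such a candidate can be discarded without counting it.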
procedure AprioriAlg()
begin
    L1 := {frequent 1-itemsets};
    for ( k := 2; Lk-1 ≠ ∅; k++ ) do {
        Ck := apriori-gen(Lk-1);   // new candidates
        for all transactions t in the dataset do {
            for all candidates c ∈ Ck contained in t do
                c.count++
        }
        Lk := { c ∈ Ck | c.count >= min-support }
    }
    Answer := ∪k Lk
end
It makes multiple passes over the database. In the first pass, the algorithm simply counts
item occurrences to determine the frequent 1-itemsets (itemsets with 1 item). A subsequent
pass, say pass k, consists of two phases. First, the frequent itemsets Lk-1 (the set of all
frequent (k-1)-itemsets) found in the (k-1)th pass are used to generate the candidate
itemsets Ck, using the apriori-gen() function. This function first joins Lk-1 with Lk-1, the joining
condition being that the lexicographically ordered first k-2 items are the same. Next, it
deletes all those itemsets from the join result that have some (k-1)-subset that is not in Lk-1
yielding Ck.
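The join and prune phases just described can be sketched in Python. This is a minimal illustration under the assumption that each item set is stored as a lexicographically sorted tuple; the function name apriori_gen mirrors the apriori-gen() function named in the text, but the body is a sketch rather than the original implementation.

```python
from itertools import combinations
from typing import Iterable, Set, Tuple

Itemset = Tuple[int, ...]


def apriori_gen(prev_frequent: Iterable[Itemset]) -> Set[Itemset]:
    """Build candidate k-itemsets from the frequent (k-1)-itemsets.

    Each item set is assumed to be a sorted tuple of comparable items.
    """
    prev = set(prev_frequent)
    candidates: Set[Itemset] = set()
    for p in prev:
        for q in prev:
            # Join: the first k-2 items agree and p's last item precedes q's.
            if p[:-1] == q[:-1] and p[-1] < q[-1]:
                candidate = p + (q[-1],)
                # Prune: every (k-1)-subset must itself be frequent.
                if all(sub in prev
                       for sub in combinations(candidate, len(candidate) - 1)):
                    candidates.add(candidate)
    return candidates
```

For example, with the frequent 3-itemsets (1,2,3), (1,2,4), (1,3,4), (1,3,5), (2,3,4), the join produces (1,2,3,4) and (1,3,4,5); the prune phase drops (1,3,4,5) because its subset (1,4,5) is not frequent, leaving only (1,2,3,4).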
The algorithm now scans the database. For each transaction, it determines which of the
candidates in Ck are contained in the transaction using a hash-tree data structure and
increments the count of those candidates. At the end of the pass, Ck is examined to
determine which of the candidates are frequent, yielding Lk . The algorithm terminates when
Lk becomes empty.
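Putting the passes together, a compact level-wise search might look like the Python sketch below. It rests on several simplifying assumptions that are not in the original: the dataset fits in memory, candidates are matched against transactions with a plain subset test instead of a hash tree, and candidates are formed by taking unions of pairs of frequent (k-1)-itemsets rather than by the lexicographic join.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, List


def apriori(transactions: Iterable[Iterable[str]],
            min_count: int) -> Dict[FrozenSet[str], int]:
    """Return every frequent item set with its support count."""
    txs: List[FrozenSet[str]] = [frozenset(t) for t in transactions]

    # Pass 1: count single items to obtain the frequent 1-itemsets.
    counts: Dict[FrozenSet[str], int] = defaultdict(int)
    for t in txs:
        for item in t:
            counts[frozenset([item])] += 1
    current = {s: c for s, c in counts.items() if c >= min_count}
    frequent = dict(current)

    k = 2
    while current:  # stop when the last level is empty
        prev = set(current)
        # Candidate generation (simplified): unions of size k, then prune
        # any candidate with an infrequent (k-1)-subset.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        candidates = {c for c in candidates
                      if all(frozenset(s) in prev for s in combinations(c, k - 1))}
        # Pass k: scan the dataset and count the candidates it contains.
        counts = defaultdict(int)
        for t in txs:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        current = {c: n for c, n in counts.items() if n >= min_count}
        frequent.update(current)
        k += 1
    return frequent
```

The loop ends on the first level that yields no frequent item sets, which matches the termination condition described above.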
1. It is costly to handle large numbers of candidate sets. For instance, if there are 10^4
frequent 1-itemsets, then approximately 10^7 candidate 2-itemsets are generated.
2. It is tedious to repeatedly scan the database and check a large set of candidates
by pattern matching.
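The size of the candidate set in the first point follows from counting pairs: n frequent 1-itemsets give n(n-1)/2 possible 2-itemsets. A short Python check, with a hypothetical helper name, makes the arithmetic explicit.

```python
from math import comb


def candidate_2_itemsets(n_frequent_items: int) -> int:
    """Number of distinct pairs that can be formed from n frequent 1-itemsets."""
    return comb(n_frequent_items, 2)
```

For 10^4 frequent items this gives 49,995,000 pairs, which is of the same order of magnitude as the figure quoted above.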
Keeping this in mind, a new class of algorithms has recently been proposed that avoids
the generation of large numbers of candidate sets. We describe one such method, called the
FP-tree growth algorithm, which was proposed by Han et al. The main idea of the algorithm
is to maintain a frequent pattern tree of the database.
A frequent pattern tree (or FP-tree) is a tree structure consisting of an item-prefix-tree and
a frequent item-header table.
Item-prefix-tree: each node stores
* Item name
* Support count
* Node link
Frequent item-header table: each entry stores
* Item name
* Head of node link, which points to the first node in the FP-tree carrying that item name
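The structure above can be sketched in Python as follows. This is an illustrative construction only, not the authors' implementation: the class name FPNode, the helper dictionary of chain tails, and the tie-breaking by item name are assumptions made for the example, and the mining step that extracts patterns from the tree is not shown.

```python
from collections import Counter
from typing import Dict, Iterable, List, Optional, Tuple


class FPNode:
    def __init__(self, item: Optional[str], parent: Optional["FPNode"]) -> None:
        self.item = item                           # item name (None for the root)
        self.count = 0                             # support count
        self.parent = parent
        self.children: Dict[str, "FPNode"] = {}
        self.node_link: Optional["FPNode"] = None  # next node with the same item


def build_fp_tree(transactions: Iterable[Iterable[str]],
                  min_count: int) -> Tuple[FPNode, Dict[str, FPNode]]:
    """Build the item-prefix-tree and the header table in two scans."""
    txs: List[set] = [set(t) for t in transactions]
    # Scan 1: find the frequent items.
    item_counts = Counter(item for t in txs for item in t)
    frequent = {i: c for i, c in item_counts.items() if c >= min_count}

    root = FPNode(None, None)
    header: Dict[str, FPNode] = {}  # item -> head of its node-link chain
    tails: Dict[str, FPNode] = {}   # item -> last node in the chain (helper)
    # Scan 2: insert each transaction, most frequent items first.
    for t in txs:
        items = sorted((i for i in t if i in frequent),
                       key=lambda i: (-frequent[i], i))
        node = root
        for item in items:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
                if item in tails:
                    tails[item].node_link = child  # extend the chain
                else:
                    header[item] = child           # first node for this item
                tails[item] = child
            child.count += 1
            node = child
    return root, header


def item_support(header: Dict[str, FPNode], item: str) -> int:
    """Sum the counts along an item's node-link chain."""
    total, node = 0, header.get(item)
    while node is not None:
        total += node.count
        node = node.node_link
    return total
```

Following an item's node links from the header table visits every node carrying that item, so the sum of their counts recovers the item's support count without rescanning the database.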
Association rules should not be used directly for prediction without further analysis or
domain knowledge. They do not necessarily indicate causation. They are however a helpful
starting point for further exploration, making them a popular tool for understanding data.
CONCLUSION:
The discovery of association relationships among huge amounts of data is useful in selective
marketing, decision analysis, and business management. A popular area of application is
market basket analysis, which studies the buying habits of customers by searching for
sets of items that are frequently purchased together. Association rule mining consists of
first finding frequent item sets, from which strong association rules of the form A ⇒ B are
generated. Association rules are an important tool for a data mining engine and a popular
technology nowadays. This paper has presented only two approaches to association rule
mining, namely Apriori and the FP-tree method. These two algorithms are very popular for
mining rules. Last but not least, association rules give our customers interesting
patterns.
References:
* http://www.kdnuggets.com/
* http://www.dmg.org/
* http://www.almaden.ibm.com/software/quest/
* http://www.data-mine.com/bin/site/templates/splash.asp
* http://www.thearling.com/
* www.indianmba.com