Knowledge Discovery in Databases (KDD) : An Overview
13 https://sites.google.com/site/ijcsis/
ISSN 1947-5500
International Journal of Computer Science and Information Security (IJCSIS),
Vol. 15, No. 12, December 2017
data, the sample size and composition are determined during this stage.

Data Mining Models
A few of the many model functions being incorporated in KDD include:
Choosing a Data Mining Model
There are no established guidelines to assist in choosing the correct algorithm to apply to a dataset. Typically, more complex models may fit the data better, but they may also be more difficult to understand and to fit reliably (Fayyad et al., 1995). Successful applications often use simpler models because they are easier to interpret. Each technique tends to lend itself to a particular type of problem. Understanding the domain helps determine what kind of information is needed from the discovery process, thereby narrowing the choice of technique.

Results can be broken into two general categories: prediction and description. Prediction, as the name implies, attempts to forecast the possible future values of data elements. Prediction is being applied extensively in finance in an attempt to forecast movement in the stock market. Description seeks to discover interpretable patterns in the data. Fraud detection is an application that uses description to identify characteristics of potentially fraudulent transactions.

Privacy Concerns and Knowledge Discovery
Although the problem is not unique to Knowledge Discovery, sensitive information is being collected and stored in these huge data warehouses. Concerns have been raised about what information should be protected from KDD-type access. The ethical and moral issues of invasion of privacy are intrinsically connected to pattern recognition. Safeguards are being discussed to prevent misuse of the technology.

Summary
Knowledge Discovery in Databases is answering a need to make use of the mountains of data that are accumulating daily. KDD enlists the power of computers to assist in recognizing patterns in data, a task that exceeds human ability as the size of data warehouses increases. New methods of analysis and pattern extraction are being developed and adapted to KDD. Which method is used depends on the domain and the results expected. The accuracy of the recorded data must not be overlooked during the KDD process. Domain-specific knowledge assists with the subjective analysis of KDD results. Much attention has been given to the data mining phase of KDD, but earlier steps, such as data cleaning, play a significant role in the validity of the results.

The potential benefits of discovery-driven data mining techniques in extracting valuable information from large, complex databases are unlimited. Successful applications are surfacing in industries and areas where data retrieval is outpacing the human ability to analyze its content effectively. Users must be aware of the potential moral conflicts involved in using sensitive information.

(5) Simoudis, E. "Reality Check for Data Mining," IEEE Expert, October 1996, pp. 26-33.

(6) Hand, D. J. Discrimination and Classification. Chichester, U.K.: John Wiley and Sons, 1981.

(7) Fayyad, U.; Piatetsky-Shapiro, G.; Smyth, P. "From Data Mining to Knowledge Discovery: An Overview," in Advances in Knowledge Discovery and Data Mining, Fayyad, U.; Piatetsky-Shapiro, G.; Smyth, P.; Uthurusamy, R. (eds.). Cambridge, Mass.: MIT Press, 1996, pp. 1-36.

(8) Fayyad, U.; Piatetsky-Shapiro, G.; Smyth, P. "The KDD Process for Extracting Useful Knowledge from Volumes of Data," Communications of the ACM, Vol. 39, No. 11, November 1996, pp. 27-34.
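The prediction/description distinction drawn under "Choosing a Data Mining Model" can be illustrated with a minimal Python sketch. It assumes a simple least-squares linear trend as the predictive model and a z-score rule (flag values more than k standard deviations from the mean) as a descriptive pattern for unusual transactions. The function names forecast_next and describe_outliers, the threshold k = 2, and the sample figures are hypothetical choices made for illustration only; they are not methods prescribed in this paper.

```python
from collections.abc import Sequence
from statistics import mean, pstdev


def forecast_next(series: Sequence[float]) -> float:
    """Prediction (illustrative assumption: a linear trend model).

    Fits y = intercept + slope * t by ordinary least squares, with t = 0..n-1,
    and extrapolates one step ahead to t = n.
    """
    n = len(series)
    if n < 2:
        raise ValueError("need at least two observations to fit a trend")
    ts = range(n)
    t_bar = mean(ts)
    y_bar = mean(series)
    sxy = sum((t - t_bar) * (y - y_bar) for t, y in zip(ts, series))
    sxx = sum((t - t_bar) ** 2 for t in ts)
    slope = sxy / sxx
    intercept = y_bar - slope * t_bar
    return intercept + slope * n


def describe_outliers(amounts: Sequence[float], k: float = 2.0) -> list[int]:
    """Description (illustrative assumption: a z-score rule).

    Returns the indices of amounts lying more than k population standard
    deviations from the mean, i.e. an interpretable "unusual record" pattern.
    """
    mu = mean(amounts)
    sigma = pstdev(amounts)
    if sigma == 0:
        return []  # all values identical: nothing stands out
    return [i for i, a in enumerate(amounts) if abs(a - mu) > k * sigma]


if __name__ == "__main__":
    # Hypothetical price series with a steady upward trend.
    prices = [10.0, 11.0, 12.0, 13.0]
    print(forecast_next(prices))  # 14.0

    # Hypothetical transaction amounts; the last one is unusually large.
    amounts = [20, 22, 19, 21, 20, 23, 18, 250]
    print(describe_outliers(amounts))  # [7]
```

Neither function is suitable for real market or fraud data as written. The point is only that a predictive model outputs an estimate of a future value, while a descriptive one outputs an interpretable pattern (here, which records look unusual) that an analyst can then examine with domain knowledge.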