Knowledge Discovery Process
MODULE 2:Session 1 21CS601 - Enterprise data Warehouse
SESSION 2
Knowledge Discovery Process
Data Cleaning
Data cleaning is defined as the removal of noisy and
irrelevant data from the collection. It covers:
Handling missing values.
Smoothing noisy data, where noise is a random error
or variance in a measured variable.
Detecting data discrepancies and correcting them
with data transformation tools.
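The first two cleaning tasks can be sketched in a few lines. This is a minimal illustration, not a prescribed method: it assumes missing values are marked as None and filled with the mean of the observed values, and it uses smoothing by bin means as one possible way to reduce noise. The price list is hypothetical.

```python
from statistics import mean

def fill_missing(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]

def smooth_by_bin_means(values, bin_size):
    """Sort the values, split them into bins of bin_size,
    and replace each value with the mean of its bin."""
    ordered = sorted(values)
    smoothed = []
    for start in range(0, len(ordered), bin_size):
        bin_values = ordered[start:start + bin_size]
        smoothed.extend([mean(bin_values)] * len(bin_values))
    return smoothed

# Hypothetical data with one missing value.
prices = [4, 8, None, 15, 21, 21, 24, 25, 28, 34]
filled = fill_missing(prices)
smoothed = smooth_by_bin_means(filled, bin_size=5)
print(filled)
print(smoothed)
```

Mean imputation is only one choice. Dropping the record or filling with a class-specific value may suit other data better.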
Data Integration
Data integration is defined as combining
heterogeneous data from multiple sources into a
common store (a data warehouse).
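As a rough sketch of integration, the example below merges two hypothetical sources that describe the same customers under different field names into one common record layout. The source names, field names, and values are all assumptions made for illustration.

```python
# Two hypothetical sources with different schemas.
crm_rows = [
    {"cust_id": 1, "name": "Asha"},
    {"cust_id": 2, "name": "Ravi"},
]
billing_rows = [
    {"customer": 1, "total_spend": 250.0},
    {"customer": 3, "total_spend": 90.0},
]

def integrate(crm, billing):
    """Combine both sources into one record per customer id."""
    warehouse = {}

    def record(customer_id):
        # Create the common-schema record on first sight of an id.
        return warehouse.setdefault(
            customer_id,
            {"customer_id": customer_id, "name": None, "total_spend": None},
        )

    for row in crm:
        record(row["cust_id"])["name"] = row["name"]
    for row in billing:
        record(row["customer"])["total_spend"] = row["total_spend"]
    return [warehouse[key] for key in sorted(warehouse)]

integrated = integrate(crm_rows, billing_rows)
for row in integrated:
    print(row)
```

Customers that appear in only one source keep None in the fields the other source would have supplied, which hands the gap to the cleaning step rather than hiding it.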
Data Selection
Data selection is defined as the process in which
data relevant to the analysis is identified and
retrieved from the data collection.
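Selection can be pictured as keeping only the rows and attributes the analysis needs. The sketch below assumes a hypothetical sales table and an analysis that only cares about region and amount for the year 2023.

```python
def select(records, attributes, predicate):
    """Keep records that satisfy predicate, projected onto attributes."""
    return [
        {name: record[name] for name in attributes}
        for record in records
        if predicate(record)
    ]

# Hypothetical data collection.
sales = [
    {"region": "south", "year": 2023, "amount": 120, "clerk": "A"},
    {"region": "north", "year": 2022, "amount": 200, "clerk": "B"},
    {"region": "north", "year": 2023, "amount": 80, "clerk": "C"},
]

relevant = select(sales, ["region", "amount"], lambda r: r["year"] == 2023)
print(relevant)
```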
Data Transformation
Data transformation is defined as the process of
transforming data into the form required by the
mining procedure. It is a two-step process:
Data mapping: Assigning elements from the source
to the destination to capture transformations.
Code generation: Creating the actual
transformation program.
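The two steps can be sketched as follows. The mapping is a plain dictionary from source fields to destination fields with a conversion, and a small function builder stands in for code generation. Real tools typically emit SQL or ETL scripts instead, so this is only an analogy, and every field name here is hypothetical.

```python
from datetime import date

# Step 1, data mapping: source field -> (destination field, conversion).
MAPPING = {
    "cust_nm": ("customer_name", lambda s: s.strip().title()),
    "amt": ("amount", float),
    "dt": ("order_date", date.fromisoformat),
}

def build_transformer(mapping):
    """Step 2, stand-in for code generation: turn the mapping
    into a callable transformation program."""
    def transform(source_row):
        return {
            dest: convert(source_row[src])
            for src, (dest, convert) in mapping.items()
        }
    return transform

transform = build_transformer(MAPPING)
row = transform({"cust_nm": "  asha rao ", "amt": "199.50", "dt": "2024-01-15"})
print(row)
```

Keeping the mapping as data and generating the program from it means a schema change only touches the mapping table.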
Data Mining
Data mining is defined as the application of
techniques to extract potentially useful patterns. It
transforms task-relevant data into patterns
and decides the purpose of the model,
using classification or characterization.
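As one small illustration of classification, the sketch below uses a 1-nearest-neighbour rule: a new point receives the label of the closest training example. This is just one of many mining techniques, chosen here for brevity, and the training points and labels are hypothetical.

```python
import math

def nearest_neighbor_classify(training, point):
    """Return the label of the training example closest to point."""
    _, label = min(training, key=lambda example: math.dist(example[0], point))
    return label

# Hypothetical task-relevant data: (features, class label).
training = [
    ((1.0, 1.0), "low"),
    ((1.5, 2.0), "low"),
    ((8.0, 9.0), "high"),
    ((9.0, 8.5), "high"),
]

label_a = nearest_neighbor_classify(training, (2.0, 1.5))
label_b = nearest_neighbor_classify(training, (8.5, 9.0))
print(label_a, label_b)
```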
Pattern Evaluation
Pattern evaluation is defined as
identifying the truly interesting patterns
that represent knowledge, based on given
interestingness measures.
It finds the interestingness score of each
pattern, and
uses summarization and visualization
to make the data understandable to the user.
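A common way to score patterns is with support and confidence for association rules. The sketch below assumes those two measures and keeps only the rules that pass both thresholds. The transactions, candidate rules, and threshold values are hypothetical.

```python
def score(transactions, antecedent, consequent):
    """Return (support, confidence) for the rule antecedent -> consequent."""
    both = sum((antecedent | consequent) <= t for t in transactions)
    ante = sum(antecedent <= t for t in transactions)
    return both / len(transactions), both / ante

# Hypothetical market-basket data.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]
candidate_rules = [
    ({"bread"}, {"milk"}),
    ({"milk"}, {"butter"}),
    ({"butter"}, {"bread"}),
]

MIN_SUPPORT = 0.4      # assumed threshold
MIN_CONFIDENCE = 0.7   # assumed threshold

interesting = []
for antecedent, consequent in candidate_rules:
    support, confidence = score(transactions, antecedent, consequent)
    if support >= MIN_SUPPORT and confidence >= MIN_CONFIDENCE:
        interesting.append(
            (sorted(antecedent), sorted(consequent), support, confidence)
        )

# Short summary for the user.
for antecedent, consequent, support, confidence in interesting:
    print(f"{antecedent} -> {consequent}: "
          f"support={support:.2f}, confidence={confidence:.2f}")
```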
Knowledge Representation
This involves presenting the results in a way
that is meaningful and can be used to make
decisions.
Advantages of KDD
Improves decision-making: KDD provides valuable insights and
knowledge that can help organizations make better decisions.
Increased efficiency: KDD automates repetitive and time-
consuming tasks and makes the data ready for analysis, which
saves time and money.
Better customer service: KDD helps organizations gain a better
understanding of their customers’ needs and preferences, which
can help them provide better customer service.
Fraud detection: KDD can be used to detect fraudulent activities
by identifying patterns and anomalies in the data that may indicate
fraud.
Predictive modeling: KDD can be used to build predictive models
that can forecast future trends and patterns.
Disadvantages of KDD
Privacy concerns: KDD can raise privacy concerns as it involves collecting
and analyzing large amounts of data, which can include sensitive information
about individuals.
Complexity: KDD can be a complex process that requires specialized skills
and knowledge to implement and interpret the results.
Unintended consequences: KDD can lead to unintended consequences,
such as bias or discrimination, if the data or models are not properly
understood or used.
Data quality: The KDD process depends heavily on the quality of the data. If the
data is not accurate or consistent, the results can be misleading.
High cost: KDD can be an expensive process, requiring significant
investments in hardware, software, and personnel.
Overfitting: The KDD process can lead to overfitting, a common
problem in machine learning in which a model learns the detail and noise in
the training data to the extent that it performs worse on new,
unseen data.
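The overfitting point can be made concrete with a toy comparison. In the sketch below, a model that memorizes the training data (1-nearest-neighbour) is compared with a simple threshold rule. The data is hypothetical and deliberately contains two noisy training labels so that the effect is visible.

```python
def nearest_neighbor(train):
    """Memorizing model: predict the label of the closest training point."""
    def predict(x):
        return min(train, key=lambda example: abs(example[0] - x))[1]
    return predict

def threshold_rule(x):
    """Simple model: one cut-off, chosen by hand for this sketch."""
    return "high" if x >= 5 else "low"

def accuracy(predict, data):
    return sum(predict(x) == label for x, label in data) / len(data)

# Hypothetical training data; the labels at x=3 and x=8 are noise.
train = [(1, "low"), (2, "low"), (3, "high"), (4, "low"),
         (6, "high"), (7, "high"), (8, "low"), (9, "high")]
# Hypothetical new data, labelled without noise.
unseen = [(2.9, "low"), (3.2, "low"), (5.5, "high"),
          (7.8, "high"), (8.3, "high"), (1.2, "low")]

memorizer = nearest_neighbor(train)
for name, model in [("memorizer", memorizer), ("threshold", threshold_rule)]:
    print(name, accuracy(model, train), accuracy(model, unseen))
```

The memorizer is perfect on the training data because it reproduces the noisy labels, and those same labels mislead it on the new points near x=3 and x=8.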