Data Mining


1. What can you observe from this figure?

2. Differentiate between classification and regression


3. What are data mining tasks?
List Some Data Mining Tasks
• Characterization
• Discrimination
• Prediction
• Classification
• Association analysis
• Outlier Analysis
• Cluster Analysis
• Evolution and Deviation Analysis
2. Differentiate between Classification & Regression
Regression:
• The response variable is continuous.
• For example, the response variable could be:
• Weight
• Height
• Price
• Time
• Total units
• In each case, a regression model seeks to predict a
continuous quantity.
Classification:
• The response variable is categorical.
• For example, the response variable could take
on the following values:
• Male or female
• Pass or fail
• Low, medium, or high
• In each case, a classification model seeks to
predict some class label.
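The contrast between the two kinds of response variable can be sketched in code. This is a minimal illustration, not a recommended modelling approach: the toy data (hours studied, exam score, pass/fail) and the helper names fit_line, fit_centroids, predict_score, and predict_label are all hypothetical, and the nearest-centroid rule is just one simple way to produce a class label.

```python
# Hypothetical toy data: one explanatory variable (hours studied).
hours = [1.0, 2.0, 3.0, 4.0, 5.0]
scores = [52.0, 54.0, 56.0, 58.0, 60.0]             # continuous response
passed = ["fail", "fail", "pass", "pass", "pass"]   # categorical response


def fit_line(xs, ys):
    """Least-squares line for a single explanatory variable."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict_score(model, x):
    """Regression output: a continuous quantity."""
    slope, intercept = model
    return slope * x + intercept


def fit_centroids(xs, labels):
    """Nearest-centroid classifier: store the mean x of each class."""
    groups = {}
    for x, label in zip(xs, labels):
        groups.setdefault(label, []).append(x)
    return {label: sum(vals) / len(vals) for label, vals in groups.items()}


def predict_label(centroids, x):
    """Classification output: the label whose centroid is closest to x."""
    return min(centroids, key=lambda label: abs(x - centroids[label]))


reg_model = fit_line(hours, scores)
clf_model = fit_centroids(hours, passed)

print(predict_score(reg_model, 6.0))  # a number (continuous)
print(predict_label(clf_model, 6.0))  # a class label (categorical)
```

Both models are trained on the same explanatory variable; only the type of response, and therefore the type of prediction, differs.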
Differences Between Regression and
Classification

• Regression and classification algorithms are
different in the following ways:
• Regression algorithms seek to predict a
continuous quantity and classification
algorithms seek to predict a class label.
• The way we measure the accuracy of
regression and classification models differs.
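To make the difference in evaluation concrete, here is a small sketch. It assumes two commonly used measures, root mean squared error (RMSE) for regression and the fraction of correct labels (accuracy) for classification; the slides do not name specific measures, and the data below is made up for illustration.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error: typical size of the numeric prediction error."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def accuracy(actual, predicted):
    """Fraction of predictions whose class label matches the true label."""
    correct = sum(a == p for a, p in zip(actual, predicted))
    return correct / len(actual)


# Hypothetical regression results (e.g. price).
actual_prices = [100.0, 150.0, 200.0]
predicted_prices = [110.0, 140.0, 200.0]

# Hypothetical classification results (e.g. pass or fail).
actual_labels = ["pass", "fail", "pass", "pass"]
predicted_labels = ["pass", "fail", "fail", "pass"]

print(rmse(actual_prices, predicted_prices))      # error in the units of the response
print(accuracy(actual_labels, predicted_labels))  # proportion between 0 and 1
```

RMSE is expressed in the units of the response variable, while accuracy is a unitless proportion, so the two cannot be compared directly.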
Similarities Between Regression and
Classification
• Regression and classification algorithms are
similar in the following ways:
• Both are supervised learning algorithms, i.e.
they both involve a response variable.
• Both use one or more explanatory variables to
build models to predict some response.
• Both can be used to understand how changes
in the values of explanatory variables affect the
values of a response variable.
Data Mining…
• It can be defined in many different ways.
• Even the term data mining does not really present
all the major components in the picture.
• To refer to the mining of gold from rocks or sand,
we say gold mining instead of rock or sand mining.
• Analogously, data mining should have been more
appropriately named “knowledge mining from
data,” which is unfortunately somewhat long.
Data Mining…
• However, the shorter term, knowledge mining may
not reflect the emphasis on mining from large
amounts of data.
• Nevertheless, mining is a vivid term characterizing
the process that finds a small set of precious
nuggets/pieces from a great deal of raw material.
• In addition, many other terms have a similar meaning
to data mining, for example, knowledge mining from
data, knowledge extraction, data/pattern analysis,
data archaeology, and data dredging.
Synonyms for Data Mining…
• knowledge mining from data,
• knowledge extraction,
• data/pattern analysis,
• data archaeology,
• data dredging.
Data Mining…
• Many people treat data mining as a synonym
for another popularly used term, knowledge
discovery from data, or KDD, while others
view data mining as merely an essential step
in the process of knowledge discovery.
• The knowledge discovery process is shown in
the following figure as an iterative sequence of
these steps:
KDD Steps
• 1. Data cleaning (to remove noise and inconsistent data)
• 2. Data integration (where multiple data sources may be
combined)
• 3. Data selection (where data relevant to the analysis task
are retrieved from the database)
• 4. Data transformation (where data are transformed and
consolidated into forms appropriate for mining by
performing summary or aggregation operations)
• 5. Data mining (an essential process where intelligent
methods are applied to extract data patterns)
KDD Steps
• 6. Pattern evaluation (to identify the truly
interesting patterns representing knowledge
based on interestingness measures)
• 7. Knowledge presentation (where
visualization and knowledge representation
techniques are used to present mined
knowledge to users)
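The seven KDD steps can be sketched as a small pipeline. This is an illustrative toy under stated assumptions, not a reference implementation: the purchase records, the function names (clean, integrate, select, transform, mine, evaluate, present), the choice of item-pair counting as the mining method, and support as the interestingness measure are all hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical raw purchase records from two separate sources.
store_a = [
    {"customer": "ann", "item": "Bread ", "qty": 1},
    {"customer": "ann", "item": "milk", "qty": 2},
    {"customer": "bob", "item": "bread", "qty": 1},
    {"customer": "bob", "item": None, "qty": 1},    # noise: missing item
]
store_b = [
    {"customer": "bob", "item": "MILK", "qty": 1},
    {"customer": "cy", "item": "bread", "qty": 3},
    {"customer": "cy", "item": "milk", "qty": -1},  # inconsistent quantity
    {"customer": "cy", "item": "eggs", "qty": 1},
]


def clean(records):
    """1. Data cleaning: drop noisy rows and normalise item names."""
    return [
        {**r, "item": r["item"].strip().lower()}
        for r in records
        if r["item"] is not None and r["qty"] > 0
    ]


def integrate(*sources):
    """2. Data integration: combine several sources into one collection."""
    return [r for source in sources for r in source]


def select(records, fields):
    """3. Data selection: keep only the fields relevant to the task."""
    return [{f: r[f] for f in fields} for r in records]


def transform(records):
    """4. Data transformation: aggregate rows into one basket per customer."""
    baskets = {}
    for r in records:
        baskets.setdefault(r["customer"], set()).add(r["item"])
    return baskets


def mine(baskets):
    """5. Data mining: count how often each pair of items occurs together."""
    counts = Counter()
    for items in baskets.values():
        counts.update(combinations(sorted(items), 2))
    return counts


def evaluate(pair_counts, n_baskets, min_support):
    """6. Pattern evaluation: keep pairs whose support meets the threshold."""
    return {
        pair: count / n_baskets
        for pair, count in pair_counts.items()
        if count / n_baskets >= min_support
    }


def present(patterns):
    """7. Knowledge presentation: render patterns as readable lines."""
    return [f"{a} & {b}: support {s:.2f}" for (a, b), s in sorted(patterns.items())]


records = integrate(clean(store_a), clean(store_b))
baskets = transform(select(records, ["customer", "item"]))
patterns = evaluate(mine(baskets), len(baskets), min_support=0.5)
report = present(patterns)
print("\n".join(report))
```

In practice the sequence is iterative: a poor result at the evaluation step usually sends the analyst back to cleaning, selection, or transformation.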
Data Mining…
• Data mining is the process of discovering
interesting patterns and knowledge from
large amounts of data.
• The data sources can include databases, data
warehouses (data repositories), the Web, other
information repositories, or data that are
streamed into the system dynamically.
Data Mining
• Therefore, data mining can be defined as the
process of extracting hidden, useful, non-trivial,
interesting, and previously unknown patterns
(information) from vast amounts of data.
What is a Data Warehouse?
• Defined in many different ways, but not strictly.
– A decision support database that is maintained separately from the
organization’s operational database
– Support information processing by providing a solid platform of
consolidated, historical data for analysis.
• “A data warehouse is a subject-oriented, integrated, time-variant, and
nonvolatile collection of data in support of management’s decision-
making process.”—W. H. Inmon
• Data warehousing:
– The process of constructing and using data warehouses

Data Warehouse - Subject-Oriented
• Organized around major subjects, such as customer, product,
sales
• Focusing on the modeling and analysis of data for decision
makers, not on daily operations or transaction processing
• Provide a simple and concise view around particular subject
issues by excluding data that are not useful in the decision
support process

Data Warehouse - Integrated
• Constructed by integrating multiple, heterogeneous data
sources
– relational databases, flat files, on-line transaction records
• Data cleaning and data integration techniques are applied.
– Ensure consistency in naming conventions, encoding
structures, attribute measures, etc. among different data
sources
• E.g., Hotel price: currency, tax, breakfast covered, etc.
– When data is moved to the warehouse, it is converted.
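The hotel price example can be sketched as a conversion step applied when records are loaded. This is only an illustration: the field names, the warehouse convention (price in USD with tax included, breakfast kept as a separate flag), and the exchange rates are all hypothetical assumptions, not values from the slides.

```python
# Hypothetical fixed exchange rates to USD, for illustration only.
RATES_TO_USD = {"USD": 1.0, "EUR": 1.1}


def to_warehouse_record(record):
    """Convert one source record to the assumed warehouse convention:
    price in USD, tax included, breakfast recorded as a separate flag."""
    price = record["price"] * RATES_TO_USD[record["currency"]]
    if not record["tax_included"]:
        price *= 1 + record["tax_rate"]
    return {
        "hotel": record["hotel"],
        "price_usd_incl_tax": round(price, 2),
        "breakfast_included": record["breakfast"],
    }


# Two hypothetical sources that encode the same attribute differently.
source_rows = [
    {"hotel": "A", "price": 100.0, "currency": "EUR",
     "tax_included": False, "tax_rate": 0.10, "breakfast": True},
    {"hotel": "B", "price": 120.0, "currency": "USD",
     "tax_included": True, "tax_rate": 0.0, "breakfast": False},
]

warehouse_rows = [to_warehouse_record(r) for r in source_rows]
for row in warehouse_rows:
    print(row)
```

After conversion, the two prices share one currency and one tax convention, so they can be compared directly.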
