DM Unit 1
UNIT - I
DATA MINING
Data Mining: Data, Types of Data, Data Mining Functionalities,
Interestingness Patterns, Classification of Data Mining Systems,
Data Mining Task Primitives, Integration of a Data Mining System with
a Data Warehouse, Major Issues in Data Mining, Data Preprocessing.
Introduction
SWCET
DATA MINING UNIT - 1
Distributed techniques can also help address the issue of size and
are essential when the data cannot be gathered in one location.
The figure below shows the relationship of data mining to other areas.
Data Mining
Data Sources
Data can come from many sources, such as databases, data warehouses,
and the World Wide Web (WWW). These are the actual sources of data.
Sometimes data may reside even in plain text files or spreadsheets.
Database or Data Warehouse Server
The database or data warehouse server contains the actual data that
is ready to be processed. The server is therefore responsible for
retrieving the relevant data based on the user's data mining request.
Data Mining Engine
The data mining engine is the core component of a data mining system.
It consists of a number of modules used to perform data mining tasks,
including association, classification, characterization, clustering,
prediction, etc.
representation techniques)
Data–Types of Data
Nominal Attributes
Nominal means “relating to names.” The values of a nominal
attribute are symbols or names of things. Each value represents
some kind of category, code, or state, and so nominal attributes are
also referred to as categorical. The values do not have any
meaningful order.
Nominal attributes. Suppose that hair color and marital status are
two attributes describing person objects. In our application, possible
values for hair color are black, brown, blond, red, auburn, gray, and
white.
The attribute marital status can take on the values single, married,
divorced, and widowed. Both hair color and marital status are
nominal attributes. Another example of a nominal attribute is
Binary Attributes
A binary attribute is a nominal attribute with only two categories or
states: 0 or 1, where 0 typically means that the attribute is absent,
and 1 means that it is present. Binary attributes are referred to as
Boolean if the two states correspond to true and false.
Binary attributes. Given the attribute smoker describing a patient
object, 1 indicates that the patient smokes, while 0 indicates that the
patient does not.
Ordinal Attributes
An ordinal attribute is an attribute with possible values that have a
meaningful order or ranking among them, but the magnitude
between successive values is not known.
Ordinal attributes. Suppose that drink size corresponds to the size of
drinks available at a fast-food restaurant. This ordinal attribute has
three possible values: small, medium, and large.
Other examples of ordinal attributes include grade (e.g., A+, A, A−,
B+, and so on) and professional rank. Professional ranks can be
enumerated in a sequential order: for example, assistant, associate,
and full for professors, and private, private first class, specialist,
corporal, and sergeant for army ranks.
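The difference between having an order and having a known magnitude can be sketched in Python. This is only an illustration: the integer rank codes and the sample orders below are assumptions made for the example, and the codes stand for position in the ranking, not for measured sizes.

```python
# Hypothetical rank codes for the drink-size attribute.
# Only the ORDER of the codes is meaningful, not the gaps between them.
DRINK_SIZE_RANK = {"small": 0, "medium": 1, "large": 2}


def is_larger(a: str, b: str) -> bool:
    """Return True if drink size `a` ranks above drink size `b`."""
    return DRINK_SIZE_RANK[a] > DRINK_SIZE_RANK[b]


# Made-up sample of five orders (odd count, so the median is unambiguous).
orders = ["medium", "small", "large", "small", "large"]

# Sorting and taking the middle value are valid for ordinal data.
sorted_orders = sorted(orders, key=DRINK_SIZE_RANK.get)
median_size = sorted_orders[len(sorted_orders) // 2]

print(sorted_orders)
print(median_size)
```

Comparisons, sorting, and the median make sense here, while an average of the rank codes would not, because the values do not say how much bigger a large drink is than a medium one.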
Numeric Attributes
A numeric attribute is quantitative; that is, it is a measurable
quantity, represented in integer or real values. Numeric attributes
can be interval-scaled or ratio-scaled.
Interval-Scaled Attributes
Interval-scaled attributes are measured on a scale of equal-size
units. The values of interval-scaled attributes have order and can be
positive, 0, or negative.
Interval-scaled attributes. A temperature attribute is interval-scaled.
Suppose that we have the outdoor temperature value for a number
of different days, where each day is an object.
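A short Python sketch of the temperature example follows. The daily readings are made-up values, and the helper c_to_f is a name introduced only for this illustration. It shows that differences and rankings of interval-scaled values are meaningful, whereas ratios are not, because the zero point of the Celsius and Fahrenheit scales is arbitrary.

```python
def c_to_f(celsius: float) -> float:
    """Convert a Celsius temperature to Fahrenheit."""
    return celsius * 9 / 5 + 32


# Made-up outdoor temperatures in degrees Celsius; each day is an object.
temps_c = {"Mon": 10.0, "Tue": 20.0, "Wed": 15.0}
temps_f = {day: c_to_f(t) for day, t in temps_c.items()}

# Differences are meaningful: Tuesday was 10 degrees C warmer than Monday.
diff_c = temps_c["Tue"] - temps_c["Mon"]

# Ratios are not: the "twice as warm" reading depends on the unit chosen.
ratio_c = temps_c["Tue"] / temps_c["Mon"]
ratio_f = temps_f["Tue"] / temps_f["Mon"]

# Ordering is preserved regardless of the unit.
warmest_c = max(temps_c, key=temps_c.get)
warmest_f = max(temps_f, key=temps_f.get)

print(diff_c, ratio_c, ratio_f, warmest_c, warmest_f)
```

The ratio changes when the unit changes, so a statement such as "Tuesday was twice as warm as Monday" is not supported by an interval-scaled attribute.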
increased by 15% two years ago, anyone can collect this type
of data related to such products by running SQL queries.
Data Discrimination: It compares the general features of the class
under study (the target class) with those of one or more
contrasting classes. The output of this process can be
represented in many forms, e.g., bar charts, curves, and pie
charts.
2. Mining Frequent Patterns, Associations, and Correlations: Frequent
patterns are patterns that occur frequently in the data. Several
kinds of frequent patterns can be observed in a dataset.
Frequent item set: A set of items that frequently appear
together, e.g., milk and sugar.
Frequent Subsequence: A sequence of events that occurs
frequently, such as purchasing a phone followed by
a back cover.
Frequent Substructure: Structural forms such as trees and
graphs that occur frequently and may be combined with
itemsets or subsequences.
Association Analysis: This process uncovers relationships among data
items and determines the association rules that describe them. For
example, it can be used to identify items that are frequently
purchased together.
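How often items are purchased together is commonly quantified with two standard measures, support and confidence. These measures are not named in the notes above, so the following Python sketch is an illustration under that assumption. The transactions are made-up data, and the function names are hypothetical.

```python
# Made-up market-basket data: each transaction is a set of purchased items.
transactions = [
    {"milk", "sugar", "bread"},
    {"milk", "sugar"},
    {"milk", "tea"},
    {"sugar", "tea"},
    {"milk", "sugar", "tea"},
]


def count(itemset, transactions):
    """Number of transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def support(itemset, transactions):
    """Fraction of all transactions that contain `itemset`."""
    return count(itemset, transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Of the transactions containing `antecedent`, the fraction that
    also contain `consequent` (the rule antecedent => consequent)."""
    return count(antecedent | consequent, transactions) / count(
        antecedent, transactions
    )


milk_sugar_support = support({"milk", "sugar"}, transactions)
milk_to_sugar_conf = confidence({"milk"}, {"sugar"}, transactions)

print(milk_sugar_support)   # 3 of 5 transactions -> 0.6
print(milk_to_sugar_conf)   # 3 of the 4 milk transactions -> 0.75
```

An itemset is usually called frequent when its support reaches a user-chosen minimum threshold; the threshold itself is a tuning decision and is not specified here.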
Correlation Analysis: Correlation is a mathematical technique that
can show whether and how strongly pairs of attributes are
related to each other. For example, taller people tend to weigh
more.
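One common way to measure the strength of such a relationship is the Pearson correlation coefficient, which ranges from -1 to 1. The notes do not name a specific coefficient, so choosing Pearson here is an assumption, and the height and weight values below are made-up sample data.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the means (unnormalised covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (unnormalised variances).
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Made-up sample: heights in centimetres and weights in kilograms.
heights_cm = [150, 160, 170, 180, 190]
weights_kg = [50, 58, 66, 77, 85]

r = pearson(heights_cm, weights_kg)
print(round(r, 3))
```

A value of r close to 1 indicates a strong positive relationship, a value close to -1 a strong negative one, and a value near 0 little or no linear relationship.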
There are various data mining functionalities which are as follows −
Data characterization − It is a summarization of the general
Interestingness Patterns
With data mining, businesses have been found to gain more profit. It
has helped not only in understanding customer demand but also in
developing effective strategies to improve overall business turnover.
It has also helped in determining business objectives for making
clear decisions.
Data collection, data warehousing, and computer processing are
some of the strongest pillars of data mining. Data mining uses
mathematical algorithms to segment the data and assess
the probability of future events.
To understand the system and meet the desired requirements, data
mining systems can be classified according to the following criteria:
o Classification based on the mined Databases
o Classification based on the type of mined knowledge
o Classification based on statistics
o Classification based on Machine Learning
o Classification based on visualization
o Classification based on Information Science
o Classification based on utilized techniques
o Classification based on adapted applications
Classification Based on the mined Databases
A data mining system can be classified based on the types of
databases that have been mined. A database system can be further
segmented based on distinct principles, such as data models, types
of data, etc., which further assist in classifying a data mining
system.
For example, a data mining system classified by data model may be
a relational, transactional, object-relational, or data warehouse
mining system.
Classification Based on the type of Knowledge Mined
1. Characterization
2. Discrimination
3. Association and Correlation Analysis
4. Classification
5. Prediction
6. Outlier Analysis
7. Evolution Analysis
Classification Based on the Techniques Utilized
A data mining system can also be classified based on the type of
techniques that are being incorporated. These techniques can be
assessed based on the degree of user interaction involved or
the methods of analysis employed.
Classification Based on the Applications Adapted
Data mining systems classified based on the applications adapted
are as follows:
1. Finance
2. Telecommunications
3. DNA
4. Stock Markets
5. E-mail
Examples of Classification Task
The following are some of the main examples of classification tasks:
o Classification helps in determining tumor cells as benign or
malignant.
o Classification of credit card transactions as fraudulent or
legitimate.
o Classification of secondary structures of protein as
alpha-helix, beta-sheet, or random coil.
o Classification of news stories into distinct categories such as
allows users
scalable.
Parallel, distributed, and incremental mining algorithms − Factors
such as the huge size of databases, the wide distribution of
data, and the complexity of data mining methods motivate the
development of parallel
Data Preprocessing