
DATA MINING

SYLLABUS
UNIT I:
Introduction: Data mining applications – data mining techniques – data
mining case studies – the future of data mining – data mining software. Association
rules mining: Introduction – basics – task and a naive algorithm – Apriori algorithm –
improving the efficiency of the Apriori algorithm – mining frequent patterns without
candidate generation (FP-growth) – performance evaluation of algorithms.

UNIT II:
Data warehousing: Introduction – operational data sources – data
warehousing – data warehousing design – guidelines for data warehousing
implementation – data warehouse metadata. Online analytical processing
(OLAP): Introduction – characteristics of OLAP systems –
multidimensional view and data cube – data cube implementation – data cube
operations – OLAP implementation guidelines.

UNIT III:
Classification: Introduction – decision tree – overfitting and pruning –
DT rules – Naïve Bayes method – estimating predictive accuracy of classification
methods – other evaluation criteria for classification methods – classification
software.

UNIT IV:
Cluster analysis: cluster analysis – types of data – computing distances –
types of cluster analysis methods – partitional methods – hierarchical methods –
density-based methods – dealing with large databases – quality and validity of
cluster analysis methods – cluster analysis software.

UNIT V:
Web data mining: Introduction – web terminology and characteristics –
locality and hierarchy in the web – web content mining – web usage mining – web
structure mining – web mining software. Search engines: Search engine
functionality – search engine architecture – ranking of web pages.

UNIT-I
WHAT IS DATA MINING?

Data mining is the process of extracting information from huge sets of data to identify patterns,
trends, and useful data that allow a business to make data-driven decisions.

The primary goal of data mining is to discover hidden patterns and relationships in the data that can
be used to make informed decisions or predictions.

TYPES OF DATA MINING

Data mining can be performed on the following types of data:

Relational Database:

A relational database is a collection of multiple data sets formally organized into tables, records, and
columns, from which data can be accessed in various ways without having to reorganize the database
tables.
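As an illustration, the following minimal sketch uses Python's built-in `sqlite3` module with an in-memory database. The `sales` table, its columns, and its rows are hypothetical. The same table is accessed in two different ways (filtering and aggregation) without reorganizing it.

```python
import sqlite3

# Hypothetical table and rows, held in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, customer TEXT, item TEXT, quantity INTEGER)")
conn.executemany(
    "INSERT INTO sales (customer, item, quantity) VALUES (?, ?, ?)",
    [("alice", "bread", 2), ("bob", "milk", 1), ("alice", "milk", 3)],
)

# Access 1: the items bought by one customer, in insertion order.
alice_items = [item for (item,) in conn.execute(
    "SELECT item FROM sales WHERE customer = ? ORDER BY id", ("alice",))]

# Access 2: total quantity per customer, computed from the same table.
totals = dict(conn.execute(
    "SELECT customer, SUM(quantity) FROM sales GROUP BY customer ORDER BY customer"))

print(alice_items)  # ['bread', 'milk']
print(totals)       # {'alice': 5, 'bob': 1}
```

Neither query changes how the table is stored; each simply describes a different view of the same rows.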

Data warehouses:

A data warehouse is a technology that collects data from various sources within an organization
to provide meaningful business insights. The large volume of data comes from multiple places, such as
Marketing and Finance.

Data Repositories:

A data repository generally refers to a destination for data storage.

Object-Relational Database:

A combination of the object-oriented database model and the relational database model is called an
object-relational model. It supports classes, objects, inheritance, etc.

One of the primary objectives of the object-relational data model is to close the gap between the
relational database model and the object-oriented modeling practices frequently used in many
programming languages, for example, C++, Java, and C#.

Transactional Database:

A transactional database refers to a database management system (DBMS) that can undo (roll back) a
database transaction if it is not completed successfully.
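The idea of undoing an incomplete transaction can be sketched with Python's built-in `sqlite3` module. The `accounts` table and the `transfer` helper are hypothetical. A transfer first credits the receiver and then debits the sender; if the debit would make a balance negative, the `CHECK` constraint fails and the whole transaction, including the credit already applied, is rolled back.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table; the CHECK constraint forbids negative balances.
conn.execute(
    "CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Hypothetical helper: move `amount` from src to dst as one transaction."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst))
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False  # the CHECK constraint failed; both updates were undone

ok1 = transfer(conn, "alice", "bob", 30)   # succeeds
ok2 = transfer(conn, "bob", "alice", 500)  # fails: bob's balance cannot go negative
balances = dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name"))
print(ok1, ok2, balances)  # True False {'alice': 70, 'bob': 80}
```

After the failed transfer, alice's balance is still 70: the credit of 500 was undone together with the rejected debit.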

APPLICATIONS OF DATA MINING

SCIENTIFIC ANALYSIS:

Scientific simulations generate large volumes of data every day. This includes data collected from
nuclear laboratories, data about human psychology, etc. Data mining techniques are capable of
analyzing these data. We can now capture and store new data faster than we can analyze the data
already accumulated.

Examples of scientific analysis:

• Sequence analysis in bioinformatics
• Classification of astronomical objects
• Medical decision support

INTRUSION DETECTION:

A network intrusion refers to any unauthorized activity on a digital network. Network
intrusions often involve stealing valuable network resources. Data mining techniques play a vital
role in detecting intrusions, network attacks, and anomalies.

For example:
• Detecting security violations
• Misuse detection
• Anomaly detection

BUSINESS TRANSACTIONS:

Every business transaction is recorded and retained indefinitely. Such transactions are usually
time-related and can be inter-business deals or intra-business operations. Data mining helps to
analyze these business transactions, identify marketing approaches, and support decision-making.

Examples:
• Direct mail targeting
• Stock trading
• Customer segmentation
• Churn prediction (one of the most popular big data use cases in business)

MARKET BASKET ANALYSIS:

Market basket analysis is a technique for carefully studying the purchases a customer makes in a
supermarket. It identifies patterns of items that customers frequently purchase together.

Example:
• Data mining is used in sales and marketing to provide better customer service, improve
cross-selling opportunities, and increase direct mail response rates.
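A first step of market basket analysis is counting how often pairs of items appear in the same basket. The minimal sketch below does this with the Python standard library; the baskets are hypothetical data.

```python
from collections import Counter
from itertools import combinations

# Hypothetical supermarket baskets: the set of items in each purchase.
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"beer", "bread"},
    {"eggs", "milk"},
]

pair_counts = Counter()
for basket in baskets:
    # sorted() gives each pair one canonical order, so (bread, milk) and (milk, bread) are counted together
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

most_common_pair, count = pair_counts.most_common(1)[0]
print(most_common_pair, count)  # ('bread', 'milk') 2
```

Pairs that co-occur often, such as bread and milk here, are candidates for cross-selling or for association rules.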

EDUCATION:

To analyze the education sector, data mining uses the Educational Data Mining (EDM) method. This
method generates patterns that can be used by both learners and educators.

Using EDM, we can perform educational tasks such as:

• Predicting student admission to higher education
• Student profiling
• Predicting student performance
• Evaluating teachers' teaching performance

HEALTHCARE AND INSURANCE:

A pharmaceutical company can examine its recent sales force activity and its outcomes to improve the
targeting of high-value physicians and determine which marketing activities will have the greatest
impact in the upcoming months. In the insurance sector, data mining can help to predict which
customers will buy new policies, identify behavior patterns of risky customers, and identify
fraudulent behavior of customers.

• Claims analysis, i.e., which medical procedures are claimed together.
• Identify successful medical therapies for different illnesses.
• Characterize patient behavior to predict office visits.

TRANSPORTATION:

A diversified transportation company with a large direct sales force can apply data mining to
identify the best prospects for its services. A large consumer merchandise organization can apply
data mining to improve its sales process to retailers.

• Determine the distribution schedules among outlets.
• Analyze loading patterns.

FINANCIAL/BANKING SECTOR:

A credit card company can leverage its vast warehouse of customer transaction data to identify
customers most likely to be interested in a new credit product.

• Detect credit card fraud.
• Identify ‘loyal’ customers.
• Extract information related to customers.
• Determine credit card spending by customer groups.

DATA MINING TECHNIQUES

ASSOCIATION RULES:

Association rules are if-then statements that help show the probability of relationships between
data items within large data sets in various types of databases.

For example, given the list of grocery items you have bought over the last six months, association
rule mining calculates the percentage of purchases in which items were bought together.

There are three major measurement techniques for a rule "if item A, then item B":

o Support:
This measurement technique measures how often the items are purchased together, compared to the
overall dataset. (Item A + Item B) / (Entire dataset)
o Confidence:
This measurement technique measures how often item B is purchased when item A is purchased as
well. (Item A + Item B) / (Item A)
o Lift:
This measurement technique compares the confidence with how often item B is purchased overall.
(Confidence) / ((Item B) / (Entire dataset))

Here (Item A + Item B) is the number of transactions that contain both A and B, (Item A) and
(Item B) are the numbers of transactions that contain each item, and (Entire dataset) is the total
number of transactions.
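The three measures can be computed directly from their definitions. This is a minimal sketch in Python for the hypothetical rule "if bread, then milk" over five made-up transactions; the helper names (`count`, `support`, `confidence`, `lift`) are illustrative, not from any particular library.

```python
# Hypothetical transactions: each set is the items bought in one purchase.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"beer", "bread"},
    {"butter", "milk"},
    {"bread", "eggs", "milk"},
]

def count(itemset):
    """Number of transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t)

def support(a, b):
    """(Item A + Item B) / (Entire dataset): share of transactions with both A and B."""
    return count(a | b) / len(transactions)

def confidence(a, b):
    """(Item A + Item B) / (Item A): how often B is bought when A is bought."""
    return count(a | b) / count(a)

def lift(a, b):
    """(Confidence) / ((Item B) / (Entire dataset))."""
    return confidence(a, b) / (count(b) / len(transactions))

bread, milk = {"bread"}, {"milk"}
print(support(bread, milk))     # 0.6
print(confidence(bread, milk))  # 0.75
print(lift(bread, milk))
```

Here 4 of the 5 transactions contain bread, 4 contain milk, and 3 contain both, so support = 3/5, confidence = 3/4, and lift = 0.75 / 0.8 ≈ 0.94. A lift below 1 means milk is bought slightly less often together with bread than it is bought overall.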

CLASSIFICATION:

This technique is used to obtain important and relevant information about data and metadata. It
helps to assign data items to different classes.
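A minimal sketch of assigning data items to classes, assuming hypothetical customer records with two numeric features (monthly spend and visits per month) and two made-up labels. It uses 1-nearest-neighbour classification, chosen here only because it is short; it is not one of the methods covered in the syllabus (decision trees, Naïve Bayes).

```python
import math

# Hypothetical labelled training data: (monthly_spend, visits_per_month) -> class.
training = [
    ((200, 8), "loyal"),
    ((180, 10), "loyal"),
    ((20, 1), "occasional"),
    ((35, 2), "occasional"),
]

def classify(features):
    """1-nearest-neighbour: return the class of the closest training example."""
    _, label = min(training, key=lambda example: math.dist(example[0], features))
    return label

print(classify((190, 7)))  # loyal
print(classify((30, 1)))   # occasional
```

The features are not scaled here; when features have very different ranges, they are usually normalized first so that one feature does not dominate the distance.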

Data mining frameworks can themselves be classified according to different criteria, as follows:

Classification of data mining frameworks as per the type of data sources mined:
This classification is based on the type of data handled.
For example, multimedia, spatial data, text data, time-series data, the World Wide Web, and so on.

Classification of data mining frameworks as per the database involved:
This classification is based on the data model involved.
For example, object-oriented databases, transactional databases, relational databases, and so on.

Classification of data mining frameworks as per the kind of knowledge discovered:
This classification depends on the types of knowledge discovered, i.e., the data mining
functionalities. For example, discrimination, classification, clustering, characterization, etc.
Some frameworks are comprehensive, offering several data mining functionalities together.

Classification of data mining frameworks according to the data mining techniques used:
This classification is based on the data analysis approach used, such as neural networks, machine
learning, genetic algorithms, visualization, statistics, or data warehouse-oriented or
database-oriented techniques.

PREDICTION:

Prediction uses a combination of other data mining techniques, such as trend analysis, clustering, and
classification. It analyzes past events or instances in the right sequence to predict a future event.
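One of the simplest ways to predict the next event from a sequence of past values is a moving-average forecast. The sketch below assumes hypothetical monthly sales figures listed in time order and a window of three months; it is an illustrative technique, not a method prescribed by this section.

```python
def moving_average_forecast(history, window=3):
    """Predict the next value as the mean of the last `window` observations."""
    if len(history) < window:
        raise ValueError("need at least `window` past observations")
    recent = history[-window:]  # the most recent observations, still in time order
    return sum(recent) / window

# Hypothetical monthly sales, oldest first.
monthly_sales = [120, 130, 125, 140, 150, 145]
print(moving_average_forecast(monthly_sales))  # 145.0
```

The order of the data matters: shuffling the history would change which values count as "most recent" and therefore the prediction.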

CLUSTERING:

Clustering is the division of data into groups of similar objects. Describing the data by a few
clusters necessarily loses certain fine details but achieves simplification, since the data are
modeled by their clusters. From a historical point of view, clustering as a form of data modeling is
rooted in statistics, mathematics, and numerical analysis.

For example: scientific data exploration, text mining, information retrieval, spatial database
applications, CRM, Web analysis, computational biology, medical diagnostics, and much more.
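k-means is one widely used partitional clustering algorithm. The sketch below is a minimal version for 2-D points, assuming hypothetical data with two well-separated groups; initial centroids are chosen as random data points with a fixed seed so that runs are repeatable.

```python
import math
import random

def kmeans(points, k, iterations=100, seed=0):
    """Lloyd's k-means for 2-D points: returns (centroids, clusters)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = []
        for i, cluster in enumerate(clusters):
            if cluster:
                xs, ys = zip(*cluster)
                new_centroids.append((sum(xs) / len(cluster), sum(ys) / len(cluster)))
            else:
                new_centroids.append(centroids[i])  # keep the old centroid if a cluster empties
        if new_centroids == centroids:
            break  # centroids stopped moving, so assignments cannot change: converged
        centroids = new_centroids
    return centroids, clusters

# Hypothetical 2-D data with two visibly separate groups.
points = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (9, 8.5), (8.5, 9)]
centroids, clusters = kmeans(points, k=2)
for centroid, members in zip(centroids, clusters):
    print(centroid, sorted(members))
```

Each cluster is then summarized by its centroid, which is exactly the loss of fine detail in exchange for simplification described above.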

REGRESSION:

Regression analysis is the data mining process used to identify and analyze the relationship between
a variable and the other factors (variables) that influence it. It is used to estimate the likely
value of a specific variable from those factors.
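A minimal sketch of simple linear regression (ordinary least squares with one explanatory variable) in plain Python. The advertising and sales figures are hypothetical and were chosen to lie exactly on the line y = 1 + 2x so that the result is easy to check.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b*x for one explanatory variable."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx            # slope
    a = mean_y - b * mean_x  # intercept
    return a, b

# Hypothetical data: advertising spend (x) vs. sales (y).
ad_spend = [1, 2, 3, 4, 5]
sales = [3, 5, 7, 9, 11]
a, b = fit_line(ad_spend, sales)
print(a, b)       # 1.0 2.0
print(a + b * 6)  # 13.0, the estimated sales for an advertising spend of 6
```

With real, noisy data the points will not lie exactly on the line; least squares then picks the line that minimizes the sum of squared vertical errors.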

OUTLIER DETECTION:
This data mining technique identifies data items in the data set that do not match an expected pattern
or expected behavior. It may be used in various domains, such as intrusion detection and fraud
detection. It is also known as outlier analysis or outlier mining.
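A simple statistical way to flag outliers is the z-score: how many standard deviations a value lies from the mean. The sketch below uses Python's `statistics` module and assumes hypothetical card transaction amounts and an illustrative threshold of 2 standard deviations.

```python
import statistics

def zscore_outliers(values, threshold=2.0):
    """Return values lying more than `threshold` standard deviations from the mean."""
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)  # population standard deviation
    if stdev == 0:
        return []  # all values are identical, so nothing stands out
    return [v for v in values if abs(v - mean) / stdev > threshold]

# Hypothetical card transaction amounts; one is far from the rest.
amounts = [20, 22, 19, 21, 23, 20, 250]
print(zscore_outliers(amounts))  # [250]
```

A single large outlier also inflates the mean and standard deviation it is measured against, so with very small samples a z-score test can miss it; the threshold here is an assumption, not a standard value.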
