Unit 3 PPT (BA)
• Outlier Analysis: Used to find anomalies, that is, data that doesn’t fit neatly
into patterns. Outlier analysis is especially useful in fraud detection, network
intrusion detection and criminal investigations.
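As an illustration of outlier analysis, the following is a minimal sketch that flags values lying far outside the bulk of the data. It assumes the interquartile-range (IQR) rule with a multiplier of 1.5, which is one common convention and not something these slides prescribe. The function name `find_outliers` and the transaction amounts are hypothetical.

```python
from statistics import quantiles


def find_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR] (IQR rule, assumed here)."""
    q1, _, q3 = quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical card transaction amounts; one is far larger than the rest.
amounts = [52.0, 48.5, 50.0, 47.0, 51.5, 49.0, 980.0, 50.5]
suspicious = find_outliers(amounts)
print(suspicious)
```

In a fraud-detection setting, the flagged values would be candidates for manual review, not proof of fraud.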
Origins of Data Mining
• Data mining has its origins in the fields of statistics and
computer science, and it has evolved as technology and
data collection methods have advanced.
• Today, data mining continues to evolve with the development
of more sophisticated algorithms, big data technologies, and
increased automation, making it an essential component of
data analytics and business intelligence.
• Contributing fields and drivers include Statistics, Machine
Learning, Databases, Knowledge Discovery in Databases (KDD),
Rapid Growth of Data, Data Warehousing, Industry Applications,
and Research and Academia.
Data Mining Tasks
• Data mining encompasses various tasks and
techniques for extracting valuable insights and
patterns from large, complex datasets. The choice
of task and technique depends on the specific
problem and the data at hand. These tasks can be
broadly categorized into the following:
• Classification
• Regression
• Clustering
• Association Rule Mining
• Anomaly Detection
• Sequential Pattern Mining
• Text Mining and Natural Language Processing (NLP)
• Time Series Analysis
• Dimensionality Reduction
• Recommendation Systems
• Graph Mining
• Spatial Data Mining
• Ensemble Methods
OLAP (Online Analytical Processing) and
Multidimensional Analysis
Online Analytical Processing (OLAP)
• OLAP (Online Analytical Processing) and
multidimensional data analysis are techniques and
technologies used in data warehousing and business
intelligence to help users analyze and explore large sets
of data. They are particularly useful for decision support
and reporting. OLAP is a category of software tools and
applications that let users interact with and analyze data
from various perspectives, which makes complex,
multidimensional analysis easier. OLAP systems are
designed for query performance and typically support
the following operations:
• Slice: View a single "slice" or cross-section of data to focus on specific
attributes or dimensions.
• Dice: Select and view a subset of data based on multiple criteria or
dimensions.
• Pivot (rotate): Change the orientation of data to look at it from different
angles or dimensions.
• Drill-down: Navigate from summary-level data to more detailed data.
• Roll-up: Aggregate data to a higher-level summary.
• Measure: Apply mathematical functions to the data, such as
calculating sums, averages, or percentages.
• Hierarchy navigation: Explore data hierarchies, such as drilling down
from years to quarters to months.
• OLAP systems typically use a multidimensional data model, which
organizes data into dimensions, facts, and measures, making it more
suitable for complex and interactive analysis.
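The OLAP operations listed above can be sketched on a tiny in-memory fact table. This is only an illustration in plain Python, not how an OLAP engine is implemented: the rows, the dimension names (`year`, `quarter`, `region`, `product`), the `sales` measure, and the helper functions are all hypothetical.

```python
from collections import defaultdict

# Hypothetical fact table: four dimensions and one measure (sales).
FACTS = [
    {"year": 2023, "quarter": "Q1", "region": "North", "product": "Laptop", "sales": 100},
    {"year": 2023, "quarter": "Q1", "region": "South", "product": "Laptop", "sales": 80},
    {"year": 2023, "quarter": "Q2", "region": "North", "product": "Phone", "sales": 60},
    {"year": 2023, "quarter": "Q2", "region": "South", "product": "Phone", "sales": 40},
    {"year": 2024, "quarter": "Q1", "region": "North", "product": "Laptop", "sales": 120},
    {"year": 2024, "quarter": "Q1", "region": "South", "product": "Phone", "sales": 50},
]


def slice_(facts, dim, value):
    """Slice: fix one dimension to a single value."""
    return [row for row in facts if row[dim] == value]


def dice(facts, **criteria):
    """Dice: keep rows whose dimensions fall in the allowed value sets."""
    return [row for row in facts
            if all(row[dim] in allowed for dim, allowed in criteria.items())]


def aggregate(facts, dims, measure="sales"):
    """Sum the measure grouped by the given dimensions.

    Fewer dimensions gives a roll-up; more dimensions gives a drill-down.
    """
    totals = defaultdict(int)
    for row in facts:
        totals[tuple(row[d] for d in dims)] += row[measure]
    return dict(totals)


def pivot(facts, row_dim, col_dim, measure="sales"):
    """Pivot: lay one dimension out as rows and another as columns."""
    table = defaultdict(lambda: defaultdict(int))
    for row in facts:
        table[row[row_dim]][row[col_dim]] += row[measure]
    return {r: dict(cols) for r, cols in table.items()}


north = slice_(FACTS, "region", "North")                    # slice
laptops_2023 = dice(FACTS, year={2023}, product={"Laptop"})  # dice
by_year = aggregate(FACTS, ["year"])                         # roll-up
by_quarter = aggregate(FACTS, ["year", "quarter"])           # drill-down
region_by_year = pivot(FACTS, "region", "year")              # pivot
print(by_year, by_quarter, region_by_year, sep="\n")
```

Here roll-up and drill-down are the same aggregation run at different levels of the year-to-quarter hierarchy, which is why a single `aggregate` helper serves both.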
Multidimensional Data Analysis