BusinessIntelligence 2023

Download as pptx, pdf, or txt
Download as pptx, pdf, or txt
You are on page 1of 36

Advanced Databases

Business Intelligence Technologies


Dr David Hamill
Dependent vs
Independent Data Marts
Data Marts

Overview 1/2 Issues with


Architecture
of Data
Data Marts
Marts

Reasons for
creating
Data Marts
Business Intelligence
Technologies

Online Analytic Processing


(OLAP)
Overview 2/2
Multi-dimensional Data

Data Mining
Typical Data
Warehouse and
Data Mart
Architecture
Data Mart (DM)
• A subset of data warehouse that supports the requirements
of a particular department or business function

• Characteristics:
• Focuses only on requirement of one department/business
function.
• Do not normally contain detailed operational data unlike data
warehouse.
• More easily understood and navigated.
Reasons for Creating a Data Mart

• Gives users access to data they need to analyse most often.


• Provides data in a format matching the collective view of the data by a group
of users in a department or those interested in a particular business function.
• To import end-user response time.
• Provide appropriate structured data as dictated by requirement of end users
(OLAP/datamining tools).
• Data Marts use less data and simplify ETL , in comparison to EDW.
• Cost of implementing data marts is normally less than EDW.
• Users of data mart more clearly defined and can be targeted to obtain
support for a data mart project.
Data Mart Issues
• Functionality
• Size
• Load Performance
• User access to data in multiple data marts.
• Internet/Intranet access
• Administration
• Installation
Types of Data Mart
• Dependent data mart: created directly from a subset of the data in
a data warehouse. It does not exist on its own:
• Inmons Corporate Information factory methodology –the data mart
acts as a database getting information from the enterprise data
warehouse.

• Independent data mart: a small-scale data warehouse that supports


the requirements of a particular department or business function:
• Kimballs methodology where the data mart exists as a single star schema
(see later) for a particular business function.
Independent Data Marts
Advantages:
• Easy to build organisationally.
• Easy to build technically.

Disadvantages:
• Business Enterprise view unavailable
• Redundant Data Costs
• High ETL costs
• High DBA and operational costs
Business Intelligence Technologies
• Growth in data warehousing is an ever-increasing demand by users
for more powerful access tools that provide advanced analytical
capabilities.
• Two main types of access tools available to meet this demand, namely
Online Analytical Processing (OLAP) and data mining.
• OLAP and Data Mining differ in what they offer the user and because
of this they are complementary technologies.
• A data warehouse together with tools like OLAP and/or data mining
are referred to as Business Intelligence (BI) technologies.
Online Analytical Processing (OLAP)
• Enables users to gain a deeper
understanding and knowledge
about various aspects of their
corporate data…..through fast,
consistent, interactive access to a
wide variety of possible views of
the data.
• Allows users to view corporate
data in such a way that it is a
better model of the true
dimensionality of the enterprise.
Online Analytical Processing (OLAP)
• Can easily answer ‘who?’ and
‘what?’ questions, however, ability
to answer ‘why?’ type questions
distinguishes OLAP from general-
purpose query tools.

• Types of analysis ranges from basic


navigation and browsing, to
calculations and to more complex
analyses such as time series and
complex modeling.
OLAP Applications
• Just-In-Time (JIT) information is computed data that usually reflects
complex relationships and is often calculated on the fly. Also, as data
relationships may not be known in advance, the data model must be
flexible.
• OLAP systems are benchmarked using Analytical Queries per Minute
(AQM).
OLAP Applications
Remember OLAP applications have the following features:

• Multi-dimensional views of data

• Support for complex calculations

• Time intelligence
Multi-Dimensional Data
• Seeking relationships between data means we often move beyond the
two-dimensional relational model to develop data cubes (or
hypercubes if more than three dimensions)

• Which is the best representation of multi-dimensional data: relational


table, matrix, or data cube?
Multi-dimensional Data as 3-field Table
versus 2-D Matrix
Multi-dimensional Data as 4-field Table
versus 3-D Cube
Multi-dimensional Data as series of 3-D
Cubes
Multi-dimensional Storage Model
• Represented as a table of facts (numeric measurements) such as
property sales revenue data ... and the association of this data with
other tables (known as dimension tables) such as location (of the
property) and time (of the property sale).

• The fact table holds actual data, or facts, relating to each attribute
and a foreign key to each dimension table.

• The dimension table stores the attributes of the dimension, e.g.


attributes of product, fiscal year, or region.
Multi-dimensional Schema
• Many storage schema have been suggested with two of the most
common ones being:
• The star schema
• The snowflake schema

• With the star schema there is a fact table with a single table for each
dimension.

• With the snowflake schema we also have a fact table, but the
dimensional tables are organised into a hierarchy through normalisation.
Star Schema
Dimension Table
• Fact table contains a Fiscal Quarter
foreign key for each QTR
dimension table Fact Table Year
Beg_Date
Business Results
• Fact tables may be very Dimension Table
Product Product
End_Date

large relative to the Quarter

dimension tables Prod. No


Prod. Name
Region Dimension Table
Sales Revenue
Prod. Desc.
Prod. Style Region
… Subregion

Snowflake Schema
Snowflake Schema – a variant of the star schema where each
dimension can have its own dimensions
Dimension Table Dimension
PName Table
Dimension Table Fact Table Sales Revenue
Prod. Name
Product Region
Prod. Desc. Business Results
Subregion
Prod. No Product
Prod. Name Quarter
Prod. LineNo Region

Dimension Table Dimension


PLine Table
Fiscal Quarter
Qtr
Prod. LineNo Year
Prod. Line Name Begin Date
SQL Example
• OLAP extensions have been built int SQL.
We will discuss two:
• ROLLUP
• CUBE

• Both are extensions of the GROUP BY


function that provides an aggregation
(SUM, COUNT, MAX, MIN, and AVG) of
selected data columns.
ROLLUP Example
CUBE
Table Query to form cube CUBE result
Data Mining
• The process of extracting valid, previously unknown,
comprehensible, and actionable information from large databases and
using it to make crucial business decisions (Simoudis,1996).

• Involves the analysis of data and the use of software techniques for
finding hidden and unexpected patterns and relationships in sets of
data.

• The process of searching data for unanticipated new knowledge.


Data Mining
• Reveals information that is hidden and unexpected…as no point in
finding patterns and relationships that are intuitive.

• Patterns and relationships are identified by examining the underlying


rules and features in the data.

• Tends to work from the data up and most accurate results normally
require large volumes of data to deliver reliable conclusions.
Data Mining
• Data mining can provide huge paybacks for companies who
have made a significant investment in data warehousing.

• Relatively new technology, however already used in a


number of industries.
Data Mining Techniques
• Predictive Modeling:
• Using observations to form a model of characteristics of some phenomenon.
• Database segmentation:
• Partition database into an unknown number of segments/clusters of similar
records.
• Link analysis:
• Establish links/associations between individual records (often measured over
time)
• Deviation detection:
• Identifies outliers expressing deviation from previously known expectations
Predictive Modelling
Classification: Value Prediction:
Database segmentation
Link Analysis
Deviation Detection
Examples of Applications of Data Mining
• Retail / Marketing
• Identifying buying patterns of customers
• Finding associations among customer demographic characteristics
• Predicting response to mailing campaigns
• Market basket analysis
Examples of Applications of Data Mining
• Banking
• Detecting patterns of fraudulent credit card use

• Identifying loyal customers

• Predicting customers likely to change their credit card affiliation

• Determining credit card spending by customer groups


Examples of Applications of Data Mining
• Insurance
• Claims analysis

• Predicting which customers will buy new policies

• Medicine
• Characterising patient behaviour to predict surgery visits

• Identifying successful medical therapies for different illnesses

You might also like