Ch6 - Data Warehouse in The The Age of Big Data

Download as pdf or txt
Download as pdf or txt
You are on page 1of 16

DATA WAREHOUSE AND

INTEGRATION
Lecturer: Nguyễn Văn Hồ, M.A.
Data Warehouse and Integration:
Data Warehouse in the the age of Big data
Nguyễn Văn Hồ, M.A.
[email protected]
Data Warehouse and Integration

Comparison Chart of Database Types

Data warehouse Operational system


Subject oriented Transaction oriented

Large (hundreds of GB up to several TB) Small (MB up to several GB)

Historic data Current data

De-normalized table structure (few tables, Normalized table structure (many tables,
many columns per table) few columns per table)

Batch updates Continuous updates

Usually very complex queries Simple to complex queries


Data Warehouse and Integration

What’s wrong with reporting from Operational System?


 OLTP – On Line Transaction Source

Processing 1 Source
1

 Designed for working with single Source


2

record at a time
 Data is highly “normalized”, i.e. Source
1

duplicate values have been removed Source


2
Source

 Getting all data for a record can 1

Source
involve many table joins 2

 Can be quite confusing for


‘ad-hoc’ reporting Source

 Can also be slow, having an


1

Source

impact on the OLTP system 2

32
Data Warehouse and Integration

A typical enterprise data warehousing system architecture


- Ad-hoc Queries - Statistical tools
An enterprise data warehousing - Standard Reports - Operational BI Apps
Front-end
Tools
system architecture with - Desktop OLAP - Data Mining
ROLAP/MOLAP/Hybrid OLAP LAN

Hybrid OLAP Data Marts


ROLAP MOLAP and OLAP
Data mart Data mart Engines
Departmentally
Structured
Sub setting, summarizing, customizing
DW Storage
Data Metadata Repository
Warehouse Metadata

ETL (Extract, Transform, Load)


ETL Management
Staging DB/ ODS Services

Other Archived External


OLTP ERP
RDMS Systems
Internal Data
sources Data source
Sources Systems
33
DW+BI Systems Used to Be Fairly Straightforward
Modern Data Warehousing
Why Data Lake is needed?
Data Warehouse and Integration

Objectives of Data Lake


• Reduce up-front effort by ingesting data in any
format without requiring a schema initially
• Make acquiring new data easy, so it can be
available for data science & analysis quickly
• Store large volume of multi-structured data in
its native format
• Achieve agility faster than a traditional data
warehouse can
• Speed up decision-making ability
• Storage for additional types of data which
were historically difficult to obtain
Data Warehouse and Integration

Growing your DW/BI/Analytics Environment

Cloud & Modern DW Advanced


Hybrid Multi-Platform Analytics
Platforms Architecture

Self- Agile,
Real-Time
Service Nimble
Reporting
BI Solutions
Data Warehouse and Integration

Achieving Extensibility in a DW
Design with change in mind. Ex: Create a lookup table with
code/descriptions, or implement in a view, rather than hard-coding in ETL.

Plan for a hybrid environment with multiple architectures.

Introduce conformed dimensions first whenever possible.

Try to avoid isolated “stovepipe” implementations unless the isolation


is absolutely intended.

Conduct active prototyping sessions with business users to flush out


requirements. A data modeling tool like Power BI works well for this.
Data Warehouse and Integration

Achieving Extensibility in a DW
Be prepared to do some refactoring along the way. Ex: converting an
attribute to be a conformed dimension.
DimCustomer
First implementation: FactSalesInvoice
CustomerName
CustomerRegion

Updated in a later iteration:


DimRegion
FactWarrantyRequest FactSalesInvoice
DimCustomer
Data Warehouse and Integration

Achieving Extensibility in a DW
Introducing new measures:
• Can be a new column in a fact table as long as it’s the same grain & the
same business process

Introducing new attributes:


• Can be a new column in a dimension, or
• Can be via a new foreign key in a fact table as long as it doesn’t affect
the grain

Agility for the things that usually require the most time investment:
• Data modeling
• ETL processes
• Data quality
Data Warehouse and Integration

Achieving Extensibility in a DW
Reusability Downstream Speed of Change
Implemented

DW OLAP Reports

Consider using an OLAP cube or in-memory model (like Analysis


Services) for:
• Summary data (as opposed to summary tables in your DW)
• Year-to-Date type of calculations
• Year-over-Year type of calculations
• Aggregate level calculations (as opposed to row-by-row calculations)
Data Warehouse and Integration

Modern DW: Important Concepts to Know


Polygot Lambda Schema on
Persistence Architecture Read
Using the most Data processing Data structure is
effective data architecture applied at query
storage technology which supports time rather than
to handle different large amounts of when the data is
data storage needs data via a speed initially stored
layer, batch layer,
and serving layer

You might also like