Data Warehouse OLAP OLTP
Data Warehouse OLAP OLTP
Data Warehouse OLAP OLTP
Analytical Processing
Data Warehouse: Basic Concepts
What is a Data Warehouse?
• Defined in many different ways, but not rigorously.
• A decision support database that is maintained separately from the
organization’s operational database
• Support information processing by providing a solid platform of
consolidated, historical data for analysis.
• Data warehousing:
• The process of constructing and using data warehouses
2
Data Warehouse—Subject-Oriented
• Organized around major subjects, such as customer, product,
sales
• Focusing on the modeling and analysis of data for decision
makers, not on daily operations or transaction processing
• Provide a simple and concise view around particular subject
issues by excluding data that are not useful in the decision
support process
3
Data Warehouse—Integrated
• Constructed by integrating multiple, heterogeneous data
sources
• relational databases, flat files, on-line transaction records
• Data cleaning and data integration techniques are applied.
• Ensure consistency in naming conventions, encoding
structures, attribute measures, etc. among different data
sources
• E.g., Hotel price: currency, tax, breakfast covered, etc.
• When data is moved to the warehouse, it is converted.
4
Data Warehouse—Time Variant
• The time horizon for the data warehouse is significantly longer
than that of operational systems
• Operational database: current value data
• Data warehouse data: provide information from a historical
perspective (e.g., past 5-10 years)
• Every key structure in the data warehouse
• Contains an element of time, explicitly or implicitly
• But the key of operational data may or may not contain
“time element”
5
Data Warehouse—Nonvolatile
• A physically separate store of data transformed from the
operational environment
• Operational update of data does not occur in the data
warehouse environment
• Does not require transaction processing, recovery, and
concurrency control mechanisms
• Requires only two operations in data accessing:
• initial loading of data and access of data
6
OLTP vs. OLAP
OLTP OLAP
users clerk, IT professional knowledge worker
function day to day operations decision support
DB design application-oriented subject-oriented
data current, up-to-date historical,
detailed, flat relational summarized, multidimensional
isolated integrated, consolidated
usage repetitive ad-hoc
access read/write lots of scans
index/hash on prim. key
unit of work short, simple transaction complex query
# records accessed tens millions
#users thousands hundreds
DB size 100MB-GB 100GB-TB
metric transaction throughput query throughput, response
7
Why a Separate Data Warehouse?
• High performance for both systems
• DBMS— tuned for OLTP: access methods, indexing, concurrency control,
recovery
• Warehouse—tuned for OLAP: complex OLAP queries, multidimensional view,
consolidation
• Different functions and different data:
• missing data: Decision support requires historical data which operational
DBs do not typically maintain
• data consolidation: DS requires consolidation (aggregation, summarization)
of data from heterogeneous sources
• data quality: different sources typically use inconsistent data
representations, codes and formats which have to be reconciled
• Note: There are more and more systems which perform OLAP analysis directly on
relational databases
8
Data Warehouse: A Multi-Tiered Architecture
Monitor
Metadata & OLAP Server
Other
sources Integrator
Analysis
Operational Extract Query
DBs Transform Data Serve Reports
Load
Refresh
Warehouse Data mining
Data Marts
• Data extraction
• get data from multiple, heterogeneous, and external sources
• Data cleaning
• detect errors in the data and rectify them when possible
• Data transformation
• convert data from legacy or host format to warehouse format
• Load
• sort, summarize, consolidate, compute views, check integrity,
and build indicies and partitions
• Refresh
• propagate the updates from the data sources to the
warehouse
11
Metadata Repository
• Meta data is the data defining warehouse objects. It stores:
• Description of the structure of the data warehouse
• schema, view, dimensions, hierarchies, derived data defn, data mart locations and
contents
• Operational meta-data
• data lineage (history of migrated data and transformation path), currency of data
(active, archived, or purged), monitoring information (warehouse usage statistics,
error reports, audit trails)
• The algorithms used for summarization
• The mapping from operational environment to the data warehouse
• Data related to system performance
• warehouse schema, view and derived data definitions
• Business data
• business terms and definitions, ownership of data, charging policies
12