Unit-3 - I MGN 343
Unit-3 - I MGN 343
Unit-3 - I MGN 343
WAREHOUSING
Learning Objectives
• Understand the basic definitions and concepts
of data warehouses
• Understand data warehousing architectures
• Describe the processes used in developing and
managing data warehouses
• Explain data warehousing operations
• Explain the role of data warehouses in decision
support
Learning Objectives
• Explain data integration and the
extraction, transformation, and load (ETL)
processes
• Describe real-time (active) data
warehousing
• Understand data warehouse
administration and security issues
Main Data Warehousing
Topics
• DW definition
• Characteristics of DW
• Data Marts
• ODS, EDW, Metadata
• DW Framework
• DW Architecture & ETL Process
• DW Development
• DW Issues
Data Warehousing
Definitions and Concepts
• Data warehouse
A physical repository where relational data
are specially organized to provide
enterprise-wide, cleansed data in a
standardized format
What is a Data Warehouse?
• “The data warehouse is a collection of
integrated, subject-oriented databases
designed to support DSS functions,
where each unit of data is non-volatile
and relevant to some moment in time”
Data Warehousing
Definitions and Concepts
• Characteristics of data warehousing
– Subject oriented (Sales, products, or customers)
– Integrated
– Time variant (time series)
– Nonvolatile
– Web based
– Relational/multidimensional
– Client/server
– Real-time
– Include metadata
A Historical Perspective to
Data Warehousing
ü Mainframe computers ü Centralized data storage ü Big Data analytics
ü Simple data entry ü Data warehousing was born ü Social media analytics
ü Routine reporting ü Inmon, Building the Data Warehouse ü Text and Web Analytics
ü Primitive database structures ü Kimball, The Data Warehouse Toolkit ü Hadoop, MapReduce, NoSQL
ü Teradata incorporated ü EDW architecture design ü In-memory, in-database
/ Middleware
Legacy Metadata Data/text
Extract mining
Data mart
(Engineering)
Transform Enterprise
POS Data warehouse
OLAP,
Integrate
API
Data mart Dashboard,
(Finance) Web
Other Load
OLTP/wEB
Replication Data mart
(...) Custom built
External
applications
data
Data warehouse is
• None of these
Decision support systems
(DSS) is
• A.
• A family of relational database management systems marketed by IBM
• B.
• Interactive systems that enable decision makers to use databases and
models on a computer in order to solve ill-structured problems
• C.
• It consists of nodes and branches starting from a single root node. Each
node represents a test, or decision
• D.
• None of these
Which of these is an advantage of the
database approach to data storage?
• Data independence
• Data consistency
• Data quality
• All of the above.
___________ is a data warehouse
that is limited in scope.
• A data strip
• A database
• A data box
• A data mart
A data warehouse is which of
the following?
Packaged Transient
application data source
Data
warehouse
Legacy
Extract Transform Cleanse Load
system
Data mart
Other internal
applications
ETL (Extract, Transform,
Load)
• Issues affecting the purchase of an ETL tool
– Data transformation tools are expensive
– Data transformation tools may have a long learning
curve
• Important criteria in selecting an ETL tool
– Ability to read from and write to an unlimited number
of data sources/architectures
– Automatic capturing and delivery of metadata
– A history of conforming to open standards
– An easy-to-use interface for the developer and the
functional user
Data Warehousing Architectures
Web pages
Application
Server
Client Web
(Web browser) Internet/ Server
Intranet/
Extranet
Data
warehouse
Data Warehousing Architectures
Slicing Operations on a
Simple Tree-Dimensional e
Data Cube m
Ti
Product
Geography
Cells are filled
with numbers
Sales volumes of
representing a specific Region
sales volumes on variable Time
and Products
Sales volumes of
a specific Time on
variable Region
and Products
Variations of OLAP
• Multidimensional OLAP (MOLAP)
OLAP implemented via a specialized
multidimensional database (or data store) that
summarizes transactions into
multidimensional views ahead of time
• Relational OLAP (ROLAP)
The implementation of an OLAP database on
top of an existing relational database
• Database OLAP and Web OLAP (DOLAP and
WOLAP)...
DW Implementation Issues
• Identification of data sources and governance
• Data quality planning, data model design
• ETL tool selection
• Establishment of service-level agreements
• Data transport, data conversion
• Reconciliation process
• End-user support
• Political issues
• … more in the book
Successful DW
Implementation
Things to Avoid
• Starting with the wrong sponsorship chain
• Setting expectations that you cannot meet
• Engaging in politically naive behavior
• Loading the data warehouse with information
just because it is available
• Believing that data warehousing database
design is the same as transactional database
design
• Choosing a data warehouse manager who is
technology oriented rather than user oriented
• … more in the book
Failure Factors in DW
Projects
• Lack of executive sponsorship
• Unclear business objectives
• Cultural issues being ignored
– Change management
• Unrealistic expectations
• Inappropriate architecture
• Low data quality / missing information
• Loading data just because it is available
Massive DW and
Scalability
• Scalability
– The main issues pertaining to scalability:
• The amount of data in the warehouse
• How quickly the warehouse is expected to grow
• The number of concurrent users
• The complexity of user queries
– Good scalability means that queries and
other data-access functions will grow linearly
with the size of the warehouse
Real-Time/Active DW/BI
• Enabling real-time data updates for real-
time analysis and real-time decision
making is growing rapidly
– Push vs. Pull (of data)
• Concerns about real-time BI
– Not all data should be updated continuously
– Mismatch of reports generated minutes apart
– May be cost prohibitive
– May also be infeasible
Enterprise Decision Evolution
and Data Warehousing
Real-Time/Active DW at
Teradata
Traditional versus Active
DW
DW Administration and
Security
• Data warehouse administrator (DWA)
– DWA should…
• have the knowledge of high-performance software,
hardware and networking technologies
• possess solid business knowledge and insight
• be familiar with the decision-making processes so as to
suitably design/maintain the data warehouse structure
• possess excellent communications skills
• Security and privacy is a pressing issue in DW
– Safeguarding the most valuable assets
– Government regulations (HIPAA, etc.)
– Must be explicitly planned and executed
The Future of DW
• Sourcing…
– Web, social media, and Big Data
– Open source software
– SaaS (software as a service)
– Cloud computing
• Infrastructure…
– Columnar
– Real-time DW
– Data warehouse appliances
– Data management practices/technologies
– In-database & In-memory processing New DBMS
– Advanced analytics
– …
End of the Chapter
• Questions, comments