Data Warehousing

Download as pptx, pdf, or txt
Download as pptx, pdf, or txt
You are on page 1of 25

Data Warehousing

Defining a data warehouse (DW)

 A repository of current and historical data

 A pool of data produced to support decision making in organisations

 A historical data store.

 Single version of the truth.

2
Data Warehousing: Process Overview
 Organizations continuously collect data, information, and knowledge at an
increasingly accelerated rate and store them in computerized systems

 The number of users needing to access the information continues to increase


as a result of improved reliability and availability of network access, especially
the Internet

3
Benefits of a Data Warehouse
 Direct benefits of a data warehouse
 Allows end users to perform extensive analysis
 Allows a consolidated view of corporate data
 Better and more timely information
 Enhanced system performance
 Simplification of data access
 Indirect benefits result from end users using these direct benefits
 Enhance business knowledge
 Present competitive advantage
 Enhance customer service and satisfaction
 Facilitate decision making
 Help in reforming business processes

4
Data Warehousing Characteristics
 Basic (Inmon)
 Subject oriented
 Integrated
 Time variant (time series)
 Nonvolatile

 Extra
 Relational/multidimensional
 Real-time

5
Types of data warehouse

Data Marts

• Segments of the data warehouse that focus on a particular


subject or department

Enterprise Data Warehouses


EDW

• Large-scale data-warehouse used across the enterprise


• Allows for effective BI to take place and provide support
for decision support applications 6
EDW
 Enterprise data warehouse (EDW)
A physical repository where relational data are specially organized to
provide enterprise-wide, cleansed data in a standardized format.

 Metadata
Data about data. In a data warehouse, metadata describe the contents of a
data warehouse and the manner of its use

7
Data Marts

 Data mart: A departmental data warehouse that stores


only relevant data

 Dependent data mart: A subset that is created directly


from a data warehouse

 Independent data mart: A small data warehouse


designed for a strategic business unit or a department;
independent of the EDW

8
ODS
 Operational data stores (ODS) is a type of database
often used as an interim area for a data warehouse,
especially for customer information files

 Oper marts is an operational data mart. An oper mart is a


small-scale data mart typically used by a single
department or functional area in an organization

9
Data Warehousing Components

 The major components of the DW process


 Data sources
 Data extraction
 Data loading
 Comprehensive database
 Metadata

 Middleware tools
 software that acts as a bridge between an operating system or
database and applications, especially on a network.
10
Data Warehousing Architectures
 Three parts of the data warehouse
 Data acquisition (back-end) software that extracts data from legacy systems and
external sources, consolidates and summarizes them, and loads them into the data
warehouse

 The data warehouse that contains the data and associated software

 Client (front-end) software that allows users to access and analyze data from the
warehouse

11
EDW Architecture

12
Hub-and-spoke Architecture

14
The Problem
Operational Systems Decision Makers

Accts. Payable
Accts. Receivable Marketing

Invoicing Supply Chain


Sales/Orders Finance
Finance G/L
Risk Management
Customer Support
HR Maintenance

Payroll Sales
Purchasing Operations
Order Fulfillment
Inventory
Manufacturing
Inventory … Call Center …

Proliferation of Data Marts has resulted in


fragmented data, higher costs, poor decisions
The EDW Solution
Operational Systems Decision Makers

Accts. Payable
Accts. Receivable Marketing

Invoicing Supply Chain


Sales/Orders Finance
Finance G/L
Enterprise Risk Management
Customer Support Data
HR
Warehouse Maintenance
(EDW)
Payroll Sales
Purchasing Operations
Order Fulfillment
Inventory
Manufacturing
Inventory … Call Center …

Integrated data provides consistency of data,


lower costs, better decisions
Factors influencing architecture decision

o Information interdependence between organizational units


o Upper management information needs
o Urgency of need for a data warehouse
o Nature of end-user tasks
o Constraints on resources
o Compatibility with existing systems
o Perceived ability of the in-house IT staff
o Social/political factors

17
Data Integration and the Extraction,
Transformation, and Load (ETL) Process

 Data integration
Integration that comprises three major processes:
1. Data access: the ability to access data from the source
2. Data federation: the integration of business views data
from various sources
3. Change capture: capture and delivery of the changes
made to the data sources

When these three processes are correctly implemented, data


can be accessed and made accessible to an array of ETL
and analysis tools and DW environments 18
Data Integration and the Extraction,
Transformation, and Load (ETL) Process

 Extraction, transformation, and load (ETL)


A data warehousing process that consists of:
 extraction (reading data from a database),

 transformation (converting the extracted data from its previous


form into the form in which it needs to be so that it can be
placed into a data warehouse or simply another database),

 loading (putting the data into the data warehouse)

19
Data Integration and the Extraction,
Transformation, and Load (ETL) Process

20
Data Warehouse Development

 Data warehouse development approaches


 Inmon Model: EDW approach
 Kimball Model: Data mart approach

 Which model is best?


 There is no one-size-fits-all strategy to data warehousing
 One alternative is the hosted warehouse

21
DW Development best practice
 Project must fit corporate strategy and business objectives
 There must be complete buy-in to the project by executives, managers,
and users
 It is important to manage user expectations about the completed project
 The data warehouse must be built incrementally
 The project must be managed by both IT and business professionals
 Develop a business/supplier relationship
 Only load data that have been cleansed and are of a quality understood
by the organization
 Do not overlook training requirements
22
Real-Time Data Warehousing
 Real-time (active) data warehousing
The process of loading and providing data via a data warehouse as
they become available

 Levels of data warehouses:


1. Reports what happened
2. Some analysis occurs
3. Provides prediction capabilities,
4. Operationalization
5. Becomes capable of making events happen

23
Real-Time Data Warehousing

24
Real-Time Data Warehousing
 The need for real-time data
 Business often cannot afford to wait a whole day for its operational data to load into the
data warehouse for analysis
 Provides incremental real-time data showing every state change
 Less costly to develop, maintain, and secure one huge data warehouse so that data are
centralized for BI/BA tools

 Data warehouse administrator (DWA)


A person responsible for the administration and management of a data warehouse

 Effective security in a data warehouse should focus on four main areas:


 Establishing effective corporate and security policies and procedures
 Implementing logical security procedures and techniques to restrict access
 Limiting physical access to the data center environment
25
 Establishing an effective internal control review process with an emphasis on security and privacy
Selecting a Data Warehouse Vendor

 Six guidelines to considered when developing a vendor list:

1. Financial strength
2. ERP linkages
3. Qualified consultants
4. Market share
5. Industry experience
6. Established partnerships

26

You might also like