Data Warehousing

Download as ppt, pdf, or txt
Download as ppt, pdf, or txt
You are on page 1of 28

Data warehousing

The basic concepts & its


applications
What is a Data warehouse?
It means…
 collection of databases with information
extracted from other databases.
 a repository of an organization's
electronically stored data
 model of corporation's data, put together
to answers the corporation's business
questions.
 A data warehouse is designed to support
strategic decision making.
 designed to facilitate reporting and
analysis.
Comparing a Data Warehouse
and an Operational Database
Data Warehouse, Operational Database

subject oriented application oriented

integrated multiple diverse sources

time –variant, current real-time

nonvolatile updateable
History of DATA
WAREHOUSING
 concept dates back to the late
1980s when IBM researchers Barry
Devlin and Paul Murphy developed
"business data warehouse".
 In essence, data warehousing
concept was intended to provide an
architectural model for the flow of
data from operational systems
to decision support environments
 In absence of a data warehousing
architecture, an enormous amount of
redundancy of information was required to
support the multiple decision support
environments that usually existed.

 In larger corporations it was typical for


multiple decision support environments to
operate independently. Each environment
served different users but often required
much of the same data.

 The process of gathering, cleaning and


integrating data from various sources,
usually long existing operational systems
was replicated for each environment.
 the operational systems were frequently
reexamined as new decision support
requirements emerged.

 Based on analogies with real-life


warehouses, data warehouses were
intended as large-scale
collection/storage/staging areas for
corporate data.

 Data could be retrieved from one central


point or data could be distributed to "retail
stores" or "data marts" which were tailored
Types of Data Warehousing
Applications
 Data warehousing systems target three
different types of applications:

 personal productivity

 query and reporting

 planning and analysis


"Personal productivity applications

 spreadsheets, statistical packages


and graphics tools,
 are useful for manipulating and
presenting data on individual PCs.
 Developed for a standalone
environment, these tools address
applications requiring only small
volumes of warehouse data.
"Data query and reporting applications

 deliverwarehouse-wide data access


through simple, list-oriented queries,
 the generation of basic reports.
 These reports provide a view of historical
data
 they do not address the enterprise need
for in-depth analysis and planning.
"Planning and analysis applications

address essential business requirements as

 budgeting,
 forecasting,
 product line and customer profitability,
 sales analysis,
 financial consolidations
 manufacturing mix analysis
 --applications that use historical, projected and derived
data.
Benefits of Data
Warehousing
 Has a subject area orientation
 Data is arranged by subject rather than by
application, and is more intuitive for users to
navigate.
 closer in concept to the way decision makers
think about their business.

 Integrates data from multiple,


diverse sources
 the integration provided by a data warehouse
was seen as very desirable.
 Allows grouping multiple, diverse sources of
data & integrating them into single
administrative suite
 Allows for analysis of data over
time
 data snapshots taken at times that are significant to the
decision making process make it possible to analyze trends
over time.
 This can readily indicate areas where further analysis is
warranted.

 Adds ad hoc reporting and


enquiry
 In addition to the points highlighted in the definition, a data
warehouse is designed to be accessible with end-user tools,
 Relieves the development burden on
IT
 With data warehouse and users trained in the use of
appropriate desktop tools, users can find answers to their
own questions.
 Provides improved performance for complex analytical
queries

 provides improved performance for


complex analytical queries
 Using denormalization and aggregation.
 Frequently used aggregates are often precomputed and
materialized in views known as summary tables
 These materialized views provide fast access to integrated
 Relieves processing burden on
transaction oriented databases
 Provides analysis capabilities to
decision makers
 Allows for a continuous planning
process
 Converts Corporate Data into
Strategic Information
Drawbacks of data
warehousing
 1. Time spent in careful analysis of
measurable needs
 2. Design and implementation effort
 3. Hardware costs
 4. Software costs
 5. On-going support and maintenance
 6. Resulting re-engineering effort
Design process of data
warehouse
The actual design process for developing
data warehouse

 talk to the users


 determine their needs in terms that can be measured
 design a database to support those needs
 document the data descriptions and other attributes
 design the logic for translating data from various sources
into an integrated data store
 Writing code for extracting data from various sources &
transforming it into data warehouse, with updates to the
metadata
 Finally package the procedures to handle scheduling,
management and maintenance
Data Warehouse Architecture

 Architecture,
in the context of an
organization's data warehousing efforts,
is a conceptualization of how the data
warehouse is built.

 conceptualizationof data warehouse


architecture consists of the following
interconnected layers:
 OPERATIONAL DATABASE LAYER
 source data for the data warehouse - An organization's EIS systems fall
into this layer.

 INFORMATIONAL ACCESS LAYER


 data accessed for reporting and analyzing & tools for reporting and
analyzing data - Business intelligence tools fall into this layer.

 DATA ACCESS LAYER


 The interface between the operational and informational access layer -
Tools to extract, transform, load data into the warehouse fall into this
layer.

 METADATA LAYER
 The data directory -This is often usually more detailed than an
operational system data directory. There are dictionaries for the entire
warehouse and sometimes dictionaries for the data that can be
accessed by a particular reporting and analysis tool.
Data Warehouse
Components
 LOAD MANAGEMENT
 relates to the collection of info. from disparate internal or external
sources.
 loading process includes summarizing, manipulating and changing data
structures into format that lends itself to analytical processing.
 Actual raw data kept alongside or within data warehouse enabling
construction of new and different representations.

 WAREHOUSE MANAGEMENT
 relates to day-to-day management of data warehouse.
 management tasks associated with warehouse includes ensuring its
availability, the effective backup of its contents & security.

 QUERY MANAGEMENT
 relates to provision of access to contents of warehouse
 includes partitioning of info. into different areas with different
privileges to different users.
 Access may be provided through custom-built applications, or ad hoc
query tools.
Technologies Involved In Data
Warehousing

 source system identification:


 data warehouse design and creation:
 changed data capture:
 data acquisition:
 data cleansing:
 data aggregation
 multi-dimensional analysis tools:
 business intelligence (bi)
 metadata management
 data mining tools:
 data visualization tools:
 query tools:
APPROACHES TO STORING
DATA
 leading approaches
to storing data in a
data warehouse –
 DIMENSIONAL
APPROACH
 NORMALIZED
APPROACH.
DIMENSIONAL APPROACH
 transaction data are partitioned into either "facts" (numeric
transaction data), or "dimensions", (reference information giving
context to facts).

 Key advantage :
 easier for the user to understand and to use.
 the retrieval of data from the data warehouse tends to operate very
quickly.

 disadvantages

 to maintain integrity of facts and dimensions, loading the data


warehouse with data from different operational systems is
complicated
 difficult to modify data warehouse structure if organization adopting
the dimensional approach changes the way in which it does
business.
NORMALIZED APPROACH
 the data in the data warehouse are stored following the Codd
normalization rule.

 Tables grouped by subject areas that reflect general data categories


data on customers, products, finance, etc.)

 Key advantage
 it is straightforward to add information into the database.

 disadvantage
 because of no. of tables involved, difficult for users to
 1) Join data from different sources into meaningful information
 2) Access the information without a precise understanding of
sources of data & data structure of the data warehouse.

You might also like