Oracle BI - Topic - Data Warehousing

Download as docx, pdf, or txt
Download as docx, pdf, or txt
You are on page 1of 4

In today’s competitive market, most successful companies respond quickly to market

changes and opportunities. The requirement to respond quickly is by effective and


efficient use of data and information. “Data Warehouse” is a central repository of data
that is organized by category to support the organization’s decision makers. Once data
is stored in a data warehouse, it can be accessed for analysis.
The term "Data Warehouse" was first invented by Bill Inmon in 1990. According to him,
“Data warehouse is a subject-oriented, integrated, time-variant and non-volatile
collection of data in support of management's decision making process.”
Ralph Kimball provided a definition of data warehouse based on its functionality. He
said, “Data warehouse is a copy of transaction data specifically structured for query
and analysis.”
Data Warehouse (DW or DWH) is a system used for analysis of data and reporting
purposes. They are repositories that saves data from one or more heterogeneous data
sources. They store both current and historical data and are used for creating
analytical reports. DW can be used to create interactive dashboards for the senior
management.
For example, analytic reports can contain data for quarterly comparisons or for annual
comparison of sales report for a company.
Data in DW comes from multiple operational systems like sales, human resource,
marketing, warehouse management, etc. It contains historical data from different
transaction systems but it can also include data from other sources. DW is used to
separate data processing and analysis workload from transaction workload and
enables to consolidate the data from several data sources.

The Need for Data Warehouse


For example − You have a home loan agency, where data comes from multiple
SAP/non-SAP applications such as marketing, sales, ERP, HRM, etc. This data is
extracted, transformed and loaded into DW. If you have to do quarterly/annual sales
comparison of a product, you cannot use an operational database as this will hang the
transaction system. This is where the need for using DW arises.

Characteristics of a Data Warehouse


Some of the key characteristics of DW are −

 It is used for reporting and data analysis.


 It provides a central repository with data integrated from one or more sources.
 It stores current and historical data.

Data Warehouse vs. Transactional System


Following are few differences between Data Warehouse and Operational Database
(Transaction System) −
 Transactional system is designed for known workloads and transactions like updating a user
record, searching a record, etc. However, DW transactions are more complex and present a
general form of data.
 Transactional system contains the current data of an organization whereas DW normally
contains historical data.
 Transactional system supports parallel processing of multiple transactions. Concurrency
control and recovery mechanisms are required to maintain consistency of the database.
 Operational database query allows to read and modify operations (delete and update), while
an OLAP query needs only read-only access of stored data (select statement).
 DW involves data cleaning, data integration, and data consolidations.
DW has a three-layer architecture − Data Source Layer, Integration Layer, and
Presentation Layer. The following diagram shows the common architecture of a Data
Warehouse system.

Types of Data Warehouse System


Following are the types of DW system −

 Data Mart
 Online Analytical Processing (OLAP)
 Online Transaction Processing (OLTP)
 Predictive Analysis
Data Mart
Data Mart is the simplest form of DW and it normally focuses on a single functional
area, such as sales, finance or marketing. Hence, data mart usually gets data only
from few data sources.
Sources could be an internal transaction system, a central data warehouse, or an
external data source application. De-normalization is the norm for data modeling
techniques in this system.

Online Analytical Processing (OLAP)


An OLAP system contains less number of transactions but involves complex
calculations like use of Aggregations − Sum, Count, Average, etc.

What is Aggregation?
We save tables with aggregated data like yearly (1 row), quarterly (4 rows), monthly
(12 rows) and now we want to compare data, like Yearly only 1 row will be processed.
However, in an un-aggregated data, all the rows will be processed.
OLAP system normally stores data in multidimensional schemas like Star Schema,
Galaxy schemas (with Fact and Dimensional tables are joined in logical manner).
In an OLAP system, response time to execute a query is an effectiveness measure.
OLAP applications are widely used by Data Mining techniques to get data from OLAP
systems. OLAP databases store aggregated historical data in multi-dimensional
schemas. OLAP systems have data latency of a few hours as compared to Data Marts
where latency is normally closer to few days.
Online Transaction Processing (OLTP)
An OLTP system is known for large number of short online transactions like insert,
update, delete, etc. OLTP systems provide fast query processing and also responsible
to provide data integrity in multi-access environment.
For an OLTP systems, effectiveness is measured by the number of transactions
processed per second. OLTP systems normally contain only current data. The schema
used to store transactional databases is the entity model. Normalization is used for
data modeling techniques in OLTP system.

OLTP vs OLAP
The following illustration shows the key differences between an OLTP and OLAP
system.

Indexes − In an OLTP system, there are only few indexes while in an OLAP system
there are many indexes for performance optimization.
Joins − In an OLTP system, large number of joins and data is normalized; however, in
an OLAP system there are less joins and de-normalized.
Aggregation − In an OLTP system, data is not aggregated while in an OLAP database
more aggregations are used.

You might also like