Data Warehouse Architecture

Download as pdf or txt
Download as pdf or txt
You are on page 1of 4

DATA WAREHOUSE ARCHITCTURE AND DATA ANALYSIS

TECHNIQUES

Mrs. Vaishali V. Khachane


Dept of Information Technology
VPM’s Polytechnic
Thane, Mumbai
Email: [email protected]
Subsequently used quite differently. A data warehouse
Abstract provides the base for the powerful data analysis
A data warehouse is an architectural construct of an techniques that are available today such as data mining
information system that provides users with current and and multidimensional analysis, as well as the more
historical decision support information that is hard to traditional query and reporting. Making use of these
access or present in traditional operational systems. techniques along with data warehousing can result in
It is also an integrated set of products that enable the easier access to the information you need for more
extraction and transformation of operational data to be informed decision making.
loaded into a database for end-user analysis and
reporting. It usually contains historical data derived Data warehousing provides an excellent approach for
from transaction data, but it includes data from other transforming the vast amounts of data that exist in these
sources. It separates analysis workload from transaction organizations into useful and reliable information for
workload and enables an organization to consolidate getting answers to their questions and to support the
data from several sources. This paper presents concept decision making process. It is globally accepted that
of data warehousing, architecture of data warehouse and information is a very powerful asset that can provide
techniques of data analysis in data warehousing. significant benefits to any organization and a competitive
Query and reporting, multidimensional, analysis, and advantage in the business world. Organizations have vast
data mining run the spectrum of being analyst driven to amounts of data but have found it increasingly difficult to
analyst assisted to data driven. Because of this spectrum, access it and make use of it. This is because it is in many
each of the data analysis methods affects data modeling. different formats, exists on many different platforms, and
resides in many different file and database structures
developed by different vendors. Thus organizations have
Index Terms: Data warehousing, operational data store vs. had to write and maintain perhaps hundreds of programs
data warehouse, Data warehouse Architecture, Data that are used to extract, prepare, and consolidate data for
Analysis techniques use by many different applications for analysis and
reporting. This would typically require modification of
the extract programs or development of new ones. This
process is costly, inefficient, and very time consuming.
I. INTRODUCTION Data warehousing offers a better approach.
A data warehouse is a relational database that is designed
for query and analysis rather than for transaction Operational Data Store Vs Data Warehouse:
processing. It usually contains historical data derived Operational Data Store Data Warehouse
from transaction data, but it can include data from other
sources. In addition to a relational database, a data Character Process-oriented, Focused Subject-oriented,
istics on specific business Integrated, Non-volatile,
warehouse environment includes an extraction,
processes and task, Time-Variant, Support
transportation, transformation, and loading (ETL) Support high volume high volume OLAP
solution, an online analytical processing (OLAP) engine, transaction Processing
client analysis tools, and other applications that manage Age of Current, Near Term Historic (Month, Quarter,
the process of gathering data and delivering it to business the data (Today, Last Week) Year)
users. Primary Day-To-Day Decisions Day-To-Day Decisions
Use: Tactical Reporting Tactical Reporting
Current Operational Current Operational
A data warehouse acts as a centralized repository of an Results Results
organization's data. A data warehouse complements an Frequenc Twice Daily , Daily, Weekly, Monthly,
existing operational system and is therefore designed and y Of Weekly Quarterly[2,3]
Load
B. DATA WAREHOUSE ARCHITECTURE - WITH STAGING
II. DATA WAREHOUSE ARCHITECTURES: AREA

Data warehouse Architecture is a design that We need to clean and process the operational data
encapsulates all the facets of data warehousing for an before putting it into the warehouse. We can do this
enterprise environment. Data warehousing is the creation programmatically, although most data warehouses use a
of a central domain to store complex, decentralized staging area instead. A staging area simplifies building
enterprise data in a logical unit that enables data mining, summaries and general warehouse management.
business intelligence, and overall access to all relevant
data within an organization. Data warehouse architecture
is inclusive of all reporting requirements, data
management, security requirements, band width
requirements, and storage requirements. There are three
common types of data architecture which are as follows:

A. DATA WAREHOUSE ARCHITECTURE - BASIC


Figure below shows a simple architecture for a data
warehouse. End users directly access data derived from
several source systems through the data warehouse.

Data Staging Area:


A place where data is processed before entering the
warehouse. Data staging area is where the raw
operational data is extracted, cleaned, transformed and
combined so that it can be reported on and queried by
users. This area lies between the operational source
systems and the user database and is typically not
accessible to users.

In this figure the metadata and raw data of traditional


OLTP system is present, as is an additional type of data,
summary data. Summaries are very valuable in data C. DATA WAREHOUSE ARCHITECTURE - WITH A
warehouses because they pre-compute long operations in STAGING AREA AND DATA MARTS
advance. For example, a typical data warehouse query is
to retrieval something like August sales. We may want to customize your warehouse’s
architecture for different groups within your
Operational Source Systems: organization. You can do this by adding data marts,
Operational source systems are developed to capture and which are designed for a particular line of business. An
process original business transactions. These Systems are example is where purchasing, sales, and inventories are
designed for data entry, not for reporting, but it is from separated. In this example, a financial analyst might to
here the data in data warehouse gets populated. analyze historical data for purchases and sales.

Data Warehouse: Data Mart:


A data warehouse contains the data that is organized and Data mart is a logical subset of an enterprise-wide data
stored specifically for direct user queries and reports. It warehouse. For example, a data warehouse for a retail
differs from an OLTP database in the sense that it is chain is constructed incrementally from individual,
designed primarily for reads not writes. Data warehouses conformed data marts dealing with separate subject areas
and their architectures vary depending upon the specifics such as product sales. Dimensional Data marts are
of an organization's situation. organized by subject area such as sales, finance and
marketing and coordinated data category such as
customer, product and location. These flexible
information stores allows data structures to respond to
business changes-product line additions, new staff,
responsibilities, mergers, consolidations, and
acquisitions[2].

III. DATA ANALYSIS TECHNIQUES

A data warehouse is built to provide an easy to access


source of high quality data. It is a means to an end, not
the end itself. That end is typically the need to perform
analysis and decision making through the use of that
source of data. There are several techniques for data
analysis that are in common use today. They are query Query definition is the process of taking a business
and reporting, multidimensional analysis, and data question or hypothesis and translating it into a query
mining . They are used to formulate and display query format that can be used by a particular decision support
results, to analyze data content by viewing it from tool. When the query is executed, the tool generates the
different perspectives, and to discover patterns and appropriate language commands to access and retrieve
clustering attributes in the data that will provide further the requested data, which is returned in what is typically
insight into the data content. The techniques of data called an answer set. The data analyst then performs the
analysis can impact the type of data model selected and required calculations and manipulations on the answer
its content. set to achieve the desired results. Those results are then
formatted to fit into a display or report template that has
been selected for ease of understanding by the end user.
This template could consist of combinations of text,
graphic images, video, and audio. Finally, the report is
delivered to the end user on the desired output medium,
which could be printed on paper, visualized on a
computer display device, or presented audibly.

B. Multidimensional Analysis

Multidimensional analysis has become a popular way to


extend the capabilities of query and reporting. That is,
rather than submitting multiple queries, data is structured
to enable fast and easy access to answers to the questions
that are typically asked. For example, How much of each
Following are the data analysis techniques: of our products was sold on a particular day, by a
particular sales person, in a particular store? Each
A. Query and Reporting: separate part of that query is called a dimension. Many
answers can be readily available because the results are
Query and reporting analysis is the process of posing a not recalculated with each query; they are simply
question to be answered, retrieving relevant data from the accessed and displayed. The data categorized by these
data warehouse, transforming it into the appropriate different factors, or dimensions, makes it easier to
context, and displaying it in a readable format. understand, particularly by business-oriented users of the
Query and reporting capability primarily consists of data. Dimensions can have individual entities or a
selecting associated data elements, perhaps summarizing hierarchy of entities, such as region, store, and
them and grouping them by some category, and department.
presenting the results. Executing this type of capability
typically might lead to the use of more direct table scans.
For this type of capability, perhaps an ER model with a
normalized and/or denormalized data structure would be
most appropriate.
discover new insights about the business by giving us
answers to questions we might never have thought to
ask[1,2].

CONCLUSION
Over the next few years, the growth of data warehousing
is going to be enormous with new products and
technologies coming out frequently. The Data warehouse
Architecture is simply a framework for understanding
data warehousing and how the components of data
warehousing fit together. Data warehouse architecture
Multidimensional analysis enables users to look at a provides a useful way of determining if the organization
large number of interdependent factors involved in a is moving toward a reasonable data warehousing
business problem and to view the data in complex framework. The type of analysis that will be done with
relationships. End users are interested in exploring the the data warehouse can determine the type of model and
data at different levels of detail, which is determined the model’s contents.
dynamically. The complex relationships can be analyzed
through an iterative process that includes drilling down to
REFERENCES
lower levels of detail or rolling up to higher levels of
summarization and aggregation. Figure demonstrates that
the user can start by viewing the total sales for the 1. Chuck Ballard, Dirk Herreman, Don Schau, Rhonda
organization and drill down to view the sales by Bell ( Data Modeling techniques for Data
continent, region, country, and finally by customer. Or, Warehousing - redbooks.ibm.com)
the user could start at customer and roll up through the
different levels to finally reach total sales. Pivoting in the 2. Principle Partners Inc presentation(Data Warehouse
data can also be used. This is a data analysis operation concepts and Architecture)
whereby the user takes a different viewpoint than is
typical on the results of the analysis, changing the way 3. Oracle Data warehousing guide – Release 2(9.2)
the dimensions are arranged in the result. Like query and
reporting, multidimensional analysis continues until no
more drilling down or rolling up is performed.

C. DATA MINING

Data mining is a relatively new data analysis technique.


It is very different from query and reporting and
multidimensional analysis in that is uses what is called a
discovery technique.
Data mining is most typically used for statistical data
analysis and knowledge discovery. Statistical data
analysis detects unusual patterns in data and applies
statistical and mathematical modelling techniques to
explain the patterns. The models are then used to forecast
and predict. Types of statistical data analysis techniques
include linear and nonlinear analysis, regression analysis,
multivariate analysis, and time series analysis.
Knowledge discovery extracts implicit, previously
unknown information from the data. This often results in
uncovering unknown business facts.
Data mining is data driven. There is a high level of
complexity in stored data and data interrelations in the
data warehouse that are difficult to discover without data
mining. Data mining offers new insights into the business
that may not be discovered with query and reporting or
multidimensional analysis. Data mining can help

You might also like