Unit - 3 Data Warehouse Modelling and Online Analytical Processing II
Unit - 3 Data Warehouse Modelling and Online Analytical Processing II
Unit - 3 Data Warehouse Modelling and Online Analytical Processing II
Data Warehouse
Modelling and Online
Analytical Processing II
Business Analysis Framework :
The business analyst get the information from the data warehouses to measure the
performance and make critical adjustments in order to win over other business
holders in the market.
Having a data warehouse offers the following advantages −
• Since a data warehouse can gather information quickly and efficiently, it can
enhance business productivity.
• A data warehouse provides us a consistent view of customers and items,
hence, it helps us manage customer relationship.
• A data warehouse also helps in bringing down the costs by tracking trends,
patterns over a long period in a consistent and reliable manner.
Data Warehouse Design :
• Good Business Intelligence (BI), allows your organization to query data obtained
from trusted sources and use the answers to gain a competitive edge in your
industry.
• The first step to achieving effective BI is a well-designed warehouse.
• Data warehouse design is the process of building a solution to integrate data
from multiple sources that support analytical reporting and data analysis.
• A poorly designed data warehouse can result in acquiring and using inaccurate
source data that negatively affect the productivity and growth of your
organization.
• Datawarehouse design and usage deals with how data warehousing can be used
for information processing analytical processing data mining
• To design an effective and efficient data warehouse, we need to understand
and analyze the business needs and construct a business analysis framework.
• Each person has different views regarding the design of a data warehouse.
• These views are as follows −
The top-down view − This view allows the selection of relevant information
needed for a data warehouse.
The data source view − This view presents the information being captured,
stored, and managed by the operational system.
The data warehouse view − This view includes the fact tables and dimension
tables. It represents the information stored inside the data warehouse.
The business query view − It is the view of the data from the viewpoint of the
end-user.
Important steps in Data warehouse design :
Requirements Gathering :
• Gathering requirements is step one of the data warehouse design process.
• The goal of the requirements gathering phase is to determine the criteria for a
successful implementation of the data warehouse.
• An organization's long-term business strategy should be just as important as
the current business and technical requirements.
• User analysis and reporting requirements must be identified.
• Hardware, development, testing, implementation, and user training are also
important.
• Once the business and technical strategy has been decided the next step is
to address how the organization will backup the data warehouse and how it
will recover if the system fails.
• Developing a disaster recovery plan while gathering requirements, ensures
that the organization is prepared to respond quickly to direct and indirect
threats to the data warehouse.
Physical Environment Setup :
• Once the business requirements are set, the next step is to determine the
physical environment for the data warehouse.
• There should be separate physical application and database servers as well as
separate ETL/ELT, OLAP, data cube, and reporting processes set up for
development, testing, and production.
• Building separate physical environments ensure that all changes can be tested
before moving them to production, development, and testing.
• Testing can be done without halting the production environment.
• If data integrity problem occurs, the IT staff can investigate the issue without
negatively impacting the production environment.
Data Modeling :
• Once requirements gathering and physical environments have been defined, the
next step is to define how data structures will be accessed, connected, processed,
and stored in the data warehouse. This process is known as data modeling.
• During this phase of data warehouse design, it is important to identify data
sources. Knowing where the original data resides and just as importantly, the
availability of that data, is crucial to the success of the project.
• Once the data sources have been identified, the data warehouse team can begin
building the logical and physical structures based on established requirements.
• The ETL process occurs after identifying the data sources.
OLAP Cube Design :
• On-Line Analytical Processing (OLAP) provides the infrastructure for ad-hoc user
query and multi-dimensional analysis.
• OLAP design specification should come from users who will query the data.
• Documentation specifying the OLAP cube dimensions and measures should be
obtained during the beginning of data warehouse design process.
• The three critical elements of OLAP design include :-
• Grouping measures - numerical values you want to analyze such as
revenue, number of customers, how many products customers purchase,
or average purchase amount.
• Dimension - where measures are stored for analysis such as geographic
region, month, or quarter.
• Granularity - the lowest level of detail that you want to include in the
OLAP dataset.
Front End Development :
• Business requirements have been captured, physical environment complete,
data model decided, and ETL process has been documented. The next step is
to work on how users will access the data warehouse.
• Front end development is how users will access the data for analysis and run
reports.
• There are many options available, including building your front end in-house
or purchasing an off the shelf product.
• To ensure the best experience for end users we have to provide -
o Secure access to the data from any device - desktop, laptop, tablet, or
phone should be the primary consideration.
o The tool should allow your development team to modify the backend
structure as enterprise level reporting requirements change.
o It should also provide a Graphical User Interface (GUI) that enables
users to customize their reports as needed.
o The OLAP engine and data can be the best in class, but if users are not
able to use the data, the data warehouse becomes an expensive and
useless data repository.
Report Development :
• For most end users, the only contact they have with the data warehouse is
through the reports they generate.
• Users’ ability to select their report criteria quickly and efficiently is an essential
feature for data warehouse report generation.
• Along with receiving reports through a secure web interface, users may want
or need reports sent as an email attachment, or spreadsheet.
• Controlling the flow and visibility of data is another aspect of report
development.
• Developing user groups with access to specific data segments should provide
data security and control.
• Reporting should change after the initial implementation.
• A well-designed data warehouse should be able to handle the new reporting
requests with little to no data warehouse system modification.
Testing :
• Once the data warehouse system has been developed according to
business requirements, the next step is to test it.
• Testing, or quality assurance, is a step that should not be skipped because
it will allow the data warehouse team to expose and solve issues before
the initial rollout.
• Failing to complete the testing phase could lead to implementation delays
or termination of the data warehouse project.
Deployment :
• Time to go live.
• Deciding to make the system available to everyone at once or perform a
staggered release, will depend on the number of end users and how they will
access the data warehouse system.
• Another important aspect of any system implementation is end-user training.
• The data warehouse team and developers think the GUI is good, but if the
actual end users finds the tool difficult to use, or do not understand the
benefits of using the data warehouse for reporting and analysis, it is of no use.
Data warehouse building require three types of skill :
Business skill :
• Understanding how operational systems store and manage data
• Building extractors that can transfer data from operational systems to
data warehouse
• Building a refresher software that can keep the data warehouse upto date
Technology skill :
• Involves making assessments from quantitative information
• Deriving facts from historical information
• Discovering patterns and trends and look for anomalies
Program management skill :
• interfacing with many technologies, vendors and end users to deliver
results in cost effective and timely manner
Data warehouse usage for information processing :
• Data warehouses and data marts are used in a wide range of applications.
• Business executives use the data in data warehouses and data marts to perform
data analysis and make strategic decisions.
• Business users need to know what exists in the data warehouse (through
metadata), how to access the contents of the data warehouse, how to examine
the contents using analysis tools, and how to present the results of such
analysis.
There are three kind of warehouse application -
Information processing includes -
a) Querying
b) Statistical analysis
c) Reporting using cross tabs tables charts graphs
Bitmap Index :
• The bitmap indexing method is popular in OLAP products.
• it allows quick searching in data cubes.
• Index on a particular column.
• The bitmap index is an alternative representation of the record ID (RID) list.
• Each attribute is represented by distinct bit value.
• If a given attribute’s domain consists of n values, then n bits are needed for
each entry in the bitmap index (i.e., there are n bit vectors).
• If the attribute has the value v for a given row in the data table, then the bit
representing that value is set to 1 in the corresponding row of the bitmap index.
All other bits for that row are set to 0.
Example : Bitmap indexing
o In the AllElectronics data warehouse, suppose the dimension item at the
top level has four values (representing item types): “home
entertainment,” “computer,” “phone,” and “security.”
o Each value (e.g., “computer”) is represented by a bit vector in the item
bitmap index table.
o Suppose that the cube is stored as a relation table with 100,000 rows.
Because the domain of item consists of four values, the bitmap index
table requires four bit vectors (or lists), each with 100,000 bits.
o Figure shows a base (data) table containing the dimensions item and city,
and its mapping to bitmap index tables for each of the dimensions.
Base table item bitmap index table city bitmap index table
Hybrid OLAP :
• Hybrid OLAP is a combination of both ROLAP and MOLAP.
• It offers higher scalability of ROLAP and faster computation of MOLAP.
• HOLAP servers allows to store the large data volumes of detailed information.
• The aggregations are stored separately in MOLAP store.
Specialized SQL Servers :
• Specialized SQL servers provide advanced query language and query processing
support for SQL queries over star and snowflake schemas in a read-only
environment.