DMBI Unit-2


Principles of Dimensional Modelling

Dimensional Data Modeling


Dimensional Data Modeling is one of the data modeling
techniques used in data warehouse design.
Goal: Improve data retrieval.
The concept of Dimensional Modeling was developed by Ralph
Kimball and is made up of fact and dimension tables. Since the
main goal of this modeling is to improve data retrieval, it is
optimized for SELECT operations. The advantage of this model is
that data can be stored in a way that makes it easy to retrieve
once it is in the data warehouse. The dimensional model is the
data model used by many OLAP systems.
Steps to Create Dimensional Data Modeling:
• Step-1: Identifying the business objective –
The first step is to identify the business objective: Sales, HR,
Marketing, etc., depending on the needs of the organization.
This is the most important step of data modelling, and the
choice of business objective also depends on the quality of
data available for that process.
• Step-2: Identifying Granularity –
Granularity is the lowest level of information stored in the
table. The grain describes the level of detail for the business
problem and its solution.
• Step-3: Identifying Dimensions and their Attributes –
Dimensions are objects or things. Dimensions categorize and
describe data warehouse facts and measures in ways that
support meaningful answers to business questions. A data
warehouse organizes descriptive attributes as columns in
dimension tables. For example, the date dimension may
contain attributes such as year, month, and weekday.
• Step-4: Identifying the Fact –
The fact table holds the measurable data. Most fact table
columns are numeric values such as price or cost per unit.
• Step-5: Building the Schema –
We implement the dimensional model in this step. A schema is
a database structure. There are two popular schemas: Star
Schema and Snowflake Schema.

Figure – Steps for Dimensional Model
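As a rough illustration of Step-5, a star schema centers one fact table on foreign keys into dimension tables. The sketch below builds such a schema in SQLite from Python; all table and column names are illustrative assumptions, not a prescribed design.

```python
import sqlite3

# Star-schema sketch: a central fact table whose foreign keys reference
# descriptive dimension tables. Names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_date (
    date_key    INTEGER PRIMARY KEY,
    year        INTEGER,
    month       INTEGER,
    weekday     TEXT
);
CREATE TABLE dim_product (
    product_key INTEGER PRIMARY KEY,
    name        TEXT,
    category    TEXT
);
CREATE TABLE fact_sales (
    date_key    INTEGER REFERENCES dim_date(date_key),
    product_key INTEGER REFERENCES dim_product(product_key),
    units_sold  INTEGER,  -- measures: numeric values to be analyzed
    revenue     REAL
);
""")
```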


From requirements to data design
Good Business Intelligence (BI) allows your organization to query data
obtained from trusted sources and use the answers to gain a competitive
edge in your industry. The first step to achieving effective BI is a well-
designed warehouse. Data warehouse design is the process of building a
solution to integrate data from multiple sources that supports analytical
reporting and data analysis. A poorly designed data warehouse can result
in acquiring and using inaccurate source data that negatively affects the
productivity and growth of your organization. This section takes a high-
level look at the data warehouse design process, from requirements
gathering to implementation.
Requirements Gathering
Gathering requirements is step one of the data warehouse design process.
The goal of the requirements gathering phase is to determine the criteria
for a successful implementation of the data warehouse. An organization's
long-term business strategy should be just as important as the current
business and technical requirements. User analysis and reporting
requirements must be identified, as well as hardware, development,
testing, implementation, and user training needs.
Once the business and technical strategy has been decided, the next step
is to address how the organization will back up the data warehouse and
how it will recover if the system fails. Developing a disaster recovery plan
while gathering requirements ensures that the organization is prepared to
respond quickly to direct and indirect threats to the data warehouse.
Physical Environment Setup
Once the business requirements are set, the next step is to determine the
physical environment for the data warehouse. At a minimum, there should
be separate physical application and database servers as well as separate
ETL/ELT, OLAP, cube, and reporting processes set up for development,
testing, and production. Building separate physical environments ensures
that all changes can be tested before moving them to production, that
development and testing can occur without halting the production
environment, and that, if data integrity becomes suspect, the IT staff can
investigate the issue without negatively impacting the production
environment.
Data Modeling
Once requirements gathering and physical environments have been
defined, the next step is to define how data structures will be accessed,
connected, processed, and stored in the data warehouse. This process is
known as data modeling. It is during this phase of data warehouse design
that data sources are identified. Knowing where the original data resides,
and just as importantly the availability of that data, is crucial to the success
of the project. Once the data sources have been identified, the data
warehouse team can begin building the logical and physical structures
based on established requirements.
ETL
The ETL process takes the most time to develop and consumes the majority
of the implementation effort. Identifying data sources during the data modeling phase
may help to reduce ETL development time. The goal of ETL is to provide
optimized load speeds without sacrificing quality. Failure at this stage of the
process can lead to poor performance of the ETL process and the entire
data warehouse system.
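For a concrete feel of the extract-transform-load flow, here is a minimal sketch in Python with SQLite. The databases, tables, and columns (orders, fact_sales, etc.) are hypothetical stand-ins, not a reference implementation.

```python
import sqlite3

# Minimal ETL sketch: extract rows from an operational source, apply a
# simple transformation, and load them into a warehouse fact table.
src = sqlite3.connect("operational.db")   # hypothetical source system
dwh = sqlite3.connect("warehouse.db")     # hypothetical warehouse

dwh.execute("""CREATE TABLE IF NOT EXISTS fact_sales (
                   sale_date TEXT, product_id INTEGER, amount REAL)""")

# Extract: read raw order rows from the source system.
rows = src.execute(
    "SELECT order_date, product_id, qty, unit_price FROM orders")

# Transform: derive the measure (amount) and truncate the timestamp
# to a date so it matches the warehouse grain.
cleaned = [(order_date[:10], product_id, qty * unit_price)
           for order_date, product_id, qty, unit_price in rows]

# Load: insert the transformed rows in one batch.
dwh.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)", cleaned)
dwh.commit()
```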
OLAP Cube Design
On-Line Analytical Processing (OLAP) is the answer engine that provides
the infrastructure for ad-hoc user query and multi-dimensional analysis.
OLAP design specification should come from those who will query the data.
Documentation specifying the OLAP cube dimensions and measures
should be obtained at the beginning of the data warehouse design
process. The three critical elements of OLAP design include:
Grouping measures - the numerical values you want to analyze, such as
revenue, number of customers, how many products customers purchase,
or average purchase amount.
Dimensions - the contexts by which measures are categorized for
analysis, such as geographic region, month, or quarter.
Granularity - the lowest level of detail that you want to include in the OLAP
dataset.
During development, make sure the OLAP cube process is optimized. A
data warehouse is usually not a nightly priority run, and once the data
warehouse has been updated, there is little time left to update the OLAP
cube. Not updating either of them in a timely manner could lead to reduced
system performance. Taking the time to explore the most efficient OLAP
cube generation path can reduce or prevent performance problems after
the data warehouse goes live.
Front End Development
At this point, business requirements have been captured, the physical
environment is complete, the data model has been decided, and the ETL
process has been documented. The next step is to work on how users will
access the data warehouse. Front end development determines how users
will access the data for analysis and run reports. There are many options
available, including building your front end in-house or purchasing an
off-the-shelf product.
Either way, there are a few considerations to keep in mind to ensure the
best experience for end users.
Secure access to the data from any device - desktop, laptop, tablet, or
phone - should be the primary consideration. The tool should allow your
development team to modify the backend structure as enterprise level
reporting requirements change. It should also provide a Graphical User
Interface (GUI) that enables users to customize their reports as needed.
The OLAP engine and data can be the best in class, but if users are not
able to use the data, the data warehouse becomes an expensive and
useless data repository.
Report Development
For most end users, the only contact they have with the data warehouse is
through the reports they generate. As mentioned in the front end
development section, users’ ability to select their report criteria quickly and
efficiently is an essential feature for data warehouse report generation.
Delivery options are another consideration. Along with receiving reports
through a secure web interface, users may want or need reports sent as an
email attachment or spreadsheet. Controlling the flow and visibility of data
is another aspect of report development that must be addressed.
Developing user groups with access to specific data segments should
provide data security and control. Reporting will and should change well
after the initial implementation. A well-designed data warehouse should be
able to handle the new reporting requests with little to no data warehouse
system modification.
Performance Tuning
Earlier in this post, the recommendation was to create separate
development and testing environments. Doing so allows organizations to
provide system performance tuning on ETL, query processing, and report
delivery without interrupting the current production environment. Make sure
the development and testing environments (hardware and applications)
mimic the production environment, so that the performance enhancements
created in development will work in the live production environment.
Testing
Once the data warehouse system has been developed according to
business requirements, the next step is to test it. Testing, or quality
assurance, is a step that should not be skipped because it will allow the
data warehouse team to expose and address issues before the initial
rollout. Failing to complete the testing phase could lead to implementation
delays or termination of the data warehouse project.
Implementation
Time to go live. Deciding whether to make the system available to everyone
at once or to perform a staggered release will depend on the number of end
users and how they will access the data warehouse system. Another
important aspect of any system implementation, and one that is often
skipped, is end-user training. No matter how "intuitive" the data warehouse
team and developers think the GUI is, if the actual end users find the tool
difficult to use, or do not understand the benefits of using the data
warehouse for reporting and analysis, they will not engage.
Understanding Best Practices for Data Warehouse Design
Data warehouse design is a time-consuming and challenging endeavor.
There will be good, bad, and ugly aspects found in each step. However, if
an organization takes the time to develop sound requirements at the
beginning, subsequent steps in the process will flow more logically and
lead to a successful data warehouse implementation.
Multidimensional Data Model
The multidimensional data model stores data in the form of a data cube.
Mostly, data warehousing supports two- or three-dimensional cubes.
A data cube allows data to be viewed in multiple dimensions.
Dimensions are entities with respect to which an organization wants to
keep records. For example, in a store's sales records, dimensions allow
the store to keep track of things like monthly sales of items across
branches and locations.
A multidimensional database helps to provide data-related answers to
complex business queries quickly and accurately.
Data warehouses and Online Analytical Processing (OLAP) tools are
based on a multidimensional data model. OLAP in data warehousing
enables users to view data from different angles and dimensions.
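As a toy illustration of a cube, the pandas sketch below pivots flat sales facts into a small two-dimensional location-by-quarter view; all data values are invented.

```python
import pandas as pd

# Toy sales facts: two dimensions (location, quarter), one measure (sales).
sales = pd.DataFrame({
    "location": ["Delhi", "Delhi", "Kolkata", "Kolkata"],
    "quarter":  ["Q1", "Q2", "Q1", "Q2"],
    "sales":    [120, 150, 90, 110],
})

# Pivot the flat records into a 2-D "cube": locations x quarters.
cube = sales.pivot_table(index="location", columns="quarter",
                         values="sales", aggfunc="sum")
print(cube)
# quarter   Q1   Q2
# location
# Delhi    120  150
# Kolkata   90  110
```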





OLAP in the Data Warehouse
Limitations of other analysis methods (OLTP)
Online transaction processing refers to more than just financial
transactions. Telemarketers use OLTP for their phone surveys, call
centers use it to access customer data and companies use it to
manage customers' online accounts. Businesses also use it to
facilitate e-commerce, their internal communications and numerous
applications. If your company doesn't use OLTP, it may
lose sales opportunities and operational efficiencies. However, OLTP
also comes with important disadvantages and limitations.

Unscheduled Downtime

Your business can suffer considerable losses when the OLTP system
goes down, even temporarily. This can happen due to network
outages, data corruption or hardware failure. Companies can protect
their operation by building redundancy into the business platform,
but that may not prove cost-effective for smaller businesses. To
mitigate these concerns, hire competent IT personnel who are
available 24/7 to respond to critical issues. You also must maintain
good lines of communication with your institutional partners, so
that you can get support when you need it and pass information
along to others who may be affected.

Concurrency Challenges

OLTP systems allow multiple users to access and modify the same
data at the same time. For obvious reasons, you can't allow one user
to change data while another person is modifying it. You must
devise an efficient way to ensure people aren't working at cross
purposes while retaining a system that is responsive for everyone.
This may require costly system designs and maintenance. OLTP
concurrency best practices have evolved in step with the growth of
the Internet and OLTP itself, so the solutions are available in the
form of OLTP software packages, but if you can't implement them
yourself, you must hire a professional to do it.
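One common pattern for the concurrency problem above (though not the only one) is optimistic locking: each row carries a version number, so a lost update is detected rather than silently applied. A minimal sketch, with hypothetical table and column names:

```python
import sqlite3

# Optimistic-locking sketch: an UPDATE succeeds only if the row still
# has the version we originally read, so a concurrent modification by
# another user is detected instead of overwritten.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance REAL, version INTEGER)")
conn.execute("INSERT INTO accounts VALUES (1, 100.0, 0)")

def withdraw(conn, account_id, amount):
    balance, version = conn.execute(
        "SELECT balance, version FROM accounts WHERE id = ?", (account_id,)
    ).fetchone()
    # The WHERE clause re-checks the version: zero rows updated means
    # someone else changed the row since we read it.
    cur = conn.execute(
        "UPDATE accounts SET balance = ?, version = version + 1 "
        "WHERE id = ? AND version = ?",
        (balance - amount, account_id, version),
    )
    return cur.rowcount == 1  # True if our update won

print(withdraw(conn, 1, 30.0))  # True: no conflicting writer this time
```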

Atomicity

In OLTP, "atomicity" refers to a transaction in which either all the


database steps succeed or the entire transaction fails. If any one step
goes wrong, and the transaction continues anyway, you'll probably
end up with data errors or corruption. That could be devastating for
your company. All OLTP transactions should be atomic, with an
emphasis on data recovery when something goes wrong. However,
there may be bottom-line consequences when the technology
doesn't work correctly. Inefficiently implemented database
atomicity also may cause system slowdowns.
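A minimal illustration of atomicity, using an in-memory SQLite transfer between two hypothetical accounts: both updates commit together or neither does.

```python
import sqlite3

# Atomicity sketch: the two UPDATEs form one transaction.
# Account data is invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance REAL)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 100.0), (2, 50.0)])
conn.commit()

try:
    with conn:  # opens a transaction; commits on success, rolls back on error
        conn.execute("UPDATE accounts SET balance = balance - 80 WHERE id = 1")
        # If this second step failed, the first UPDATE would be rolled back too.
        conn.execute("UPDATE accounts SET balance = balance + 80 WHERE id = 2")
except sqlite3.Error:
    pass  # the database is left exactly as it was before the transaction

print(conn.execute("SELECT id, balance FROM accounts").fetchall())
```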

Financial Transaction Processing Costs

For many businesses, "OLTP" narrowly refers to transactions with
financial institutions -- mainly credit and debit card payments over
the Internet or through physical card readers. Financial institutions
do impose costs on merchants for these transactions. Your business
will be charged monthly fees, minimum fees, and gateway payments.
Transaction fees hit you twice, first as a percentage of the value of
the entire transaction, and then as a supplemental absolute fee
that's usually a fraction of a dollar. According to figures collected by
Community Merchants USA and reported in Forbes, plastic
transactions -- debit, credit and gift cards -- accounted for two-
thirds of all point-of-sale transactions in 2013. Few businesses can
afford to ignore these payment methods. You either must eat the
costs, include the cost in your margins or charge customers a fee for
paying electronically.

OLAP is the answer…


An effective OLAP solution solves problems for both business users
and IT departments. For business users, it enables fast and intuitive
access to centralized data and related calculations for the purposes
of analysis and reporting. For IT, an OLAP solution enhances a data
warehouse or other relational database with aggregate data and
business calculations. In addition, by enabling business users to do
their own analyses and reporting, OLAP systems reduce demands on
IT resources.

OLAP offers five key benefits:

• Business-focused multidimensional data
• Business-focused calculations
• Trustworthy data and calculations
• Speed-of-thought analysis
• Flexible, self-service reporting

OLAP Definition and Rules


OLAP stands for "Online Analytical Processing." OLAP allows
users to analyse database information from multiple database
systems at one time. While relational databases are considered
to be two-dimensional, OLAP data is multidimensional,
meaning the information can be compared in many different
ways. For example, a company might compare their computer
sales in June with sales in July, then compare those results with
the sales from another location, which might be stored in a
different database.

In order to process database information using OLAP, an OLAP
server is required to organize and compare the information.
Clients can analyse different sets of data using functions built
into the OLAP server. Some popular OLAP server software
programs include Oracle Express Server and Hyperion
Solutions Essbase. Because of its powerful data analysis
capabilities, OLAP processing is often used for data mining,
which aims to discover new relationships between different
sets of data.

Codd’s rules for OLAP Tools


In 1993, Dr. E.F. Codd proposed twelve rules as the basis for
selecting OLAP tools. The publication of these rules was the
result of research carried out on behalf of Arbor Software and
resulted in a formalized redefinition of the requirements for OLAP
tools. These rules are:

1. Multi-dimensional conceptual view of the database
2. Concept of transparency
3. Concept of accessibility
4. Consistent reporting performance
5. Client-server architecture
6. Generic dimensionality
7. Dynamic sparse matrix handling
8. Multi-user support
9. Unrestricted cross-dimensional operations
10. Intuitive data manipulation
11. Flexible reporting
12. Unlimited dimensions and aggregation levels

• Multi-dimensional conceptual view: OLAP tools should provide
users with a multi-dimensional model that corresponds to users'
views of the enterprise and is intuitively analytical and simple to
use. Interestingly, this rule is given varying levels of support by
vendors, who disagree over whether a multi-dimensional
conceptual view of data can be delivered without
multi-dimensional storage.

• Transparency: The underlying database and architecture of the
OLAP technology, and the likely heterogeneity of input data
sources, should be transparent to users. This requirement is
meant to preserve the user's productivity and proficiency with
familiar front-end environments and tools.

• Accessibility: The OLAP tool should also allow access to the data
needed for the analysis from all heterogeneous enterprise data
sources, such as relational, non-relational, and legacy systems.

• Consistent reporting performance: As the number of
dimensions, levels of aggregation, and the size of the database
increase, users should not perceive any significant fall in
performance. There should be no change in the way the key
figures are calculated, and the system models must be
robust enough to cope with changes to the enterprise model.
• Client-server architecture: The OLAP system should be
proficient enough to operate efficiently in a client-server
environment. The architecture should permit optimal
performance, flexibility, adaptability, scalability, and
interoperability.

• Generic dimensionality: Every data dimension must be equivalent
in both structure and operational capabilities, i.e., the basic
structure, formulae, and reporting should not be biased towards
any one dimension.

• Dynamic sparse matrix handling: The OLAP system should be
able to adapt its physical schema to the specific analytical
model, optimizing sparse matrix handling to achieve and
maintain the required level of performance.

• Multi-user support: The OLAP system should be able to support
a group of users working at the same time on the same or
different models of the enterprise's data.

• Unrestricted cross-dimensional operations: The OLAP system
must be able to identify the dimensional hierarchies and
automatically perform associated roll-up calculations across
dimensions.

• Intuitive data manipulation: Slicing and dicing, consolidation
(roll-up), and other manipulations should be accomplished via
direct 'point-and-click' or 'drag-and-drop' actions on the cells of
the cube.

• Flexible reporting: The capability to arrange rows, columns,
and cells in a way that facilitates analysis through an intuitive
visual presentation of analytical reports must exist.
• Unlimited dimensions and aggregation levels: Depending on
business needs, an analytical model may have numerous
dimensions, each having multiple hierarchies.

Characteristics of OLAP
Fast
The system should deliver most responses to the user within
about five seconds, with elementary analyses taking no more
than one second and very few taking more than 20 seconds.

Analysis
The system should be able to cope with any business logic and
statistical analysis that is relevant for the application and the
user, while keeping it easy enough for the target user. Although
some pre-programming may be needed, the user should be able
to define new ad hoc calculations as part of the analysis and to
report on the data in any desired way, without having to
program. Products (such as Oracle Discoverer) that do not allow
adequate end-user-oriented calculation flexibility fail this
requirement.

Share
The system should implement all the security requirements for
confidentiality and, if multiple write connections are needed,
concurrent update locking at an appropriate level. Not all
applications need users to write data back, but for the
increasing number that do, the system should be able to
manage multiple updates in a timely, secure manner.

Multidimensional
This is the basic requirement. An OLAP system must provide
a multidimensional conceptual view of the data, including
full support for hierarchies, as this is certainly the most
logical way to analyze businesses and organizations.

Information
The system should be able to hold all the data needed by
the applications. Data sparsity should be handled in an
efficient manner.

Features of OLAP
Online Analytical Processing (OLAP) is usually contrasted with
Online Transactional Processing (OLTP) and is a way of storing
data so that it can be used for better analytical queries. The
biggest difference in OLAP is that data is stored in what is
commonly referred to as an OLAP cube. In OLTP, by contrast,
the data is stored in tables in a relational database system.

What OLAP does is pre-process all of the ways someone might
want to analyze the data, so that when someone does use it, the
process is very fast.

OLAP provides summary data and generates rich calculations.
For example, OLAP answers questions like "How do sales of
mutual funds in North America for this quarter compare with sales
a year ago? What can we predict for sales next quarter? What is
the trend as measured by percent change?"

Advantages of OLAP

• Fast query performance due to optimized storage,
multidimensional indexing and caching.
• Smaller on-disk size of data compared to data stored in a
relational database, due to compression techniques.
• Automated computation of higher-level aggregates of the
data.
• It is very compact for low-dimension data sets.
• Array models provide natural indexing.
• Effective data extraction achieved through the pre-structuring
of aggregated data.

Disadvantages of OLAP

• Within some OLAP solutions the processing step (data load)
can be quite lengthy, especially on large data volumes. This
is usually remedied by doing only incremental processing,
i.e., processing only the data which have changed (usually
new data) instead of reprocessing the entire data set (see
the sketch after this list).
• Some OLAP methodologies introduce data redundancy.
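A minimal sketch of the incremental-processing idea, assuming the source rows carry a last-modified timestamp; the databases, table names, and columns here are hypothetical.

```python
import sqlite3

# Incremental-load sketch: instead of reprocessing the whole data set,
# pull only rows changed since the last successful load, tracked by a
# watermark (the highest modification timestamp already processed).
src = sqlite3.connect("operational.db")   # hypothetical source
dwh = sqlite3.connect("warehouse.db")     # hypothetical warehouse

dwh.execute("""CREATE TABLE IF NOT EXISTS fact_orders (
                   id INTEGER PRIMARY KEY, amount REAL, modified_at TEXT)""")
dwh.execute("""CREATE TABLE IF NOT EXISTS etl_watermark (
                   name TEXT PRIMARY KEY, value TEXT)""")

# Read the watermark recorded by the previous run (epoch if first run).
row = dwh.execute(
    "SELECT value FROM etl_watermark WHERE name = 'orders'").fetchone()
last_loaded = row[0] if row else "1970-01-01 00:00:00"

# Extract only rows modified after the watermark.
changed = src.execute(
    "SELECT id, amount, modified_at FROM orders WHERE modified_at > ?",
    (last_loaded,),
).fetchall()

if changed:
    # Upsert the changed rows, then advance the watermark.
    dwh.executemany(
        "INSERT OR REPLACE INTO fact_orders VALUES (?, ?, ?)", changed)
    new_mark = max(r[2] for r in changed)
    dwh.execute(
        "INSERT OR REPLACE INTO etl_watermark VALUES ('orders', ?)",
        (new_mark,))
    dwh.commit()
```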

Functions of OLAP
OLAP can be used for data mining or the discovery of previously
undiscerned relationships between data items. An OLAP
database does not need to be as large as a data warehouse,
since not all transactional data is needed for trend analysis. Using
Open Database Connectivity (ODBC), data can be imported from
existing relational databases to create a multidimensional
database for OLAP.

OLAP products include IBM Cognos, Oracle OLAP and Oracle
Essbase. OLAP features are also included in tools such as
Microsoft Excel and Microsoft SQL Server Analysis Services.
OLAP products are typically designed for multiple-user
environments, with the cost of the software based on the number
of users.

Hypercube (OLAP Cube)


An OLAP cube is a multidimensional database that is optimized
for data warehouse and online analytical processing (OLAP)
applications.

An OLAP cube is a method of storing data in a multidimensional
form, generally for reporting purposes. In OLAP cubes, data
(measures) are categorized by dimensions. OLAP cubes are
often pre-summarized across dimensions to drastically improve
query time over relational databases. The query language used to
interact and perform tasks with OLAP cubes is multidimensional
expressions (MDX). The MDX language was originally developed
by Microsoft in the late 1990s, and has been adopted by many
other vendors of multidimensional databases.

Although it stores data like a traditional database does, an OLAP
cube is structured very differently. Databases, historically, are
designed according to the requirements of the IT systems that
use them. OLAP cubes, however, are used by business users for
advanced analytics. Thus, OLAP cubes are designed using
business logic and understanding. They are optimized for
analytical purposes, so that they can report on millions of records
at a time. Business users can query OLAP cubes using plain
English.
OLAP Operations
OLAP stands for Online Analytical Processing. It is a
software technology that allows users to analyze information
from multiple database systems at the same time. It is based on
a multidimensional data model and allows the user to query
multi-dimensional data (e.g., Delhi -> 2018 -> Sales data). OLAP
databases are divided into one or more cubes, and these cubes
are known as hypercubes.

OLAP operations:

There are five basic analytical operations that can be performed
on an OLAP cube:
1. Drill down: In the drill-down operation, less detailed data is
converted into more detailed data. It can be done by:
• Moving down in the concept hierarchy
• Adding a new dimension
In the cube given in the overview section, the drill-down operation
is performed by moving down in the concept hierarchy
of the Time dimension (Quarter -> Month).
2. Roll up: It is just the opposite of the drill-down operation. It
performs aggregation on the OLAP cube. It can be done by:
• Climbing up in the concept hierarchy
• Reducing the dimensions
In the cube given in the overview section, the roll-up operation is
performed by climbing up in the concept hierarchy of the Location
dimension (City -> Country).
3. Dice: It selects a sub-cube from the OLAP cube by selecting
two or more dimensions. In the cube given in the overview
section, a sub-cube is selected by selecting the following
dimensions with criteria:
• Location = “Delhi” or “Kolkata”
• Time = “Q1” or “Q2”
• Item = “Car” or “Bus”
4. Slice: It selects a single dimension from the OLAP cube, which
results in the creation of a new sub-cube. In the cube given in the
overview section, the slice is performed on the dimension Time =
“Q1”.
5. Pivot: It is also known as the rotation operation, as it rotates the
current view to get a new view of the representation. In the
sub-cube obtained after the slice operation, performing the pivot
operation gives a new view of it.
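To make these operations concrete, here is a toy sketch with pandas; the fact data is invented, and pandas verbs merely stand in for the cube operations.

```python
import pandas as pd

# Toy fact data: three dimensions (location, time, item), one measure (sales).
facts = pd.DataFrame({
    "location": ["Delhi", "Delhi", "Kolkata", "Kolkata", "Mumbai", "Mumbai"],
    "time":     ["Q1", "Q2", "Q1", "Q2", "Q1", "Q2"],
    "item":     ["Car", "Bus", "Car", "Bus", "Car", "Bus"],
    "sales":    [100, 80, 60, 40, 90, 70],
})

# Slice: fix a single dimension value (Time = "Q1").
q1 = facts[facts["time"] == "Q1"]

# Dice: select a sub-cube with criteria on two or more dimensions.
sub = facts[facts["location"].isin(["Delhi", "Kolkata"])
            & facts["time"].isin(["Q1", "Q2"])
            & facts["item"].isin(["Car", "Bus"])]

# Roll-up: aggregate upward by reducing (dropping) the item dimension.
rollup = facts.groupby(["location", "time"], as_index=False)["sales"].sum()

# Drill-down would go the other way: re-introduce a finer-grained
# dimension (e.g., month instead of quarter) if the data carries it.

# Pivot: rotate the view so locations become rows and quarters columns.
view = rollup.pivot(index="location", columns="time", values="sales")
print(view)
```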


ROLAP vs MOLAP

Full Form:
ROLAP stands for Relational Online Analytical Processing.
MOLAP stands for Multidimensional Online Analytical Processing.

Storage & Fetching:
ROLAP - Data is stored in and fetched from the main data warehouse.
MOLAP - Data is stored in and fetched from the proprietary database (MDDBs).

Data Form:
ROLAP - Data is stored in the form of relational tables.
MOLAP - Data is stored in large multidimensional arrays made of data cubes.

Data Volumes:
ROLAP - Handles large data volumes.
MOLAP - Limited summary data is kept in MDDBs.

Technology:
ROLAP - Uses complex SQL queries to fetch data from the main warehouse.
MOLAP - The MOLAP engine creates precalculated and prefabricated data cubes
for multidimensional data views; sparse matrix technology is used to manage
data sparsity.

View:
ROLAP - Creates a multidimensional view of data dynamically.
MOLAP - Already stores the static multidimensional view of data in MDDBs.

Access:
ROLAP - Slow access.
MOLAP - Faster access.







Executive Information System (EIS)


Definition - What does Executive Information System (EIS)
mean?

An executive information system (EIS) is a decision support
system (DSS) used to assist senior executives in the decision-
making process. It does this by providing easy access to
important data needed to achieve strategic goals in an
organization. An EIS normally features graphical displays on an
easy-to-use interface.

Executive information systems can be used in many different
types of organizations to monitor enterprise performance as well
as to identify opportunities and problems.
Data Warehouse and Business Strategy
This service can be targeted at producing a new Data Warehouse (DW)
strategy or for working within your company’s existing DW strategy. Both
business and IT management will be interviewed by the Data Warehouse
Architect to develop a well-grounded strategy for both the long term and
the short term.
We assess your data warehouse and business intelligence needs. Then we
develop a prioritized, high level plan on how to reach your goals.

Business Analysis
Working from the results of the Opportunity Assessment, and using the Data
Warehouse Strategy as a framework, the Senior Business Analyst identifies
the detailed requirements for the solution.
The Senior Business Analyst works with end users to capture their intended
use of the information for business growth, determine what information they
need to see, and how they need to see it:

• Identify the Key Performance Indicators (KPIs) which will be used to
track objectives.
• Online analysis processing requirements (analytics).
• Security requirements.
• Review current reporting systems that report on related data.
• Describe reports that need to be provided.
• Span of history to be kept in the data mart.
• Facts and Dimensions required to support the analysis and reports.
• Frequency of generating analytics reports.
• Frequency of refreshing the data mart.
• Geographic location of users.
• Method of distributing reports.
• Estimate how much data will be processed and stored for facts,
dimensions, summaries and indexes.
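As a back-of-envelope illustration of the last estimation item, fact-table sizing often starts from daily row volume, history span, and bytes per row; every figure in the sketch below is an assumption for illustration only.

```python
# Back-of-envelope fact-table sizing sketch; all figures are assumptions.
rows_per_day = 50_000          # assumed daily transaction volume
history_years = 3              # span of history kept in the data mart
bytes_per_fact_row = 40        # e.g., a few keys plus a few numeric measures

fact_rows = rows_per_day * 365 * history_years
fact_bytes = fact_rows * bytes_per_fact_row

# Indexes and summary tables are often estimated as a multiple of the
# base fact data; doubling is a common rough starting point.
total_bytes = fact_bytes * 2

print(f"{fact_rows:,} fact rows, ~{total_bytes / 1e9:.1f} GB with indexes/summaries")
```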

Outputs from this service include:
• Documentation of objectives, KPIs, requirements, reports, estimates,
assumptions and findings
• High level Logical Data Model.
• Scenarios describing how the analytics and reports will be used in the
business context.

Based on findings from the Business Analysis service, the Data Warehouse
Architect works with the company’s Data Administrator to determine the
data sources that are required to support the business needs, and the quality
of the data in those sources.

Sources may be internal or external. Internal data sources typically belong to
the company and reside in operational databases such as an order processing
database. External data sources may be acquired from outside the
organization, and often are used for demographic analyses or database
marketing.

If similar data exists in more than one source, we work with the Data
Administrator and business user to determine the single authoritative source.

Data quality issues may include:
• Redundant data sources
• Duplicate rows within a data source
• Dirty data (invalid values)
• Data Integrity problems (inconsistent meanings and usage of fields)
• Referential Integrity problems (missing links, invalid links)

We will recommend solutions for addressing data quality issues. It is
imperative that the best quality data be used to drive your analytics.

Data quality improvement may involve one-time and ongoing activities such
as removing duplicates, standardizing names and addresses, deriving implied
values, and improving data validation or logic in the operational system
which generates the data.
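As a small illustration of the one-time cleanup activities mentioned above, the pandas sketch below removes duplicates and standardizes names; the column names and records are invented for illustration.

```python
import pandas as pd

# Toy customer records with duplicates hidden by inconsistent formatting.
customers = pd.DataFrame({
    "name":  ["Asha Rao", "asha rao ", "R. Mehta", "R. Mehta"],
    "email": ["asha@example.com", "ASHA@EXAMPLE.COM",
              "rmehta@example.com", "rmehta@example.com"],
})

# Standardize: trim whitespace and normalize case before comparing rows.
customers["name"] = customers["name"].str.strip().str.title()
customers["email"] = customers["email"].str.strip().str.lower()

# Remove duplicate rows once formatting no longer hides them.
deduped = customers.drop_duplicates()
print(deduped)   # two unique customers remain
```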
