Dimensionality Modeling 2023


Dimensionality Modelling
Advanced databases
Dr David Hamill
Content

Dimensionality Modelling

• Definitions
• Schemas
• Dimensional Model vs Entity Relationship
• Dimensional Modeling Stage of Kimball’s Business Dimensional Lifecycle
Definitions – Dimensional Models
• Dimensionality Modelling (DM): a logical design
technique to present data in a standard, intuitive form
to allow high-performance access.
• A DM forms a ‘star-like’ structure, called a star schema
or star join.
Definitions - Keys
• A DM is composed of one table with a composite primary key, called the fact table, and a set of smaller tables called dimension tables.
• Each dimension table has a simple (non-composite) primary key that corresponds to exactly one component of the composite key in the fact table.
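A minimal sketch of this key structure, using Python's built-in sqlite3 module. The table and column names (time_dim, branch_dim, property_dim, property_sale_fact) are assumptions made for illustration and are not taken from the slides.

```python
import sqlite3

# Hypothetical star schema: three dimension tables and one fact table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE time_dim     (time_id INTEGER PRIMARY KEY, year INTEGER);
CREATE TABLE branch_dim   (branch_id INTEGER PRIMARY KEY, city TEXT);
CREATE TABLE property_dim (property_id INTEGER PRIMARY KEY, property_type TEXT);

CREATE TABLE property_sale_fact (
    time_id       INTEGER REFERENCES time_dim(time_id),
    branch_id     INTEGER REFERENCES branch_dim(branch_id),
    property_id   INTEGER REFERENCES property_dim(property_id),
    selling_price REAL,
    PRIMARY KEY (time_id, branch_id, property_id)
);
""")

# PRAGMA table_info reports, per column, its position in the primary key
# (0 means the column is not part of the key).
pk_columns = [
    row[1]
    for row in conn.execute("PRAGMA table_info(property_sale_fact)")
    if row[5] > 0
]
print(pk_columns)
```

Each component of the fact table's composite key matches the simple primary key of exactly one dimension table.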
Definitions - Keys
• Natural keys are replaced with surrogate keys, meaning that all joins between dimension and fact tables are done using the surrogate keys instead.
• This gives the warehouse some independence from the OLTP systems: it can decide for itself how to identify and store data instead of relying on the OLTP keys.
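One way to picture the replacement of natural keys is a lookup that hands out a warehouse-owned integer the first time each natural key is seen. The sketch below is illustrative only: the class name SurrogateKeyMap and the sample branch numbers and prices are assumptions.

```python
from itertools import count


class SurrogateKeyMap:
    """Assigns a warehouse-owned integer key to each natural key (hypothetical helper)."""

    def __init__(self):
        self._next_key = count(1)
        self._keys = {}

    def key_for(self, natural_key):
        # Reuse the existing surrogate key, or allocate the next one.
        if natural_key not in self._keys:
            self._keys[natural_key] = next(self._next_key)
        return self._keys[natural_key]


branch_keys = SurrogateKeyMap()

# Rows as they might arrive from an OLTP system: (natural branch number, price).
oltp_rows = [("B003", 250000.0), ("B005", 180000.0), ("B003", 310000.0)]

# Fact rows carry only the surrogate key, never the natural key.
fact_rows = [(branch_keys.key_for(branch_no), price) for branch_no, price in oltp_rows]
print(fact_rows)
```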
Definitions - Facts
• Star schemas are logical structures containing a fact table in the center, surrounded by denormalized dimension tables.
• Facts are generated by events occurring in the
past.
• Most data is contained in the fact table.
• Once inserted the facts should be read-only.
• The most useful facts in a fact table are numeric and additive.
Definitions – Dimension tables
• Dimension tables usually contain
descriptive textual information.
• Dimension attributes are used as
constraints in data warehouse queries.
• Star schemas can be used to speed up
query performance by denormalizing
reference information into a single
dimension table.
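As a rough illustration of dimension attributes acting as query constraints, the sketch below filters a small fact table by year and city. All table names, column names, and data values are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE branch_dim (branch_id INTEGER PRIMARY KEY, city TEXT);
CREATE TABLE time_dim   (time_id INTEGER PRIMARY KEY, year INTEGER);
CREATE TABLE sale_fact  (time_id INTEGER, branch_id INTEGER, selling_price REAL);

INSERT INTO branch_dim VALUES (1, 'Glasgow'), (2, 'London');
INSERT INTO time_dim   VALUES (1, 2022), (2, 2023);
INSERT INTO sale_fact  VALUES (1, 1, 100000), (2, 1, 150000),
                              (2, 2, 300000), (2, 1, 50000);
""")

# The WHERE clause constrains on descriptive dimension attributes;
# the numeric fact is only aggregated.
query = """
SELECT b.city, SUM(f.selling_price)
FROM sale_fact f
JOIN branch_dim b ON b.branch_id = f.branch_id
JOIN time_dim   t ON t.time_id   = f.time_id
WHERE t.year = 2023 AND b.city = 'Glasgow'
GROUP BY b.city
"""
result = conn.execute(query).fetchall()
print(result)
```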
Snowflake schema
• Snowflake schema: a variant of star
schema that has a fact table at the center,
surrounded by normalized dimension
tables.
• Starflake schema: a hybrid structure that
contains a mixture of star and snowflake
dimension tables.
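The difference between a star and a snowflake dimension can be sketched as follows, assuming a hypothetical branch dimension with a region attribute. In the snowflake form the region is normalized into its own table, so the same answer needs one more join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Star form: one denormalized dimension table.
CREATE TABLE branch_dim_star (branch_id INTEGER PRIMARY KEY, city TEXT, region TEXT);
INSERT INTO branch_dim_star VALUES (1, 'Glasgow', 'Scotland'), (2, 'Aberdeen', 'Scotland');

-- Snowflake form: the region attribute is normalized into its own table.
CREATE TABLE region_dim (region_id INTEGER PRIMARY KEY, region TEXT);
CREATE TABLE branch_dim_snow (
    branch_id INTEGER PRIMARY KEY,
    city      TEXT,
    region_id INTEGER REFERENCES region_dim(region_id)
);
INSERT INTO region_dim VALUES (10, 'Scotland');
INSERT INTO branch_dim_snow VALUES (1, 'Glasgow', 10), (2, 'Aberdeen', 10);
""")

star = conn.execute(
    "SELECT city, region FROM branch_dim_star ORDER BY branch_id"
).fetchall()

# The snowflake version stores 'Scotland' once but needs an extra join.
snow = conn.execute("""
    SELECT b.city, r.region
    FROM branch_dim_snow b
    JOIN region_dim r ON r.region_id = b.region_id
    ORDER BY b.branch_id
""").fetchall()

print(star == snow)
```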
Dimensional Modelling Advantages
Star, snowflake, and starflake schemas have the following advantages:
• Efficiency
• Handling of changing requirements
  • All dimensions are equivalent in terms of providing access to the fact table
• Extensibility
  • Adding new facts
  • Adding new dimensions
  • Adding new dimensional attributes
  • Breaking existing dimension records down to lower levels of granularity
• Ability to model business situations
• Predictable query processing
Comparing DM and ER models
Entity Relationship
• A single ER model normally decomposes to multiple DMs.
• Used for identifying relations among entities to remove redundancy.
Dimensional Model
• Multiple DMs are associated through ‘shared’ dimension tables.
• Attraction is the high performance of ad-hoc user queries.
Dimensional Modeling Stage of Kimball’s Business Dimensional Lifecycle
• This stage can result in the creation of a DM for a data mart, or can be used to dimensionalize the schema of an OLTP database.
• It starts by defining a high-level dimensional model, which progressively gains detail. This is achieved in a two-phased approach:
1. Creation of a high-level DM – this has 4 steps.
2. Adding detail to the model through identification of the dimensional attributes for the model.
Phase 1 – Step 1: Select Business Process
• The process refers to the subject matter of a particular
data mart.
• The first data mart built should be the one that is:
• most likely to be delivered on time
• most likely to be delivered within budget
• able to answer the most commercially important questions.
[Figure: ER model of an extended version of the DreamHome example]
[Figure: ER model of the property sales business process of the DreamHome example]
Phase 1 – Step 2: Declare Grain
• Decide what each record of the fact table will
represent.
• Identify the dimensions of the fact table.
• The grain decision for the fact table determines the
grain of each dimension table.
• Include time as a core dimension, which is always
present in star schemas.
Phase 1 – Step 3: Choose Dimensions
• Dimensions set the context for asking questions about
the facts in the fact table.
• If a dimension occurs in two data marts, the two must be exactly the same dimension, or one must be a subset of the other.
• A dimension used in more than one data mart is
referred to as being a conformed dimension.
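The "same or subset" rule can be expressed as a simple set comparison. This is only a sketch: the function name is_conformed, the row layout (surrogate key, city), and the sample data are assumptions, and a real check would also compare attribute names and meanings.

```python
def is_conformed(dim_a, dim_b):
    """Return True if the two dimensions are identical or one is a subset of the other.

    Each dimension is given as an iterable of row tuples (hypothetical layout).
    """
    rows_a, rows_b = set(dim_a), set(dim_b)
    return rows_a <= rows_b or rows_b <= rows_a


# Branch dimension as used by two hypothetical data marts.
sales_branch_dim = [(1, "Glasgow"), (2, "London"), (3, "Aberdeen")]
advert_branch_dim = [(1, "Glasgow"), (2, "London")]      # subset: conformed
other_branch_dim = [(1, "Glasgow"), (2, "Bristol")]      # conflicting row: not conformed

print(is_conformed(sales_branch_dim, advert_branch_dim))
print(is_conformed(sales_branch_dim, other_branch_dim))
```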
[Figure: Star schemas for property sales and property advertising]
Phase 1 – Step 4: Identify Facts
• The grain of the fact table determines which facts can be used in the data mart.
• Facts should be numeric and additive.
• Non-usable facts include:
• Non-numeric facts
• Non-additive facts
• Facts at a different granularity from the other facts in the table
• Once facts are selected, each should be re-examined to determine whether there are opportunities to use pre-calculations.
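A small worked example of why additivity matters, using invented sale prices. Summed prices can be rolled up from city level to a grand total, whereas a per-city average cannot be re-aggregated by simply averaging again.

```python
# Hypothetical sales: (city, selling price).
sales = [("Glasgow", 100.0), ("Glasgow", 300.0), ("London", 500.0)]

# Additive fact: per-city totals can be summed again to give the grand total.
totals = {}
counts = {}
for city, price in sales:
    totals[city] = totals.get(city, 0.0) + price
    counts[city] = counts.get(city, 0) + 1

grand_total = sum(totals.values())
assert grand_total == sum(price for _, price in sales)

# Non-additive fact: averaging the per-city averages gives the wrong answer.
city_means = {city: totals[city] / counts[city] for city in totals}
mean_of_means = sum(city_means.values()) / len(city_means)
overall_mean = grand_total / len(sales)

print(grand_total, mean_of_means, overall_mean)
```

Storing the additive components (the sum and the count) rather than the average keeps the fact usable at every level of aggregation.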
Phase 1 – Step 4: Identify Facts – bad facts table
Phase 1 – Step 4: Identify Facts – corrected facts table
Phase 2: Identify all dimension attributes for DM
• This involves adding attributes to the dimension tables.
• The attributes are those identified during the business requirements analysis as necessary to analyse the given business process.
• Text descriptions are added to the tables and should be intuitive and understandable to users.
• The usefulness of a data mart is determined by the scope and nature of these attributes.
Issues while developing dimensional models
Choosing the duration of the database:
• How far back the database goes.
• Some enterprises want information from a year or two earlier for comparison (sales).
• Others are legally required to keep documents extending back 5 or more years (insurance).
• Older data is typically harder to read and interpret.
• Slowly changing dimension problem: it is important that the old versions of important dimensions are used with old data, not the current versions.
Slowly Changing Dimension Problem
• Example: the proper description of the old client and old branch must be used with old transaction history.
• Data warehouse must assign a key to these dimensions to distinguish multiple snapshots of clients and branches over time.
• 3 types of slowly changing dimension:
1. Changed dimension attribute is overwritten.
2. Changed dimension attribute causes a new dimension record to be created.
3. Changed dimension attribute causes an alternate attribute to be created, so both old and new values of the attribute are simultaneously available.
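The three types can be sketched on a single client record whose city changes. The field names (client_key, client_no, city, current, previous_city) and the values are assumptions for illustration; in particular the `current` flag is just one possible way to mark the latest snapshot.

```python
from itertools import count

# Type 1: the changed attribute is overwritten, so the old value is lost.
client_t1 = {"client_key": 1, "client_no": "CR76", "city": "Glasgow"}
client_t1["city"] = "London"

# Type 2: the change creates a new dimension record with a new surrogate key,
# so old facts keep pointing at the old snapshot.
next_key = count(1)
client_t2 = [
    {"client_key": next(next_key), "client_no": "CR76", "city": "Glasgow", "current": True}
]


def apply_type2(rows, client_no, new_city):
    # Retire the current snapshot, then append the new one.
    for row in rows:
        if row["client_no"] == client_no and row["current"]:
            row["current"] = False
    rows.append(
        {"client_key": next(next_key), "client_no": client_no, "city": new_city, "current": True}
    )


apply_type2(client_t2, "CR76", "London")

# Type 3: an alternate attribute holds the old value next to the new one.
client_t3 = {"client_key": 1, "client_no": "CR76", "city": "Glasgow", "previous_city": None}
client_t3["previous_city"], client_t3["city"] = client_t3["city"], "London"

print(client_t1, client_t2, client_t3, sep="\n")
```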
Kimball’s Business Dimensional Lifecycle
• Produces a data mart that supports the
requirements of a particular business process.
• Allows easy integration with other related data
marts to form an enterprise-wide data warehouse.
• A dimensional model containing more than one fact table sharing one or more conformed dimension tables is referred to as a fact constellation.
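A minimal sketch of a fact constellation, assuming two hypothetical fact tables (sale_fact and advert_fact) that share one conformed branch dimension. Each fact table is aggregated separately against the shared dimension, which avoids double counting that a direct join of the two fact tables would cause.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE branch_dim  (branch_id INTEGER PRIMARY KEY, city TEXT);
CREATE TABLE sale_fact   (branch_id INTEGER REFERENCES branch_dim(branch_id),
                          selling_price REAL);
CREATE TABLE advert_fact (branch_id INTEGER REFERENCES branch_dim(branch_id),
                          advert_cost REAL);

INSERT INTO branch_dim  VALUES (1, 'Glasgow'), (2, 'London');
INSERT INTO sale_fact   VALUES (1, 100000), (1, 150000), (2, 300000);
INSERT INTO advert_fact VALUES (1, 500), (2, 700), (2, 300);
""")

# Both fact tables are summarized per row of the shared dimension.
query = """
SELECT b.city,
       (SELECT SUM(s.selling_price) FROM sale_fact s   WHERE s.branch_id = b.branch_id),
       (SELECT SUM(a.advert_cost)   FROM advert_fact a WHERE a.branch_id = b.branch_id)
FROM branch_dim b
ORDER BY b.city
"""
report = conn.execute(query).fetchall()
print(report)
```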
[Figure: Dimensional model (fact constellation) for DreamHome data warehouse]
