ORACLE SQL Materialized Views
ORACLE SQL Materialized Views
ORACLE SQL Materialized Views
Yanamala
Materialized Views
This chapter introduces you to the use of materialized views and discusses:
Typically, data flows from one or more online transaction processing (OLTP)
databases into a data warehouse on a monthly, weekly, or daily basis. The data
is normally processed in a staging file before being added to the data
warehouse. Data warehouses commonly range in size from tens of gigabytes to
a few terabytes. Usually, the vast majority of the data is stored in a few very large
fact tables.
The summaries or aggregates that are referred to in this book and in literature on
data warehousing are created in Oracle using a schema object called a
materialized view. Materialized views can perform a number of roles, such as
improving query performance or providing replicated data.
In data warehouses, you can use materialized views to precompute and store
aggregated data such as the sum of sales. Materialized views in these
environments are often referred to as summaries, because they store
summarized data. They can also be used to precompute joins with or without
aggregations. A materialized view eliminates the overhead associated with
expensive joins and aggregations for a large or important class of queries.
See Also:
You can also use materialized views to download a subset of data from central
servers to mobile clients, with periodic refreshes and updates between clients
and the central servers.
See Also:
You can use materialized views in data warehouses to increase the speed of
queries on very large databases. Queries to large databases often involve joins
between tables, aggregations such as SUM, or both. These operations are
expensive in terms of time and processing power. The type of materialized view
you create determines how the materialized view is refreshed and used by query
rewrite.
You can use materialized views in a number of ways, and you can use almost
identical syntax to perform a number of roles. For example, a materialized view
can replicate data, a process formerly achieved by using the CREATE
SNAPSHOT statement. Now CREATE MATERIALIZED VIEW is a synonym for
CREATE SNAPSHOT.
When using query rewrite, create materialized views that satisfy the largest
number of queries. For example, if you identify 20 queries that are commonly
applied to the detail or fact tables, then you might be able to satisfy them with five
or six well-written materialized views. A materialized view definition can include
any number of aggregations (SUM, COUNT(x), COUNT(*),
COUNT(DISTINCT x), AVG, VARIANCE, STDDEV, MIN, and MAX). It can also
include any number of joins. If you are unsure of which materialized views to
create, Oracle provides a set of advisory procedures in the DBMS_OLAP
package to help in designing and evaluating materialized views for query rewrite.
These functions are also known as the Summary Advisor or the Advisor. Note
that the OLAP Summary Advisor is different. See Oracle9i OLAP User's Guide
for further details regarding the OLAP Summary Advisor.
See Also:
Many large decision support system (DSS) databases have schemas that do not
closely resemble a conventional data warehouse schema, but that still require
joins and aggregates. The use of summary management features imposes no
schema restrictions, and can enable some existing DSS database applications to
improve performance without the need to redesign the database or the
application.
Figure 8-2 illustrates the use of summary management in the warehousing cycle.
After the data has been transformed, staged, and loaded into the detail data in
the warehouse, you can invoke the summary management process. First, use
the Advisor to plan how you will use summaries. Then, create summaries and
design how queries will be rewritten.
Dimension tables usually change slowly over time and are not modified on
a periodic schedule. They are used in long-running decision support
queries to aggregate the data returned from the query into appropriate
levels of the dimension hierarchy.
Chapter 9, "Dimensions"
•
• Fact tables describe the business transactions of an enterprise. Fact
tables are sometimes called detail tables.
The vast majority of data in a data warehouse is stored in a few very large
fact tables that are updated periodically with data from one or more
operational OLTP databases.
Fact tables include facts (also called measures) such as sales, units, and
inventory.
Fact tables also contain one or more foreign keys that organize the
business transactions by the relevant business entities such as time,
product, and market. In most cases, these foreign keys are non-null, form
a unique compound key of the fact table, and each foreign key joins with
exactly one row of a dimension table.
You must not create dimensions in any schema that does not satisfy these
relationships. Incorrect results can be returned from queries otherwise.
See Also:
Guidelines 1 and 2 are more important than guideline 3. If your schema design
does not follow guidelines 1 and 2, it does not then matter whether it follows
guideline 3. Guidelines 1, 2, and 3 affect both query rewrite performance and
materialized view refresh performance.
Schema
Guideline Description
Guideline 1 Dimensions should either be denormalized (each dimension
contained in one table) or the joins between tables in a normalized
Dimensions or partially normalized dimension should guarantee that each child-
side row joins with exactly one parent-side row. The benefits of
maintaining this condition are described in "Creating Dimensions".
You can enforce this condition by adding FOREIGN KEY and NOT
NULL constraints on the child-side join keys and PRIMARY KEY
constraints on the parent-side join keys.
Guideline 2 If dimensions are denormalized or partially denormalized,
hierarchical integrity must be maintained between the key columns
Dimensions of the dimension table. Each child key value must uniquely identify
its parent key value, even if the dimension table is denormalized.
Hierarchical integrity in a denormalized dimension can be verified
by calling the VALIDATE_DIMENSION procedure of the
DBMS_OLAP package.
Schema
Guideline Description
Guideline 3 Fact and dimension tables should similarly guarantee that each fact
table row joins with exactly one dimension table row. This condition
Dimensions must be declared, and optionally enforced, by adding FOREIGN
KEY and NOT NULL constraints on the fact key column(s) and
PRIMARY KEY constraints on the dimension key column(s), or by
using outer joins. In a data warehouse, constraints are typically
enabled with the NOVALIDATE and RELY clauses to avoid
constraint enforcement performance overhead. See Oracle9i SQL
Reference for further details.
Guideline 4 Incremental loads of your detail data should be done using the
SQL*Loader direct-path option, or any bulk loader utility that uses
Incremental Oracle's direct-path interface. This includes INSERT ... AS SELECT
Loads with the APPEND or PARALLEL hints, where the hints cause the
direct loader log to be used during the insert. See Oracle9i SQL
Reference and "Types of Materialized Views".
Guideline 5 Range/composite partition your tables by a monotonically
increasing time column if possible (preferably of type DATE).
Partitions
Guideline 6 After each load and before refreshing your materialized view, use
the VALIDATE_DIMENSION procedure of the DBMS_MVIEW
Dimensions package to incrementally verify dimensional integrity.
Guideline 7 If a time dimension appears in the materialized view as a time
column, partition and index the materialized view in the same
Time manner as you have the fact tables.
Dimensions
If you are concerned with the time required to enable constraints and whether
any constraints might be violated, use the ENABLE NOVALIDATE with the RELY
clause to turn on constraint checking without validating any of the existing
constraints. The risk with this approach is that incorrect query results could occur
if any constraints are broken. Therefore, as the designer, you must determine
how clean the data is and whether the risk of wrong results is too great.
Loading Data
A popular and efficient way to load data into a warehouse or data mart is to use
SQL*Loader with the DIRECT or PARALLEL option or to use another loader tool
that uses the Oracle direct-path API.
See Also:
Immediately after loading the detail data and updating the indexes on the detail
data, the database can be opened for operation, if desired. You can disable
query rewrite at the system level by issuing an ALTER SYSTEM SET
QUERY_REWRITE_ENABLED = false statement until all the materialized views
are refreshed.
The motivation for using materialized views is to improve performance, but the
overhead associated with materialized view management can become a
significant system management problem. When reviewing or evaluating some of
the necessary materialized view management activities, consider some of the
following:
After the initial effort of creating and populating the data warehouse or data mart,
the major administration overhead is the update process, which involves:
• Periodic extraction of incremental changes from the operational systems
• Transforming the data
• Verifying that the incremental changes are correct, consistent, and
complete
• Bulk-loading the data into the warehouse
• Refreshing indexes and materialized views so that they are consistent with
the detail data
The update process must generally be performed within a limited period of time
known as the update window. The update window depends on the update
frequency (such as daily or weekly) and the nature of the business. For a daily
update frequency, an update window of two to six hours might be typical.
You need to know your update window for the following activities:
The SELECT clause in the materialized view creation statement defines the data
that the materialized view is to contain. Only a few restrictions limit what can be
specified. Any number of tables can be joined together. However, they cannot be
remote tables if you wish to take advantage of query rewrite. Besides tables,
other elements such as views, inline views (subqueries in the FROM clause of a
SELECT statement), subqueries, and materialized views can all be joined or
referenced in the SELECT clause.
See Also:
Here are some examples of materialized views with aggregates. Note that
materialized view logs are only created because this materialized view will be
fast refreshed.
Note that COUNT(*) must always be present. Oracle recommends that you
include the optional aggregates in column Z in the materialized view in order to
obtain the most efficient and accurate fast refresh of the aggregates.
Fast refresh for a materialized view containing only joins is possible after any
type of DML to the base tables (direct-path or conventional INSERT, UPDATE, or
DELETE).
If you specify REFRESH FAST, Oracle performs further verification of the query
definition to ensure that fast refresh can be performed if any of the detail tables
change. These additional checks are:
If some of these restrictions are not met, you can create the materialized view as
REFRESH FORCE to take advantage of fast refresh when it is possible. If one of
the tables did not meet all of the criteria, but the other tables did, the materialized
view would still be fast refreshable with respect to the other tables for which all
the criteria are met.
A materialized view log should contain the rowid of the master table. It is not
necessary to add other columns.
Alternatively, if the previous example did not include the columns times_rid and
customers_id, and if the refresh method was REFRESH FORCE, then this
materialized view would be fast refreshable only if the sales table was updated
but not if the tables times or customers were updated.
In a data warehouse, you typically create many aggregate views on a single join
(for example, rollups along different dimensions). Incrementally maintaining these
distinct materialized aggregate views can take a long time, because the
underlying join has to be performed many times.
You can create a nested materialized view on materialized views that contain
joins only or joins and aggregates.
All the underlying objects (materialized views or tables) on which the materialized
view is defined must have a materialized view log. All the underlying objects are
treated as if they were tables. All the existing options for materialized views can
be used, with the exception of ON COMMIT REFRESH, which is not supported
for a nested materialized views that contains joins and aggregates.
Using the tables and their columns from the sh sample schema, the following
materialized views illustrate how nested materialized views can be created.
You can nest materialized views with joins and aggregates, but the ON DEMAND
clause is necessary for FAST REFRESH.
You should keep the following in mind when deciding whether to use nested
materialized views:
• If you want to use fast refresh, you should fast refresh all the materialized
views along any chain. It makes little sense to define a fast refreshable
materialized view on top of a materialized view that must be refreshed with
a complete refresh.
• If you want the highest level materialized view to be fresh with respect to
the detail tables, you need to ensure that all materialized views in a tree
are refreshed in the correct dependency order before refreshing the
highest-level. Oracle does not provide support for automatic refreshing of
intermediate materialized views in a nested hierarchy. If the materialized
views under the highest-level materialized view are stale, refreshing only
the highest-level will succeed, but makes it fresh only with respect to its
underlying materialized view, not the detail tables at the base of the tree.
• When refreshing materialized views, you need to ensure that all
materialized views in a tree are refreshed. If you only refresh the highest-
level materialized view, the materialized views under it will be stale and
you must explicitly refresh them.
The following restrictions exist on the way you can nest materialized views:
Once you have selected the materialized views you want to create, follow these
steps for each materialized view.
See Also:
If you do not know how much space the materialized view will require, then the
DBMS_OLAP.ESTIMATE_SIZE package, which is described in Chapter 16,
"Summary Advisor", can estimate the number of bytes required to store this
uncompressed materialized view. This information can then assist the design
team in determining the tablespace in which the materialized view should reside.
You should use data segment compression with highly redundant data, such as
tables with many foreign keys. This is particularly useful for materialized views
created with the ROLLUP clause. Data segment compression reduces disk use
and memory use (specifically, the buffer cache), often leading to a better scaleup
for read-only operations. Data segment compression can also speed up query
execution.
See Also:
Build Methods
Two build methods are available for creating the materialized view, as shown in
Table 8-2. If you select BUILD IMMEDIATE, the materialized view definition is
added to the schema objects in the data dictionary, and then the fact or detail
tables are scanned according to the SELECT expression and the results are
stored in the materialized view. Depending on the size of the tables to be
scanned, this build process can take a considerable amount of time.
Before creating a materialized view, you can verify what types of query rewrite
are possible by calling the procedure DBMS_MVIEW.EXPLAIN_MVIEW. Once
the materialized view has been created, you can use
DBMS_MVIEW.EXPLAIN_REWRITE to find out if (or why not) it will rewrite a
specific query.
If you define a materialized view as BUILD DEFERRED, it is not eligible for query
rewrite until it is populated with data.
Query rewrite is not possible with all materialized views. If query rewrite is not
occurring when expected, DBMS_MVIEW.EXPLAIN_REWRITE can help provide
reasons why a specific query is not eligible for rewrite. Also, check to see if your
materialized view satisfies all of the following conditions.
• The defining query of the materialized view cannot contain any non-
repeatable expressions (ROWNUM, SYSDATE, non-repeatable PL/SQL
functions, and so on).
• The query cannot contain any references to RAW or LONG RAW
datatypes or object REFs.
• If the defining query of the materialized view contains set operators
(UNION, MINUS, and so on), rewrite will use them for full text match
rewrite only.
• If the materialized view was registered as PREBUILT, the precision of the
columns must agree with the precision of the corresponding SELECT
expressions unless overridden by the WITH REDUCED PRECISION
clause.
• If the materialized view contains the same table more than once, it is
possible to do a general rewrite, provided the query has the same aliases
for the duplicate tables as the materialized view.
General Query Rewrite Restrictions
• If a query has both local and remote tables, only local tables will be
considered for potential rewrite.
• Neither the detail tables nor the materialized view can be owned by SYS.
• SELECT and GROUP BY lists, if present, must be the same in the query
of the materialized view.
• Aggregate functions must occur only as the outermost part of the
expression. That is, aggregates such as AVG(AVG(x)) or AVG(x)+ AVG(x)
are not allowed.
• CONNECT BY clauses are not allowed.
Refresh Options
When you define a materialized view, you can specify two refresh options: how to
refresh and what type of refresh. If unspecified, the defaults are assumed as ON
DEMAND and FORCE.
Refresh
Mode Description
ON Refresh occurs automatically when a transaction that modified one of
COMMIT the materialized view's detail tables commits. This can be specified as
long as the materialized view is fast refreshable (in other words, not
complex). The ON COMMIT privilege is necessary to use this mode
ON Refresh occurs when a user manually executes one of the available
DEMAND refresh procedures contained in the DBMS_MVIEW package
(REFRESH, REFRESH_ALL_MVIEWS, REFRESH_DEPENDENT)
When a materialized view is maintained using the ON COMMIT method, the time
required to complete the commit may be slightly longer than usual. This is
because the refresh operation is performed as part of the commit process.
Therefore this method may not be suitable if many users are concurrently
changing the tables upon which the materialized view is based.
If you think the materialized view did not refresh, check the alert log or trace file.
If a materialized view fails during refresh at COMMIT time, you must explicitly
invoke the refresh procedure using the DBMS_MVIEW package after addressing
the errors specified in the trace files. Until this is done, the materialized view will
no longer be refreshed automatically at commit time.
You can specify how you want your materialized views to be refreshed from the
detail tables by selecting one of four options: COMPLETE, FAST, FORCE, and
NEVER. Table 8-4 describes the refresh options.
Refresh
Option Description
COMPLETE Refreshes by recalculating the materialized view's defining query
FAST Applies incremental changes to refresh the materialized view using
the information logged in the materialized view logs, or from a
SQL*Loader direct-path or a partition maintenance operation
FORCE Applies FAST refresh if possible; otherwise, it applies COMPLETE
refresh
NEVER Indicates that the materialized view will not be refreshed with the
Oracle refresh mechanisms
Whether the fast refresh option is available depends upon the type of
materialized view. You can call the procedure DBMS_MVIEW.EXPLAIN_MVIEW
to determine whether fast refresh is possible.
Defining queries for materialized views with joins only and no aggregates have
the following restrictions on fast refresh:
Defining queries for materialized views with joins and aggregates have the
following restrictions on fast refresh:
• All tables in the materialized view must have materialized view logs, and
the materialized view logs must:
Materialized views with the UNION ALL set operator support the REFRESH
FAST option if the following conditions are satisfied:
• The defining query must have the UNION ALL operator at the top level.
• Each query block in the UNION ALL query must satisfy the requirements
of a fast refreshable materialized view with aggregates or a fast
refreshable materialized view with joins.
Note that Oracle also allows the special case of a single table materialized
view with joins only provided the ROWID column has been included in the
SELECT list and in the materialized view log. This is shown in the defining
query of the view view_with_unionall_mv.
ORDER BY Clause
The ORDER BY clause is not considered part of the materialized view definition.
As a result, there is no difference in the manner in which Oracle detects the
various types of materialized views (for example, materialized join views with no
aggregates). For the same reason, query rewrite is not affected by the ORDER
BY clause. This feature is similar to the CREATE TABLE ... ORDER BY
capability that exists in Oracle.
Materialized view logs are required if you want to use fast refresh. They are
defined using a CREATE MATERIALIZED VIEW LOG statement on the base
table that is to be changed. They are not created on the materialized view. For
fast refresh of materialized views, the definition of the materialized view logs
must specify the ROWID clause. In addition, for aggregate materialized views, it
must also contain every column in the table referenced in the materialized view,
the INCLUDING NEW VALUES clause and the SEQUENCE clause.
• For ON COMMIT, the mixed DML statements occur within the same
transaction because the refresh of the materialized view will occur upon
commit of this transaction.
• For ON DEMAND, the mixed DML statements occur between refreshes.
The following example of a materialized view log illustrates where one is
created on the table sales that includes the SEQUENCE keyword:
• CREATE MATERIALIZED VIEW LOG ON sales
• WITH SEQUENCE, ROWID
• (prod_id, cust_id, time_id, channel_id, promo_id,
• quantity_sold, amount_sold)
• INCLUDING NEW VALUES;
See Also:
When using certain materialized views, you must ensure that your NLS
parameters are the same as when you created the materialized view.
Materialized views with this restriction are as follows:
The contents of the table must reflect the materialization of the defining query at
the time you register it as a materialized view, and each column in the defining
query must correspond to a column in the table that has a matching datatype.
However, you can specify WITH REDUCED PRECISION to allow the precision of
columns in the defining query to be different from that of the table columns.
The table and the materialized view must have the same name, but the table
retains its identity as a table and can contain columns that are not referenced in
the defining query of the materialized view. These extra columns are known as
unmanaged columns. If rows are inserted during a refresh operation, each
unmanaged column of the row is set to its default value. Therefore, the
unmanaged columns cannot have NOT NULL constraints unless they also have
default values.
Materialized views based on prebuilt tables are eligible for selection by query
rewrite provided the parameter QUERY_REWRITE_INTEGRITY is set to at least
the level of stale_tolerated or trusted.
See Also:
When you drop a materialized view that was created on a prebuilt table, the table
still exists--only the materialized view is dropped.
The following example illustrates the two steps required to register a user-defined
table. First, the table is created, then the materialized view is defined using
exactly the same name as the table. This materialized view sum_sales_tab is
eligible for use in query rewrite.
You could have compressed this table to save space. See "Storage And Data
Segment Compression" for details regarding data segment compression.
If the user-defined materialized view does not contain a time dimension, then:
• Create a new materialized view that does include the time dimension (if
possible).
• The view should aggregate over the time column in the new materialized
view.
Partitioning a materialized view also has benefits for refresh, because the refresh
procedure can use parallel DML to maintain the materialized view.
See Also:
It is possible and advantageous to track freshness to a finer grain than the entire
materialized view. The ability to identify which rows in a materialized view are
affected by a certain detail table partition, is known as Partition Change Tracking
(PCT). When one or more of the detail tables are partitioned, it may be possible
to identify the specific rows in the materialized view that correspond to a modified
detail partition(s); those rows become stale when a partition is modified while all
other rows remain fresh.
Partition Change Tracking can be used to identify which materialized view rows
correspond to a particular detail table. Partition Change Tracking is also used to
support fast refresh after partition maintenance operations on detail tables. For
instance, if a detail table partition is truncated or dropped, the affected rows in
the materialized view are identified and deleted. Identifying which materialized
view rows are fresh or stale, rather than considering the entire materialized view
as stale, allows query rewrite to use those rows that are fresh while in
QUERY_REWRITE_INTEGRITY=ENFORCED or TRUSTED modes.
• At least one of the detail tables referenced by the materialized view must
be partitioned.
• Partitioned tables must use either range or composite partitioning.
• The partition key must consist of only a single column.
• The materialized view must contain either the partition key column or a
partition marker of the detail table. See Oracle9i Supplied PL/SQL
Packages and Types Reference for details regarding the
DBMS_MVIEW.PMARKER function.
• If you use a GROUP BY clause, the partition key column or the partition
marker must be present in the GROUP BY clause.
• Data modifications can only occur on the partitioned table.
• The COMPATIBILITY initialization parameter must be a minimum of
9.0.0.0.0.
• Partition Change Tracking is not supported for a materialized view that
refers to views, remote tables, or outer joins.
• Partition Change Tracking-based refresh is not supported for UNION ALL
materialized views.
Partition Marker
In many cases, the advantages of PCT will be offset by this restriction for highly
aggregated materialized views. The DBMS_MVIEW.PMARKER function is
designed to significantly reduce the cardinality of the materialized view (see
Example 8-7 for an example). The function returns a partition identifier that
uniquely identifies the partition for a specified row within a specified partition
table. The DBMS_MVIEW.PMARKER function is used instead of the partition key
column in the SELECT and GROUP BY clauses.
Unlike the general case of a PL/SQL function in a materialized view, use of the
DBMS_MVIEW.PMARKER does not prevent rewrite with that materialized view
even when the rewrite mode is QUERY_REWRITE_INTEGRITY=enforced.
The following example uses the sh sample schema and the three detail tables
sales, products, and times to create two materialized views. For this example,
sales is a partitioned table using the time_id column and products is partitioned
by the prod_category column. times is not a partitioned table.
The first materialized view is for the yearly sales revenue for each product.
The second materialized view is for monthly customer sales. As customers tend
to purchase in bulk, sales average just two orders for each customer per month.
Therefore, the impact of including the time_id in the materialized view will not
unacceptably increase the number of rows stored. However, most orders are
large and contain many different products. With approximately 1000 different
products sold each day, including the time_id in the materialized view would
substantially increase the cardinality. This materialized view uses the
DBMS_MVIEW.PMARKER function.
The detail tables must have materialized view logs for FAST REFRESH.
cust_mth_sales_mv includes the partition key column from table sales (time_id)
in both its SELECT and GROUP BY lists. This enables PCT on table sales for
materialized view cust_mth_sales_mv. However, the GROUP BY and SELECT
lists include PRODUCTS.PROD_ID rather than the partition key column
(PROD_CATEGORY) of the products table. Therefore, PCT is not enabled on
table products for this materialized view. In other words, any partition
maintenance operation to the sales table will allow a PCT fast refresh of
cust_mth_sales_mv. However, PCT fast refresh is not possible after any kind of
modification to the products table. To correct this, the GROUP BY and SELECT
lists must include column PRODUCTS.PROD_CATEGORY. Following a partition
maintenance operation, such as a drop partition, a PCT fast refresh should be
performed on any materialized view that is referencing the table upon which the
partition operations are undertaken.
Partitioning a materialized view involves defining the materialized view with the
standard Oracle partitioning clauses, as illustrated in the following example. This
statement creates a materialized view called part_sales_mv, which uses three
partitions, can be fast refreshed, and is eligible for query rewrite.
In this example, the table part_sales_tab has been partitioned over three months
and then the materialized view was registered to use the prebuilt table. This
materialized view is eligible for query rewrite because the ENABLE QUERY
REWRITE clause has been included.
If you plan to have rolling materialized views in your warehouse, you should
determine how frequently you plan to perform partition maintenance operations,
and you should plan to partition fact tables and materialized views to reduce the
amount of system administration overhead required when old data is aged out.
An additional consideration is that you might want to use data compression on
your infrequently updated partitions.
You are not restricted to using range partitions. For example, a composite
partition using both a time value and a key value could result in a good partition
solution for your data.
See Also:
This section discusses certain OLAP concepts and how relational databases can
handle OLAP queries. Next, it recommends an approach for using materialized
views to meet OLAP performance needs. Finally, it discusses using materialized
views with set operators, a common scenario for OLAP environments.
OLAP Cubes
While data warehouse environments typically view data in the form of a star
schema, OLAP environments view data in the form of a hierarchical cube. A
hierarchical cube includes both detail data and aggregated data: it is a data set
where the data is aggregated along the rollup hierarchy of each of its dimensions
and these aggregations are combined across dimensions. It includes the typical
set of aggregations needed for business intelligence queries.
Consider a sales data set with two dimensions, each of which has a 4-level
hierarchy:
This means there are 16 aggregate groups in the hierarchical cube. This is
because the four levels of time are multiplied by four levels of product to produce
the cube. Table 8-5 shows the four levels of each dimension.
Note that as you increase the number of dimensions and levels, the number of
groups to calculate increases dramatically. This example involves 16 groups, but
if you were to add just two more dimensions with the same number of levels, you
would have 4 x 4 x 4 x 4 = 256 different groups. Also, consider that a similar
increase in groups occurs if you have multiple hierarchies in your dimensions.
For example, the time dimension might have an additional hierarchy of fiscal
month rolling up to fiscal quarter and then fiscal year. Handling the explosion of
groups has historically been the major challenge in data storage for OLAP
systems.
Typical OLAP queries slice and dice different parts of the cube comparing
aggregations from one level to aggregation from another level. For instance, a
query might find sales of the grocery division for the month of January, 2002 and
compare them with total sales of the grocery division for all of 2001.
Oracle9i can specify hierarchical cubes in a simple and efficient SQL query.
These hierarchical cubes represent the logical cubes referred to in many OLAP
products. To specify data in the form of hierarchical cubes, users can work with
the extensions to GROUP BY clause introduced in Oracle9i.
You can use one of Oracle's new extensions to the GROUP BY clause,
concatenated grouping sets, to generate the aggregates needed for a
hierarchical cube of data. By using concatenated rollup (rolling up along the
hierarchy of each dimension and then concatenate them across multiple
dimensions), you can generate all the aggregations needed by a hierarchical
cube. These extensions are discussed in detail in Chapter 18, "SQL for
Aggregation in Data Warehouses".
The following shows the GROUP BY clause needed to create a hierarchical cube
for the 2-dimension example described earlier. The following simple syntax
performs a concatenated rollup:
This concatenated rollup takes the ROLLUP aggregations listed in the table of
the prior section and perform a cross-product on them. The cross-product will
create the 16 (4x4) aggregate groups needed for a hierarchical cube of the data.
Analytic applications treat data as cubes, but they want only certain slices and
regions of the cube. Concatenated rollup (hierarchical cube) enables relational
data to be treated as cubes. To handle complex analytic queries, the
fundamental technique is to enclose a hierarchical cube query in an outer query
that specifies the exact slice needed from the cube. Oracle9i optimizes the
processing of hierarchical cubes nested inside slicing queries. By applying many
powerful algorithms, these queries can be processed at unprecedented speed
and scale. This enables OLAP tools and analytic applications to use a consistent
style of queries to handle the most complex questions.
The inner hierarchical cube specified defines a simple cube, with two dimensions
and four levels in each dimension. It would generate 16 groups (4 Time levels * 4
Product levels). The GROUPING_ID function in the query identifies the specific
group each row belongs to, based on the aggregation level of the grouping-
columns in its argument.
The outer query applies the constraints needed for our specific query, limiting
Division to a value of 25 and Month to a value of 200201 (representing January
2002 in this case). In conceptual terms, it slices a small chunk of data from the
cube. The outer query's constraint on the GID column, indicated in the query by
gid-for-division-month would be the value of a key indicating that the data is
grouped as a combination of division and month. The GID constraint selects only
those rows that are aggregated at the level of a GROUP BY month, division
clause.
The bold items highlight the changed SQL. The inner query now has a simple
GROUP BY clause of month, division. The columns year, quarter, brand and item
have been converted to null to match the simplified GROUP BY clause. Because
the query now requests just one group, fifteen out of sixteen groups are removed
from the processing, greatly reducing the work. For a cube with more dimensions
and more levels, the savings possible through group pruning can be far greater.
Note that the group pruning transformation works with all the GROUP BY
extensions: ROLLUP, CUBE, and GROUPING SETS.
While the Oracle optimizer has simplified the previous query to a simple GROUP
BY, faster response times can be achieved if the group is precomputed and
stored in a materialized view. Because OLAP queries can ask for any slice of the
cube many groups may need to be precomputed and stored in a materialized
view. This is discussed in the next section.
OLAP requires fast response times for multiple users, and this in turn demands
that significant parts of an OLAP cube be precomputed and held in materialized
views. Oracle9i enables great flexibility in the use of materialized views for
OLAP.
Data warehouse designers can choose exactly how much data to materialize. A
data warehouse can have the full OLAP cube materialized. While this will take
the most storage space, it ensures quick response for any query within the cube.
On the other hand, a warehouse could have just partial materialization, saving
storage space, but allowing only a subset of possible queries to be answered at
highest speed. If an OLAP environment's queries cover the full range of
aggregate groupings possible in its data set, it may be best to materialize the
whole hierarchical cube.
See Also:
This section shows complete and partial hierarchical cube materialized views.
Materialized views with multiple aggregate groups will give their best
performance when partitioned appropriately. The most effective partitioning
scheme for these materialized views is to use list partitioning, especially with the
GROUPING_ID column. By partitioning the materialized views this way, you
enable partition pruning for queries rewritten against this materialized view: only
relevant aggregate groups will be accessed, greatly reducing the query
processing cost.
You should consider data compression when using highly redundant data, such
as tables with many foreign keys. In particular, materialized views created with
the ROLLUP clause are likely candidates.
See Also:
Oracle provides some support for materialized views whose defining query
involves set operators. Materialized views with set operators can now be created
enabled for query rewrite. Query rewrite with such materialized views is
supported using full exact text match. You can refresh the materialized view
using either ON COMMIT or ON DEMAND refresh.
Fast refresh is supported if the defining query has the UNION ALL operator at the
top level and each query block in the UNION ALL, meets the requirements of a
materialized view with aggregates or materialized view with joins only. Further,
the materialized view must include a constant column (known as a UNION ALL
marker) that has a distinct value in each query block, which, in the following
example, is columns 1 marker and 2 marker.
See "Restrictions on Fast Refresh on Materialized Views With the UNION ALL
Operator" for detailed restrictions on fast refresh for materialized views with
UNION ALL.
To create a UNION ALL materialized view with two join views, the materialized
view logs must have the rowid column and, in the following example, the UNION
ALL marker is the columns, 1 marker and 2 marker.
The following example shows a UNION ALL of a materialized view with joins and
a materialized view with aggregates. A couple of things can be noted in this
example. Nulls or constants can be used to ensure that the data types of the
corresponding SELECT list columns match. Also the UNION ALL marker column
can be a string literal, which is 'Year' umarker, 'Quarter' umarker, or 'Daily'
umarker in the following example:
The two most common operations on a materialized view are query execution
and fast refresh, and each operation has different performance requirements.
Query execution might need to access any subset of the materialized view key
columns, and might need to join and aggregate over a subset of those columns.
Consequently, query execution usually performs best if a single-column bitmap
index is defined on each materialized view key column.
In the case of materialized views containing only joins using fast refresh, Oracle
recommends that indexes be created on the columns that contain the rowids to
improve the performance of the refresh operation.
See Also:
The state of a materialized view can be checked by querying the data dictionary
views USER_MVIEWS or ALL_MVIEWS. The column STALENESS will show
one of the values FRESH, STALE, UNUSABLE, UNKNOWN, or UNDEFINED to
indicate whether the materialized view can be used. The state is maintained
automatically, but it can be manually updated by issuing an ALTER
MATERIALIZED VIEW name COMPILE statement.
To create a materialized view in your own schema, you must have the CREATE
MATERIALIZED VIEW privilege and the SELECT privilege to any tables
referenced that are in another schema. To create a materialized view in another
schema, you must have the CREATE ANY MATERIALIZED VIEW privilege and
the owner of the materialized view needs SELECT privileges to the tables
referenced if they are from another schema.
If the materialized view is on a prebuilt container, the creator, if different from the
owner, must have SELECT WITH GRANT privilege on the container table.
If you continue to get a privilege error while trying to create a materialized view
and you believe that all the required privileges have been granted, then the
problem is most likely due to a privilege not being granted explicitly and trying to
inherit the privilege from a role instead. The owner of the materialized view must
have explicitly been granted SELECT access to the referenced tables if the
tables are in a different schema.
All other changes are achieved by dropping and then re-creating the materialized
view.
See Also:
Use the DROP MATERIALIZED VIEW statement to drop a materialized view. For
example:
Note that you must run the utlxmv.sql script prior to calling EXPLAIN_MVIEW
except when you are placing the results in MSG_ARRAY. The script is found in
the admin directory. In addition, you must create MV_CAPABILITIES_TABLE in
the current schema. An explanation of the various capabilities is in Table 8-6,
and all the possible messages are listed in Table 8-7.
• stmt_id
• mv
• msg-array
DBMS_MVIEW.EXPLAIN_MVIEW Declarations
The following PL/SQL declarations that are made for you in the DBMS_MVIEW
package show the order and datatypes of these parameters for explaining an
existing materialized view and a potential materialized view with output to a table
and to a VARRAY.
DBMS_MVIEW.EXPLAIN_MVIEW
(mv IN VARCHAR2,
stmt_id IN VARCHAR2:= NULL);
DBMS_MVIEW.EXPLAIN_MVIEW
(mv IN VARCHAR2,
msg_array OUT SYS.ExplainMVArrayType);
Using MV_CAPABILITIES_TABLE
Example of DBMS_MVIEW.EXPLAIN_MVIEW
First, create the materialized view. Alternatively, you can use EXPLAIN_MVIEW
on a potential materialized view using its SELECT statement.
Then, you invoke EXPLAIN_MVIEW with the materialized view to explain. You
need to use the SEQ column in an ORDER BY clause so the rows will display in
a logical order. If a capability is not possible, N will appear in the P column and
an explanation in the MSGTXT column. If a capability is not possible for more
than one reason, a row is displayed for each reason.
See Also:
MV_CAPABILITIES_TABLE.CAPABILITY_NAME Details
CAPABILITY_NAME Description
PCT If this capability is possible, Partition
Change Tracking is possible on at
least one detail relation. If this
capability is not possible, PCT is not
possible with any detail relation
referenced by the materialized view.
REFRESH_COMPLETE If this capability is possible, complete
refresh of the materialized view is
possible.
REFRESH_FAST If this capability is possible, fast
refresh is possible at least under
certain circumstances.
REWRITE If this capability is possible, at least
full text match query rewrite is
possible. If this capability is not
possible, no form of query rewrite is
possible.
CAPABILITY_NAME Description
PCT_TABLE If this capability is possible, it is
possible with respect to a particular
partitioned table in the top level
FROM list. When possible, PCT
applies to the partitioned table named
in the RELATED_TEXT column.