Unit Ii DWDM
Unit Ii DWDM
Unit Ii DWDM
The major distinguishing features between OLTP and OLAP are summarized
asfollows.
2. Data contents: An OLTP system manages current data that, typically, are too
detailed to be easily used for decision making. An OLAP system manages large
amounts of historical data, provides facilities for summarization and aggregation,
and stores and manages information at different levels of granularity. These
featuresmake the data easier for use in informed decision making.
4. View: An OLTP system focuses mainly on the current data within an enterprise
or department, without referring to historical data or data in different
organizations. In contrast, an OLAP system often spans multiple versions of a
database schema. OLAP systems also deal with information that originates from
different organizations, integrating information from many data stores. Because of
their huge volume, OLAP data are stored on multiple storage media.
5. Access patterns: The access patterns of an OLTP system consist mainly of
short, atomic transactions. Such a system requires concurrency control and
recovery mechanisms. However, accesses to OLAP systems are mostly read-only
operationsalthough many could be complex queries.
2.3 Data warehouse architecture: The Design of a Data Warehouse: A
Business Analysis Framework Four different views regarding the design of a data
warehouse must be considered: the top-down view, the data source view, the data
warehouseview, the business query view.
The top-down view allows the selection of relevant information
necessary for the data warehouse.
The data source view exposes the information being captured, stored
and managed by operational systems.
The data warehouse view includes fact tables and dimension tables
Finally the business query view is the Perspective of data in the data
warehouse from the viewpoint of the end user.
(i).Independent data marts are sourced from data captured from one or
more operational systems or external information providers, or from data
generated locally within a particular department or geographic area.
EXTRACT: Some of the data elements in the operational database can be reasonably be
expected to be useful in the decision making, but others are of less value for that purpose.
For this reason, it is necessary to extract the relevant data from the operational database
before bringing into the data warehouse. Many commercial tools are available to help with
the extraction process. Data Junction is one of the commercial products. The user of one
of these tools typically has an easy-to-use windowed interface by which to specify the
following:
(i) Which files and tables are to be accessed in the source database?
(ii) Which fields are to be extracted from them? This is often
doneinternally by SQL Select statement.
(iii) What are those to be called in the resulting database?
(iv) What is the target machine and database format of the output?
(v) On what schedule should the extraction process be repeated?
Data Cleansing must deal with many types of possible errors. These include
missing data and incorrect data at one source; inconsistent data and conflicting data
when twoor more source are involved. There are several algorithms followed to
clean the data, which will be discussed in the coming lecture notes.
LOADING: Loading often implies physical movement of the data from the
computer(s) storing the source database(s) to that which will store the data
warehouse database, assumingit is different. This takes place immediately after the
extraction phase. The most common channel for data movement is a high-speed
communication link. Ex: Oracle Warehouse Builder is the API from Oracle,
which provides the features to performthe ETL task on Oracle Data Warehouse.
2.6 Meta data Repository: Metadata are data about data. When used in a data
warehouse, metadata are the data that define warehouse objects. Metadata are
created for the data names and definitions of the given warehouse. Additional
metadata are created and captured for time stamping any extracted data, the source
of the extracted data, and missing fields that have been added by data cleaning or
integration processes. A metadata repositoryshould contain:
Star schema: The star schema is a modeling paradigm in which the data warehouse
contains (1) a large central table (fact table), and (2) a set of smaller attendant tables
(dimension tables), one for each dimension. The schema graph resembles a starburst,
with the dimension tables displayed in a radial pattern around the central fact table.
Snowflake schema: The snowflake schema is a variant of the star schema model,
where some dimension tables are normalized, thereby further splitting the data into
additional tables. The resulting schema graph forms a shape similar to a snowflake.
The major difference between the snowflake and star schema models is that the
dimension tables of the snowflake model may be kept in normalized form. Such a
table is easy to maintain and also saves storage space because a large dimension
table can be extremely large whenthe dimensional structure is included as columns.