DB m8 9 10 11 PDF

INTENDED LEARNING OUTCOME
• Define Data warehouse.

• Understand Data warehouse concept.
• List the advantages and disadvantages of data warehouse
• List the usage of data warehouse
4
What is a Data Warehousing?
• An electronic storage of a large amount of
information by a business or organization.
• A type of data management system that is

designed to enable and support business
intelligence (BI) activities, especially analytics.
Data warehouses are solely intended to
perform queries and analysis and often contain
large amounts of historical data. The data
https://www.betterbuys.com/wp-
within a data warehouse is usually derived from content/uploads/2015/01/Data-Warehouse-Visual21.png
a wide range of sources such as application log

files and transaction applications.
5
Data Warehousing vs. Databases
• A database is a transactional system that is set to monitor and update real-time
data in order to have only the most recent data available.
• A data warehouse is programmed to aggregate structured data over a period of

time.
https://blog.panoply.io/hs-fs/hubfs/db_vs_dw.jpg?width=694&name=db_vs_dw.jpg
6
Alternative Names for Data Warehouse System
https://www.guru99.com/images/1/data_warehousing.png
7
History of Data Warehouse
• 1960- Dartmouth and General Mills in a joint research project, develop the terms
dimensions and facts.
• 1970- A Nielsen and IRI introduces dimensional data marts for retail sales.
• 1983- Tera Data Corporation introduces a database management system which is
specifically designed for decision support
• Data warehousing started in the late 1980s when IBM worker Paul Murphy and
Barry Devlin developed the Business Data Warehouse.
• However, the real concept was given by Inmon Bill. He was considered as a father of
data warehouse. He had written about a variety of topics for building, usage, and
maintenance of the warehouse & the Corporate Information Factory.
8
Benefits of Data Warehouse
https://cdn.educba.com/academy/wp-content/uploads/2019/10/Benefits-of-Data-Warehouse.png
9
Data Warehouse Architecture
• Simple. All data warehouses share a basic design in which metadata, summary data,
and raw data are stored within the central repository of the warehouse. The
repository is fed by data sources on one end and accessed by end users for analysis,
reporting, and mining on the other end.
• Simple with a staging area. Operational data must be cleaned and processed before
being put in the warehouse. Although this can be done programmatically, many data
warehouses add a staging area for data before it enters the warehouse, to simplify
data preparation.
10
• Hub and spoke. Adding data marts between the central repository and end users
allows an organization to customize its data warehouse to serve various lines of
business. When the data is ready for use, it is moved to the appropriate data mart.
• Sandboxes. Sandboxes are private, secure, safe areas that allow companies to quickly
and informally explore new datasets or ways of analyzing data without having to
conform to or comply with the formal rules and protocol of the data warehouse.
11
The Data Warehouse Architecture
generally comprises of three tiers.
https://cdn.educba.com/academy/wp
-content/uploads/2019/05/OLAP-
Server.png
12
Top Tier
The Top Tier consists of the Client-side front end of the architecture.
The Transformed and Logic applied information stored in the Data Warehouse will be
used and acquired for Business purposes in this Tier.
Middle Tier
The Middle Tier consists of the OLAP Servers
OLAP is Online Analytical Processing Server
OLAP is used to provide information to business analysts and managers
Bottom Tier
The Bottom Tier mainly consists of the Data Sources, ETL Tool, and Data Warehouse.
13
Components of Data Warehouse
Query
Warehouse Manager
Manager
Load Manager
End-user
Access Tools
https://www.teksystems.com/-/media/teksystems/images/blogs/content%20images/conventional-data-warehouse-
architecture.png?h=327&w=601&hash=7FBF0193539FC596BBDA0A36A2373CB30D802006&hash=7FBF0193539FC596BBDA0A
36A2373CB30D802006&la=en
14
1. Load Manager
• Load manager is also called the front component.

• It performs with all the operations associated with the extraction and
load of data into the warehouse.
• These operations include transformations to prepare the data for
entering the Data warehouse.
15
2. Warehouse Manager
• Warehouse manager performs operations associated with the

management of the data in the warehouse.
• It performs operations like analysis of data to ensure consistency,
creation of indexes and views, generation of denormalization and
aggregations, transformation and merging of source data and
archiving and baking-up data.
16
3. Query Manager
• Query manager is also known as backend component.

• It performs all the operation operations related to the management of
user queries.
• The operations of this Data warehouse components are direct queries
to the appropriate tables for scheduling the execution of queries.
17
4. End-user Access Tools
• This is categorized into five different groups like:

o Data Reporting
o Query Tools
o Application development tools
o EIS tools
o OLAP tools and data mining tools.
18
Designing a Data Warehouse
Any data warehouse design must address the following:
• Specific data content
• Relationships within and between groups of data
• The systems environment that will support the data warehouse
• The types of data transformations required
• Data refresh frequency
19
Applications of Data Warehouse
1. Airline
2. Banking
3. Healthcare
4. Public Sector
5. Investment and Insurance Sector
6. Retail sectors
7. Telecommunication
8. Hospitality Industry
9. Financial services
10. Consumer goods
11. Controlled manufacturing
20
Advantages of Data Warehousing
1. Cleans data 8. Timely access to data
2. Indexes multiple types 9. Enhanced data consistency and quality
3. Secured data and its access 10. Return on Investment is high
4. Query process 11. Increase revenues
5. Enhanced business intelligence 12. Standardizes data across the organization
6. Increased system and query 13. Database Normalization
performance
7. Business Intelligence
21
Disadvantages of Data Warehousing
1. Raising ownership
2. Extra reporting
3. Data flexibility
4. Compatibility with the existing
system
5. Keeping data online
6. Dimensional technique
7. Costs
22
Example of Data Warehouse Tools:
20 Best ETL / Data Warehousing Tools in 2020
1. Cdata Sync
2. QuerySurge
3. Xplenty
4. CloverDX
5. Oracle
6. Amazon RedShift
7. Domo
8. Teradata
9. SAP
10. SAS
23
Example of Data Warehouse Tools:
20 Best ETL / Data Warehousing Tools in 2020
11. IBM DataStage
12. Informatica
13. MS SSIS
14. Talend Open Studio
15. The Ab Initio Software
16. Dundas
17. Sisense
18. TabLeau
19. MicroStrategy
20. Pentaho
24
References:
• Frankenfiled, J., 20-Aug-2019. Data Warehousing. Retrieved from Investopedia
https://www.investopedia.com/terms/d/data-warehousing.asp
• Oracle, 2020. What is Data Management?. Retrieved from
https://www.oracle.com/database/what-is-data-management/
• Oracle, 2020. What is a Data Warehouse?. Retrieved from
https://www.oracle.com/database/what-is-a-data-warehouse/
• Guru99, 2020. What is Data Warehouse? Types, Definition & Example. Retrieved
from https://www.guru99.com/data-warehousing.html#2.
• EDUCBA, 2020. Data Warehouse Architecture. Retrieved from
https://www.educba.com/data-warehouse-architecture/
25
References:
• Reddy, C.. Data Warehousing: Characteristics, Functions, Pros & Cons. Wisestep.
Retrieved from https://content.wisestep.com/data-warehousing-characteristics-
functions-pros-cons/
• Guru99, 2020. 20 Best ETL / Data Warehousing Tools in 2020. Retrieved from
https://www.guru99.com/top-20-etl-database-warehousing-tools.html
26
• Identify the types of Data warehouse

• Understand the concept of Enterprise Data Warehouse
• Understand the concept of Data Mart
• Understand the concept of Operational Data Warehouse
• Differentiate between data warehouse, data mart, enterprise data
warehouse, and operational data warehouse
30
Types of Data Warehouse
There are three types of data warehouse:
1. Enterprise Data Warehouse

2. Operational Data Store
3. Data Mart
31
Enterprise Data Warehouse (EDW)
• Enterprise Data Warehouse is a centralized warehouse.
• It provides decision support service across the enterprise.
• It offers a unified approach for organizing and representing data.
• It also provide the ability to classify data according to the subject and give
access according to those divisions.
32
Enterprise Data Warehouse Architecture
One-tier
Architecture
https://content.altexsoft.com/media/2019/10/word-image-19.png
33
Two-tier
Architecture
34
Three-tier
Architecture
35
One-tier architecture for EDW means that you have a database directly connected
with the analytical interfaces where the end user can make queries.
Two-tier architecture. A data mart level is added between the user interface and
EDW. A data mart is a low-level repository that contains domain-specific
information. Simply put, it’s another, smaller-sized database that extends EDW
with dedicated information for your sales/operational departments, marketing, etc.
Three-tier architecture. On top of the data mart layer, enterprises also use online
analytical processing (OLAP) cubes. An OLAP cube is a specific type of database that
represents data from multiple dimensions.
36
Enterprise Data Warehouse Components
https://www.teksystems.com/-/media/teksystems/images/blogs/content%20images/conventional-data-warehouse-
architecture.png?h=327&w=601&hash=7FBF0193539FC596BBDA0A36A2373CB30D802006&hash=7FBF0193539FC596BBDA0A
36A2373CB30D802006&la=en
37
1. Sources
• The databases where raw data is stored.
2. Extract, Transform, Load (ETL) or Extract, Load, Transform (ELT) layer.
• These are the tools that perform actual connection with source data, its
extraction, and loading to the place where it will be transformed.
Transformation unifies data format.
• ETL and ELT approaches differ in that in ETL the transformation is done
before EDW, in a staging area. ELT is a more modern approach that
handles all the transformation in a warehouse.
38
3. Staging Area
• In the case of ETL, the staging area is the place data is loaded before
EDW. Here, it will be cleaned and transformed to a given data model.
4. DW Database
• The data is finally loaded into the storage space.
5. Reporting Layer
• These are tools that give end users access to data..
39
Operational Data Store
https://sqlwithmanoj.files.wordpress.com/2014/12/ods_dw.jpg
40
• ODS support only daily operations, so their view of historical data is very
limited. Although they work very well as sources of current data and are
often used as such by data warehouses, they do not support historically rich
queries.
• In ODS, Data warehouse is refreshed in real time. Hence, it is widely

preferred for routine activities like storing records of the Employees.
41
https://static.javatpoint.com/tutorial/datawarehouse/images/operational-database-vs-data-warehouse.png
42
Operational Database vs. Data Warehouse
• The Operational Database is the source • Data Warehouse Systems serve users or
of information for the data warehouse. knowledge workers in the purpose of
It includes detailed information used to data analysis and decision-making. Such
run the day to day operations of the systems can organize and present
business. The data frequently changes information in specific formats to
as updates are made and reflect the accommodate the diverse needs of
current value of the last transactions. various users. These systems are called
as Online-Analytical Processing (OLAP)
• Operational Database Management Systems.
Systems also called as OLTP (Online
Transactions Processing Databases),
are used to manage dynamic data in
real-time.
43
Advantages of Operational Data Stores
• An ODS provides access to only the current, finely crumbled, non-aggregated,
less complicated data, which can be queried in a well-fitted approach without
using the operational systems.
• When the Reporting & Analysis tools need data that are closer to real-time
operations, it can be queried from the Operational Data Store, as and when it is
received from the respective source systems, instead of opting for more
prolonging conversion and loading operations from the other database sources
in the data warehouse systems.
• ODS is a secure option as it does not contain all the historical data and
operations, which makes it resilient for any cyber-attacks or hacking of data
privacy.
44
Advantages of Operational Data Stores
• An Operational Data Store is a practically feasible structural design option
when bearing in mind the complex requirements provided by the business, in
order to generate input for analysis and reporting processes that leads to
business decision-making.
• ODS facilitates considerably less total time for turnaround when in trouble, like
the environment failure or when a database needs a restart, which implicates
less stress on the business or the stack holders of the application.
• In order to fetch data from an ODS, querying need not be complex or with
multi-level joins and conditional Simple queries will be sufficient enough as it
holds moderately detailed operational data.
45
Data Mart
http://1.bp.blogspot.com/-pUOXIPxlxps/VLHslazE74I/AAAAAAAAIXM/Rj3oUxpe_dY/s1600/DWH_P002_004.jpg
Data Mart
• A data mart is a subset of the data warehouse. It is specially designed
for a particular line of business, such as sales, finance, sales or finance.
In an independent data mart, data can collect directly from sources.
• A data mart is a scaled-down version of a data warehouse aimed at

meeting the information needs of a homogeneous small group of end
users such as a department or business unit (marketing, finance,
logistics, or human resources). It typically contains some form of
aggregated data and is used as the primary source for report
generation and analysis by this end user group.
47
Types of Data Mart
1. Dependent: A dependent data mart
allows sourcing organization's data from a
single Data Warehouse.
https://www.guru99.com/images/1
/022218_0604_DataMartTut1.png
48
Types of Data Mart
2. Independent: Independent data mart is
created without the use of a central data
warehouse.
https://www.guru99.com/images/1/02
2218_0604_DataMartTut2.png
49
Types of Data Mart
3. Hybrid: This type of data marts can take
data from data warehouses or operational
systems.
https://www.guru99.com/images/1/
022218_0604_DataMartTut3.png
50
Steps in Implementing a Data Mart
Designing Constructing Populating Accessing Managing
51
Designing
It covers all the tasks between initiating the request for a data mart to gathering
information about the requirements. Finally, we create the logical and physical design
of the data mart.
Constructing
It involves creating the physical database and the logical structures.
Populating
data in populated in the data mart.
52
Accessing
Involves putting the data to use: querying the data, creating reports, charts, and
publishing them. End-user submit queries to the database and display the results of the
queries
Managing
This step covers management tasks such as-
• Ongoing user access management.
• System optimizations and fine-tuning to achieve the enhanced performance.
• Adding and managing fresh data into the data mart.
• Planning recovery scenarios and ensure system availability in the case when the
system fails.
53
Advantages of Data Mart
• Data marts contain a subset of organization-wide data. This Data is valuable to a
specific group of people in an organization.
• It is cost-effective alternatives to a data warehouse, which can take high costs to
build.
• Data Mart allows faster access of Data.
• Data Mart is easy to use as it is specifically designed for the needs of its users. Thus a
data mart can accelerate business processes.
• Data Marts needs less implementation time compare to Data Warehouse systems. It
is faster to implement Data Mart as you only need to concentrate on subset of data.
• It contains historical data which enables the analyst to determine data trends.
54
Disadvantages of Data Mart
• Many times enterprises create too many unrelated data marts without much
benefit. It can become a big hurdle to maintain.
• Data Mart cannot provide company-wide data analysis as their data set is limited.
55
References:
• Guru99, 2020. What is Data Warehouse? Types, Definition & Example. Retrieved
from https://www.guru99.com/data-warehousing.html#4.
• EDUCBA, 2020. Types of Data Warehouse. Retrieved from
https://www.educba.com/types-of-data-warehouse/
• Technopedia, 02-May-2013. Enterprise Data Warehouse. Retrieved from
https://www.techopedia.com/definition/26204/enterprise-data-warehouse
• Altexsoft Software R&D Engineering, 24-Oct-2019. Enterprise Data Warehouse:
Concepts, Architecture, and Components. Retrieved from
https://www.altexsoft.com/blog/enterprise-data-warehouse-concepts/
• Oracle, 2020. What is a Data Warehouse?. Retrieved from
https://www.oracle.com/database/what-is-a-data-warehouse/
56
References:
• JavaTpoint, 2018. Difference between Operational Database and Data Warehouse.
Retrieved from https://www.javatpoint.com/operational-database-vs-data-
warehouse
https://www.educba.com/operational-data-stores/
• Rouse, M., TechTarget. Operational Data Store (ODS). Retrieved from
https://searchoracle.techtarget.com/definition/operational-data-store
• Grenier, R. 05-Feb-2011. Operational Data Stores (ODS). Retrieved from
http://randygrenier.blogspot.com/2011/02/operational-data-stores-ods.html
57
References:
• Baesens, B., 26-Sept-2018. Data Warehouses, Data Marts, Operational Data Stores,
and Data Lakes: What’s in a Name?. Database Trends and Applications. Retrieved
from http://www.dbta.com/BigDataQuarterly/Articles/Data-Warehouses-Data-
Marts-Operational-Data-Stores-and-Data-Lakes-Whats-in-a-Name-127417.aspx
https://www.educba.com/operational-data-stores/
• Guru 99, 2020. What is Data Mart? Types & Example. Retrieved from
https://www.guru99.com/data-mart-tutorial.html
58
• Describe star schema

• Describe snowflake schema
• Describe Galaxy schema
• Understand the difference between the types of data warehouse
schema based on its design.
4
Types of Data Warehouse Schema
• Star Schema
• Snowflake Schema
• Galaxy Schema
5
Schema
• A schema is a logical description that describes the entire

database.
• In the data warehouse there includes the name and description of
records.
• It has all data items and also different aggregates associated with
the data.
6
Star Schema
http://www.durofy.com/wp-content/uploads/2013/07/star.png https://www.guru99.com/images/1/022218_0758_StarandSnow1.png
7
Star Schema
• It is known as star schema as its structure
resembles a star.
• The star schema is the simplest type of
Data Warehouse schema.
• It is also known as Star Join Schema and is
optimized for querying large data sets.
• The center of the star can have one fact
table and a number of associated
dimension tables.
http://www.durofy.com/wp-content/uploads/2013/07/star.png
8
Star Schema
Fact Table
https://www.oercommons.org/editor/images/6978
https://i.stack.imgur.com/aB9k9.jpg
9
Star Schema
Fact Table
• A Fact table in a Data Warehouse system is nothing but the table that contains
all the facts or the business information, which can be subjected to analysis
and reporting activities when required.
• These tables hold fields that represent the direct facts, as well as the foreign
fields that are used to connect the fact table with other dimension tables in
the Data Warehouse system.
• A Data Warehouse system can have one or more fact tables, depending on the
model type used to design the Data Warehouse.
10
Star Schema
Dimension Table
https://www.dotnetinterviewquestions.in/contentpics/star1.JPG
11
Star Schema
Dimension Table
• Dimension is a collection of reference information about a measurable in the

fact table.
• The primary key column of the dimension table has uniquely identifies each
dimension record or row.
• The dimension tables are organized has descriptive attributes. For example, a
customer dimension’s attributes could include first and last name, birth date,
gender, Qualification, Address etc.
12
Characteristics of Star Schema
• Every dimension in a star schema is represented with only one-dimension
table.
• The dimension table should contain the set of attributes.
• The dimension table is joined to the fact table using a foreign key
• The dimension table are not joined to each other
• Fact table would contain key and measure
• The Star schema is easy to understand and provides optimal disk usage.
• The dimension tables are not normalized.
• The schema is widely supported by BI Tools
13
Steps in Designing Star Schema
1. Identify the business process.
2. Identify the measures or fact
data.
3. Identify the Dimension table
related to fact table.
4. List of attributes or column in
each dimension table
5. Find the lowest level of
aggregation or summary
analysis in the fact table.
https://www.oercommons.org/editor/images/6929
14
Snowflake Schema
https://upload.wikimedia.org/wikipedia/commons/1/1d/Snowflake_schema.png https://static.javatpoint.com/tutorial/datawarehouse/images/star-schema-vs-snowflake-
schema2.png
15
Snowflake Schema
https://www.guru99.com/images/1/022218_0758_StarandSnow2.png
16
Snowflake Schema
• A snowflake schema is an extension of star schema where the dimension

tables are connected to one or more dimensions.
• The dimension tables are normalized which splits data into additional tables.
• Snowflake schema keeps same fact table structure as star schema.
• In the dimension, it has multiple levels with multiple hierarchies. From each
hierarchy of levels any one level can be attached to Fact Table.
17
When to Use Snowflake Schema
Ralph Kimball, the data warehousing guru, proposes three cases where snowflake
implementation is not only acceptable but is also the key to a successful design:
• Large customer dimensions where, for example, 80 percent of the fact table
measurements involve anonymous visitors about whom you collect little
detail, and 20 percent involve reliably registered customers about whom you
collect much detailed data by tracking many dimensions
18
When to Use Snowflake Schema
• Financial product dimensions for banks, brokerage houses, and insurance

companies, because each of the individual products has a host of special
attributes not shared by other products
• Multi-enterprise calendar dimensions because each organization has

idiosyncratic fiscal periods, seasons, and holidays
19
Characteristics of Snowflake Schema
• The main benefit of the snowflake schema is it uses smaller disk space.
• Easier to implement a dimension is added to the Schema.
• Due to multiple tables, query performance is reduced.
• The primary challenge that you will face while using the snowflake Schema
is that, you need to perform more maintenance efforts because of the more
lookup tables.
20
Galaxy Schema
http://wiki.bi-academy.eu/lib/exe/fetch.php?media=bicn01:modelling:en-img-0079.png
21
Galaxy Schema
• Multiple fact tables share
dimension tables.
• This schema is viewed as
collection of stars hence called
galaxy schema.
• It is also called Fact
Constellation Schema.
https://www.guru99.com/images/1/022218_0758_StarandSnow3.png
22
Characteristics of Galaxy Schema
• The dimensions in this schema are separated into separate dimensions based
on the various levels of hierarchy. For example, if geography has four levels of
hierarchy like region, country, state, and city then Galaxy schema should have
four dimensions.
• Moreover, it is possible to build this type of schema by splitting the one-star
schema into more Star schema.
• The dimensions are large in this schema which is needed to build based on the
levels of hierarchy.
• This schema is helpful for aggregating fact tables for better understanding.
23
References
• EDUCBA, 2020. Introduction to Data warehouse Schema. Retrieved from
https://www.educba.com/data-warehouse-schema/
• TechTerms, 2020. Schema. Retrieved from
https://techterms.com/definition/schema
• Guru99, 2020. Star and SnowFlake Schema in Data Warehouse. Retrieved from
https://www.guru99.com/star-snowflake-data-warehousing.html
• OER Commons – Open Educational Resources. Data warehouse Design Using
Oracle. Retrieved from https://www.oercommons.org/authoring/21861-data-
warehouse-design-using-oracle/3/view
• EDUCBA, 2020. Overview of Fact Table. Retrieved from
https://www.educba.com/what-is-fact-table/
24
References
• EDUCBA, 2020. Introduction to Data warehouse Schema. Retrieved from
https://www.educba.com/data-warehouse-schema/
• JavaTpoint, 2018. Difference between Star and Snowflake Schemas. Retrieved
from https://www.javatpoint.com/data-warehouse-star-schema-vs-snowflake-
schema
• Star and Snowflake Schema. Retrieved from
https://www.oracle.com/webfolder/technetwork/tutorials/obe/db/10g/r2/ow
b/owb10gr2_gs/owb/lesson3/starandsnowflake.htm
25
• Define OLAP
• Determine the types of OLAP
• Understand OLAP operations.
• List OLAP Tools
• Define ETL and understand its process in data warehouse.
4
Online Analytical Processing (OLAP)
• The OLAP (Online Analytical Processing) is a powerful

technology behind many Business Intelligence (BI) applications
that discovers data, report viewing capabilities, complex
analytical calculations, and predictive “what if” scenario, budget
planning, forecast planning.
• OLAP is for the data analytic purpose, hence it enables us to

analyze information from multiple database systems at the same
time.
5
Online Analytical Processing (OLAP)
• It is a computing method that allows users to easily extract

required data and query data in order to analyze it from different
points of view.
• It is basically based on the huge data that is called data

warehouse; it collects the required data from the data warehouse
and perform the business required analysis to take some decision
in the business to improve in profit, to improve sale, to improve
brand, to improve marketing and so all.
6
OLAP Architecture
https://image.slidesharecdn.com/1-150506060450-conversion-gate01/95/14-data-warehouse-13-638.jpg?cb=1430892370
7
How Does It Work?
• A Data warehouse would extract information from multiple data sources and
formats like text files, excel sheet, multimedia files, etc.
• The extracted data is cleaned and transformed. Data is loaded into an OLAP
server (or OLAP cube) where information is pre-calculated in advance for
further analysis.
8
OLAP Cube
• An OLAP Cube is a data structure that

allows fast analysis of data according
to the multiple Dimensions that
define a business problem. A
multidimensional cube for reporting
sales might be, for example,
composed of 7 Dimensions:
Salesperson, Sales Amount, Region,
Product, Region, Month, Year. https://gccontent.blob.core.windows.net/gccontent/blo
gs/legacy/c1/2014/11/OLAP_cube.png
9
Basic Analytical Operations of OLAP
• Roll-up – Also known

as drill-up or
consolidation, use to
summarize operation data
along with the dimension.
https://www.guru99.com/images/1/022218_1238_WhatisOLAPO2.png
10
Basic Analytical Operations of OLAP
• Drill-down – To perform
the analysis in deeper
among the dimensions of
data. For example, drilling
down from “time period”
to “years” and “months”
and to “days” and so on to
plot sales growth for a
product.
11
Basic Analytical Operations
of OLAP
• Slice – To perform the analysis

take one level of information
for display, such as “sales in
2019.”
12
of OLAP
• Dice – To perform the analysis,

select data from multiple
dimensions to analyze, such as
“sales of Laptop in Region 4 in
2019.”
13
of OLAP
• Pivot – To perform the

analysis that can gain a
new view of data by
rotating the data axes
of the cube.
14
Types of OLAP System
• Relational OLAP (ROLAP)

• Multidimensional OLAP (MOLAP)
• Hybrid OLAP (HOLAP)
• Desktop OLAP (DOLAP)
• Web OLAP (WOLAP)
• Mobile OLAP
• Spatial OLAP (SOLAP)
15
TYPE OF OLAP EXPLANATION
ROLAP utilize a relational database management system to keep
and control the data. These are the servers that exist
between the database and the user. ROLAP systems work
on the information that resides in a relational database.
MOLAP This server utilizes a multi-dimensional Database (MDDB)

for storing and analyzing information. MDDB can
proficiently store summaries, giving a method for quick
questioning and recovering information from the database
for processing.
16
HOLAP It is a blend of MOLAP and ROLAP. By utilizing both ROLAP
and MOLAP information stores, Hybrid OLAP offers the
qualities of both techniques. HOLAP stores data summaries
in the binary files or in the pre-calculated cubes. It leaves
the quantities of fact and dimension information in the
relational database.
DOLAP Desktop On-Line Analytical Processing (DOLAP) is a single-

tier, desktop-based OLAP technology.
17
WOLAP Web OLAP which is OLAP system accessible via the web
browser. WOLAP is a three-tiered architecture. It consists of
three components: client, middleware, and a database
server.
Mobile OLAP Mobile OLAP helps users to access and analyze OLAP data
using their mobile devices
SOLAP SOLAP is created to facilitate management of both spatial
and non-spatial data in a Geographic Information system
(GIS)
18
Examples of OLAP Tools
• IBM Cognos
• Micro Strategy
• Palo OLAP Server
• Apache Kylin
• icCube
19
Examples of OLAP Tools
• Pentaho BI
• Mondrian
• Oracle Business Intelligence
Enterprise Edition(OBIEE)
• JsHypercube
• Jedox
20
Advantages of OLAP Disadvantages of OLAP
• High Speed of Data Processing • High Cost

• Aggregated and Detailed Data • OLAP is Relational
• Multidimensional Data Representation • Computation Capability
• Using Familiar Business Expressions • Some Potential Risk
• “What-if” Scenarios
• Flat Learning Curve
21
22
What is ETL?
https://lh5.googleusercontent.com/Dg1ZqrEAp-nzxVEog5ZfI2lKD-
quLUttcbQoawUN1a5gyk4lL3DG7k6FSA9_rI_oi9aPGyZZDntlo5ErYoB4os4S07vlHP1Y4iWW1XCy3-UaFpaFcwibzR3GVTSEI50j81FXmmA
23
What is ETL?
• ETL stands for Extract, Transform and Load.
• It is a process in data warehousing used to extract data from the

database or source systems and after transforming placing the
data into data warehouse. It is a combination of three database
functions i.e. Extract, Transform and Load.
24
ETL Process
http://busitelce.com/images/14articles/etl-process/etl-process-in-data-warehouse.jpg
25
ETL Process
• Step 1 - Extraction
All the preferred data from various source systems such as databases,
applications, and flat files is identified and extracted. Data extraction can be
completed by running jobs during non-business hours.
26
ETL Process
• Step 1 - Extraction
Data Extraction Strategies

Full Extraction: This is followed when whole data from sources get loaded into the data
warehouses that show either data warehouse is being populated the first time or no strategy
has been made for data extraction.
Partial Extraction (with update notification): This strategy is also known delta, where only the
data being changed is extracted and update data warehouses
Partial Extraction (without update notification): This strategy refers to extract specific required
data from sources according to load in the data warehouses instead of extracting whole data.
27
ETL Process
• Step 2 - Transformation
Most of the extracted data can’t be directly loaded into the target system.
Based on the business rules, some transformations can be done before
loading the data.
The transformation process also corrects the data, removes any incorrect
data and fixes any errors in the data before loading it.
28
ETL Process
• Step 3 - Loading
All the gathered information is loaded into the target Data Warehouse
tables.
Types of Loading:
Initial Load — populating all the Data Warehouse tables
Incremental Load — applying ongoing changes as when needed periodically.
Full Refresh —erasing the contents of one or more tables and reloading with
fresh data.
29
Example of ETL Tools
RightData
Informatica Data Validation SSISTester
QuerySurge TestBench
ICEDQ GTL QAceGen
Datagaps ETL Validator Zuzena Automated Testing Service
QualiDI DbFit
Talend Open Studio for Data Integration AnyDbTest
Codoid's ETL Testing Services 99 Percentage ETL Testing
Data Centric Testing
30
References
• MOOCs : edX Courses. StanfordOnline: SOE.YDB-OLAP0001
Databases: On-Line Analytical Processing
• EDUCBA, 2020. What is OLAP?. Retrieved from
https://www.educba.com/what-is-olap/
• EDUCBA, 2020. Types of OLAP. Retrieved from
https://www.educba.com/types-of-olap/
• Guru99, 2020. What is OLAP? Cube, Operations & Types in Data Warehouse.
Retrieved from https://www.guru99.com/online-analytical-processing.html#2
• OLAP.com. OLAP Cube. Retrieved from https://olap.com/learn-bi-olap/olap-bi-
definitions/olap-cube/
• OLAP.com. Types of OLAP System. Retrieved from https://olap.com/types-of-
olap-systems/
31
References
• Tutorialspoint Simply Easy Learning, 2020. Data Warehousing – OLAP.
Retrieved from https://www.tutorialspoint.com/dwh/dwh_olap.htm
• Software Testing Help, 2020. Top 10 Best Analytical Processing (OLAP) Tools:
Business Intelligence. Retrieved from
https://www.softwaretestinghelp.com/best-olap-tools/
• EDUCBA, 2020. OLAP Tools. Retrieved from https://www.educba.com/olap-
tools/
• Galaktikasoft, 2020. Advantages Of Using OLAP for Business Intelligence.
Retrieved from https://galaktika-soft.com/blog/advantages-of-using-olap-for-
business-intelligence.html
32
References
• EDUCBA, 2020. What is ETL?. Retrieved from https://www.educba.com/what-
is-etl/
• Software Testing Help, 16-Apr-2020. What Is ETL (Extract, Transform, Load)
Process In Data Warehouse?. Retrieved from
https://www.softwaretestinghelp.com/etl-process-in-data-warehouse/
• Software Testing Help, 16-Apr-2020. Top 10 ETL Testing Tools In 2020.
Retrieved from https://www.softwaretestinghelp.com/top-4-etl-testing-tools/
33
• Understand Data Mining Concepts

• Understand the uses of data mining
• Enumerate and explain the different techniques used in data mining
• Describe supervised and unsupervised learning
• Enumerate the different data mining algorithms
• Understand the steps of Crisp DM
• Understand the concept of Define KDD
4
What is Data Mining?
https://miro.medium.com/max/2800/0*2883vRd4rpXWUIz_.jpg
It is basically the extraction of vital
information/knowledge from a large set of data.
Fundamentally, data mining is about processing

data and identifying patterns and trends in that
information so that you can decide or judge.
Data mining is also called as Knowledge

discovery, Knowledge extraction, data/pattern
analysis, information harvesting, etc.
https://www.helponclick.com/blog/data-
mining-applications-improved-customer-
support/
Its foundation comprises three scientific
disciplines:
statistics (the numeric study of data
relationships)
artificial intelligence (human-like
intelligence displayed by software
https://www.sas.com/en_us/insights/analyti
and/or machines) cs/data-mining.html
machine learning (algorithms that can

learn from data to make predictions).
Data Mining Examples
Example 1:
So there is a Mobile network operator. They consult a Data miner to dig into the call
records of the operator. No specific targets are given to the Data Miner.
A quantitative target of finding at least 2 new patterns in a month is given.
As the data miner starts digging into the data he finds a pattern that there are less
international calls on Wednesday as compared to other days.
This information is shared with the management and they come up with the plan to
reduce the international call rates on Wednesdays and start a campaign.
Call rates surge, customers are happy with low call price, more customers sign up and the
company makes more money! Win-Win situation!
Example 2 :
Consider a marketing head of telecom service provides who wants to increase revenues
of long distance services. For high ROI on his sales and marketing efforts customer
profiling is important. He has a vast data pool of customer information like age, gender,
income, credit history, etc. But its impossible to determine characteristics of people who
prefer long distance calls with manual analysis. Using data mining techniques, he may
uncover patterns between high long distance call users and their characteristics.
For example, he might learn that his best customers are married females between the
age of 45 and 54 who make more than $80,000 per year. Marketing efforts can be
targeted to such demographic.
Example 3:
A bank wants to search new ways to increase revenues from its credit card
operations. They want to check whether usage would double if fees were halved.
Bank has multiple years of record on average credit card balances, payment
amounts, credit limit usage, and other key parameters. They create a model to
check the impact of the proposed new business policy. The data results show that
cutting fees in half for a targetted customer base could increase revenues by $10
million.
Data Mining Techniques
Cluster Analysis. Enables to

identify a given user group
according to common features in
a database. These features could
include age, geographic location,
education level and so on.
Anomaly Detection.
It is used to determine when
something is noticeably different
from the regular pattern. It is
used to eliminate any database
inconsistencies or anomalies at
the source.
Regression Analysis.
This technique is used to make
predictions based on
relationships within the data set.
Classification.
This deals with the things which have
labels on it. Note in cluster detection,
the things did not have a label in it and
by using data mining we had to label
and form into clusters, but in
classification, there is information
existing that can be easily classified
using an algorithm.
Associative Learning.
It is used to analyze which things
tend to occur together either in
pairs or larger groups.
Example of Data Mining Tools
• Rapid Miner
• Orange
• Weka
• Knime
• Sisense
• SQL Server Data
Tools (SSDT)
• Apache Mahout
• Oracle Data Mining
Example of Data Mining Tools
• Rattle
• DataMelt
• IBM Cognos
• IBM SPSS Modeler
• SAS Data Mining
• Teradata
• Board
• Dundas BI
Benefits of Data Mining
• Data mining technique helps companies to get knowledge-based
information.
• Data mining helps organizations to make the profitable adjustments in
operation and production.
• The data mining is a cost-effective and efficient solution compared to
other statistical data applications.
• Data mining helps with the decision-making process.
• Facilitates automated prediction of trends and behaviors as well as
automated discovery of hidden patterns.
• It can be implemented in new systems as well as existing platforms
• It is the speedy process which makes it easy for the users to analyze
huge amount of data in less time.
Disadvantages of Data Mining
• There are chances of companies may sell useful information of their
customers to other companies for money.
• Many data mining analytics software is difficult to operate and requires
advance training to work on.
• Different data mining tools work in different manners due to different
algorithms employed in their design. Therefore, the selection of correct
data mining tool is a very difficult task.
• The data mining techniques are not accurate, and so it can cause serious
consequences in certain conditions.
Data Mining Applications
• Communications
• Insurance
• Education
• Manufacturing
• Banking
• Retail
• Service Providers
• E-Commerce
• Super Markets
• Crime Investigation
• Bioinformatics
20
Applications Usage
Communications Data mining techniques are used in communication sector to predict
customer behavior to offer highly targeted and relevant campaigns.
Insurance Data mining helps insurance companies to price their products profitable
and promote new offers to their new or existing customers.
Education Data mining benefits educators to access student data, predict
achievement levels and find students or groups of students which need
extra attention. For example, students who are weak in maths subject.
Manufacturing With the help of Data Mining Manufacturers can predict wear and tear of
production assets. They can anticipate maintenance which helps them
reduce them to minimize downtime.
Applications Usage
Banking Data mining helps finance sector to get a view of market risks and manage
regulatory compliance. It helps banks to identify probable defaulters to
decide whether to issue credit cards, loans, etc.
Retail Data Mining techniques help retail malls and grocery stores identify and
arrange most sellable items in the most attentive positions. It helps store
owners to comes up with the offer which encourages customers to
increase their spending.
Service Providers Service providers like mobile phone and utility industries use Data Mining
to predict the reasons when a customer leaves their company. They
analyze billing details, customer service interactions, complaints made to
the company to assign each customer a probability score and offers
incentives.
Applications Usage
E-Commerce E-commerce websites use Data Mining to offer cross-sells and up-
sells through their websites. One of the most famous names is
Amazon, who use Data mining techniques to get more customers
into their eCommerce store.
Super Markets Data Mining allows supermarket's develope rules to predict if their
shoppers were likely to be expecting. By evaluating their buying
pattern, they could find woman customers who are most likely
pregnant. They can start targeting products like baby powder, baby
shop, diapers and so on.
Crime Investigation Data Mining helps crime investigation agencies to deploy police
workforce (where is a crime most likely to happen and when?), who
to search at a border crossing etc.
Bioinformatics Data Mining helps to mine biological data from massive datasets
gathered in biology and medicine.
Predictive vs. Descriptive Data Mining Algorithm
Predictive Descriptive
https://www.researchgate.net/figure/Main-data-mining-techniques_fig2_270552309
Predictive Data Mining vs. Descriptive Data Mining
The descriptive analysis is used to mine data and provide the latest
information on past or recent events.
The predictive analysis provides answers of the future queries that move
across using historical data as the chief principle for decisions.
25
Key Data Mining Algorithms
Key Data Mining Algorithms
Supervised learning requires a known output, sometimes called a label or

target. These algorithms include Naïve Bayes, Decision Tree, Neural
Networks, SVMs, Logistic Regression, etc.
Unsupervised learning algorithms do not require a predefined set of outputs

but rather look for patterns or trends without any label or target. They
include k-Means Clustering, Anomaly Detection, and Association Mining.
27
What is Data Mining Algorithm?
• Data Mining Algorithms are a special category of algorithms that are useful
for analyzing data and developing data models to identify meaningful
patterns. These are part of machine learning algorithms.
• These algorithms are implemented through various programming like R
language, Python and using data mining tools to derive the optimized data
models.
• These algorithms are part of data analytics implementation for business.
• These algorithms are based upon statistical and mathematical formulas
which applied to the data set.
Data Mining Algorithms
• C4.5
• k-means
• Support Vector Machines (SVM)
• Apriori
• Expectation-Maximization (EM)
• PageRank
• AdaBoost
• kNN (k-Nearest Neighbors)
• Naïve Bayes
• Classification and Regression Trees (CART)
C4.5
C4.5 constructs a classifier in the form of a decision tree. In order to do this, C4.5 is
given a set of data representing things that are already classified.
k-means
k-means creates k groups from a set of objects so that the members of a group are
more similar. It’s a popular cluster analysis technique for exploring a dataset.
SVM
Support vector machine (SVM) learns a hyperplane to classify data into 2 classes. At
a high-level, SVM performs a similar task like C4.5 except SVM doesn’t use decision
trees at all.
Apriori
The Apriori algorithm learns association rules and is applied to a database
containing a large number of transactions.
EM
In data mining, expectation-maximization (EM) is generally used as a clustering
algorithm (like k-means) for knowledge discovery.
PageRank
PageRank is a link analysis algorithm designed to determine the relative
importance of some object linked within a network of objects.
AdaBoost
AdaBoost is a boosting algorithm which constructs a classifier. As you
probably remember, a classifier takes a bunch of data and attempts to predict
or classify which class a new data element belongs to.
kNN
kNN, or k-Nearest Neighbors, is a classification algorithm. However, it differs
from the classifiers previously described because it’s a lazy learner.
Naïve Bayes
Naive Bayes is not a single algorithm, but a family of classification
algorithms that share one common assumption: Every feature of the data
being classified is independent of all other features given the class.
CART
CART stands for classification and regression trees. It is a decision tree
learning technique that outputs either classification or regression trees.
Like C4.5, CART is a classifier.
CRISP-DM
• CRISP-DM stands for Cross Industry Standard Process for Data

Mining.
• Is a 1996 methodology created to shape Data Mining projects. It

consists of 6 steps to conceive a Data Mining project and they
can have cycle iterations according to developers’ needs.
34
Phases of CRISP-DM
https://cdn.educba.com/academy/wp-content/uploads/2019/02/steps-involved.png
Phases of CRISP-DM
1. Business Understanding
Focuses on understanding the project objectives and requirements from a

business perspective, and then converting this knowledge into a data mining
problem definition and a preliminary plan.
2. Data Understanding
Starts with an initial data collection and proceeds with activities in order to get
familiar with the data, to identify data quality problems, to discover first
insights into the data, or to detect interesting subsets to form hypotheses for
hidden information.
36
Phases of CRISP-DM
3. Data Preparation
The data preparation phase covers all activities to construct the final dataset
from the initial raw data.
4. Modeling
Modeling techniques are selected and applied. Since some techniques like
neural nets have specific requirements regarding the form of the data, there
can be a loop back here to data prep.
37
Phases of CRISP-DM
5. Evaluation
Once one or more models have been built that appear to have high quality based on
whichever loss functions have been selected, these need to be tested to ensure they
generalize against unseen data and that all key business issues have been sufficiently
considered. The end result is the selection of the champion model(s).
6. Deployment
Consists of presenting the results in a useful and understandable manner, and by

achieving this, the project should achieve its goals. It is the only step not belonging to
a cycle.
38
Knowledge Discovery in Databases (KDD)
• Some people don’t differentiate data mining from knowledge discovery while
others view data mining as an essential step in the process of knowledge
discovery.
• The term knowledge discovery in databases, or KDD for short, refers to the
broad process of finding knowledge in data, and emphasizes the high level
application of particular data minded methods. It is of interest to researchers
in machine learning, pattern recognition, databases, statistics, artificial
intelligence, knowledge acquisition for expert systems, and data visualization.
• The ultimate goal is to extract high-level knowledge from low-level data.
39
9-step KDD Process Starting from Managerial Step
Step 1. Developing and understanding of the application domain.

Step 2 Selecting and creating a data set on which discovery will be performed,
based on goals.
Step 3. Pre-processing and cleansing. Data reliability is enhanced in this stage.
Step 4. Data Transformation.
Step 5. Choosing the appropriate data mining task.
40
Step 6. Choosing the data mining algorithm.

Step 7. Next is employing the data mining algorithm.
Step 8. Evaluation.
Step 9. Using the discovered knowledge.
41
(Explanation)
STEP1.Developing and understanding of the application domain. This is the

preparatory step that sets the scene for understanding what should be done with
transformation, algorithms, and representation. Those in charge of the KDD
project need to understand and define the goals of the end user, then where the
knowledge discovery process will take place, and other relevant prior knowledge.
42
(Explanation)
Step 2
Selecting and creating a data set on which discovery will be performed, based on
goals. Determine what data will be used for the knowledge discovery, such as:
what data is available, obtaining additional necessary data, and the integrating all
the data for the knowledge discovery into one data set, including the attributes
that will be considered for the process. This process is very important because the
data mining learns and discovers from the available data. This is the evidence base
for constructing the models. If some important attributes are missing, then the
entire study may fail. From this respect, the more attributes considered, the
better.
43
(Explanation)
Step 3
Pre-processing and cleansing. Data reliability is enhanced in this stage. It includes
data clearing, such as handling missing values, and removing of outliers.
It may involve complex statistical methods, or using a data mining algorithm in this
context. For example: If one suspects that a certain attribute is of insufficient
reliability, or has many missing data, then this attribute could become the goal of a
data mining supervised algorithm. A prediction model for this attribute will be
developed, and then missing data can be predicted.
44
(Explanation)
Step 4
Next is data transformation. In this stage, the generation of better data, for the
data mining is prepared and developed. Methods here include dimension
reduction, such as feature selection, and extraction, and record sampling, and
attribute transformation such as discretization of numerical attributes and
functional transformation.
45
(Explanation)
Step 5
Choosing the appropriate data mining task. We’re now ready to decide which type
of data mining to use. For example: classification, regression, or clustering. This
mostly depends on the KDD goals, and also on the previous steps. There are two
major goals in data mining: prediction and description. Prediction is often referred
to as supervised data mining, while descriptive data mining includes the
unsupervised, and visualization aspects of data mining.
46
(Explanation)
Step 6
Choosing the data mining algorithm. Now that you have the strategy, we can
decide which tactics to use. This stage includes selecting the specific method for
searching patterns, including multiple inducers. For example, when considering
precision versus understandability, the former is better with neural networks,
while the latter is better with decision trees.
47
(Explanation)
Step 7
Next is employing the data mining algorithm. Finally you can implement the
appropriate data mining algorithm. In this step we might need to employ the
algorithm several times until a satisfying the result is obtained. For instance, by
tuning the algorithms control parameters, such as the minimum number of
instances in a single leaf of a decision tree.
48
(Explanation)
Step 8
evaluation. In this stage we evaluate and interpret the mined patterns with respect
to the goals defined in the first step. Here we consider the pre-processing steps
with respect to their effect on the data mining algorithm results. This step focuses
on the comprehensible nature and usefulness of the induced model. In this step,
the discovered knowledge is also documented for further usage.
49
(Explanation)
Step 9
Using the discovered knowledge. We’re now ready to incorporate the knowledge
into another system for further action. The knowledge becomes active in the sense
that we may make changes to the system, and measure the effects. Actually, the
success of this step determines the effectiveness of the entire KDD process.
50
References
• EDUCBA, 2020. Data Mining Algorithms. Retrieved from
https://www.educba.com/data-mining-algorithms/
• Ray Li, 2015. Top 10 Data Mining Algorithms. Hacker Bits. Retrieved from
https://hackerbits.com/data/top-10-data-mining-algorithms-in-plain-english/
• TechDifferences, 25-Mar-2019. Difference Between Descriptive and Predictive
Data Mining. Retrieved from https://techdifferences.com/difference-between-
descriptive-and-predictive-data-mining.html
• Guru99, 2020. Data Mining Tutorial: Process, Techniques, Tools, Examples.
Retrieved from https://www.guru99.com/data-mining-tutorial.html
• SAS, 2020. Data Mining What it is and why matters. SAS Institute Inc..
Retrieved from https://www.sas.com/en_us/insights/analytics/data-
mining.html
51
References
• JavaTpoint, 2018. Data Mining Tutorial. Retrieved from
https://www.javatpoint.com/data-mining
• EDUCBA, 2020. Introduction to Data Mining. Retrieved from
https://www.educba.com/introduction-to-data-mining/
• Guru99, 2020. Supervised vs Unsupervised Learning: Key Differences.
Retrieved from https://www.guru99.com/supervised-vs-unsupervised-
learning.html
• Software Testing Help, 16-Apr-2020. Top 15 Best Free Data Mining Tools: The
Most Comprehensive List. Retrieved from
https://www.softwaretestinghelp.com/data-mining-tools/
52
References
• RapidMiner, 2020. Data Mining Tools. Retrieved from
https://rapidminer.com/glossary/data-mining-tools/
• Vorhies, W., 26-Jul-2016. CRISP-DM – a Standard Methodology to Ensure a
Good Outcome. Data Science Central. Retrieved from
https://www.datasciencecentral.com/profiles/blogs/crisp-dm-a-standard-
methodology-to-ensure-a-good-outcome
• Rodrigues, I.. CRISP-DM methodology leader in data mining and big data.
Towards Data Science. Retrieved from https://towardsdatascience.com/crisp-
dm-methodology-leader-in-data-mining-and-big-data-467efd3d3781
• Tutorialspoint, 2020. Data Mining - Knowledge Discovery. Retrieved from
https://www.tutorialspoint.com/data_mining/dm_knowledge_discovery.htm
53
References
• Marsh, J., 2014. Data Science Knowledge Discovery in Databases: 9 Steps to
Success. Udemy Blog. Retrieved from https://blog.udemy.com/knowledge-
discovery-in-databases/
• Tech Differences, 25-Mar-2019. Difference Between Descriptive and Predictive
Data Mining. Retrieved from https://techdifferences.com/difference-between-
descriptive-and-predictive-data-mining.html
• Han, J., Pei, J., Kamber, M.. Data Mining: Concepts and Techniques Third
Edition
54

DB m8 9 10 11 PDF

Uploaded by

Copyright:

Available Formats

DB m8 9 10 11 PDF

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

DB m8 9 10 11 PDF

Uploaded by

Copyright:

Available Formats

INTENDED LEARNING OUTCOME

• Define Data warehouse.

• A type of data management system that is

a wide range of sources such as application log

• A data warehouse is programmed to aggregate structured data over a period of

• Load manager is also called the front component.

• Warehouse manager performs operations associated with the

• Query manager is also known as backend component.

4. End-user Access Tools

• This is categorized into five different groups like:

• Identify the types of Data warehouse

There are three types of data warehouse:

1. Enterprise Data Warehouse

• The databases where raw data is stored.

2. Extract, Transform, Load (ETL) or Extract, Load, Transform (ELT) layer.

• The data is finally loaded into the storage space.

• These are tools that give end users access to data..

• In ODS, Data warehouse is refreshed in real time. Hence, it is widely

• A data mart is a scaled-down version of a data warehouse aimed at

Designing Constructing Populating Accessing Managing

• Describe star schema

• A schema is a logical description that describes the entire

• Dimension is a collection of reference information about a measurable in the

• A snowflake schema is an extension of star schema where the dimension

• Financial product dimensions for banks, brokerage houses, and insurance

• Multi-enterprise calendar dimensions because each organization has

• The OLAP (Online Analytical Processing) is a powerful

• OLAP is for the data analytic purpose, hence it enables us to

• It is a computing method that allows users to easily extract

• It is basically based on the huge data that is called data

• An OLAP Cube is a data structure that

• Roll-up – Also known

• Slice – To perform the analysis

• Dice – To perform the analysis,

• Pivot – To perform the

• Relational OLAP (ROLAP)

MOLAP This server utilizes a multi-dimensional Database (MDDB)

DOLAP Desktop On-Line Analytical Processing (DOLAP) is a single-

• Palo OLAP Server

• High Speed of Data Processing • High Cost

• ETL stands for Extract, Transform and Load.

• It is a process in data warehousing used to extract data from the

Data Extraction Strategies

• Understand Data Mining Concepts

Fundamentally, data mining is about processing

Data mining is also called as Knowledge

machine learning (algorithms that can

Cluster Analysis. Enables to

Supervised learning requires a known output, sometimes called a label or

Unsupervised learning algorithms do not require a predefined set of outputs

• CRISP-DM stands for Cross Industry Standard Process for Data

• Is a 1996 methodology created to shape Data Mining projects. It

Focuses on understanding the project objectives and requirements from a

Consists of presenting the results in a useful and understandable manner, and by

• The ultimate goal is to extract high-level knowledge from low-level data.

Step 1. Developing and understanding of the application domain.

Step 6. Choosing the data mining algorithm.

STEP1.Developing and understanding of the application domain. This is the

You might also like