DB m8 9 10 11 PDF

Download as pdf or txt
Download as pdf or txt
You are on page 1of 170

INTENDED LEARNING OUTCOME

• Define Data warehouse.


• Understand Data warehouse concept.
• List the advantages and disadvantages of data warehouse
• List the usage of data warehouse

4
What is a Data Warehousing?
• An electronic storage of a large amount of
information by a business or organization.

• A type of data management system that is


designed to enable and support business
intelligence (BI) activities, especially analytics.
Data warehouses are solely intended to
perform queries and analysis and often contain
large amounts of historical data. The data
https://www.betterbuys.com/wp-
within a data warehouse is usually derived from content/uploads/2015/01/Data-Warehouse-Visual21.png

a wide range of sources such as application log


files and transaction applications.

5
Data Warehousing vs. Databases
• A database is a transactional system that is set to monitor and update real-time
data in order to have only the most recent data available.

• A data warehouse is programmed to aggregate structured data over a period of


time.

https://blog.panoply.io/hs-fs/hubfs/db_vs_dw.jpg?width=694&name=db_vs_dw.jpg

6
Alternative Names for Data Warehouse System

https://www.guru99.com/images/1/data_warehousing.png
7
History of Data Warehouse
• 1960- Dartmouth and General Mills in a joint research project, develop the terms
dimensions and facts.
• 1970- A Nielsen and IRI introduces dimensional data marts for retail sales.
• 1983- Tera Data Corporation introduces a database management system which is
specifically designed for decision support
• Data warehousing started in the late 1980s when IBM worker Paul Murphy and
Barry Devlin developed the Business Data Warehouse.
• However, the real concept was given by Inmon Bill. He was considered as a father of
data warehouse. He had written about a variety of topics for building, usage, and
maintenance of the warehouse & the Corporate Information Factory.

8
Benefits of Data Warehouse

https://cdn.educba.com/academy/wp-content/uploads/2019/10/Benefits-of-Data-Warehouse.png

9
Data Warehouse Architecture

• Simple. All data warehouses share a basic design in which metadata, summary data,
and raw data are stored within the central repository of the warehouse. The
repository is fed by data sources on one end and accessed by end users for analysis,
reporting, and mining on the other end.

• Simple with a staging area. Operational data must be cleaned and processed before
being put in the warehouse. Although this can be done programmatically, many data
warehouses add a staging area for data before it enters the warehouse, to simplify
data preparation.

10
Data Warehouse Architecture
• Hub and spoke. Adding data marts between the central repository and end users
allows an organization to customize its data warehouse to serve various lines of
business. When the data is ready for use, it is moved to the appropriate data mart.

• Sandboxes. Sandboxes are private, secure, safe areas that allow companies to quickly
and informally explore new datasets or ways of analyzing data without having to
conform to or comply with the formal rules and protocol of the data warehouse.

11
Data Warehouse Architecture
The Data Warehouse Architecture
generally comprises of three tiers.

https://cdn.educba.com/academy/wp
-content/uploads/2019/05/OLAP-
Server.png

12
Data Warehouse Architecture
Top Tier
The Top Tier consists of the Client-side front end of the architecture.
The Transformed and Logic applied information stored in the Data Warehouse will be
used and acquired for Business purposes in this Tier.

Middle Tier
The Middle Tier consists of the OLAP Servers
OLAP is Online Analytical Processing Server
OLAP is used to provide information to business analysts and managers

Bottom Tier
The Bottom Tier mainly consists of the Data Sources, ETL Tool, and Data Warehouse.

13
Components of Data Warehouse
Query
Warehouse Manager
Manager
Load Manager

End-user
Access Tools

https://www.teksystems.com/-/media/teksystems/images/blogs/content%20images/conventional-data-warehouse-
architecture.png?h=327&w=601&hash=7FBF0193539FC596BBDA0A36A2373CB30D802006&hash=7FBF0193539FC596BBDA0A
36A2373CB30D802006&la=en

14
Components of Data Warehouse

1. Load Manager

• Load manager is also called the front component.


• It performs with all the operations associated with the extraction and
load of data into the warehouse.
• These operations include transformations to prepare the data for
entering the Data warehouse.

15
Components of Data Warehouse

2. Warehouse Manager

• Warehouse manager performs operations associated with the


management of the data in the warehouse.
• It performs operations like analysis of data to ensure consistency,
creation of indexes and views, generation of denormalization and
aggregations, transformation and merging of source data and
archiving and baking-up data.

16
Components of Data Warehouse

3. Query Manager

• Query manager is also known as backend component.


• It performs all the operation operations related to the management of
user queries.
• The operations of this Data warehouse components are direct queries
to the appropriate tables for scheduling the execution of queries.

17
Components of Data Warehouse

4. End-user Access Tools

• This is categorized into five different groups like:


o Data Reporting
o Query Tools
o Application development tools
o EIS tools
o OLAP tools and data mining tools.

18
Designing a Data Warehouse
Any data warehouse design must address the following:
• Specific data content
• Relationships within and between groups of data
• The systems environment that will support the data warehouse
• The types of data transformations required
• Data refresh frequency

19
Applications of Data Warehouse
1. Airline
2. Banking
3. Healthcare
4. Public Sector
5. Investment and Insurance Sector
6. Retail sectors
7. Telecommunication
8. Hospitality Industry
9. Financial services
10. Consumer goods
11. Controlled manufacturing

20
Advantages of Data Warehousing
1. Cleans data 8. Timely access to data
2. Indexes multiple types 9. Enhanced data consistency and quality
3. Secured data and its access 10. Return on Investment is high
4. Query process 11. Increase revenues
5. Enhanced business intelligence 12. Standardizes data across the organization
6. Increased system and query 13. Database Normalization
performance
7. Business Intelligence

21
Disadvantages of Data Warehousing
1. Raising ownership
2. Extra reporting
3. Data flexibility
4. Compatibility with the existing
system
5. Keeping data online
6. Dimensional technique
7. Costs

22
Example of Data Warehouse Tools:
20 Best ETL / Data Warehousing Tools in 2020
1. Cdata Sync
2. QuerySurge
3. Xplenty
4. CloverDX
5. Oracle
6. Amazon RedShift
7. Domo
8. Teradata
9. SAP
10. SAS

23
Example of Data Warehouse Tools:
20 Best ETL / Data Warehousing Tools in 2020
11. IBM DataStage
12. Informatica
13. MS SSIS
14. Talend Open Studio
15. The Ab Initio Software
16. Dundas
17. Sisense
18. TabLeau
19. MicroStrategy
20. Pentaho

24
References:
• Frankenfiled, J., 20-Aug-2019. Data Warehousing. Retrieved from Investopedia
https://www.investopedia.com/terms/d/data-warehousing.asp
• Oracle, 2020. What is Data Management?. Retrieved from
https://www.oracle.com/database/what-is-data-management/
• Oracle, 2020. What is a Data Warehouse?. Retrieved from
https://www.oracle.com/database/what-is-a-data-warehouse/
• Guru99, 2020. What is Data Warehouse? Types, Definition & Example. Retrieved
from https://www.guru99.com/data-warehousing.html#2.
• EDUCBA, 2020. Data Warehouse Architecture. Retrieved from
https://www.educba.com/data-warehouse-architecture/

25
References:
• Reddy, C.. Data Warehousing: Characteristics, Functions, Pros & Cons. Wisestep.
Retrieved from https://content.wisestep.com/data-warehousing-characteristics-
functions-pros-cons/
• Guru99, 2020. 20 Best ETL / Data Warehousing Tools in 2020. Retrieved from
https://www.guru99.com/top-20-etl-database-warehousing-tools.html

26
INTENDED LEARNING OUTCOME

• Identify the types of Data warehouse


• Understand the concept of Enterprise Data Warehouse
• Understand the concept of Data Mart
• Understand the concept of Operational Data Warehouse
• Differentiate between data warehouse, data mart, enterprise data
warehouse, and operational data warehouse

30
Types of Data Warehouse

There are three types of data warehouse:

1. Enterprise Data Warehouse


2. Operational Data Store
3. Data Mart

31
Enterprise Data Warehouse (EDW)
• Enterprise Data Warehouse is a centralized warehouse.
• It provides decision support service across the enterprise.
• It offers a unified approach for organizing and representing data.
• It also provide the ability to classify data according to the subject and give
access according to those divisions.

32
Enterprise Data Warehouse Architecture
One-tier
Architecture

https://content.altexsoft.com/media/2019/10/word-image-19.png

33
Enterprise Data Warehouse Architecture
Two-tier
Architecture

https://content.altexsoft.com/media/2019/10/word-image-20.png

34
Enterprise Data Warehouse Architecture
Three-tier
Architecture

https://content.altexsoft.com/media/2019/10/word-image-21.png

35
Enterprise Data Warehouse Architecture
One-tier architecture for EDW means that you have a database directly connected
with the analytical interfaces where the end user can make queries.

Two-tier architecture. A data mart level is added between the user interface and
EDW. A data mart is a low-level repository that contains domain-specific
information. Simply put, it’s another, smaller-sized database that extends EDW
with dedicated information for your sales/operational departments, marketing, etc.

Three-tier architecture. On top of the data mart layer, enterprises also use online
analytical processing (OLAP) cubes. An OLAP cube is a specific type of database that
represents data from multiple dimensions.

36
Enterprise Data Warehouse Components

https://www.teksystems.com/-/media/teksystems/images/blogs/content%20images/conventional-data-warehouse-
architecture.png?h=327&w=601&hash=7FBF0193539FC596BBDA0A36A2373CB30D802006&hash=7FBF0193539FC596BBDA0A
36A2373CB30D802006&la=en

37
Enterprise Data Warehouse Components
1. Sources

• The databases where raw data is stored.

2. Extract, Transform, Load (ETL) or Extract, Load, Transform (ELT) layer.

• These are the tools that perform actual connection with source data, its
extraction, and loading to the place where it will be transformed.
Transformation unifies data format.
• ETL and ELT approaches differ in that in ETL the transformation is done
before EDW, in a staging area. ELT is a more modern approach that
handles all the transformation in a warehouse.

38
Enterprise Data Warehouse Components
3. Staging Area

• In the case of ETL, the staging area is the place data is loaded before
EDW. Here, it will be cleaned and transformed to a given data model.

4. DW Database

• The data is finally loaded into the storage space.

5. Reporting Layer

• These are tools that give end users access to data..

39
Operational Data Store

https://sqlwithmanoj.files.wordpress.com/2014/12/ods_dw.jpg

40
Operational Data Store
• ODS support only daily operations, so their view of historical data is very
limited. Although they work very well as sources of current data and are
often used as such by data warehouses, they do not support historically rich
queries.

• In ODS, Data warehouse is refreshed in real time. Hence, it is widely


preferred for routine activities like storing records of the Employees.

41
Operational Data Store

https://static.javatpoint.com/tutorial/datawarehouse/images/operational-database-vs-data-warehouse.png

42
Operational Database vs. Data Warehouse
• The Operational Database is the source • Data Warehouse Systems serve users or
of information for the data warehouse. knowledge workers in the purpose of
It includes detailed information used to data analysis and decision-making. Such
run the day to day operations of the systems can organize and present
business. The data frequently changes information in specific formats to
as updates are made and reflect the accommodate the diverse needs of
current value of the last transactions. various users. These systems are called
as Online-Analytical Processing (OLAP)
• Operational Database Management Systems.
Systems also called as OLTP (Online
Transactions Processing Databases),
are used to manage dynamic data in
real-time.

43
Advantages of Operational Data Stores
• An ODS provides access to only the current, finely crumbled, non-aggregated,
less complicated data, which can be queried in a well-fitted approach without
using the operational systems.

• When the Reporting & Analysis tools need data that are closer to real-time
operations, it can be queried from the Operational Data Store, as and when it is
received from the respective source systems, instead of opting for more
prolonging conversion and loading operations from the other database sources
in the data warehouse systems.

• ODS is a secure option as it does not contain all the historical data and
operations, which makes it resilient for any cyber-attacks or hacking of data
privacy.
44
Advantages of Operational Data Stores
• An Operational Data Store is a practically feasible structural design option
when bearing in mind the complex requirements provided by the business, in
order to generate input for analysis and reporting processes that leads to
business decision-making.

• ODS facilitates considerably less total time for turnaround when in trouble, like
the environment failure or when a database needs a restart, which implicates
less stress on the business or the stack holders of the application.

• In order to fetch data from an ODS, querying need not be complex or with
multi-level joins and conditional Simple queries will be sufficient enough as it
holds moderately detailed operational data.

45
Data Mart

http://1.bp.blogspot.com/-pUOXIPxlxps/VLHslazE74I/AAAAAAAAIXM/Rj3oUxpe_dY/s1600/DWH_P002_004.jpg
Data Mart
• A data mart is a subset of the data warehouse. It is specially designed
for a particular line of business, such as sales, finance, sales or finance.
In an independent data mart, data can collect directly from sources.

• A data mart is a scaled-down version of a data warehouse aimed at


meeting the information needs of a homogeneous small group of end
users such as a department or business unit (marketing, finance,
logistics, or human resources). It typically contains some form of
aggregated data and is used as the primary source for report
generation and analysis by this end user group.

47
Types of Data Mart
1. Dependent: A dependent data mart
allows sourcing organization's data from a
single Data Warehouse.

https://www.guru99.com/images/1
/022218_0604_DataMartTut1.png

48
Types of Data Mart
2. Independent: Independent data mart is
created without the use of a central data
warehouse.

https://www.guru99.com/images/1/02
2218_0604_DataMartTut2.png

49
Types of Data Mart
3. Hybrid: This type of data marts can take
data from data warehouses or operational
systems.

https://www.guru99.com/images/1/
022218_0604_DataMartTut3.png

50
Steps in Implementing a Data Mart

Designing Constructing Populating Accessing Managing

51
Steps in Implementing a Data Mart
Designing
It covers all the tasks between initiating the request for a data mart to gathering
information about the requirements. Finally, we create the logical and physical design
of the data mart.

Constructing
It involves creating the physical database and the logical structures.

Populating
data in populated in the data mart.

52
Steps in Implementing a Data Mart
Accessing
Involves putting the data to use: querying the data, creating reports, charts, and
publishing them. End-user submit queries to the database and display the results of the
queries

Managing
This step covers management tasks such as-
• Ongoing user access management.
• System optimizations and fine-tuning to achieve the enhanced performance.
• Adding and managing fresh data into the data mart.
• Planning recovery scenarios and ensure system availability in the case when the
system fails.

53
Advantages of Data Mart
• Data marts contain a subset of organization-wide data. This Data is valuable to a
specific group of people in an organization.
• It is cost-effective alternatives to a data warehouse, which can take high costs to
build.
• Data Mart allows faster access of Data.
• Data Mart is easy to use as it is specifically designed for the needs of its users. Thus a
data mart can accelerate business processes.
• Data Marts needs less implementation time compare to Data Warehouse systems. It
is faster to implement Data Mart as you only need to concentrate on subset of data.
• It contains historical data which enables the analyst to determine data trends.

54
Disadvantages of Data Mart
• Many times enterprises create too many unrelated data marts without much
benefit. It can become a big hurdle to maintain.

• Data Mart cannot provide company-wide data analysis as their data set is limited.

55
References:
• Guru99, 2020. What is Data Warehouse? Types, Definition & Example. Retrieved
from https://www.guru99.com/data-warehousing.html#4.
• EDUCBA, 2020. Types of Data Warehouse. Retrieved from
https://www.educba.com/types-of-data-warehouse/
• Technopedia, 02-May-2013. Enterprise Data Warehouse. Retrieved from
https://www.techopedia.com/definition/26204/enterprise-data-warehouse
• Altexsoft Software R&D Engineering, 24-Oct-2019. Enterprise Data Warehouse:
Concepts, Architecture, and Components. Retrieved from
https://www.altexsoft.com/blog/enterprise-data-warehouse-concepts/
• Oracle, 2020. What is a Data Warehouse?. Retrieved from
https://www.oracle.com/database/what-is-a-data-warehouse/

56
References:
• JavaTpoint, 2018. Difference between Operational Database and Data Warehouse.
Retrieved from https://www.javatpoint.com/operational-database-vs-data-
warehouse
• EDUCBA, 2020. Types of Data Warehouse. Retrieved from
https://www.educba.com/operational-data-stores/
• Rouse, M., TechTarget. Operational Data Store (ODS). Retrieved from
https://searchoracle.techtarget.com/definition/operational-data-store
• Grenier, R. 05-Feb-2011. Operational Data Stores (ODS). Retrieved from
http://randygrenier.blogspot.com/2011/02/operational-data-stores-ods.html

57
References:
• Baesens, B., 26-Sept-2018. Data Warehouses, Data Marts, Operational Data Stores,
and Data Lakes: What’s in a Name?. Database Trends and Applications. Retrieved
from http://www.dbta.com/BigDataQuarterly/Articles/Data-Warehouses-Data-
Marts-Operational-Data-Stores-and-Data-Lakes-Whats-in-a-Name-127417.aspx
• EDUCBA, 2020. Types of Data Warehouse. Retrieved from
https://www.educba.com/operational-data-stores/
• Guru 99, 2020. What is Data Mart? Types & Example. Retrieved from
https://www.guru99.com/data-mart-tutorial.html

58
INTENDED LEARNING OUTCOME

• Describe star schema


• Describe snowflake schema
• Describe Galaxy schema
• Understand the difference between the types of data warehouse
schema based on its design.

4
Types of Data Warehouse Schema

• Star Schema

• Snowflake Schema

• Galaxy Schema

5
Schema

• A schema is a logical description that describes the entire


database.
• In the data warehouse there includes the name and description of
records.
• It has all data items and also different aggregates associated with
the data.

6
Star Schema

http://www.durofy.com/wp-content/uploads/2013/07/star.png https://www.guru99.com/images/1/022218_0758_StarandSnow1.png

7
Star Schema
• It is known as star schema as its structure
resembles a star.
• The star schema is the simplest type of
Data Warehouse schema.
• It is also known as Star Join Schema and is
optimized for querying large data sets.
• The center of the star can have one fact
table and a number of associated
dimension tables.

http://www.durofy.com/wp-content/uploads/2013/07/star.png

8
Star Schema

Fact Table

https://www.oercommons.org/editor/images/6978

https://i.stack.imgur.com/aB9k9.jpg

9
Star Schema

Fact Table

• A Fact table in a Data Warehouse system is nothing but the table that contains
all the facts or the business information, which can be subjected to analysis
and reporting activities when required.
• These tables hold fields that represent the direct facts, as well as the foreign
fields that are used to connect the fact table with other dimension tables in
the Data Warehouse system.
• A Data Warehouse system can have one or more fact tables, depending on the
model type used to design the Data Warehouse.

10
Star Schema

Dimension Table

https://www.dotnetinterviewquestions.in/contentpics/star1.JPG

11
Star Schema

Dimension Table

• Dimension is a collection of reference information about a measurable in the


fact table.
• The primary key column of the dimension table has uniquely identifies each
dimension record or row.
• The dimension tables are organized has descriptive attributes. For example, a
customer dimension’s attributes could include first and last name, birth date,
gender, Qualification, Address etc.

12
Characteristics of Star Schema
• Every dimension in a star schema is represented with only one-dimension
table.
• The dimension table should contain the set of attributes.
• The dimension table is joined to the fact table using a foreign key
• The dimension table are not joined to each other
• Fact table would contain key and measure
• The Star schema is easy to understand and provides optimal disk usage.
• The dimension tables are not normalized.
• The schema is widely supported by BI Tools

13
Steps in Designing Star Schema
1. Identify the business process.
2. Identify the measures or fact
data.
3. Identify the Dimension table
related to fact table.
4. List of attributes or column in
each dimension table
5. Find the lowest level of
aggregation or summary
analysis in the fact table.
https://www.oercommons.org/editor/images/6929

14
Snowflake Schema

https://upload.wikimedia.org/wikipedia/commons/1/1d/Snowflake_schema.png https://static.javatpoint.com/tutorial/datawarehouse/images/star-schema-vs-snowflake-
schema2.png

15
Snowflake Schema

https://www.guru99.com/images/1/022218_0758_StarandSnow2.png

16
Snowflake Schema

• A snowflake schema is an extension of star schema where the dimension


tables are connected to one or more dimensions.
• The dimension tables are normalized which splits data into additional tables.
• Snowflake schema keeps same fact table structure as star schema.
• In the dimension, it has multiple levels with multiple hierarchies. From each
hierarchy of levels any one level can be attached to Fact Table.

17
When to Use Snowflake Schema

Ralph Kimball, the data warehousing guru, proposes three cases where snowflake
implementation is not only acceptable but is also the key to a successful design:

• Large customer dimensions where, for example, 80 percent of the fact table
measurements involve anonymous visitors about whom you collect little
detail, and 20 percent involve reliably registered customers about whom you
collect much detailed data by tracking many dimensions

18
When to Use Snowflake Schema

• Financial product dimensions for banks, brokerage houses, and insurance


companies, because each of the individual products has a host of special
attributes not shared by other products

• Multi-enterprise calendar dimensions because each organization has


idiosyncratic fiscal periods, seasons, and holidays

19
Characteristics of Snowflake Schema

• The main benefit of the snowflake schema is it uses smaller disk space.
• Easier to implement a dimension is added to the Schema.
• Due to multiple tables, query performance is reduced.
• The primary challenge that you will face while using the snowflake Schema
is that, you need to perform more maintenance efforts because of the more
lookup tables.

20
Galaxy Schema

http://wiki.bi-academy.eu/lib/exe/fetch.php?media=bicn01:modelling:en-img-0079.png

21
Galaxy Schema
• Multiple fact tables share
dimension tables.
• This schema is viewed as
collection of stars hence called
galaxy schema.
• It is also called Fact
Constellation Schema.

https://www.guru99.com/images/1/022218_0758_StarandSnow3.png

22
Characteristics of Galaxy Schema
• The dimensions in this schema are separated into separate dimensions based
on the various levels of hierarchy. For example, if geography has four levels of
hierarchy like region, country, state, and city then Galaxy schema should have
four dimensions.
• Moreover, it is possible to build this type of schema by splitting the one-star
schema into more Star schema.
• The dimensions are large in this schema which is needed to build based on the
levels of hierarchy.
• This schema is helpful for aggregating fact tables for better understanding.

23
References
• EDUCBA, 2020. Introduction to Data warehouse Schema. Retrieved from
https://www.educba.com/data-warehouse-schema/
• TechTerms, 2020. Schema. Retrieved from
https://techterms.com/definition/schema
• Guru99, 2020. Star and SnowFlake Schema in Data Warehouse. Retrieved from
https://www.guru99.com/star-snowflake-data-warehousing.html
• OER Commons – Open Educational Resources. Data warehouse Design Using
Oracle. Retrieved from https://www.oercommons.org/authoring/21861-data-
warehouse-design-using-oracle/3/view
• EDUCBA, 2020. Overview of Fact Table. Retrieved from
https://www.educba.com/what-is-fact-table/

24
References
• EDUCBA, 2020. Introduction to Data warehouse Schema. Retrieved from
https://www.educba.com/data-warehouse-schema/
• JavaTpoint, 2018. Difference between Star and Snowflake Schemas. Retrieved
from https://www.javatpoint.com/data-warehouse-star-schema-vs-snowflake-
schema
• Star and Snowflake Schema. Retrieved from
https://www.oracle.com/webfolder/technetwork/tutorials/obe/db/10g/r2/ow
b/owb10gr2_gs/owb/lesson3/starandsnowflake.htm

25
INTENDED LEARNING OUTCOME

• Define OLAP
• Determine the types of OLAP
• Understand OLAP operations.
• List OLAP Tools
• Define ETL and understand its process in data warehouse.

4
Online Analytical Processing (OLAP)

• The OLAP (Online Analytical Processing) is a powerful


technology behind many Business Intelligence (BI) applications
that discovers data, report viewing capabilities, complex
analytical calculations, and predictive “what if” scenario, budget
planning, forecast planning.

• OLAP is for the data analytic purpose, hence it enables us to


analyze information from multiple database systems at the same
time.

5
Online Analytical Processing (OLAP)

• It is a computing method that allows users to easily extract


required data and query data in order to analyze it from different
points of view.

• It is basically based on the huge data that is called data


warehouse; it collects the required data from the data warehouse
and perform the business required analysis to take some decision
in the business to improve in profit, to improve sale, to improve
brand, to improve marketing and so all.

6
OLAP Architecture

https://image.slidesharecdn.com/1-150506060450-conversion-gate01/95/14-data-warehouse-13-638.jpg?cb=1430892370

7
How Does It Work?
• A Data warehouse would extract information from multiple data sources and
formats like text files, excel sheet, multimedia files, etc.

• The extracted data is cleaned and transformed. Data is loaded into an OLAP
server (or OLAP cube) where information is pre-calculated in advance for
further analysis.

8
OLAP Cube

• An OLAP Cube is a data structure that


allows fast analysis of data according
to the multiple Dimensions that
define a business problem. A
multidimensional cube for reporting
sales might be, for example,
composed of 7 Dimensions:
Salesperson, Sales Amount, Region,
Product, Region, Month, Year. https://gccontent.blob.core.windows.net/gccontent/blo
gs/legacy/c1/2014/11/OLAP_cube.png

9
Basic Analytical Operations of OLAP

• Roll-up – Also known


as drill-up or
consolidation, use to
summarize operation data
along with the dimension.

https://www.guru99.com/images/1/022218_1238_WhatisOLAPO2.png

10
Basic Analytical Operations of OLAP

• Drill-down – To perform
the analysis in deeper
among the dimensions of
data. For example, drilling
down from “time period”
to “years” and “months”
and to “days” and so on to
plot sales growth for a
product.
https://www.guru99.com/images/1/022218_1238_WhatisOLAPO3.png

11
Basic Analytical Operations
of OLAP

• Slice – To perform the analysis


take one level of information
for display, such as “sales in
2019.”

https://www.guru99.com/images/1/022218_1238_WhatisOLAPO4.png

12
Basic Analytical Operations
of OLAP

• Dice – To perform the analysis,


select data from multiple
dimensions to analyze, such as
“sales of Laptop in Region 4 in
2019.”

https://www.guru99.com/images/1/022218_1238_WhatisOLAPO5.png

13
Basic Analytical Operations
of OLAP

• Pivot – To perform the


analysis that can gain a
new view of data by
rotating the data axes
of the cube.

https://www.guru99.com/images/1/022218_1238_WhatisOLAPO6.png

14
Types of OLAP System

• Relational OLAP (ROLAP)


• Multidimensional OLAP (MOLAP)
• Hybrid OLAP (HOLAP)
• Desktop OLAP (DOLAP)
• Web OLAP (WOLAP)
• Mobile OLAP
• Spatial OLAP (SOLAP)
https://www.guru99.com/images/1/022218_1238_WhatisOLAPO7.png

15
Types of OLAP System
TYPE OF OLAP EXPLANATION
ROLAP utilize a relational database management system to keep
and control the data. These are the servers that exist
between the database and the user. ROLAP systems work
on the information that resides in a relational database.

MOLAP This server utilizes a multi-dimensional Database (MDDB)


for storing and analyzing information. MDDB can
proficiently store summaries, giving a method for quick
questioning and recovering information from the database
for processing.

16
Types of OLAP System
TYPE OF OLAP EXPLANATION
HOLAP It is a blend of MOLAP and ROLAP. By utilizing both ROLAP
and MOLAP information stores, Hybrid OLAP offers the
qualities of both techniques. HOLAP stores data summaries
in the binary files or in the pre-calculated cubes. It leaves
the quantities of fact and dimension information in the
relational database.

DOLAP Desktop On-Line Analytical Processing (DOLAP) is a single-


tier, desktop-based OLAP technology.

17
Types of OLAP System
TYPE OF OLAP EXPLANATION
WOLAP Web OLAP which is OLAP system accessible via the web
browser. WOLAP is a three-tiered architecture. It consists of
three components: client, middleware, and a database
server.
Mobile OLAP Mobile OLAP helps users to access and analyze OLAP data
using their mobile devices
SOLAP SOLAP is created to facilitate management of both spatial
and non-spatial data in a Geographic Information system
(GIS)

18
Examples of OLAP Tools

• IBM Cognos

• Micro Strategy

• Palo OLAP Server

• Apache Kylin

• icCube

19
Examples of OLAP Tools

• Pentaho BI
• Mondrian
• Oracle Business Intelligence
Enterprise Edition(OBIEE)
• JsHypercube
• Jedox

20
Advantages of OLAP Disadvantages of OLAP

• High Speed of Data Processing • High Cost


• Aggregated and Detailed Data • OLAP is Relational
• Multidimensional Data Representation • Computation Capability
• Using Familiar Business Expressions • Some Potential Risk
• “What-if” Scenarios
• Flat Learning Curve

21
22
What is ETL?

https://lh5.googleusercontent.com/Dg1ZqrEAp-nzxVEog5ZfI2lKD-
quLUttcbQoawUN1a5gyk4lL3DG7k6FSA9_rI_oi9aPGyZZDntlo5ErYoB4os4S07vlHP1Y4iWW1XCy3-UaFpaFcwibzR3GVTSEI50j81FXmmA

23
What is ETL?

• ETL stands for Extract, Transform and Load.

• It is a process in data warehousing used to extract data from the


database or source systems and after transforming placing the
data into data warehouse. It is a combination of three database
functions i.e. Extract, Transform and Load.

24
ETL Process

http://busitelce.com/images/14articles/etl-process/etl-process-in-data-warehouse.jpg

25
ETL Process

• Step 1 - Extraction

All the preferred data from various source systems such as databases,
applications, and flat files is identified and extracted. Data extraction can be
completed by running jobs during non-business hours.

26
ETL Process

• Step 1 - Extraction

Data Extraction Strategies


Full Extraction: This is followed when whole data from sources get loaded into the data
warehouses that show either data warehouse is being populated the first time or no strategy
has been made for data extraction.

Partial Extraction (with update notification): This strategy is also known delta, where only the
data being changed is extracted and update data warehouses

Partial Extraction (without update notification): This strategy refers to extract specific required
data from sources according to load in the data warehouses instead of extracting whole data.

27
ETL Process

• Step 2 - Transformation

Most of the extracted data can’t be directly loaded into the target system.
Based on the business rules, some transformations can be done before
loading the data.
The transformation process also corrects the data, removes any incorrect
data and fixes any errors in the data before loading it.

28
ETL Process

• Step 3 - Loading

All the gathered information is loaded into the target Data Warehouse
tables.
Types of Loading:
Initial Load — populating all the Data Warehouse tables
Incremental Load — applying ongoing changes as when needed periodically.
Full Refresh —erasing the contents of one or more tables and reloading with
fresh data.

29
Example of ETL Tools
RightData
Informatica Data Validation SSISTester
QuerySurge TestBench
ICEDQ GTL QAceGen
Datagaps ETL Validator Zuzena Automated Testing Service
QualiDI DbFit
Talend Open Studio for Data Integration AnyDbTest
Codoid's ETL Testing Services 99 Percentage ETL Testing
Data Centric Testing

30
References
• MOOCs : edX Courses. StanfordOnline: SOE.YDB-OLAP0001
Databases: On-Line Analytical Processing
• EDUCBA, 2020. What is OLAP?. Retrieved from
https://www.educba.com/what-is-olap/
• EDUCBA, 2020. Types of OLAP. Retrieved from
https://www.educba.com/types-of-olap/
• Guru99, 2020. What is OLAP? Cube, Operations & Types in Data Warehouse.
Retrieved from https://www.guru99.com/online-analytical-processing.html#2
• OLAP.com. OLAP Cube. Retrieved from https://olap.com/learn-bi-olap/olap-bi-
definitions/olap-cube/
• OLAP.com. Types of OLAP System. Retrieved from https://olap.com/types-of-
olap-systems/

31
References
• Tutorialspoint Simply Easy Learning, 2020. Data Warehousing – OLAP.
Retrieved from https://www.tutorialspoint.com/dwh/dwh_olap.htm
• Software Testing Help, 2020. Top 10 Best Analytical Processing (OLAP) Tools:
Business Intelligence. Retrieved from
https://www.softwaretestinghelp.com/best-olap-tools/
• EDUCBA, 2020. OLAP Tools. Retrieved from https://www.educba.com/olap-
tools/
• Galaktikasoft, 2020. Advantages Of Using OLAP for Business Intelligence.
Retrieved from https://galaktika-soft.com/blog/advantages-of-using-olap-for-
business-intelligence.html

32
References
• EDUCBA, 2020. What is ETL?. Retrieved from https://www.educba.com/what-
is-etl/
• Software Testing Help, 16-Apr-2020. What Is ETL (Extract, Transform, Load)
Process In Data Warehouse?. Retrieved from
https://www.softwaretestinghelp.com/etl-process-in-data-warehouse/
• Software Testing Help, 16-Apr-2020. Top 10 ETL Testing Tools In 2020.
Retrieved from https://www.softwaretestinghelp.com/top-4-etl-testing-tools/

33
INTENDED LEARNING OUTCOME

• Understand Data Mining Concepts


• Understand the uses of data mining
• Enumerate and explain the different techniques used in data mining
• Describe supervised and unsupervised learning
• Enumerate the different data mining algorithms
• Understand the steps of Crisp DM
• Understand the concept of Define KDD

4
What is Data Mining?

https://miro.medium.com/max/2800/0*2883vRd4rpXWUIz_.jpg
What is Data Mining?
It is basically the extraction of vital
information/knowledge from a large set of data.

Fundamentally, data mining is about processing


data and identifying patterns and trends in that
information so that you can decide or judge.

Data mining is also called as Knowledge


discovery, Knowledge extraction, data/pattern
analysis, information harvesting, etc.
https://www.helponclick.com/blog/data-
mining-applications-improved-customer-
support/
What is Data Mining?
Its foundation comprises three scientific
disciplines:
statistics (the numeric study of data
relationships)
artificial intelligence (human-like
intelligence displayed by software
https://www.sas.com/en_us/insights/analyti
and/or machines) cs/data-mining.html

machine learning (algorithms that can


learn from data to make predictions).
Data Mining Examples
Example 1:

So there is a Mobile network operator. They consult a Data miner to dig into the call
records of the operator. No specific targets are given to the Data Miner.
A quantitative target of finding at least 2 new patterns in a month is given.
As the data miner starts digging into the data he finds a pattern that there are less
international calls on Wednesday as compared to other days.
This information is shared with the management and they come up with the plan to
reduce the international call rates on Wednesdays and start a campaign.
Call rates surge, customers are happy with low call price, more customers sign up and the
company makes more money! Win-Win situation!
Data Mining Examples
Example 2 :

Consider a marketing head of telecom service provides who wants to increase revenues
of long distance services. For high ROI on his sales and marketing efforts customer
profiling is important. He has a vast data pool of customer information like age, gender,
income, credit history, etc. But its impossible to determine characteristics of people who
prefer long distance calls with manual analysis. Using data mining techniques, he may
uncover patterns between high long distance call users and their characteristics.
For example, he might learn that his best customers are married females between the
age of 45 and 54 who make more than $80,000 per year. Marketing efforts can be
targeted to such demographic.
Data Mining Examples

Example 3:

A bank wants to search new ways to increase revenues from its credit card
operations. They want to check whether usage would double if fees were halved.
Bank has multiple years of record on average credit card balances, payment
amounts, credit limit usage, and other key parameters. They create a model to
check the impact of the proposed new business policy. The data results show that
cutting fees in half for a targetted customer base could increase revenues by $10
million.
Data Mining Techniques

Cluster Analysis. Enables to


identify a given user group
according to common features in
a database. These features could
include age, geographic location,
education level and so on.
Data Mining Techniques

Anomaly Detection.
It is used to determine when
something is noticeably different
from the regular pattern. It is
used to eliminate any database
inconsistencies or anomalies at
the source.
Data Mining Techniques

Regression Analysis.
This technique is used to make
predictions based on
relationships within the data set.
Data Mining Techniques

Classification.
This deals with the things which have
labels on it. Note in cluster detection,
the things did not have a label in it and
by using data mining we had to label
and form into clusters, but in
classification, there is information
existing that can be easily classified
using an algorithm.
Data Mining Techniques

Associative Learning.
It is used to analyze which things
tend to occur together either in
pairs or larger groups.
Example of Data Mining Tools
• Rapid Miner
• Orange
• Weka
• Knime
• Sisense
• SQL Server Data
Tools (SSDT)
• Apache Mahout
• Oracle Data Mining
Example of Data Mining Tools

• Rattle
• DataMelt
• IBM Cognos
• IBM SPSS Modeler
• SAS Data Mining
• Teradata
• Board
• Dundas BI
Benefits of Data Mining
• Data mining technique helps companies to get knowledge-based
information.
• Data mining helps organizations to make the profitable adjustments in
operation and production.
• The data mining is a cost-effective and efficient solution compared to
other statistical data applications.
• Data mining helps with the decision-making process.
• Facilitates automated prediction of trends and behaviors as well as
automated discovery of hidden patterns.
• It can be implemented in new systems as well as existing platforms
• It is the speedy process which makes it easy for the users to analyze
huge amount of data in less time.
Disadvantages of Data Mining
• There are chances of companies may sell useful information of their
customers to other companies for money.
• Many data mining analytics software is difficult to operate and requires
advance training to work on.
• Different data mining tools work in different manners due to different
algorithms employed in their design. Therefore, the selection of correct
data mining tool is a very difficult task.
• The data mining techniques are not accurate, and so it can cause serious
consequences in certain conditions.
Data Mining Applications
• Communications
• Insurance
• Education
• Manufacturing
• Banking
• Retail
• Service Providers
• E-Commerce
• Super Markets
• Crime Investigation
• Bioinformatics

20
Data Mining Applications
Applications Usage
Communications Data mining techniques are used in communication sector to predict
customer behavior to offer highly targeted and relevant campaigns.
Insurance Data mining helps insurance companies to price their products profitable
and promote new offers to their new or existing customers.
Education Data mining benefits educators to access student data, predict
achievement levels and find students or groups of students which need
extra attention. For example, students who are weak in maths subject.
Manufacturing With the help of Data Mining Manufacturers can predict wear and tear of
production assets. They can anticipate maintenance which helps them
reduce them to minimize downtime.
Data Mining Applications
Applications Usage
Banking Data mining helps finance sector to get a view of market risks and manage
regulatory compliance. It helps banks to identify probable defaulters to
decide whether to issue credit cards, loans, etc.
Retail Data Mining techniques help retail malls and grocery stores identify and
arrange most sellable items in the most attentive positions. It helps store
owners to comes up with the offer which encourages customers to
increase their spending.
Service Providers Service providers like mobile phone and utility industries use Data Mining
to predict the reasons when a customer leaves their company. They
analyze billing details, customer service interactions, complaints made to
the company to assign each customer a probability score and offers
incentives.
Data Mining Applications
Applications Usage
E-Commerce E-commerce websites use Data Mining to offer cross-sells and up-
sells through their websites. One of the most famous names is
Amazon, who use Data mining techniques to get more customers
into their eCommerce store.
Super Markets Data Mining allows supermarket's develope rules to predict if their
shoppers were likely to be expecting. By evaluating their buying
pattern, they could find woman customers who are most likely
pregnant. They can start targeting products like baby powder, baby
shop, diapers and so on.
Crime Investigation Data Mining helps crime investigation agencies to deploy police
workforce (where is a crime most likely to happen and when?), who
to search at a border crossing etc.
Bioinformatics Data Mining helps to mine biological data from massive datasets
gathered in biology and medicine.
Predictive vs. Descriptive Data Mining Algorithm

Predictive Descriptive

https://www.researchgate.net/figure/Main-data-mining-techniques_fig2_270552309
Predictive Data Mining vs. Descriptive Data Mining

The descriptive analysis is used to mine data and provide the latest
information on past or recent events.

The predictive analysis provides answers of the future queries that move
across using historical data as the chief principle for decisions.

25
Key Data Mining Algorithms
Key Data Mining Algorithms

Supervised learning requires a known output, sometimes called a label or


target. These algorithms include Naïve Bayes, Decision Tree, Neural
Networks, SVMs, Logistic Regression, etc.

Unsupervised learning algorithms do not require a predefined set of outputs


but rather look for patterns or trends without any label or target. They
include k-Means Clustering, Anomaly Detection, and Association Mining.

27
What is Data Mining Algorithm?
• Data Mining Algorithms are a special category of algorithms that are useful
for analyzing data and developing data models to identify meaningful
patterns. These are part of machine learning algorithms.
• These algorithms are implemented through various programming like R
language, Python and using data mining tools to derive the optimized data
models.
• These algorithms are part of data analytics implementation for business.
• These algorithms are based upon statistical and mathematical formulas
which applied to the data set.
Data Mining Algorithms

• C4.5
• k-means
• Support Vector Machines (SVM)
• Apriori
• Expectation-Maximization (EM)
• PageRank
• AdaBoost
• kNN (k-Nearest Neighbors)
• Naïve Bayes
• Classification and Regression Trees (CART)
Data Mining Algorithms
C4.5
C4.5 constructs a classifier in the form of a decision tree. In order to do this, C4.5 is
given a set of data representing things that are already classified.

k-means
k-means creates k groups from a set of objects so that the members of a group are
more similar. It’s a popular cluster analysis technique for exploring a dataset.

SVM
Support vector machine (SVM) learns a hyperplane to classify data into 2 classes. At
a high-level, SVM performs a similar task like C4.5 except SVM doesn’t use decision
trees at all.
Data Mining Algorithms
Apriori
The Apriori algorithm learns association rules and is applied to a database
containing a large number of transactions.

EM
In data mining, expectation-maximization (EM) is generally used as a clustering
algorithm (like k-means) for knowledge discovery.

PageRank
PageRank is a link analysis algorithm designed to determine the relative
importance of some object linked within a network of objects.
Data Mining Algorithms

AdaBoost
AdaBoost is a boosting algorithm which constructs a classifier. As you
probably remember, a classifier takes a bunch of data and attempts to predict
or classify which class a new data element belongs to.

kNN
kNN, or k-Nearest Neighbors, is a classification algorithm. However, it differs
from the classifiers previously described because it’s a lazy learner.
Data Mining Algorithms

Naïve Bayes
Naive Bayes is not a single algorithm, but a family of classification
algorithms that share one common assumption: Every feature of the data
being classified is independent of all other features given the class.

CART
CART stands for classification and regression trees. It is a decision tree
learning technique that outputs either classification or regression trees.
Like C4.5, CART is a classifier.
CRISP-DM

• CRISP-DM stands for Cross Industry Standard Process for Data


Mining.

• Is a 1996 methodology created to shape Data Mining projects. It


consists of 6 steps to conceive a Data Mining project and they
can have cycle iterations according to developers’ needs.

34
Phases of CRISP-DM

https://cdn.educba.com/academy/wp-content/uploads/2019/02/steps-involved.png
Phases of CRISP-DM
1. Business Understanding

Focuses on understanding the project objectives and requirements from a


business perspective, and then converting this knowledge into a data mining
problem definition and a preliminary plan.

2. Data Understanding

Starts with an initial data collection and proceeds with activities in order to get
familiar with the data, to identify data quality problems, to discover first
insights into the data, or to detect interesting subsets to form hypotheses for
hidden information.

36
Phases of CRISP-DM

3. Data Preparation

The data preparation phase covers all activities to construct the final dataset
from the initial raw data.

4. Modeling

Modeling techniques are selected and applied. Since some techniques like
neural nets have specific requirements regarding the form of the data, there
can be a loop back here to data prep.

37
Phases of CRISP-DM
5. Evaluation

Once one or more models have been built that appear to have high quality based on
whichever loss functions have been selected, these need to be tested to ensure they
generalize against unseen data and that all key business issues have been sufficiently
considered. The end result is the selection of the champion model(s).

6. Deployment

Consists of presenting the results in a useful and understandable manner, and by


achieving this, the project should achieve its goals. It is the only step not belonging to
a cycle.

38
Knowledge Discovery in Databases (KDD)
• Some people don’t differentiate data mining from knowledge discovery while
others view data mining as an essential step in the process of knowledge
discovery.

• The term knowledge discovery in databases, or KDD for short, refers to the
broad process of finding knowledge in data, and emphasizes the high level
application of particular data minded methods. It is of interest to researchers
in machine learning, pattern recognition, databases, statistics, artificial
intelligence, knowledge acquisition for expert systems, and data visualization.

• The ultimate goal is to extract high-level knowledge from low-level data.

39
9-step KDD Process Starting from Managerial Step

Step 1. Developing and understanding of the application domain.


Step 2 Selecting and creating a data set on which discovery will be performed,
based on goals.
Step 3. Pre-processing and cleansing. Data reliability is enhanced in this stage.
Step 4. Data Transformation.
Step 5. Choosing the appropriate data mining task.

40
9-step KDD Process Starting from Managerial Step

Step 6. Choosing the data mining algorithm.


Step 7. Next is employing the data mining algorithm.
Step 8. Evaluation.
Step 9. Using the discovered knowledge.

41
9-step KDD Process Starting from Managerial Step
(Explanation)

STEP1.Developing and understanding of the application domain. This is the


preparatory step that sets the scene for understanding what should be done with
transformation, algorithms, and representation. Those in charge of the KDD
project need to understand and define the goals of the end user, then where the
knowledge discovery process will take place, and other relevant prior knowledge.

42
9-step KDD Process Starting from Managerial Step
(Explanation)

Step 2
Selecting and creating a data set on which discovery will be performed, based on
goals. Determine what data will be used for the knowledge discovery, such as:
what data is available, obtaining additional necessary data, and the integrating all
the data for the knowledge discovery into one data set, including the attributes
that will be considered for the process. This process is very important because the
data mining learns and discovers from the available data. This is the evidence base
for constructing the models. If some important attributes are missing, then the
entire study may fail. From this respect, the more attributes considered, the
better.

43
9-step KDD Process Starting from Managerial Step
(Explanation)

Step 3
Pre-processing and cleansing. Data reliability is enhanced in this stage. It includes
data clearing, such as handling missing values, and removing of outliers.
It may involve complex statistical methods, or using a data mining algorithm in this
context. For example: If one suspects that a certain attribute is of insufficient
reliability, or has many missing data, then this attribute could become the goal of a
data mining supervised algorithm. A prediction model for this attribute will be
developed, and then missing data can be predicted.

44
9-step KDD Process Starting from Managerial Step
(Explanation)

Step 4
Next is data transformation. In this stage, the generation of better data, for the
data mining is prepared and developed. Methods here include dimension
reduction, such as feature selection, and extraction, and record sampling, and
attribute transformation such as discretization of numerical attributes and
functional transformation.

45
9-step KDD Process Starting from Managerial Step
(Explanation)

Step 5
Choosing the appropriate data mining task. We’re now ready to decide which type
of data mining to use. For example: classification, regression, or clustering. This
mostly depends on the KDD goals, and also on the previous steps. There are two
major goals in data mining: prediction and description. Prediction is often referred
to as supervised data mining, while descriptive data mining includes the
unsupervised, and visualization aspects of data mining.

46
9-step KDD Process Starting from Managerial Step
(Explanation)

Step 6
Choosing the data mining algorithm. Now that you have the strategy, we can
decide which tactics to use. This stage includes selecting the specific method for
searching patterns, including multiple inducers. For example, when considering
precision versus understandability, the former is better with neural networks,
while the latter is better with decision trees.

47
9-step KDD Process Starting from Managerial Step
(Explanation)

Step 7
Next is employing the data mining algorithm. Finally you can implement the
appropriate data mining algorithm. In this step we might need to employ the
algorithm several times until a satisfying the result is obtained. For instance, by
tuning the algorithms control parameters, such as the minimum number of
instances in a single leaf of a decision tree.

48
9-step KDD Process Starting from Managerial Step
(Explanation)

Step 8
evaluation. In this stage we evaluate and interpret the mined patterns with respect
to the goals defined in the first step. Here we consider the pre-processing steps
with respect to their effect on the data mining algorithm results. This step focuses
on the comprehensible nature and usefulness of the induced model. In this step,
the discovered knowledge is also documented for further usage.

49
9-step KDD Process Starting from Managerial Step
(Explanation)

Step 9
Using the discovered knowledge. We’re now ready to incorporate the knowledge
into another system for further action. The knowledge becomes active in the sense
that we may make changes to the system, and measure the effects. Actually, the
success of this step determines the effectiveness of the entire KDD process.

50
References
• EDUCBA, 2020. Data Mining Algorithms. Retrieved from
https://www.educba.com/data-mining-algorithms/
• Ray Li, 2015. Top 10 Data Mining Algorithms. Hacker Bits. Retrieved from
https://hackerbits.com/data/top-10-data-mining-algorithms-in-plain-english/
• TechDifferences, 25-Mar-2019. Difference Between Descriptive and Predictive
Data Mining. Retrieved from https://techdifferences.com/difference-between-
descriptive-and-predictive-data-mining.html
• Guru99, 2020. Data Mining Tutorial: Process, Techniques, Tools, Examples.
Retrieved from https://www.guru99.com/data-mining-tutorial.html
• SAS, 2020. Data Mining What it is and why matters. SAS Institute Inc..
Retrieved from https://www.sas.com/en_us/insights/analytics/data-
mining.html

51
References
• JavaTpoint, 2018. Data Mining Tutorial. Retrieved from
https://www.javatpoint.com/data-mining
• EDUCBA, 2020. Introduction to Data Mining. Retrieved from
https://www.educba.com/introduction-to-data-mining/
• Guru99, 2020. Supervised vs Unsupervised Learning: Key Differences.
Retrieved from https://www.guru99.com/supervised-vs-unsupervised-
learning.html
• Software Testing Help, 16-Apr-2020. Top 15 Best Free Data Mining Tools: The
Most Comprehensive List. Retrieved from
https://www.softwaretestinghelp.com/data-mining-tools/

52
References
• RapidMiner, 2020. Data Mining Tools. Retrieved from
https://rapidminer.com/glossary/data-mining-tools/
• Vorhies, W., 26-Jul-2016. CRISP-DM – a Standard Methodology to Ensure a
Good Outcome. Data Science Central. Retrieved from
https://www.datasciencecentral.com/profiles/blogs/crisp-dm-a-standard-
methodology-to-ensure-a-good-outcome
• Rodrigues, I.. CRISP-DM methodology leader in data mining and big data.
Towards Data Science. Retrieved from https://towardsdatascience.com/crisp-
dm-methodology-leader-in-data-mining-and-big-data-467efd3d3781
• Tutorialspoint, 2020. Data Mining - Knowledge Discovery. Retrieved from
https://www.tutorialspoint.com/data_mining/dm_knowledge_discovery.htm

53
References
• Marsh, J., 2014. Data Science Knowledge Discovery in Databases: 9 Steps to
Success. Udemy Blog. Retrieved from https://blog.udemy.com/knowledge-
discovery-in-databases/
• Tech Differences, 25-Mar-2019. Difference Between Descriptive and Predictive
Data Mining. Retrieved from https://techdifferences.com/difference-between-
descriptive-and-predictive-data-mining.html
• Han, J., Pei, J., Kamber, M.. Data Mining: Concepts and Techniques Third
Edition

54

You might also like