IT223 Advance Database System Course Pack Module 4
IT223 Advance Database System Course Pack Module 4
IT223 Advance Database System Course Pack Module 4
INSTITUTE OF COMPUTING
Bachelor of Science in Information Technology
New Visayas, Panabo City
©2021
COURSE
OVERVIEW
This Course pack is designed for educational administrators, school heads and
teachers.
This was been realized with the initiative of CHED RO XI through its PROJECT
WRITE. Some part of this course pack module was been developed by the
Advance Database Systems Module Team with its corresponding members across
the Colleges and Universities around the region:
• Lanie B. Laureano
• Erwin P. Acedillo
• Michelle Banawan
• Luchi A. Dela Cruz
• Vladimer Kobayashi
• Eric P. Lozarita
• Mary Cris S. Magbanua
• Arwin Bugayong Rañola
In order for learners to gain competency in this course, this course pack has been
structured into four modules as follows:
Students in this course are encouraged to go through each lesson in every module
sequentially to maximize their learning. They should work on all exercises as they
build on the concepts of each topic introduced in each lesson.
So, to make this learning experience rewarding for you, study this course pack
with your co-learners at your own pace. You can also ask the help and support of
your peers, classmates and corresponding instructors.
Godspeed!
THIS MODULE IS A PILOT TEST ONLY; NOT FOR REPRODUCTION AND DISTRIBUTION OUTSIDE OF ITS INTENDED USE.
THIS IS INTENDED ONLY FOR THE USE OF THE STUDENTS WHO ARE OFFICIALLY ENROLLED IN THE COURSE/SUBJECT.
TABLE OF MODULE 01
REVIEW OF RELATIONAL
DATABASE CONCEPTS
AND SQL COMMANDS
CONTENTS 02
10
Lesson 01: SQL Statement Review
Lesson 02: Deepening SQL
Functions
MODULE 02
ADVANCE DATABASE
OPERATIONS
20 Lesson 01: Efficiency of SQL
Indexes
25 Lesson 02: Virtual Tables: Creating
a View
31 Lesson 03: SQL Stored Procedures
37 Lesson 04: Utilizing SQL Triggers
MODULE 03
ADVANCE DATABASE
CONCEPTS
49 Lesson 01: Transaction Management
and Concurrency Control
56 Lesson 02: Foreign Key Constraints
62 Lesson 03: Query Optimization
MODULE 04
DATA WAREHOUSE AND
DATA MINING CONCEPTS
71 Lesson 01: Fundamentals of Data
Warehouse
81 Lesson 02: Business Intelligence
and Data Mining
THIS MODULE IS A PILOT TEST ONLY; NOT FOR REPRODUCTION AND DISTRIBUTION OUTSIDE OF ITS INTENDED USE.
THIS IS INTENDED ONLY FORR THE USE OF THE STUDENTS WHO ARE OFFICIALLY ENROLLED IN THE COURSE/SUBJECT.
4
MODULE 04
DATA
WAREHOUSE AND
DATA MINING
CONCEPTS
Business intelligence (BI) is the collection of
best practices and software tools developed
to support business decision making in this
age of globalization, emerging markets,
rapid change, and increasing regulation.
The complexity and range of information
required to support business decisions
has increased, and operational database
structures were unable to support all of
these requirements. Therefore, a new data
storage facility, called a data warehouse,
developed. The data warehouse extracts
its data from operational databases as well
as from external sources, providing a more
comprehensive data pool.
Module Outcomes:
• Discuss the importance of Big
Data
• List the life cycle of Big Data
• Differentiate the different life cycle
of Big Data
• Identify and discuss the phases of
the Big Data life cycle
• Design a data warehouse model
that addresses the information
requirements and decision-making
goals of an organization;
• Design an appropriate DW or Data
mining model for an organization
that will implement the correct
algorithm given the organization’s
strategic goal/s and the nature of
its data
THIS MODULE IS A PILOT TEST ONLY; NOT FOR REPRODUCTION AND DISTRIBUTION OUTSIDE OF ITS INTENDED USE.
THIS IS INTENDED ONLY FOR THE USE OF THE STUDENTS WHO ARE OFFICIALLY ENROLLED IN THE COURSE/SUBJECT.
photos/lVZjvw-u9V8
Source: Ayhan, I.E. (2020). Brown wooden hallway with gray metal doors [Photograph]. Unsplash. https://unsplash.com/
MODULE 04: LESSON 01
FUNDAMENTALS OF DATA
WAREHOUSE
THIS MODULE IS A PILOT TEST ONLY; NOT FOR REPRODUCTION AND DISTRIBUTION OUTSIDE OF ITS INTENDED USE.
THIS IS INTENDED ONLY FOR THE USE OF THE STUDENTS WHO ARE OFFICIALLY ENROLLED IN THE COURSE/SUBJECT.
Module 04: Lesson 01
IT223 - ADVANCE DATABASE SYSTEM
FUNDAMENTALS OF DATA WAREHOUSE
• https://www.youtube.com/watch?v=AHR_7jFCMeY
• https://www.youtube.com/watch?v=VxJYySJ_c_0
ANALYSIS 1. In your own words, how is a database different from a data warehouse?
THIS MODULE IS A PILOT TEST ONLY; NOT FOR REPRODUCTION AND DISTRIBUTION OUTSIDE OF ITS INTENDED USE.
THIS IS INTENDED ONLY FOR THE USE OF THE STUDENTS WHO ARE OFFICIALLY ENROLLED IN THE COURSE/SUBJECT.
DNSC IC | Page 72
Module 04: Lesson 01
IT223 - ADVANCE DATABASE SYSTEM
FUNDAMENTALS OF DATA WAREHOUSE
A data warehouse centralizes and consolidates large amounts of data from multiple
sources. Its analytical capabilities allow organizations to derive valuable business
insights from their data to improve decision-making. Over time, it builds a historical
record that can be invaluable to data scientists and business analysts. Because of
these capabilities, a data warehouse can be considered an organization’s “single
source of truth.”
What is ETL?
ETL is a process that extracts the data from different source systems, then transforms
the data (like applying calculations, concatenations, etc.) and finally loads the data
into the Data Warehouse system. Full form of ETL is Extract, Transform and Load.
• Financial services
• Banking services
• Consumer goods
• Retail sectors
• Controlled manufacturing
THIS MODULE IS A PILOT TEST ONLY; NOT FOR REPRODUCTION AND DISTRIBUTION OUTSIDE OF ITS INTENDED USE.
THIS IS INTENDED ONLY FOR THE USE OF THE STUDENTS WHO ARE OFFICIALLY ENROLLED IN THE COURSE/SUBJECT.
DNSC IC | Page 73
Module 04: Lesson 01
IT223 - ADVANCE DATABASE SYSTEM
FUNDAMENTALS OF DATA WAREHOUSE
THIS MODULE IS A PILOT TEST ONLY; NOT FOR REPRODUCTION AND DISTRIBUTION OUTSIDE OF ITS INTENDED USE.
THIS IS INTENDED ONLY FOR THE USE OF THE STUDENTS WHO ARE OFFICIALLY ENROLLED IN THE COURSE/SUBJECT.
DNSC IC | Page 74
Module 04: Lesson 01
IT223 - ADVANCE DATABASE SYSTEM
FUNDAMENTALS OF DATA WAREHOUSE
Comparison Chart
Star Schema
• Each dimension in a star schema is represented with only one-dimension
table.
• This dimension table contains the set of attributes.
• The following diagram shows the sales data of a company with respect to the
four dimensions, namely time, item, branch, and location.
• There is a fact table at the center. It contains the keys to each of four
dimensions.
• The fact table also contains the attributes, namely dollars sold and units sold.
THIS MODULE IS A PILOT TEST ONLY; NOT FOR REPRODUCTION AND DISTRIBUTION OUTSIDE OF ITS INTENDED USE.
THIS IS INTENDED ONLY FOR THE USE OF THE STUDENTS WHO ARE OFFICIALLY ENROLLED IN THE COURSE/SUBJECT.
DNSC IC | Page 75
Module 04: Lesson 01
IT223 - ADVANCE DATABASE SYSTEM
FUNDAMENTALS OF DATA WAREHOUSE
* Note - Each dimension has only one dimension table and each table holds a set of attributes.
For example, the location dimension table contains the attribute set {location_key, street, city,
province_or_state,country}. This constraint may cause data redundancy. For example, “Vancouver”
and “Victoria” both the cities are in the Canadian province of British Columbia. The entries for such
cities may cause data redundancy along the attributes province_or_state and country
Snowflake Schema
• Some dimension tables in the Snowflake schema are normalized.
• The normalization splits up the data into additional tables.
• Unlike Star schema, the dimensions table in a snowflake schema are normalized.
For example, the item dimension table in star schema is normalized and split
into two dimension tables, namely item and supplier table.
* Note - Due to normalization in the Snowflake schema, the redundancy is reduced and therefore, it
becomes easy to maintain and the save storage space.
THIS MODULE IS A PILOT TEST ONLY; NOT FOR REPRODUCTION AND DISTRIBUTION OUTSIDE OF ITS INTENDED USE.
THIS IS INTENDED ONLY FOR THE USE OF THE STUDENTS WHO ARE OFFICIALLY ENROLLED IN THE COURSE/SUBJECT.
DNSC IC | Page 76
Module 04: Lesson 01
IT223 - ADVANCE DATABASE SYSTEM
FUNDAMENTALS OF DATA WAREHOUSE
a. Define the main facts to be analyzed. (Hint: These facts become the source
for the design of the fact table.)
b. Define and describe the appropriate dimensions. (Hint: These dimensions
become the source for the design of the dimension tables.)
c. Draw the lab usage star schema, using the fact and dimension structures you
defined in Problems 1a and 1b.
d. Define the attributes for each of the dimensions in Problem B.
e. Implement your data warehouse design, using the star schema you created in
Problem C and the attributes you defined in Problem D.
f. Create the reports that will meet the requirements listed in this problem’s
introduction.
THIS MODULE IS A PILOT TEST ONLY; NOT FOR REPRODUCTION AND DISTRIBUTION OUTSIDE OF ITS INTENDED USE.
THIS IS INTENDED ONLY FOR THE USE OF THE STUDENTS WHO ARE OFFICIALLY ENROLLED IN THE COURSE/SUBJECT.
DNSC IC | Page 77
Module 04: Lesson 01
IT223 - ADVANCE DATABASE SYSTEM
FUNDAMENTALS OF DATA WAREHOUSE
Solutions:
A. Facts:
Facts are defined as the numbers or the numeric values that are necessary to
explain the aspects of the business.
For example the numeric figures involved in the sale process of the company.
These figures can be the cost of the product, prices of the products or the
revenue of the total sale which is used in business analysis.
The facts in any company are stored in the tables and it represents the center
of the schema that is the fact table is situated in the center of the schema.
For the inquiry of the management of university computer lab there has to
be the data about the students. That is the quantity of students studying in
the university. The total number of computers available in the university. The
majors and the periods in a day and total in a semester.
Hence for the university schema the main facts are the sum of the number of
students by times, the semester the student the classification and the major.
These facts are used to create the tables on the basis of which the schema of
warehouses is implemented.
B. Dimensions:
The dimensions are defined as the possessions that define the quality of the
schema and they give some more aspects to the fact table.
The dimensions are the most important for the business as the decisions are
made on the basis of dimensions.
For example in the sale department the sale of a product is analyzed in every
region in which it is being sold.
C. Star Schema:
It is the schema for any company which is designed by analyzing the facts
and the dimensions of the company.
It is designed in the shape of a star that is the fact table resides in the center
and the tables of the dimensions reside around that fact table.
A Star schema diagram gives a brief idea about the data warehouse.
For the university management system the fact table is the LAB and the
dimensions of this table are time, class, major and the semester. The schema
for the lab is as shown in the Figure 4.7
D. Attributes:
The attributes are the properties of the entities that are the dimensions that
define their structure and characteristics.
THIS MODULE IS A PILOT TEST ONLY; NOT FOR REPRODUCTION AND DISTRIBUTION OUTSIDE OF ITS INTENDED USE.
THIS IS INTENDED ONLY FOR THE USE OF THE STUDENTS WHO ARE OFFICIALLY ENROLLED IN THE COURSE/SUBJECT.
DNSC IC | Page 78
Module 04: Lesson 01
IT223 - ADVANCE DATABASE SYSTEM
FUNDAMENTALS OF DATA WAREHOUSE
E. Data Warehouse:
The data warehouse design drawn using the above star schema diagram is
shown below:
THIS MODULE IS A PILOT TEST ONLY; NOT FOR REPRODUCTION AND DISTRIBUTION OUTSIDE OF ITS INTENDED USE.
THIS IS INTENDED ONLY FOR THE USE OF THE STUDENTS WHO ARE OFFICIALLY ENROLLED IN THE COURSE/SUBJECT.
DNSC IC | Page 79
Module 04: Lesson 01
IT223 - ADVANCE DATABASE SYSTEM
FUNDAMENTALS OF DATA WAREHOUSE
F. Reports:
1. The total number of students in the different times of the day or the
semester. This information can be generated by tables STUDENT,
TIME and LAB.
2. The usages of the computers in the labs by the major and the student
category. This information can be extracted from the tables STURNT,
MAJOR and TIME.
3. The comparison of the majors with reference to the different
semesters. This information can be extracted from the tables MAJOR
and STUDENT.
Hence all the required information for the analysis and decision making can
be generated from the schema of the university lab management. That is the
schema is able to manage all database of the university lab management.
APPLICATION This Application will be done by group, based on the groupings set during the
approval of your Midterm Major Assessment Project Proposals. For complying this
requirement, you must perform the following assessment tasks:
Save this assessment task on a PDF file with the following format:
• File name - [Subject Title] [Course-year-set] [Last Names (Separated through
commas)] - Module [number] (Lesson [number]) ex. IT223 BSIT3Z Dela
Cruz, Doe, Rizal - Module 04 (Lesson 01).pdf
• Size - Letter (8.5 x 11 inches)
• Orientation - Portrait
• Margin - 1 Inch all sides
• Font - Style: Arial; Size: 10
• Spacing - Single
* Stipulate all of your respective full names, course, year, set, subject enrolled and the title of the
approved project in the first line/s (upper left corner) of the file. Make sure also to optimize your
image/s and crop only the essential parts for it to fit with the maximum file size required on the
LMS submission. Though this assessment was done through group but submission on LMS shall still
be in individual basis. Failure to do the following instructions will invalidate your submission to this
assessment.
> Congratulations! for completing the 1st lesson of the 4th module in Advance Database System
THIS MODULE IS A PILOT TEST ONLY; NOT FOR REPRODUCTION AND DISTRIBUTION OUTSIDE OF ITS INTENDED USE.
THIS IS INTENDED ONLY FOR THE USE OF THE STUDENTS WHO ARE OFFICIALLY ENROLLED IN THE COURSE/SUBJECT.
DNSC IC | Page 80
Source: Nowakowski, A. (2020). Man sitting in front of MacBook Pro [Photograph]. Unsplash. https://unsplash.com/photos/D4LDw5eXhgg
THIS MODULE IS A PILOT TEST ONLY; NOT FOR REPRODUCTION AND DISTRIBUTION OUTSIDE OF ITS INTENDED USE.
THIS IS INTENDED ONLY FOR THE USE OF THE STUDENTS WHO ARE OFFICIALLY ENROLLED IN THE COURSE/SUBJECT.
Module 04: Lesson 02
IT223 - ADVANCE DATABASE SYSTEM
BUSINESS INTELLIGENCE AND DATA MINING
• https://www.youtube.com/watch?v=AHR_7jFCMeY
ANALYSIS 1. In your own words, how is a database different from a data warehouse?
A. Decision support
B. Data Mining
C. OLAP
D. All of the above
3. Which of the following is an essential process in which the intelligent methods are
applied to extract data patterns?
A. Warehousing
B. Data Mining
C. Text Mining
D. Data Selection
A. Association analysis
B. Classification analysis
C. Regression analysis
D. Clustering analysis
E. All of the above
THIS MODULE IS A PILOT TEST ONLY; NOT FOR REPRODUCTION AND DISTRIBUTION OUTSIDE OF ITS INTENDED USE.
THIS IS INTENDED ONLY FOR THE USE OF THE STUDENTS WHO ARE OFFICIALLY ENROLLED IN THE COURSE/SUBJECT.
DNSC IC | Page 82
Module 04: Lesson 02
IT223 - ADVANCE DATABASE SYSTEM
BUSINESS INTELLIGENCE AND DATA MINING
BI tools perform data analysis and create reports, summaries, dashboards, maps,
graphs, and charts to provide users with detailed intelligence about the nature of the
business.
THIS MODULE IS A PILOT TEST ONLY; NOT FOR REPRODUCTION AND DISTRIBUTION OUTSIDE OF ITS INTENDED USE.
THIS IS INTENDED ONLY FOR THE USE OF THE STUDENTS WHO ARE OFFICIALLY ENROLLED IN THE COURSE/SUBJECT.
DNSC IC | Page 83
Module 04: Lesson 02
IT223 - ADVANCE DATABASE SYSTEM
BUSINESS INTELLIGENCE AND DATA MINING
Big data involves the process of storing, processing and visualizing data. It is essential
to find the right tools for creating the best environment to successfully obtain valuable
insights from your data.
Big data often provides companies with answers to the questions they did not know
they wanted to ask: How has the new HR software impacted employee performance?
How do recent customer reviews relate to sales? Analyzing big data sources illuminates
the relationships between all facets of your business.
Therefore, there is inherent usefulness to the information being collected in big data.
Businesses must set relevant objectives and parameters in place to glean valuable
insights from big data.
THIS MODULE IS A PILOT TEST ONLY; NOT FOR REPRODUCTION AND DISTRIBUTION OUTSIDE OF ITS INTENDED USE.
THIS IS INTENDED ONLY FOR THE USE OF THE STUDENTS WHO ARE OFFICIALLY ENROLLED IN THE COURSE/SUBJECT.
DNSC IC | Page 84
Module 04: Lesson 02
IT223 - ADVANCE DATABASE SYSTEM
BUSINESS INTELLIGENCE AND DATA MINING
• The New York Stock Exchange generates about one terabyte of new
trade data per day.
• Social Media
The statistic shows that 500+terabytes of new data get ingested into the
databases of social media site Facebook, every day. This data is mainly
generated in terms of photo and video uploads, message exchanges,
putting comments etc.
THIS MODULE IS A PILOT TEST ONLY; NOT FOR REPRODUCTION AND DISTRIBUTION OUTSIDE OF ITS INTENDED USE.
THIS IS INTENDED ONLY FOR THE USE OF THE STUDENTS WHO ARE OFFICIALLY ENROLLED IN THE COURSE/SUBJECT.
DNSC IC | Page 85
Module 04: Lesson 02
IT223 - ADVANCE DATABASE SYSTEM
BUSINESS INTELLIGENCE AND DATA MINING
In that same vein, data mining is most optimal for processing datasets concentrated
on a particular department, customer segment, or competitor(s). By analyzing these
smaller datasets, data mining can reveal hidden answers to specific business questions.
Unlike the specificity of data mining, business intelligence processes dimensional or
relational databases in order to deduce how an enterprise is performing on the whole.
Since data mining is more oriented towards getting data into a usable format and
resolving unique business problems, the results of data mining are unique datasets.
Conversely business intelligence results are presented in charts, graphs, dashboards,
and reports. Displaying BI results is vital to influence data-driven decisions.
Lastly, data mining and business intelligence differ in their focus. Studying patterns
through data mining helps companies develop new KPIs for business intelligence.
Business intelligence is therefore focused on showing progress towards data mining-
defined KPIs. Broad metrics like total revenue, total customer support tickets, and
ARR over time paint a holistic picture of company performance and give stakeholders
the confidence to make significant decisions.
APPLICATION Look for a partner within your section/set and answer the following questions:
Save this assessment task on a PDF file with the following format:
• File name - [Subject Title] [Course-year-set] [Last Name/s] - Module [number]
(Lesson [number]) ex. IT223 BSIT3Z Dela Cruz & Doe - Module 04 (Lesson
02).pdf
• Size - Letter (8.5 x 11 inches)
• Orientation - Portrait
• Margin - 1 Inch all sides
• Font - Style: Arial; Size: 10
• Spacing - Single
* Stipulate your full name/s (arranged in ascending order), course, year, set and subject enrolled in
the first line/s (upper left corner) of the file. Make sure also to optimize your image/s and crop only
the essential parts for it to fit with the maximum file size required on the LMS submission. Tough this
assessment was done through by-pair but submission on LMS shall still be in individual basis.
Failure to do the following instructions will invalidate your submission to this assessment.
> Congratulations! for completing the 2nd lesson of the 4th module in Advance Database System
THIS MODULE IS A PILOT TEST ONLY; NOT FOR REPRODUCTION AND DISTRIBUTION OUTSIDE OF ITS INTENDED USE.
THIS IS INTENDED ONLY FOR THE USE OF THE STUDENTS WHO ARE OFFICIALLY ENROLLED IN THE COURSE/SUBJECT.
DNSC IC | Page 86
MODULE
SUMMARY
• Transaction Management is a sequence of many actions that are being performed by a single
user or application program, which reads or updates the contents of the database.
• There are properties that all transactions should follow and possess. The four basic are in
combination termed as ACID properties.
• Concurrency Control in Database Management System is a procedure of managing simultaneous
operations without conflicting with each other. It ensures that Database transactions are
performed concurrently and accurately to produce correct results without violating data
integrity of the respective Database.
• Foreign key is a column (or combination of columns) in a table that points to the primary key of
another table.
• Foreign key ensures values in one table must be present in another table.
• SQL Query optimization is a process of writing thoughtful SQL queries to improve database
performance.
REFERENCES
KidOstrichPerson7 (n.d). DBMS Concurrency Control. Retrieved March 12, 2021, from https://www.
coursehero.com/file/64141989/DBMS-Concurrency-Controldocx/
Mode Analytics, Inc (n.d.). Performance Tuning SQL Queries. Retrieved March 30, 2021 from https://
mode.com/sql-tutorial/sql-performance-tuning/.
Singh, A. (2020). Top 10 SQL Query Optimization Tips to Improve Database Performance. Retrieved
March 30, 2021 from https://www.mantralabsglobal.com/blog/sql-query-optimization-
tips/#:~:text=SQL%20Query%20optimization%20is%20a,to%20the%20queries%20they%20write
THIS MODULE IS A PILOT TEST ONLY; NOT FOR REPRODUCTION AND DISTRIBUTION OUTSIDE OF ITS INTENDED USE.
THIS IS INTENDED ONLY FOR THE USE OF THE STUDENTS WHO ARE OFFICIALLY ENROLLED IN THE COURSE/SUBJECT.
DNSC IC | Page 87