Introduction To Relational Databases (RDBMS)
Introduction To Relational Databases (RDBMS)
Introduction To Relational Databases (RDBMS)
4. Introduction to Relational
Databases (RDBMS)
Semana 1
Review of data fundamentals
Information and Data Models
Entities
Information model
Data model
ERDs and types of relationship
Mapping entities to tables
Data types
Relational model concepts
Summary and highlights
Database architecture
2-Tier database architecture
3-Tier database architecture
Distributed architecture and clustered databases
Share disk architectures
Replication
Partitioning and sharding
Database usage patterns
Data engineers and database administrators
Data scientists and business analysts
Application developers and programmers
Introduction to Relational Database Offerings
IBM Db2
Features:
Products
Cloud park for data
MySQL
Storage engines
Clustering options
PostgreSQL
Summary and highlights
Semana 2
structured
RDBMS
semi-structured
unstructured
NoSQL
ibm db2
mysql
Entities
can be from the real world
Information model
conceptual
relationships
Data model
blueprint of a database system
relational model
data independence
entity → rectangle
attribute → oval
Attributes = columns
Data types
character string
integer
int
smallint
bigint
decimal
relation = table
relation schema
relation instance
Entities are objects that exist independently of any other entities in the database,
while attributes are the data elements that characterize the entity.
The building blocks of a relationship are entities, relationship sets, and crows
foot notations.
Data types define the type of data that can be stored in a column and can
include character strings, numeric values, dates/times, Boolean values and
more.
The advantages of using the correct data type for a column are data integrity,
data sorting, range selection, data calculations, and the of standard functions.
Database architecture
Local / Desktop
single-tier architecture
Client / Server
Cloud
The Data Access layer server includes interfaces for different types of clients which
can include data industry standard APIs such as JDBC and ODBC, Command Line
Processor (CLP) interfaces as well vendor specific or proprietary interfaces.
replication
partitioning
high availability
partitioning
sharding
command line
they talk to de DB
ORM frameworks
example: django
Commercial databases:
oracle
ibm db2
mysql
postgresql
sqlite
IBM Db2
db2 database
db2 warehouse
db2 on cloud
Features:
AI-powered funcionallity
column store
data skipping
Common SQL-Engine
Scalability
Products
Db2 database:
on premises
provides
high availability
scalabilitiy
resilience
Db2 warehouse:
on premises
provides
machine learning
Db2 on cloud:
fully managed
cloud-based
provides:
performance
high availability
scalability
resilience
fully managed
cloud-based
provides:
MPP
machine learning
mpp
advanced querying
works with:
RDBMS
NoSQL
platforms:
memory-optimized
avaialbility
scalability
security
mission critical
MySQL
MySQL is an object-relational database management system.
You can run MySQL on many versions of UNIX, as well as Microsoft Windows
and Linux and you can write client applications for it using most modern
programming languages.
MySQL uses standard SQL syntax, as well as its own extensions for additional
functionality such as the LOAD DATA statement that very quickly reads rows
from a text file into a database table
replication
Storage engines
InnoDB
transactions
row-level locking
clustered indexes
foreign keys
MyISAM
table-level locking
NDB
cluster
high availability
high redundancy
Clustering options
InnoDB storage engine
extensions PostGIS
patitioning
2-tier. The database resides on a remote server and users access it from client
systems.
3-tier. The database resides on a remote server and users access it through an
application server or a middle tier.
Cloud deployments. The database resides in the cloud, and users access it
through an application server layer or another interface that also resides in the
cloud.
Partitioning. Very large tables are split across multiple logical partitions.
There are different classes of database users, who use databases in different ways:
Three main classes of users are Data Engineers, Data Scientists and Business
Analysts, and Application Developers.
Semana 2
Types of SQL statements (DDL vs DML)
DDL (data definition language) statements
define, change or drop data
create
select
Creating tables
visual or UI tools
mysql phpMyAdmin
postgreSQL PGAdmin
administrative APIs
preserve all objects in a data base: schemas, tables, user defined types,
security, data
files:
JSON
load
Loading data
INSERT is not practical for loading thousands of records:
you can load data from different sources and in different formats
s3 object storage
1. source → authenticate
2. target
4. finalize
LAB
Other Contributor(s)
Estimated time needed: 30 minutes In this lab, you will learn
how to create tables and load data in Db2. In this lab, you will
use IBM Db2 Database. Db2 is a Relational Database
https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cl
oud/IBM-DB0110EN-SkillsNetwork/labs/Lab%20-%20Create%2
0Tables%20and%20Load%20Data%20in%20Db2/instructional-l
abs.md.html?origin=www.coursera.org
Summary and highlights
DDL statements, including CREATE, ALTER, TRUNCATE, and DROP, are used
for defining objects like tables in a database.
Most RDBMS provide a GUI through which you can create and alter the
structure of tables.
You can also create and alter tables by using DDL SQL statements:
CREATE TABLE. Creates entities (tables) in a relational database and sets the
attributes (columns) in a table, including the names of columns, the data types of
columns, and constraints (for example, the Primary Key.)
There are utilities that help you to manage the movement of data:
You use the BACKUP and RESTORE utilities to create and recover copies of
entire databases, including all objects like tables, views, constraints, and data.
You use the IMPORT utility to insert data into a specific table from different
formats, such as DEL/CSV, ASC and IXF, and the EXPORT utility to save data
from a specific table into various formats, such as CSV.
You can use the LOAD utilities, instead of INSERT statements, to quickly insert
large amounts of data a variety of different data sources into tables.
The Load Data utility is a simple to use interface in the Db2 Web Console
2. database
3. schema
naming context
1. tables
2. constraints
3. indexes
4. views
A foreign key is a column that contains the same information as the primary key on
another table
Note: ON UPDATE NO ACTION means if any existing row is updated in the foreign key colu
mn of the referencing table (the table containing the foreign key), the update will on
ly be allowed if the new value of the foreign key column exists in the referenced prim
ary key column of the referenced table (the table containing the primary key). Howeve
r, any update on a row of the referenced primary key column of the referenced table is
always rejected if there is the existence of a corresponding row in the referencing fo
reign key column of the referencing table.
Indexes
when you create a primary key, an index is created based on that key, but you can
also create your own indexes
Indexes:
PROS:
improved performance of SELECT queries
reduce need to sort data
Normalization
data duplication leads to inconsistencies
most used
example:
In this example, the Book table contains some basic information about books,
including title,
To normalize this table, you can add an extra row, and split the two formats of
Patterns
So now you have a row for the paperback version, and a row for the hardback
version.
Each cell in the table now has only one entry, and .so the table is in first normal
form.
BEFORE:
AFTER:
create separate tables for sets of values that apply to multiple rows
example
BEFORE:
AFTER:
example
BEFORE:
the Ships from data does not depend on the Primary Key.
To meet the requirements of 3NF, you must separate the Publisher and Ships from
information
AFTER:
constraints
entity integrity
primary key
referential integrity
semantic integrity
null
check
LAB
Exercise 3: Constraints
Estimated time needed: 25 minutes In this lab, you will learn
about normalization, keys, and constraints in IBM Db2 on Cloud
using SQL. First, you will learn how to minimize data
https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cl
oud/IBM-DB0110EN-SkillsNetwork/labs/Lab%20-%20Normalizat
ion%20-%20Keys%20-%20Constraints%20in%20Relational%20
Database/instructional-labs.md.html?origin=www.coursera.org
CREATE TABLE BookShop(
BOOK_ID VARCHAR(4) NOT NULL,
PRICE_USD DECIMAL(6,2) CHECK(Priice_USD>0) NOT NULL
);
Database partitions. You can split very large tables across multiple partitions to
improve performance.
Database objects. Database objects are the items that exist within the
database, such as tables, constraints, indexes, views, and aliases.
Primary keys enforce uniqueness of rows in a table, whereas Foreign keys are
columns in a table that contain the same information as the primary key in
another table.
You can use primary and foreign keys to create relationships between tables.
Relationships between tables reduce redundant data and improve data integrity.
Indexes provide ordered pointers to rows in tables and can improve the
performance of SELECT queries, but can decrease the performance of INSERT,
UPDATE, and DELETE queries.
Normalization reduces redundancy and increases consistency of data. There are two
forms of normalization:
First normal form (1NF). In this form, the table contains only single values and
has no repeating groups.
Second normal form (2NF). This form splits data into multiple tables to reduce
redundancy.
Entity integrity constraint. Ensures that the primary key is a unique value that
identifies each tuple (or row.)
Semana 3
Getting started with MySQL
opensource RDBMS
mysqladmin
USE employees;
load data infile 'employeesdata.csv' into table employees_details #importing data usin
g load data infile
#access that will ask for password = MTIxMzMtbWFqb2Nh -> this password was generated w
hen i run the start_mysql command and it was shown on the terminal
mysql --host=127.0.0.1 --port=3306 --user=root --password
LAB on phpMyAdmin
#that link should be pasted into the address bar in another tab
not null
unique
indexed
null contraints
mysql and mysqladmin command line interfaces. You use these CLIs to run SQL
statements.
phpMyAdmin. An easy to use, third-party web interface for working with MySQL
databases.
API calls.
Use import and export functionality to populate tables and save their data to
files.
When creating foreign keys, you can define ON DELETE and ON UPDATE actions.
MySQL columns are NOT NULL by default.
includes
OLTP
data analytics
TOOLS
pgAdmin
Navicat, DBeaver
\connect employees;
pg_dump employees > employeesbackup.sql #create backup of entire schema and data
wget https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBM-DB0110EN-S
killsNetwork/datasets/sakila/sakila_pgsql_dump.sql
\connect sakila;
\include sakila_pgsql_dump.sql;
#to restore the database dump file OUTSIDE of the psql commmand prompt:
wget https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBM-DB0110EN
-SkillsNetwork/datasets/sakila/sakila_pgsql_dump.tar
SELECT * FROM store; #retrieve all the records from the store table
\q #quit postgres
#To only dump/backup the table store from the database in non-text format .tar,
pg_dump --username=postgres --host=localhost --password --dbname=sakila --table=store
--format=tar > sakila_store_pgsql_dump.tar
Views
A view is an alternative way of representing data from one or more tables or
other views.
You can interact with views in the same way as you interact with tables,
inserting, updating, and deleting data as required.
Views are a useful way of limiting access to sensitive data, simplifying data
retrieval, and reducing access to underlying tables.
For example, you could create a view to include the name and email columns
from these two tables.
materialized views
FROM employee_details
ON employee_details.empid = employee_contact_info.empid
PostgreSQL includes several options for creating databases and tables, loading and
querying data, and importing and exporting data relational databases:
The psql command line interface. You use this CLI to run SQL statements.
Navicat and Dbeaver. Commercial graphical interface options that you can use
to access PostgresSQL, MySQL, and other types of databases.
Use the Import/Export tool to load data into and export data from tables.
Using views:
You can use views to limit access to sensitive data and simplify data retrieval.
Views can be materialized, which means that the view store the result set for
quicker subsequent access.
Materialized views enhance performance because the view is saved and often
stored in memory. However, you cannot insert, update, or delete rows in a
materialized view, and they must be refreshed before you can see updated data.
Semana 4
Approach to Database Design (Including ERD)
Database design process
Requirements analysis
attributes
Logical design
normalization
Physical design
Project
Scenario
In this scenario, you have recently been hired as a Data Engineer by a New York
based coffee shop chain that is looking to expand nationally by opening a number of
franchise locations. As part of their expansion process, they want to streamline
operations and revamp their data infrastructure.
Your job is to design their relational database systems for improved operational
efficiencies and to make it easier for their executives to make data driven decisions.
In your scenario, you will be working with data from the following sources:
Sales data output as a CSV file from the POS system in the sales outlets
staff
sales_outlet
sales_transaction
Attributes:
transaction_id
transaction_date
transaction_time
staff_id
customer_id
product_id
quantity
price
customer
product
Review the data in the sales transaction table. Note that the transaction id column
does not contain unique values because some transactions include multiple
products.
→ to put the transaction table in second normal form, i create sales_detail table to
put productid and price
Review the data in the product table. Note that the product category and product
type columns contain redundant data.
-- This script was generated by a beta version of the ERD tool in pgAdmin 4.
-- Please log an issue at https://redmine.postgresql.org/projects/pgadmin4/issues/new
if you find any bugs, including reproduction steps.
BEGIN;
END;
Solution:
In PostgreSQL, it’s vitally important that a foreign key references columns that either
are a primary key or form a unique constraint. If your query references a column that
does not have the UNIQUE constraint, you’ll get the “There is no unique constraint
matching given keys for referenced table” error.
Upload and run this SQL code, it creates all the other tables:
-- This script was generated by a beta version of the ERD tool in pgAdmin 4.
-- Please log an issue at https://redmine.postgresql.org/projects/pgadmin4/issues/new
if you find any bugs, including reproduction steps.
BEGIN;
-- The DROP statements have been added to ensure that no errors occur if the script is
mistakenly run more than once
DROP TABLE IF EXISTS public.staff CASCADE;
DROP TABLE IF EXISTS public.sales_outlet CASCADE;
DROP TABLE IF EXISTS public.customer CASCADE;
DROP TABLE IF EXISTS public.sales_detail CASCADE;
DROP TABLE IF EXISTS public.product CASCADE;
DROP TABLE IF EXISTS public.product_type CASCADE;
DROP TABLE IF EXISTS public.sales_transaction CASCADE;
--
END;
The external payroll company have requested a list of employees and the locations
at which they work. This should not include the CEO or CFO who own the company.
In this task, you will create a view in your PostgreSQL database that returns this
information and export the results to a CSV file.
SELECT staff.staff_id,
staff.first_name,
staff.last_name,
staff.location
2. create columns
3. import staff_locations.csv
When I imported the csv, column names were taken as first row so I run this to
delete only that row DELETE FROM product-info LIMIT 1 , case I didint find an
option to take first row as column name.
Final quiz
1. Question 1: Data types define the type of data that can be stored in which part of
a database table?
Columns
5. Question 5: How can the Entity Relationship Diagram (ERD) model be used?
The ERD model helps you to define entities and their attributes, and map them
to tables, and identify the relationships between the tables
6. Question 6: Which of the following sources is NOT supported by the Db2 Web
Console for loading data?
Python Code
7. Question 7: When creating a foreign key, how can you define an action to take if
a parent table row is updated?
NOT create table command and NOT alter table command
10. Question 10: What is the difference between system schemas and user
schemas?
System schemas store configuration information and metadata.
11. Question 11: Which MySQL tool can you use to visually design a MySQL
database?
MySQL workbench
12. Question 12: When creating a MySQL database using phpMyAdmin, at which
point in the process do you define the length of the data in a column?
When you define the columns in the table
13. Question 13: What is the maximum size of a data file you can import with
phpMyAdmin?
14. Question 14: On which operating systems can you install PostgreSQL?
macOs, windows, linux
15. Question 15: Why would you refresh a materialized view in a PostgreSQL
database before you use it?
The materialized view stores the data, so if you need to use the most current
data you should refresh the view first.