4 - SQL For Data Science
4 - SQL For Data Science
4 - SQL For Data Science
IBM - DB0201EN
Introduction to Databases
SQL (Structured Query Language) is a language used for relational databases to query
or get data out of a database.
So SQL is a language used for a database to query data. But what is data and what is
a database?
Data is a collection of facts in the form of words, numbers, or even pictures.
Data is one of the most critical assets of any business. It is used and collected
practically everywhere.
Data is important; so, it needs to be secure, and it needs to be stored and accessed
quickly. The answer is a database.
A database is a repository of data. It is a program that stores data.
A database also provides the functionality for adding, modifying, and querying that
data.
When data is stored in tabular form, the data is organized in tables like in a
spreadsheet, which is columns and rows.
That's a relational database.
In a relational database, you can form relationships between tables. So a database is a
repository of data.
A set of software tools for the data in the database is called a database management
system or DBMS for short.
The terms database, database server, database system, data server, and database
management systems are often used interchangeably.
For relational databases, it's called a relational database management system or
RDBMS.
Examples of relational database management systems are:
my SQL;
Oracle Database,
IMB DB2 Warehouse;
IBM DB2 on Cloud.
For the majority of people using a database, there are five simple commands:
create a table;
insert data to populate the table;
select data from the table;
update data in the table;
delete data from the table.
So those are the building blocks for SQL for data science.
Página 1 de 10
SELECT Statement
So, after creating a relational database table and inserting data into the table, we want
to see the data.
To see the data, we use the SELECT statement.
The SELECT statement is a data manipulation language statement.
Data Manipulation Language statements or DML statements are used to read and
modify data.
The SELECT statement is called a query, and the output we get from executing this
query is called a result set or a result table.
In its simplest form, a SELECT statement is select star from table name:
In the book entity example, select star from book gives the result set of four rows.
All the data rows for all columns in the table book are displayed.
In addition, you can also retrieve all the rows for all columns by specifying the column
names individually in the SELECT statement.
You don't always have to retrieve all the columns in a table.
You can retrieve just a subset of columns.
Página 2 de 10
You don't always have to retrieve all the columns in a table. You can retrieve just a
subset of columns.
If you want, you can retrieve just two columns from the table book.
However, what if we want to know the title of the book whose book_id is B1.
Relational operation helps us in restricting the result set by allowing us to use the
clause WHERE.
The WHERE clause always requires a predicate. A predicate is conditioned evaluates
to true, false or unknown.
Predicates are used in the search condition of the WHERE clause. So, if we need to
know the title of the book whose book_id is B1, we use the WHERE clause with the
predicate book_id equals B1.
Página 3 de 10
The previous example used comparison operator equal to in the WHERE clause.
There are other comparison operators supported by a relational database management
system:
COUNT is a built-in database function that retrieves the number of rows that match the
query criteria.
For example, get the total number of rows in a given table:
Let's say you create a table called MEDALS which has a column called COUNTRY,
and you want to retrieve the number of rows where the medal recipient is from Canada.
You can issue a query like this:
Página 4 de 10
DISTINCT is used to remove duplicate values from a result set.
Example, to retrieve unique values in a column:
In the MEDALS table mentioned earlier, a country may have received a gold medal
multiple times.
Example, retrieve the list of unique countries that received gold medals.That is,
removing all duplicate values of the same country.
The third expression is LIMIT, LIMIT is used for restricting the number of rows retrieved
from the database.
Example, retrieve just the first 10 rows in a table.
Example, retrieve just a few rows in the MEDALS table for a particular year.
Retrieve the first 15 rows from the "FilmLocations" table starting from row 11.
Página 5 de 10
INSERT Statement
After table is created, the table needs to be populated with data. To insert data into a
table, we use the INSERT statement.
The INSERT statement is used to add new rows to a table.
The INSERT statement is one of the data manipulation language statements.
Data manipulation language statements or DML statements are used to read and
modify data.
The syntax of the INSERT statement looks like this:
In this statement, table name identifies the table, the column name list identifies each
column in the table, and the values clause specifies the data values to be added to the
columns in the table.
Tables do not need to be populated one row at a time. Multiple rows can be inserted by
specifying each row in the values clause.
In the values clause, each row is separated by a comma.
Página 6 de 10
UPDATE and DELETE Statements
The UPDATE statement is one of the data manipulation language or DML statements.
DML statements are used to read and modify data.
The syntax of the UPDATE statement looks like this:
In this example, you want to update the first name and last name of the author with
Author_Id A2 from Rav Ahuja to Lakshmi Katta.
Sometime later, there might be a need to remove one or more rows from a table. The
rows are removed with the DELETE statement.
The DELETE statement is one of the data manipulation language statements used to
read and modify data.
The syntax of the DELETE statement looks like this:
Página 7 de 10
Based on the author entity example, we want to delete the rows for author ID A2 and
A3.
Let's look at an example:
The relational model is the most used data model for databases because this model
allows for data independence.
Data is stored in a simple data structure, tables: this provides logical data
independence, physical data independence, and physical storage independence.
An entity relationship data model, or ER data model, is an alternative to a relational
data model.
Using a simplified library database as an example, this figure shows an entity
relationship diagram or ERD that represents entities called tables and their
relationships.
Página 8 de 10
The ER model is used as a tool to design relational databases.
In the ER model, entities are objects that exist independently of any other entities in the
database. The building blocks of an ER diagram are entities and attributes.
An entity can be a noun: person, place, or thing.
In an ER diagram, an entity is drawn as a rectangle. Entities have attributes which are
the data elements that characterize the entity.
Attributes tell us more about the entity. In an ER diagram, attributes are drawn as
ovals.
Each table is assigned a primary key. The primary key of a relational table uniquely
identifies each tuple or row in a table, preventing duplication of data and providing a
way of defining relationships between tables.
Tables can also contain foreign keys which are primary keys defined in other tables,
creating a link between the tables.
Common data types include characters such as Char and Varchar, numbers such as
integer and decimal, and timestamps including date and time.
Página 9 de 10
How to Create a Database Instance on Cloud
SQL Statements are used for interacting with Entities (that is, tables), Attributes (that is,
columns) and their tuples (or rows with data values) in relational databases. SQL
statements fall into two different categories:
Data Definition Language statements
Data Manipulation Language statements.
Data Definition Language (or DDL) statements are used to define, change, or drop
database objects such as tables.
Common DDL statement types include CREATE, ALTER, TRUNCATE, and DROP.
CREATE: which is used for creating tables and defining its columns;
ALTER: is used for altering tables including adding and dropping columns and
modifying their datatypes;
TRUNCATE: is used for deleting data in a table but not the table itself;
DROP: is used for deleting tables.
Data Manipulation Language (or DML) statements are used to read and modify data in
tables.
These are also sometimes referred to as CRUD operations, that is, Create, Read,
Update and Delete rows in a table.
Common DML statement types include INSERT, SELECT, UPDATE, and DELETE.
INSERT: is used for inserting a row or several rows of data into a table;
SELECT: reads or selects row or rows from a table;
UPDATE: edits row or rows in a table;
And DELETE: removes a row or rows of data from a table.
Página 10 de 10