Advanced Data Selection
Advanced Data Selection
Advanced Data Selection
SQL function :
COUNT(*)
COUNT(column name)
MIN(column name)
MAX (column name)
SUM(column name)
AVG(column name)
COUNT()
The COUNT(*) and COUNT(column name)
function provides a row count for a table.
SELECT statements using any of these two
aggregate functions can also use two
optional clauses, GROUP BY and HAVING.
SELECT COUNT(*) column list FROM
table name WHERE condition [GROUP BY
column name [HAVING aggregate
condition]]
GROUP BY
Is an additional condition that can be
applied to SELECT statements.
It is normally useful only when an
aggregate function is being used.
The optional HAVING clause allows us
to pick out particular rows where the
COUNT(*) meets some condition
The COUNT(*) function is more efficient
than retrieving the entire data set , for two
reasons:
No data that we don’t need to see has to be
retrieved from the database or sent across a
network
COUNT(*) allows the database to use its
internal knowledge of the table to retrieve the
answer without inspecting any real data rows at
all
Never retrieve data when all you need is a
count of number of rows
Basic use of COUNT(*)
How many customers we have in our
customer table who live in the town of
Bingham.
SELECT COUNT(*) FROM customer
WHERE town=‘Bingham’;
SELECT COUNT(*) FROM customer;
SELECT COUNT(*) AS ‘Total
Customers’ FROM customer;
GROUP BY and COUNT(*)
How many customers live in each town
SELECT COUNT(*), town FROM
customer GROUP BY town;
MySQL orders the result by the column
name list in the GROUP BY clause.
It then keeps a running total of rows,
and each time the town name changes,
it writes a result row, and reset its
counter to zero.
How many customers are in each town
and how many different last names they
have
SELECT COUNT(*), lname, town FROM
customer GROUP BY town, lname;
HAVING and COUNT(*)
HAVING is the equivalent of the
WHERE clause, but only valid for
aggregate functions
We use HAVING to restrict the results
returned to rows where a particular
aggregate condition is true such as
COUNT(*) is > 1.
All towns where we have more than one
customer
SELECT COUNT(*), town FROM
customer GROUP BY town
HAVING COUNT(*) > 1;
COUNT(*), GROUP BY, HAVING
We want to know the last names and
towns of all our customers, except that
we want to exclude Lincoln, and we are
only interested in last names that occur
more than once in any given town
SELECT lname, town FROM customer
WHERE town <> ‘Lincoln’;
SELECT COUNT(*), lname, town FROM
customer WHERE town <> ‘Lincoln’
GROUP BY lname,town;
SELECT COUNT(*), lname, town FROM
customer WHERE town <> ‘Lincoln’
GROUP BY lname,town HAVING
COUNT(*) > 1;
We solved our problem in three stages:
We wrote a simple SELECT statement to
retrieve all the rows we were interested in.
We added a COUNT(*) and a GROUP BY to
count the different lname and town
combinations
Finally we added a HAVING clause to
extract only those rows where the COUNT(*)
was greater than one.
COUNT (column name)
It only counts occurrences in the table
where the value in the specified column
is not NULL.
SELECT customer_id FROM customer
WHERE phone IS NULL;
SELECT COUNT(*) FROM customer;
SELECT COUNT(phone) FROM
customer;
MIN() function
MIN() takes column name parameter
and returns the minimum value found in
that column.
For numeric type columns the result
would be as expected.
For temporal types such as date, it
returns the earliest data from the
column, which might be in the past or
the future.
Ensure that all dates use 4 digit years.
SELECT MIN(shipping) FROM
orderinfo;
SELECT MIN(phone) FROM customer;
Given that NULL generally means
unknown, the function can not know
whether the NULL value is the smallest,
so the MIN() function ignores NULL
values.
Ignoring NULL values is a feature of all
aggregate functions.
MAX() function
Takes a column name parameter and
returns the maximum value found in that
column.
SELECT MAX(shipping) FROM
orderinfo;
SELECT MAX(phone) FROM customer;
SELECT MAX(DATE_ADD(date_placed,
INTERVAL 0 DAY)) FROM orderinfo;
SUM ()function
Takes the name of a numeric column
and provides the total
NULL values are ignored.
SELECT SUM(shipping) FROM
orderinfo;
AVG() function
Takes a column name and returns the
average of the entries
Ignores NULL values
SELECT AVG(shipping) FROM
orderinfo;
The Union Join
Suppose that in the time period between
loading our tcust table with new customer
data and being able to clean it and load it
into our main customer table, we had been
asked for a list of all towns where we had
customers with the new data
We could solve this problem by selecting
the town column from the customer table,
saving it, and then selecting the town from
the tcust table and then saving it again and
then combining the two lists.
SELECT town FROM tcust;
SELECT town from customer;
SELECT town FROM tcust UNION
SELECT town FROM customer;
SELECT town FROM tcust UNION ALL
SELECT town FROM customer;
mySQL has taken the list of towns from
both tables and combined them into a
single list.
It also have removed all duplicates.
If you want to list all the towns including
duplicates, we could have writing
UNION ALL.
SELECT town, zipcode FROM tcust
UNION SELECT town, zipcode FROM
customer;
Self Join
Is used where we want to use a join
between columns that are in the same
table.
Suppose we sell a set of chairs and
table item, and also sell the table and
chairs as individual items. What we
would like to do is store not the
individual items, but also the relationship
between them when they are sold as a
single item (‘parts explosion’)
Create table called part that can hold
only an item_id and its description, but
also a second item_id
CREATE table part (part_id INT,
Description VARCHAR(32),
Parent_part_id INT);
The parent_part_id is used to store the
component id, of which this is a
component.
Suppose we had a table and chairs set,
item_id 1,which was composed of chairs
item_id2, and a table item_id3.
INSERT INTO part(part_id,description,
parent_part_id)
VALUES (1,’table and chairs’, NULL)
INSERT INTO part(part_id,description,
parent_part_id)
VALUES (2, ‘chairs’, 1);
INSERT INTO part(part_id,description,
parent_part_id)
VALUES (1,’table’, 1);
SELECT p1.description, p2.description
FROM part p1,p2
WHERE p1.part_id=p2=parent_part_id;