Advanced Data Selection

Download as pdf or txt
Download as pdf or txt
You are on page 1of 36

Aggregate functions

 SQL function :
 COUNT(*)
 COUNT(column name)
 MIN(column name)
 MAX (column name)
 SUM(column name)
 AVG(column name)
COUNT()
 The COUNT(*) and COUNT(column name)
function provides a row count for a table.
 SELECT statements using any of these two
aggregate functions can also use two
optional clauses, GROUP BY and HAVING.
 SELECT COUNT(*) column list FROM
table name WHERE condition [GROUP BY
column name [HAVING aggregate
condition]]
GROUP BY
 Is an additional condition that can be
applied to SELECT statements.
 It is normally useful only when an
aggregate function is being used.
 The optional HAVING clause allows us
to pick out particular rows where the
COUNT(*) meets some condition
 The COUNT(*) function is more efficient
than retrieving the entire data set , for two
reasons:
 No data that we don’t need to see has to be
retrieved from the database or sent across a
network
 COUNT(*) allows the database to use its
internal knowledge of the table to retrieve the
answer without inspecting any real data rows at
all
 Never retrieve data when all you need is a
count of number of rows
Basic use of COUNT(*)
 How many customers we have in our
customer table who live in the town of
Bingham.
 SELECT COUNT(*) FROM customer
WHERE town=‘Bingham’;
 SELECT COUNT(*) FROM customer;
 SELECT COUNT(*) AS ‘Total
Customers’ FROM customer;
GROUP BY and COUNT(*)
 How many customers live in each town
 SELECT COUNT(*), town FROM
customer GROUP BY town;
 MySQL orders the result by the column
name list in the GROUP BY clause.
 It then keeps a running total of rows,
and each time the town name changes,
it writes a result row, and reset its
counter to zero.
 How many customers are in each town
and how many different last names they
have
 SELECT COUNT(*), lname, town FROM
customer GROUP BY town, lname;
HAVING and COUNT(*)
 HAVING is the equivalent of the
WHERE clause, but only valid for
aggregate functions
 We use HAVING to restrict the results
returned to rows where a particular
aggregate condition is true such as
COUNT(*) is > 1.
 All towns where we have more than one
customer
 SELECT COUNT(*), town FROM
customer GROUP BY town
HAVING COUNT(*) > 1;
COUNT(*), GROUP BY, HAVING
 We want to know the last names and
towns of all our customers, except that
we want to exclude Lincoln, and we are
only interested in last names that occur
more than once in any given town
 SELECT lname, town FROM customer
WHERE town <> ‘Lincoln’;
 SELECT COUNT(*), lname, town FROM
customer WHERE town <> ‘Lincoln’
GROUP BY lname,town;
 SELECT COUNT(*), lname, town FROM
customer WHERE town <> ‘Lincoln’
GROUP BY lname,town HAVING
COUNT(*) > 1;
 We solved our problem in three stages:
 We wrote a simple SELECT statement to
retrieve all the rows we were interested in.
 We added a COUNT(*) and a GROUP BY to
count the different lname and town
combinations
 Finally we added a HAVING clause to
extract only those rows where the COUNT(*)
was greater than one.
COUNT (column name)
 It only counts occurrences in the table
where the value in the specified column
is not NULL.
 SELECT customer_id FROM customer
WHERE phone IS NULL;
 SELECT COUNT(*) FROM customer;
 SELECT COUNT(phone) FROM
customer;
MIN() function
 MIN() takes column name parameter
and returns the minimum value found in
that column.
 For numeric type columns the result
would be as expected.
 For temporal types such as date, it
returns the earliest data from the
column, which might be in the past or
the future.
 Ensure that all dates use 4 digit years.
 SELECT MIN(shipping) FROM
orderinfo;
 SELECT MIN(phone) FROM customer;
 Given that NULL generally means
unknown, the function can not know
whether the NULL value is the smallest,
so the MIN() function ignores NULL
values.
 Ignoring NULL values is a feature of all
aggregate functions.
MAX() function
 Takes a column name parameter and
returns the maximum value found in that
column.
 SELECT MAX(shipping) FROM
orderinfo;
 SELECT MAX(phone) FROM customer;
 SELECT MAX(DATE_ADD(date_placed,
INTERVAL 0 DAY)) FROM orderinfo;
SUM ()function
 Takes the name of a numeric column
and provides the total
 NULL values are ignored.
 SELECT SUM(shipping) FROM
orderinfo;
AVG() function
 Takes a column name and returns the
average of the entries
 Ignores NULL values
 SELECT AVG(shipping) FROM
orderinfo;
The Union Join
 Suppose that in the time period between
loading our tcust table with new customer
data and being able to clean it and load it
into our main customer table, we had been
asked for a list of all towns where we had
customers with the new data
 We could solve this problem by selecting
the town column from the customer table,
saving it, and then selecting the town from
the tcust table and then saving it again and
then combining the two lists.
 SELECT town FROM tcust;
 SELECT town from customer;
 SELECT town FROM tcust UNION
SELECT town FROM customer;
 SELECT town FROM tcust UNION ALL
SELECT town FROM customer;
 mySQL has taken the list of towns from
both tables and combined them into a
single list.
 It also have removed all duplicates.
 If you want to list all the towns including
duplicates, we could have writing
UNION ALL.
 SELECT town, zipcode FROM tcust
UNION SELECT town, zipcode FROM
customer;
Self Join
 Is used where we want to use a join
between columns that are in the same
table.
 Suppose we sell a set of chairs and
table item, and also sell the table and
chairs as individual items. What we
would like to do is store not the
individual items, but also the relationship
between them when they are sold as a
single item (‘parts explosion’)
 Create table called part that can hold
only an item_id and its description, but
also a second item_id
 CREATE table part (part_id INT,
Description VARCHAR(32),
Parent_part_id INT);
 The parent_part_id is used to store the
component id, of which this is a
component.
 Suppose we had a table and chairs set,
item_id 1,which was composed of chairs
item_id2, and a table item_id3.
 INSERT INTO part(part_id,description,
parent_part_id)
VALUES (1,’table and chairs’, NULL)
 INSERT INTO part(part_id,description,
parent_part_id)
VALUES (2, ‘chairs’, 1);
 INSERT INTO part(part_id,description,
parent_part_id)
VALUES (1,’table’, 1);
 SELECT p1.description, p2.description
FROM part p1,p2
WHERE p1.part_id=p2=parent_part_id;

 SELECT p1.description AS “Combined,


p2.description AS “Parts”
FROM part p1, part p2
WHERE p1.part_id=p2.parent_id
The Outer Join
 All the items that we might sell are held
in the item table, but only the items we
actually stock are held in the stock table.
 Suppose we want to have a list of all
items, indicating the quantity we have in
stock.
 SELECT i.item_id, s.quantity FROM
item I, stock s
WHERE i.item_id = s.item_id;
 There are some item_ids that are
missing.
 The rows that are missing are those
relating to items that we do not stock,
hence the join between the item and
stock table fails for these rows, as the
stock table has no entry for that item.
 What we really wanted was a list of all
the item_ids, including a quantity in
stock figure, even if we had no stock.
 the SQL92 syntax for outer joins
replaces the WHERE clause with by an
ON clause for joining tables, and adds
the LEFT OUTER JOIN keywords.
 SELECT columns FROM table1 LEFT
OUTER JOIN tables2 ON
table1.column=table2.column
 The table name to the left of LEFT
OUTER JOIN is always the preserved
table, the one from which all rows are
shown.
 SELECT i.item_id,s.quantity FROM item
i LEFT OUTER JOIN stock s
ON i.item_id=s.item_id;
 Suppose that we are interested in the
details of all items with a cost price greater
than five, and we also want to know if we
have more than two of the items in stock.
 We want to apply one rule to the item
table(cost price> 5) and a different rule to
the stock table(quantity>5), but we still
want to list all rows from the item table
where the condition on the item table is
true, even if there is no stock at all.
 What we do is combine ON conditions
that work on left outer joined tables only
with WHERE conditions that limit all the
rows returned after the table join has
been performed.
 The condition on the stock table is part
of the outer join. We don’t want to
restrict rows where there is no quantity
 ON i.item_id=s.item_id AND s.quantity>2
 For the item condition, which applies to
all rows, we use a WHERE clause.
 WHERE i.cost_price > 5.0;
 SELECT i.item_id,i.cost_price,s.quantity
FROM item I LEFT OUTER JOIN
Stock s ON i.item_id=s.item_id AND s.quantity
>5
WHERE i.cost_price > 5.0;
 We use a LEFT OUTER JOIN between the
item table and the stock table, so that we
can select data from the item table, even if
no related rows exist in the stock table.
 The condition stock.quantity>2 oprevents
any values appearing in the quantity
column unless there are at least 2 items in
stock.
 The WHERE clause is then applied, which
only allows through rows where the cost
price (from the item table) is greater than
five.

You might also like