A SQL Puzzle

A SQL puzzle in the December 1993 issue of Database Programming & Design magazine
was to use a supplier and parts table to find pairs of suppliers who provide exactly the
same parts. This is the same thing as finding two equal sets. How many ways can you
find to do this problem?
Related Content
This is an excerpt from Joe Celko's SQL Puzzles and Answers, Second Edition by Joe
Celko. Printed with permission from Morgan Kaufmann, a division of Elsevier.
Copyright 2007. For more information about this book, and other similar titles,
Set theory has two symbols for subsets. One is a "horseshoe" on its side,
which means that set A is contained within...
set B and is sometimes called a proper subset. The other is the same
symbol with a horizontal bar under it, which means "contained in or
equal to," which is sometimes called just a subset or containment
operator.
Standard SQL has never had an operator to compare tables against each
other. Several college textbooks on relational databases mention a
CONTAINS predicate that does not exist in standard SQL. Two such
offenders are An Introduction to Data Base Systems by Bipin C. Desai
(West Publishing, 1990, ISBN 0-314-66771-7) and Fundamentals of
Database Systems by Elmasri and Navthe (Benjamin Cummings, 1989,
ISBN 0-8053-0145-3). This predicate used to exist in the original System
R, IBM's first experimental SQL system, but it was dropped from later
SQL implementations because of the expense of running it.
The IN() predicate is a test for membership, not for subsets. For those of
you who remember your high school set theory, membership is shown
with a stylized epsilon with the containing set of the right side, thus .
Membership is for one element, while a subset is itself a set, not just an
element.
Chris Date's puzzle in the December 1993 issue of Database
Programming & Design magazine ("A Matter of Integrity, Part II"
According to Date, December 1993) was to use a supplier and parts table
to find pairs of suppliers who provide exactly the same parts. This is the
same thing as finding two equal sets. Given his famous table:
CREATE TABLE SupParts
(sno CHAR(2) NOT NULL,
pno CHAR(2) NOT NULL,
PRIMARY KEY (sno, pno));
How many ways can you find to do this problem?

Answer #1
One approach would be to do a FULL OUTER JOIN on each pair of
suppliers. Any parts that are not common to both would show up, but
would have generated NULLs in one of the columns derived from the
supplier who was not in the INNER JOIN portion. This tells you which pairs
are not matched, not who is. The final step is to remove these
nonmatching pairs from all possible pairs.
SELECT SP1.sno, SP2.sno
FROM SupParts AS SP1
INNER JOIN
SupParts AS SP2
ON SP1.pno = SP2.pno
AND SP1.sno < SP2.sno
EXCEPT
SELECT DISTINCT SP1.sno, SP2.sno
FULL OUTER JOIN
SupParts AS SP2
AND SP1.sno < SP2.sno)
WHERE SP1.sno IS NULL
OR SP2.sno IS NULL;
This is probably going to run very slowly. The EXCEPT operator is the SQL
equivalent of set difference.
Answer #2
The usual way of proving that two sets are equal to each other is to show
that set A contains set B, and set B contains set A. What you would
usually do in standard SQL would be to show that there exists no
element in set A that is not in set B, and therefore A is a subset of B. So
the first attempt is usually something like this:
FROM SupParts AS SP1, SupParts AS SP2
WHERE SP1.sno < SP2.sno
AND SP1.pno IN (SELECT SP22.pno
WHERE SP22.sno = SP2.sno)
AND SP2.pno IN (SELECT SP11.pno
WHERE SP11.sno = SP1.sno));
Oops, this does not work because if a pair of suppliers has one item in
common, they will be returned.
Answer #3
You can use the NOT EXISTS predicate to imply the traditional test
mentioned in Answer #2.
AND NOT EXISTS (SELECT SP3.pno -- part in SP1 but not in
SP2
WHERE SP1.sno = SP3.sno
AND SP3.pno
NOT IN (SELECT pno
WHERE SP2.sno = SP4.sno))
SP1
AND SP5.pno
NOT IN (SELECT pno

WHERE SP1.sno = SP4.sno));
Answer #4
Instead of using subsets, I thought I would look for another way to do set
equality. First, I join one supplier to another on their common parts,
eliminating the situation where supplier 1 is the same as supplier 2, so
that I have the intersection of the two sets. If the intersection has the
same number of pairs as each of the two sets has elements, then the two
sets are equal.
SELECT SP1.sno, SP2.sno
INNER JOIN
SupParts AS SP2
AND SP1.sno < SP2.sno
GROUP BY SP1.sno, SP2.sno
HAVING (SELECT COUNT(*) -- one to one mapping EXISTS
= (SELECT COUNT(*)
WHERE SP4.sno = SP2.sno);
If there is an index on the supplier number in the SupParts table, it can

provide the counts directly as well as help with the join operation.
Answer #5
This is the same as Answer #4, but the GROUP BY has been replaced
with a SELECT DISTINCT clause:
FROM (SupParts AS SP1
INNER JOIN
SupParts AS SP2
AND SP1.sno < SP2.sno)
WHERE (SELECT COUNT(*)
= (SELECT COUNT(*)
WHERE SP4.sno = SP2.sno);
Answer #6
This is a version of Answer #3, from Francisco Moreno, which has the
NOT EXISTS predicate replaced by set difference. He was using Oracle,
and its EXCEPT operator (called MINUS in their SQL dialect) is pretty fast.
SP2

EXCEPT
SELECT SP4.pno
AND NOT EXISTS (SELECT SP5.pno -- part in SP2 but notin
SP1
FROM SupParts
WHERE SP2.sno =
EXCEPT
SELECT SP6.pno
FROM SupParts
WHERE SP1.sno =
AS SP5
SP5.sno
AS SP6
SP6.sno);
Answer #7
Alexander Kuznetsov once more has a submission that improves the old
"counting matches in a join" approach:
SELECT A.sno, B.sno AS sno1
FROM (SELECT sno, COUNT(*), MIN(pno), MAX(pno)
FROM SubParts GROUP BY sno)
AS A(cnt, min_pno, max_pno)
INNER JOIN
(SELECT sno, COUNT(*), MIN(pno), MAX(pno)
FROM SubParts GROUP BY sno)
AS B(cnt, min_pno, max_pno)
-- four conditions filter out most permutations
ON A.cnt = B.cnt
AND A.min_pno = B.min_pno
AND A.max_pno = B.max_pno
AND A.sno < B.sno
-- Expensive inner select below does not have to execute for
every pair
WHERE A.cnt
= (SELECT COUNT(*)
FROM SubParts AS A1,
SubParts AS B1
WHERE A1.pno = B1.pno
AND A1.sno = A.sno
AND B1.sno = B.sno);
sn sn
=======
ab bb
aq pq
The clever part of this query is that most optimizers can quickly find the
MIN() and MAX() values on a column because they are stored in the
statistics table.
Answer #8
Let's look at notation and some of the usual tests for equality:
((A B) = (B A)) (A = B)
((A B) = (B A)) (A = B)
The first equation is really the basis for the comparisons that use joins.
The second equation is done at the set level rather than the subset level,
and it implies this answer:
SELECT DISTINCT 'not equal'
FROM (SELECT * FROM A)
INTERSECT
SELECT * FROM B)
EXCEPT
(SELECT * FROM A)
UNION
SELECT * FROM B);
The idea is to return an empty set if tables A and B are equal. You have
to be careful about using the ALL clauses on the set operators if you
have duplicates. The good news is that these operators work with rows
and not at the column level, so this template will generalize to any pairs
of union-compatible tables. You do not have to know the column names.

A SQL Puzzle

Uploaded by

Copyright:

Available Formats

A SQL Puzzle

Uploaded by

Document Information

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

A SQL Puzzle

Uploaded by

Copyright:

Available Formats

A SQL puzzle in the December 1993 issue of Database Programming & Design magazine

How many ways can you find to do this problem?

NOT IN (SELECT pno

If there is an index on the supplier number in the SupParts table, it can

FROM SupParts AS SP3

You might also like