CS 332: Algorithms: Universal Hashing

CS 332: Algorithms
Universal Hashing
Reminder: Homework 4
Programming assignment:
Implement Skip Lists
Evaluate performance
Extra credit: compare performance to randomly
built BST
Due Mar 9 (Fri before Spring Break)
Monday after break counts as 1 late day
Review: Resolving Collisions

How can we solve the problem of collisions?
Open addressing
To insert: if slot is full, try another slot, and another,
until an open slot is found (probing)

To search, follow same sequence of probes as would be
used when inserting the element
Chaining
Keep linked list of elements in slots
Upon collision, just add new element to list
Review: Chaining
Chaining puts elements that hash to the same
slot in a linked list:

U
(universe of keys)
k6
k2
k5
k8
k1
k4
k5
k2
k1
k4
K
(actual
k7
keys)
k3
k3
k8
k6
k7
Review: Analysis Of Hash Tables

Simple uniform hashing: each key in table is
equally likely to be hashed to any slot

Load factor = n/m = average # keys per slot
Average cost of unsuccessful search = O(1+)
Successful search: O(1+ /2) = O(1+ )
If n is proportional to m, = O(1)
So the cost of searching = O(1) if we size our
table appropriately
Review: Choosing A Hash Function

Choosing the hash function well is crucial
Bad hash function puts all elements in same slot
A good hash function:
Should distribute keys uniformly into slots
Should not depend on patterns in the data
We discussed three methods:

Division method
Multiplication method
Universal hashing
Review: The Division Method

h(k) = k mod m
In words: hash k into a table with m slots using the
slot given by the remainder of k divided by m

Elements with adjacent keys hashed to
different slots: good

If keys bear relation to m: bad
Upshot: pick table size m = prime number not
too close to a power of 2 (or 10)
Review: The Multiplication Method

For a constant A, 0 < A < 1:
h(k) = m (kA - kA )
Fractional part of kA
Upshot:
Choose m = 2P
Choose A not too close to 0 or 1
Knuth: Good choice for A = (5 - 1)/2
Review: Universal Hashing

When attempting to foil an malicious
adversary, randomize the algorithm

Universal hashing: pick a hash function
randomly when the algorithm begins
(not upon every insert!)
Guarantees good performance on average, no
matter what keys adversary chooses

Need a family of hash functions to choose from
Universal Hashing
Let
be a (finite) collection of hash functions
that map a given universe U of keys

into the range {0, 1, , m - 1}.
is said to be universal if:

U,
the number of hash functions h
for which h(x) = h(y) is ||/m
In other words:
for each pair of distinct keys x, y
With a random hash function from , the chance of a collision
between x and y is exactly 1/m
(x y)
Universal Hashing
Theorem 12.3:
Choose h from a universal family of hash functions
Hash n keys into a table of m slots, n m
Then the expected number of collisions involving a particular key
x is less than 1
Proof:
For each pair of keys y, z, let cyx = 1 if y and z collide, 0 otherwise
E[cyz] = 1/m (by definition)
Let Cx be total number of collisions involving key x
n 1
E[C x ] E[cxy ]
Since n m, we
1
yT have E[Cx] < m
yx
A Universal Hash Function

Choose table size m to be prime
Decompose key x into r+1 bytes, so that
x = {x0, x1, , xr}

Only requirement is that max value of byte < m
Let a = {a0, a1, , ar} denote a sequence of r+1 elements
chosen randomly from {0, 1, , m - 1}

Define corresponding hash function ha :
r
With this definition,

has mr+1
ha x
aimembers
xi mod m
i 0
A Universal Hash Function

is a universal collection of hash functions
(Theorem 12.4)
How to use:
Pick r based on m and the range of keys in U

Pick a hash function by (randomly) picking the as
Use that hash function on all keys
Augmenting Data Structures

This course is supposed to be about design and
analysis of algorithms
So far, weve only looked at one design
technique (What is it?)
Augmenting Data Structures

This course is supposed to be about design and
analysis of algorithms
So far, weve only looked at one design
technique: divide and conquer
Next up: augmenting data structures
Or, One good thief is worth ten good scholars
Dynamic Order Statistics

Weve seen algorithms for finding the ith element
of an unordered set in O(n) time

Next, a structure to support finding the ith element
of a dynamic set in O(lg n) time
What operations do dynamic sets usually support?
What structure works well for these?
How could we use this structure for order statistics?
How might we augment it to support efficient extraction
of order statistics?
Order Statistic Trees

OS Trees augment red-black trees:
Associate a size field with each node in the tree
x->size records the size of subtree rooted at x,
including x itself:
M
8
C
5
P
2
A
1
F
3
D
1
Q
1
H
1
Selection On OS Trees
M
8
C
5
P
2
A
1
F
3
D
1
Q
1
H
1
How can we use this property

to select the ith element of the set?
OS-Select
OS-Select(x, i)
{
r = x->left->size + 1;
if (i == r)
return x;
else if (i < r)
return OS-Select(x->left, i);
else
return OS-Select(x->right, i-r);
}

CS 332: Algorithms: Universal Hashing

Uploaded by

Copyright:

Available Formats

CS 332: Algorithms: Universal Hashing

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

CS 332: Algorithms: Universal Hashing

Uploaded by

Copyright:

Available Formats

CS 332: Algorithms

Review: Resolving Collisions

until an open slot is found (probing)

slot in a linked list:

Review: Analysis Of Hash Tables

equally likely to be hashed to any slot

So the cost of searching = O(1) if we size our

Review: Choosing A Hash Function

We discussed three methods:

Review: The Division Method

slot given by the remainder of k divided by m

different slots: good

Review: The Multiplication Method

Review: Universal Hashing

adversary, randomize the algorithm

matter what keys adversary chooses

be a (finite) collection of hash functions

that map a given universe U of keys

is said to be universal if:

With a random hash function from , the chance of a collision

between x and y is exactly 1/m

A Universal Hash Function

x = {x0, x1, , xr}

chosen randomly from {0, 1, , m - 1}

With this definition,

A Universal Hash Function

Pick r based on m and the range of keys in U

Augmenting Data Structures

Augmenting Data Structures

Dynamic Order Statistics

of an unordered set in O(n) time

Order Statistic Trees

How can we use this property

You might also like