Hash Functions
Hash Functions
Hash Functions
• Hashing is a technique used to store and quickly retrieve data in hash tables or
hash maps.
• Searching techniques : Linear and Binary with O(n) and O(log n) complexity
respectively.
• Approach to searching calculates the position of the key in the table based on the
value of the key.
• A function h that can transform a particular key K, be it a string, number, record, or
the like, into an index in the table used for storing items of the same type as K. The
function h is called a hash function.
• h transforms different keys into different numbers, it is called a perfect hash
function.
• Hash functions are much faster when going one way (input → hash).
• Hash functions reduce input to a smaller, fixed-size output.
• If hash functions were reversible, sensitive data like passwords could easily be
retrieved from their hashes.
• Case: When you want to look up a value (like a phone number) based on a key (like
a person's name), the hash function generates an index from the key, and the value
is stored at that index.
• Hashing provides constant-time O(1) searching, insertion, and deletion
operations, making it very fast for large datasets.
• The worth of a hash function depends on how well it avoids collisions.
• For Example: If the variable is “K” made out of 31 letters “z”, the ASCII value of “z”
is 122, so the hash function is:
h(K)=ASCII of character * position of character
h(K)=31×122=3782
Thus, the table needs at least 3,782 cells to accommodate all potential variable
names.
• Collision: two different inputs ("abc" and "acb") produce the same hash value,
leading to both being stored in the same index in a hash table.
Assume, we have 2 string “abc” and “acb” (exclude case sensitivity) then,
'a' = 97
'c' = 99
'b' = 98
sum = 97 + 99 + 98 = 294
So, h("acb")=294
Both "abc" and "acb" generate the same hash value (294), even though they are
different strings.
• Now, improved hash function:
For “abc”:
h("abc")=(97×1)+(98×2)+(99×3)= 97+196+297=590
For “acb”:
h("acb ")=(97×1)+(99×2)+(98×3)= 97+198+294=589
Now, the hash values for "abc" and "acb" are different, preventing the collision.
Hash Functions
• When assigning positions to “n” items in a table of “m” positions, the number of possible
hash functions is given by:
mn
n=
m=
For 50 elements and a table with 100 positions (cells), the number of possible hash
functions is:
10050=10100
This means there are 10¹⁰⁰ possible hash functions that could map the 50 items into the
100-cell table.
1. Division
• The primary requirement of a hash function is to return a valid index within the
size of the hash table.
• For example, if a hash table has 100 cells, the hash function must return a number
between 0 and 99 (i.e., a valid index in the table).
• One of the simplest ways to ensure that the hash function returns a valid index is
by using the modulus (mod) operation.
• The formula (for TSize is Prime) is:
h(K) = K mod TSize
Shift Folding:
• In shift folding, the key is divided into several parts, and these parts are
processed in a sequential manner.
• The parts can be added, and the sum can be divided by the table size (TSize) to
get the hash value.
• Example:
Consider a Social Security Number (SSN): 123-456-789
Alternatively,
Let’s divide it into five parts:
part 1: 12
part 2: 34
part 3: 56
part 4: 78
part 5: 9
Add all parts together: 12 + 34 + 56 + 87 + 9 = 189
if the table size (TSize) is 1000 then modulus the summation of parts by TSize
to get hash index: 189 mod 1000 = 189
• In boundary folding, the key is divided into parts, and every other part is
reversed before processing.
• Imagine the key is written on a piece of paper, and you fold it along the
boundary between parts to reverse certain sections.
• Example:
Using the same SSN: 123-456-789
Let’s divide it into three parts:
part 1: 123
part 2: 456
part 3: 789
Reverse every alternative part and sum all of them
part 2: 654 (reverse of 456)
Add all parts together: 123 + 654 + 789 = 1566
if the table size (TSize) is 1000 then modulus the summation of parts by TSize
to get hash index: 1566 mod 1000 = 566
Alternatively,
Let’s divide it into five parts:
part 1: 12
part 2: 34
part 3: 56
part 4: 78
part 5: 9
Reverse every alternative part and sum all of them
part 2: 43 (reverse of 34)
part 4: 87 (reverse of 78)
Add all parts together: 12 + 43 + 56 + 87 + 9 = 207
if the table size (TSize) is 100 then modulus the summation of parts by TSize to
get hash index: 207 mod 100 = 7
• In both shift folding and boundary folding the key is usually divided into even
parts of some fixed size plus some remainder and then added.
• A bitwise version of shift folding can use the XOR (exclusive-or) operation to
combine the parts.
• This is efficient and often used for hashing strings.
• It is suitable for large datasets or real-time applications where speed is
important.
• Example:
Consider the
• bcd" and TSize is 1000:
Alternatively,
Consider the string "abcd" and TSize is 100:
3. Mid-Square Function
• The Mid-Square Hashing Method is a technique where the key is squared, and
the middle part of the result is used to generate a hash value (or address).
• This method works by ensuring that the entire key participates in the hashing
process, which increases the likelihood of different keys producing different
addresses.
• By squaring the key, the resulting number is larger, which ensures more variation in
the middle digits, reducing the chances of collisions between different keys.
• Example:
It is more efficient to use a power of 2 for the table size. If we assume the table
size is 1,024 (which is a power of 2, 0-1023), then we work with binary
representations.
Convert the binary middle part 0101000010 to its decimal equivalent is:
01010000102 = 322
In some universities, international student IDs all start with the number 999.
Since this portion is common to all international students, the first three digits
(999) can be excluded.
All books from the same publisher have a common prefix in their ISBN (e.g., 1133
for Cengage Learning).
If a hash table only contains books from one publisher, the common prefix (1133)
can be omitted from the hashing process.
This leaves the unique part of the ISBN to be used for computing the hash value.
• This method is easy to implement and simple also without any processing of key.
5. Radix Transformation
• The key K is transformed into another number base and then the transformed
number is used to compute the hash value or address.
• Example: Decimal to Nonal (Base 9) Conversion
If the key K=345 (in decimal), it can be converted to its value in base 9 (nonal
system).
The decimal number 345 is equal to 423 in base 9.
9 345 3
38 2
4
After the transformation, the resulting value is divided modulo TSize (size of the hash
table), and the remainder is used as the hash address.
If TSize is 100, the transformed base-9 value 423 is divided by 100, giving:
423 mod 100 = 23
Now, the hash value is 23.
|H| |keys|
Assume we have a set of keys and choose a prime p=17, and a hash table
size TSize=10.
o Example: