Dictionaries and Maps
Dictionaries and Maps
Dictionaries and Maps
Hash-Maps in Python
In computer science, a Hash table or a Hashmap is a type of data structure that
maps keys to its value pairs (implement abstract array data types). Suppose we are
given a string or a character array and asked to find the maximum occurring
character. It could be quickly done using arrays.
● We can simply create an array of size 256.
● Initialize this array to zero.
● Traverse the array to increase the count of each character against its ASCII
value in the frequency array.
● In this way, we will be able to figure out the maximum occurring character in
the given string.
The above method will work fine for all the 256 characters whose ASCII values are
known. But what if we want to store the maximum frequency string out of a given
array of strings? It can’t be done using a simple frequency array, as the strings do
not possess any specific identification value like the ASCII values. For this, we will be
using a different data structure called hashmaps.
In hashmaps, the data is stored in the form of keys against which some value is
assigned. Keys and values don’t need to be of the same data type.
If we consider the above example in which we were trying to find the maximum
occurring string, using hashmaps, the individual strings will be regarded as keys,
and the value stored against them will be considered as their respective frequency.
For example, T
he given string array is:
1
Hashmap will look like follows:
“abc” 3
“def” 2
“ab” 1
From here, we can directly check for the frequency of each string and hence figure
out the most frequent string among them all.
Note: O
ne more limitation of arrays is that the indices could only be whole numbers but
this limitation does not hold for hashmaps.
When it comes to Python, Hash tables are used via dictionary i e, the built-in data
type.
Dictionaries
● A Dictionary is a Python’s implementation of a data structure that is more generally
known as an associative array.
● A dictionary is an unordered collection of key-value pairs.
● Indexing in a dictionary is done using these “ keys”.
● Each pair maps the key to its value.
● Literals of the type dictionary a
re enclosed within curly brackets.
● Within these brackets, each entry is written as a key followed by a colon ” : ”, which is
further followed by a value. This is the value that corresponds to the given key.
● These keys must be unique and cannot be any immutable data type. Eg- string,
integer, tuples, etc. They are always mutable.
● The values need not be unique. They can be repetitive and can be of any data type
(Both mutable and immutable)
● Dictionaries are mutable, which means that the key-value pairs can be changed.
2
What does a dictionary look like?
This is the general representation of a dictionary. Multiple key-value pairs are enclosed
within curly brackets, separated by commas.
myDictionary = {
<key>: <value>,
<key>: <value>,
.
.
<key>: <value>
}
Creating A Dictionary
myDict= {"red":"boy", 6
: 4, "name":"boy"}
months= {1:"January", 2 :"February", 3: "March"}
3
This method is particularly useful if we want to create a dictionary with variable keys and all
the keys must have the same value. The values corresponding to all the keys are exactly the
same. This is done as follows:
# We can initialise all the values to a custom value too. This is done by
providing the second argument as the desired value.
>>> d2= dict.fromkeys(["abc",1,"two"],6)
>>> print(d2)
["abc":6 ,1:6, "two":6]
Similar to the list and tuples, this can be done by the square bracket operator [ ].
foo= {"a":1,"b":2,"c":3}
print(foo["a"])
--> 1
print(foo["c"])
--> 3
If we want the value corresponding to any key, we can even use an inbuilt dictionary
method called get.
4
>>> print(foo.get("a"))
1
>>> print(foo.get("c"))
3
A very unique feature about this method is that , in case the desired key is not present in
the dictionary , it won’t throw an error or an exception. It would simply return N
one.
We can make use of this feature in another way. Say we want the method to do the
following action: If the key is present in the dictionary then return the value corresponding
to the key. In case the key is not present, return a custom desired value ( say 0 ).
5
element of the tuple as the key and the second element as the value corresponding to this
key.
bar= {2:1,3:2,4:3}
for t in bar:
print(t)
Out[]:
2
3
4
# Here t is the key in the dictionary and hence when we print t in all
iterations then all the keys are printed.
n bar:
for t i
print(t, bar[t])
Out[]:
2 1
3 2
6
4 3
# Here along with the keys, the values which are bar[t] are printed. In
this loop, the values are accessed using the keys.
If we want to update the value of an already existing key in the dictionary then we can
simply assign the new value to the given key. This is done as follows:
>>> bar["man"]="boy"
>>> print(bar)
{2:1,3:2,4:3,"man":"boy"}
7
Adding or concatenation of two dictionaries:
If we have 2 dictionaries and we want to merge the contents of both the dictionaries and
form a single dictionary out of it . It is done as follows:
a= {1:2,2:3,3:4}
b= {7:2,10:3,6:4}
a.update(b)
print(a)
--> {1:2,2:3,3:4,7:2,10:3,6:4}
In this process, the second dictionary is unchanged and the contents of the second
dictionary are copied into the first dictionary. The uncommon keys from the second
dictionary are added to the first with their corresponding values. However, if these
dictionaries have any common key, then the value of the common key present in the first
dictionary is updated to the new value from the second.
Deleting an entry:
To delete an entry corresponding to any key in a dictionary, we can simply pop the key
from the dictionary. The method used here is .pop(). This method removes the key-value
pair corresponding to any particular key and then returns the value of the removed
key-value pair.
>>> c={1:2,2:(3,23,3),3:4}
>>> c.pop(2)
(3,23,3)
>>> c={1:2,2:(3,23,3),3:4}
>>> c.clear()
>>> print(c)
{}
8
Deleting the entire dictionary:
We can even delete the entire dictionary from the memory by using the d
el keyword. This
would remove the presence of the dictionary. This is similar to tuples and lists.
Now we have a dictionary with unique keys and their respective frequencies. Now we run
another loop to print the keys with the frequency 'k'.
9
dict[i]=1
# Loop for printing the keys with frequency as k
for t in dict:
if dict[t]==k:
print(t)
Now, we want to store the key-value pairs in an array, named bucket array. We
need an integer corresponding to the key so that we can keep it in the bucket array.
To do so, we use a hash function. A hash function converts the key into an integer,
which acts as the index for storing the key in the array.
For example: Suppose, we want to store some names from the contact list in the
hash table, check out the following image:
10
Suppose we want to store a string in a hash table, and after passing the string
through the hash function, the integer we obtain is equal to 10593, but the bucket
array’s size is only 20. So, we can’t store that string in the array as 10593, as this
index does not exist in the array of size 20.
To overcome this problem, we will divide the hashmap into two parts:
● Hash code
● Compression function
The first step to store a value into the bucket array is to convert the key into an
integer (this could be any integer irrespective of the size of the bucket array). This
part is achieved by using a h
ashcode. For different types of keys, we will be having
different kinds of hash codes. Now we will pass this value through the compression
function, which will convert that value within the range of our bucket array’s size.
Now, we can directly store that key against the index obtained after passing
through the compression function.
11
The compression function can be used as (%bucket_size).
But, there is still a possibility that after passing the key through from hash code,
when we give the same through the compression function, we can get the same
values of indices. For example, let s1 = “ab” and s2 = “cd”. Now using the above hash
function for p = 2, h1 = 292 and h2 = 298. Let the bucket size be equal to 2. Now, if
we pass the hash codes through the compression function, we will get:
Collision Handling
We can handle collisions in two ways:
● Closed hashing (or closed addressing)
● Open addressing
In closed hashing, each entry of the array will be a linked list. This means it should
be able to store every value that corresponds to this index. The array position holds
the address to the head of the linked list, and we can traverse the linked list by
using the head pointer for the same and add the new element at the end of that
linked list. This is also known as separate chaining.
On the other hand, in open addressing, we will check for the index in the bucket
array if it is empty or not. If it is empty, then we will directly insert the key-value pair
12
over that index. If not, then will we find an alternate position for the same. To find
the alternate position, we can use the following:
To figure out this f(i), the following are some of the techniques:
1. Linear probing: In this method, we will linearly probe to the next slot until
we find the empty index. Here, f(i) = i.
2. Quadratic probing: A
s the name suggests, we will look for alternate i2
positions ahead of the filled ones, i.e., f(i) = i2.
3. Double hashing: A
ccording to this method, f(i) = i * H(a), where H(a) is
some other hash function.
apNode:
class M
def __init__ (self,key,value):
self.key = key #to store key
self.value = value #to store value
13
self.next = None #to the next pointer
class Map:
def insert(self,key,value):
hc = hash(key)
index = self.getBucketIndex(hc)
head = self.buckets[index]
while head is not None:
if head.key == key:
head.value = value
return
head = head.next
newNode = MapNode(key,value)
newNode.next = head
self.buckets[index] = newNode
self.count+=1
NOTE:
● The hash() method returns the hash value of an object if it has one. Hash
values are just integers that are used to compare dictionary keys during a
dictionary lookup quickly.
14
● Internally, hash() m
ethod calls __hash__() method of an object which is set by
default for any object.
● The syntax of hash() method is:
hash(object)
def remove(self,key):
hc = hash (key) #Getting hashcode
index= self.getBucketIndex(hc) #Finding index corresponding to hc
head = self.buckets[index] #Finding head pointer
prev= None
while head is not None:
if head.key == key:#If found
if prev is None: #Removing
self.buckets[index] = head.next
else:
prev.next= head.next
return head.value
prev =head
head =head.next
return None
15
3. b =
number of buckets. On average, each box contains (n/b) entries. This is
known as load factor (This means b
boxes contain n entries). We also need
to ensure that the load factor is always less than 0.7, i.e., (n/b) < 0.7(This
will ensure that each bucket does not contain too many entries in
it).
4. To make sure that load factor < 0.7, we can’t reduce the number of entries,
but we can increase the bucket size comparatively to maintain the ratio. This
process is known as R
ehashing.
This ensures that time complexity is on an average O(1) f or insertion, deletion, and
search operations each.
Rehashing
Now, we will try to implement the rehashing in our map. After inserting each
element into the map, we will check the load factor. If the load factor’s value is
greater than 0.7, then we will rehash. Refer to the code below for better
understanding.
def rehash(self):
temp= self.buckets #To store the old bucket
self.buckets = [None for i in range(2*self.bucketSize))
self.bucketSize = 2*self.bucketSize #doubling the size
self.count =0
for head in temp:#inserting each value of old bucket to new one
while head is not None:
self.insert(head.key,head.value)
head = head.next
hc = hash(key)
index = self.getBucketIndex(hc)
head = self.buckets[index]
while head is not None:
16
if head.key == key:
head.value = value
return
head = head.next
newNode = MapNode(key,value)
newNode.next = head
self.buckets[index] = newNode
self.count+=1
# Now we will check the load factor after insertion.
loadFactor = self.count/self.bucketSize
if loadFactor>=0.7:
self.refash()
Practice problems:
● https://www.codechef.com/problems/STEM
● https://codeforces.com/problemset/problem/525/A
● https://www.spoj.com/problems/ADACLEAN/
17