PDSA Week 3

Download as pdf or txt
Download as pdf or txt
You are on page 1of 33

Quicksort

08 October 2021 19:58

Shortcomings of merge sort


Merge needs to create a new list to hold the merged elements
• No obvious way to efficiently merge two lists in place
• Extra storage can be costly
Inherently recursive
• Recursive calls and returns are expensive
Merging happens because elements in the left half need to move to the right half and vice versa
• Consider an input of the form [0,2,4,6,1,3,5,9]
Can we divide the list so that everything on the left is smaller than everything on the right?
• No need to merge!
Divide and conquer without merging
Suppose the median of L is m
Move all values <= m to the left half of L
• Right half has values > m
Recursively sort left and right halves
• L is now sorted, no merge!
Recurrence: T(n) = 2T(n/2) + n
• Rearrange in a single pass, time O(n)
So T(n) is O(n log n)
How do we find the median?
• Sort and pick up the middle element
• But our aim is to sort the list!
Instead pick some value in L - pivot
• Split L with respect to the pivot element
Quicksort [C. A. R. Hoare] (A - Anthony)
Choose a pivot element
• Typically the first element in the array
Partition L into lower and upper parts with respect to the pivot
Move the pivot between the lower and upper partition
Recursively sort the two partitions

Partitioning
Scan the list from left to right
Four segments: Pivot, Lower, Upper, Unclassified

Week 3 Page 1
Four segments: Pivot, Lower, Upper, Unclassified
Examine the first unclassified element
• If it is larger than the pivot, extend Upper to include this element
• If it is less than or equal to the pivot, exchange with the first element in Upper. This extends Lower
and shifts Upper by one position
Pivot is always the first element
Maintain two indices to mark the end of the Lower and Upper segments
After partitioning, exchange the pivot with the last element of the Lower segment
Quicksort code
def quicksort(L,l,r): # Sort L[l:r]
if (r - l <= 1):
return(L)
(pivot,lower,upper) = (L[l],l+1,l+1)
for i in range(l+1,r):
if L[i] > pivot: # Extend upper segment
upper = upper+1
else: # Exchange L[i] with start of upper segment
(L[i], L[lower]) = (L[lower], L[i])
# Shift both segments
(lower,upper) = (lower+1,upper+1)
# Move pivot between lower and upper
(L[l],L[lower-1]) = (L[lower-1],L[l])
lower = lower-1
# Recursive calls
quicksort(L,l,lower)
quicksort(L,lower+1,upper)
return(L)

Summary
Quick sort uses divide and conquer, like merge sort
By partitioning the list carefully, we avoid a merge step
• This allows an in place sort
We can also provide an iterative implementation to avoid the cost of recursive calls
The partitioning strategy we described is not the only one used in the literature
• Can build the lower and upper segments from opposite ends and meet in the middle
Need to analyse the complexity of quick sort

Week 3 Page 2
Analysis of quick sort
08 October 2021 20:19

Analysis
Partitioning with respect to the pivot takes time O(n)
If the pivot is the median
• T(n) = 2T(n/2) + n
• T(n) is O(n log n)
Worst case? Pivot is maximum or minimum
• Partitions are of size 0, n-1
• T(n) = T(n-1) + n
= … = n(n+1)/2
• T(n) is O(n^2)
Already sorted array: worst case!
However, average case is O(n log n)
Sorting is a rare situation where we can compute this
• Values don’t matter, only relative order is important
• Analyze behaviour over permutations of {1, 2, …, n}
• Each input permutation equally likely
Expected running time is O(n log n)
Randomization
Any fixed choice of pivot allows us to construct worst case input
Instead, choose pivot position randomly at each step
Expected running time is again O(n log n)
Iterative quicksort
Recursive calls work on disjoint segments
• No recombination of results is required
Can explicitly keep track of left and right endpoints of each segment to be sorted
Quicksort in practice
In practice, quick sort is very fast
Very often the default algorithm used for in-built sort functions
• Sorting a column in a spreadsheet
• Library sort function in a programming language
Summary
The worst case complexity of quicksort is O(n^2)
However, the average case is O(n log n)
Randomly choosing the pivot is a good strategy to beat worst case inputs
Quicksort works in-place and can be implemented iteratively
Very fast in practice, and often used for built-in sorting functions
• Good example of a situation when the worst case upper bound is pessimistic

Week 3 Page 3
Concluding remarks on sorting algorithms
08 October 2021 20:32

Stable sorting
Often list values are tuples
• Rows from a table, with multiple columns / attributes
• A list of students, each student entry has a roll number, name, marks, …
Suppose students have already been sorted by roll number
If we now sort by name, will all students with the same name remain in sorted order with respect to roll
number?
Stability of sorting is crucial in many applications
Sorting on column B should not disturb sorting on column A
The quicksort implementation we described is not stable
• Swapping values while partitioning can disturb existing sorted order
Merge sort is stable if we merge carefully
• Do not allow elements from the right to overtake elements on the left
• While merging, prefer the left list while breaking ties
Other criteria
Minimizing data movement
• Imagine each element is a heavy carton
• Reduce the effort of moving values around
Best sorting algorithm?
Quicksort is often the algorithm of choice, despite O(n^2) worst case
Merge sort is typically used for "external" sorting
• Database tables that are too large to store in memory all at once
• Retrieve in parts from the disk and write back
Other O(n log n) algorithms exist - heap sort - later discussed
Sometimes hybrid strategies are used
• Use divide and conquer for large n
• Switch to insertion sort when n becomes small ( e.g. n < 16)

Week 3 Page 4
Difference between Lists and Arrays (Theory)
08 October 2021 20:44

Sequences
Two basic ways of storing a sequence of values
• Lists
• Arrays
What is the difference?
Lists
• Flexible length
• Easy to modify the structure
• Values are scattered in memory
Arrays
• Fixed size
• Allocate a contiguous block of memory
• Supports random access
Lists
Typically a sequence of nodes
Each node contains a value and points to the next node in the sequence
• Linked list
Easy to modify
• Inserting and deletion is easy via local "plumbing"
• Flexible size

Need to follow links to access A[i]


• Takes time O(i)
Arrays
Fixed size, declared in advance
Allocate a contiguous block of memory
• n times the storage for a single value
Random access
• Compute offset to A[i] from A[0]
• Accessing A[i] takes constant time, independent of i

Week 3 Page 5
Inserting and deleting elements is expensive
• Expanding and contracting requires moving O(n) elements in the worst case
Operations
Exchange A[i] and A[j]
• Constant time for arrays
• O(n) for lists
Delete A[i], insert v after A[i]
• Constant time for lists if we are already at A[i]
• O(n) for arrays
Need to keep implementation in mind when analyzing data structures
• For instance, can we use binary search to insert in a sorted sequence?
• Either search is slow, or insertion is slow, still O(n)
Summary
Sequences can be stored as lists or arrays
Lists are flexible but accessing an element is O(n)
Arrays support random access but are difficult to expand, contract
Algorithm analysis needs to take into account the underlying implementation
How does it work in Python?
• Is the built-in list type in Python really a "linked" list?
• Numpy library provides arrays - are these faster than lists?

Week 3 Page 6
Designing a flexible list and operations on the same
08 October 2021 21:02

Implementing lists in Python


Python class Node
A list is a sequence of nodes
• self.value is the stored value
• self.next points to next node
Empty list?
• self.value is None
Creating lists
• l1 = Node() - empty list
• l2 = Node(5) - singleton list
• l1.isempty() == True
• l2.isempty() == False
Appending to a list
• Add v to the end of list l
• If l is empty, update l.value from None to v
• If at last value, l.next is None
○ Point next at new node with value v
• Otherwise, recursively append to rest of list
• Iterative implementation
○ If empty, replace l.value by v
○ Loop through l.next to end of list
○ Add v at the end of the list
Insert at the start of the list
• Want to insert v at head
• Create a new node with v
• Cannot change where the head points!
• Exchange the values v0 and v
• Make new node point to head.next
• Make head.next point to new node
Delete a value v
• Remove first occurrence of v
• Scan list for first v - look ahead at next node
• If next node value is v, bypass it
• Cannot bypass the first node in the list
○ Instead, copy the second node value to head
○ Bypass second node
• Recursive implementation
• Exersice: iterative implementation

Code
class Node:
def __init__(self, v = None):
self.value = v
self.next = None
return
def isempty(self):
if self.value == None:

Week 3 Page 7
if self.value == None:
return(True)
else:
return(False)
def append(self, v):
# append, recursive
if self.isempty():
self.value = v
elif self.next == None:
self.next = Node(v)
else:
self.next.append(v)
return
def appendi(self, v):
# append, iterative
if self.isempty():
self.value = v
return
temp = self
while temp.next != None:
temp = temp.next
temp.next = Node(v)
return
def insert(self, v):
if self.isempty():
self.value = v
return
newnode = Node(v)

# Exchange values in self and newnode


(self.value, newnode.value) = (newnode.value, self.value)

# Switch links
(self.next, newnode.next) = (newnode, self.next)
return
def delete(self, v):
# delete, recursive

if self.isempty():
return
if self.value == v:
self.value = None
if self.next != None:
self.value = self.next.value
self.next = self.next.next
return
else:
if self.next != None:
self.next.delete(v)
if self.next.value == None:
self.next = None
return
Summary
• Use a linked list of nodes to implement a flexible list

Week 3 Page 8
• Use a linked list of nodes to implement a flexible list
• Append is easy
• Insert requires some care, cannot change where the head points to
• When deleting, look one step ahead to bypass the node to be deleted

Week 3 Page 9
Implementation of Lists in Python
08 October 2021 22:08

Lists in Python
Python lists are not implemented as flexible linked lists
Underlying interpretation maps the list to an array
• Assign a fixed block when you create a list
• Double the size if the list overflows the array
Keep track of the last position of the list in the array
• l.append() and l.pop() are constant time, amortised - O(1)
• Insertion / deletion require time O(n)
Effectively, Python lists behave more like arrays than lists
Arrays vs Lists in Python
Arrays are useful for representing matrices
In list notation, these are nested lists
0 1
1 0
[ [0, 1], [1, 0] ]
Need to be careful when initializing a multidimensional list
zerolist = [0,0,0]
zeromatrix = [zerolist,zerolist,zerolist]

zeromatrix[1][1] = 1
print(zeromatrix)
Output:
[[0, 1, 0], [0, 1, 0], [0, 1, 0]]
Mutability aliases different values
Instead, use list comprehension
zeromatrix = [ [ 0 for i in range(3) ] for j in range(3) ]
Numpy arrays
The Numpy library provides arrays as a basic type
import numpy as np
zeromatrix = np.zeros(shape=(3,3))
Can create an array from any sequence type
newarray = np.array([[0, 1], [1, 0]])
arange is the equivalent of range for lists
row2 = np.arange(5)
Can operate on a matrix as a whole
C = 3*A + B
C = np.matmul(A,B)
• Very useful for data science
Summary
Python lists are not implemented as flexible linked structures
Instead, allocate an array, and double space as needed
Append is cheap, insert is expensive
Arrays can be represented as multidimensional lists, but need to be careful about mutability, aliasing
Numpy arrays are easier to use

Week 3 Page 10
Implementation of dictionaries in python
08 October 2021 23:31

Dictionary
• An array / list allows access through positional indices
• A dictionary allows access through arbitrary keys
○ A collection of key-value pairs
○ Random access - access time is the same for all keys
• How is a dictionary implemented?
Implementing a dictionary
• The underlying storage is an array
○ Given an offset i, find A[i] in constant time
• Keys have to be mapped to {0, 1, …, n-1}
○ Given a key k, convert it to an offset i
• Hash function
○ h: S --> X maps a set of values S to a small range if integers X = {0, 1, …, n-1}
○ Typically |X| << |S|, so there will be collisions, h(s) = h(s'), s != s'
○ A good hash function will minimize collisions
○ SHA-256 is an industry standard hashing function whose range is 256 bits
 Use to hash large files - avoid uploading duplicates to cloud storage
Hash table
• An array A of size n combined with a hash function h
• h maps keys to {0, 1, …, n-1}
• Ideally, when we create an entry for key k, A[h(k)] will be unused
○ What if there is already a value at that location?
• Dealing with collisions
○ Open addressing (closed hashing)
 Probe a sequence of alternate slots in the same way
○ Open hashing
 Each slot in the array points to a list of values
 Insert into the list for the given slot
• Dictionary keys in Python must be immutable
○ If key changes, hash also changes!
Summary
A dictionary is implemented as a hash table
• An array plus a hash function
Creating a good hash function is important (and hard!)
Need a strategy to deal with collisions
• Open addressing/closed hashing - probe for free space in the array
• Open hashing - each slot in the hash table points to a list of key-value pairs
• Many heuristics/optimizations possible for data

Week 3 Page 11
Difference between Lists and Arrays (Implementation)
08 October 2021 23:45

Worst case odd numbers even numbers searching


Using python lists
Naive search - ~10 seconds
Binary search - ~0.9 seconds

Using Numpy lists


Naive search - ~19 seconds
Binary search - ~0.2 seconds

Selection sort slows down with numpy arrays (insertion sort also)
Merge sort also slows down with numpy arrays

Hence, lists are preferable for such kind of searchs and sorts
For matrix operations or Graphs or 2D representation, numpy array is more convenient

Week 3 Page 12
Programming Assignments
08 October 2021 23:53

PPA 1

Solution:
class Node:
def __init__(self, data):
self.data = data
self.next = None
self.prev = None

class doubly_linked_list:
def __init__(self):
self.head = None
self.last = None
def insert_end(self,data):
newnode = Node(data)
newnode.prev = self.last
if self.head == None:
self.head = newnode
self.last = newnode
else:
self.last.next = newnode
self.last = newnode
def delete_end(self):
if(self.head != None):

Week 3 Page 13
if(self.head != None):
if self.head == self.last:
self.head = None
self.last = None
else:
self.last = self.last.prev
self.last.next = None
def traverse(self):
temp = self.head
while temp != None:
if temp.next != None:
print(temp.data, end=',')
else:
print(temp.data)
temp = temp.next
def traverse_rev(self):
temp = self.last
while temp != None:
if temp.prev != None:
print(temp.data, end=',')
else:
print(temp.data)
temp = temp.prev

ins = eval(input())
delt = int(input())
A = doubly_linked_list()
for i in ins:
A.insert_end(i)
for j in range(delt):
A.delete_end()
A.traverse()
A.traverse_rev()

PPA 2

Week 3 Page 14
Code:
class Hashing:
def __init__(self,c1,c2,m):
self.hashtable = []
for i in range(m):
self.hashtable.append(None)
self.c1 = c1
self.c2 = c2
self.m = m
def hashfunction(self,data):
i=0
key = (data % self.m + self.c1*i + self.c2*(i**2)) % self.m
while self.hashtable[key] != None and i < self.m:
key = (data % self.m + self.c1*i + self.c2*(i**2)) % self.m
i=i+1
return key
def store_data(self,data):
if self.hashtable.count(None) != 0:
key = self.hashfunction(data)
self.hashtable[key]=data
else:
print('Hash table is full')
def display_hashtable(self):
return self.hashtable
c1 = int(input())
c2 = int(input())
m = int(input())
data=eval(input())

Week 3 Page 15
data=eval(input())
A = Hashing(c1,c2,m)
for i in data:
A.store_data(i)
print(A.display_hashtable())

GrPA 1

Code:
def DishPrepareOrder(order_list):
d = {}
for item in order_list:
if item not in d:
d[item] = 1
else:
d[item] += 1
d2 = {}
for item,val in d.items():
if val not in d2:
d2[val] = [item]
else:
d2[val].append(item)
l = []
for item in reversed(sorted(d2)):
for i2 in sorted(d2[item]):
l.append(i2)
return l
nums = eval(input())
print(DishPrepareOrder(nums))

Solution:
def insertionsort(L): #use this because it is stable sort
n = len(L)
if n < 1:
return(L)
for i in range(n):

Week 3 Page 16
for i in range(n):
j=i
while(j > 0 and L[j][1] > L[j-1][1]):
(L[j],L[j-1]) = (L[j-1],L[j])
j = j-1
return(L)

def DishPrepareOrder(order_list):
order_count = {}
R = []
for order in order_list:
if order in order_count:
order_count[order] += 1
else:
order_count[order] = 1
for ID in sorted(order_count.keys()):
R.append((ID,order_count[ID]))
R=insertionsort(R)
Res = []
for tup in R:
Res.append(tup[0])
return Res
nums = eval(input())
print(DishPrepareOrder(nums))

GrPA2

Solution:
class create_stack:
def __init__(self):
self.stack = []
def push(self,d):
self.stack += [d]
def pop(self):
t = self.stack[-1]
self.stack = self.stack[:-1]

Week 3 Page 17
self.stack = self.stack[:-1]
return t

def EvaluateExpression(exp):
opt = ['+','-','*','/','**']
stk = create_stack()
L = exp.split(' ')
for i in L:
if i not in opt:
stk.push(i)
else:
b = stk.pop()
a = stk.pop()
res = eval(a + i + b)
stk.push(str(res))
return stk.pop()
print(float(EvaluateExpression(input())))

GrPA 3

Solution:
def reverse(root):
if (root.isempty()):
return root
temp = root
prev = None

Week 3 Page 18
prev = None
while (temp):
next, temp.next = temp.next, prev
prev, temp = temp, next
return prev

Week 3 Page 19
Assignments
09 October 2021 00:03

Practice assignment
1.

a.

Week 3 Page 20
b.

c.

d.

Accepted answers:

Week 3 Page 21
2. Which of the correct solution in the options will be faster than others.[MCQ] (This Question is the
followup question of Question number 1)

a.

Week 3 Page 22
b.

c.

d.

Accepted answers:

Week 3 Page 23
3. What is the effect of method operation(in_string) ?

a. Reverse the input string.

b. Replace the middle character of string with last character.

c. Create a string omitting first character

d. Create a new string with original string but of length shorter than the original string .

4. Which of the following codes can be used as a STACK ?[MSQ]

Week 3 Page 24
4. Which of the following codes can be used as a STACK ?[MSQ]

a.

b.

Week 3 Page 25
c.

d.

Accepted answers:

Week 3 Page 26
5. Consider a simple hash function h(x) = x mod 7, where x mod 7 returns the remainder when x is
divided by 7. Our hash table will be an array Ah of size 7, with array blocks referenced by indices 0
to 6 . To insert an element x in hash table we put it at index h(x) . For example to insert 12 we will
insert it at index 5 in empty array Ah , since h(12) = 5 . For resolving collisions we will use open
addressing(closed hashing), where if for an element a position is already occupied we probe the
next available slot sequentially in the array and insert the element. We insert the elements 2, 5,
11, 10, 4 in the same order in an empty hash table Ah using the above hash function and collision
resolution method. What will be the index in Ah at which 4 will be inserted.
Accepted answer:
Numeric: 6
6. What does the function operation(head) do? head is the first node of the linked list, where each
node is an object of class Node , and the function returns True or False .

Week 3 Page 27
a. Check if linked list contains loop or not.

b. Check if linked list is palindrome or not.

c. Search if a particular element is present in linked list.

d. Check if linked list is in sorted order.

Graded assignment
1. Consider the Quick sort algorithm in which the partitioning procedure splits elements into two
sub-arrays and each sub-array contains at least one-third of the elements. Let T(n) be the number
of comparisons required to sort array of n elements
a. T(n) <= 2 T(n/3) + cn

b. T(n) <= T(n/3) + T(2n/3) + cn

c. T(n) <= 2 T(2n/3) + cn

d. T(n) <= T(n/3) + T(3n/4) + cn

2. Which function correctly inserts an element after x nodes in a linked list, where each node of
linked list is an object of class Node and x>=0 ? In the options below,
○ head is the first node in the linked list and value is the value that needs to be inserted after x
nodes.
○ The function insertAtBeginning(head, v) inserts a node at the beginning of the list with value
v.

Week 3 Page 28
a.

b.

c.

Week 3 Page 29
d.

Accepted answer:

3. Which of the following code can be used as a queue?

a.

b.

c.

Week 3 Page 30
c.

d.

Accepted answer:

4. Consider a Quicksort implementation where we first find the median then use the median as a
pivot. Assume that we have a O(n) time to find the median of an unsorted list. What will be the
worst-case time complexity of this modified Quicksort?
a. O(n)

b. O(log n)

c. O(n log n)

d. O(n^2)

Week 3 Page 31
5.

The above function can be used as partitioning function for sorting an array through Quicksort.
What will be the state of Array after the call 'partition([13, 18, 8, 10, 21, 7, 2, 32, 6, 19], 0, 9)'
return?
a. [8, 10, 7, 2, 6, 13, 18, 21, 32, 19]

b. [2, 6, 7, 8, 10, 13, 18, 19, 21, 32]

c. [6, 8, 10, 7, 2, 13, 18, 21, 32, 19]

d. [7,6, 8, 10, 2, 13, 21, 32, 18, 19]

6. Linear probing is an open addressing scheme in computer programming for resolving hash
collisions in hash tables. Linear probing operates by taking the original hash index and adding
successive values linearly until a free slot is found.

An example sequence of linear probing is:


h(k)+0, h(k)+1, h(k)+2, h(k)+3 .... h(k)+m-1
where m is a size of hash table, and h(k) is the hash function.

Hash function
Let h(k) = k mod m be a hash function that maps an element k to an integer in [0, m−1],
where m is the size of the table. Let the ith probe position for a value k be given by the function
h'(k,i) = (h(k) + i) mod m

The value of i = 0, 1, . . ., m – 1. So we start from i = 0, and increase this until we get a free block in
hash table.

Consider inserting the keys 24, 36, 58, 65, 62, 79 into a hash table of size m = 11 using linear
probing, the primary hash function is h(k) = k mod m. What will be the hash table after inserting all
keys in given order? Suppose initial hash table is:
[None,None,None,None,None,None,None,None,None,None,None]
a. [None,None,24,36,58,None,79,62,None,65,None]

b. [None,None,24,36,58,79,None,62,None,None,65]

c. [None,24,None,36,58,79,None,62,None,None,65]

d. [None,None,24,36,58,None,79,62,None,None,65]

Week 3 Page 32
d. [None,None,24,36,58,None,79,62,None,None,65]

7. In a quicksort algorithm, we choose the last element in the array as pivot for partitioning. For
which of the following arrays will this algorithm exhibit the worst-case behavior?
a. [2, 3, 5, 7, 8, 13, 34, 46, 67]

b. [67, 46, 34, 13, 8, 7, 5, 3, 2]

c. [3, 5, 34, 2, 13, 67, 7, 46, 8]

d. [8, 46, 7, 67, 13, 2, 34, 5, 3]

8. In a linked list, what is the asymptotic worst-case running time for finding the size of the list?
a. O(1)

b. O(log n)

c. O(n log n)

d. O(n)

9. For a Quicksort algorithm on an input size n, where we always select the first element as pivot,
how many calls are required to the partitioning procedure in worst-case and average-case,
respectively?
a. log n, n

b. log n, log n

c. n, log n

d. n, n

10. Consider two lists L1 and L2 both containing n integers. We need to find if lists L1 and L2 are
disjoint. Two lists are disjoint, if there is no integer common in both the lists.
For example.
1. L1 = [5, 2, 7 , 1] and L2 = [3, 5], these two lists are not disjoint as 5 is common in both
lists.
2. L1 = [2, 7, 1] and L2 = [3, 5], these two lists are disjoint as there is no integer common in
both.
Select the most efficient solution or solutions for this in terms of asymptotic running time
complexity from the below list.
a. For every integer in L1 search if it is present in L2 or not.

b. Sort list L1, then for each integer i in L2, use binary search to find if i is present in L1 or not.

c. Sort both lists L1 and L2, then use merge like process to compare integers in both the lists.

d. Sort list L1, then for each integer i in L1 search if i is present in L2 or not.

Week 3 Page 33

You might also like