AAD Unit 3


Unit 3- Search & Sort

-Ms. Anjali Deshpande


Searching

• The process of locating a specific element or item within a
collection of data.
• The goal is to determine whether the desired element exists within
the data and, if so, to identify its precise location or retrieve it.
Why do we need searching?
• Searching is one of the core algorithms in computer science.
• We know that today’s computers store a lot of information.
• To retrieve this information quickly, we need very
efficient searching algorithms.
• Efficiency: Efficient searching algorithms improve program
performance.
• Data Retrieval: Quickly find and retrieve specific data from
large datasets.
• Database Systems: Enables fast querying of databases.
• Problem Solving: Used in a wide range of problem-solving
tasks.
Applications of Search Algorithms
• Information Retrieval: to retrieve relevant information from
vast amounts of data on the web.
• Database Systems: retrieving specific data records based on user
queries.
• E-commerce: to find products quickly based on their
preferences, specifications, or keywords.
• Networking: for routing packets efficiently through networks,
finding optimal paths, and managing network resources.
• Artificial Intelligence: problem-solving, game playing (e.g.,
chess), and decision-making processes
• Pattern Recognition: pattern matching tasks, such as image
recognition, speech recognition, and handwriting recognition.
Search Techniques
• Sequential search
• Binary search
• Fibonacci search
• Hashed search
• Index sequential search
Linear Search
• Linear search is a sequential search.
• It compares each element with the value being
searched for, and stops when either the value is
found or the end of the array is encountered.
• It uses a loop to step through an array, starting with
the first element.
• If the value being searched for is not in the array, the
algorithm searches unsuccessfully to the end of
the array.
Linear Search
• Since the array elements are stored in linear order,
searching through them in that order is simple and
straightforward.
Linear Search: Algorithm
Start
    Initialize index = 0
    While index < n:
        If arr[index] == x:
            Return index
        End If
        Increment index by 1
    End While
    Return -1 (element not found)
End
Linear Search: Pseudocode

function linear_search(arr, target):
    for each element in arr:
        if element == target:
            return the index of element
    return -1  // target not found
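
A minimal runnable version of this pseudocode, sketched in Python (the
function name and the -1 convention mirror the pseudocode above):

def linear_search(arr, target):
    """Return the index of target in arr, or -1 if it is absent."""
    for index, element in enumerate(arr):
        if element == target:
            return index
    return -1

# Example: linear_search([10, 20, 30], 20) -> 1
# Example: linear_search([10, 20, 30], 99) -> -1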
Linear Search: Advantages

• Easy to understand.
• Easy to implement.
• It operates on both sorted and unsorted lists.
• Time complexity?
Linear Search: Disadvantages
• Linear search is not efficient when the list is large.
• The maximum number of comparisons is n (i.e., n elements).
• Not suitable for large problems.
• You may need to search the whole list.
• Linear search is slower than other search techniques such as binary search.
Analysis of Linear Search
• Best case: O(1), when the target is the first element.
• Worst and average case: O(n), when the target is near the end or absent.
Binary Search
• “Binary search is a searching algorithm which is used to find an
element in a sorted list.”
Properties of Binary Search
1. Works on Sorted Arrays:
Binary search requires the input array to be sorted. If the array is not sorted, the algorithm cannot
guarantee correct results.
2. Divide and Conquer Approach:
Binary search works by repeatedly dividing the search interval in half. It compares the target value to the
middle element of the array and decides whether to search the left or right half of the array.
3. Time Complexity:
The time complexity of binary search is O(log n), where n is the number of elements in the array. This
logarithmic time complexity makes binary search much faster than linear search, especially for large arrays.
4. Space Complexity:
The space complexity of binary search is O(1) for the iterative version, since it requires a constant amount
of extra space. The recursive version has a space complexity of O(log n) due to the recursion stack.
Binary Search
Assume two variables, first and last, are declared; they denote the
beginning and ending indices of the list under consideration, respectively.
Step 1. Compare the key with the middle element of the list
(A[middle] == key); if true, go to Step 4, otherwise continue.
Step 2. If key < A[middle], search the left half of the list,
otherwise go to Step 3.
Step 3. If key > A[middle], search the right half of the list and go to Step 1.
Step 4. Display the position of the key, else display the message “NOT FOUND”.
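
A minimal iterative sketch of these steps in Python (the variables first,
last, and middle follow the slide’s naming):

def binary_search(arr, key):
    """Return the index of key in the sorted list arr, or -1 if not found."""
    first, last = 0, len(arr) - 1
    while first <= last:
        middle = (first + last) // 2
        if arr[middle] == key:
            return middle          # Steps 1 and 4: key found
        elif key < arr[middle]:
            last = middle - 1      # Step 2: search the left half
        else:
            first = middle + 1     # Step 3: search the right half
    return -1                      # "NOT FOUND"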
Binary Search
Binary Search: Advantages
1. Binary search is an optimal comparison-based searching algorithm.
2. Excellent time efficiency.
3. Suitable for large lists.
4. Faster, because there is no need to check all elements.
5. Most suitable for sorted arrays.
6. It finds elements quickly.
7. Time complexity O(log n).


Binary Search: Disadvantages
1. Elements must be sorted.
2. The middle element must be computed on every iteration.
3. A bit more complicated to implement and test.
4. It requires random access to the data, so it is not suitable for
structures such as linked lists.
5. The key must be compared with the middle element at each step.

Sentinel Search
• Sentinel Search is a linear search technique used in computer
science, often employed in situations where performance optimization
is needed over a traditional linear search.
• In a linear search, every element in the array is compared with the
target element until a match is found, or the end of the array is
reached.
• However, this can be slow, especially for large arrays.
• Sentinel Search introduces a small optimization by placing an
additional "sentinel" element at the end of the array, which is equal
to the target element being searched for.
• This ensures that the search will always find a match, thus
eliminating the need to check the array bounds on every iteration,
which can slightly improve performance.
Sentinel Search
1. Add a Sentinel: The target element is temporarily placed
at the end of the array as a sentinel.
2. Search: The algorithm performs a linear search from the
start of the array.
3. Check Sentinel: When the target is found, the algorithm
checks whether the found position is within the original
array bounds. If it's the sentinel at the end, it means the
target was not in the array.
4. Restore Array: The sentinel element is removed, restoring
the array to its original state.
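
A runnable Python sketch of these four steps (the list is temporarily
extended with the sentinel and restored afterwards; names are illustrative):

def sentinel_search(arr, target):
    """Linear search with a sentinel; return the index of target or -1."""
    n = len(arr)
    arr.append(target)          # Step 1: add the sentinel at the end
    i = 0
    while arr[i] != target:     # Step 2: no bounds check per iteration
        i += 1
    arr.pop()                   # Step 4: restore the original array
    return i if i < n else -1   # Step 3: hitting the sentinel means "not found"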
Sentinel Search
Let's consider an array arr = [10, 20, 30, 40, 50] and we want to
find the element 30.
Step 1: Add Sentinel: The array becomes [10, 20, 30, 40, 50, 30]
with 30 added as the sentinel at the end.
Step 2: Search: Start searching from the first element:
Compare 10 with 30 -> No match
Compare 20 with 30 -> No match
Compare 30 with 30 -> Match found
Step 3: Check Sentinel: The algorithm checks if the found position
is within the bounds of the original array. Since it is, the search
is successful, and the index 2 is returned.
Step 4: Restore Array: The sentinel is removed, and the array is
back to [10, 20, 30, 40, 50].
Sentinel Search: Pros

1. Slightly Faster: It reduces the number of comparisons by
avoiding the need to check array bounds in each iteration.
2. Simple to Implement: It's a straightforward
enhancement of linear search with minimal additional
logic.
3. No Extra Space: Unlike some other search algorithms, it
doesn't require significant additional memory.
Sentinel Search: Cons
1. Limited Improvement: The performance gain is marginal,
especially with small arrays or when the target is near
the start of the array.
2. Array Modification: The need to temporarily modify the
array to add the sentinel might not be desirable in
certain scenarios, especially with large or immutable
arrays.
3. Not Suitable for All Data Structures: Sentinel search is
primarily applicable to arrays or similar data structures
and isn't useful for linked lists or other non-contiguous
memory structures.
Fibonacci Search

• Fibonacci search is a method of searching a sorted array
using a divide-and-conquer algorithm that narrows down
possible locations with the aid of Fibonacci numbers.
• Compared to binary search where the sorted array is divided
into two equal-sized parts, one of which is examined further,
Fibonacci search divides the array into two parts that have
sizes that are consecutive Fibonacci numbers.
Fibonacci Search

Step 1
• The size of the input array is 10. The smallest Fibonacci number greater than or equal to 10 is 13.
• Therefore, F(k) = 13, F(k-1) = 8, F(k-2) = 5.
• We initialize offset = -1.

Fibonacci Sequence: 0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89…

Step 2
• In the first iteration, compare the key with the element at
i = minimum(offset + F(k-2), n – 1) = minimum(-1 + 5, 9) = minimum(4, 9) = 4.
• The element at index 4 of the array is 20, which is not a match and is less than the key.

Step 3
• Since key > A[i]:
F(k) = F(k-1) = 8, offset = i = 4
i = minimum(offset + F(k-2), n – 1) = minimum(4 + 3, 9) = minimum(7, 9) = 7.
• The element at index 7 of the array is 43, which is not a match and is greater than the key.

Step 4
• Since key < A[i]:
F(k) = F(k-2) = 3
i = minimum(offset + F(k-2), n – 1) = minimum(4 + 1, 9) = minimum(5, 9) = 5.
• The element at index 5 of the array is 24, which is our key element, so
index 5 is returned as the output for this example array.
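
A runnable Python sketch of this procedure, following the offset and F(k)
bookkeeping used in the example (a standard formulation; the slides do not
give full pseudocode):

def fibonacci_search(arr, key):
    """Search the sorted list arr for key; return its index or -1."""
    n = len(arr)
    fib2, fib1 = 0, 1              # F(k-2), F(k-1)
    fib = fib2 + fib1              # F(k)
    while fib < n:                 # smallest Fibonacci number >= n
        fib2, fib1 = fib1, fib
        fib = fib2 + fib1
    offset = -1
    while fib > 1:
        i = min(offset + fib2, n - 1)
        if arr[i] < key:           # key lies after i: drop the front part
            fib = fib1             # move down one Fibonacci number
            fib1 = fib2
            fib2 = fib - fib1
            offset = i
        elif arr[i] > key:         # key lies before i: drop the rear part
            fib = fib2             # move down two Fibonacci numbers
            fib1 = fib1 - fib2
            fib2 = fib - fib1
        else:
            return i
    if fib1 and offset + 1 < n and arr[offset + 1] == key:
        return offset + 1          # final single-element check
    return -1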
Fibonacci Search

Time Complexity
• Best Case: O(1), when the target element is at the Fibonacci
index being checked.
• Average and Worst Case: O(log n), similar to Binary Search, as
Fibonacci numbers grow exponentially and the array is
divided logarithmically.
Fibonacci Search: Pros

Efficient for Large Arrays: Like Binary Search, it works well for
large sorted arrays, offering logarithmic time complexity.
Less Dependent on Uniform Distribution: Fibonacci Search
doesn't require the array to be evenly distributed, making it
more adaptable than Interpolation Search.
Better for Unknown or Large Array Sizes: It doesn’t require
knowledge of the array size in advance, which can be an
advantage in some scenarios.
Fibonacci Search: Cons

Requires Sorted Array: It only works on sorted arrays.

More Complex than Binary Search: Although it provides similar
performance to Binary Search, Fibonacci Search is slightly more
complex to implement.
Less Known: Due to its complexity and the fact that it provides
only marginal benefits over Binary Search in most cases, it is
less commonly used.
Sorting

• “Sorting is the process of ordering a list of elements in either
ascending or descending order.”
• Sorting is the operation of arranging the records of a table
according to the key value of each record, or it can be defined
as the process of converting an unordered set of elements into
an ordered set of elements.
• Sorting is a process of organizing data in a certain order to
help retrieve it more efficiently.
Sorting: Types
• Any sort algorithm that uses main memory exclusively
during the sorting is called an internal sort algorithm.
• Internal sorting is faster than external sorting
• Internal Sorting techniques :
1. Bubble sort
2. Selection sort
3. Insertion sort
4. Quick sort
5. Shell sort
6. Heap sort
7. Radix sort
8. Bucket sort
Sorting: Types
• Any sort algorithm that uses external memory, such as tape or
disk, during the sorting is called an external sort algorithm.
• Merge sort is an external sorting algorithm

• An in-place sorting algorithm is a sorting algorithm that sorts
the input array in place, without using any additional memory.
This means that the algorithm does not need to create a new
array to store the sorted elements. In-place sorting algorithms
are often used in applications where memory is limited.
Examples?
• A sorting algorithm is said to be stable if two objects with equal
keys appear in the same order in sorted output as they appear in
the input array to be sorted (original position).
• Stable sorting algorithms are: Insertion sort, Merge Sort, Bubble Sort, etc.
• Unstable sorting algorithms are: Selection Sort, Quick Sort, Heap Sort
Bubble Sort
• Bubble sort is a simple sorting algorithm.
• This sorting algorithm is a comparison-based algorithm in
which each pair of adjacent elements is compared, and the
elements are swapped if they are not in order.
• This algorithm is not suitable for large data sets, as its
average and worst-case complexity are O(n^2), where n is
the number of items.
• How does Bubble Sort work?
• Bubble sort starts with the first two elements and compares
them to check which one is greater, swapping them if needed;
it then moves one position ahead and repeats to the end of the list.
Bubble Sort: Pseudocode

DEMO:
https://www.w3schools.com/dsa/dsa_algo_bubblesort.php
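
The pseudocode itself is behind the demo link; as a reference, a minimal
Python sketch of the algorithm described above:

def bubble_sort(arr):
    """Sort arr in place by swapping adjacent out-of-order pairs."""
    n = len(arr)
    for i in range(n - 1):
        swapped = False
        for j in range(n - 1 - i):       # last i elements are already placed
            if arr[j] > arr[j + 1]:
                arr[j], arr[j + 1] = arr[j + 1], arr[j]
                swapped = True
        if not swapped:                  # no swaps: array already sorted
            break
    return arr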
Bubble Sort: Properties

• Stable
• In-place
• Internal sort
• Time complexity: O(n^2)
Bubble Sort: Example
• Given the following array, sort it using bubble sort and show
the array contents after each pass.
Selection Sort
• Selection sort repeatedly selects the smallest element from the
unsorted portion of the list and swaps it with the first unsorted
element, growing the sorted portion by one element per pass.

Demo:
https://www.w3schools.com/dsa/dsa_algo_selectionsort.php
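
As with bubble sort, the slides defer to the demo; a minimal Python sketch
of selection sort for reference:

def selection_sort(arr):
    """Sort arr in place by repeatedly selecting the minimum element."""
    n = len(arr)
    for i in range(n - 1):
        min_index = i
        for j in range(i + 1, n):        # find the minimum of the unsorted part
            if arr[j] < arr[min_index]:
                min_index = j
        arr[i], arr[min_index] = arr[min_index], arr[i]  # move it into place
    return arr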
Selection Sort: Example
• Given the following array, sort it using selection sort and
show the array contents after each pass.
Selection Sort
• Time complexity: O(n^2)
• Properties?
Insertion Sort
• Insertion sort is a simple sorting algorithm that works by
iteratively inserting each element of an unsorted list into its
correct position in a sorted portion of the list.
• It is a stable sorting algorithm, meaning that elements with
equal values maintain their relative order in the sorted
output.
• It is considered an “in-place” sorting algorithm, meaning it
doesn’t require any additional memory space beyond the
original array.
Insertion Sort
STEPS
1. We start with second element of the array as first element
in the array is assumed to be sorted.
2. Compare the second element with the first element; if
the second element is smaller, swap them.
3. Move to the third element and compare it with the second
element, then the first element and swap as necessary to
put it in the correct position among the first three
elements.
4. Continue this process, comparing each element with the
ones before it and swapping as needed to place it in the
correct position among the sorted elements.
5. Repeat until the entire array is sorted.
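
A runnable Python sketch of the steps above (it shifts elements instead of
repeatedly swapping, which is the usual refinement of step 4):

def insertion_sort(arr):
    """Sort arr in place by inserting each element into the sorted prefix."""
    for i in range(1, len(arr)):         # element 0 is trivially sorted
        key = arr[i]
        j = i - 1
        while j >= 0 and arr[j] > key:   # shift larger elements to the right
            arr[j + 1] = arr[j]
            j -= 1
        arr[j + 1] = key                 # insert at its correct position
    return arr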
Insertion Sort: Cards Example
Shell Sort
• Shell sort is mainly a variation of Insertion Sort.
• In insertion sort, we move elements only one position ahead.
When an element has to be moved far ahead, many
movements are involved.
• The idea of Shell Sort is to allow the exchange of far items.
• In Shell sort, we make the array h-sorted for a large value of
h.
• We keep reducing the value of h until it becomes 1.
• An array is said to be h-sorted if all sublists of every h-th
element are sorted.
Shell Sort
Step 1 − Start
Step 2 − Initialize the gap size, say h.
Step 3 − Divide the list into sub-lists, each consisting of
elements at interval h.
Step 4 − Sort these sub-lists using insertion sort.
Step 5 − Reduce h and repeat from Step 3 until the list is sorted.
Step 6 − Print the sorted list.
Step 7 − Stop.
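
A minimal Python sketch of this procedure, assuming the common gap
sequence h = n/2, n/4, ..., 1 (the slides leave the sequence open):

def shell_sort(arr):
    """Sort arr in place with gapped insertion sort and shrinking gaps."""
    n = len(arr)
    gap = n // 2
    while gap > 0:                       # Steps 2-5: shrink the gap to 1
        for i in range(gap, n):          # gapped insertion sort
            key = arr[i]
            j = i
            while j >= gap and arr[j - gap] > key:
                arr[j] = arr[j - gap]    # shift within the h-sorted sublist
                j -= gap
            arr[j] = key
        gap //= 2
    return arr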
Shell Sort
The interval between the elements is reduced based on the
sequence used. Some commonly used gap sequences in the
shell sort algorithm are:
• Shell's original sequence: N/2, N/4, ..., 1
• Knuth's increments: 1, 4, 13, ..., (3^k – 1)/2
• Hibbard's increments: 1, 3, 7, 15, ..., 2^k – 1
• Sedgewick's increments: 1, 8, 23, 77, ...
Shell Sort: Example
Shell Sort: Analysis
1. Best Case:
• Time Complexity: O(n log n)
• Explanation:
• The best case occurs when the array is already sorted, or nearly sorted.
• In this situation, the algorithm requires fewer comparisons and shifts as the elements
are already in their correct positions.
• The insertion sort portion of Shell Sort will have a minimal number of operations, leading
to a logarithmic number of gap comparisons.
Shell Sort: Analysis
2. Average Case:
• Time Complexity: O(n^(3/2))
• Explanation:
• On average, the algorithm will perform more operations than in the best case but fewer than in the
worst case.
• The average case performance is generally better than quadratic time complexity (i.e., O(n^2)) because
the initial large gaps allow for efficient element movement early in the sorting process.
3. Worst Case:
• Time Complexity: O(n^2)
• Explanation:
• The worst case occurs in situations where the gap sequence doesn't adequately address the ordering of
elements, leading to many comparisons and shifts.
• An example of a worst-case scenario could be when the elements are in reverse order or in some specific
patterns that result in poor performance.
• In this case, the algorithm performs like an insertion sort with a time complexity of O(n^2), as the
smaller gaps (especially the final gap of 1) need to sort nearly the entire array.
Shell Sort: Advantages

Efficiency: Shell sort has faster average-case
performance than many other sorting algorithms such
as bubble sort and selection sort.

In-place sorting: Shell sort can be implemented as an
in-place sorting algorithm, meaning that it does not
require additional memory space for sorting.

Adaptive: Shell sort is adaptive in nature, meaning that
it can adjust its performance depending on the input
data. For example, if the input data is almost sorted, the
algorithm will perform faster than if the data is
completely unsorted.
Shell Sort: Disadvantages

Complexity: Shell sort has a more complex
implementation compared to simpler sorting
algorithms such as bubble sort and selection sort.

Not stable: Shell sort is not a stable sorting
algorithm, meaning that it does not preserve the
order of equal elements in the input data.

Non-optimal worst-case performance: While Shell
sort performs better than many sorting algorithms
on average, its worst-case performance is still
O(n^2), which is not optimal.
Quick Sort
• Quick sort is a divide-and-conquer algorithm: it picks an element
as a pivot and partitions the array around it, so that smaller
elements come before the pivot and larger elements come after it.
• The two partitions are then sorted recursively; the pivot already
sits in its final position, so no merge step is needed.
Quick Sort: Analysis
1. Best Case:
• Time Complexity: O(n log n)
• Explanation:
• The best case occurs when the pivot element is always the median or close to the median. This results in
the array being evenly split into two nearly equal halves at each recursive step.
• As a result, the depth of the recursion tree is log n, and each level of the tree performs O(n) work, leading
to an overall time complexity of O(n log n).
2. Average Case:
• Time Complexity: O(n log n)
• Explanation:
• The average case assumes that the pivot element is not consistently the best or worst, but rather, it
splits the array into fairly even parts more often than not.
• The depth of the recursion tree in the average case is also log n, with each level requiring O(n) work,
resulting in an average-case time complexity of O(n log n).
• This performance is observed in randomly ordered arrays, which is why Quick Sort is often chosen as the
default sorting algorithm in many standard libraries.
Quick Sort: Analysis
3. Worst Case:
• Time Complexity: O(n^2)
• Explanation:
• The worst case occurs when the pivot element is consistently the smallest or largest
element in the array, leading to highly unbalanced partitions.
• For example, if the array is already sorted (or reverse sorted) and the first or last element
is always chosen as the pivot, the algorithm will create partitions of size n−1 and 0 at
each step.
• This results in a recursion tree with depth n (linear), and each level performs O(n) work,
leading to a worst-case time complexity of O(n^2).
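
Since the slides show Quick Sort only in diagrams, here is a minimal Python
sketch using the Lomuto partition scheme with the last element as pivot
(one common choice; the slides do not fix a partition scheme):

def quick_sort(arr, low=0, high=None):
    """Sort arr in place between indices low and high (inclusive)."""
    if high is None:
        high = len(arr) - 1
    if low < high:
        p = partition(arr, low, high)    # pivot lands at its final index p
        quick_sort(arr, low, p - 1)      # sort the left partition
        quick_sort(arr, p + 1, high)     # sort the right partition
    return arr

def partition(arr, low, high):
    """Lomuto partition: place the pivot arr[high] at its final position."""
    pivot = arr[high]
    i = low - 1                          # boundary of the "smaller" region
    for j in range(low, high):
        if arr[j] <= pivot:
            i += 1
            arr[i], arr[j] = arr[j], arr[i]
    arr[i + 1], arr[high] = arr[high], arr[i + 1]
    return i + 1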
Advantages of Quick Sort:

It is a divide-and-conquer algorithm, which makes it
easier to solve problems.

It is efficient on large data sets.

It has a low overhead, as it only requires a small amount
of memory to function.

It is cache friendly, as we work on the same array to sort
and do not copy data to any auxiliary array.

It is the fastest general-purpose algorithm for large data
when stability is not required.

It is tail recursive, and hence all the tail call
optimization can be done.
Disadvantages of Quick Sort:

It has a worst-case time complexity of O(n^2),
which occurs when the pivot is chosen poorly.

It is not a good choice for small data sets.

It is not a stable sort, meaning that if two elements
have the same key, their relative order will not be
preserved in the sorted output, because elements are
swapped according to the pivot’s position (without
considering their original positions).
Merge Sort
• Merge sort is a sorting technique based on the divide-and-conquer technique.
With average-case and worst-case time complexity of O(n log n), it is one
of the most respected algorithms.
• The process of merge sort is to divide the array into two halves, sort each
half, and then merge the sorted halves back together. This process is
repeated until the entire array is sorted.
• Divide: Divide the list or array recursively into two halves until it can
no longer be divided.
• Conquer: Each subarray is sorted individually using the merge sort algorithm.
• Merge: The sorted subarrays are merged back together in sorted order. The
process continues until all elements from both subarrays have been merged.
Merge Sort
MERGE_SORT(arr, beg, end)
    if beg < end
        set mid = (beg + end) / 2
        MERGE_SORT(arr, beg, mid)
        MERGE_SORT(arr, mid + 1, end)
        MERGE(arr, beg, mid, end)
    end of if
END MERGE_SORT
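
A runnable Python version of this pseudocode, with the MERGE step written
out (the slides leave it implicit):

def merge_sort(arr, beg=0, end=None):
    """Recursively sort arr[beg..end] in place."""
    if end is None:
        end = len(arr) - 1
    if beg < end:
        mid = (beg + end) // 2
        merge_sort(arr, beg, mid)
        merge_sort(arr, mid + 1, end)
        merge(arr, beg, mid, end)
    return arr

def merge(arr, beg, mid, end):
    """Merge the sorted runs arr[beg..mid] and arr[mid+1..end]."""
    left = arr[beg:mid + 1]
    right = arr[mid + 1:end + 1]
    i = j = 0
    k = beg
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:          # <= keeps the sort stable
            arr[k] = left[i]
            i += 1
        else:
            arr[k] = right[j]
            j += 1
        k += 1
    arr[k:end + 1] = left[i:] + right[j:]    # copy any leftovers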
Merge Sort: Analysis

• Time Complexity:
• Best Case: O(n log n), when the array is already sorted or nearly sorted.
• Average Case: O(n log n), when the array is randomly ordered.
• Worst Case: O(n log n), when the array is sorted in reverse order.
• Auxiliary Space: O(n), Additional space is required for the temporary
array used during merging.
Advantages of Merge Sort:

Stability: Merge sort is a stable sorting algorithm,
which means it maintains the relative order of equal
elements in the input array.

Guaranteed worst-case performance: Merge sort has
a worst-case time complexity of O(N log N), which
means it performs well even on large datasets.

Simple to implement: The divide-and-conquer
approach is straightforward.

Naturally parallel: We independently merge subarrays,
which makes it suitable for parallel processing.
Disadvantages of Merge Sort:

Space complexity: Merge sort requires additional
memory to store the merged sub-arrays during the
sorting process.

Not in-place: Merge sort is not an in-place sorting
algorithm, which means it requires additional memory
to store the sorted data. This can be a disadvantage in
applications where memory usage is a concern.

Slower than Quick Sort in general: Quick Sort is more
cache friendly because it works in place.
Radix Sort
• Radix Sort is a linear sorting algorithm that sorts elements
by processing them digit by digit.
• It is an efficient sorting algorithm for integers or strings with
fixed-size keys.
• Rather than comparing elements directly, Radix Sort
distributes the elements into buckets based on each digit’s
value.
• By repeatedly sorting the elements by their significant digits,
from the least significant to the most significant, Radix Sort
achieves the final sorted order.
• Radix sort is a non-comparative sorting algorithm. It has
linear time complexity, which is better than the O(n log n)
complexity of comparison-based algorithms.
Radix Sort: Algorithm
radixSort(arr)
    max = largest element in the given array
    d = number of digits in max
    create 10 buckets, one for each digit 0 - 9
    for i -> 1 to d
        sort the array elements using counting sort (or any stable
        sort) according to the digits at the i-th place
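
A runnable Python sketch of this algorithm for non-negative integers, using
a stable counting sort on each digit (helper names are illustrative):

def radix_sort(arr):
    """Sort non-negative integers digit by digit, least significant first."""
    if not arr:
        return arr
    exp = 1                              # 1 = ones place, 10 = tens place, ...
    while max(arr) // exp > 0:
        arr = counting_sort_by_digit(arr, exp)
        exp *= 10
    return arr

def counting_sort_by_digit(arr, exp):
    """Stable counting sort of arr keyed on the digit at place exp."""
    count = [0] * 10
    for value in arr:
        count[(value // exp) % 10] += 1
    for d in range(1, 10):               # prefix sums give final positions
        count[d] += count[d - 1]
    output = [0] * len(arr)
    for value in reversed(arr):          # reverse pass preserves stability
        digit = (value // exp) % 10
        count[digit] -= 1
        output[count[digit]] = value
    return output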
Radix Sort: Example
Pass 1 (ones digit), Pass 2 (tens digit), Pass 3 (hundreds digit) →
final sorted array.
Radix Sort: Complexity
• Best Case Complexity –
It occurs when there is no sorting required, i.e., the array is
already sorted. The best-case time complexity of Radix sort is
Ω(n+k).
• Average Case Complexity –
It occurs when the array elements are in jumbled order, neither
properly ascending nor properly descending. The
average-case time complexity of Radix sort is θ(nk).
• Worst Case Complexity –
It occurs when the array elements need to be sorted in
reverse order; that is, you have to sort the elements in
ascending order, but they appear in descending order.
The worst-case time complexity of Radix sort is O(nk).
Counting sort
• Counting Sort is a non-comparison-based sorting
algorithm.
• It is particularly efficient when the range of input
values is small compared to the number of elements to
be sorted.
• The basic idea behind Counting Sort is to count the
frequency of each distinct element in the input array
and use that information to place the elements in their
correct sorted positions.
Counting sort: Algorithm
1. Declare an auxiliary array countArray[] of
size max(inputArray[]) + 1 and initialize it with 0.
2. Traverse array inputArray[] and map each element of inputArray[] as
an index of countArray[],
i.e., execute countArray[inputArray[i]]++ for 0 <= i < N.
3. Calculate the prefix sum at every index of countArray[].
4. Create an array outputArray[] of size N.
5. Traverse array inputArray[] from the end and update
outputArray[countArray[inputArray[i]] – 1] = inputArray[i]. Also,
decrement countArray[inputArray[i]] by 1.
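
The five steps as runnable Python for non-negative integers (names follow
the slide’s countArray and outputArray):

def counting_sort(input_array):
    """Stable counting sort for non-negative integers."""
    n = len(input_array)
    if n == 0:
        return []
    count_array = [0] * (max(input_array) + 1)   # Step 1
    for value in input_array:                    # Step 2: count frequencies
        count_array[value] += 1
    for i in range(1, len(count_array)):         # Step 3: prefix sums
        count_array[i] += count_array[i - 1]
    output_array = [0] * n                       # Step 4
    for value in reversed(input_array):          # Step 5: place from the end
        count_array[value] -= 1
        output_array[count_array[value]] = value
    return output_array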
Counting sort: Example
Bucket sort
• Bucket sort is a sorting technique that involves dividing
elements into various groups, or buckets, formed by
uniformly distributing the elements.
• Once the elements are divided into buckets, they can be
sorted using any other sorting algorithm (scatter-gather
approach)
• Especially used in the case of floating-point numbers or
when the input is uniformly distributed
• Finally, the sorted elements are gathered in an ordered
fashion.
Bucket sort: Algorithm
Bucket Sort(A[])        // assumes inputs in the range 0 <= A[i] < 1
1. n = length[A]
2. Let B[0....n-1] be a new array
3. for i = 0 to n-1
4.     make B[i] an empty list
5. for i = 1 to n
6.     insert A[i] into list B[floor(n * A[i])]
7. for i = 0 to n-1
8.     sort list B[i] with insertion sort
9. Concatenate lists B[0], B[1], ..., B[n-1] together in order
End
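
A runnable Python sketch for inputs uniformly distributed in [0, 1),
mirroring the pseudocode above (Python's built-in sort stands in for the
per-bucket insertion sort):

def bucket_sort(values):
    """Sort floats in [0, 1) by scattering into buckets, then gathering."""
    n = len(values)
    if n == 0:
        return []
    buckets = [[] for _ in range(n)]     # lines 1-4: n empty buckets
    for x in values:                     # lines 5-6: scatter
        buckets[int(n * x)].append(x)
    for bucket in buckets:               # lines 7-8: sort each bucket
        bucket.sort()
    result = []                          # line 9: gather in order
    for bucket in buckets:
        result.extend(bucket)
    return result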
Bucket sort: Example
Comparative Analysis
