Sorting and Searching
4.1. Sorting
Arranging a set of values according to some order relation is called sorting.
Throughout this section, use the following declarations and definitions as a general setting, together with these general assumptions:
1. The list is to be sorted in ascending order.
2. The list is merely a list of keys, not of records containing fields and other associated data.
#include <iostream>          // needed for cout and cin in the routines below
using namespace std;

#define MAX 40
typedef bool boolean;        // C++ provides bool; boolean is kept as an alias for the code below

void swap(int &x,int &y)
{
    int temp;
    temp=x;
    x=y;
    y=temp;
}
a) Bubble Sort
Idea
Repeatedly compare adjacent elements and swap them whenever they are out of order, so that on each pass the largest remaining element moves to the end of the unsorted portion of the list.
Algorithm
for(i=0;i<s-1;i++)                 // outer loop: one pass per iteration
    for(j=0;j<s-i-1;j++)           // inner loop: compare adjacent pairs
        if(a[j]>a[j+1])
            swap(a[j],a[j+1]);
Big-O Analysis
If we consider the comparison at the top of the inner loop, we note that it will be executed first n-1 times, then n-2 times, and so on down to one time for the final execution of the loop. Hence, the number of comparisons will be the sum of the sequence of numbers

(n-1) + (n-2) + (n-3) + … + 1

i.e.

n-1
∑ (n-i)
i=1
From algebra we know that this sum works out to n(n-1)/2. Therefore we conclude that the bubble sort is an O(n²) algorithm.
It is worth noting that the above algorithm is data independent. This means that even when the input list is already sorted, the bubble sort still goes through all of the comparisons described above. This inefficiency can be removed so that the algorithm never keeps processing a list that is already sorted.
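One way to do this is sketched below: a Boolean flag, here called found as in the discussion that follows, records whether any swap was made during a pass, and the outer loop is stopped as soon as a complete pass makes no swaps. This is a minimal sketch under those assumptions, not the only possible formulation.

for(i=0;i<s-1;i++)
{
    found=false;                   // no swap has been made on this pass yet
    for(j=0;j<s-i-1;j++)
        if(a[j]>a[j+1])
        {
            swap(a[j],a[j+1]);
            found=true;            // a swap occurred, so another pass may be needed
        }
    if(!found)                     // the pass made no swaps: the list is already sorted
        break;                     // short-circuit the outer loop
}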
As you can see from the above algorithm, by introducing the Boolean variable found to short-circuit the outer loop we can avoid potentially many unnecessary comparisons if the algorithm receives a list which is already sorted. This reduces the number of comparisons required by bubble sort to O(n), but only in the best case, when the data is nearly ordered to start with. In the worst case this technique will in fact make the algorithm slightly less efficient, since the outer loop is never short-circuited and the Boolean variable must now be tested in addition to the other comparisons being made. Thus, regardless of the tricks we may introduce, we cannot claim with any certainty that bubble sort is better than an O(n²) algorithm for random data.
b) Insertion Sort
Idea
On the ith pass, the element a[i] is inserted into its proper place among a[0], a[1], …, a[i-1], so that the first i+1 elements are sorted with respect to each other.
Ex: Given: 9 3 7 2 8

i=1: j=1: 3 < 9, swap → 3 9 7 2 8; j=0: stop
i=2: j=2: 7 < 9, swap → 3 7 9 2 8; j=1: 7 > 3, done
i=3: j=3: 2 < 9, swap → 3 7 2 9 8; j=2: 2 < 7, swap → 3 2 7 9 8; j=1: 2 < 3, swap → 2 3 7 9 8; j=0: stop
i=4: j=4: 8 < 9, swap → 2 3 7 8 9; j=3: 8 > 7, done
Sorted list: 2 3 7 8 9
Algorithm
for(i=1;i<s;i++)
{
    j=i;
    done=false;
    while((j>=1)&&(!done))
    {
        if(a[j]<a[j-1])
        {
            swap(a[j],a[j-1]);
            j--;
        }
        else
            done=true;
    }
}
Big-O Analysis
The loop structure of the insertion sort is as follows
for(i=1;i<s;i++)              // outer loop
{
    j=i;
    done=false;
    while((j>=1)&&(!done))    // inner loop
        if(a[j]<a[j-1])
            … (swap)
Here, if the inner loop is never short-circuited by the Boolean variable done, the comparison appearing as its first statement will be executed once for the first execution of the outer loop, then twice, and so on, reaching n-1 executions on the final pass. Hence we have a situation virtually identical to our preliminary analysis of the bubble sort. That is, the number of comparisons can be bounded by n²/2 and the algorithm is therefore O(n²). Of course, with the insertion sort, the hope is that setting the Boolean variable done in the else clause will frequently reduce the number of comparisons made by the inner loop. However, there can be many data sets for which this has little or no effect. Hence, as with bubble sort, we are forced to conclude that insertion sort cannot guarantee better than O(n²) comparisons.
c) Selection Sort
Idea
On the ith pass through the list, determine the position of the largest entry among
a[0], a[1], a[2], …, a[last]
where last = s-i-1, and then swap that entry with a[last].
Ex: Given: 9 3 7 2 8
Selection sort works as follows
i=0: last=4, maxpos=0 (largest is a[0]=9); swap a[0] and a[4] → 8 3 7 2 9
i=1: last=3, maxpos=0 (largest is a[0]=8); swap a[0] and a[3] → 2 3 7 8 9
i=2: last=2, maxpos=2 (largest is a[2]=7); swap a[2] with itself → 2 3 7 8 9
i=3: last=1, maxpos=1 (largest is a[1]=3); swap a[1] with itself → 2 3 7 8 9
Algorithm
for(i=0;i<s-1;i++)                 // outer loop: one pass per iteration
{
    last=s-i-1;
    maxpos=0;
    for(j=1;j<=last;j++)           // inner loop: locate the largest entry in a[0..last]
        if(a[j]>a[maxpos])
            maxpos=j;
    swap(a[maxpos],a[last]);       // move the largest entry to the end of the unsorted portion
}
Observe that the first time the inner loop is executed, the comparison in the if statement will be made n-1 times. Then it will be made n-2 times, n-3 times, … and finally just one time. This is precisely the way the if statement in the bubble sort is executed in repeated passes. Thus, like the bubble sort and the insertion sort, the selection sort is an O(n²) algorithm in terms of the number of comparisons. The area in which the selection sort potentially offers better efficiency is the number of interchanges of data in array locations: this is guaranteed to be O(n) because the swap in the selection sort occurs in the outer loop, so it is executed at most once per pass, whereas in the other two algorithms the swap sits inside the inner loop. This means that, in their worst cases, both of the other algorithms require O(n²) swaps as well as O(n²) comparisons.

Despite the fact that selection sort will usually be better in the number of data interchanges required to sort an array, it has a drawback not found in the other two. It is apparently impossible to short-circuit either of the nested loops in selection sort when it is given a list in nearly sorted order. So, for such data sets, the selection sort may be an order of magnitude worse than the other two.
Our Big-O analysis indicates that there is little to choose among the bubble sort, insertion sort and selection sort algorithms. But the fact that we were able to reach such a conclusion systematically is significant in itself, for it shows how informative the results of a Big-O analysis can be. After all, even knowledge of this negative variety can be valuable in choosing appropriate algorithms under certain circumstances. For instance, if a particular application usually involves adding a small amount of data at the end of an already sorted list and then re-sorting, we now know to avoid selection sort.
4.2. Searching
Most software systems make extensive use of algorithms that find a particular data item in a large collection of data items. Such algorithms are called searching algorithms. Several searching algorithms have already been designed, and one can select among them for a particular application. Here two simple algorithms will be discussed: sequential search and binary search.
a) Sequential Search
Idea
Pass through the list sequentially, comparing each item with the target, until the required item is found or the end of the list is reached.
Algorithm
void sequential_search(int a[],int s, int k)
{
    int i=0;
    boolean found=false;
    while((i<s) && (!found))
        if(a[i]==k)
            found=true;
        else
            i++;
    if(found)
        cout<<"\nThe element "<<k<<" is at the location "<<i<<" of the array";
    else
        cout<<"\nThere is no such array element";
}
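A small driver along the following lines (assuming the declarations given at the start of the section) could be used to exercise the function; the sample values are illustrative only.

int main()
{
    int data[MAX]={3,9,2,7,8};      // only the first 5 entries are used here
    sequential_search(data,5,7);    // reports the location of 7
    sequential_search(data,5,4);    // reports that 4 is not in the list
    return 0;
}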
Big-O Analysis
If the item being sought is at the end of the list or does not exist in the list, we have to pass through all the items before we can reach a conclusion. Therefore, we make n comparisons between the item being sought and the items in the list. Hence, in terms of the Big-O classification, it is an O(n) algorithm. But since searching is conceptually a much simpler task than sorting, an O(n) search is still regarded as inefficient, and we can often do better.
b) Binary Search
Idea
Repeatedly divide the remaining portion of the list in half and identify the half in which the target element could lie.
Conditions
- The list to be searched must be physically sorted
- The number of elements must be known before the search starts
- The elements of the list must be accessible at random by their relative position, i.e. the list must be stored in an array and not in a linked list.
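A minimal sketch of the idea, written in the same style as sequential_search above, is given below; the function name binary_search and its parameters are illustrative rather than prescribed.

void binary_search(int a[],int s,int k)
{
    int low=0, high=s-1, mid=0;
    boolean found=false;
    while((low<=high) && (!found))
    {
        mid=(low+high)/2;          // index of the middle element of the remaining portion
        if(a[mid]==k)
            found=true;
        else if(a[mid]<k)
            low=mid+1;             // the target can only lie in the upper half
        else
            high=mid-1;            // the target can only lie in the lower half
    }
    if(found)
        cout<<"\nThe element "<<k<<" is at the location "<<mid<<" of the array";
    else
        cout<<"\nThere is no such array element";
}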
Big-O Analysis
In binary search the list to be searched is continually halved. For a list of n items, the
maximum number of times we would cut the list in half before finding the target item or declaring
the search unsuccessful is
⌊log₂n⌋ + 1
Thus, the binary search is an O(log₂n) algorithm. For example, for a list of 1,000,000 items at most ⌊log₂1,000,000⌋ + 1 = 20 halvings are needed before the search terminates.
Assignment 1
Write a menu-driven program for testing the various simple sorting and searching methods
you have just studied. One option is to read a list of values into an array. Other options will be to
run one of the various sorting and searching methods on the list, to print the unsorted and the sorted
list, and to quit. After the list is sorted/searched and (perhaps) printed, it should be discarded so
that testing a later sorting/searching method will start with the same input data. This can be done
either by copying the unsorted list to a second array and sorting/searching that one, or by
rearranging the program so that it reads the data again each time before it starts sorting/searching.
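One possible outline for the menu loop is sketched below; the option numbering, the array names and the copy-before-each-run step are only suggestions, and the calls to the individual routines are left as stubs.

int main()
{
    int original[MAX], working[MAX];
    int n=0, choice, i;
    do
    {
        cout<<"\n1. Read list  2. Bubble sort  3. Insertion sort  4. Selection sort"
            <<"\n5. Sequential search  6. Binary search  7. Print lists  0. Quit\nChoice: ";
        cin>>choice;
        for(i=0;i<n;i++)            // start every sort/search from the same unsorted data
            working[i]=original[i];
        // ... call the routine selected by choice on working[] ...
    } while(choice!=0);
    return 0;
}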