Sorting Proj Details

Download as pdf or txt
Download as pdf or txt
You are on page 1of 6

PROGRAM OVERVIEW

For this project, you will implement and compare different methods for finding the median of an unsorted
list.

To avoid unimportant yet tedious issues in certain algorithms, we'll consider the median of a list with
an even number of elements to be the lesser of the 2 middle elements. So in a sorted list of 5 or 6
elements, the 3rd will be the median. In a sorted list of 7 or 8 elements, the 4th will be the median, etc.

All methods will be implemented using std::vector.

All except heap sort will be implemented using iterators.

All except one merge sort and the recursive calls in the median of medians and will be in-place.

Details will be given for dealing with duplicates.

Most of these methods will not fully sort the list in order to find the median. To check the correctness,
each (except where noted) will print the vector to standard output at the end of the function. This should
be fine on the autograder since it will redirect the output, but you probably don't want to print anything
more than the 1000-size inputs to screen.

The 7 methods you will implement are as follows:

1) Selection sort stopped halfway through

- Iterate through the list, finding the smallest element, then 2nd smallest, and so on until you find
the median. This is essentially a selection sort, stopped halfway through. This method will take
too long on very large lists, so for inputs of size >50,000 do not use this method, and instead print
a message saying that the input was too big for selection sort.
th th
- Details: For each iteration, move the i smallest element into the i position by swapping it with
the element already there. In the case of ties for smallest, swap the first one reached (the one with
the lesser index). Your last action should be swapping the median element into its proper place.

2) std::sort

- Use std::sort to sort the list, then simply check the middle element. Note that std::sort is some n
log(n) sort, typically introsort, though exact implementation varies by compiler.
- Details: You get easy points for calling a library function.

3) Merge Sort

- You will implement the recursive part of the merge sort, and use std::merge for the merging.
Since you can't really stop merge sort halfway through to get the median, run this sort to
completion.
- Details: Since your end result will be a fully sorted vector either way, it doesn't matter whether
the left side or right side contains the extra element when splitting an odd number.
4) In-place merge sort

- Same as above, except use std::inplace_merge.

5) Heapsort

- Implement heapsort, as laid out in the textbook. However, after building the heap, you'll only
have to remove n/2 elements from the heap rather than n, which will hopefully cut down on the
time significantly.
- Details: Before starting heapsort, take the first element in the vector (at index 0) and move it to
the back of the vector. This will leave index 0 open, as per the textbook implementation.
- For heapsort, you don't have to use iterators. Using array subscripts is more efficient since you'll
be doing a lot of arithmetic on indexes (e.g. doubling them to find the left child, halving to find
the parent).
- At the end of the sort, you should have deleted all elements smaller than but not including the
median - i.e. at the end of the sort, the median should be at the root of the remaining heap, and the
remaining heap should be about half the size of the original vector. You do not need to move
deleted elements to the back of the vector, or save them anywhere else. The vector should shrink
in size as you delete elements.
- If the hole's left child and right child are equal (and the hole's value is greater than theirs), swap it
with the left child.

6) Quickselect

- Implement a quickselect algorithm, which works like a quicksort, except that after partitioning
your array into the part that's less than your pivot and the part that's greater than your pivot, you
only recurse on one side instead of both. To pick your pivot, use the "median of 3" method, which
looks at the first, middle, and last element and picks the median of them as your pivot. Your quick
select should be in-place.
- Details: For pivot selection, use the median of 3 method. This method takes the first element, last
element, and middle element (or left-middle element if there are an even number of elements) and
chooses the median of those as the pivot. If two or three of these share the same value, make the
leftmost one your pivot.
- For partitioning, use the Hoare partition method, as described in textbook section 7.7.2. Swap
values that are equal to the pivot (this prevents turning an input of identical values into a
worst-case scenario). Consider edge cases, and make sure your iterators don't go out of bounds in
any case.
- Keep in mind for Quickselect that once the pivot is placed, you recurse on one side of the pivot,
not including the pivot (which is already in its correct position).
- HoarePartition will assume that the pivot is already in place, at the end of the subarray to be
partitioned. Thus, you must swap your pivot into position before calling hoarePartition. A helper
function for pivot selection and placement is advisable.
- Recursive base case: once your subarray size reaches 10 or less, use std::sort to sort it all rather
than recursively calling QuickSelect.
- Additionally, you'll write a separate function to generate a vector of each integer from 1-20000
that will cause the quickselect algorithm to perform terribly - worst-case or close. Test this file on
all methods, include it in your statistics, and briefly describe the approach you used at the end of
your report.

7) Median of medians (extra credit)

- Implement the full "median of medians" algorithm, which divides a list into groups of 5, finds the
median of each group, then recursively find the median of those, in order to pick a pivot for
quickselect.

Details: In separate post below

- Finally, you will time the execution of each program. Since we're using C++11, here is a
suggestion on how to time your programs.
- Each of the methods used will have its own .cpp file and header file. You will also write a
driver program which will read a text file of unsorted integers into a vector, then run each
method on a copy of that vector, finding the median and timing the algorithm. Make sure
each algorithm gets a copy of the original, and not one that's been altered.
- When you time your program, be careful - try not to include steps which are not involved
in the actual algorithm, such as reading the input from file, or even making a copy of the
input to run the algorithm on.

REQUIRED FILES AND FUNCTIONS


Each of the 7 methods used and the worst-case-quickselect-generation-function will be contained in
separate header files, each of which will have a primary function of the same name (aside from the
capitalization difference). Functions listed here are required and will be tested. You may write additional
helper functions as needed, but they will not be tested.

Each primary function will take a vector by reference containing a copy of the original input and an int,
duration, in which you will put how long the algorithm takes in milliseconds.

Each primary function returns the median.

After the algorithm, aside from returning the correct median, the vector nums is expected to match our
implementation exactly. As such, exact implementation details will be provided.

You may use the initial calls of the recursive algorithms as a wrapper for your recursive implementation
(and to time everything).
1) HalfSelectionSort.hpp

int halfSelectionSort ( std::vector<int>& nums, int& duration )

2) StandardSort.hpp

int standardSort ( std::vector<int>& nums, int& duration )

3) MergeSort.hpp

int mergeSort ( std::vector<int>& nums, int& duration )

4) InPlaceMergeSort.hpp

int inPlaceMergeSort ( std::vector<int>& nums, int& duration )

5) HalfHeapSort.hpp

int halfHeapSort ( std::vector<int>& nums, int& duration )

void percDown ( std::vector<int>& heap, std::vector<int>::size_type hole )

// parameter "hole" is the index of the hole.

// percDown precondition: value to be inserted into hole is stored in heap at index 0. The hole
itself may be in an unspecified state - i.e. it doesn't matter what's in it since you'll be overwriting it
anyway.

// percDown postcondition: hole has been moved into correct place and value has been inserted
into hole.

void buildHeap ( std::vector<int>& heap)

6) QuickSelect.hpp

int quickSelect ( std::vector<int>& nums, int& duration )

std::vector<int>::iterator hoarePartition ( std::vector<int>& nums, std::vector<int>::iterator low,


std::vector<int>::iterator high )

// hoarePartition precondition: low points to the first element in the subarray to be partitioned. The pivot
is the last element in the subarray to be partitioned, and is pointed to by high.

// hoarePartition returns an iterator to the pivot after it's placed.

Note that this implementation of hoarePartition makes it usable with different pivot selection methods, but
also requires that you select your pivot and swap it into the last position prior to calling hoarePartition.

7) WorstCaseQuickSelect.hpp
std::vector<int>& worstCaseQuickSelect (void)

// worstCaseQuickSelect generates a worst-case input for a quickselect that uses median-of-3


partitioning. The input it generates must be of length 20,000, and contain each number from 1-20000
once.

8) MedianOfMediansMethod.hpp

Update: Median of medians will be hand-graded and your vector after the algorithm finishes does not
need to exactly match my end result, but in order for your submission to qualify for grading, you must
meet the following requirements:

- You completed QuickSelect

- Your median of medians runs on the autograder and returns the correct median.

- Your time for median of medians is within the expected range (which will be very broad).

int medianOfMediansMethod ( std::vector<int>& nums, int& duration )

int medianOfMedians ( std::vector<int>& nums, std::vector<int>::iterator low, std::vector<int>::iterator


high )

// returns the median of medians of all elements contained between high and low, inclusive.

For maximum points, must also have a function called "medianOfFive" that calculates the median of 5
numbers in 6 comparisons, as we went over in class.

Note that median of medians is a pivot selection method. For partitioning, you'll use Hoare's partition
again.

MEDIAN OF MEDIANS ALGORITHMS


1) Median of medians does not have to be in-place. When you find the medians of the groups of 5
numbers, you may create a new list of them, so that you don't interfere with the main list you're working
on.
2) When dividing the list into groups of 5, and you get near the end of the list and can't form an entire
group of 5, if there are 1 or 2 numbers left over, ignore them. If there are 3, add the median of those 3 to
the list of medians. If there are 4, add the 2nd smallest value to the list of medians.

3) Finding the median of 5 should be a separate function.

4) The base case of your recursion should be when the list is down to size 24 or less. In this case, rather
than calling median of medians on it, use std::sort.

OUTPUTS
I'm providing outputs for the selection sort, heapsort, and quickSelect methods, for inputs 1 (size 1000)
and 4 (size 31k). If you implemented everything according to my specifications, your outputs should
match.

For heapsort, index 0 is not part of the heap, so it is not included in my output, nor should it be included
in yours.

For std::sort and mergesort, your vector should be fully sorted at the end, so I don't think I need to provide
those outputs.

For the median of medians, it will be hand graded if it runs, and if your quickSelect is correct.

You might also like