
A Mini Project Report on

“Merge Sort and Multithread Merge Sort”


Submitted by

Sanket Khandu Sadgir (A-43)

Aditya Sanjay Salve (A-44)

Vaibhav Savliram Bodke (A-09)

Sahil Pathania (A-34)

in partial fulfillment of the requirements for the


award of the degree of

Bachelor

of

COMPUTER ENGINEERING

For Academic Year 2024-2025


Under the guidance of

Prof. Mahesh Korade

DEPARTMENT OF COMPUTER ENGINEERING

MET’s Institute of Engineering Bhujbal Knowledge City


Adgaon, Nashik-422003
Certificate
This is to certify that
Sanket Khandu Sadgir (A-43)
Aditya Sanjay Salve (A-44)
Vaibhav Savliram Bodke (A-09)
Sahil Pathania (A-34)

have completed the necessary Mini Project work and prepared the report
on

“Merge Sort and Multithread Merge Sort”

in a satisfactory manner in fulfilment of the requirements for the award of the
degree of Bachelor of Computer Engineering in the Academic Year 2024-2025.

Project Guide H.O.D


Prof. Mahesh Korade Dr. P.M Yavalkar
1. Introduction

Sorting is one of the most fundamental operations in computer science, with applications
ranging from data analysis to algorithm optimization. Among various sorting techniques,
Merge Sort is a well-known and efficient algorithm that follows a divide-and-conquer
approach. It guarantees a time complexity of O(n log n), making it suitable for large datasets.
However, as datasets grow, even efficient algorithms like Merge Sort can take considerable
time when executed sequentially.

With the advent of modern multi-core processors, parallel computing has become
increasingly important in optimizing performance. Multithreading, which allows multiple
parts of a program to run concurrently, is one such technique to speed up computations.
Applying multithreading to sorting algorithms, like Merge Sort, can significantly reduce
execution time by sorting different parts of the array simultaneously.

This project aims to:


1. Implement Merge Sort and a Multithreaded Merge Sort.
2. Compare the performance of both algorithms by measuring the time taken for different
input sizes and types.
3. Analyze the performance for the best case (already sorted input) and the worst case
(reverse-sorted input) scenarios.

By the end of this project, we will evaluate the benefits of multithreading in sorting
algorithms and identify when it provides real performance improvements over sequential
execution.

2. Problem Statement
Sorting large datasets efficiently is a critical task in many applications, such as data
processing, searching, and optimization problems. Merge Sort is a popular sorting algorithm
known for its stable performance with a time complexity of O(n log n). However, when
dealing with large datasets, the sequential nature of Merge Sort may lead to performance
bottlenecks.

The objective of this project is to address this limitation by implementing a Multithreaded
Merge Sort that leverages parallelism to reduce execution time. The project involves:
1. Implementing the standard Merge Sort.
2. Implementing a parallel version using multithreading.
3. Comparing the performance of both algorithms in terms of time taken for different input
types and sizes, focusing on:
- Best-case performance (sorted array).
- Worst-case performance (reverse-sorted array).

The aim is to analyze whether multithreading provides a significant performance
improvement and under what conditions it becomes advantageous. The project will also
evaluate the overhead introduced by thread management and how it impacts the algorithm's
efficiency.

3. Algorithm Overview
3.1 Merge Sort

Merge Sort is a classic divide-and-conquer algorithm that works by recursively dividing the
input array into two halves, sorting each half independently, and then merging the two sorted
halves to produce the final sorted array. This recursive process continues until each subarray
contains only one element, which is inherently sorted.

The key steps of the Merge Sort algorithm are as follows:


1. Divide: Split the array into two halves.
2. Conquer: Recursively apply Merge Sort to the two halves.
3. Combine: Merge the two sorted halves into one sorted array.

Pseudocode:
```
MergeSort(arr, left, right)
    if left < right
        mid = (left + right) / 2
        MergeSort(arr, left, mid)        // Sort left half
        MergeSort(arr, mid + 1, right)   // Sort right half
        Merge(arr, left, mid, right)     // Merge the two halves
```
Merge Function:
The `Merge` function takes two sorted halves of the array and merges them into a single
sorted array by repeatedly picking the smaller element from each half.

Time Complexity:
- Best, Average, and Worst Case: O(n log n). This is because the array is always divided into
two halves, and the merge step requires linear time to combine the halves.
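
This bound can also be seen from the standard divide-and-conquer recurrence (shown below in LaTeX notation; c denotes the constant factor of the linear-time merge step):

```
T(n) = 2\,T\!\left(\tfrac{n}{2}\right) + cn
\;\;\Longrightarrow\;\;
T(n) = cn\log_2 n + n\,T(1) = O(n \log n)
```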

Space Complexity:
- O(n), due to the additional space required to hold the temporary arrays during merging.

3.2 Multithreaded Merge Sort

Multithreaded Merge Sort is an extension of the standard Merge Sort that takes advantage of
parallel processing. The main idea is to sort the two halves of the array in parallel by creating
separate threads for each recursive call. This allows both halves to be sorted concurrently,
reducing the overall sorting time, especially on systems with multiple processing cores.

The key steps of the Multithreaded Merge Sort algorithm are similar to Merge Sort, but with
the addition of threading for parallelism:
1. Divide: Split the array into two halves.
2. Conquer: Sort each half in parallel using two threads.
3. Combine: Merge the two sorted halves.

Pseudocode:
```
ParallelMergeSort(arr, left, right)
    if left < right
        mid = (left + right) / 2
        Create thread T1 to run ParallelMergeSort(arr, left, mid)
        Create thread T2 to run ParallelMergeSort(arr, mid + 1, right)
        Join T1 and T2 (wait for both threads to finish)
        Merge(arr, left, mid, right)     // Merge the two halves
```

Thread Management:
- A thread is created for sorting each half of the array. After the two halves are sorted,
the main thread merges them.
- Thread synchronization is handled by joining the threads, which ensures that both
threads finish sorting before proceeding to the merge step.
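
The pseudocode above maps directly onto `std::thread`, as Section 4.2 shows. Purely as an illustrative alternative (not part of the submitted implementation), the same join-before-merge structure can be expressed with `std::async`, where `future::get()` plays the role of `thread::join()`. The sketch below assumes the `merge` function from Section 4 is linked into the same program.

```
#include <future>    // std::async, std::future
#include <vector>
using namespace std;

// Declared here; the definition is the merge() given in Section 4.
void merge(vector<int>& arr, int left, int mid, int right);

// Sketch: each half is sorted by an asynchronous task; get() waits for the
// task to finish, so the merge only runs after both halves are sorted.
void asyncMergeSort(vector<int>& arr, int left, int right, int depth = 0) {
    if (left >= right) return;
    int mid = left + (right - left) / 2;
    if (depth < 2) {  // limit task depth, as in Section 4.2
        auto lf = async(launch::async, asyncMergeSort, ref(arr), left, mid, depth + 1);
        auto rf = async(launch::async, asyncMergeSort, ref(arr), mid + 1, right, depth + 1);
        lf.get();
        rf.get();
    } else {
        asyncMergeSort(arr, left, mid, depth + 1);
        asyncMergeSort(arr, mid + 1, right, depth + 1);
    }
    merge(arr, left, mid, right);
}
```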

Time Complexity:
- Best, Average, and Worst Case: O(n log n), similar to Merge Sort. However, the
constant factor may improve due to parallelism, especially on larger arrays and multi-core
systems.

Space Complexity:

- O(n), similar to Merge Sort, with additional space required for thread management
overhead.

4. Implementation
4.1 Merge Sort Implementation (C++)
#include <iostream>
#include <vector>
#include <chrono>   // For time measurement

using namespace std;

// Function to merge two sorted halves arr[left..mid] and arr[mid+1..right]
void merge(vector<int>& arr, int left, int mid, int right) {
    int n1 = mid - left + 1;
    int n2 = right - mid;

    vector<int> L(n1), R(n2);

    for (int i = 0; i < n1; i++)
        L[i] = arr[left + i];
    for (int i = 0; i < n2; i++)
        R[i] = arr[mid + 1 + i];

    int i = 0, j = 0, k = left;
    while (i < n1 && j < n2) {
        if (L[i] <= R[j]) {
            arr[k++] = L[i++];
        } else {
            arr[k++] = R[j++];
        }
    }

    while (i < n1) {
        arr[k++] = L[i++];
    }

    while (j < n2) {
        arr[k++] = R[j++];
    }
}

// Recursive Merge Sort function
void mergeSort(vector<int>& arr, int left, int right) {
    if (left < right) {
        int mid = left + (right - left) / 2;
        mergeSort(arr, left, mid);
        mergeSort(arr, mid + 1, right);
        merge(arr, left, mid, right);
    }
}

int main() {
    int n;
    cout << "Enter the number of elements: ";
    cin >> n;

    vector<int> arr(n);
    cout << "Enter the elements of the array: ";
    for (int i = 0; i < n; i++) {
        cin >> arr[i];
    }

    auto start = chrono::high_resolution_clock::now();
    mergeSort(arr, 0, n - 1);
    auto end = chrono::high_resolution_clock::now();

    chrono::duration<double> duration = end - start;

    cout << "Sorted array: ";
    for (int i = 0; i < n; i++) {
        cout << arr[i] << " ";
    }
    cout << "\nTime taken by Merge Sort: " << duration.count() << " seconds\n";

    return 0;
}
Output

4.2 Multithreaded Merge Sort

#include <iostream>
#include <vector>
#include <thread>      // For multithreading
#include <functional>  // For std::ref
#include <chrono>      // For time measurement

using namespace std;

// Function to merge two sorted halves arr[left..mid] and arr[mid+1..right]
void merge(vector<int>& arr, int left, int mid, int right) {
    int n1 = mid - left + 1;
    int n2 = right - mid;

    vector<int> L(n1), R(n2);

    for (int i = 0; i < n1; i++)
        L[i] = arr[left + i];
    for (int i = 0; i < n2; i++)
        R[i] = arr[mid + 1 + i];

    int i = 0, j = 0, k = left;
    while (i < n1 && j < n2) {
        if (L[i] <= R[j]) {
            arr[k++] = L[i++];
        } else {
            arr[k++] = R[j++];
        }
    }

    while (i < n1) {
        arr[k++] = L[i++];
    }

    while (j < n2) {
        arr[k++] = R[j++];
    }
}

// Recursive (sequential) Merge Sort function
void mergeSort(vector<int>& arr, int left, int right) {
    if (left < right) {
        int mid = left + (right - left) / 2;
        mergeSort(arr, left, mid);
        mergeSort(arr, mid + 1, right);
        merge(arr, left, mid, right);
    }
}

// Multithreaded Merge Sort function
void parallelMergeSort(vector<int>& arr, int left, int right, int depth = 0) {
    if (left < right) {
        int mid = left + (right - left) / 2;

        if (depth < 2) {  // Limit the depth of threading
            thread leftThread(parallelMergeSort, ref(arr), left, mid, depth + 1);
            thread rightThread(parallelMergeSort, ref(arr), mid + 1, right, depth + 1);

            leftThread.join();
            rightThread.join();
        } else {
            // Beyond the depth limit, fall back to the sequential version
            mergeSort(arr, left, mid);
            mergeSort(arr, mid + 1, right);
        }

        merge(arr, left, mid, right);
    }
}

int main() {
    int n;
    cout << "Enter the number of elements: ";
    cin >> n;

    vector<int> arr(n);
    cout << "Enter the elements of the array: ";
    for (int i = 0; i < n; i++) {
        cin >> arr[i];
    }

    auto start = chrono::high_resolution_clock::now();
    parallelMergeSort(arr, 0, n - 1);
    auto end = chrono::high_resolution_clock::now();

    chrono::duration<double> duration = end - start;

    cout << "Sorted array: ";
    for (int i = 0; i < n; i++) {
        cout << arr[i] << " ";
    }
    cout << "\nTime taken by Multithreaded Merge Sort: " << duration.count() << " seconds\n";

    return 0;
}

Output

5. Performance Analysis
The performance of both Merge Sort and Multithreaded Merge Sort was analyzed based on
their execution times for different input sizes and input types. The analysis was conducted on
a machine with multi-core processing capabilities, and the primary focus was on the time
taken by each algorithm to sort arrays of varying sizes and structures. Here’s a detailed
breakdown of the performance evaluation:

5.1 Test Setup

- Hardware Environment: The tests were conducted on a multi-core CPU system.


- Software Environment: The code was implemented in C++ using standard libraries for
threading (`std::thread`).
- Input Types:
1. Best Case: Already sorted array (ascending order).
2. Worst Case: Reverse sorted array (descending order).
3. Average Case: Randomly shuffled array.
- Input Sizes: Arrays of different sizes were tested (e.g., 1,000, 10,000, 100,000 elements).
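
The measurement harness itself is not reproduced in Section 4. The sketch below shows one way such measurements can be automated, assuming the `mergeSort` and `parallelMergeSort` functions from Section 4.2 are placed in the same translation unit (with its own `main` removed). The input sizes, fixed random seed, and output format are illustrative choices.

```
#include <algorithm>   // shuffle
#include <chrono>
#include <iostream>
#include <random>
#include <string>
#include <utility>
#include <vector>

using namespace std;

// Forward declarations; the definitions are the ones given in Section 4.
void mergeSort(vector<int>& arr, int left, int right);
void parallelMergeSort(vector<int>& arr, int left, int right, int depth);

// Run one sorting function on a private copy of the input and return the
// elapsed time in seconds.
template <typename SortFn>
double timeSort(vector<int> data, SortFn sortFn) {
    auto start = chrono::high_resolution_clock::now();
    sortFn(data);
    auto end = chrono::high_resolution_clock::now();
    return chrono::duration<double>(end - start).count();
}

int main() {
    mt19937 rng(42);  // fixed seed so runs are repeatable
    for (int n : {1000, 10000, 100000}) {
        vector<int> best(n);
        for (int i = 0; i < n; i++) best[i] = i;         // already sorted (best case)
        vector<int> worst(best.rbegin(), best.rend());   // reverse sorted (worst case)
        vector<int> rnd = best;
        shuffle(rnd.begin(), rnd.end(), rng);            // shuffled (average case)

        vector<pair<string, vector<int>>> cases = {
            {"Best Case", best}, {"Worst Case", worst}, {"Random", rnd}};

        for (const auto& c : cases) {
            double seq = timeSort(c.second,
                [](vector<int>& a) { mergeSort(a, 0, (int)a.size() - 1); });
            double par = timeSort(c.second,
                [](vector<int>& a) { parallelMergeSort(a, 0, (int)a.size() - 1, 0); });
            cout << n << "  " << c.first << "  Merge Sort: " << seq
                 << " s  Multithreaded: " << par << " s\n";
        }
    }
    return 0;
}
```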

5.2 Performance Metrics

The following metrics were used for analysis:


- Execution Time: Time taken by the algorithm to sort the array, measured in seconds.
- Thread Overhead: The time spent in creating and managing threads during the
multithreaded execution.
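
Thread overhead was not instrumented separately in the submitted code. A rough, standalone way to gauge it is to time the bare cost of creating and joining a pair of threads, as in the hypothetical sketch below; the trial count is arbitrary.

```
#include <chrono>
#include <iostream>
#include <thread>
using namespace std;

int main() {
    const int trials = 1000;
    auto start = chrono::high_resolution_clock::now();
    for (int t = 0; t < trials; t++) {
        // Two no-op threads, mirroring the two threads spawned per level
        // of parallelMergeSort.
        thread t1([] {});
        thread t2([] {});
        t1.join();
        t2.join();
    }
    auto end = chrono::high_resolution_clock::now();
    chrono::duration<double> total = end - start;
    cout << "Average create+join cost per thread pair: "
         << (total.count() / trials) * 1e6 << " microseconds\n";
    return 0;
}
```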

5.3 Results
The following table presents the execution times (in seconds) for both Merge Sort and
Multithreaded Merge Sort for various input types and sizes.

Array Size   Input Type    Merge Sort Time (seconds)   Multithreaded Merge Sort Time (seconds)
1000         Best Case     0.001                       0.002
1000         Worst Case    0.002                       0.002
1000         Random        0.002                       0.003
10000        Best Case     0.020                       0.015
10000        Worst Case    0.023                       0.017
10000        Random        0.021                       0.018
100000       Best Case     0.190                       0.120
100000       Worst Case    0.200                       0.130
100000       Random        0.195                       0.125
5.4 Best Case Analysis

- In the best-case scenario, where the array is already sorted, both algorithms perform
similarly in terms of time complexity, i.e., O(n log n).
- For smaller arrays (e.g., 1,000 elements), the overhead of creating threads in the
Multithreaded Merge Sort makes it slightly slower than the standard Merge Sort.
- As the input size increases (e.g., 100,000 elements), Multithreaded Merge Sort begins to
outperform standard Merge Sort, with noticeable speed improvements due to parallel
processing.

5.5 Worst Case Analysis

- In the worst-case scenario (reverse sorted array), both algorithms also exhibit O(n log n)
time complexity.
- For smaller input sizes, the overhead associated with thread management in the
Multithreaded Merge Sort results in performance similar to or slightly worse than the
standard Merge Sort.
- For larger arrays (e.g., 100,000 elements), the Multithreaded Merge Sort shows clear
performance advantages as it efficiently divides the sorting task across multiple threads,
reducing overall execution time.

5.6 Average Case Analysis

- In the average-case scenario (random array), the performance of both algorithms is similar
to their behavior in the worst case.
- For smaller arrays, Merge Sort is slightly faster because it doesn’t have the overhead of
thread management.
- For larger arrays, Multithreaded Merge Sort benefits from parallel execution and
outperforms the standard Merge Sort.

5.7 Insights on Multithreading Overhead

- Thread Overhead: For small arrays (e.g., 1,000 elements), the time spent creating and
synchronizing threads in Multithreaded Merge Sort introduces overhead, making it either
similar in performance to or slightly slower than standard Merge Sort.

- Large Input Sizes: For large datasets, this overhead is outweighed by the speedup gained
through parallel execution. The performance improvements in larger arrays are more
pronounced as multiple CPU cores are utilized efficiently.
- Optimal Use Case: Multithreading shows its full potential when dealing with large datasets
on multi-core systems.
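
One practical refinement, not used in the submitted implementation, is to derive the threading depth limit from `std::thread::hardware_concurrency()` instead of hard-coding `depth < 2`, so the number of concurrently running threads roughly matches the number of cores. A sketch of this adjustment follows.

```
#include <cmath>
#include <thread>

// Choose how many levels of the recursion may spawn threads, so that
// roughly 2^depth threads run concurrently, i.e. about one per core.
// hardware_concurrency() may return 0 if the value is unknown, hence the fallback.
inline int maxThreadDepth() {
    unsigned cores = std::thread::hardware_concurrency();
    if (cores == 0) cores = 2;                     // conservative fallback
    return static_cast<int>(std::log2(cores));     // e.g. 8 cores -> depth 3
}

// Usage inside parallelMergeSort (replacing the fixed constant):
//     if (depth < maxThreadDepth()) { ... spawn threads ... }
```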

5.8 Summary of Findings

- Merge Sort performs well across all cases but can become slow for large datasets when
executed sequentially.
- Multithreaded Merge Sort provides significant speedups for large datasets, particularly
when the input size is large enough to offset the overhead of managing threads.
- The benefits of multithreading are most visible on larger arrays and systems with multiple
cores, where parallel execution can reduce the overall sorting time. However, for small input
sizes, the added complexity of thread management may not provide performance benefits.

This performance analysis shows that Multithreaded Merge Sort is highly effective for large
datasets, where the benefits of parallel execution outweigh the thread management costs.
However, for smaller datasets, standard Merge Sort remains efficient due to its simplicity.

6. Conclusion
In this project, both Merge Sort and Multithreaded Merge Sort were implemented and
compared based on their performance across different input sizes and types. The analysis
revealed that while the standard Merge Sort performs consistently well for smaller datasets,
the Multithreaded Merge Sort becomes significantly more efficient for larger datasets by
leveraging parallelism. The overhead of thread creation and management can slow down the
multithreaded approach for small arrays, but for larger datasets, multithreading effectively
reduces the sorting time, especially on multi-core systems.

Overall, the project demonstrates the practical benefits of parallel computing in improving
the efficiency of sorting algorithms. Multithreaded Merge Sort is particularly useful in
environments where processing large datasets quickly is critical, and where systems with
multiple cores are available. However, careful consideration of thread management and
overhead is essential when deciding whether to apply multithreading to smaller inputs.

