Parallelization of Smith-Waterman Algorithm Using MPI
Anes Luckin, Irfan Mehanovic, Haris Spahic, Kenan Mahmutovic, Mesud Klisura
Abstract— This paper describes the implementation of a parallel version of the Smith-Waterman algorithm for sequence alignment on a small Linux-based cluster using OpenMPI. The code is written in the C language. Measurements of parallel execution time on the cluster with different numbers of processes, together with slight tuning of the algorithm, are described and compared. The main aim of these measurements is to push our cluster to its limits and to try to achieve a significant speedup of the parallel execution over the sequential one.

II. MEASURING PROCEDURES

We set up a small cluster consisting of four personal computers connected in a LAN. One of them was set up as the master node, and the other three as slaves. Each computer has a dual-core Intel i5 processor with two threads per core and 4 GB of RAM. We used Ubuntu 16.04 as the operating system. The machine configurations included Intel Core i5-4310M (2.70 GHz), i5-2520M (2.50 GHz), and i5-3337U (1.80 GHz) processors.
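For illustration, a cluster like this is typically driven from the master node using OpenMPI's hostfile mechanism; the hostnames, slot counts, and binary name below are illustrative examples, not the exact values from our setup:

# hosts: one line per node, slots = number of processes to place there
master slots=4
slave1 slots=4
slave2 slots=4
slave3 slots=4

$ mpicc -O2 smith_waterman.c -o sw      # compile with the OpenMPI wrapper
$ mpirun -np 16 --hostfile hosts ./sw   # spread 16 processes across the nodes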
IV. ANALYSIS AND RESULTS
Fig. 3. Sequential versus parallel average execution time.

We can see in Fig. 3 that both algorithms showed good performance when matching sequences that are less than 4000 elements long, whereas MPI takes a clear lead as the sequences get longer. Sequential execution on sequences of 10000 elements took 2.73 seconds, while parallel execution took 1.99 seconds, which makes a speedup of almost 30%. It is clear that, in this case, running MPI on one machine with multiple processes is better than running the sequential code.
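Spelled out from the measured times, with the speedup quoted as the relative reduction in execution time:

speedup   = T_seq / T_par = 2.73 / 1.99 ≈ 1.37
reduction = (T_seq - T_par) / T_seq = (2.73 - 1.99) / 2.73 ≈ 0.27

So the parallel run is about 1.37 times faster, and the execution time drops by roughly 27 percent, the "almost 30%" quoted above.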
The different node configurations in our cluster definitely impacted execution time, so to put that on paper we put those machines to the test: each node executed the parallel algorithm on sequences that were 10000 elements long, while we changed the number of processes in each run.
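Such a sweep is easy to script from the command line; the process counts and output file below are illustrative, not our exact procedure. Note that on a dual-core node this oversubscribes the cores, which older OpenMPI releases (such as the one shipped with Ubuntu 16.04) permit by default, while newer ones require the --oversubscribe flag:

$ for p in 40 45 50; do       # one run per process count
>     mpirun -np $p ./sw >> results.txt
> done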
Fig. 5. Average execution time for different sequence lengths (700 block size).

Fig. 6. Average execution time for different sequence lengths (800 block size).

Fig. 7. Average execution time for different sequence lengths (900 block size).

Fig. 8. Average execution time for different sequence block sizes (40 processes).

Fig. 9. Average execution time for different sequence block sizes (45 processes).

Fig. 10. Average execution time for different sequence block sizes (50 processes).

Running on more computers gave better results than running on fewer, but still worse than a single computer. As we can see in Fig. 11 below, four computers had better results than three computers as the sequence length increased.
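The block size that appears throughout Figs. 5 through 10 is the tuning knob of the parallel version. Below is a minimal sequential sketch of the Smith-Waterman scoring recurrence, assuming the common blocked decomposition in which each row is processed in tiles of B columns and, in an MPI version, each tile is a unit of work exchanged between neighboring processes. The scoring parameters, block size, and demo sequences are illustrative, not the ones used in our measurements.

#include <stdio.h>
#include <string.h>

#define MATCH     2   /* illustrative scoring parameters */
#define MISMATCH -1
#define GAP      -1
#define B       700   /* block size: columns handled as one unit of work */

static int max4(int a, int b, int c, int d)
{
    int m = a;
    if (b > m) m = b;
    if (c > m) m = c;
    if (d > m) m = d;
    return m;
}

/* Smith-Waterman local alignment score with a linear gap penalty,
 * O(n*m) time and two-row memory. The inner loop walks each row in
 * blocks of B columns: in an MPI version each block is a unit of
 * work, and a process needs the cell on its left block boundary
 * before it can start computing its own block. */
static int sw_score(const char *a, const char *b)
{
    int n = strlen(a), m = strlen(b);
    int prev[m + 1], curr[m + 1];   /* C99 VLAs, fine for short demo input */
    int best = 0;

    memset(prev, 0, sizeof prev);   /* row 0 of the DP matrix is all zero */
    for (int i = 1; i <= n; i++) {
        curr[0] = 0;                /* column 0 is all zero as well */
        for (int jb = 1; jb <= m; jb += B) {       /* block loop */
            int jend = (jb + B - 1 > m) ? m : jb + B - 1;
            for (int j = jb; j <= jend; j++) {     /* cells inside one block */
                int s = (a[i-1] == b[j-1]) ? MATCH : MISMATCH;
                curr[j] = max4(0,
                               prev[j-1] + s,      /* diagonal: (mis)match  */
                               prev[j]   + GAP,    /* up: gap in sequence b */
                               curr[j-1] + GAP);   /* left: gap in sequence a */
                if (curr[j] > best)
                    best = curr[j];                /* best local score so far */
            }
        }
        memcpy(prev, curr, sizeof curr);           /* roll the rows */
    }
    return best;
}

int main(void)
{
    /* tiny demo sequences; our real runs used up to 10000 nucleobases */
    printf("score = %d\n", sw_score("ACACACTA", "AGCACACA"));
    return 0;
}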
Fig. 11. Average execution time depending on the number of processors.
After executing the parallel code on this cluster we obtained different results than before, when we had tested it on the single processor that showed the best results. As the next figure shows, the optimal number of processes changed from 45 processes to just 4 processes on the cluster. For detailed information, see Fig. 12 below.

Fig. 12. Average execution time with different sequence block sizes on 4 processes.

Fig. 13. Average execution time running on 4 machines with 4 processes (optimal block size).

The curve in Fig. 13 shows how the Smith-Waterman algorithm behaves when we set the optimal block size B and the optimal number of processes on four processors, changing the sequence length incrementally by 500 nucleobases.
V. CONCLUSION

In this paper we analyzed the speedup obtained by a parallel implementation of the Smith-Waterman algorithm for sequence alignment compared to sequential execution, based on the relation between different numbers of processes and the amount of work. The comparison was done by executing the code on a single machine and on several machines running OpenMPI. The results show that the performance and efficiency of the parallel implementation are much better than those of the sequential implementation, but, as in our case, the results of running parallel code across several computers can be disappointing. We achieved speedup on a single machine with parallelization, but when we connected more and more machines to expand our cluster, we did not get better results than those obtained on a single machine.

In order to gain a larger speedup, it is important that the work increases with the number of processes; that is, increasing the number of processes only makes sense with a larger dataset. Finally, the results show that it is possible to achieve speedup by parallelizing code for a machine with a 4-core processor, but that does not mean we will get further speedups by connecting more machines into a cluster. To obtain an ideal speedup, it is necessary to find the optimum between the number of processes and the size of the data being processed. Sometimes it is better to use a single machine if the tasks are not huge and complicated, because a lot of time is otherwise lost in communication between processing units. In the end, it all depends on what you need your program for; clusters allow so much user choice that it really depends on your specific needs.
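A rough cost model makes this tradeoff visible; this is a simplified sketch, not a fit to our measurements. With T_seq the sequential time, p the number of processes, and T_comm(p) the time spent exchanging block boundaries:

T_par(p) ≈ T_seq / p + T_comm(p)

The first term shrinks as p grows, while the second grows with p and grows much faster over a LAN than inside one machine; past some point, adding processes or machines increases T_par, which matches the behavior in Fig. 11.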