This document discusses accelerating genomic sequencing through hardware acceleration. It reviews literature on using GPUs, FPGAs, and big data techniques to efficiently and scalably accelerate DNA and protein sequencing. Several papers are discussed that demonstrate accelerating sequence alignment algorithms like Smith-Waterman and pair-HMMs forward algorithm on GPUs, achieving speedups of up to 18x faster compared to CPU implementations. FPGA implementations of algorithms like pair-HMMs forward are shown to be up to 67x faster. Overall, hardware acceleration through GPUs and FPGAs is able to significantly improve the performance of genomic sequencing tools and pipelines.
EFFICIENT GENOMIC SEQUENCING THROUGH HARDWARE ACCELERATION
ABSTRACT: Sequence alignment is an important component of bioinformatics, and a variety of protein and DNA sequence aligners are used for this purpose. Accelerators are becoming increasingly commonplace in high-performance computing. In this paper we review the literature on hardware-accelerated platforms, such as GPUs, FPGAs, and big data frameworks, that are used for efficient, scalable, and faster DNA or protein sequencing.
Introduction
DNA sequencing is the process of determining the exact order of nucleotides along chromosomes and genomes. It is essential for a deep understanding of human genetics [2]. Next Generation Sequencing (NGS), also known as high-throughput sequencing, has changed the study of genomics: it allows DNA to be sequenced more quickly and inexpensively than the earlier Sanger method. This rapid reduction in the cost of DNA sequencing is making it accessible to researchers at a scale that was never before possible. An NGS sequencer can produce millions of short reads in parallel; short reads are small fragments of DNA and form the output of NGS. NGS platforms are able to generate large amounts of DNA sequencing data, ranging up to hundreds of GB. Because of this large volume, processing the data takes a long time, so there is an urgent need to develop scalable, high-performance computational solutions. To address these challenges, authors have used CPUs, GPUs, FPGAs, big data techniques, and cluster-based and distributed computing.

CLASSIFICATION OF LITERATURE

In this section we classify the literature based on hardware platform. We analyze and compare some of the approaches proposed by authors in the research papers to accelerate DNA and protein sequencing.

Using GPUs

A Graphics Processing Unit (GPU) is a processor designed for highly parallel computation. GPUs are used to tackle computational challenges and are commonly used in bioinformatics tools to accelerate computationally intensive algorithms and improve their performance.

In paper [21] the authors present a GPU-accelerated Smith-Waterman implementation for protein sequence alignment. The Smith-Waterman (S-W) algorithm is a superior sequence alignment method for biological databases; however, its computational complexity makes it too slow for practical purposes. Heuristic-based approximate techniques like FASTA and BLAST offer quicker solutions, but at the cost of decreased accuracy. In addition, the growing size and varying lengths of sequences require performance-efficient restructuring of these databases. To obtain an accurate and fast solution, it is therefore highly desirable to speed up the S-W algorithm, and the paper presents a high-performance protein alignment implementation for GPUs. The new implementation improves performance by optimizing the database organization and by reducing the number of memory accesses to eliminate bandwidth bottlenecks. The implementation is called Database Optimized Protein Alignment (DOPA), and it achieves a performance of 21.4 giga cell updates per second (GCUPS), which is 1.13 times higher than the fastest GPU implementation reported at that time.

In paper [22] the authors evaluate two implementation methods for accelerating the pair-HMMs forward algorithm (PFA) on GPUs, using different datasets to compare performance: inter-task parallelization and intra-task parallelization. In inter-task parallelization, the whole computation of one task is mapped to a single thread, so several copies of the algorithm run in parallel and each thread executes the algorithm independently. In intra-task parallelization, the algorithm is mapped to a single thread block, so a whole block of threads is required to compute a single copy of the algorithm. This reduces the number of copies of the algorithm that can be executed in parallel on the GPU compared to inter-task parallelization. The operations of the PFA are in the floating-point domain. The GPU implementations achieve higher throughput with the reorganized dataset than with the original dataset. The inter-task implementation with the reorganized dataset achieved the highest throughput, 12.79 GCUPS, and is 18.19x faster than the same implementation with the original dataset. Reorganizing the dataset makes the intra-task implementation 2.19x faster. With the original dataset it is better to use the intra-task implementation, which is 7.5x faster than the inter-task implementation on that dataset. The intra-task implementation is reported to reach a throughput as high as 23.56 GCUPS.

In paper [23] the authors accelerate the PFA on GPUs to improve the performance of the GATK HaplotypeCaller (GATK HC). This paper is an extended version of the work published in [22]. After running all the implementations and comparing their performance on real datasets, the pair-HMMs forward algorithm implementation achieved a speedup of up to 5.47x over existing GPU-based implementations. The naive intra-task implementation is the fastest of all the GPU-based implementations when the number of read-haplotype pairs in each chunk is small. The inter-task implementation cannot use the GPU resources efficiently when the size of the dataset is reduced to 200 pairs.

In paper [24] the authors present a GPU acceleration of the GATK HaplotypeCaller together with a load-balanced multi-process optimization that divides the genome into regions of different sizes to ensure a more equal distribution of the computation load between processes. The optimization addresses a limitation of the GATK HC implementation that forces sequential execution of the program and prevents effective utilization of hardware acceleration. In single-threaded mode, the GPU-based GATK HC is 1.71x faster than the baseline implementation and 1.21x faster than the vectorized GATK HC implementation. In multi-threaded mode, the GATK HC workflow limits the performance improvement achievable by accelerating the pair-HMMs kernel. In multi-process mode, the GPU-based GATK HC implementation is the fastest. In addition, in load-balanced multi-process mode the GPU-based implementation achieves speedups of up to 2.04x and 1.40x over the baseline and vectorized GATK HC implementations, respectively, running in non-load-balanced multi-process mode.
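Several of the results in this section are reported in GCUPS (giga cell updates per second), that is, the number of dynamic-programming matrix cells filled per second divided by 10^9. As a rough illustration, the following Python sketch computes a Smith-Waterman local alignment score on the CPU and converts a measured run time into GCUPS. It is not the DOPA implementation of [21] or any of the GPU kernels discussed here; the function names, the linear gap penalty, and the scoring values (match = 2, mismatch = -1, gap = -1) are assumptions chosen only for illustration.

```python
def smith_waterman_score(query, target, match=2, mismatch=-1, gap=-1):
    """Return the best local alignment score (linear gap penalty).

    The scoring values are illustrative assumptions, not taken from
    any of the reviewed papers.
    """
    # Only the previous row of the DP matrix is needed to compute the score.
    prev = [0] * (len(target) + 1)
    best = 0
    for q in query:
        curr = [0]  # first column is always 0 for local alignment
        for j, t in enumerate(target, start=1):
            diag = prev[j - 1] + (match if q == t else mismatch)
            up = prev[j] + gap        # gap in the target
            left = curr[j - 1] + gap  # gap in the query
            score = max(0, diag, up, left)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


def gcups(query_len, target_len, seconds):
    """Cell updates per second, in units of 10^9, for one alignment."""
    return query_len * target_len / seconds / 1e9


# Identical sequences: 4 matches * 2 = 8
score = smith_waterman_score("ACGT", "ACGT")

# 10^9 cells filled in one second corresponds to 1.0 GCUPS
rate = gcups(1000, 1_000_000, 1.0)
```

A GPU implementation fills the same matrix but distributes the cell updates over many threads, for example one alignment per thread (inter-task) or one alignment per thread block (intra-task), as described for [22].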
In paper [25] the authors present GASAL, a high-performance GPU-accelerated set of APIs for pairwise sequence alignment of DNA and RNA sequences. The GASAL APIs provide accelerated kernels for local, global, and semi-global alignment, allowing the computation of the alignment score and, optionally, the start and end positions of the alignment. The library contains functions that enable fast alignment of sequences and can be easily integrated into computer programs developed for NGS data analysis. One-to-one as well as all-to-all and one-to-many pairwise alignment can be performed. The sequences are first packed into unsigned 32-bit integers, and the alignment is then performed. The total execution time is the sum of the data packing, data copying, and alignment kernel times. Without computing the start position, the GASAL kernel performs the alignment in much less time than SSW and NVBIO. With start-position computation, the speed of GASAL and NVBIO is nearly the same. Overall, GASAL is 2-4x faster than state-of-the-art libraries, making it a good choice for sequence alignment.

Using FPGAs

Field Programmable Gate Arrays (FPGAs), with their flexible and reprogrammable substrate, are a natural fit for computationally intensive algorithms.

In paper [26] the authors propose a novel systolic array design to accelerate the pair-HMMs forward algorithm on FPGAs, analyze a number of optimization techniques to improve performance, and present an implementation of the design on the Convey supercomputing platform. A number of architectural features have been implemented to improve the performance of the design, such as early exit points to increase the utilization of the array for small sequence sizes, as well as on-chip buffering to enable the processing of long sequences effectively. The FPGA implementation of the pair-HMMs forward algorithm is up to 67x faster.

In paper [27] the authors propose the first accelerated implementation of BWA-MEM. BWA-MEM is a popular genome sequence alignment algorithm widely used in NGS genomics pipelines; it is the latest BWA algorithm and is generally recommended for high-quality queries, as it is faster and more accurate [28]. A characteristic workload of this algorithm is to align millions of DNA reads against a reference genome. BWA-MEM processes reads in batches, with each kernel processing a batch of reads. The BWA-MEM alignment procedure generally consists of three main kernels: (1) SMEM generation, (2) seed extension, and (3) output generation [29]. BWA-MEM implements multi-threaded execution of all three kernels. The authors propose and evaluate a number of FPGA-based systolic array architectures, presenting optimizations generally applicable to variable-length Smith-Waterman execution. By optimizing one of the three main kernels of BWA-MEM, a 45% increase in application performance was observed. By implementing the seed extension kernel as a systolic array, performance 3x faster than software-only execution is achieved. The dynamic programming type of algorithm used in the seed extension kernel is a much better fit for execution on an FPGA.

In paper [30] the authors propose a highly optimized Smith-Waterman implementation on Intel FPGAs using OpenCL. This implementation is both faster and more efficient than other current Smith-Waterman implementations. Compared to using conventional hardware description languages, using a high-level language such as OpenCL has two main benefits. First, OpenCL is more expressive: the Processing Element kernel code requires 90 lines of OpenCL compared to about 450 lines of VHDL. Second, OpenCL development has more convenient testing and debugging capabilities. The implementation obtains a theoretical performance of 214 GCUPS.

In paper [31] the authors

Using Big Data

Big data is a term used to refer to data sets that are too large or complex for traditional data processing application software to deal with.

In paper [] the authors use the Apache Spark big data framework. Simultaneous multithreading improves the performance of BWA for all systems, increasing performance by up to 87% for Spark. Spark has up to 27% better performance at high system utilization and is able to sustain high performance when the system is over-utilized. The Spark versions divide the input dataset of short reads into a number of smaller files referred to as chunks. The Spark system is more capable of handling a higher number of threads.

In paper [] the authors propose StreamBWA, a new framework that allows the BWA-MEM program to run on a cluster in a distributed fashion while the input data is being streamed into the cluster. This streaming distributed approach is approximately 2x faster than the non-streaming approach. Compared to SparkBWA, StreamBWA is almost 5x faster. The framework consists of two utilities: StreamBWA itself and the chunker program.

In paper [] the authors present a new Spark-based framework called SparkGA that allows a DNA analysis pipeline to run efficiently and cost-effectively on a scalable computational cluster. In recent years a number of big data frameworks have emerged to efficiently manage and process large datasets in an easy way. SparkGA uses in-memory computation capabilities to improve the performance of the framework and addresses the problem by implementing a memory-efficient load balancing step. SparkGA runs the pipeline in three different steps: DNA mapping and static load balancing; dynamic load balancing and SAM-to-BAM conversion; and marking of duplicates and variant discovery. To ensure scalability, the framework can run on low-cost nodes with up to 16 GB of memory, and it achieves an accuracy of 99.9981%. When deployed on a 20-node IBM power* cluster, SparkGA can complete the GATK best practices pipeline in 90 minutes. It is 71% faster than other state-of-the-art solutions.

Conclusion

In this paper, we studied various papers related to sequence alignment algorithms. The S-W algorithm proved to be the most accurate one for carrying out sequence alignment; however, it needs an exceptionally long time to complete, making it the most suitable candidate for hardware acceleration.

REFERENCES

[21] Laiq Hasan, Marijn Kentie, Zaid Al-Ars. 33rd Annual International Conference of the IEEE EMBS, Boston, Massachusetts, USA, August 30 - September 3, 2011.
[22] Shanshan Ren, Koen Bertels, Zaid Al-Ars. 2016 IEEE International Conference on Bioinformatics and Biomedicine (BIBM).