Assignment: Date of Submission


ASSIGNMENT

TOPIC – NEXT GENERATION SEQUENCING AND ITS APPLICATIONS

PAPER – GENOMICS AND PROTEOMICS (BIOTECH/04/SC/28)

DATE OF SUBMISSION – 24/04/2020

SUBMITTED TO
Dr. N. Senthil Kumar, Professor
Dept. of Biotechnology, Mizoram University

SUBMITTED BY
FACHIT MOHAN
M. Sc. 4th Semester
Roll No. 18/BIOT/04

Sl. No.  TITLE

1. INTRODUCTION
2. BRIEF HISTORY OF DNA SEQUENCING
3. COMPARISON BETWEEN SANGER SEQUENCING AND NEXT GENERATION SEQUENCING
4. METHODS OF NEXT GENERATION SEQUENCING
   A. LIBRARY PREPARATION
   B. AMPLIFICATION
      I. Emulsion PCR
      II. Bridge PCR
   C. SEQUENCING
      I. 454 Pyrosequencing
      II. Ion Torrent Semiconductor Sequencing
      III. Sequencing By Ligation (SOLiD)
      IV. Reversible Terminator Sequencing (Illumina)
5. APPLICATIONS OF NEXT GENERATION SEQUENCING
6. LIMITATIONS OF NGS TECHNOLOGY
7. REFERENCES
8. QUESTIONS BASED ON THE TOPIC


1) INTRODUCTION
DNA sequencing is the process of determining the order of nucleotides in a
section of DNA. The first commercialized method of DNA sequencing was
Sanger sequencing. Next generation sequencing (NGS), also called
high-throughput sequencing, improves on Sanger sequencing because its
technologies allow DNA and RNA to be sequenced much more quickly and cheaply.
This assignment describes the history of sequencing, the methods of NGS with
their different sequencing approaches, and the applications of NGS.

2) BRIEF HISTORY OF DNA SEQUENCING


The first sequencing techniques were developed in 1977 by Frederick Sanger and
Walter Gilbert. A decade later, in 1987, Applied Biosystems (AB) introduced the
first automatic sequencing machine (AB370), utilizing capillary electrophoresis,
which made the sequencing process much faster and more accurate. Sanger
sequencing technologies were the main tools used to complete the Human Genome
Project in 2003. This project and the X-PRIZE competition stimulated the
development of the next, or second, generation of sequencing technologies,
which offer massively parallel analysis, high throughput and reduced cost.
Years of evolution yielded three major sequencing systems:
1. Roche 454 System, based on detection of the pyrophosphate released during
nucleotide incorporation
2. AB Sequencing by Oligonucleotide Ligation and Detection (SOLiD)
3. Illumina GA/HiSeq System, based on Solexa's Genome Analyzer (GA) and
sequencing by synthesis

While NGS became widely popular in basic research, further improvements in
sequencing technology opened the era of third generation sequencers, which
have two main characteristics:
1. PCR is not needed before sequencing, which shortens DNA preparation time to
several hours
2. The signal, either fluorescence (PacBio) or electric current (nanopore), is
captured and monitored in real time during the enzymatic reaction

The two methods of third generation sequencing are single-molecule real-time
(SMRT) sequencing and nanopore sequencing.

3) COMPARISON BETWEEN SANGER SEQUENCING AND NEXT-GENERATION SEQUENCING
The principle behind Next Generation Sequencing (NGS) is similar to that of
Sanger sequencing, which relies on capillary electrophoresis. The genomic
strand is fragmented, and the bases in each fragment are identified by the
signals emitted as the fragment is copied or ligated against a template
strand.

The Sanger method required separate steps for sequencing, separation (by
electrophoresis) and detection, which made it difficult to automate sample
preparation and limited its throughput, scalability and resolution. NGS uses
array-based sequencing, which combines the techniques developed for Sanger
sequencing to process millions of reactions in parallel, resulting in very
high speed and throughput at a reduced cost. Genome sequencing projects that
took many years with Sanger methods can now be completed in hours with NGS,
although with shorter read lengths (the number of bases sequenced at a time)
and lower accuracy.

4) METHODS OF NEXT GENERATION SEQUENCING

Next generation methods of DNA sequencing have three general steps:
A. Library preparation: Libraries are created using random fragmentation of
DNA, followed by ligation with custom linkers
B. Amplification: The library is amplified using clonal amplification methods
and PCR
C. Sequencing: DNA is sequenced using one of several different approaches

A. LIBRARY PREPARATION

First, DNA is fragmented either enzymatically or by sonication (excitation
using ultrasound) to create smaller strands. Adaptors (short, double-stranded
pieces of synthetic DNA) are then ligated to these fragments with the help of
DNA ligase, an enzyme that joins DNA strands. The adaptors enable each
fragment to bind to a complementary counterpart.
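
The two steps described above, random fragmentation followed by adaptor
ligation, can be sketched in a few lines of Python. This is only an
illustration under stated assumptions: the adaptor sequences, the fragment
size range and the function names are hypothetical, and DNA is treated as a
single string rather than a double-stranded molecule.

```python
import random

# Hypothetical adaptor sequences, chosen only for illustration.
ADAPTOR_5 = "ACACGACG"
ADAPTOR_3 = "AGATCGGA"

def fragment(dna, min_len=5, max_len=10, seed=0):
    """Cut a DNA string into consecutive pieces of random length.

    The final piece may be shorter than min_len if little sequence is left.
    """
    rng = random.Random(seed)
    pieces, pos = [], 0
    while pos < len(dna):
        size = rng.randint(min_len, max_len)
        pieces.append(dna[pos:pos + size])
        pos += size
    return pieces

def ligate_adaptors(pieces):
    """Attach the two adaptors to the ends of every fragment."""
    return [ADAPTOR_5 + p + ADAPTOR_3 for p in pieces]

genome = "ATGCGTACGTTAGCTAGGCTAACGTTAGCCGATCGATTACG"
library = ligate_adaptors(fragment(genome))
for molecule in library:
    print(molecule)
```

Every molecule in the resulting library carries the same known ends, which is
what later allows a common primer to bind regardless of the insert sequence.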

Adaptors are synthesized so that one end is 'sticky' whilst the other is
'blunt' (non-cohesive), with the aim of joining the blunt end to the
blunt-ended DNA. This could lead to base pairing between adaptor molecules
and therefore dimer formation. To prevent this, the chemical structure of DNA
is exploited: ligation takes place between the 3′-OH and 5′-P ends, so by
removing the phosphate from the sticky end of the adaptor and creating a 5′-OH
end instead, DNA ligase is unable to form a bridge between the two termini
(Figure 1).

For sequencing to be successful, the library fragments need to be spatially
clustered in PCR colonies, or 'polonies' as they are conventionally known,
which consist of many copies of a particular library fragment. Since these
polonies are attached in a planar fashion, the features of the array can be
manipulated enzymatically in parallel. This method of library construction is
much faster than the labour-intensive colony picking and E. coli cloning
previously used to isolate and amplify DNA for Sanger sequencing; however,
this comes at the expense of the read length of the fragments.

Figure 1: - Library preparation of Next-generation sequencing

B. AMPLIFICATION
Library amplification is required so that the signal received by the
sequencer is strong enough to be detected accurately. With enzymatic
amplification, phenomena such as 'biasing' and 'duplication' can occur, leading
to preferential amplification of certain library fragments. Instead, several
types of amplification process use PCR to create large numbers of DNA
clusters. Two kinds of PCR are usually used for amplification: emulsion PCR
and bridge PCR.

I. Emulsion PCR

Emulsion oil, beads, PCR mix and the library DNA are mixed to form an emulsion
which leads to the formation of micro wells (Figure 2).

Figure 2: - Emulsion PCR

For the sequencing process to be successful, each micro well should contain
one bead with one strand of DNA (approximately 15% of micro wells are of this
composition). The PCR then denatures the library fragment, giving two separate
strands, one of which (the reverse strand) anneals to the bead. The annealed
DNA is amplified by polymerase, starting from the bead and moving towards the
primer site. The original reverse strand then denatures and is released from
the bead, only to re-anneal to the bead to give two separate strands. These
are both amplified to give two DNA strands attached to the bead. The process
is then repeated over 30-60 cycles, leading to clusters of DNA.
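
The figure of roughly 15% can be made plausible with a simple loading model.
The sketch below assumes, purely for illustration, that beads and template
molecules are distributed over the micro wells independently and follow
Poisson statistics. The mean loadings are hypothetical parameters, not values
taken from any protocol.

```python
import math

def poisson_pmf(k, mean):
    """Probability of observing exactly k items when the average is `mean`."""
    return math.exp(-mean) * mean ** k / math.factorial(k)

def usable_fraction(mean_beads, mean_templates):
    """Fraction of micro wells holding exactly one bead and one template."""
    return poisson_pmf(1, mean_beads) * poisson_pmf(1, mean_templates)

for beads, templates in [(0.5, 0.5), (1.0, 1.0), (2.0, 2.0)]:
    frac = usable_fraction(beads, templates)
    print(f"beads={beads}, templates={templates}: {frac:.1%} usable")
```

Under this model the usable fraction peaks at about 13.5% when both averages
equal one, which is of the same order as the figure quoted above. Loading
more or less material only lowers the share of wells with the ideal
composition.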

II. Bridge PCR
The surface of the flow cell is densely coated with primers that are
complementary to the adaptors attached to the DNA library fragments (Figure
3). The DNA is then attached to the surface of the cell at random, where it is
exposed to reagents for polymerase-based extension. On addition of nucleotides
and enzymes, the free ends of the single strands of DNA attach themselves to
the surface of the cell via complementary primers, creating bridged
structures. Enzymes then interact with the bridges to make them
double-stranded, so that when denaturation occurs, two single-stranded DNA
fragments are attached to the surface in close proximity. Repetition of this
process leads to clonal clusters of localized identical strands. To optimize
cluster density, concentrations of reagents must be monitored very closely to
avoid overcrowding.

Figure 3: - Bridging PCR

C. SEQUENCING
Several competing methods of Next Generation Sequencing have been developed
by different companies. They are described below.

I. 454 PYROSEQUENCING

Pyrosequencing is based on the 'sequencing by synthesis' principle, where a
complementary strand is synthesized in the presence of polymerase enzyme
(Figure 4). In contrast to using dideoxynucleotides to terminate chain
elongation (as in Sanger sequencing), pyrosequencing detects the release of
pyrophosphate when nucleotides are added to the DNA chain. It initially uses
the emulsion PCR technique to construct the polonies required for sequencing
and removes the complementary strand. Next, a ssDNA sequencing primer
hybridizes to the end of the strand (primer-binding region), and the four
different dNTPs are then made to flow sequentially in and out of the wells
over the polonies. When the correct dNTP is enzymatically incorporated into
the strand, pyrophosphate is released. In the presence of ATP sulfurylase and
adenosine 5′-phosphosulfate (APS), the pyrophosphate is converted into ATP.
This ATP molecule is used for the luciferase-catalysed conversion of luciferin
to oxyluciferin, which produces light that can be detected with a camera. The
relative intensity of light is proportional to the number of bases added (i.e.
a peak of twice the intensity indicates two identical bases have been added in
succession).
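
The relationship between flows, light intensity and the decoded sequence can
be illustrated with a small simulation. It is a simplified sketch: the flow
order "TACG" is an assumption made for the example, the signal is taken to be
exactly equal to the number of bases incorporated (no noise), and `read`
stands for the newly synthesized strand.

```python
from itertools import cycle

def flowgram(read, flow_order="TACG"):
    """Return (nucleotide, signal) pairs; signal = bases added in that flow."""
    if set(read) - set(flow_order):
        raise ValueError("read contains a base that is never flowed")
    signals, pos = [], 0
    flows = cycle(flow_order)
    while pos < len(read):
        nucleotide = next(flows)
        run = 0
        # The polymerase keeps adding this nucleotide while it matches.
        while pos < len(read) and read[pos] == nucleotide:
            run += 1
            pos += 1
        signals.append((nucleotide, run))
    return signals

def decode(signals):
    """Rebuild the synthesized strand from the light intensities."""
    return "".join(nucleotide * round(signal) for nucleotide, signal in signals)

read = "TTACCCG"
signals = flowgram(read)
print(signals)          # a signal of 3 for C means three C bases in a row
print(decode(signals))
```

A flow with signal 0 simply means the nucleotide did not match the next
template position, so no light was produced in that flow.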

Figure 4: - 454 Pyrosequencing

II. ION TORRENT SEMICONDUCTOR SEQUENCING


Ion torrent sequencing uses a "sequencing by synthesis" approach, in which a new
DNA strand, complementary to the target strand, is synthesized one base at a time.
A semiconductor chip detects the hydrogen ions produced during DNA
polymerization (Figure 5).

Following polony formation using emulsion PCR, the DNA library fragment is
flooded sequentially with each deoxynucleoside triphosphate (dNTP), as in
pyrosequencing. The dNTP is incorporated into the new strand if it is
complementary to the nucleotide on the target strand. Each time a nucleotide
is successfully added, a hydrogen ion is released and is detected by the
sequencer's pH sensor. As in the pyrosequencing method, if more than one of
the same nucleotide is added, the change in pH/signal intensity is
correspondingly larger.

Figure 5: - Ion torrent semiconductor sequencing
Ion Torrent sequencing is the first commercial technique not to use
fluorescence and camera scanning; it is therefore faster and cheaper than many
of the other methods. Unfortunately, it can be difficult to count the number
of identical bases added consecutively. For example, it may be difficult to
distinguish the pH change for a homopolymer repeat of length 9 from that of
length 10, making it difficult to decode repetitive sequences.
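
The difficulty with long runs of identical bases follows from simple
arithmetic. The sketch below assumes, as a simplification, that the signal is
strictly proportional to the number of bases incorporated in one flow; the
function name is illustrative.

```python
def relative_step(n):
    """Relative signal increase when a run grows from n to n + 1 bases,
    assuming the signal is proportional to the run length."""
    return ((n + 1) - n) / n

for n in (1, 2, 5, 9):
    print(f"{n} -> {n + 1} bases: signal rises by {relative_step(n):.0%}")
```

Going from one base to two doubles the signal, whereas going from nine to ten
raises it by only about 11%, a difference that is much easier to lose in
measurement noise.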

III. SEQUENCING BY LIGATION (SOLiD)


SOLiD is an enzymatic method of sequencing that uses DNA ligase, an enzyme
used widely in biotechnology for its ability to ligate double-stranded DNA
strands (Figure 6). Emulsion PCR is used to immobilize and amplify, on a bead,
a ssDNA primer-binding region (known as an adapter) that has been conjugated
to the target sequence (i.e. the sequence that is to be sequenced). These
beads are then deposited onto a glass surface; a high density of beads can be
achieved, which in turn increases the throughput of the technique.

Figure 6: - Sequencing by ligation

Once bead deposition has occurred, a primer of length N is hybridized to the
adapter, then the beads are exposed to a library of 8-mer probes which carry
one of four different fluorescent dyes at the 5' end and a hydroxyl group at
the 3' end. Bases 1 and 2 are complementary to the nucleotides to be sequenced,
whilst bases 3-5 are degenerate and bases 6-8 are inosine bases. Only a
complementary probe will hybridize to the target sequence, adjacent to the
primer. DNA ligase is then used to join the 8-mer probe to the primer. A
phosphorothioate linkage between bases 5 and 6 allows the fluorescent dye to
be cleaved from the fragment using silver ions. This cleavage allows
fluorescence to be measured (four different fluorescent dyes are used, all of
which have different emission spectra) and also generates a 5′-phosphate group
which can undergo further ligation. Once the first round of sequencing is
completed, the extension product is melted off and a second round of
sequencing is performed with a primer of length N−1. Many rounds of sequencing
using shorter primers each time (i.e. N−2, N−3 etc.) and measuring the
fluorescence ensure that the target is sequenced.
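
Why several primer rounds are needed can be checked with a small
position-bookkeeping sketch. It assumes, following the probe layout described
above, that each ligation reads two bases and then advances five positions
(bases 6-8 are cleaved off), and that five primers (N to N−4) are used. The
position numbering and the function name are illustrative.

```python
from collections import Counter

def interrogated_positions(offset, cycles, step=5):
    """Positions (1-based, relative to the primer-N start) read in one round.

    `offset` is how many bases the primer is shortened by (0 for N, 1 for N-1).
    Each ligation cycle reads two adjacent bases, then moves on `step` bases.
    Positions of 0 or below lie inside the adapter.
    """
    positions = []
    for c in range(cycles):
        first = 1 - offset + c * step
        positions.extend([first, first + 1])
    return positions

coverage = Counter()
for offset in range(5):            # primers N, N-1, N-2, N-3, N-4
    coverage.update(interrogated_positions(offset, cycles=4))

for pos in range(1, 17):
    print(pos, coverage[pos])
```

Under these assumptions a single primer round leaves gaps of three unread
bases between probes, while the five rounds together interrogate every
position from 1 to 16 exactly twice.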

IV. REVERSIBLE TERMINATOR SEQUENCING (ILLUMINA)


Reversible terminator sequencing differs from the traditional Sanger method in
that, instead of terminating the primer extension irreversibly using
dideoxynucleotides, modified nucleotides are used for reversible termination.
Whilst many other techniques use emulsion PCR to amplify the DNA library
fragments, reversible termination uses bridge PCR, improving the efficiency of
this stage of the process.

The sequencing reaction is conducted simultaneously on a very large number
(many millions, in fact) of different template molecules spread out on a solid
surface. The terminator contains a fluorescent label, which can be detected by
a camera. Only a single fluorescent color is used in this scheme, so each of
the four bases must be added in a separate cycle of DNA synthesis and imaging
(four-colour versions of the chemistry instead give each base its own dye, so
all four can be added together). Following the addition of the four dNTPs to
the templates, the images are recorded and the terminators are removed. This
chemistry is called "reversible terminators". Finally, another four cycles of
dNTP additions are initiated. Since single bases are added to all templates in
a uniform fashion, the sequencing process produces a set of DNA sequence reads
of uniform length.
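
The consequence of adding exactly one base per template in each round can be
shown with a toy simulation. It is a sketch under simplifying assumptions:
imaging is error-free, each round is modelled as a single step that extends
every cluster by one base, and strand orientation is ignored (the read is
written as the plain complement of the template). The template sequences are
hypothetical.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def sequence_by_synthesis(templates, cycles):
    """Extend every cluster by exactly one base per cycle and record the call.

    Each template must be at least `cycles` bases long.
    """
    reads = ["" for _ in templates]
    for cycle_no in range(cycles):
        for i, template in enumerate(templates):
            # The blocked terminator allows only a single incorporation;
            # the base is imaged, then the label and block are removed.
            reads[i] += COMPLEMENT[template[cycle_no]]
    return reads

clusters = ["ACGTTGCA", "GGGGTTTT", "TACGATCG"]
reads = sequence_by_synthesis(clusters, cycles=6)
print(reads)
```

All reads come out with the same length, equal to the number of cycles, and a
run such as GGGG is read one base at a time rather than as a single stronger
signal.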

Figure 7: - Workflow of Illumina NGS

5) APPLICATIONS OF NEXT GENERATION SEQUENCING
NGS technologies have already been used for various applications, including
whole genome sequencing, resequencing, single nucleotide polymorphism (SNP)
and structural variation discovery, mRNA and noncoding RNA profiling, and
protein-nucleic acid interaction assays. NGS technologies are becoming a
promising tool for gene expression analysis, especially for species that
already have reference genome sequences available. An overview of NGS
applications is given here.

a) Sequencing of whole exomes

The protein coding regions of all genes (i.e., the 'whole exome') constitute
only approximately 1–2% of the entire human genome but contain about 85% of
all DNA mutations that have large effects on human disease. There may be many
more mutations in regulatory regions, but they have not been studied as much
as those in the coding regions. Therefore, combining NGS technologies with
DNA fragment capture approaches to selectively sequence the complete protein
coding regions not only reduces sequencing cost, but is also an efficient way
to discover most mutations that underlie rare and common human diseases. This
could be a scenario in which a true molecular diagnostic assay becomes
possible. Through this approach, we could build an understanding of the
sensitivity and specificity of mutation detection.
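
The saving in sequencing effort can be estimated with back-of-the-envelope
arithmetic. The numbers below are assumptions made for illustration: an
approximate human genome size of 3.1 billion bases, an exome fraction of 1.5%
(within the 1–2% range quoted above), and hypothetical sequencing depths. The
calculation ignores capture inefficiency and off-target reads.

```python
GENOME_SIZE_BP = 3.1e9      # approximate human genome size (assumption)

def bases_required(target_fraction, depth):
    """Bases to sequence to cover a fraction of the genome at a given depth."""
    return GENOME_SIZE_BP * target_fraction * depth

whole_genome = bases_required(1.0, depth=30)
exome = bases_required(0.015, depth=100)   # 1.5% of the genome, deeper coverage
print(f"whole genome at 30x : {whole_genome:.2e} bases")
print(f"exome at 100x       : {exome:.2e} bases")
print(f"ratio               : {whole_genome / exome:.0f}x less sequencing")
```

Even when the exome is sequenced more deeply than the whole genome, the total
number of bases required stays far smaller under these assumptions.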

b) Single Nucleotide Polymorphisms (SNP) discovery

SNPs are important genomic resources which can be used in a variety of
analyses, covering physical characteristics like height and appearance as
well as less obvious traits such as personality, behavior, and disease
susceptibility. Sequence data generated with NGS technologies for the
parental genotypes of a mapping population can be mined for SNPs at large
scale.
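
How SNPs are mined from sequence data can be sketched with a naive pileup
approach. This is a minimal illustration, not a production method: the reads
are assumed to be already aligned to the reference without gaps, and the
thresholds, the sequences and the function name are hypothetical.

```python
from collections import Counter

def call_snps(reference, aligned_reads, min_depth=3, min_fraction=0.8):
    """Report positions where most reads disagree with the reference.

    `aligned_reads` is a list of (start, sequence) pairs, already aligned
    without gaps; a real pipeline would use an aligner and a variant caller.
    """
    pileup = [Counter() for _ in reference]
    for start, seq in aligned_reads:
        for offset, base in enumerate(seq):
            pileup[start + offset][base] += 1

    snps = []
    for pos, counts in enumerate(pileup):
        depth = sum(counts.values())
        if depth < min_depth:
            continue  # too few reads to make a call
        base, n = counts.most_common(1)[0]
        if base != reference[pos] and n / depth >= min_fraction:
            snps.append((pos, reference[pos], base))
    return snps

reference = "ACGTACGTAC"
reads = [(0, "ACGTTCGT"), (2, "GTTCGTAC"), (1, "CGTTCGTA"), (3, "TTCGTAC")]
print(call_snps(reference, reads))
```

Requiring a minimum depth and a minimum supporting fraction is one simple way
to keep isolated sequencing errors from being reported as variants.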

c) Messenger RNA and Noncoding RNA Profiling

Apart from SNP discovery, expressed regions of a genome can be detected using
NGS technologies. Next generation sequencing platforms are capable of
measuring expression levels of nearly all genes, including rare and
species-specific transcripts. A similar approach can be applied to large
genomes. RNA-seq data can also be used to characterize exon-exon splicing
events, namely cases of alternative splicing.
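
Turning read counts into comparable expression levels requires a
normalisation step. The sketch below uses transcripts per million (TPM), one
common choice that the text above does not itself specify; the gene names,
counts and lengths are hypothetical.

```python
def tpm(counts, lengths_bp):
    """Transcripts per million from raw read counts and transcript lengths."""
    rates = {g: counts[g] / lengths_bp[g] for g in counts}
    total = sum(rates.values())
    return {g: rate / total * 1e6 for g, rate in rates.items()}

counts = {"geneA": 500, "geneB": 500, "geneC": 10}       # hypothetical read counts
lengths = {"geneA": 1000, "geneB": 4000, "geneC": 500}   # hypothetical lengths in bp
for gene, value in tpm(counts, lengths).items():
    print(f"{gene}: {value:,.0f} TPM")
```

Although geneA and geneB receive the same number of reads, geneA is four times
shorter, so its estimated expression is four times higher.
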
Noncoding RNAs, such as microRNA (miRNA), are a broad class of regulatory RNA
molecules. NGS technologies are well suited to the discovery of noncoding RNAs
because of their short lengths. Most studies to date have used 454
technology, because of its early availability, to discover new and different
noncoding RNA classes in several species such as Chlamydomonas, Drosophila
and Arabidopsis.

d) Metagenomics

Metagenomics is defined as the application of genomics techniques to directly
study communities of microbial organisms without isolation and cultivation of
individual species. It involves the characterization of the genomes in these
communities, as well as their mRNA, protein, and metabolic products. Next
generation sequencing technologies have made it possible to move from studying
a single organism type in isolation to studying whole communities. NGS enables
researchers to avoid the cloning and culture steps, which are major drawbacks
of conventional genomics. The NGS strategy is straightforward: (1) deep
sequencing of DNA fragments is conducted on an uncultured sample, (2) short
reads are compared against a database of known sequences using a
bioinformatics tool like MEGAN [41], with or without assembly, and (3) these
data are then used to compute and explore the sample contents and to infer
relative abundances. Therefore, NGS technologies can be a potent tool for the
discovery of microorganisms and pathogens.
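
Steps (2) and (3) can be sketched with a toy classifier that assigns each
short read to the reference sharing the most k-mers and then reports relative
abundances. This is an illustrative stand-in, not how MEGAN works internally;
the miniature database, the reads and the function names are hypothetical.

```python
from collections import Counter

def kmers(seq, k):
    """All overlapping substrings of length k."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def classify(read, database, k=4):
    """Assign a read to the reference sharing the most k-mers, or None."""
    read_kmers = kmers(read, k)
    scores = {name: len(read_kmers & kmers(ref, k)) for name, ref in database.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None

# Hypothetical miniature reference database and reads.
database = {
    "species_A": "ATGCGTACGTTAGCTAGGCTAACG",
    "species_B": "TTTTGGGGCCCCAAAATTGGCCAA",
}
reads = ["GCGTACGT", "GGGGCCCC", "TAGCTAGG", "CCAAAATT", "TACGTTAG"]

hits = Counter(classify(r, database) for r in reads)
total = sum(hits.values())
for name, n in hits.items():
    print(f"{name}: {n / total:.0%}")
```

Reads that match nothing in the database are returned as None, which in a
real survey is where previously unknown organisms would show up.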

e) Regulatory Protein Binding

At low throughput, chromatin immunoprecipitation (ChIP) is used to study
regulatory DNA-protein binding interactions. It is a lengthy process: cells
are first treated with a protein-DNA cross-linking agent, so that any protein
in close association with DNA becomes linked to it. Then the cells are lysed,
the DNA is fragmented, and a specific antibody is used to precipitate the
protein of interest along with any associated DNA fragments. These DNA pieces
are subsequently released by reversing the cross-linking and identified by
Southern blotting or qPCR, where a probe is used to infer the DNA-binding site
sequence of the protein under study. With NGS, the released fragments can
instead be sequenced directly (ChIP-seq); the short reads map to specific
binding sites and provide higher resolution.
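
How sequenced fragments point to a binding site can be illustrated by
computing read depth along a region and locating its maximum. This is a
minimal sketch: the read start positions, read length and region length are
hypothetical, and real analyses use dedicated peak callers with background
controls.

```python
def coverage(read_starts, read_length, region_length):
    """Number of reads overlapping each position of the region."""
    depth = [0] * region_length
    for start in read_starts:
        for pos in range(start, min(start + read_length, region_length)):
            depth[pos] += 1
    return depth

# Hypothetical start positions of aligned ChIP reads in a 60 bp region.
starts = [3, 20, 22, 23, 24, 25, 27, 48]
depth = coverage(starts, read_length=10, region_length=60)
peak_depth = max(depth)
peak_positions = [i for i, d in enumerate(depth) if d == peak_depth]
print(peak_depth, peak_positions)
```

Positions where many precipitated fragments pile up are candidate binding
sites, while isolated reads elsewhere form the background.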

f) Exploring Chromatin Packaging

Chromatin packaging denotes the packing of DNA around histones. This
packaging helps determine the transcription of a particular gene, so
understanding it is of great interest. An initial 454-based study of genomic
DNA packaging into nucleosomes was described for C. elegans, by sequencing the
DNA isolated from nucleosome cores after micrococcal nuclease digestion and
mapping the reads to the reference genome sequence. Mikkelsen et al. used the
Illumina platform to demonstrate the connection between chromatin packaging
and gene expression in several different cell types, and found that changes in
chromatin state at specific promoters reflect changes in gene expression for
the genes they control. A better understanding of chromatin packaging may
provide new strategies for the development of novel drugs.
g) Gene therapy
A large focus area in gene therapy is cancer treatment. One potential method
would be to introduce an antisense RNA (which specifically prevents the
synthesis of a targeted protein) against the oncogene that triggers the
formation of tumorous cells. Another method, named 'suicide gene therapy',
introduces genes that kill cancer cells selectively. Many genes encoding toxic
proteins and enzymes are known, and introduction of these genes into tumor
cells would result in cell death. The difficulty with this method is ensuring
a very precise delivery system to prevent the killing of healthy cells.
These methods are made possible by sequencing to analyze tumor genomes,
allowing medical experts to tailor chemotherapy and other cancer treatments
more effectively to their patients' unique genetic composition, revolutionizing
the diagnostic stages of personalized medicine.

NGS also has applications in the discovery of new genomic biomarkers, detection
of mutations, pharmacogenetics, target identification and validation, clinical
diagnostics, vaccine development, investigating drug resistance and many others.

6) LIMITATIONS OF NGS TECHNOLOGY
The diversity and rapid evolution of NGS technology cause many challenges
associated with data generation, data manipulation and data storage. Some of
the major issues with the analysis, interpretation, reproducibility and
accessibility of NGS data include:
(A) NGS is still too expensive to be accessible to small labs or individuals.
(B) Data analysis is time-consuming and needs sufficient knowledge of
bioinformatics.
(C) The short sequencing read lengths supported by NGS are one of the major
shortcomings that limit its application, especially in de novo sequencing and
the sequencing of highly repetitive regions.
(D) Data processing steps or bioinformatics is one major bottleneck for the
implementation of NGS.
(E) Routine analysis of NGS data requires multidisciplinary teams.
(F) It is critical to standardize the quality metrics for the NGS data generated. These
include validation and comparison among platforms, data reliability, robustness and
reproducibility, and quality of assemblers.
(G) It is crucial to have a complete knowledge of family and personal history of the
patient to help define the ideal analysis method, the analysis of the results obtained,
and the post-test counselling and management.

7) REFERENCES

A. BOOK

• Young Min Kwon and Steven C. Ricke (Eds.), High-Throughput Next Generation
Sequencing: Methods and Applications.

B. RESEARCH PAPER

• Mardis, Next-Generation DNA Sequencing Platforms. Annual Review 6, 287-303
(2013).
• J. Shendure and H. Ji, Next-Generation DNA Sequencing. Nat. Biotech. 26, 10
1135-1144 (2008).
• Michele Araujo Pereira, Frederico Scott Varella Malta, Maira Cristina
Menezes Freire and Patrícia Gonçalves Pereira Couto, Application of
next-generation sequencing in the era of precision medicine.
• Chien-Yueh Lee, Yu-Chiao Chiu, Liang-Bo Wang, Yu-Lun Kuo, Eric Y. Chuang,
Liang-Chuan Lai and Mong-Hsun Tsai, Common applications of next-generation
sequencing technologies in genomic research.
• Navneet Kumar Yadav, Pooja Shukla, Ankur Omer, Shruti Pareek and R. K.
Singh, Next Generation Sequencing: Potential and Application in Drug
Discovery.
• Zhenqiang Su, Baitang Ning, Hong Fang, Huixiao Hong, Roger Perkins, Weida
Tong and Leming Shi, Next-generation sequencing and its applications in
molecular diagnostics.

C. INTERNET SOURCES

• Wikipedia
• https://sites.psu.edu/stemcellhershey/2017/03/07/the-history-of-next-generation-sequencing-ngs/
• https://www.ebi.ac.uk/training/online/course/ebi-next-generation-sequencing-practical-course/what-next-generation-dna-sequencing/illumina-
• https://bitesizebio.com/13546/sequencing-by-synthesis-explaining-the-illumina-sequencing-technology/

8) QUESTIONS BASED ON THE TOPIC

1. What is the basic principle of NGS, and why is Next Generation Sequencing
better than Sanger sequencing?
2. How is NGS important for human biology and medicine?
3. How does reversible terminator sequencing differ from Sanger sequencing?
4. Which NGS sequencing approach is best with regard to high data output, and
why?
5. What are the differences between DNA sequencing and RNA sequencing?
----------------------------------------------×---------------------------------------------------
