
RNA-Seq for Bacterial Gene Expression

Line Dahl Poulsen¹ and Jeppe Vinther¹,²

¹ Department of Biology, University of Copenhagen, Copenhagen, Denmark
² Corresponding author: [email protected]

RNA sequencing (RNA-seq) has become the preferred method for global quantification of bacterial gene expression. With the continued improvements in sequencing technology and data analysis tools, the most labor-intensive and expensive part of an RNA-seq experiment is the preparation of sequencing libraries, which is also essential for the quality of the data obtained. Here, we present a straightforward and inexpensive basic protocol for preparation of strand-specific RNA-seq libraries from bacterial RNA as well as a computational pipeline for the data analysis of sequencing reads. The protocol is based on the Illumina platform and allows easy multiplexing of samples and the removal of sequencing reads that are PCR duplicates. © 2018 by John Wiley & Sons, Inc.
Keywords: library construction • next-generation sequencing • RNA-seq

How to cite this article:


Poulsen, L.D., & Vinther, J. (2018). RNA-Seq for bacterial gene
expression. Current Protocols in Nucleic Acid Chemistry, e55. doi:
10.1002/cpnc.55

INTRODUCTION
This unit describes the experimental steps required for preparation of RNA-seq libraries from total RNA isolated from bacteria. The method is based on a protocol for detection of reverse transcriptase termination sites (Kielpinski, Boyd, Sandelin, & Vinther, 2013), which we have adapted here for RNA-seq. As an example, we applied the method to RNA isolated from Bacillus subtilis; however, the method is general and can be used to generate RNA-seq libraries from a wide variety of bacteria. Ribosomal RNA can be depleted from the RNA sample either with the Ribo-Zero® rRNA Removal Kit (Basic Protocol) or by selective degradation with the 5′-monophosphate-dependent Terminator™ exonuclease (Alternate Protocol). While we provide an RNA-seq dataset and some general recommendations for data analysis (see Internet Resources), more detailed pipelines have been described previously (Kielpinski, Sidiropoulos, & Vinther, 2015).

Current Protocols in Nucleic Acid Chemistry e55
Published in Wiley Online Library (wileyonlinelibrary.com).
doi: 10.1002/cpnc.55
Copyright © 2018 John Wiley & Sons, Inc.

STRATEGIC PLANNING
The experiments should be carefully planned prior to sample collection. We recommend including at least three biological replicates for the statistical analysis and also a technical replicate to allow estimation of the reproducibility of the protocol. Another important parameter to consider before performing a sequencing experiment is the sequencing depth (or library size). The higher the number of reads in each sample, the more RNAs will be detected and quantified. For a typical bacterial transcriptome such as that of Bacillus subtilis, one to two million reads per replicate will be sufficient to achieve good coverage of the majority of expressed mRNAs when using input RNA depleted of ribosomal RNA with the Ribo-Zero® rRNA Removal Kit or a similar kit (Basic Protocol). In some cases, it can be cost effective to use the 5′-monophosphate-dependent Terminator™ exonuclease (Alternate Protocol) instead of the Ribo-Zero® rRNA Removal Kit and then increase sequencing depth to obtain sufficient coverage of mRNAs. For Bacillus subtilis, whose total RNA is approximately 90% ribosomal RNA (rRNA), treatment with the exonuclease reduces the rRNA fraction to approximately 60%, and around four to five million reads will be enough for this approach. If the goal of the experiment is to examine ribosomal RNAs, then far fewer reads are necessary.
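The depth recommendations above follow from proportional arithmetic: if a fraction f of the reads derive from rRNA, then N/(1 − f) total reads are needed to obtain N non-rRNA reads. The sketch below assumes reads split in proportion to the RNA fractions in the library, ignores reads lost to filtering or mapping, and uses a hypothetical target of 1.5 million non-rRNA reads per replicate; the 0.1% figure for Ribo-Zero is taken from Table 2.

```python
def total_reads_needed(target_non_rrna_reads: float, rrna_fraction: float) -> float:
    """Return the total read count needed so that non-rRNA reads reach a target.

    Assumes reads split in proportion to the RNA fractions in the library and
    ignores reads lost to quality filtering or failed mapping.
    """
    if not 0.0 <= rrna_fraction < 1.0:
        raise ValueError("rrna_fraction must be in the range [0, 1)")
    return target_non_rrna_reads / (1.0 - rrna_fraction)


# Hypothetical target: 1.5 million non-rRNA reads per replicate.
TARGET_NON_RRNA = 1.5e6

# Approximate rRNA fractions for each depletion strategy.
scenarios = {
    "Ribo-Zero (~0.1% rRNA)": 0.001,
    "TEX (~60% rRNA)": 0.60,
    "No depletion (~90% rRNA)": 0.90,
}
for label, fraction in scenarios.items():
    millions = total_reads_needed(TARGET_NON_RRNA, fraction) / 1e6
    print(f"{label}: {millions:.2f} million reads")
```

With about 60% rRNA the sketch gives 3.75 million reads, and with the 64.2% measured for the TEX library in Table 2 about 4.2 million, in line with the four to five million reads suggested above.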

BASIC PROTOCOL
LIBRARY PREPARATION FROM TOTAL RNA
This protocol describes bacterial RNA sequencing from total RNA. The protocol for library preparation includes the following steps (Fig. 1): First, the ribosomal RNA (rRNA) is depleted using the Ribo-Zero® rRNA Removal Kit (bacteria), which is based on selective binding of biotinylated probes to rRNAs using a hybridization/bead capture procedure. The remaining mRNA is then fragmented and reverse transcribed using a random primer with an Illumina adapter overhang, and an adapter is ligated to the 3′ end of the cDNA. The cDNA is PCR amplified using an indexed primer to allow multiplexing. Finally, the DNA libraries are quantified and pooled, and 200 to 600 base pair (bp) fragments are selected for sequencing. The libraries are strand specific and compatible with the different Illumina sequencing platforms, including MiSeq, HiSeq, and NextSeq, and can be distinguished by their unique 6-base index in the P7 region of the adapter.

Figure 1 Schematic representation of the RNA-seq workflow. The starting material is total RNA from bacteria. The rRNA is depleted from this sample and RNA is then fragmented. Following purification, RNA is reverse transcribed to cDNA using a random primer containing a 5′ adapter overhang. After ligation of an adapter to the 3′ end, the cDNA serves as a template for PCR amplification adding Illumina overhangs, which are necessary for sequencing.

Table 1 Oligonucleotides Used for Library Preparationᵃ

Name                  Sequence
RT_random_primer      AGACGTGTGCTCTTCCGATCTNNNNNNNNNNNNNNN
3′-ligation_Adapter   PHO-AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT-3NHC3
PCR_FW                PHO-NNNNNNNAGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT-3NHC3
PCR_RV_17             CAAGCAGAAGACGGCATACGAGAT[CTCTAC]GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT
PCR_RV_18             CAAGCAGAAGACGGCATACGAGAT[GCGGAC]GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT
PCR_RV_19             CAAGCAGAAGACGGCATACGAGAT[TTTCAC]GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT

ᵃ The oligonucleotide sequences of the Illumina genomic DNA adapters are copyrighted by Illumina, 2006. All rights reserved. Index sequences for multiplexing are shown in square brackets. A table of additional PCR index primers can be found in Kielpinski et al., 2013.
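Because samples are distinguished only by the 6-base index, it is worth checking how different the indices are from each other. The sketch below extracts the index from each reverse PCR primer in Table 1 by stripping the constant flanks that the three primers share, then reports pairwise Hamming distances. The flank constants are read off Table 1; the helper names are illustrative. Hamming distance is unchanged by reverse complementing, so the result holds whichever orientation the index read reports.

```python
from itertools import combinations

# Constant sequences flanking the 6-nt index in the reverse PCR primers (Table 1).
FLANK_5 = "CAAGCAGAAGACGGCATACGAGAT"
FLANK_3 = "GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT"

REVERSE_PRIMERS = {
    "PCR_RV_17": "CAAGCAGAAGACGGCATACGAGATCTCTACGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT",
    "PCR_RV_18": "CAAGCAGAAGACGGCATACGAGATGCGGACGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT",
    "PCR_RV_19": "CAAGCAGAAGACGGCATACGAGATTTTCACGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT",
}


def extract_index(primer: str) -> str:
    """Strip the constant flanks and return the index bases between them."""
    if not (primer.startswith(FLANK_5) and primer.endswith(FLANK_3)):
        raise ValueError("primer does not match the expected flanks")
    return primer[len(FLANK_5):len(primer) - len(FLANK_3)]


def hamming(a: str, b: str) -> int:
    """Count mismatched positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


indices = {name: extract_index(seq) for name, seq in REVERSE_PRIMERS.items()}
for (n1, i1), (n2, i2) in combinations(indices.items(), 2):
    print(f"{n1} ({i1}) vs {n2} ({i2}): {hamming(i1, i2)} mismatches")
print("minimum distance:", min(hamming(a, b) for a, b in combinations(indices.values(), 2)))
```

Larger minimum distances make demultiplexing more tolerant of sequencing errors in the index read; the same check can be run on any additional index primers before combining them in one pool.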
Materials
RNA of interest (1 to 5 µg total RNA from bacteria; see Critical Parameters)
Ribo-Zero® rRNA Removal Kit (bacteria; e.g., Illumina)
Glycogen (20 mg/mL)
Sodium acetate, pH 5.2 (3 M)
70%, 85%, and 100% ethanol
RNA Pico kit (Agilent)
RNase-free water
2× fragmentation buffer (see recipe)
EDTA (50 mM)
Agencourt® RNAClean® XP (Beckman Coulter)
Reverse transcription primer (RT_random_primer, Table 1)
dNTPs (10 mM)
PrimeScript™ reverse transcriptase (Takara)
5× PrimeScript buffer (Takara)
Agencourt® AMPure® XP (Beckman Coulter)
CircLigase™ 10× reaction buffer
Ligation adapter (3′-ligation_adapter, Table 1)
ATP (1 mM)
MnCl2 (50 mM)
50% PEG 6000
CircLigase™ ssDNA ligase (100 U/µL; Epicentre)
PCR forward primer (10 µM; PCR_FW, Table 1)
PCR reverse primer (10 µM; PCR_RV_index#, Table 1)
Phusion® High-Fidelity DNA polymerase (2000 U/mL; New England BioLabs)
5× Phusion HF buffer (New England BioLabs)
E-Gel™ SizeSelect™ II agarose gel, 2% (or similar gel for separation of DNA fragments)
High Sensitivity DNA kit (Agilent)
Nanodrop ND-1000 spectrophotometer (or similar equipment)
Agilent 2100 Bioanalyzer
Thermocycler for PCR tubes
Magnetic stand
Savant™ DNA SpeedVac® DNA120 (optional)
E-Gel® iBase™ Power System (or similar system for size selection of DNA fragments)
2100 Bioanalyzer Expert software
rRNA depletion with Ribo-Zero kit
1. Deplete rRNA with the Ribo-Zero® rRNA Removal Kit (bacteria). Use 1 to 5 µg total RNA as input and follow the manufacturer's instructions.
2. Clean up rRNA-depleted RNA by precipitation. Add the following:

RNase-free water to 200 µL,


20 µL 3 M sodium acetate (pH 5.2),
1 µL 20 mg/mL glycogen,
600 µL 100 % ethanol.
3. Vortex and incubate at −20°C at least 1 hr.
4. Centrifuge at 12,000 × g, 30 min at 4 °C.
5. Discard supernatant and add 1 mL 70% ethanol to wash pellet.
6. Centrifuge at 12,000 × g, 30 min at 4 °C.
7. Discard supernatant and centrifuge briefly to collect and remove residual ethanol.
8. Resuspend pellet in 12 µL RNase-free water.
9. Measure RNA concentration on a fluorometer or a spectrophotometer to assess yield.
10. To evaluate RNA quality, run 1 µL Ribo-Zero-treated RNA (at a concentration of 5 ng/µL) on an Agilent 2100 Bioanalyzer using an RNA 6000 Pico Chip (Fig. 2).
RNA fragmentation
11. On ice, add 5 µL 2× fragmentation buffer to 5 µL RNA sample.
12. Incubate at 95°C, 120 sec (see Critical Parameters and Troubleshooting).
13. Place samples back on ice immediately after incubation and add 2.5 µL 50 mM
EDTA.
14. Add 18 µL Agencourt® RNAClean® XP beads to samples and incubate at room temperature 10 min, mixing after 5 min by vortexing.
15. Set tube on magnetic stand 5 min and discard supernatant.
16. Keep sample on magnetic stand and wash beads with ethanol by adding 150 µL 85% EtOH. Wait 30 sec and then remove supernatant. Repeat this washing step once more.

Figure 2 Bioanalyzer electropherograms of RNA molecules. (A) Total RNA with a RIN value of 9.6. (B) RNA sample following rRNA depletion with the Ribo-Zero® rRNA Removal Kit (bacteria). (C) RNA sample following rRNA depletion with Terminator™ 5′-phosphate-dependent exonuclease (TEX; Alternate Protocol). (D) An example of an RNA sample after fragmentation (shown here is fragmented total RNA). RIN, RNA integrity number.

17. Add 10 µL water to beads and pipet at least 60 times to elute RNA.
18. Set tube on magnetic stand 5 min to separate beads. Collect supernatant and keep it
on ice.
First-strand synthesis
19. In a PCR tube, mix 9.5 µL RNA and 0.5 µL 10 µM reverse transcription primer
(RT_random_primer).
20. Incubate at 65°C, 5 min and cool on ice.
21. Prepare master mix. For one reaction, add:

4 µL 5× PrimeScript buffer,
1 µL 10 mM dNTPs,
4 µL water,
1 µL PrimeScript reverse transcriptase.
22. Mix gently and spin down.
23. Add 10 µL master mix to RNA-reverse transcription primer solution, mix gently,
and spin down.
24. Incubate in a thermal cycler as follows: 30°C, 10 min; 42°C, 60 min; 70°C, 15 min;
and keep at 4°C until next step.
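Master mixes such as the one in step 21 are usually prepared for several samples at once. A minimal sketch for scaling a per-reaction recipe is shown below; the 10% overage to cover pipetting loss is an assumption, not a value from this protocol, and the function name is illustrative.

```python
# Per-reaction volumes (µL) of the first-strand master mix in step 21.
RT_MASTER_MIX = {
    "5x PrimeScript buffer": 4.0,
    "10 mM dNTPs": 1.0,
    "water": 4.0,
    "PrimeScript reverse transcriptase": 1.0,
}


def scale_master_mix(recipe: dict, n_reactions: int, overage: float = 0.10) -> dict:
    """Scale a per-reaction recipe to n reactions plus a fractional overage.

    The default 10% overage is an assumed allowance for pipetting loss.
    """
    factor = n_reactions * (1.0 + overage)
    return {reagent: round(volume * factor, 2) for reagent, volume in recipe.items()}


mix = scale_master_mix(RT_MASTER_MIX, n_reactions=6)
for reagent, volume in mix.items():
    print(f"{reagent}: {volume} µL")
print(f"total: {sum(mix.values()):.1f} µL (10 µL is dispensed per reaction in step 23)")
```

The same helper can be applied to the ligation (step 31) and PCR (step 35) master mixes by swapping in their per-reaction volumes.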
cDNA purification and concentration
25. Add 36 µL Agencourt® AMPure® XP beads to the samples and incubate at room temperature 20 min, mixing every 5 min by vortexing.
26. Set tube on magnetic stand 5 min and discard supernatant.
27. Keep sample on magnetic stand and wash beads with ethanol by adding 150 µL
75% EtOH. Wait 30 sec and then remove supernatant. Repeat this washing step
once more.
28. Add 20 µL water to beads and pipet at least 60 times to elute cDNA.
29. Set tube on magnetic stand 5 min to separate beads. Collect supernatant.
30. Use SpeedVac to decrease sample volume to 4 µL. (Alternatively, the DNA can be
ethanol-precipitated and resuspended in 4 µL water).
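Each bead cleanup in this protocol is defined by a bead volume and a sample volume, and for SPRI-type beads it is the bead-to-sample ratio, not the absolute bead volume, that sets which fragment sizes are retained. The sketch below tabulates the ratios implied by the volumes in steps 14, 25, 34, 40, and 45 (sample volumes are summed from the preceding steps) and includes an illustrative helper for keeping a ratio if a reaction volume is changed.

```python
# Bead cleanups in the Basic Protocol as (step, bead volume µL, sample volume µL).
# Sample volumes are summed from the reaction volumes in the preceding steps.
CLEANUPS = [
    ("step 14, RNAClean XP after fragmentation", 18.0, 5.0 + 5.0 + 2.5),
    ("step 25, AMPure XP after reverse transcription", 36.0, 10.0 + 10.0),
    ("step 34, AMPure XP after adapter ligation", 18.0, 7.0 + 3.0),
    ("step 40, AMPure XP after PCR", 90.0, 50.0),
    ("step 45, AMPure XP after size selection", 72.0, 40.0),
]


def bead_ratio(bead_ul: float, sample_ul: float) -> float:
    """Bead-to-sample volume ratio (the 'x' value of a SPRI-type cleanup)."""
    return bead_ul / sample_ul


def beads_for(sample_ul: float, ratio: float) -> float:
    """Bead volume that keeps a given ratio for a new sample volume."""
    return sample_ul * ratio


for step, beads, sample in CLEANUPS:
    print(f"{step}: {beads:g} µL / {sample:g} µL = {bead_ratio(beads, sample):.2f}x")
```

All AMPure cleanups work out to 1.8×, whereas the RNAClean cleanup after fragmentation is about 1.44×. If, hypothetically, a reaction were scaled to 25 µL, beads_for(25, 1.8) gives 45 µL of beads.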
Adapter ligation and purification
31. Prepare master mix. For one reaction, add:

1 µL CircLigase™ 10× reaction buffer,
0.5 µL 100 µM adapter (3′-ligation_adapter),
0.5 µL 1 mM ATP,
0.5 µL 50 mM MnCl2,
2 µL 50% PEG 6000,
2 µL water,
0.5 µL 100 U/µL CircLigase™ ssDNA ligase.
32. Mix 7 µL master mix with 3 µL cDNA.
33. Incubate 2 hr at 60°C, 1 hr at 68°C, and 10 min at 80°C; keep at 4°C until next step.
34. Purify with Agencourt® AMPure® XP beads as in steps 25 through 29, but this time add 18 µL beads to the 10-µL cDNA solution and elute in 10 µL water.
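The final concentrations in the 10-µL ligation follow from C1 × V1 = C2 × V2 applied to each component of step 31. The sketch below computes them from the stock concentrations and volumes listed above; it also confirms that the master mix totals the 7 µL used in step 32. The dictionary layout and function name are illustrative.

```python
# Ligation master mix (step 31): component -> (stock concentration, unit, volume in µL).
LIGATION_COMPONENTS = {
    "CircLigase reaction buffer": (10.0, "x", 1.0),
    "3'-ligation_adapter": (100.0, "µM", 0.5),
    "ATP": (1.0, "mM", 0.5),
    "MnCl2": (50.0, "mM", 0.5),
    "PEG 6000": (50.0, "%", 2.0),
    "CircLigase ssDNA ligase": (100.0, "U/µL", 0.5),
}
WATER_UL = 2.0  # water in the master mix (step 31)
CDNA_UL = 3.0   # cDNA added in step 32


def final_concentrations(components: dict, total_ul: float) -> dict:
    """Apply C1 * V1 = C2 * V2 to every component of a reaction."""
    return {name: (stock * volume / total_ul, unit)
            for name, (stock, unit, volume) in components.items()}


master_mix_ul = sum(volume for _, _, volume in LIGATION_COMPONENTS.values()) + WATER_UL
reaction_ul = master_mix_ul + CDNA_UL

print(f"master mix: {master_mix_ul:g} µL, reaction: {reaction_ul:g} µL")
for name, (conc, unit) in final_concentrations(LIGATION_COMPONENTS, reaction_ul).items():
    print(f"{name}: {conc:g} {unit}")
```

The sketch gives 1× buffer, 5 µM adapter, 50 µM ATP (0.05 mM), 2.5 mM MnCl2, 10% PEG 6000, and 5 U/µL ligase in the 10-µL reaction; rerunning it after changing a volume is a quick way to check that a modified reaction keeps these concentrations.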
Library amplification and indexing
35. Prepare master mix on ice without the reverse PCR primer. For one reaction, add:

2.5 µL 10 µM PCR forward primer (PCR_FW, Table 1),
10 µL 5× HF buffer,
1 µL 10 mM dNTPs,
28 µL water,
1 µL 2000 U/mL Phusion® High-Fidelity DNA polymerase.
36. Mix and split master mix into 42.5 µL aliquots in PCR tubes.
37. Add 5 µL cDNA template and 2.5 µL PCR reverse primer (10 µM; PCR_RV_index#, Table 1; use different index primers for multiplexing).
38. Mix gently and spin down.
39. Transfer PCR tubes to a PCR machine with the block preheated to 98°C and start thermocycling immediately:

98°C, 3 min;
12 cycles of: 98°C, 80 sec; 68°C, 30 sec; 72°C, 30 sec;
72°C, 5 min;
Hold at 4°C.
40. Purify with Agencourt® AMPure® XP beads as in steps 25 to 29, but this time add 90 µL beads to the 50-µL PCR reaction and elute in 50 µL water.
41. Run 1 µL purified PCR product on an Agilent 2100 Bioanalyzer using a DNA High Sensitivity chip (Fig. 3A).
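Writing the step 39 program as data makes it easy to see how run time and the theoretical amplification change with the cycle number, which is useful when testing fewer cycles to limit PCR duplicates (see Critical Parameters and Troubleshooting). In this sketch, times are hold times only (ramping depends on the instrument and is excluded, as is the 4°C hold), and 2ⁿ is an upper bound that assumes perfect doubling in every cycle.

```python
# Step 39 program: each step is (temperature in °C, hold time in seconds).
INITIAL_DENATURATION = [(98, 180)]
CYCLE = [(98, 80), (68, 30), (72, 30)]
FINAL_EXTENSION = [(72, 300)]


def hold_seconds(steps) -> int:
    """Sum the hold times of a list of (temperature, seconds) steps."""
    return sum(seconds for _temperature, seconds in steps)


def program_minutes(n_cycles: int) -> float:
    """Total hold time in minutes; ramping and the final 4°C hold are excluded."""
    total = (hold_seconds(INITIAL_DENATURATION)
             + n_cycles * hold_seconds(CYCLE)
             + hold_seconds(FINAL_EXTENSION))
    return total / 60


def max_fold_amplification(n_cycles: int) -> int:
    """Upper bound on amplification, assuming perfect doubling in every cycle."""
    return 2 ** n_cycles


for cycles in (10, 12, 14):
    print(f"{cycles} cycles: {program_minutes(cycles):.1f} min of holds, "
          f"at most {max_fold_amplification(cycles)}-fold amplification")
```

For the 12 cycles specified in step 39, the holds add up to 36 min and the amplification is at most 4096-fold.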
Figure 3 DNA libraries. (A) Example of a Bioanalyzer electropherogram of a library before size selection. The peak around 145 bp (black arrow) represents ligation of the 3′-ligation adapter to the random RT primer followed by PCR amplification but without insert. (B) Bioanalyzer electropherogram of a library after size selection.

Quantification and size selection


42. For multiplexing: After running PCR product on an Agilent 2100 Bioanalyzer using
a DNA High Sensitivity chip, perform smear analysis to quantify DNA. Use 2100
Expert software and choose: Global (in the side panel) > advanced > regions
(under smear analysis) > set regions to 200 to 600 bp. Choose individual traces
to estimate molarity in the given region in the ‘region table’ below the trace. Use
these estimates to mix DNA samples in the molar ratios relevant for your experiment
(typically equimolar ratios for obtaining approximately the same number of reads
for each sample).
43. After mixing PCR products, use an E-Gel™ SizeSelect™ II agarose gel (or a similar system) to select the 200 to 600 bp region.
Alternatively, size selection can be performed by gel electrophoresis followed by excision of the desired bands.

44. Use a SpeedVac to decrease sample volume to 40 µL.


Alternatively, DNA can be ethanol-precipitated and resuspended in 40 µL water.

45. Purify with Agencourt® AMPure® XP beads as in steps 25 to 29, but this time add 72 µL beads to the 40-µL collected sample and elute in 50 µL water.
46. Run 1 µL purified PCR product on an Agilent 2100 Bioanalyzer using a DNA High
Sensitivity chip or a DNA1000 chip, to validate successful size selection of the
library (Fig. 3B).
After ensuring correct size selection, the libraries are ready for sequencing on an Illumina
sequencing platform.
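The pooling in step 42 converts region molarities into pipetting volumes. Since 1 nM equals 1 fmol/µL, the volume of a library that contributes a given amount is simply the amount in fmol divided by the molarity in nM. The sketch below assumes molarities already in nM (convert first if the region table reports another unit, e.g., pmol/L); the molarity values, the 10-fmol target, and the optional weights for unequal read shares are hypothetical.

```python
def pooling_volumes(molarity_nM: dict, fmol_per_library: float, weights=None) -> dict:
    """Return the volume (µL) of each library to add to the pool.

    Uses 1 nM = 1 fmol/µL, so volume = amount (fmol) / concentration (nM).
    Optional weights scale the amount for unequal read shares; equal
    weights (the default) give an equimolar pool.
    """
    weights = weights or {}
    volumes = {}
    for name, conc in molarity_nM.items():
        if conc <= 0:
            raise ValueError(f"{name}: molarity must be positive")
        volumes[name] = fmol_per_library * weights.get(name, 1.0) / conc
    return volumes


# Hypothetical 200-600 bp region molarities for three indexed libraries (nM).
region_molarity_nM = {"index_17": 2.5, "index_18": 4.0, "index_19": 1.6}
for name, volume in pooling_volumes(region_molarity_nM, fmol_per_library=10.0).items():
    print(f"{name}: {volume:.2f} µL")
```

With these example values, 4.00, 2.50, and 6.25 µL of the three libraries would each contribute 10 fmol, giving approximately equal read numbers per sample after sequencing.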

ALTERNATE PROTOCOL
DEPLETION OF RIBOSOMAL RNA WITH TERMINATOR 5′-PHOSPHATE-DEPENDENT EXONUCLEASE
This alternate protocol uses Terminator™ 5′-phosphate-dependent exonuclease (TEX) to enrich for bacterial mRNAs, an approach that is fundamentally different from the Ribo-Zero method, which uses probes that hybridize to ribosomal RNAs. Instead, TEX selectively degrades RNAs with a 5′-monophosphate. In bacteria, the genes encoding 23S rRNA, 16S rRNA, and 5S rRNA are typically co-transcribed and then processed post-transcriptionally, resulting in 23S rRNA and 16S rRNA transcripts with 5′-monophosphates that are accessible for TEX degradation. It is important to note that 5S rRNA and tRNAs are not degraded by TEX, and if these are not removed by additional purification steps, they will contribute a substantial fraction of the sequencing reads. Furthermore, some RNAs are protected from TEX digestion by secondary structures that shield their 5′-monophosphates from the exonuclease. It is essential that the input RNA is of high quality when using TEX to enrich for mRNAs, as rRNA degradation products with 5′-OH groups are also not degraded by TEX.
Additional Materials (also see Basic Protocol)
RNA of interest (2 µg total RNA from bacteria; see Critical Parameters)
RNase-free water
Terminator 10× Reaction Buffer A (Epicentre)
Terminator™ 5′-phosphate-dependent exonuclease (1 U/µL; Epicentre)
RiboGuard™ RNase Inhibitor
EDTA (100 mM, pH 8.0)
Agencourt® RNAClean® XP (Beckman Coulter)
85% ethanol (EtOH)

ThermoMixer
Magnetic stand
rRNA depletion with Terminator 5′-phosphate-dependent exonuclease
1. Adjust the volume of RNA to 15.5 µL with RNase-free water.
2. On ice, add the following:

2 µL Terminator 10× Reaction Buffer A,
0.5 µL RiboGuard RNase Inhibitor,
2 µL Terminator 5′-phosphate-dependent exonuclease (1 U/µL).
3. Mix gently and spin down.
4. Incubate 30°C, 60 min in a ThermoMixer.
5. Add 1 μL 100 mM EDTA (pH 8.0) to terminate reaction.
RNA purification
6. Add 50 µL Agencourt® RNAClean® XP beads to samples and incubate at room temperature 10 min, mixing after 5 min by vortexing.
7. Set tube on magnetic stand 5 min and discard supernatant.
8. Keep sample on magnetic stand and wash beads with ethanol by adding 150 µL 85% EtOH. Wait 30 sec and then remove supernatant. Repeat this washing step once more.
9. Add 20 µL water to the beads and pipet at least 60 times to elute RNA.
10. Set tube on magnetic stand 5 min to separate beads. Collect supernatant and keep it on ice.
11. Measure RNA concentration on a fluorometer or a spectrophotometer to assess yield.
12. To evaluate RNA quality, run 1 µL TEX-treated RNA (at a concentration of 5 ng/µL) on an Agilent 2100 Bioanalyzer using an RNA 6000 Pico Chip (Fig. 2).
The RNA is now ready for library preparation (from step 11 of Basic Protocol).
REAGENTS AND SOLUTIONS
Fragmentation buffer, 2×
10 mL 1 M Tris·Cl, pH 8.0
1 mL 1 M MgCl2
Bring to 100 mL with RNase-free water
Autoclave and store at room temperature for up to 1 year
COMMENTARY
Background Information
Methods for quantification of gene expression have been central to our understanding of cellular properties and of how cells interact with and adapt to their environment. Over the years, the methodologies have improved, allowing the expression of many thousands of genes to be analyzed in one experiment. In the 1990s, microarrays allowed high-throughput identification of candidate transcripts, revolutionizing the approach to studying gene expression (Schena, Shalon, Davis, & Brown, 1995), but more recently, with the invention of massively parallel sequencing, RNA-seq has become the method of choice for studying gene expression. Compared with DNA arrays, RNA-seq has an improved dynamic range for quantification of gene expression levels and allows the identification of novel RNA species (Wang, Gerstein, & Snyder, 2009). In bacteria, both RNA-seq and more specialized sequencing-based protocols have revolutionized the understanding of bacterial transcriptomes (Croucher & Thomson, 2010; Sharma & Vogel, 2014).

The library preparation is essential for a successful RNA-seq experiment, and many different commercial kits are available. However, these kits remain relatively expensive and in some cases contain undisclosed reagents or compositions, which creates the need for alternative RNA-seq protocols that are robust and, in addition, time and cost effective. The key requirement for constructing a sequencing library that can be subjected to massively parallel sequencing on the Illumina platform is the addition of adapters to the ends of the DNA to be sequenced. Over the years, different strategies have been developed for this. In the initial RNA-seq experiments, adapters were attached to double-stranded cDNAs, which resulted in a loss of directionality (Nagalakshmi et al., 2008). Later, several strategies were developed to preserve strand specificity, including attaching adapter sequences to RNA molecules prior to reverse transcription (Mamanova et al., 2010) and incorporating dUTP instead of dTTP during second-strand cDNA synthesis to allow selective degradation of the second strand following adapter ligation (Zhang, Theurkauf, Weng, & Zamore, 2012). Also, the SMART-seq strategy takes advantage of the template switching mechanism of reverse transcriptase (RT) for adding a 3′ adapter prior to PCR amplification (Zhu, Machleder, Chenchik, Li, & Siebert, 2001). Here, we present a protocol based on the ligation of an adapter to the 3′ end of cDNAs produced by reverse transcription with an RT primer carrying an Illumina adapter sequence overhang (Li & Weeks, 2006). We previously used this strategy for preparing libraries for RNA probing experiments (Kielpinski et al., 2013; Poulsen, Kielpinski, Salama, Krogh, & Vinther, 2015), but it also works well for RNA-seq based on sequencing of fragmented, purified mRNAs. Our strategy resembles the Ligation Mediated RNA sequencing protocol developed by Thomson and co-workers for eukaryotic RNA-seq (Hou et al., 2015) and has many of the same advantages, including low cost and short preparation time. In addition, our protocol makes it possible to recognize PCR duplicates through the use of a barcode in the adapter ligated to the 3′ end of the cDNA, which is especially valuable for samples with limited input material.

Critical Parameters and Troubleshooting
The integrity of the input RNA is critical for obtaining high-quality RNA-seq data. While RNA isolation is not covered in this protocol, numerous methods are available in the literature, and finding the most appropriate method for a given bacterial species is essential for a successful experiment. Regardless of which method is chosen, RNA quality should be measured before RNA-seq. RNA quality can be measured with an Agilent Bioanalyzer, which produces an RNA Integrity Number (RIN). The RIN ranges from 1 to 10, where 10 corresponds to the highest quality with no RNA degradation. Low RIN values may lead to incorrect biological conclusions, and we recommend using RNA with a RIN of at least 7. Alternatively, the RNA can be run on an agarose gel to infer RNA integrity, but this method is less sensitive.

Since prokaryotic RNAs are generally non-polyadenylated, a major issue in library preparation is depletion of ribosomal RNAs, which are the most abundant RNA species in the cell. Different approaches have been used to solve this challenge, for instance mRNA polyadenylation followed by binding to poly(dT) beads (Amara & Vijaya, 1997) or capture of rRNAs with sequence-specific biotinylated probes. Here, we have used the Ribo-Zero rRNA Removal Kit (bacteria; Basic Protocol), which resulted in almost complete depletion of rRNA; however, several other kits are commercially available. An alternative is treatment with Terminator™ 5′-phosphate-dependent exonuclease (TEX; Alternate Protocol), which selectively degrades RNAs containing 5′-monophosphates, such as rRNAs. The enrichment for mRNAs is less efficient, as degradation can be blocked by secondary structures in the substrate RNA. Also, it is important to remember that treatment with TEX results in enrichment of primary transcripts, as processed transcripts often carry 5′-monophosphates.

The presence of a band of around 145 bp indicates a problem in library preparation. This size corresponds to direct ligation of the 3′-ligation adapter to the random RT primer followed by PCR amplification, without an insert (Fig. 3A). Observing this band confirms successful 3′-adapter ligation and PCR amplification but also indicates either too little input RNA or excessive fragmentation of the input RNA. If larger fragments are also observed in the library, the problem may be solved with a size selection step as described in the Basic Protocol. However, if no other bands are observed, it will be necessary to redo the library preparation with more input RNA, and it might also be advantageous to test different RNA fragmentation incubation times.

PCR duplicates are sequence reads that arise from amplification of the same template RNA molecule, and they represent a major issue in next-generation sequencing experiments. When the RNA starting material is limited, PCR amplification is necessary to obtain enough material for sequencing; however, increasing the number of PCR cycles also increases the risk of PCR duplicates. In general, we recommend performing a small-scale PCR with different numbers of cycles and choosing the lowest number of cycles that gives a product visible either on a gel or on a Bioanalyzer DNA chip. To deal with potential PCR duplicates, our protocol introduces a random barcode in the 3′ cDNA ligation step prior to PCR amplification. Multiple reads with identical barcodes on the same DNA fragment indicate PCR duplicates, which should be collapsed to one read.

Anticipated Results
RNA-seq yields raw sequencing reads that must be processed before biological interpretation. The analysis of an RNA-seq experiment has many variations, and the exact pipeline depends on the aims of a given experiment. The main focus of this unit is the preparation of high-quality RNA-seq libraries from bacteria; however, we briefly describe the framework for analyzing the obtained data and the expected results. We provide a dataset generated with the protocol, together with command line scripts for preprocessing, mapping, and barcode collapsing (see Internet Resources).

The first step in any data analysis pipeline is quality control, which can be done with FastQC (Andrews, 2010) or similar programs. Examining the quality control plots is important for detecting and subsequently dealing with potential problems in the libraries. After satisfactory quality control, the adapter sequence should be removed from the reads. One of the tools developed for this purpose is Cutadapt (Martin, 2011). Here, default settings can be used, with the quality cutoff (-q option) set to 20. For sequences obtained on Illumina's two-dye systems (MiniSeq, NextSeq, and NovaSeq), use the --nextseq-trim=20 option to remove 3′-terminal Gs stemming from dark cycles on short fragments. If the reads are paired-end instead of single-end, Cutadapt should be run once for each FASTQ file with the respective adapter sequence. Following adapter removal, the preprocessing tool developed in our group (Kielpinski et al., 2015) can be used to remove the barcode introduced at the 3′ end of the cDNA during library preparation. The preprocessed reads can then be aligned to the investigated RNAs or genome with any of a number of alignment programs; in the provided example, Bowtie 2 is used (Langmead & Salzberg, 2012). Following mapping, PCR duplicates can be removed with the provided collapse script, which outputs a .sam file. This script removes all but one of the reads that map to the same position if they also contain identical barcodes.
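The collapsing rule described above (keep one read per mapping position and barcode) can be sketched in a few lines. This is not the authors' collapse script; it is an illustration that assumes each alignment has already been reduced to a chromosome, the coordinate of the read's 5′ end, a strand, and the barcode removed during preprocessing. Treating the strand as part of the key is an assumption of the sketch, and deriving 5′-end coordinates for reverse-strand reads from SAM POS and CIGAR is left out. The class and function names are hypothetical.

```python
from typing import Iterable, List, NamedTuple


class Alignment(NamedTuple):
    read_id: str
    chrom: str
    start: int    # assumed: coordinate of the read's 5' end on the reference
    strand: str   # "+" or "-"
    barcode: str  # random barcode removed from the read during preprocessing


def collapse_pcr_duplicates(alignments: Iterable[Alignment]) -> List[Alignment]:
    """Keep the first read for each (chrom, start, strand, barcode) combination."""
    seen = set()
    kept = []
    for aln in alignments:
        key = (aln.chrom, aln.start, aln.strand, aln.barcode)
        if key not in seen:
            seen.add(key)
            kept.append(aln)
    return kept


reads = [
    Alignment("r1", "chr", 1000, "+", "ACGTTGA"),
    Alignment("r2", "chr", 1000, "+", "ACGTTGA"),  # same position and barcode: duplicate
    Alignment("r3", "chr", 1000, "+", "TTGACCA"),  # same position, new barcode: kept
    Alignment("r4", "chr", 1200, "+", "ACGTTGA"),  # new position: kept
]
print([aln.read_id for aln in collapse_pcr_duplicates(reads)])
```

Reads r3 and r4 show why the barcode matters: without it, r3 would be discarded as a duplicate of r1 even though it derives from a different template molecule.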
Table 2 Mapping Statistics for the Example Data Provided

Sample name                                Total RNA      Ribo-Zero      TEXᵃ
Index                                      17             18             19
Total reads                                11,457,655     17,818,671     8,374,065
Reads mapped to rRNA                       10,372,952     24,908         5,379,252
  (% of total reads)                       (90.5%)        (0.1%)         (64.2%)
Reads mapped to genome                     11,307,818     16,580,740     7,973,142
  (% of total reads)                       (98.7%)        (93.1%)        (95.2%)
Reads mapped to genome after debarcoding   10,139,700     11,676,220     6,709,211
  (% of total reads)                       (88.5%)        (65.5%)        (80.1%)

ᵃ Terminator 5′-phosphate-dependent exonuclease.
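The percentages in Table 2 can be recomputed directly from its read counts, which is a useful check when producing the same table for new libraries. The sketch below reproduces the rRNA, genome, and debarcoding percentages (all relative to total reads) and adds one derived column that is not in Table 2: the share of genome-mapped reads removed by barcode collapsing.

```python
# Table 2 counts: (total reads, reads mapped to rRNA, reads mapped to genome,
# reads mapped to genome after debarcoding).
TABLE2 = {
    "Total RNA (index 17)": (11_457_655, 10_372_952, 11_307_818, 10_139_700),
    "Ribo-Zero (index 18)": (17_818_671, 24_908, 16_580_740, 11_676_220),
    "TEX (index 19)": (8_374_065, 5_379_252, 7_973_142, 6_709_211),
}


def pct(part: int, whole: int) -> float:
    """Percentage of part relative to whole."""
    return 100.0 * part / whole


for sample, (total, rrna, genome, dedup) in TABLE2.items():
    print(
        f"{sample}: rRNA {pct(rrna, total):.1f}%, genome {pct(genome, total):.1f}%, "
        f"after debarcoding {pct(dedup, total):.1f}%, "
        f"collapsed {pct(genome - dedup, genome):.1f}% of genome-mapped reads"
    )
```

By this derived measure, about 10% of genome-mapped reads are collapsed for the total RNA library, about 16% for the TEX library, and about 30% for the Ribo-Zero library.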

The provided example contains libraries generated from: (1) total RNA with no rRNA depletion, (2) RNA depleted of rRNA using the Ribo-Zero rRNA Removal Kit (Basic Protocol), and (3) Terminator™ 5′-phosphate-dependent exonuclease (TEX)-treated RNA (Alternate Protocol). Following adapter removal and preprocessing, the reads were aligned to the reference genome of the strain used in this experiment (Bacillus subtilis str. 168 (ASM904v1) from Ensembl). Between 93.1% and 98.7% of the reads align to the genome (Table 2). Aligning only to rRNA sequences shows that 90.5% of all reads from the total RNA sample map to rRNAs, and, as expected, rRNA depletion with TEX lowers this number to 64.2%. In contrast, rRNA depletion using the Ribo-Zero rRNA Removal Kit as described in the Basic Protocol results in only 0.1% of reads mapping to the ribosomal RNAs. The results of removing PCR duplicates with the provided collapse script show that including a barcode may be beneficial for identifying and removing PCR duplicates (Table 2).

Downstream analysis will depend on the specific scientific questions being investigated. In a standard RNA-seq setup, the reads mapping to the different genes are counted with a tool such as HTSeq. Finally, gene counts from replicate experiments are analyzed with specialized tools such as edgeR or DESeq2, which are based on the negative binomial distribution, to produce lists of differentially expressed genes with associated fold changes and significance estimates.

Time Considerations
Preparation of RNA-seq libraries from isolated RNA can be completed in 4 to 5 days. Using a SpeedVac instead of ethanol precipitation to concentrate RNA or DNA significantly reduces the time needed to generate libraries. The time needed to perform data analysis can vary from days to weeks depending on which analyses are performed.

Acknowledgements
The research was funded by Innovation Fund Denmark.

Literature Cited
Amara, R. R., & Vijaya, S. (1997). Specific polyadenylation and purification of total messenger RNA from Escherichia coli. Nucleic Acids Research, 25(17), 3465–3470. doi: 10.1093/nar/25.17.3465.
Andrews, S. (2010). FastQC. A quality control tool for high throughput sequence data. Retrieved from https://www.bioinformatics.babraham.ac.uk/projects/fastqc/
Croucher, N. J., & Thomson, N. R. (2010). Studying bacterial transcriptomes using RNA-seq. Current Opinion in Microbiology, 13(5), 619–624. doi: 10.1016/j.mib.2010.09.009.
Hou, Z., Jiang, P., Swanson, S. A., Elwell, A. L., Nguyen, B. K. S., Bolin, J. M., . . . Thomson, J. A. (2015). A cost-effective RNA sequencing protocol for large-scale gene expression studies. Scientific Reports, 5, 9570. doi: 10.1038/srep09570.
Kielpinski, L. J., Boyd, M., Sandelin, A., & Vinther, J. (2013). Detection of reverse transcriptase termination sites using cDNA ligation and massive parallel sequencing. In N. Shomron (Ed.), Deep sequencing data analysis. Methods in molecular biology Vol. 1038 (pp. 213–231). Totowa, NJ: Humana Press. doi: 10.1007/978-1-62703-514-9_13.
Kielpinski, L. J., Sidiropoulos, N., & Vinther, J. (2015). Reproducible analysis of sequencing-based RNA structure probing data with user-friendly tools. Methods in Enzymology, 558, 153–180. doi: 10.1016/bs.mie.2015.01.014.
Langmead, B., & Salzberg, S. L. (2012). Fast gapped-read alignment with Bowtie 2. Nature Methods, 9(4), 357–359. doi: 10.1038/nmeth.1923.
Li, T. W., & Weeks, K. M. (2006). Structure-independent and quantitative ligation of single-stranded DNA. Analytical Biochemistry, 349(2), 242–246. doi: 10.1016/j.ab.2005.11.002.
Mamanova, L., Andrews, R. M., James, K. D., Sheridan, E. M., Ellis, P. D., Langford, C. F., . . . Turner, D. J. (2010). FRT-seq: Amplification-free, strand-specific transcriptome sequencing. Nature Methods, 7(2), 130–132. doi: 10.1038/nmeth.1417.
Martin, M. (2011). Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet.journal, 17(1), 10. doi: 10.14806/ej.17.1.200.
Nagalakshmi, U., Wang, Z., Waern, K., Shou, C., Raha, D., Gerstein, M., & Snyder, M. (2008). The transcriptional landscape of the yeast genome defined by RNA sequencing. Science, 320(5881), 1344–1349. doi: 10.1126/science.1158441.
Poulsen, L. D., Kielpinski, L. J., Salama, S. R., Krogh, A., & Vinther, J. (2015). SHAPE Selection (SHAPES) enrich for RNA structure signal in SHAPE sequencing-based probing data. RNA, 21(5), 1042–1052. doi: 10.1261/rna.047068.114.
Schena, M., Shalon, D., Davis, R. W., & Brown, P. O. (1995). Quantitative monitoring of gene expression patterns with a complementary DNA microarray. Science, 270(5235), 467–470. doi: 10.1126/science.270.5235.467.
Sharma, C. M., & Vogel, J. (2014). Differential RNA-seq: The approach behind and the biological insight gained. Current Opinion in Microbiology, 19, 97–105. doi: 10.1016/j.mib.2014.06.010.
Wang, Z., Gerstein, M., & Snyder, M. (2009). RNA-Seq: A revolutionary tool for transcriptomics. Nature Reviews Genetics, 10(1), 57–63. doi: 10.1038/nrg2484.
Zhang, Z., Theurkauf, W. E., Weng, Z., & Zamore, P. D. (2012). Strand-specific libraries for high throughput RNA sequencing (RNA-Seq) prepared without poly(A) selection. Silence, 3(1), 9. doi: 10.1186/1758-907X-3-9.
Zhu, Y. Y., Machleder, E. M., Chenchik, A., Li, R., & Siebert, P. D. (2001). Reverse transcriptase template switching: A SMART approach for full-length cDNA library construction. BioTechniques, 30(4), 892–897.

Internet Resources
https://people.binf.ku.dk/jvinther/data/RNA-seq
The RNA-seq dataset and an example of the data analysis workflow are available for download.