CPNC 55
RNA sequencing (RNA-seq) has become the preferred method for global quantification
of bacterial gene expression. With the continued improvements in sequencing technology
and data analysis tools, the most labor-intensive and expensive part of an RNA-seq
experiment is the preparation of sequencing libraries, which is also essential for the
quality of the data obtained. Here, we present a straightforward and inexpensive basic
protocol for preparation of strand-specific RNA-seq libraries from bacterial RNA as well
as a computational pipeline for the data analysis of sequencing reads. The protocol is
based on the Illumina platform and allows easy multiplexing of samples and the removal
of sequencing reads that are PCR duplicates. © 2018 by John Wiley & Sons, Inc.
Keywords: library construction • next-generation sequencing • RNA-seq
INTRODUCTION
This unit describes the experimental steps required for preparation of RNA-seq libraries
from total RNA isolated from bacteria. The method is based on a protocol for detection
of reverse transcriptase termination sites (Kielpinski, Boyd, Sandelin, & Vinther, 2013),
which we have adapted here for RNA-seq. As an example, we applied the method to RNA
isolated from Bacillus subtilis; however, the method is general and can be used to generate
RNA-seq libraries from a wide variety of bacteria. Depletion of ribosomal RNA from the
RNA sample can be achieved either with the Ribo-Zero® rRNA Removal Kit (Basic Protocol)
or by selective degradation with the 5′-monophosphate-dependent Terminator™ exonuclease
(Alternate Protocol). While we provide an RNA-seq dataset and some general recommendations
for data analysis (see Internet Resources), more detailed pipelines have been described
previously (Kielpinski, Sidiropoulos, & Vinther, 2015).
STRATEGIC PLANNING
The experiments should be carefully planned prior to sample collection. It is recommended
to include at least three biological replicates for the statistical analysis and also
to include a technical replicate to allow estimation of the reproducibility of the protocol.
Another important parameter to consider before performing a sequencing experiment is
the sequencing depth (or library size). The higher the number of reads in each sample, the
more RNAs will be detected and quantified. For a typical bacterial transcriptome such as
that of Bacillus subtilis, one to two million reads per replicate will be sufficient to
achieve good coverage of the majority of expressed mRNAs when using input RNA depleted of
ribosomal RNA with the Ribo-Zero® rRNA Removal Kit or a similar kit (Basic Protocol). In
some cases, it can be cost effective to use the 5′-monophosphate-dependent Terminator™
exonuclease (Alternate Protocol) instead of the Ribo-Zero® rRNA Removal Kit and then
Poulsen and
Vinther
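As a rough illustration of the depth trade-off discussed in Strategic Planning, the sketch below estimates how many total reads are needed to obtain a target number of non-rRNA reads. The rRNA fractions are those reported for the example dataset in Anticipated Results (90.5% for total RNA, 64.2% after TEX treatment, 0.1% after Ribo-Zero depletion). Treating the one to two million read recommendation as a target for non-rRNA reads is an assumption made here, and the function name is hypothetical.

```python
import math

# rRNA fractions reported for the example dataset (see Anticipated Results).
RRNA_FRACTION = {
    "total RNA": 0.905,
    "TEX": 0.642,
    "Ribo-Zero": 0.001,
}

def total_reads_needed(target_non_rrna_reads, rrna_fraction):
    """Total reads required so that the non-rRNA reads reach the target.

    Assumption: rRNA reads are simply discarded and all other reads count.
    """
    if not 0 <= rrna_fraction < 1:
        raise ValueError("rrna_fraction must be in [0, 1)")
    return math.ceil(target_non_rrna_reads / (1 - rrna_fraction))

target = 2_000_000  # upper end of the one to two million read recommendation
for treatment, fraction in RRNA_FRACTION.items():
    print(f"{treatment}: {total_reads_needed(target, fraction):,} total reads")
```

Under these assumptions, a TEX-treated sample needs roughly three times as many total reads as a Ribo-Zero-depleted sample to reach the same number of non-rRNA reads, which is the cost that has to be weighed against the cheaper depletion.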
Figure 1 Schematic representation of the RNA-seq workflow. The starting material is total RNA
from bacteria. The rRNA is depleted from this sample and RNA is then fragmented. Following
purification, RNA is reverse transcribed to cDNA using a random primer containing a 5′ adapter
overhang. After ligation of an adapter to the 3′-end, the cDNA serves as a template for PCR
amplification adding Illumina overhangs, which are necessary for sequencing.
Current Protocols in Nucleic Acid Chemistry
Table 1 Oligonucleotides Used for Library Preparationᵃ
Name                 Sequence
RT_random_primer     AGACGTGTGCTCTTCCGATCTNNNNNNNNNNNNNNN
3′-ligation_Adapter  PHO-AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT-3NHC3
PCR_FW               PHO-NNNNNNNAGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT-3NHC3
PCR_RV_17            CAAGCAGAAGACGGCATACGAGATCTCTACGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT
PCR_RV_18            CAAGCAGAAGACGGCATACGAGATGCGGACGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT
PCR_RV_19            CAAGCAGAAGACGGCATACGAGATTTTCACGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT
ᵃ The oligonucleotide sequences of the Illumina genomic DNA adapters are copyrighted by
Illumina, 2006. All rights reserved. Index sequences for multiplexing are underlined. A table
of additional PCR index primers can be found in Kielpinski et al., 2013.
Figure 2 Bioanalyzer electropherogram of RNA molecules. (A) Total RNA which has an RIN value of 9.6.
(B) RNA sample following rRNA depletion with the Ribo-Zero® rRNA Removal Kit (bacteria). (C) RNA sample
following rRNA depletion with Terminator™ 5′-phosphate-dependent exonuclease (TEX; Alternate Protocol).
(D) An example of an RNA sample after fragmentation (shown here is fragmented total RNA). RIN, RNA
integrity number.
17. Add 10 µL water to beads and pipet at least 60 times to elute RNA.
18. Set tube on magnetic stand 5 min to separate beads. Collect supernatant and keep it
on ice.
First-strand synthesis
19. In a PCR tube, mix 9.5 µL RNA and 0.5 µL 10 µM reverse transcription primer
(RT_random_primer).
20. Incubate at 65°C, 5 min and cool on ice.
21. Prepare master mix. For one reaction, add:
4 µL 5× PrimeScript buffer,
1 µL 10 mM dNTPs,
4 µL water,
1 µL PrimeScript reverse transcriptase.
22. Mix gently and spin down.
23. Add 10 µL master mix to RNA-reverse transcription primer solution, mix gently,
and spin down.
24. Incubate in a thermal cycler as follows: 30°C, 10 min; 42°C, 60 min; 70°C, 15 min;
and keep at 4°C until next step.
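When several libraries are prepared in parallel, the master mix from step 21 has to be scaled up. The sketch below multiplies the per-reaction volumes listed in step 21 by the number of reactions. The 10% pipetting overage and the function name are assumptions for illustration and are not part of the protocol.

```python
# Per-reaction volumes (µL) from step 21 of the protocol.
RT_MASTER_MIX_UL = {
    "5x PrimeScript buffer": 4.0,
    "10 mM dNTPs": 1.0,
    "water": 4.0,
    "PrimeScript reverse transcriptase": 1.0,
}

def scale_master_mix(n_reactions, overage=0.10):
    """Return component volumes (µL) for n reactions plus a pipetting overage.

    The 10% default overage is an assumption, not a value from the protocol.
    """
    if n_reactions < 1:
        raise ValueError("n_reactions must be at least 1")
    factor = n_reactions * (1 + overage)
    return {name: round(vol * factor, 2) for name, vol in RT_MASTER_MIX_UL.items()}

mix = scale_master_mix(8)
for name, volume in mix.items():
    print(f"{name}: {volume} µL")
print(f"total: {round(sum(mix.values()), 2)} µL (10 µL is used per reaction)")
```

Each reaction still receives 10 µL of the scaled mix, as in step 23, so the final reaction volume remains 20 µL.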
cDNA purification and concentration
25. Add 36 µL Agencourt® AMPure® XP beads to the samples and incubate at room
temperature 20 min, mixing every 5 min by vortexing.
26. Set tube on magnetic stand 5 min and discard supernatant.
27. Keep sample on magnetic stand and wash beads with ethanol by adding 150 µL
75% EtOH. Wait 30 sec and then remove supernatant. Repeat this washing step
once more.
28. Add 20 µL water to beads and pipet at least 60 times to elute cDNA.
29. Set tube on magnetic stand 5 min to separate beads. Collect supernatant.
30. Use SpeedVac to decrease sample volume to 4 µL. (Alternatively, the DNA can be
ethanol-precipitated and resuspended in 4 µL water).
Adapter ligation and purification
31. Prepare master mix. For one reaction, add:
98°C, 3 min;
98°C, 80 sec; 68°C, 30 sec; 72°C, 30 sec: twelve times;
72°C, 5 min;
Hold at 4°C.
40. Purify with Agencourt® AMPure® XP beads as in steps 25 to 29, but this time add
90 µL beads to the 50-µL PCR reaction and elute in 50 µL water.
41. Run 1 µL purified PCR product on an Agilent 2100 Bioanalyzer using a DNA High
Sensitivity chip (Fig. 3A).
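The bead volumes in steps 25 and 40 correspond to the same bead-to-sample ratio, which can be checked with a short calculation. The 20 µL sample volume for step 25 follows from steps 19 to 23 (10 µL RNA and primer plus 10 µL master mix). The helper name is hypothetical.

```python
def bead_ratio(bead_volume_ul, sample_volume_ul):
    """Ratio of AMPure XP bead volume to sample volume."""
    if sample_volume_ul <= 0:
        raise ValueError("sample volume must be positive")
    return bead_volume_ul / sample_volume_ul

# Step 25: 36 µL beads added to the 20 µL reverse transcription reaction.
print(bead_ratio(36, 20))
# Step 40: 90 µL beads added to the 50 µL PCR reaction.
print(bead_ratio(90, 50))
```

Both purifications use 1.8 volumes of beads per volume of sample, so the same ratio can be applied if a sample volume deviates from the one given in the protocol.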
Figure 3 DNA libraries. (A) Example of a Bioanalyzer electropherogram of a library before size selection. The
peak around 145 bp (black arrow) represents ligation of the 3′-ligation adapter to the random RT primer followed
by PCR amplification but without insert. (B) Bioanalyzer electropherogram of a library after size selection.
ThermoMixer
Magnetic stand
rRNA depletion with Terminator™ 5′-phosphate-dependent exonuclease
1. Adjust the RNA volume to 15.5 µL with RNase-free water.
2. On ice, add the following:
REAGENTS AND SOLUTIONS
Fragmentation buffer, 2×
10 mL 1 M Tris·Cl, pH 8.0
1 mL 1 M MgCl2
Bring to 100 mL with RNase-free water
Autoclave and store at room temperature for up to 1 year
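The final concentrations in the 2× fragmentation buffer follow from the dilution relation C1·V1 = C2·V2. The sketch below works through the recipe above; the function name is hypothetical.

```python
def final_concentration_mM(stock_molar, stock_volume_ml, final_volume_ml):
    """Final concentration in mM after diluting a stock to the final volume."""
    return stock_molar * 1000 * stock_volume_ml / final_volume_ml

FINAL_VOLUME_ML = 100
tris = final_concentration_mM(1.0, 10, FINAL_VOLUME_ML)   # 10 mL of 1 M Tris·Cl
mgcl2 = final_concentration_mM(1.0, 1, FINAL_VOLUME_ML)   # 1 mL of 1 M MgCl2

print(f"2x buffer: {tris:g} mM Tris·Cl, {mgcl2:g} mM MgCl2")
print(f"1x working concentration: {tris / 2:g} mM Tris·Cl, {mgcl2 / 2:g} mM MgCl2")
```

The recipe therefore gives 100 mM Tris·Cl and 10 mM MgCl2 in the 2× buffer, which is halved when the buffer is mixed with an equal volume of sample.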
COMMENTARY
Background Information
Methods for quantification of gene expression have been central for our understanding of
cellular properties and how cells interact with and adapt to their environment. Over the
years, the methodologies have improved, thereby allowing gene expression of many thousands
of genes to be analyzed in one experiment. In the 1990s, microarrays allowed high throughput
identification of candidate transcripts, thereby revolutionizing the approach to study gene
expression (Schena, Shalon, Davis, & Brown, 1995), but recently with the invention of massive
parallel sequencing, RNA-seq has become the method of choice to study gene expression.
Compared with DNA arrays, RNA-seq has an improved dynamic range for quantification of gene
expression levels and allows the identification of novel RNA species (Wang, Gerstein, &
Snyder, 2009). In bacteria, both RNA-seq and more specialized protocols based on sequencing
have revolutionized the understanding of bacterial transcriptomes (Croucher & Thomson, 2010;
Sharma & Vogel, 2014).
The library preparation is essential for a successful RNA-seq experiment and many different
commercial kits are available. However, these kits remain relatively expensive and in some
cases contain secret reagents or compositions, which creates the need for alternative RNA-seq
protocols that are robust and, in addition, time and cost effective. The key point for
construction of a sequencing library, which can be subjected to massive parallel sequencing
on the Illumina platform, is the requirement for addition of adapters to the ends of the DNA
fragments that are sequenced. Over the years, different strategies have been developed for
this. In the initial RNA-seq experiments, adapters were attached to double-stranded cDNAs,
and this resulted in a loss of directionality (Nagalakshmi et al., 2008). Later, several
strategies were developed to preserve strand specificity, including attaching adapter
sequences to RNA molecules prior to reverse transcription (Mamanova et al., 2010) and
incorporation of dUTP instead of dTTP in the second strand cDNA synthesis to allow selective
degradation of the second strand following adapter ligation (Zhang, Theurkauf, Weng, &
Zamore, 2012). Also, the SMART-seq strategy takes advantage of the template switching
mechanism of reverse transcriptase (RT) for adding a 3′ adapter prior to PCR amplification
(Zhu, Machleder, Chenchik, Li, & Siebert, 2001).
Here, we present a protocol that is based on the ligation of an adapter to the 3′ end of
cDNAs produced by reverse transcription from an RT primer carrying an Illumina adapter
sequence overhang (Li & Weeks, 2006). We previously used this strategy for preparing
libraries for RNA probing experiments (Kielpinski et al., 2013; Poulsen, Kielpinski, Salama,
Krogh, & Vinther, 2015), but it also works nicely for RNA-seq that is based on sequencing of
fragmented purified mRNAs. Our strategy resembles the Ligation Mediated RNA sequencing
protocol developed by Thomson and co-workers for eukaryotic RNA-seq (Hou et al., 2015) and
has many of the same advantages, including low cost and time consumption. In addition, our
protocol includes the possibility to recognize PCR duplicates through the use of a barcode
in the adapter ligated to the 3′ end of the cDNA, which is especially valuable for samples
with limited input material.
Critical Parameters and Troubleshooting
The integrity of the input RNA is critical for obtaining high quality RNA-seq data. While
RNA isolation is not covered in this protocol, numerous methods are available in the
literature and finding the most appropriate for a given bacterial species is essential for
performing a successful experiment. Regardless of which method is chosen, the RNA quality
should be measured before RNA-seq. The RNA quality can be measured with an Agilent
Bioanalyzer, which will produce an RNA integrity number (RIN). The RIN is between 1 and 10,
where 10 corresponds to the highest quality with no RNA degradation. Low RIN values may
result in incorrect biological conclusions and we recommend using RNA that has a RIN value
of at least 7. Alternatively,
the RNA can be run on an agarose gel to infer RNA integrity, but this method is less sensitive.
Since prokaryotic RNAs generally are non-polyadenylated, a major issue in library preparation
is depletion of ribosomal RNAs, which are the most abundant RNA species in the cells.
Different approaches have been used to solve this challenge, for instance mRNA
polyadenylation followed by binding to poly(dT) beads (Amara & Vijaya, 1997) or capture of
rRNAs with sequence-specific biotinylated probes. Here, we have used the Ribo-Zero rRNA
Removal Kit (bacteria; Basic Protocol), which resulted in almost complete depletion of rRNA;
however, several other kits are commercially available. An alternative is treatment with
Terminator™ 5′-phosphate-dependent exonuclease (TEX; Alternate Protocol), which selectively
degrades RNAs containing 5′-monophosphates such as rRNAs. The enrichment for mRNAs is not as
efficient, as degradation can be blocked by secondary structures in the substrate RNA. Also,
it is important to remember that treatment with TEX results in enrichment of primary
transcripts, as processed transcripts often contain 5′-monophosphates.
The presence of a band with a length corresponding to around 145 bp indicates a problem in
library preparation. This size is equal to direct ligation of the 3′-ligation adapter to the
random RT primer followed by PCR amplification, but without insert (Fig. 3A). Observing this
band confirms successful 3′-adapter ligation and PCR amplification but also indicates either
too low input amount or a high degree of fragmentation of the input RNA. If fragments of
larger sizes are also observed in the library, then the problem may be solved with a size
selection step as described in Basic Protocol. However, if no other bands are observed, it
will be necessary to redo the library preparation with more input RNA and it might also be
advantageous to test different RNA fragmentation incubation times.
PCR duplicates are sequence reads that arise from amplification of the same template RNA
molecule and these represent a major issue in next-generation sequencing experiments. When
the RNA starting material is limited, PCR amplification is necessary to obtain enough
material for sequencing; however, increasing the number of cycles in the PCR also increases
the risk of PCR duplicates. In general, it is recommended to perform a small-scale PCR with
different numbers of cycles to choose the lowest number of cycles that gives a product
visible either on a gel or on a Bioanalyzer DNA chip. To accommodate potential problems with
PCR duplicates, our protocol introduces a random barcode in the 3′ cDNA ligation step prior
to PCR amplification. Observing multiple reads with identical barcodes on the same DNA
fragment indicates that these are PCR duplicates, which should be collapsed to one read.
Anticipated Results
RNA-seq results in raw sequencing reads that must be processed before biological
interpretation. The analysis of an RNA-seq experiment has many variations and the exact
pipeline depends on the aims of a given experiment. The main focus of this unit is
preparation of high quality RNA-seq libraries from bacteria; however, we will briefly
describe the framework for analyzing the obtained data and the expected results. We provide
a dataset that has been generated with the protocol and command line scripts for
preprocessing, mapping, and barcode collapsing (see Internet Resources).
The first step in any data analysis pipeline is quality control, and this can be done with
FastQC (Andrews, 2010) or similar programs. Examining plots from the quality control is
important for detecting and subsequently dealing with potential problems in the libraries.
Following satisfactory quality control, the adapter sequence should be removed from the
reads. One of the tools developed for this purpose is Cutadapt (Martin, 2011). Here, default
settings can be used, with the quality cutoff (-q option) set to 20. For sequences obtained
from Illumina's two-dye systems (MiniSeq, NextSeq, and NovaSeq), use the --nextseq-trim=20
option to remove 3′-terminal Gs stemming from dark cycles on short fragments. If the reads
are paired-end reads instead of single-end, then Cutadapt should be run once for each FASTQ
file with the respective adapter sequence. Following adapter removal, the preprocessing tool
developed in our group (Kielpinski et al., 2015) can be used to remove the barcode
introduced at the 3′ end of the cDNA during library preparation. The preprocessed reads can
now be aligned to the investigated RNAs or genomes and this can be done with a number of
alignment programs. In the provided example bowtie2 is used (Langmead & Salzberg, 2012).
Following mapping, PCR duplicates can be removed with the provided collapse script, which
will output a .sam file. The code in this script removes all but one of the reads that
mapped to the same position, if they also contain identical barcodes.
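The adapter trimming step described in Anticipated Results can be sketched as a small helper that assembles a Cutadapt command line. Only the options named in the text (-q and --nextseq-trim) plus Cutadapt's -a (3′ adapter) and -o (output file) options are used. The adapter sequence, file names, and function name are placeholders chosen for illustration and should be replaced with the values appropriate for the library. Using --nextseq-trim in place of -q, rather than alongside it, is a choice made in this sketch.

```python
import shlex

def cutadapt_command(adapter, fastq_in, fastq_out, quality=20, two_dye=False):
    """Build a Cutadapt argument list for single-end adapter trimming.

    two_dye=True swaps the plain quality cutoff for --nextseq-trim, as
    suggested in the text for MiniSeq, NextSeq, and NovaSeq data.
    """
    args = ["cutadapt"]
    if two_dye:
        args.append(f"--nextseq-trim={quality}")
    else:
        args += ["-q", str(quality)]
    args += ["-a", adapter, "-o", fastq_out, fastq_in]
    return args

# Placeholder adapter and file names (assumptions, not values from the protocol).
cmd = cutadapt_command("AGATCGGAAGAGC", "sample.fastq.gz", "sample.trimmed.fastq.gz",
                       two_dye=True)
print(shlex.join(cmd))
# To run it: subprocess.run(cmd, check=True), with Cutadapt installed.
```

Building the command as an argument list keeps the sketch free of shell quoting issues and makes it easy to loop over several FASTQ files.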
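The collapsing rule for PCR duplicates (keep one read per mapping position and barcode) can be illustrated with a minimal sketch. This is not the provided collapse script. The representation of reads as tuples of (name, reference, strand, position, barcode), the inclusion of the strand in the key, and the function name are assumptions made for the example.

```python
from collections import OrderedDict

def collapse_duplicates(reads):
    """Keep the first read for each (reference, strand, position, barcode) key.

    Reads sharing a mapping position but carrying different barcodes are
    treated as independent molecules and are all kept.
    """
    unique = OrderedDict()
    for read in reads:
        name, reference, strand, position, barcode = read
        key = (reference, strand, position, barcode)
        unique.setdefault(key, read)
    return list(unique.values())

# Hypothetical mapped reads: (name, reference, strand, position, barcode).
reads = [
    ("r1", "chr", "+", 1500, "ACGTACG"),
    ("r2", "chr", "+", 1500, "ACGTACG"),  # same position and barcode as r1
    ("r3", "chr", "+", 1500, "TTGCAGT"),  # same position, different barcode
    ("r4", "chr", "-", 1500, "ACGTACG"),  # opposite strand
]
kept = collapse_duplicates(reads)
print([read[0] for read in kept])
print(f"{len(reads) - len(kept)} of {len(reads)} reads removed as PCR duplicates")
```

With the 7-nt random barcode shown in Table 1 (NNNNNNN) there are 4^7 = 16,384 possible barcodes, which is why identical barcodes at the same position are taken as evidence of PCR duplication rather than of independent fragments.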
Table 2 Mapping Statistics for the Example Data Provided
The provided example contains libraries generated from: (1) total RNA with no rRNA
depletion, (2) RNA depleted of rRNA using the Ribo-Zero rRNA Removal Kit (Basic Protocol),
and (3) Terminator™ 5′-phosphate-dependent exonuclease (TEX)-treated RNA (Alternate
Protocol). Following adapter removal and preprocessing, the reads have been aligned to the
reference genome for the strain used in this experiment (Bacillus subtilis str. 168
(ASM904v1) from Ensembl). The percentage of reads aligning to the genome ranges from 93.1%
to 98.7% (Table 2). Aligning only to rRNA sequences shows that 90.5% of all reads from the
total RNA sample map to rRNAs, and as expected, rRNA depletion with TEX lowers this number
to 64.2%. In contrast, rRNA depletion using the Ribo-Zero rRNA Removal Kit as described in
Basic Protocol results in only 0.1% mapping to the ribosomal RNAs. Using the provided
collapse script to remove PCR duplicates, we can see that including a barcode may be
beneficial to avoid PCR duplicates (Table 2).
Downstream analysis will depend on the specific scientific questions that are being
investigated. In a standard RNA-seq setup, the reads mapping to the different genes are
counted with a tool such as HTSeq. Finally, gene counts from replicate experiments are
analyzed with specialized tools such as edgeR or DESeq2, which are based on the negative
binomial distribution, to produce lists of differentially expressed genes with associated
fold changes and significance estimates.
Time Considerations
Preparation of RNA-seq libraries from isolated RNA can be completed in 4 to 5 days. The use
of a SpeedVac instead of ethanol precipitation for concentration of RNA or DNA molecules
significantly reduces the time needed to generate libraries. The time needed to perform data
analysis can vary from days to weeks depending on which analyses are performed.
Acknowledgements
The research was funded by Innovation Fund Denmark.
Literature Cited
Amara, R. R., & Vijaya, S. (1997). Specific polyadenylation and purification of total
messenger RNA from Escherichia coli. Nucleic Acids Research, 25(17), 3465–3470. doi:
10.1093/nar/25.17.3465.
Andrews, S. (2010). FastQC. A quality control tool for high throughput sequence data.
Retrieved from https://www.bioinformatics.babraham.ac.uk/projects/fastqc/
Croucher, N. J., & Thomson, N. R. (2010). Studying bacterial transcriptomes using RNA-seq.
Current Opinion in Microbiology, 13(5), 619–624. doi: 10.1016/j.mib.2010.09.009.
Hou, Z., Jiang, P., Swanson, S. A., Elwell, A. L., Nguyen, B. K. S., Bolin, J. M., . . .
Thomson, J. A. (2015). A cost-effective RNA sequencing protocol for large-scale gene
expression studies. Scientific Reports, 5, 9570. doi: 10.1038/srep09570.
Kielpinski, L. J., Boyd, M., Sandelin, A., & Vinther, J. (2013). Detection of reverse
transcriptase termination sites using cDNA ligation and massive parallel sequencing. In N.
Shomron (Ed.), Deep sequencing data analysis. Methods in molecular biology Vol. 1038
(pp. 213–231). Totowa, NJ: Humana Press. doi: 10.1007/978-1-62703-514-9_13.
Kielpinski, L. J., Sidiropoulos, N., & Vinther, J. (2015). Reproducible analysis of
sequencing-based RNA structure probing data with user-friendly tools. Methods in Enzymology,
558, 153–180. doi: 10.1016/bs.mie.2015.01.014.
Langmead, B., & Salzberg, S. L. (2012). Fast gapped-read alignment with Bowtie 2. Nature
Methods, 9(4), 357–359. doi: 10.1038/nmeth.1923.
Li, T. W., & Weeks, K. M. (2006). Structure-independent and quantitative ligation of
single-stranded DNA. Analytical Biochemistry, 349(2), 242–246. doi:
10.1016/j.ab.2005.11.002.
Mamanova, L., Andrews, R. M., James, K. D., Sheridan, E. M., Ellis, P. D., Langford, C. F.,
. . . Turner, D. J. (2010). FRT-seq: Amplification-free, strand-specific transcriptome
sequencing. Nature Methods, 7(2), 130–132. doi: 10.1038/nmeth.1417.
Martin, M. (2011). Cutadapt removes adapter sequences from high-throughput sequencing reads.
EMBnet.journal, 17(1), 10. doi: 10.14806/ej.17.1.200.
Nagalakshmi, U., Wang, Z., Waern, K., Shou, C., Raha, D., Gerstein, M., & Snyder, M. (2008).
The transcriptional landscape of the yeast genome defined by RNA sequencing. Science,
320(5881), 1344–1349. doi: 10.1126/science.1158441.
Poulsen, L. D., Kielpinski, L. J., Salama, S. R., Krogh, A., & Vinther, J. (2015). SHAPE
Selection (SHAPES) enrich for RNA structure signal in SHAPE sequencing-based probing data.
RNA, 21(5), 1042–1052. doi: 10.1261/rna.047068.114.
Schena, M., Shalon, D., Davis, R. W., & Brown, P. O. (1995). Quantitative monitoring of gene
expression patterns with a complementary DNA microarray. Science, 270(5235), 467–470. doi:
10.1126/science.270.5235.467.
Sharma, C. M., & Vogel, J. (2014). Differential RNA-seq: The approach behind and the
biological insight gained. Current Opinion in Microbiology, 19, 97–105. doi:
10.1016/j.mib.2014.06.010.
Wang, Z., Gerstein, M., & Snyder, M. (2009). RNA-Seq: A revolutionary tool for
transcriptomics. Nature Reviews Genetics, 10(1), 57–63. doi: 10.1038/nrg2484.
Zhang, Z., Theurkauf, W. E., Weng, Z., & Zamore, P. D. (2012). Strand-specific libraries for
high throughput RNA sequencing (RNA-Seq) prepared without poly(A) selection. Silence, 3(1),
9. doi: 10.1186/1758-907X-3-9.
Zhu, Y. Y., Machleder, E. M., Chenchik, A., Li, R., & Siebert, P. D. (2001). Reverse
transcriptase template switching: A SMART approach for full-length cDNA library
construction. BioTechniques, 30(4), 892–897.
Internet Resources
https://people.binf.ku.dk/jvinther/data/RNA-seq
The RNA-seq dataset and an example of the data analysis workflow are available for download.