Mgen 000087
Enrichment of DNA by hybridisation is an important tool which enables users to gather target-focused next-generation
sequence data in an economical fashion. Current in-solution methods capture short fragments of around 200–300 nt,
potentially missing key structural information such as recombination or translocations often found in viral or bacterial pathogens. The
increasing use of long-read third-generation sequencers requires methods and protocols to be adapted for their specific
requirements. Here, we present a variation of the traditional bait–capture approach which can selectively enrich large
fragments of DNA or cDNA from specific bacterial and viral pathogens, for sequencing on long-read sequencers. We enriched
cDNA from cultured influenza virus A, human cytomegalovirus (HCMV) and genomic DNA from two strains of Mycobacterium
tuberculosis (M. tb) from a background of cell line or spiked human DNA. We sequenced the enriched samples on the Oxford
Nanopore MinION and the Illumina MiSeq platform and present an evaluation of the method, together with analysis of the
sequence data. We found that unenriched influenza A and HCMV samples had no reads matching the target organism due to
the high background of DNA from the cell line used to culture the pathogen. In contrast, enriched samples sequenced on the
MinION platform had 57 % and 99 % best-quality on-target reads respectively.
2 Microbial Genomics
Enrichment of long DNA fragments for Nanopore sequencing
[Fig. 1 workflow diagram. Step labels recovered from the figure: Hybridisation; Nanopore adapter ligation; Long-range PCR; End repair, dA-tailing; Nanopore leader/hairpin ligation; MinION sequencing; BLASR, LAST alignment; PCR1; PCR2; MiSeq sequencing; Bowtie alignment; Picard analysis.]
Fig. 1. Workflow for hybridisation and sequencing of long-fragment-enriched pathogen DNA. (a) Enrichment and library preparation for the
Oxford Nanopore MinION sequencer. (b) non-enriched Nanopore libraries, (c) library preparation for long-fragment-enriched Illumina control
experiments.
Technologies), and Agilent Tape Station (Genomic DNA ScreenTape #5067–5365 and Genomic DNA Reagents #5067–5366; High Sensitivity RNA ScreenTape #5067–5579, High Sensitivity RNA ScreenTape Sample Buffer #5067–5580, High Sensitivity RNA ScreenTape Ladder #5067–5581; High Sensitivity D1000 ScreenTape #5067–5584, High Sensitivity D1000 Reagents #5067–5585, Agilent) according to the manufacturers’ instructions throughout the experiment.

Biotinylated custom RNA baits for the target organisms influenza virus A (49190 baits), HCMV (33809 baits) and M. tb (224612 baits) were designed with an in-house Perl script (Depledge et al., 2011), using a database of 4968 H1N1 and 2966 H3N2 influenza virus A genomes, 115 partial and complete HCMV genomes and the M. tb strain H37Rv reference genome (NC_018143.2), respectively, and manufactured by Agilent. Sheared genomic DNA (HCMV,
Table 1. Average size of DNA fragments at various stages during the protocol
The table shows the modal fragment size post-shear and post-PCR, and mean Nanopore read length (‘pass’ and ‘fail’ read quality) with standard deviations (SD), of the samples used in this study.

Sample | Post-shear modal size (nt) | Post-PCR modal size (nt) | ‘Pass’ reads, mean length [nt (SD)] | ‘Fail’ reads, mean length [nt (SD)]
Influenza virus A non-enriched* | 160, 320, 500, 670, 900, 1250, 3000+* | 99–4000 | 1598 (1191) | 805 (946)
Influenza virus A enriched* | 160, 320, 500, 670, 900, 1250, 3000+* | 370–4000 | 773 (683) | 533 (733)
HCMV non-enriched | 12 500† | –‡ | 3176 (2291) | 487 (1203)
HCMV enriched | 12 500 | 1587, 5640 | 1528 (975) | 1083 (1099)
M. tb H37Rv | 13 800 | 2000–7000 | 2402 (1865) | 757 (1855)
M. tb strain C | 15 000 | 1500 | 759 (355) | 596 (713)
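The read-length columns of Table 1 are means with standard deviations. As an illustration only, the following Python sketch shows one way such a summary could be derived from FASTA-formatted reads (the format produced by the Poretools conversion described in the Methods). The function names `read_lengths` and `length_summary` are hypothetical, and the use of the sample (n − 1) standard deviation is an assumption; the study does not state which estimator was used.

```python
from statistics import mean, stdev


def read_lengths(fasta_text):
    """Return the length of every sequence in FASTA-formatted text."""
    lengths = []
    current = None  # length of the record being read; None before the first header
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def length_summary(lengths):
    """Mean and sample standard deviation (assumed estimator), rounded to whole nt."""
    sd = stdev(lengths) if len(lengths) > 1 else 0.0
    return round(mean(lengths)), round(sd)


example = ">read1\nACGT\nAC\n>read2\nACGTACGTAC\n"
summary = length_summary(read_lengths(example))
```

Applied separately to the ‘pass’ and ‘fail’ FASTA files of a run, this would give one pair of values per cell of the two right-hand columns.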
http://mgen.microbiologyresearch.org 3
S. E. Eckert and others
M. tb) and cDNA (influenza virus A) samples were hybridised and captured according to the SureSelectXT Target Enrichment for Illumina Paired-End Multiplexed Sequencing protocol (Version B.1, 2014, 16 h hybridisation). Following capture, samples were heated to 95 °C for 3 min, and cooled to 35 °C (ramp: 4 °C min⁻¹) to release the target fragments from the baits bound to streptavidin beads.

Half of each hybridised sample was used for Nanopore library preparation with ONT kit versions SQK-MAP003 for M. tb strain H37Rv and SQK-MAP004 for HCMV, influenza virus A, and M. tb strain C. The remainder was used for the generation of Illumina-compatible libraries.

Nanopore library preparation, sequencing and analysis. End repair and dA-tailing of all samples were performed with enzymes from the SureSelectXT kit (#5500–0075, Agilent) and adaptors and primers from the ONT SQK-MAP003 or SQK-MAP004 library preparation kits as specified by the manufacturers. Following AMPure XP bead purification (#A63880, Beckman Coulter), the dA-tailed samples were ligated to adaptors (ONT) for 15 min at room temperature. They were cleaned up with AMPure XP beads and eluted in H2O. The ligated DNA was amplified using Long Amp Taq 2x Master mix (#M0287, NEB) and ONT PCR primers with the following program: 95 °C 3 min; 15–18 cycles of 95 °C 15 sec, 62 °C 15 sec, 65 °C 10 min; 65 °C 20 min; 4 °C hold.

A second round of end repair and dA-tailing was performed on 500 ng of enriched, amplified PCR product using SureSelectXT reagents as described above, but without purification after dA-tailing. Instead, leader/hairpin ligation and sample clean-up were performed according to the ONT protocols for kit SQK-MAP003 (used in the M. tb strain H37Rv experiments only) or SQK-MAP004. In detail, dA-tailed sample, blunt/TA ligase master mix (#M0367, NEB), tethered adapter mix and hairpin adapters (ONT) were incubated for 10 min at room temperature in protein LoBind tubes (#0030108116, Eppendorf) for ligation. Libraries processed according to the ONT SQK-MAP003 protocol were cleaned up with AMPure XP beads; those made according to the SQK-MAP004 method were purified using Dynabeads for His-Tag isolation and pulldown (#10103D, Life Technologies) (Fig. 1a). Libraries were eluted from the beads by incubation for 10 min at room temperature in elution buffer (ONT). Library concentrations were typically 2–10 ng ml⁻¹, as assessed by Qubit fluorometer.

The influenza virus A control sample that did not undergo hybridisation (75 ng, the equivalent of 2.7×10¹¹ TCID50) was end-repaired, dA-tailed and amplified with Long Amp Taq polymerase as described above. Samples (500 ng) of this PCR product were processed as recommended in the ONT Genomic DNA sequencing protocol SQK-MAP004.

For the non-hybridised HCMV sample, 500 ng (4.2×10⁷ copies) were used directly for Nanopore library preparation (SQK-MAP004) without amplification as enough material was available to proceed directly to sequencing (Fig. 1b).

Before each MinION run, flowcells were quality-tested with the script MAP_Platform_QC (MinKnow software version 0.46.2.8 to 0.49.2.9), then loaded with 12–60 ng of prepared library, library fuel mix and EP buffer (ONT) as per the manufacturer’s instructions, and run with script MAP_48Hr_Sequencing_Run, for an average of 26 h.

Reads were analysed by the Metrichor 2D basecalling (versions 2.19 to 2.29) cloud-based platform, and the resulting fast5 files (‘pass’ quality, both strands read while passing through the nanopore, resulting in higher confidence; and ‘fail’, where only one strand is read) converted to fasta format with Poretools (Loman & Quinlan, 2014). BLASR (Chaisson & Tesler, 2012) and LAST (Kiełbasa et al., 2011) were used to align reads to the pathogen reference sequences (HCMV herpesvirus HHV-5 GU179001.1, M. tb strain H37Rv NC_018143.2, and influenza virus A strain H1N1, A/Puerto Rico/8/1934). Command lines were: ‘./blasr input.fa reference.fa -sam -out output.sam’, ‘samtools view -bS output.sam > output.bam’ for BLASR and ‘lastdb index_input input.fasta’, ‘lastal index_input reference.fa -r1 -a1 -b1 > output.maf’, ‘maf-convert -n sam < output.maf > output.sam’ for LAST, respectively. Files were further tested with both aligners against background human (Human_g1k_v37, www.1000genomes.org) or dog (Ensembl CanFam3.1 GCA_000002285.2; NC_006583.3) sequences and the ONT adapters used for PCR.

Illumina library preparation from long, hybridisation-enriched fragments. For the generation of Illumina libraries, half of each hybridised sample (M. tb strain H37Rv, M. tb strain C, HCMV and influenza virus A) was sheared with a Covaris AFA instrument (Covaris) to 200 nt fragment size and converted into Illumina-compatible libraries (Fig. 1c) using Agilent reagents and SureSelectXT protocol steps as before. Briefly, samples were end-repaired, dA-tailed, had adapters ligated and were PCR-amplified (six cycles) as described in the protocol. Following sample purification, the PCR products were re-amplified using post-capture indexed PCR2 primers for a further 15 cycles. Sequencing (2×300 nt read length) was performed on an Illumina MiSeq instrument with paired-end 600V3 kits (#MS-102-3003) with automatic adapter trimming. Results from the Illumina MiSeq runs were aligned to the respective references with Bowtie version 1.1.1 (http://bowtie-bio.sourceforge.net/index.shtml). Additional alignment metrics from the bam files were obtained using the Picard CollectMultipleMetrics (http://broadinstitute.github.io/picard/) tool, which generates metrics such as percentage of reads aligned to a given target as well as coverage data.

Results

Comparison of Nanopore library size and read length

Table 1 shows the peak sizes of the DNA samples after shearing, as determined on an Agilent Tape Station. The size distribution of the influenza virus A RNA and cDNA
prior to processing showed distinct peaks at 160 nt, 320 nt, 500 nt, 670 nt, 900 nt, 1.2 kb and 3 kb (Fig. S1a, b, available in the online Supplementary Material), with fragments up to 15 kb. These were presumably short fragments of the eight influenza virus A segments NC_002016 to NC_002023, and residual dog cell line DNA. The sizes of fragments pre- and post-reverse transcription were broadly similar (Fig. S1). Due to the shortness of the fragments, influenza virus A samples were not sheared.

The HCMV sample (g-TUBE-sheared and PreCR-treated) had a tight range of fragment sizes of around 12.8 kb. After PCR amplification, a broad range of fragment sizes both within and between individual reactions was observed. In general, the products were about half the size of the original DNA before hybridisation, ranging between 1.6 kb and 5.6 kb. One exception was strain M. tb C, which had shorter (median size 1.5 kb) PCR products.

The Nanopore reads (Table 1) were similarly variable in length, reflecting the input material, as indicated by the standard deviations in Table 1. Sequenced reads were shorter on average than the PCR products, but with a wide range. Reads classified as ‘pass’ quality by the Metrichor platform were longer than ‘fail’ quality reads. Non-hybridised samples had longer read lengths than enriched samples, either due to DNA damage during the hybridisation and wash processes, or preferential amplification of shorter fragments during PCR.

Comparison of BLASR and LAST aligners

We used BLASR (Chaisson & Tesler, 2012) and LAST (Kiełbasa et al., 2011), with the settings used in Quick et al. (2014) for the alignment of Nanopore reads to their respective references (pathogen and human/dog cell line). Table 2 shows statistics for the similarities to the target references obtained with the two aligners. We found that BLASR alignment of reads showed slightly higher identity to the references, shorter aligned regions and lower standard deviation. The LAST aligner produced longer alignments with lower identity and higher standard deviation. This is similar to the observations of Kilianski et al. (2015). A percentage of reads (10–35 %) aligned to the reference by LAST are not aligned by BLASR, and vice versa, indicating that neither aligner works optimally for aligning Nanopore reads to the reference.

Comparison of enriched and non-enriched Nanopore libraries

A total of 13 nanopore sequencing runs were included in our datasets. The average starting pore count per flowcell was 215. Most ‘pass’ quality reads aligned to either the target organism or the respective cell line, whereas most ‘fail’ quality reads did not match to target, cell line (Table 3) or sequences in the PubMed Nucleotide database (November 2015). This has been reported elsewhere (e.g. Greninger et al., 2015; Kilianski et al., 2015). Regions of alignment were shorter than read length, possibly due to regional increase of the error rates within reads.

Analysis of the 42 261 reads obtained from one non-enriched, PCR-amplified influenza virus A cDNA library run on the Nanopore MinION found 98.9 % ‘pass’ and 25.1 % ‘fail’ reads aligned to the MDCK dog cell line used for cultivation of the virus, whilst only one read aligned to the influenza virus A reference H1N1. After hybridisation and amplification, 57.2 % of ‘pass’ and 9.5 % of ‘fail’ reads (34 211 reads in total) from one Nanopore run could be aligned to influenza virus A. This amounts to an average read depth of the influenza virus A genome of 62.9×. Fig. 2 shows uneven distribution of reads per fragment, with distinct peaks of increased coverage. This probably reflects the size distribution of the input RNA (Fig. S1a) rather than effects of reverse transcription, hybridisation or PCR bias. The frequency of cell line reads in influenza virus A-enriched samples dropped to 28.4 % (‘pass’) and 2.9 % (‘fail’) (Table 3).

The unenriched HCMV library (a total of 432 reads from one flowcell) produced four reads (0.2 % of total) matching the HCMV reference HHV-5, while 47 reads (10.9 % of total) matched the human_g1k_v37 reference. After enrichment of
Table 2. Mean similarity and length (with standard deviations, SD) of Nanopore reads aligned to the pathogen targets using BLASR
and LAST
Mean similarity of reads to Mean length of Mean similarity of reads to Mean length of
target [% (SD)] alignment [nt (SD)] target [% (SD)] alignment [nt (SD)]
Table 3. Percentages of Nanopore reads aligned to target pathogen and cell line/human DNA in the samples prepared for this
study
Alignment statistics are the combined results from BLASR and LAST.
Sample Reads aligned to target pathogen Reads aligned to cell line/human DNA
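Table 3 reports alignment statistics combined from BLASR and LAST. The study does not spell out how the two result sets were merged, so the following Python sketch is only one plausible reading: a read counts as aligned if either aligner placed it (a set union, which is an assumption), and the reads found by only one aligner are reported separately. The function name `combine_aligners` and the read identifiers are hypothetical.

```python
def combine_aligners(blasr_ids, last_ids, total_reads):
    """Merge read IDs aligned by BLASR and LAST (union is an assumed rule)."""
    blasr, last = set(blasr_ids), set(last_ids)
    either = blasr | last      # aligned by at least one aligner
    one_only = blasr ^ last    # aligned by exactly one aligner
    return {
        "aligned_by_either": len(either),
        "percent_aligned": 100.0 * len(either) / total_reads if total_reads else 0.0,
        "percent_one_aligner_only": 100.0 * len(one_only) / len(either) if either else 0.0,
    }


# Toy input: four of ten reads align, two of them with one aligner only.
stats = combine_aligners(["r1", "r2", "r3"], ["r2", "r3", "r4"], total_reads=10)
```

Read identifiers for each aligner could be taken from the first column of the SAM files produced by the command lines given in the Methods.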
the DNA with the HCMV-specific bait set, we obtained 37 589 reads from three runs, with almost all (98.7 %) ‘pass’ reads and 35 % of ‘fail’ reads aligning to the HCMV reference (Table 3). This amounts to an average read depth of 87.6× of the HCMV genome. Panels (a) in Fig. 3 show the coverage of all Nanopore reads aligned to the reference.

Comparing the consensus sequence generated from the enriched HCMV reads with the HCMV HHV-5 reference, using the genomic similarity search tool YASS (Noé & Kucherov, 2005), revealed that the consensus had 99.4 % similarity to the reference (233 854 of 235 230 nucleotide residues). The conflicting/mismatch residues are mostly gaps in the Nanopore consensus sequence at positions 46 364–46 433 (proteins UL34 and UL35), 147 820–147 830 (helicase–primase subunit UL102), 194 363–194 698 and 195 851–195 977. The last two regions of difference coincide with inverted repeat regions (194 344–195 667, 195 090–197 626) (Masse et al., 1992). A number of mismatches to the reference HHV-5 were identified upstream of base 1270; these were due to low coverage of this region by Nanopore reads. We found that regions with low (<5×) coverage had a high number of mismatches compared with the reference, but areas of greater coverage matched near-perfectly.

For M. tb strain H37Rv, we obtained a total of 2028 ‘pass’ and 9961 ‘fail’ reads (equivalent to 0.077× coverage) from four flowcells; for strain M. tb C, 202 ‘pass’ and 46 711 ‘fail’ reads (0.182×) were obtained from three flowcells. Localized areas with high coverage were found in both strains; these were found to correspond to open-reading frames encoding transposases LH57_07500, LH57_18955, LH57_18175, and LH57_04320, at positions 887 429–887 488, 889 044–890 363, 1 538 580–1 539 822, 2 635 594–2 640 242, 3 544 391–3 547 252 and 3 788 312–3 789 669 of strain H37Rv (Fig. 4). These regions have been highly enriched compared with the background of human DNA, and also compared with the rest of the M. tb genome. The sample originally containing 10 % H37Rv DNA showed the highest rate of reads aligning to the reference, while both 90 % M. tb DNA samples (H37Rv and strain C) show enrichment mainly in the transposase regions.

Sequencing of enriched long fragments on the Illumina MiSeq

To assess the success of the long fragment hybridisation, Illumina libraries were generated from the remaining half of the hybridised material, and sequenced on a MiSeq instrument (results shown in Table 4). A high percentage of influenza virus A and HCMV reads from long enriched fragments aligned to the target reference in both Illumina and Nanopore ‘pass’ reads.

Illumina-generated reads showed higher percentages of alignment than Nanopore reads, presumably due to the lower error rates. Illumina libraries generated from the hybridisation of long fragments, particularly the independently generated, 10 % M. tb H37Rv libraries 1–4 in Table 4, show successful enrichment of mycobacterial DNA, with 56–96 % of reads aligning to the H37Rv genome. Results for M. tb strain C show a relatively low rate of alignment of reads to the H37Rv genome in both Nanopore and Illumina experiments. This could be due to less successful enrichment, and an imperfect match of the M. tb strain C reads aligned to strain H37Rv, which has 98.9 % identity to
Sample | Fragment hybridisation | Number of reads aligned to target pathogen | Percentage of reads aligned to target pathogen | Mean depth of pathogen coverage | Percentage of target bases covered at 10×
a consensus sequence generated from our Illumina-sequenced M. tb strain C.

Results from the enriched influenza virus A (Fig. S1c) show concordance with the coverage by Nanopore results (Fig. 2). The unevenness of the coverage is presumably a result of the prevalence of short fragments in the original RNA sample (Fig. S1a), reverse-transcribed to cDNA (Fig. S1b). Illumina reads (Fig. 3b) generally show less even coverage of the HCMV genome compared with Nanopore reads (Fig. 3a). Fifteen (out of a set of 23 525) aligned Nanopore reads span the repetitive replication origin oriLyt at position 94 488–94 588 (Chen et al., 1999) (Fig. S2c, d). The complete (3.5 M aligned reads) Illumina dataset (Table 4) has a 100 bp gap in the alignment at this repetitive position (Fig. S2a, b). Two Nanopore reads cover the inverted repeat region 194 293–195 565, while no Illumina reads aligned in this gap, and almost all Illumina reads in the adjacent 2.5 kb region show a mapping quality equal to zero, when visualised in the IGV (Fig. S2g, h). A similar outcome has been observed for a comparison of Nanopore and 454 reads for human herpesvirus type 1 (Karamitros et al., 2016). Areas with increased coverage can also be observed in Nanopore- and Illumina-generated datasets (Fig. 4) in M. tb. Here, this is presumably due to the redundancy of transposase-encoding sequences, which could result in localised increased aligning of reads.

Discussion

This study explores the capture of specific, long DNA fragments for sequencing on a long-read platform, the Oxford Nanopore MinION instrument. We demonstrate that our method can be used to enrich large, specific regions of interest in mixed samples. Previous work by Greninger et al. (2015) has shown that detection of moderate to high titres of pathogen DNA (chikungunya virus, Ebola and hepatitis C virus) from human blood samples is possible using Nanopore sequencing. However, this direct sequencing approach is inefficient if the region of interest is a small subset of the total DNA, the target is of low titre, or if high coverage is required for strain typing and variant identification. In our Nanopore sequencing experiments with un-enriched influenza virus A and HCMV DNA (from cell cultures), we detected very low numbers of reads from the pathogen compared with those from the host cell line. In contrast, sequencing data from enriched DNA produced good coverage of the influenza virus A and HCMV genomes and partial coverage of the M. tb genome. Control experiments using Illumina sequencing to assess the quality of enrichment (Table 4, Fig. 4) showed good overall and minimum coverage, similar to the sample enriched by short-fragment hybridisation (Brown et al., 2015; Christiansen et al., 2014, sample 9 and M. tb strain C sample 2 in Table 4), indicating that the enrichment of long fragments does not introduce bias. Preferential enrichment of certain regions (Fig. 4) seems to be due to redundancy of the captured sequence, in this case the transposases.

The drawbacks of our method, compared with the high-throughput protocol used by Brown et al. (2015), and Christiansen et al. (2014), were lower target coverage and throughput. Enrichment and library preparation take approximately 28 h and include a 16 h hybridisation step and 3–4 h of long-range PCR. In the future, the enrichment step could be shortened to 4 h by using a different hybridisation protocol, and PCR amplification could be replaced with whole-genome amplification. Addition of molecular barcodes would allow pooling of several samples to be run
[Fig. 2 panel labels (influenza virus A segment, encoded proteins): NC_002023.1, PB2; NC_002021.1, PB1, PB1-F2; NC_002022.1, PA; NC_002017.1, HA; NC_002019.1, NP; NC_002018.1, NA; NC_002016.1, M1, M2.]
Fig. 2. Coverage profile of Nanopore reads from enriched influenza virus A cDNA, aligned to reference H1N1 with BLASR; coverage visualized in the Integrative Genomics Viewer (IGV; Robinson et al., 2011; Thorvaldsdóttir et al., 2013). Maximum read depths for the fragments according to IGV are: 139 (NC_002023.1), 139 (NC_002021.1), 225 (NC_002022.1), 51 (NC_002017.1), 219 (NC_002019.1), 1589 (NC_002018.1), 185 (NC_002016.1), 16 (NC_002020.1).
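The mean and maximum read depths quoted for each segment summarise a per-base coverage profile. The depths in this study were read from IGV; the Python sketch below is not that procedure, only a minimal illustration of how such values can be computed from alignment coordinates. It assumes 0-based, half-open (start, end) intervals on the reference, and the function names are hypothetical.

```python
def depth_profile(intervals, ref_length):
    """Per-base depth from 0-based, half-open (start, end) alignment intervals."""
    if ref_length <= 0:
        raise ValueError("ref_length must be positive")
    # Difference array: +1 where an alignment starts, -1 just past its end.
    diff = [0] * (ref_length + 1)
    for start, end in intervals:
        start, end = max(0, start), min(ref_length, end)
        if start < end:
            diff[start] += 1
            diff[end] -= 1
    depth, running = [], 0
    for i in range(ref_length):
        running += diff[i]
        depth.append(running)
    return depth


def mean_and_max_depth(intervals, ref_length):
    """Mean depth (aligned bases / reference length) and maximum depth."""
    depth = depth_profile(intervals, ref_length)
    return sum(depth) / ref_length, max(depth)


# Three toy alignments on a 10 nt reference.
profile = depth_profile([(0, 5), (3, 8), (4, 10)], 10)
mean_depth, max_depth = mean_and_max_depth([(0, 5), (3, 8), (4, 10)], 10)
```

The mean depth computed this way equals the total number of aligned bases divided by the reference length, which is the usual meaning of an average depth figure such as those reported in the Results.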
Fig. 3. Nanopore reads of HCMV (a), aligned with BLASR to strain HHV-5, coverage visualized with IGV. Maximum read depth 239.
Panel (b) shows an Illumina run generated from a long-fragment enrichment, downsampled to similar coverage of 200–300.
Fig. 4. Coverage by Nanopore reads from DNA of M. tb H37Rv (a), M. tb strain C (b) and Illumina-sequenced long-fragment-enriched
M. tb H37Rv (c) and M. tb strain C (d), shows a region (886 000–893 000) of high coverage in both Nanopore and Illumina experiments.
Nanopore reads were aligned with BLASR, Illumina reads with Bowtie, visualised in IGV. Maximum read depth, as determined by IGV, is 17
(a), 22 (b), 331 (c) and 277 (d).
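Localised regions of high coverage, such as the 886 000–893 000 window in Fig. 4, can be located programmatically once a per-base depth profile is available. The Python sketch below is an illustration under stated assumptions rather than the analysis used here: the function name `high_coverage_regions` is hypothetical, the threshold is chosen by the user, and coordinates are 0-based and half-open.

```python
def high_coverage_regions(depth, threshold):
    """Return 0-based, half-open (start, end) runs where depth >= threshold."""
    regions = []
    start = None  # start of the run currently being extended, if any
    for pos, d in enumerate(depth):
        if d >= threshold and start is None:
            start = pos
        elif d < threshold and start is not None:
            regions.append((start, pos))
            start = None
    if start is not None:
        # The profile ended while still inside a high-coverage run.
        regions.append((start, len(depth)))
    return regions


# Toy profile with two runs at or above a depth of 5.
regions = high_coverage_regions([0, 1, 5, 6, 2, 7, 7, 7], threshold=5)
```

Intersecting the returned intervals with gene annotations would then indicate which open reading frames, for example the transposases discussed in the Results, account for the peaks.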
simultaneously on one MinION flowcell. This, coupled with increasing speed, accuracy and throughput of MinION reads (e.g. results in Norris et al., 2016), will reduce the time and number of reads necessary for strain and variant identification, making this method amenable for diagnostic purposes. The relatively inexpensive and small-footprint MinION sequencers have been used in settings where conventional Illumina sequencing would be difficult (Quick et al., 2016).

We see the main application of our method of enriching long fragments in the detection of structural variants and in generating comprehensive coverage of specific target regions by long-read sequencing. Nanopore sequencing has previously been used to detect structural variants in pathogenic bacteria (Ashton et al., 2015), human DNA samples (Ammar et al., 2015) or human cancer cell lines (Norris et al., 2016); we believe our method could be employed as a non-amplicon-based alternative for this application, improving library complexity and uniformity of the sample, and aiding the detection of single-nucleotide variants (Samorodnitsky et al., 2015). As the enrichment approach is platform-agnostic, it could also be used to generate libraries compatible with the other long-read sequencers, benefitting the field of research into structural variation.

Acknowledgements

We would like to thank Dietrich Lueersen, David Blaney and Dan Swan for their help in the analysis of the data; Richard Milne at the Department for Virology, UCL Medical School (Royal Free Campus, Rowland Hill Street, London, UK) for the kind gift of the CMV Merlin strain; Amanda Brown, Lise J Schreuder and Tanya Parish at Barts and The London School of Medicine and Dentistry (Queen Mary University of London, London, UK) for strain M. tb H37Rv; and Philip Butcher and Jasvir Dhillon at St. George’s Hospital (University of London, London, UK) for the generous donation of strain M. tb C.

Past and present members of the PATHSEEK consortium are: Judith Breuer, Rachel Williams, Mette Theilgaard Christiansen, Josie Bryant, Sofia Morfopoulou, Helena Tutill, Erika Yara-Romero, Charlotte Williams and Dan Depledge (UCL); Martin Schutten, Saskia Smits, Georges M.G.M. Verjans, Freek B. van Loenen, Anne van der Linden and Albert Osterhaus (Erasmus MC); Katja Einer-Jensen, Martin Ludvigsen and Roald Forsberg (CLC Bio); James Clough, Graham Speight, Jacqueline Chan, Jolyon Holdstock, Sabine E. Eckert, Mike McAndrew and Amanda Brown (OGT).

Sabine E. Eckert is a Nanopore shareholder.

References

Ammar, R., Paton, T. A., Torti, D., Shlien, A. & Bader, G. D. (2015). Long read nanopore sequencing for detection of HLA and CYP2D6 variants and haplotypes. F1000Res 4, 17.

Ashton, P. M., Nair, S., Dallman, T., Rubino, S., Rabsch, W., Mwaigwisya, S., Wain, J. & O’Grady, J. (2015). MinION nanopore sequencing identifies the position and structure of a bacterial antibiotic resistance island. Nat Biotechnol 33, 296–300.

Brown, A. C., Bryant, J. M., Einer-Jensen, K., Holdstock, J., Houniet, D. T., Chan, J. Z., Depledge, D. P., Nikolayevskyy, V., Broda, A. & other authors (2015). Rapid whole-genome sequencing of Mycobacterium tuberculosis isolates directly from clinical samples. J Clin Microbiol 53, 2230–2237.

Carlet, J. (2015). The world alliance against antibiotic resistance: consensus for a declaration. Clin Infect Dis 60, 1837–1841.

Chaisson, M. J. & Tesler, G. (2012). Mapping single molecule sequencing reads using basic local alignment with successive refinement (BLASR): application and theory. BMC Bioinformatics 13, 238.

Chen, Z., Sugano, S. & Watanabe, S. (1999). A 189-bp repeat region within the human cytomegalovirus replication origin contains a sequence dispensable but irreplaceable with other sequences. Virology 258, 240–248.

Christiansen, M. T., Brown, A. C., Kundu, S., Tutill, H. J., Williams, R., Brown, J. R., Holdstock, J., Holland, M. J., Stevenson, S. & other authors (2014). Whole-genome enrichment and sequencing of Chlamydia trachomatis directly from clinical samples. BMC Infect Dis 14, 591.

Depledge, D. P., Palser, A. L., Watson, S. J., Lai, I. Y., Gray, E. R., Grant, P., Kanda, R. K., Leproust, E., Kellam, P. & Breuer, J. (2011). Specific capture and whole-genome sequencing of viruses from clinical samples. PLoS One 6, e27805.

Doughty, E. L., Sergeant, M. J., Adetifa, I., Antonio, M. & Pallen, M. J. (2014). Culture-independent detection and characterisation of Mycobacterium tuberculosis and M. africanum in sputum samples using shotgun metagenomics on a benchtop sequencer. PeerJ 2, e585.

Greninger, A. L., Naccache, S. N., Federman, S., Yu, G., Mbala, P., Bres, V., Stryke, D., Bouquet, J., Somasekar, S. & other authors (2015). Rapid metagenomic identification of viral pathogens in clinical samples by real-time nanopore sequencing analysis. Genome Med 7, 99.
Jiang, J., Gu, J., Zhang, L., Zhang, C., Deng, X., Dou, T., Zhao, G. & Zhou, Y. (2015). Comparing Mycobacterium tuberculosis genomes using genome topology networks. BMC Genomics 16, 85.

Joseph, S. J. & Read, T. D. (2012). Genome-wide recombination in Chlamydia trachomatis. Nat Genet 44, 364–366.

Karamitros, T. & Magiorkinis, G. (2015). A novel method for the multiplexed target enrichment of MinION next generation sequencing libraries using PCR-generated baits. Nucleic Acids Res 43, e152.

Karamitros, T., Harrison, I., Piorkowska, R., Katzourakis, A., Magiorkinis, G. & Mbisa, J. L. (2016). De novo assembly of human herpes virus type 1 (HHV-1) genome, mining of non-canonical structures and detection of novel drug-resistance mutations using short- and long-read next generation sequencing technologies. PLoS One 11, e0157600.

Kiełbasa, S. M., Wan, R., Sato, K., Horton, P. & Frith, M. C. (2011). Adaptive seeds tame genomic sequence comparison. Genome Res 21, 487–493.

Kilianski, A., Haas, J. L., Corriveau, E. J., Liem, A. T., Willis, K. L., Kadavy, D. R., Rosenzweig, C. N. & Minot, S. S. (2015). Bacterial and viral identification and differentiation by amplicon sequencing on the MinION nanopore sequencer. Gigascience 4, 12.

Loman, N. J. & Quinlan, A. R. (2014). Poretools: a toolkit for analyzing nanopore sequence data. Bioinformatics 30, 3399–3401.

Loman, N. J., Constantinidou, C., Christner, M., Rohde, H., Chan, J. Z., Quick, J., Weir, J. C., Quince, C., Smith, G. P. & other authors (2013). A culture-independent sequence-based metagenomics approach to the investigation of an outbreak of Shiga-toxigenic Escherichia coli O104:H4. JAMA 309, 1502–1510.

Masse, M. J., Karlin, S., Schachtel, G. A. & Mocarski, E. S. (1992). Human cytomegalovirus origin of DNA replication (oriLyt) resides within a highly complex repetitive region. Proc Natl Acad Sci U S A 89, 5246–5250.

Norris, A. L., Workman, R. E., Fan, Y., Eshleman, J. R. & Timp, W. (2016). Nanopore sequencing detects structural variants in cancer. Cancer Biol Ther 17, 246–253.

Noé, L. & Kucherov, G. (2005). YASS: enhancing the sensitivity of DNA similarity search. Nucleic Acids Res 33, W540–543.

Quick, J., Loman, N. J., Duraffour, S., Simpson, J. T., Severi, E., Cowley, L., Bore, J. A., Koundouno, R., Dudas, G. & other authors (2016). Real-time, portable genome sequencing for Ebola surveillance. Nature 530, 228–232.

Quick, J., Quinlan, A. R. & Loman, N. J. (2014). A reference bacterial genome dataset generated on the MinION portable single-molecule nanopore sequencer. Gigascience 3, 22.

Robinson, J. T., Thorvaldsdóttir, H., Winckler, W., Guttman, M., Lander, E. S., Getz, G. & Mesirov, J. P. (2011). Integrative genomics viewer. Nat Biotechnol 29, 24–26.

Samorodnitsky, E., Jewell, B. M., Hagopian, R., Miya, J., Wing, M. R., Lyon, E., Damodaran, S., Bhatt, D., Reeser, J. W. & other authors (2015). Evaluation of hybridization capture versus amplicon-based methods for whole-exome sequencing. Hum Mutat 36, 903–914.

Thorvaldsdóttir, H., Robinson, J. T. & Mesirov, J. P. (2013). Integrative Genomics Viewer (IGV): high-performance genomics data visualization and exploration. Brief Bioinform 14, 178–192.

Witney, A. A., Gould, K. A., Arnold, A., Coleman, D., Delgado, R., Dhillon, J., Pond, M. J., Pope, C. F., Planche, T. D. & other authors (2015). Clinical application of whole-genome sequencing to inform treatment for multidrug-resistant tuberculosis cases. J Clin Microbiol 53, 1473–1483.

Wlodarska, M., Johnston, J. C., Gardy, J. L. & Tang, P. (2015). A microbiological revolution meets an ancient disease: improving the management of tuberculosis with genomics. Clin Microbiol Rev 28, 523–539.

Data Bibliography

The following reference sequences were used:
1. Human CMV herpesvirus: HHV-5 GU179001.1, http://www.ncbi.nlm.nih.gov/nuccore/GU179001.1
2. M. tb: strain H37Rv NC_018143.2, http://www.ncbi.nlm.nih.gov/nuccore/NC_018143.2
3. Influenza virus: strain H1N1, A/Puerto Rico/8/1934, http://www.ncbi.nlm.nih.gov/nuccore/8486138
4. Human: Human_g1k_v37, www.1000genomes.org
5. Dog: CanFam3.1 GCA_000002285.2, NC_006583.3, http://www.ncbi.nlm.nih.gov/nuccore/NC_006583.3