
PhytoFrontiers™ | XXXX XX:X-X | https://doi.org/10.1094/PHYTOFR-10-22-0117-A

Improved Genome Assembly Resource of the Plant Pathogen Fusarium avenaceum
Christopher Gottschalk,† Breyn Evans, and Tamara D. Collum
Appalachian Fruit Research Station, Kearneysville, WV 25430

Abstract
Fusarium avenaceum is a generalist plant pathogen of concern due to its potential to produce
mycotoxins on plant products. Previous research efforts have sequenced and assembled genomes
of F. avenaceum. However, those works relied on limited next-generation sequencing technologies
that resulted in fragmented and incomplete genome assemblies. To address this, we utilized
high-depth third-generation long-read sequencing and several different genome assembly software
to generate a new, highly contiguous genome of F. avenaceum. Moreover, we conducted a thorough
annotation of the genome using a mix of long-read cDNA and short-read RNAseq data. Our genome
was more contiguous than the current reference genome Fustri1, matched the estimated genome
size and chromosome number, and contained a similar number of annotated genes to other
F. avenaceum genome assemblies. Lastly, we conducted a secondary metabolite (SM) cluster
analysis that identified 60 gene clusters associated with SM biosynthesis, five more than the
reference F. avenaceum genome. In conclusion, our genome and associated annotation information
will help advance research on plant-fungal interactions, food safety, and Fusarium spp. diversity.

Keywords
fungi, long-read sequencing, mycotoxin, secondary metabolite clusters

Funding
This research was supported by the United States Department of Agriculture, Agricultural
Research Service appropriated projects 8080-21000-030-000-D and 8080-21000-029-000-D.

Resource Announcement
Fusarium avenaceum is a generalist plant pathogen that infects many high-value crops.
The previous 12 sequenced genomes of F. avenaceum, including the NCBI representative
reference, have been assembled using next-generation sequencing approaches, including
454 pyrosequencing and Illumina (Kim et al. 2020; Lysøe et al. 2014; Mesny et al. 2021; Yang et al.
2022). None of the previous genome assemblies were obtained from pome fruit, common
hosts for the pathogen, nor utilized long-read sequencing technologies. Here, we present a
high-quality genome assembly for F. avenaceum and associated annotation using long-read
sequencing generated from the Oxford Nanopore Technologies (ONT; London, U.K.) platform
for both DNA and RNA obtained from fruit-sampled cultures.
Fusarium avenaceum was sampled from a pear grown and stored at the USDA ARS
Appalachian Fruit Research Station in Kearneysville, West Virginia. Taxonomic identification
was conducted using ITS and TEF1 gene sequencing, and sequences were deposited in
GenBank as OP007197 and OP007198, respectively, and with the strain name WV21P1A
(Collum et al. 2022). Liquid cultures generated from solid media-grown fungal samples were
used to extract a high-molecular-weight (HMW) gDNA sample using the large volume fun-
gal genomic DNA extraction protocol for PacBio (Ren and Moore 2021). HMW gDNA un-
derwent short-read elimination (SRE) using the 40Kb kit from Circulomics (Baltimore, MD).
Sequencing libraries were prepared from the SRE HMW gDNA using the Oxford Nanopore
SQK-LSK110 kit and sequenced on a Nanopore R10.3 flow cell using the MinION platform.

† Corresponding author: C. Gottschalk; [email protected]

The author(s) declare no conflict of interest.

Accepted for publication 1 December 2022.

Copyright © 2023 The Author(s). This is an open access article distributed under the CC BY-NC-ND 4.0
International license.

apsjournals.apsnet.org/journal/phytofr Vol. X, No. X, XXXX | 1


Table 1. Genome assembly statistics for Fusarium avenaceum

Assembler                  Canuᵃ         NECATᵇ        Flyeᶜ
Assembly size (bp)         42,517,872    41,834,346    42,091,861
Number of contigs          17            9             9 (1 circular)
Max. contig length (bp)    7,141,040     7,127,630     7,141,792
Min. contig length (bp)    89,058        711,171       50,747
N50 (bp)                   4,726,737     4,958,227     4,957,983
Assembly QVᵈ               56.8          48.5          60.5
Assembly error             2.09E-06      1.41E-05      8.92E-07
Assembly completeness      98.81%        98.50%        98.97%
Assembly BUSCOᵉ
  Complete (C)             92.9%         93.0%         94.1% [S:93.9%, D:0.2%]
  Fragmented (F)           3.8%          3.6%          2.8%
  Missing (M)              3.3%          3.0%          3.1%
  n                        4494          4494          4494

Annotation
Protein-coding genes                          13,586
Exons                                         40,565
Introns                                       26,979
Mean gene length                              1,908 bp
rRNA genes                                    102
tRNA genes                                    326
Annotation BUSCOᵉ (Hypocreales_ODB10)         C:96.4% [S:93.5%, D:2.9%], F:2.1%, M:1.5%, n:4494
Annotation BUSCOᵉ (Sordariomycetes_ODB10)     C:97.1% [S:94.0%, D:3.1%], F:1.5%, M:1.4%, n:3817

ᵃ Version 2.2, executed using default parameters except that the nanopore read input option was enabled.
ᵇ Version 0.0.1, executed using default parameters except that the Canu-corrected reads were used as the read input option.
ᶜ Version 2.9, executed using default parameters except that nano-corr was used as the read input option using the Canu-corrected reads and scaffold mode was enabled.
ᵈ Quality value reported by Merqury.
ᵉ Benchmark Universal Single Copy Orthologs.
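
Table 1 reports an N50 for each assembly. As a reminder of what that statistic measures, the following is a minimal Python sketch of the calculation. The function name n50 is illustrative, and the contig lengths are hypothetical values chosen only to show the arithmetic; they are not taken from any of the assemblies above.

```python
def n50(contig_lengths):
    """Return the N50: the length L such that contigs of length >= L
    together cover at least half of the total assembly length."""
    if not contig_lengths:
        raise ValueError("contig_lengths must not be empty")
    total = sum(contig_lengths)
    running = 0
    # Walk the contigs from longest to shortest until half the assembly is covered.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical contig lengths (bp), for illustration only.
example_lengths = [700, 500, 400, 300, 100]
print(n50(example_lengths))
```

Because the statistic depends only on the sorted lengths, an assembly with fewer, longer contigs yields a higher N50 at a similar total size, which is how N50 is used as a contiguity measure here.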

Raw sequence FAST5 signals were called with Guppy (v6.1.7 – GPU; Oxford Nanopore)
using the dna_r10.3_450bps_hac.cfg model. The sequencing generated 7.83 Gb from
298,551 reads, with an average length of 26,238 bp and an average quality score of 19.3 as
determined through fastq-scan (v1.0.0; https://github.com/rpetit3/fastq-scan). The sequenc-
ing generated an estimated genome coverage of 187×. Adapter sequences and reads with
middle adapters were removed using Porechop (v.0.2.4; https://github.com/rrwick/Porechop),
and reads were filtered for lengths >10 Kb using NanoFilt (De Coster et al. 2018). Assembly
using only Nanopore reads was conducted using three common assemblers, Canu (v.2.2;
Koren et al. 2017), NECAT (v.0.0.1; Chen et al. 2021), and Flye (v.2.9; Kolmogorov et al.
2020), using the corrected reads generated from Canu with the estimated genome length of
46 Mb as a parameter for all assemblers. Assemblies were assessed for quality using GenomeTools
(v.1.6.2; http://genometools.org/) and BUSCO (Simão et al. 2015) to calculate contiguity
(N50), assembly length, contig number, and single-copy ortholog assembly completeness
(Table 1). Due to the extensively deep long-read sequencing, we were able to increase the
per-base quality score within the assembly to a maximum QV of 60.5, which corresponds to
an error rate of 8.92E−7, as determined by Merqury (v.1.3; Rhie et al. 2020). The polishing of the
assembly using the raw long reads failed to further increase the QV, and we determined that
no additional polishing using high-quality short reads was needed.
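
The relationship between the QV and error values above, and the coverage estimate, can be checked with a few lines of Python. This is a sketch under stated assumptions: it uses the standard Phred relationship (error = 10^(−QV/10)), and it divides the sequenced bases by the Flye assembly size from Table 1, which is an assumption because the genome size used for the reported 187× figure is not stated. Small differences from the reported values are expected from rounding.

```python
import math


def qv_to_error_rate(qv):
    """Convert a Phred-scaled quality value to a per-base error probability."""
    return 10 ** (-qv / 10)


def error_rate_to_qv(error_rate):
    """Convert a per-base error probability to a Phred-scaled quality value."""
    return -10 * math.log10(error_rate)


def mean_coverage(total_bases, genome_size):
    """Average sequencing depth: sequenced bases divided by genome length."""
    return total_bases / genome_size


# QV values as listed in Table 1.
for assembler, qv in [("Canu", 56.8), ("NECAT", 48.5), ("Flye", 60.5)]:
    print(f"{assembler}: QV {qv} -> error rate {qv_to_error_rate(qv):.2e}")

# Read count and mean read length as reported in the text.
total_bases = 298_551 * 26_238
# Assumption: the Flye assembly size is used as the genome length.
print(f"estimated coverage: {mean_coverage(total_bases, 42_091_861):.0f}x")
```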
The Flye assembly had the median assembly size, number of contigs, and N50 of the three
assemblies, as well as the highest BUSCO score (Table 1). For annotation, we undertook a hybrid approach, combining short-
read RNAseq and long-read Direct-cDNA sequencing. The short-read data were acquired
from public data available on the SRA database, SRR7962513. The long-read Direct-cDNA
sequencing was conducted using an RNA sample extracted from fungal mycelia grown in liq-
uid culture using the Qiagen RNeasy Plant Mini Kit (Germantown, MD) according to the man-
ufacturer’s instructions. The RNA sample was converted to cDNA, and a sequencing library
was prepared following the ONT protocol for direct cDNA sequencing using the SQK-DCS109
kit. The library was sequenced on an R9.4.1 flow cell on a MinION and basecalled using
Guppy (v.6.1.5 – GPU). The resulting sequencing generated 389,840 reads, of which 203,997
passed Guppy’s quality filters. Those high-quality cDNA reads had an average read length of
1,571 bp, an N50 of 2,295 bp, and a mean quality of Q13.3, for a total of 320,382,707 called
bases. Adapter sequences and reads with middle adapters were removed using porechop.

Table 2. Fusarium avenaceum genome assembly comparisons

Publication                         This study        Mesny et al. 2021 Yang et al. 2022  Lysøe et al. 2014 Lysøe et al. 2014 Lysøe et al. 2014
                                                      (NCBI reference)
Name                                Fusav_v1.0        Fustri1           F156N33           Fa05001           FaLH03            FaLH27
Assembly size                       42.0 Mb           42.2 Mb           41.2 Mb           41.5 Mb           42.7 Mb           43.2 Mb
N50                                 4.96 Mb           4.7 Mb            1.5 Mb            1.4 Mb            4.11 Mb           4.14 Mb
Chromosome/scaffold/contig number   9                 13                214               83                104               77
Protein-coding gene number          13,585            14,042            11,233            13,092            13,293            13,445
Noncoding gene number               428               307               -                 -                 -                 -

Reported BUSCO transcriptome completeness (Hypocreales_ODB10): 96.4%, 96.5%

Before gene annotation, we annotated the repeat space using RepeatModeler (v.2.0.1;
Smit and Hubley 2015) and RepeatMasker (v.4.1.1; Smit et al. 2015). In total, 9,071 predicted
repeat sequences were found and masked. To annotate genes, we mapped the short reads
and long reads to the softmasked assembly using STAR (v.2.7.10a; Dobin et al. 2013) with
the --alignIntronMax 2000 option and minimap2 (v.2.24-r1122; Heng 2018) with the
-ax splice -k14 -G 2000 options, respectively. The mapped reads were then assembled into
transcripts using StringTie2 (v.2.2.1; Kovaka et al. 2019) with the --mix option. FASTA
sequences were obtained from the assembled transcripts using gffread (v.0.9.9; Pertea and
Pertea 2020). The transcript FASTA was used as an EST input into Maker (v.3.01.03; Campbell
et al. 2014), along with protein sequences from the Fustri1 (GCA_020744115.1) assembly from
NCBI and a GFF of the previously identified repeat features. Ab initio gene prediction was
performed within Maker using Augustus (v.3.4.0; Hoff and Stanke 2018), which was trained over
two rounds. EvidenceModeler (v.1.1.1; Haas et al. 2008) was invoked during the Maker annotation
pipeline using default settings. Lastly, tRNAs were annotated using tRNAscan-SE (v.2.0.7; Chan
et al. 2021), and rRNAs were annotated using RNAmmer (v.1.2; Lagesen et al. 2007) and Snoscan
(v.1.0; Lowe and Eddy 1999). The resulting annotation identified 13,585 genes, and the resulting
transcriptome had BUSCO scores of 96.4 and 97.1% against the Hypocreales and Sordariomycetes
ODB10 ortholog databases, respectively (Table 1). One contig was identified as circular,
contained a high number of tRNA annotations, and was sequenced to an extensive coverage depth
(268×), indicative of a possible organellar genome. BLAST searches of the annotations within
that contig identified hits to mitochondrial genes; thus, it was removed from the assembly
and annotation.
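
The read-mapping and transcript-assembly steps described above can be sketched as a set of command lines. The Python snippet below only builds and prints the commands; it does not run them. The alignment options are the ones named in the text, written with the double-dash spellings that STAR and StringTie accept. Everything else is an assumption made for illustration: all file and directory names are hypothetical placeholders, a STAR index is assumed to exist already, the short reads are shown as a single file, and the samtools sorting step is added because StringTie expects coordinate-sorted BAM input.

```python
import shlex

# Hypothetical file and directory names; substitute real paths.
GENOME = "assembly.softmasked.fasta"
STAR_INDEX = "star_index"          # assumed to have been built beforehand
SHORT_READS = "short_reads.fastq"  # e.g. reads retrieved for SRR7962513
LONG_READS = "cdna_reads.fastq"    # adapter-trimmed ONT cDNA reads


def build_commands():
    """Return the mapping and transcript-assembly commands as argument lists."""
    # STAR names its sorted BAM after the output prefix given below.
    short_bam = "short_Aligned.sortedByCoord.out.bam"
    long_bam = "long.sorted.bam"
    return {
        # Short-read spliced alignment with a 2,000-bp intron cap.
        "star": ["STAR", "--genomeDir", STAR_INDEX,
                 "--readFilesIn", SHORT_READS,
                 "--alignIntronMax", "2000",
                 "--outSAMtype", "BAM", "SortedByCoordinate",
                 "--outFileNamePrefix", "short_"],
        # Long-read spliced alignment with the options given in the text.
        "minimap2": ["minimap2", "-ax", "splice", "-k14", "-G", "2000",
                     "-o", "long.sam", GENOME, LONG_READS],
        # Sort and convert the long-read alignments (assumed helper step).
        "samtools": ["samtools", "sort", "-o", long_bam, "long.sam"],
        # Mixed-mode assembly: short-read BAM first, long-read BAM second.
        "stringtie": ["stringtie", "--mix", "-o", "transcripts.gtf",
                      short_bam, long_bam],
        # Extract spliced transcript sequences for use as evidence.
        "gffread": ["gffread", "-w", "transcripts.fasta", "-g", GENOME,
                    "transcripts.gtf"],
    }


for name, cmd in build_commands().items():
    print(f"{name}: {shlex.join(cmd)}")
```

Keeping each command as an argument list makes it straightforward to pass the same lists to subprocess.run once the placeholder paths have been replaced.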
In comparison with the NCBI reference genome Fustri1 (GCA_020744115.1), our
genome assembly is more contiguous, matches the estimated genome size, and corresponds
to the known chromosome count of 8 + 1 (Mesny et al. 2021; Waalwijk et al. 2018) (Table 2).
Our assembly also exhibited high identity and collinearity with the Fustri1 genome when
aligned (Fig. 1). Furthermore, our gene annotation agrees with other previously published
genomes, yielding a similar number of protein-coding and noncoding genes and a similar
reported BUSCO score for the transcriptome (Table 2).
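
The BUSCO scores compared here are reported as summary strings of the form shown in Table 1, where C is complete, S is complete and single-copy, D is complete and duplicated, F is fragmented, M is missing, and n is the number of ortholog groups searched. The following is a small Python sketch for turning such a string into numbers; the function name parse_busco and the regular expression are illustrative, not part of BUSCO itself.

```python
import re

# Matches fields such as "C:94.1%", "D:0.2%" or "n:4494".
BUSCO_FIELD = re.compile(r"([CSDFMn]):(\d+(?:\.\d+)?)%?")


def parse_busco(summary):
    """Parse a BUSCO one-line summary into a dict of percentages and n."""
    fields = {}
    for key, value in BUSCO_FIELD.findall(summary):
        # n is a count of ortholog groups; every other field is a percentage.
        fields[key] = int(value) if key == "n" else float(value)
    return fields


# Flye assembly summary string from Table 1.
flye = parse_busco("C:94.1% [S:93.9%, D:0.2%], F:2.8%, M:3.1%, n:4494")
print(flye)

# Sanity checks on the notation: S + D make up C, and C + F + M cover all groups.
print(round(flye["S"] + flye["D"], 1), round(flye["C"] + flye["F"] + flye["M"], 1))
```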
We additionally conducted a secondary metabolite (SM) cluster analysis using the fungal
version of antiSMASH (v.6.1.1) with the detection strictness set to relaxed (Blin et al. 2021). In
total, 60 clusters were identified across all nine contigs, with 1 to 12 clusters
per contig. We identified SM clusters with similarities of >80% to α-acorenol, oxyjavanicin,
chrysogine, fusarin, ACT-Toxin II, naphthopyrone, hexadehydroastechrome/terezine-D/
astechrome, fusarielin H, ilicicolin H, and karaiol. In addition, a cluster with a similarity of 20%
to beauvericin, an antibiotic that can cause programmed cell death in mammals (Logrieco
et al. 1998), was located on contig 8 and contained the core biosynthetic gene. These SMs
are commonly found in Fusarium spp., serving as pigments, antibiotics, and mycotoxins. We
also conducted the same analysis for the reference Fustri1 genome, which was found to
contain 55 clusters and shared many of the same SM clusters in a conserved order, further
validating our assembly.
This genomic resource will facilitate further research on plant pathology, food safety, and
Fusarium spp. diversity. The genome has been made available in the NCBI GenBank under
accession number CP109663. The Nanopore gDNA and cDNA reads were deposited in the

[Figure 1: D-GENIES dot plot; Fusav_v1.0 contigs on the horizontal axis, GCA_020744115.1_Fustri1_genomic scaffolds on the vertical axis]

Fig. 1. Genome alignment between the current study Fusav_v1.0 genome (top) and the reference Fustri1 genome (GCA_020744115.1) (right).
Alignment and figure generated using D-GENIES (Cabanettes and Klopp 2018). The coloration of the alignments corresponds to the percent
sequence identity.

SRA under BioProject PRJNA890654. The assembly, extended annotation (rRNA, tRNA),
putative functional annotation, and SM cluster analysis can be retrieved from Zenodo
(https://doi.org/10.5281/zenodo.7120993).

Literature Cited
Blin, K., Shaw, S., Kloosterman, A. M., Charlop-Powers, Z., van Wezel, G. P., Medema, M. H., and Weber, T. 2021. antiSMASH 6.0: Improving cluster detection and comparison capabilities. Nucleic Acids Res. 49:W29-W35.
Cabanettes, F., and Klopp, C. 2018. D-GENIES: Dot plot large genomes in an interactive, efficient and simple way. PeerJ 6:e4958.
Campbell, M. S., Holt, C., Moore, B., and Yandell, M. 2014. Genome annotation and curation using MAKER and MAKER-P. Curr. Protoc. Bioinform. 12:4.11.1-39.
Chan, P. P., Lin, B. Y., Mak, A. J., and Lowe, T. M. 2021. tRNAscan-SE 2.0: Improved detection and functional classification of transfer RNA genes. Nucleic Acids Res. 49:9077-9096.
Chen, Y., Nie, F., Xie, S. Q., Zhen, Y., Dai, Q., Bray, T., Wang, Y., Xing, J., Huang, Z., Wang, D., He, L., Luo, F., Wang, J., Liu, Y., and Xiao, C. 2021. Efficient assembly of nanopore reads via highly accurate and intact error correction. Nat. Commun. 12:1-10.
Collum, T. D., Evans, B., and Gottschalk, C. 2022. First report of Fusarium avenaceum causing postharvest decay of European Pear in Mid-Atlantic United States. Plant Dis. https://doi.org/10.1094/PDIS-08-22-1784-PDN
De Coster, W., D’Hert, S., Schultz, D. T., Cruts, M., and Van Broeckhoven, C. 2018. NanoPack: Visualizing and processing long-read sequencing data. Bioinformatics 34:2666-2669.
Dobin, A., Davis, C. A., Schlesinger, F., Drenkow, J., Zaleski, C., Jha, S., Batut, P., Chaisson, M., and Gingeras, T. R. 2013. STAR: Ultrafast universal RNA-seq aligner. Bioinformatics 29:15-21.
Haas, B. J., Salzberg, S. L., Zhu, W., Pertea, M., Allen, J. E., Orvis, J., White, O., Buell, C. R., and Wortman, J. R. 2008. Automated eukaryotic gene structure annotation using EVidenceModeler and the Program to Assemble Spliced Alignments. Genome Biol. 9:R7.
Heng, L. 2018. Minimap2: Pairwise alignment for nucleotide sequences. Bioinformatics 34:3094-3100.
Hoff, K. J., and Stanke, M. 2018. Predicting genes in single genomes with AUGUSTUS. Curr. Protoc. Bioinform. 65:e57.
Kim, H. S., Lohmar, J. M., Busman, M., Brown, D. W., Naumann, T. A., Divon, H. H., Lysøe, E., Uhlig, S., and Proctor, R. H. 2020. Identification and distribution of gene clusters required for synthesis of sphingolipid metabolism inhibitors in diverse species of the filamentous fungus Fusarium. BMC Genom. 21:510.
Kolmogorov, M., Bickhart, D. M., Behsaz, B., Gurevich, A., Rayko, M., Shin, S. B., Kuhn, K., Yuan, J., Polevikov, E., Smith, T. P. L., and Pevzner, P. A. 2020. metaFlye: Scalable long-read metagenome assembly using repeat graphs. Nat. Methods 17:1103-1110.
Koren, S., Walenz, B. P., Berlin, K., Miller, J. R., and Phillippy, A. M. 2017. Canu: Scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation. Genome Res. 27:722-736.
Kovaka, S., Zimin, A. V., Pertea, G. M., Razaghi, R., Salzberg, S. L., and Pertea, M. 2019. Transcriptome assembly from long-read RNA-seq alignments with StringTie2. Genome Biol. 20:278.
Lagesen, K., Hallin, P., Rødland, E. A., Staerfeldt, H. H., Rognes, T., and Ussery, D. W. 2007. RNAmmer: Consistent and rapid annotation of ribosomal RNA genes. Nucleic Acids Res. 35:3100-3108.
Logrieco, A., Moretti, A., Castella, G., Kostecki, M., Golinski, P., Ritieni, A., and Chelkowski, J. 1998. Beauvericin production by Fusarium species. Appl. Environ. Microbiol. 64:3084-3088.
Lowe, T. M., and Eddy, S. E. 1999. A computational screen for methylation guide snoRNAs in yeast. Science 283:1168-1171.
Lysøe, E., Harris, L. J., Walkowiak, S., Subramaniam, R., Divon, H. H., Riiser, E. S., Llorens, C., Gabaldón, T., Kistler, H. C., Jonkers, W., Kolseth, A. K., Nielsen, K. F., Thrane, U., and Frandsen, R. J. 2014. The genome of the generalist plant pathogen Fusarium avenaceum is enriched with genes involved in redox, signaling and secondary metabolism. PLoS One 9:e112703.
Mesny, F., Miyauchi, S., Thiergart, T., Pickel, B., Atanasova, L., Karlsson, M., Hüttel, B., Barry, K. W., Haridas, S., Chen, C., Bauer, D., Andreopoulos, W., Pangilinan, J., LaButti, K., Riley, R., Lipzen, A., Clum, A., Drula, E., Henrissat, B., Kohler, A., Grigoriev, I. V., Martin, F. M., and Hacquard, S. 2021. Genetic determinants of endophytism in the Arabidopsis root mycobiome. Nat. Commun. 12:7227.
Pertea, G., and Pertea, M. 2020. GFF Utilities: GffRead and GffCompare. F1000Research 9:304.
Ren, J. M., and Moore, L. 2021. Large volume fungal genomic DNA extraction protocol for PacBio. Protocols.io.
Rhie, A., Walenz, B. P., Koren, S., and Phillippy, A. M. 2020. Merqury: Reference-free quality, completeness, and phasing assessment for genome assemblies. Genome Biol. 21:245.
Simão, F. A., Waterhouse, R. M., Ioannidis, P., Kriventseva, E. V., and Zdobnov, E. M. 2015. BUSCO: Assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics 31:3210-3212.
Smit, A. F. A., and Hubley, R. 2015. RepeatModeler Open-1.0. http://www.repeatmasker.org
Smit, A. F. A., Hubley, R., and Green, P. 2015. RepeatMasker Open-4.0. http://www.repeatmasker.org
Waalwijk, C., Taga, M., Zheng, S. L., Proctor, R. H., Vaughan, M. M., and O’Donnell, K. 2018. Karyotype evolution in Fusarium. IMA Fungus 9:13-26.
Yang, S., Coleman, J. J., and Vinatzer, B. A. 2022. Genome Resource: Draft genome of Fusarium avenaceum, strain F156N33, isolated from the atmosphere above Virginia and annotated based on RNA sequencing data. Plant Dis. 106:720-722.
