
Tara Oceans: towards global ocean ecosystems biology

2020, Nature Reviews Microbiology


Shinichi Sunagawa1 ✉, Silvia G. Acinas2, Peer Bork3,4,5, Chris Bowler6,7, Tara Oceans Coordinators*, Damien Eveillard7,8, Gabriel Gorsky7,9, Lionel Guidi7,9, Daniele Iudicone10, Eric Karsenti6,7,11, Fabien Lombard7,9, Hiroyuki Ogata12, Stephane Pesant13,14, Matthew B. Sullivan15,16,17, Patrick Wincker7,18 and Colomban de Vargas7,19 ✉

Abstract | A planetary-scale understanding of the ocean ecosystem, particularly in light of climate change, is crucial. Here, we review the work of Tara Oceans, an international, multidisciplinary project to assess the complexity of ocean life across comprehensive taxonomic and spatial scales. Using a modified sailing boat, the team sampled plankton at 210 globally distributed sites at depths down to 1,000 m. We describe publicly available resources of molecular, morphological and environmental data, and discuss how an ecosystems biology approach has expanded our understanding of plankton diversity and ecology in the ocean as a planetary, interconnected ecosystem. These efforts illustrate how global-scale concepts and data can help to integrate biological complexity into models and serve as a baseline for assessing ecosystem changes and the future habitability of our planet in the Anthropocene epoch.

Epipelagic: Referring to the uppermost layer of the ocean that receives sunlight, enabling the organisms inhabiting it to perform photosynthesis.
Mesopelagic: Referring to the ocean layer that receives very little to no sunlight, lying beneath the epipelagic layer, from about 200 to 1,000 m in depth.

https://doi.org/10.1038/s41579-020-0364-5

The Tara Oceans project

The ocean ecosystem covers ~70% of Earth’s surface and contains 97% of all water on our planet.
Plankton are the dominant life forms in the ocean and comprise highly dynamic and interacting populations of viruses, bacteria, archaea, single-celled eukaryotes (protists) and animals that drift with the currents. Together, these mostly microscopic organisms play a major role in maintaining the Earth system by, for example, carrying out almost half of the net primary production on our planet1 and by exporting photosynthetically fixed carbon to the deep ocean2–4. Plankton also form the base of food webs that sustain the complexity of life in the oceans and beyond5. With the goal of gaining a holistic understanding of this complexity, ocean ecosystems biology investigates how biotic and abiotic processes determine the emergent properties of the ocean ecosystem as a whole6. Just as systems biology studies require well-characterized cell lines or model organisms to gain a mechanistic, molecular understanding of their phenotypes, achieving this goal will require establishing an inventory of the ocean’s plankton, collecting data on the interactions of organisms with each other and with the environment, and integrating this information within the physicochemical boundaries of the ocean ecosystem across space and time7. Global-scale efforts, although challenging, are poised to offer new insights into each of these directions and should enable better predictions of the impact of climate change on this crucial component of the biosphere. Planetary-scale studies of open-ocean organisms have long been the stuff of dreams, from the Challenger Expedition (1872–1876), which led to the discovery and description of countless eukaryotic organisms, to the Global Ocean Sampling Expedition (2004–2008), which pioneered the genomic exploration of ocean microbial communities8–10.
Following this dream, Tara Oceans was conceived in 2008 as a multidisciplinary project and team, including researchers with expertise in biological and physical oceanography, marine ecology, cell and systems biology, genomics, imaging and (bio)informatics, with the common goal of studying epipelagic and mesopelagic plankton on a global scale (Fig. 1a), from the gene level to the community level7. From its beginning (Box 1), this project, which would use the 36-m schooner Tara (Fig. 1b) for the expedition, required trade-offs and innovations in sampling needs and capabilities. Extensive planning was required to identify oceanic areas of scientific interest; to negotiate access to international waters and ports, as well as sampling authorizations; and to resolve intense debates across disciplines to establish baseline sampling protocols. Finally, in September 2009, Tara set sail from Lorient, France, navigating at times through stormy weather and around pirates, to collect samples for analysis by state-of-the-art molecular and imaging technologies (Supplementary Box 1). The primary objectives of Tara Oceans have been to generate a baseline understanding of plankton diversity, interactions, functions and phenotypic complexity across global taxonomic and spatial scales, and to communicate the scientific findings to the public and policymakers (Box 2). In addition, all protocols, data and analyses (Supplementary Box 2) were to be made openly accessible to promote further research. Working towards these goals, the consortium grew organically, and by the time the expedition was completed in 2013, Tara Oceans comprised 19 international partner institutions committed to generating, organizing and analysing the vast volumes of new and heterogeneous data derived from the thousands of plankton samples collected worldwide11.
While complementary expeditions exploring the deep ocean, as well as local-scale and other global-scale plankton surveys, are under way12–16, the focus of this Review is the work and outcomes of the Tara Oceans project. We describe how genetic, morphological and environmental data were combined and highlight the insights gained from the analysis of different plankton size spectra using a global ocean ecosystems biology approach. By providing an overview of the development of Tara Oceans from an adventurous initiative into a multinational, multidisciplinary, collaborative project, we also hope to stimulate planetary-scale research not only within a biome but also across biomes, which will be crucial for integrating biology into models of Earth system functioning.

Author addresses
1 Department of Biology, Institute of Microbiology and Swiss Institute of Bioinformatics, ETH Zürich, Zürich, Switzerland.
2 Department of Marine Biology and Oceanography, Institute of Marine Sciences–CSIC, Barcelona, Spain.
3 Structural and Computational Biology, European Molecular Biology Laboratory, Heidelberg, Germany.
4 Max Delbrück Center for Molecular Medicine, Berlin, Germany.
5 Department of Bioinformatics, Biocenter, University of Würzburg, Würzburg, Germany.
6 Institut de Biologie de l’ENS, Département de Biologie, École Normale Supérieure, CNRS, INSERM, Université PSL, Paris, France.
7 Research Federation for the Study of Global Ocean Systems Ecology and Evolution, FR2022/Tara GOSEE, Paris, France.
8 Université de Nantes, CNRS, UMR6004, LS2N, Nantes, France.
9 Sorbonne Université, CNRS, Laboratoire d’Océanographie de Villefranche, Villefranche-sur-Mer, France.
10 Stazione Zoologica Anton Dohrn, Naples, Italy.
11 Directors’ Research, European Molecular Biology Laboratory, Heidelberg, Germany.
12 Institute for Chemical Research, Kyoto University, Kyoto, Japan.
13 PANGAEA, University of Bremen, Bremen, Germany.
14 MARUM, Center for Marine Environmental Sciences, University of Bremen, Bremen, Germany.
15 Department of Microbiology, The Ohio State University, Columbus, OH, USA.
16 Department of Civil, Environmental and Geodetic Engineering, The Ohio State University, Columbus, OH, USA.
17 Center for RNA Biology, The Ohio State University, Columbus, OH, USA.
18 Génomique Métabolique, Genoscope, Institut de Biologie François Jacob, Commissariat à l’Énergie Atomique, CNRS, Université Evry, Université Paris-Saclay, Evry, France.
19 Sorbonne Université and CNRS, UMR 7144 (AD2M), ECOMAP, Station Biologique de Roscoff, Roscoff, France.
*A list of authors and their affiliations appears at the end of the paper.

Sample collection and processing

From its departure in 2009 to its return in 2013, Tara sailed 140,000 km over a period of 38 months, systematically collecting more than 35,000 ocean water and plankton samples as well as environmental data (Supplementary Table 1) at 210 stations (Fig. 1). The aim was to sample most of the biogeographic and biogeochemical provinces17 of the global ocean and to follow standardized protocols and logistics for sample collection, distribution and storage to facilitate comparative analysis11. To assess ocean plankton, from viruses to small animals and from the sunlit, epipelagic waters to the dark, mesopelagic waters down to 1,000 m, seawater was sampled using Niskin bottles, pumps and nets (Figs 1c,2A). Total plankton were then separated into fractions of organisms of different size ranges (Fig. 2B), and the fractions were cryopreserved on filter membranes or preserved in different fixatives for molecular and/or morphological analyses on land (Supplementary Box 3). Back in the laboratory, nucleic acids were extracted from the filters and subjected to high-throughput sequencing (HTS) to generate metabarcoding (metaB), metagenomic (metaG) and metatranscriptomic (metaT) data sets, as well as single-cell genomes (Fig. 2C).
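Fig. 2B groups the sampled organisms into conventional plankton size classes. The sketch below is a minimal illustration, assuming the widely used Sieburth et al. (1978) class boundaries (femtoplankton 0.02–0.2 µm up to megaplankton 20–200 cm) rather than the exact cut-offs of the Tara Oceans protocol; the helper names `size_class` and `in_fraction` are hypothetical. It assigns an organism size to a class and checks nominal membership in a filter-defined fraction such as 0.8–5 µm.

```python
from typing import Optional

# Conventional plankton size classes (after Sieburth et al., 1978), in micrometres.
# Assumption for illustration: lower bound inclusive, upper bound exclusive.
SIZE_CLASSES = [
    ("femtoplankton", 0.02, 0.2),
    ("picoplankton", 0.2, 2.0),
    ("nanoplankton", 2.0, 20.0),
    ("microplankton", 20.0, 200.0),
    ("mesoplankton", 200.0, 20_000.0),         # 0.2-20 mm
    ("macroplankton", 20_000.0, 200_000.0),    # 2-20 cm
    ("megaplankton", 200_000.0, 2_000_000.0),  # 20-200 cm
]


def size_class(size_um: float) -> Optional[str]:
    """Return the size-class label for an organism of size_um micrometres, or None if out of range."""
    for name, lower, upper in SIZE_CLASSES:
        if lower <= size_um < upper:
            return name
    return None


def in_fraction(size_um: float, lower_um: Optional[float], upper_um: Optional[float]) -> bool:
    """Idealized membership in a filter-defined fraction [lower_um, upper_um).

    None marks an open bound, e.g. '<0.2 um' is (None, 0.2) and '>680 um' is (680, None).
    Real size fractionation is leaky (cells deform, aggregate or clog filters),
    so this only captures the nominal cut-offs.
    """
    above_lower = lower_um is None or size_um >= lower_um
    below_upper = upper_um is None or size_um < upper_um
    return above_lower and below_upper


if __name__ == "__main__":
    print(size_class(1.0))                # picoplankton
    print(in_fraction(1.0, 0.8, 5.0))     # True: nominally in the 0.8-5 um fraction
    print(in_fraction(1.0, 20.0, 180.0))  # False
```

Because filter-defined fractions can overlap (for example, 0.2–3 µm and 0.8–5 µm), a single organism size can nominally fall into several of them, which is why `in_fraction` answers yes or no per fraction instead of returning one label.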
Deep sequencing was performed with Illumina technology at high coverage per sample to access the genomic content of plankton species, including those that are rare in the environment. As of 2019, more than 60 terabases of sequence data from more than 2,800 individual plankton samples had been made publicly available (Supplementary Table 2). In addition, high-throughput imaging (HTI) tools captured the abundance and morphological features of plankton across size fractions spanning seven orders of magnitude (Fig. 2D). In total, 6.8 million images of planktonic organisms from more than 9,200 samples, amounting to more than 30 terabytes of data, have been generated (Supplementary Table 3). The HTS and HTI data, combined with environmental data measured in situ and additional metadata (Supplementary Box 4), have been used for integrative analyses (see below and Supplementary Box 1) and released in open-access repositories (Supplementary Box 2), where they should serve as a treasure trove for future analyses. Furthermore, bioinformatic methods were either adopted or newly developed to facilitate the analysis, comparability and integration of the large volumes of data18. Still, despite these numerous bioinformatic resources and tools (Supplementary Boxes 5,6), powerful, all-encompassing interfaces that enable the integration of the vast amounts of heterogeneous data generated by such global ecosystem-scale projects remain much needed for efficient secondary use of the data.

Global ocean plankton biodiversity

Viruses. When Tara set sail in 2009, viruses were known to be abundant (one million to 100 million per millilitre of seawater) and were suggested to kill about one third of the microbial cells in seawater per day19–21. These findings, however, were largely derived from counts of virus-like particles and incubation experiments.
Fig. 1 | Tara Oceans sampled the global ocean ecosystem. a | The map shows the cruise track of Tara as she sailed the world from September 2009 to December 2013 and the location of the 210 stations (numbered 001–210), which were chosen to cross and sample as many biogeographic provinces and environmental features as possible (sea surface temperature, 0–30 °C, shown as a colour gradient). Overall, more than 35,000 samples of seawater and plankton were collected and archived in partner laboratories. The samples are cross-referenced with the physicochemical data associated with each sample and sampling site (Supplementary Table 1). b | The 36-m aluminium-hulled schooner Tara hosted 15 crew members and scientists during the legs of her open-ocean sampling mission. c | In addition to a ‘dry laboratory’, mainly for imaging instruments, an ergonomic on-board ‘wet laboratory’ for plankton ecosystem sampling, from viruses to animals, was built on the rear deck of Tara.
Inset, Tara Oceans (2009–2013) in numbers: 140,000 km sailed; >35,000 plankton samples collected; 210 sampling stations; >60 terabases of DNA and RNA sequenced; ~7 million images captured; 120 crew members and scientists on board; 52 stopovers in 37 countries; 35,000 schoolchildren on board at stopovers.

Heterotrophic: Capable of incorporating organic carbon into biomass.

Over the last decade, a confluence of rapidly advancing sequencing technologies and low-input molecular biology techniques22 set the stage for systematic, quantitative global ocean virome surveys. These new capabilities advanced our knowledge of ocean viral genomes from 39 publicly available isolate genomes before Tara Oceans to metaG-derived sequence information for nearly 200,000 predominantly double-stranded DNA virus populations in the most recent global ocean virome (version 2; GOV2) data set23–25. This incrementally expanding data set (Fig. 3a) provided the basis for new approaches to the taxonomy of double-stranded DNA viruses that infect bacteria and archaea. Population genomic analyses and ecological phenotyping suggest, at least for culturable phage–host model systems, that these hundreds of thousands of virus populations across the global ocean represent a taxonomic rank of ‘species’. First, this conclusion is based on the observation that gene flow is higher within than between populations of cyanophages that were deeply sampled from coastal Pacific Ocean seawater, and on evolutionary selection analyses suggesting that most of the populations are under differential selection pressures24. Second, among ‘heterotrophic phages’, the populations had measurably differing niches, as inferred from host-range differences26. Furthermore, at least for viral populations that were assembled from short-read metaG data, the population-delineating benchmark of 95% average nucleotide identity differentiated more than 99% of the virus populations in the GOV2 data set24.
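The 95% average nucleotide identity (ANI) benchmark above is typically applied by clustering assembled viral contigs around representative centroids. The following is a minimal sketch, assuming pairwise ANI values (in percent) have already been computed by an external aligner; the function names and the longest-first greedy strategy are illustrative choices, not the GOV2 pipeline itself. Practical pipelines usually also require a minimum alignment fraction, which is omitted here for brevity.

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, str]


def ani(pairwise: Dict[Pair, float], a: str, b: str) -> float:
    """Look up a symmetric pairwise ANI value (percent); 0.0 if the pair was never aligned."""
    return pairwise.get((a, b), pairwise.get((b, a), 0.0))


def cluster_populations(lengths: Dict[str, int],
                        pairwise: Dict[Pair, float],
                        threshold: float = 95.0) -> Dict[str, List[str]]:
    """Greedy centroid clustering of contigs into 'populations'.

    Contigs are visited longest first; each one joins the first existing
    centroid it matches at >= threshold ANI, otherwise it founds a new cluster.
    """
    clusters: Dict[str, List[str]] = {}
    # Sort by length (descending), breaking ties by name so results are deterministic.
    for contig in sorted(lengths, key=lambda c: (-lengths[c], c)):
        for centroid in clusters:
            if ani(pairwise, contig, centroid) >= threshold:
                clusters[centroid].append(contig)
                break
        else:
            clusters[contig] = [contig]
    return clusters


if __name__ == "__main__":
    # Toy contig lengths and ANI values, made up for illustration only.
    lengths = {"v1": 40_000, "v2": 38_000, "v3": 52_000, "v4": 10_000}
    pairwise = {("v1", "v3"): 97.2, ("v2", "v3"): 91.0, ("v2", "v4"): 99.1}
    print(cluster_populations(lengths, pairwise))
    # {'v3': ['v3', 'v1'], 'v2': ['v2', 'v4']}
```

Greedy centroid clustering is order dependent: a contig is compared only with centroids, not with every cluster member, so lowering the threshold can regroup contigs in non-obvious ways.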
Box 1 | A historical account of the Tara Oceans expedition
Tara Oceans was conceived by Eric Karsenti to popularize fundamental science using a sailing boat. The Tara Ocean Foundation proposed the use of its schooner Tara for a global expedition. A scientific component focused on plankton was soon added through input from Christian Sardet and Gaby Gorsky in 2007 (zooplankton) and Chris Bowler and Colomban de Vargas in 2008 (microbial plankton). The idea was shared with other scientists, leading to the development of an international, multidisciplinary, collaborative consortium aimed at studying oceanic plankton at a planetary scale.
Development from a rough concept to a project of its current magnitude required the coalescence of many factors. Through 2008, a team of Tara Oceans coordinators with complementary expertise began to grow. New members joined the project by word of mouth, and regular meetings were held, approximately every 3 months, to define the structure of the project. This crucial start-up phase was made possible through seed funding from the French National Research Agency (ANR), the French National Centre for Scientific Research (CNRS), the European Molecular Biology Laboratory (EMBL), the Veolia Foundation and Region Bretagne, which recognized the potential of the project. During this time, the overall collaborative philosophy of the project, the holistic and systematic sampling strategy and the details of the sampling and analysis protocols were established. Meetings for project coordination and networking continued over the following 11 years, rotating between Paris, Roscoff (France) and Heidelberg (Germany), among other locations.
The principles for the consortium were modelled on those of a scientific unit at EMBL, in which group leaders from different disciplines with interests in the same broad scientific question meet regularly and structure projects with a bottom-up approach. Similarly, for Tara Oceans, decisions were often made on the fly on the basis of discussions between the coordinators and were overseen by a programme manager, Stefanie Kandels. This form of planning represents an entirely different and more agile type of science than that generally supported by peer-reviewed funding bodies, which require an a priori statement of the research design and goals. Although riskier, this blue-sky approach offers opportunities to develop creative ideas that may lead to novel and innovative research directions. Furthermore, Tara Oceans is an example of how adventurous science can profit from engaged philanthropists and private entities, such as agnès b., the Tara Ocean Foundation and other private foundations and companies (see Acknowledgements), to catalyse a new approach for supporting fundamental biological research. Importantly, Tara Oceans consortium members invested their own resources in the project, which did not necessarily fit into mainstream public channels of science funding. Finally, as the project gained momentum and credibility, funding from national agencies, including the French Government through its Investissement d’Avenir programme (project OCEANOMICS), was acquired, covering the substantial costs associated with the processing and sequencing of all samples as well as the general running costs of the project. Major catalysts of the project were the flexibility and cost-effectiveness provided by using a sailing boat to collect plankton samples across the global ocean, and the subsequent use of the most advanced technologies in sequencing, imaging, data analysis and computing onshore at the multiple partner institutions involved.

Remineralized: Derived from the breakdown of organic matter into its simplest inorganic form.

Undoubtedly, there are microdiverse populations that cannot be assembled completely from such data sets, as inferred from emerging single-virus genomics27 and long-read viromics measurements28, but the prevalence of such populations remains unclear. Beyond the species level, the scale of the data necessitated an automated and systematic approach to organize the viral sequence space at the level of viral genera29,30, resulting in the modernization of the gene-sharing network-based taxonomy31,32 that is currently a leading tool for classifying the ‘dark matter’ that dominates the virosphere33,34. Once taxonomically organized, the data provided a first glimpse into large-scale ecological patterns and drivers for ocean viruses. For example, viral communities seem to be structured (likely indirectly through their hosts) by temperature and oxygen, and are passively transported by oceanic currents, consistent with the notion that ‘everything is everywhere and the environment selects’23,35. Viral ecology patterns were also revealed at the between-population (macrodiversity) and within-population (microdiversity) levels, the latter tracking more recent ecological and evolutionary changes. These data, derived from the GOV2 data set24, revealed not only that the global ocean comprises five ecological zones but also latitudinal biodiversity patterns that are partly consistent with those known for macroorganisms (low in the Southern Ocean, at least at the northernmost margins sampled by Tara Oceans, and high at the equator) and partly in contrast with them (surprisingly high in the Arctic)24,36 (Fig. 3b). Beyond these large-scale patterns, ocean viruses have now also been linked in silico to ecologically important marine microbial hosts, providing foundational hypotheses that can be tested by experimental virus–host linkage methods.
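Macrodiversity and microdiversity are commonly quantified with different metrics. As a minimal sketch, assuming Shannon's index over population abundances for macrodiversity and the unbiased per-site nucleotide diversity (π) computed from read allele counts for microdiversity, the hypothetical helpers below compute both; the exact metrics, coverage filters and subsampling used in the GOV2 analyses may differ.

```python
import math
from typing import Iterable, Sequence


def shannon(abundances: Iterable[float]) -> float:
    """Macrodiversity: Shannon index H' = -sum(p_i * ln p_i) over populations with nonzero abundance."""
    values = [a for a in abundances if a > 0]
    if not values:
        return 0.0
    total = sum(values)
    return -sum((a / total) * math.log(a / total) for a in values)


def site_pi(allele_counts: Sequence[int]) -> float:
    """Unbiased nucleotide diversity at one position from read counts of A, C, G and T."""
    n = sum(allele_counts)
    if n < 2:
        raise ValueError("need at least two reads covering the site")
    homozygosity = sum((c / n) ** 2 for c in allele_counts)
    return n / (n - 1) * (1.0 - homozygosity)


def population_pi(sites: Sequence[Sequence[int]]) -> float:
    """Microdiversity: mean per-site nucleotide diversity across covered positions of one population genome."""
    if not sites:
        raise ValueError("no covered sites")
    return sum(site_pi(s) for s in sites) / len(sites)


if __name__ == "__main__":
    print(round(shannon([1, 1, 1, 1]), 3))                            # 1.386 (= ln 4)
    print(round(population_pi([[10, 0, 0, 0], [5, 5, 0, 0]]), 3))    # 0.278
```

The two metrics answer different questions: `shannon` ignores sequence variation within a population, whereas `population_pi` ignores how many populations coexist, which is why the text treats them as separate levels of diversity.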
These data have provided global virus–host infection maps25 and, for cyanobacteria, have advanced our understanding of how viral infections are associated with the diel dynamics of host communities37–39. Specifically, a cross-omics analysis leveraging Tara Oceans viral genome data revealed a peak of cyanophage gene expression in the afternoon or at dusk, followed by an increase in virion-associated genomes at night, confirming that cyanophages drive the diel release of cyanobacteria-derived organic matter into the environment39. The data have also revealed key virus-encoded auxiliary metabolic genes that indicate extensive metabolic reprogramming of the hosts and are likely to directly modulate biogeochemical cycles25,29,40. These genes range from photosynthesis genes, which were identified in cyanophage isolate genomes more than a decade ago41,42, to genes that manipulate central carbon, sulfur and nitrogen metabolism25,43. From an ecosystem perspective, these findings challenge some paradigms in phage and ocean microbiology. For example, a multi-omics mechanistic study of a cultured representative of the third-most-abundant ocean viral genus (Bacteroidetes phage phi38:1) revealed that these viruses infect two bacterial strains with identical 16S ribosomal RNA gene sequences in completely different ways, owing to diverse mismatches in metabolic machinery and ameliorated cellular defences. These data suggest that phage resistance in nature does not arise from simple, single-step mutational events but rather from a more complex, multistep process44. A solid understanding of the ecosystem outputs of cells infected by viruses (‘virocells’) remains elusive.
To address this knowledge gap, an experimental model system using viruses that infect Pseudoalteromonas, the second-most highly predictive bacterial genus for ocean carbon flux45, was investigated by a multi-omics mechanistic approach, which revealed that virocells differ drastically from uninfected cells and that virocells infected with one phage are completely different from those infected with another phage46. Additionally, the vast Tara Oceans organismal data set coupled with global measurements of ocean carbon export provided an opportunity to determine which organisms best predicted this crucial ecosystem function. For decades, the paradigm in viral ecology has been that viruses ‘keep carbon small’, as lysis products are rapidly remineralized47. However, early observations that viruses seemed to sink, as inferred from photosynthesis gene sequences at various depths48, and later gene-to-ecosystem modelling predictions45 suggested that bacterial and archaeal viruses are key players in the biological carbon pump, at least in the predominantly open-ocean waters that were sampled. Specifically, the abundances of viruses predicted global ocean carbon flux better than did the abundances of bacteria and archaea or of eukaryotes, and a handful of the most predictive viruses were identified to guide future work45. Although improved modelling techniques are needed to better capture the magnitude of variation in carbon flux and to compare the relative impact of all organismal groups simultaneously, these gene-to-ecosystem modelling predictions suggest that viruses are important to an ecosystem process that was traditionally viewed as dominated by large protists and metazoans. Furthermore, Tara Oceans deep-sequencing data provide unprecedented opportunities to unveil the global eukaryotic virosphere49–52.
Although eukaryotic viruses are less abundant than bacterial and archaeal viruses, gene marker-based approaches have revealed that eukaryotic viruses are at least as abundant as archaea in the epipelagic layer of the ocean50. Both metaG and metaT data suggested that nucleocytoplasmic giant DNA viruses are ubiquitous and transcriptionally active across oceans49. Reminiscent of the phage-to-host ratio, these giant DNA viruses outnumber their potential hosts by an order of magnitude50 and show dispersal at a planetary scale53. Moreover, the taxonomic richness of these giant viruses was shown to be potentially greater than that of bacteria and archaea54. The study of the associated genomes has also improved our understanding of the impact of eukaryotic viruses on the evolution of their hosts’ sexual life cycles. For example, the loss of the genomic capacity to carry out a sexual life cycle in the cosmopolitan phytoplanktonic organism Emiliania huxleyi in the oligotrophic ocean was found to be associated with decreased biotic pressure resulting from a low prevalence of infection by large viruses55. Environmental factors, such as phosphate availability, were further found to drive giant virus community structure in some regions of the ocean56–59. Furthermore, some eukaryotic viruses are predicted to influence the efficiency of the vertical carbon flux60. Overall, these studies led to the development of community resources, including iVirus, which provides access to viromic tools and data sets61, and a knowledge base of virus–host interactions62, to facilitate further eco-genomic exploration of marine viruses. Of note, single-stranded DNA viruses and RNA viruses from the Tara Oceans samples and data sets have yet to be analysed. Although single-stranded DNA viruses are not thought to be abundant in marine systems25, RNA viruses are suggested to constitute as much as half of the viral particles in the oceans63.

Box 2 | Outreach and societal impact
The mission of the Tara Ocean Foundation includes outreach that combines innovative science with diverse activities to communicate the project goals and findings to the public, company managers, policymakers and schoolchildren. During the expeditions from 2009 to 2013, Tara completed 52 stopovers in 37 countries. In each of the ports of call, and in Paris, where Tara was docked during the 2015 United Nations Climate Change Conference, the crew and scientists welcomed several thousand visitors on board. More than 50 exhibits explained the goals of the project and demonstrated the key role that tiny organisms drifting in the oceans play in the global ecology of our planet. Tours aboard Tara and the numerous talks given by members of the team in every country inspired the visitors, who included local decision makers (mayors, ministers, heads of state and the United Nations Secretary General) in addition to many schoolchildren. The story of scientists, sailors and inspired artists criss-crossing the planet on a schooner to explore the ocean using state-of-the-art technologies also attracted non-scientists and the media around the world. Photographs and videos (for example, plankton chronicles and artist profiles) received worldwide attention, and journalists wrote articles about the expedition from many different angles, especially following the publication of major scientific results in 2015 (Ref. 149). In addition, the Tara Ocean Foundation published three journals in French, English, Chinese, Japanese and Portuguese, and the expedition was popularized through documentaries on television (for example, 35 prime-time Thalassa shows on French television in 2009 and 2010) as well as several DVDs and books.

Bacteria and archaea.
The Global Ocean Sampling Expedition pioneered the exploration of the genomic diversity of ocean bacteria and archaea on the basis of environmental DNA sequencing8,10. Analysis of surface water samples collected at 41 locations from the eastern North American coast through the Gulf of Mexico and into the equatorial Pacific revealed approximately six million new protein sequences, almost doubling the number available in public databases in 2005 (Ref. 10). Despite this rapid expansion of known ocean microbial protein sequences, the diversity of protein-coding genes in nature was too large to be captured with the sequencing technology available at the time: new protein families were discovered at nearly linear rates with additional sequencing10. With the advent of HTS technologies and the drastic reduction in sequencing costs since 2008, Tara Oceans generated unprecedented amounts of environmental sequencing data for each sample, with the goal of obtaining an ecosystem-wide overview of the diversity, function, biogeography and activity of the global ocean microbiome64,65. For a set of 243 samples enriched in planktonic organisms smaller than 3 μm, more than 7.2 terabases of metaG sequencing data were assembled and used to build an ocean microbial reference gene catalogue (Fig. 3a), following a method originally developed for human microbiome research66–68. This first ocean gene catalogue comprised more than 40 million non-redundant protein-coding sequences, four times more than reported for the human gut microbiome at the time65, although approximately two thirds of the gene abundances could be attributed to core gene families shared between the two biomes. Moreover, 80% of these sequences were previously unknown, that is, they had no match at more than 95% nucleotide sequence identity in reference databases.
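A non-redundant gene catalogue of this kind is built by clustering predicted genes and keeping one representative per cluster, and novelty is judged by the absence of a close database match. The sketch below is a toy illustration under stated assumptions: identity is approximated as one minus the Levenshtein distance divided by the longer sequence length, clustering is greedy and longest-first, and the 95% threshold mirrors the text. Production catalogues rely on dedicated clustering and alignment tools, and all function names here are hypothetical.

```python
from typing import Dict, List


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance with unit costs (single-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # match or substitution
        prev = curr
    return prev[-1]


def identity(a: str, b: str) -> float:
    """Crude global identity: 1 - edit distance / length of the longer sequence."""
    if not a and not b:
        return 1.0
    return 1.0 - edit_distance(a, b) / max(len(a), len(b))


def build_catalogue(genes: Dict[str, str], min_identity: float = 0.95) -> Dict[str, List[str]]:
    """Greedy, longest-first dereplication: map each representative to its cluster members."""
    reps: Dict[str, List[str]] = {}
    for gid in sorted(genes, key=lambda g: (-len(genes[g]), g)):
        for rep in reps:
            if identity(genes[gid], genes[rep]) >= min_identity:
                reps[rep].append(gid)
                break
        else:
            reps[gid] = [gid]
    return reps


def novel_fraction(catalogue_seqs: Dict[str, str], reference: List[str],
                   min_identity: float = 0.95) -> float:
    """Fraction of catalogue entries with no reference sequence at >= min_identity."""
    novel = sum(1 for seq in catalogue_seqs.values()
                if all(identity(seq, ref) < min_identity for ref in reference))
    return novel / len(catalogue_seqs)


if __name__ == "__main__":
    genes = {"g1": "A" * 40, "g2": "A" * 39 + "C", "g3": "C" * 30}  # toy sequences
    catalogue = build_catalogue(genes)
    print(catalogue)  # {'g1': ['g1', 'g2'], 'g3': ['g3']}
    reps = {rep: genes[rep] for rep in catalogue}
    print(novel_fraction(reps, reference=["A" * 40]))  # 0.5
```

The quadratic all-against-representatives comparison is only workable for toy inputs; catalogues of tens of millions of genes need indexed, heuristic search instead.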
The rate of detecting new genes from an estimated 35,000 bacterial and archaeal operational taxonomic units (OTUs; clusters at 97% sequence identity) decreased to 0.01% by the end of sampling, suggesting near saturation of gene diversity in these samples. These newly established resources thus substantially expanded our knowledge of the ocean microbial gene repertoire and made it possible to analyse the taxonomic and gene functional composition of ocean microbial communities on a global scale65.

[Figure 2 image: A, sampling from the surface to 1,000-m depth across the epipelagic and mesopelagic zones with plankton nets (Aa), a high-volume pump (Ab) and a rosette sampler (Ac), together with measurements of pigments, carbonate system, nutrients, DOC, CDOM, CTD, oxygen, chlorophyll, particle backscattering, photosynthetic efficiency and PAR; B, the 12 Tara Oceans plankton size fractions, from viruses (<0.2 µm) to megaplankton; C, high-throughput sequencing (metaB, metaG, metaT and single-cell or single-organism genomics); D, high-throughput imaging (flow cytometry, eHCFM, FlowCam, ZooScan and UVP).]

Mixotrophy Capacity to incorporate carbon into biomass from either inorganic or organic sources.

Photoheterotrophy Capacity to derive energy from light and carbon from organic matter.

Haptophytes A group of single-celled photosynthetic planktonic organisms.

Metagenome-assembled genomes (MAGs) Consensus genome sequences that are reconstructed using sequencing reads of DNA extracted from whole microbial communities.

Fig. 2 | Tara Oceans assessed plankton across taxonomic, organismal and environmental scales to study the whole ecosystem. A | At each open-ocean station (Fig. 1), Tara sampled plankton during daytime and night-time, guided by satellite data. Sampling used five types of plankton nets with different mesh sizes (part Aa), an industrial, high-volume peristaltic pump (part Ab) and a rosette water sampling system equipped with Niskin bottles (part Ac), from sunlit waters (surface and subsurface, including the deep chlorophyll maximum) to dark (mesopelagic) waters down to 1,000-m depth. Key physicochemical parameters of the sampled water were measured in situ or later in the laboratory (Supplementary Table 1). B | In total, the Tara Oceans sampling protocol targeted 12 organismal size fractions, although not every fraction at every sampling site. These fractions run from picoplankton to megaplankton and span more than seven orders of magnitude in size, a ratio comparable to that between the size of a bee and ten times the height of Mount Everest. C | The Tara Oceans high-throughput sequencing workflow generated multi-omics data sets (Supplementary Box 1; Supplementary Table 2) for assessing the diversity and relative abundance of genomes, genes and taxonomic barcodes across the kingdoms of life. D | The high-throughput imaging methods applied by Tara Oceans imaged plankton from different size fractions (Supplementary Table 3) to quantify organismal richness, sizes, biovolumes and morphological complexities. Owing to field work conditions and prioritization of specific analyses, it was not possible to collect every sample at each station or to subject every sample to all possible types of analyses. However, each sample is cross-referenced to a rich set of metadata, allowing researchers to ensure the comparability of different samples and data types. All physicochemical, sequencing and imaging data obtained are archived following FAIR (findable, accessible, interoperable and reusable) principles to facilitate integrative analyses (Fig. 6). CDOM, coloured dissolved organic matter; CTD, conductivity, temperature and depth; DOC, dissolved organic carbon; eHCFM, environmental high-content fluorescence microscopy; metaB, metabarcoding; metaG, metagenomics; metaT, metatranscriptomics; PAR, photosynthetically active radiation; UVP, underwater vision profiler. Parts B and D adapted from Ref. 7, CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/).

In agreement with prior studies69,70, microbial communities sampled worldwide at 68 locations from epipelagic and mesopelagic waters were primarily structured by depth. In addition, taxonomic and gene functional diversity was higher in mesopelagic than in epipelagic layers, whereas viruses showed the opposite pattern for the same latitudinal range24 (Fig. 3b). Beyond depth, previous studies suggested temperature and other factors, such as salinity and nutrients, as important drivers of the taxonomic composition of ocean microbial communities71. The global sampling by Tara Oceans made it possible to disentangle geographic effects from environmental effects. Similarity in community composition between sampling locations may reflect their geographic proximity rather than their environmental similarity. Separating the two made it possible to pinpoint temperature as a key variable for predicting taxonomic and gene functional composition in epipelagic waters of the open ocean65. In addition, the availability of metaT data from 126 sampling sites, including the Arctic Ocean (Fig. 1a), made it possible to address how microbial communities adjust to global environmental variation. Such adjustments seem to differ not only between individual metabolic processes but also between oceanic regions64. These conclusions were reached by integrating metaT and metaG data and by developing new bioinformatics resources and normalization procedures. Specifically, a new ocean microbial gene catalogue with 47 million sequence entries was generated to facilitate the integration of metaT and metaG data. This catalogue supported gene-level quantitative analyses of community transcript abundance, gene abundance and gene expression levels. Normalization procedures based on single-copy marker genes64,72 made it possible to distinguish organismal turnover from gene expression changes as underlying mechanisms for changes in the pool of community transcripts. These analyses suggested that polar communities are more specifically adapted to their niches. In response to rising seawater temperatures, they may undergo stronger changes in organismal composition than their physiologically more variable counterparts in temperate and tropical waters64 (Fig. 3b). More focused analyses of Prochlorococcus and Synechococcus, the two most abundant and widespread phototrophic bacterial genera in the ocean, have led to a better understanding of their genetic capacity for mixotrophy as well as of the factors that control their biogeographic distribution73–75. In addition, single-cell genome sequencing revealed a novel species in the genus Kordia (phylum Bacteroidetes) with the potential capacity for photoheterotrophy76,77. Combined HTS and morphological analyses also helped to identify the functional and ecological importance of a symbiosis between nitrogen-fixing cyanobacteria and haptophytes in epipelagic waters worldwide78–80. Finally, the resources and data provided by Tara Oceans have also enabled research by investigators outside the consortium. For example, following the first release of metaG data65, thousands of draft genomes of ocean bacteria and archaea, so-called metagenome-assembled genomes (MAGs), were reconstructed81–84.
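The marker-gene normalization mentioned above separates organismal turnover from gene expression changes, and a minimal sketch can illustrate it. The sketch assumes, hypothetically, that metaG and metaT reads have already been mapped to a shared gene catalogue and counted. It relies on a simple identity: transcripts per cell = gene copies per cell × expression. The function names, and the rule in `dominant_mechanism` that compares log2 changes, are illustrative choices and not the published procedure64,72.

```python
import math
from statistics import median


def per_cell(abundances, marker_ids):
    """Scale read counts by the median count of single-copy marker genes.

    Each marker occurs once per genome, so this turns raw metaG counts into
    approximate gene copies per cell and raw metaT counts into approximate
    transcripts per cell.
    """
    scale = median(abundances[m] for m in marker_ids)
    return {gene: count / scale for gene, count in abundances.items()}


def profile(metag, metat, marker_ids):
    """Return gene -> (log2 copies/cell, log2 transcripts/cell, log2 expression)."""
    copies = per_cell(metag, marker_ids)
    transcripts = per_cell(metat, marker_ids)
    result = {}
    for gene, c in copies.items():
        t = transcripts.get(gene, 0.0)
        if gene in marker_ids or c == 0 or t == 0:
            continue  # markers are the reference; zero counts have no log
        # transcripts/cell = copies/cell * expression, so the logs add up
        result[gene] = (math.log2(c), math.log2(t), math.log2(t / c))
    return result


def dominant_mechanism(profile_a, profile_b, gene):
    """Attribute a change in transcripts per cell between two samples either to
    organismal turnover (change in gene copies per cell) or to expression."""
    d_copies = profile_b[gene][0] - profile_a[gene][0]
    d_expr = profile_b[gene][2] - profile_a[gene][2]
    return "turnover" if abs(d_copies) > abs(d_expr) else "expression"
```

Suppose a gene's transcripts per cell rise between two samples while its copies per cell stay constant; the sketch attributes the change to expression. If copies per cell rise in step with transcripts, it attributes the change to community turnover.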
Some MAGs shed new light on the metabolic capabilities of certain bacterial lineages81. Specifically, a number of bacteria in the phylum Planctomycetes, a group that had not previously been linked to nitrogen fixation, were found to contain and express64 the genes required for this process. Other MAGs extended our view of the phylogenetic diversity of Alphaproteobacteria and challenged the long-standing hypothesis about the origin of mitochondria. The authors suggested that mitochondria may not have evolved from the alphaproteobacterial order Rickettsiales. Instead, they may have evolved from a lineage that branched off before the divergence of all Alphaproteobacteria sampled to date82. Other studies discovered new families of light-sensing rhodopsins85, including a new class of anion-conducting channel rhodopsin86. Furthermore, Tara Oceans data were used in other ways, for example to define metabolic functional groups and decouple their global variation from taxonomic community composition87, and to estimate the contribution of marine organisms to the total biomass on Earth88. These examples show that one of the initial goals of the project has been accomplished: the broad scientific community now widely uses Tara Oceans data to make new discoveries and pursue diverse research questions.

Protists. Protists span a wide range of organismal sizes, from less than 1 μm to a few millimetres, and can have extremely large genomes and complex morphologies, physiologies and behaviours89 (Fig. 4).
In 2009, knowledge of planktonic protists was based mainly on three sources: DNA metaB and flow-cytometry surveys for the smallest taxa, microscopic observations for the larger ones (larger than 20 μm), and genomic characterization of a handful of model organisms. These model organisms were largely phototrophic species (phytoplankton), including alveolates, diatoms, coccolithophores and prasinophytes.

[Figure 3 image: a, resource generation (sample collection, serial filtration at 1.6 or 3 µm and 0.22 µm, DNA extraction, sequencing, assembly and gene prediction), with the numbers of viral populations (×10³) and of bacterial and archaeal gene sequences (×10⁶) plotted against the number of samples; b, data analysis of metaG and metaT quantitative profiles against environmental parameters, showing diversity patterns across depth and latitude, five viral ecological zones, and metatranscriptome variation dominated either by community turnover or by gene expression changes.]

Fig. 3 | Tara Oceans viral, bacterial and archaeal analysis pipeline and highlighted discoveries. Sequencing data came from environmental DNA enriched for viruses, bacteria and archaea, and from environmental RNA enriched for bacteria and archaea only. Their analysis involved establishing reference resources that were subsequently used to study diversity and biogeographic patterns. a | To generate reference resources, seawater samples were sequentially filtered to separate plankton into several size fractions. For bacteria- and archaea-enriched samples, seawater was filtered through 1.6- or 3-μm filters, and cells were collected on 0.22-μm filters. Viruses were flocculated from the 0.22-μm filtrates using ferric chloride and collected on 1-μm filters11. After DNA extraction, library preparation and sequencing, DNA sequencing reads were assembled into contigs. For viruses, contigs were screened for sequences of viral origin and then grouped into individual viral populations24,25,149. For bacteria and archaea, genes were predicted on contigs and clustered to yield catalogues of non-redundant gene sequences64,65. b | These de novo-generated resources were used as reference databases to quantify viruses, genes and microbial species per sample (S). Bacterial and archaeal quantifications were derived as abundances of operational taxonomic units (OTUs), based on 16S ribosomal RNA gene fragments identified directly in metagenomic (metaG) sequencing reads150. Quantitative profiles were integrated with environmental parameters (Supplementary Table 1) to study diversity gradients across latitude and depth. Viruses showed partly contrasting patterns compared with bacteria and archaea24,64. Biogeographic analyses revealed five ecological zones of viral populations24. Combining metaG and metatranscriptomic (metaT) data identified differences in the mechanisms driving community transcriptomic compositions64. Part b, bottom right, adapted from Ref. 64 and bottom left from Ref. 24, Elsevier.

Eocene epoch Second geological epoch of the Palaeogene period (66 million to 23 million years ago), which began 56 million years ago and ended 34 million years ago.

Southern Ocean gateways Pathways of the oceanic circulation that are influenced by the displacement of continents (for example, the Drake Passage, the South African gateway and the Tasman gateway, lying between Antarctica and South America, Africa and Australia, respectively).
In oceanography, protists are traditionally divided on the basis of their broad ecological function into phytoplankton, heterotrophic nanoflagellates and larger predators. To expand this incomplete knowledge base, Tara Oceans developed an automated high-resolution 3D imaging workflow for quantitative subcellular exploration of microeukaryotes90 and generated more than 220 billion DNA sequencing reads from about 2,200 samples (Supplementary Table 1). Furthermore, 6.8 million images were obtained from ~9,000 eukaryote samples (Figs 4b,d and 5). These samples cover organismal size fractions from 0.8 µm to a few centimetres, that is, from picoplankton and nanoplankton to macroplankton and megaplankton (Supplementary Table 2). This data set of protist biocomplexity was complemented by new reference resources of taxonomically curated DNA barcodes91,92, transcriptomes49 and single-cell genomes93,94. The first large-scale metaB survey based on a fragment of the 18S ribosomal RNA gene95 revealed about 150,000 eukaryotic taxa (genus or higher taxonomic levels) in the epipelagic ocean, only ~10% of which were known previously. More than 85% of these taxa represent uncharacterized protists of mostly heterotrophic groups96, including many parasites and symbionts78,79,97–99 (Fig. 4c). They sit alongside the traditional members of the plankton community, such as diatoms100, dinoflagellates101, prasinophytes102 and ciliates103. The Tara Oceans metaB survey has become a reference baseline for the community to assess global upper-ocean diversity and the biogeography of specific taxa or functional groups104–106. It also serves as a test data set for new bioinformatic tools107,108.
Targeted analysis of the major eukaryotic phytoplankton groups (diatoms, dinoflagellates, haptophytes, pelagophytes and chlorophytes) has clarified their relative abundances. These were compared with each other, with mixotrophs and known photosymbionts, and across the different organismal size fractions collected by Tara Oceans109. Clade-specific analyses across plankton size fractions further revealed the importance of nano-sized and pico-sized diatoms that had previously been overlooked in ocean surveys110. These minute diatom species, including Minidiscus spp. and Minutocellus spp., were found to be globally distributed. Data from the DeWeX cruise in the Mediterranean Sea revealed that these organisms can generate massive blooms and can also be found at depth, implying a substantial contribution to carbon export111. In addition, metaB data were combined with palaeoenvironmental data and phylogenetic models of diversification to analyse the evolutionary diversification of the entire diatom group. Early diatom diversification was negatively correlated with carbon dioxide partial pressure. This is consistent with increased primary productivity (that is, conversion of inorganic carbon into organic carbon) favouring increased diversity. Subsequently, in the late Eocene epoch, a major burst of diversification occurred at around the same time as the Southern Ocean gateways opened, creating a new ecosystem in which diatoms could thrive. These molecular data are consistent with previous reports based on analysis of diatom microfossils112. Diversification was affected by changes in sea level, an influx of silica and competition with other planktonic groups, and these factors affected different diatom clades differently. This heterogeneity in diversification dynamics across diatoms suggests that a changing climate will favour some clades at the expense of others94.
Furthermore, the deep coverage of the Tara Oceans metaB data sets (typically one million to two million sequence reads per sample) has made it possible to explore the rare protist biosphere92. Briefly, an adaptive algorithm was used to explore the abundance distributions of non-dominant OTUs across plankton communities. These rare OTUs constituted more than 99% of the local richness in each sample, and their relative abundances followed a power law. Species composition showed an apparently very high spatial turnover between sites in the ocean. Even so, the power-law exponent varied by less than 10% across locations and showed no biogeographic signature. Such striking regularity suggests that the assembly of protist communities is governed by large-scale, ubiquitous processes, despite the highly dynamic and variable environment. The underlying drivers of this relationship are unknown. However, the power-law exponent is close to 3/2, which resembles the temporal spectra of intermittently varying ecosystems113,114 and suggests that local abundances are influenced by spatiotemporal variability. Understanding the origin and impact of this apparently universal abundance signature of non-dominant protists is important for evaluating the resilience of marine biodiversity in a changing ocean. Of note, the global eukaryotic metaB survey has revealed a realm of unknown diversity and functions among heterotrophic and symbiotic (sensu lato) protists (Fig. 4c). For example, planktonic diplonemids may well be the most diverse group of planktonic eukaryotes in the ocean, with most of their abundance and diversity found in deeper waters96. The underlying causes of their hyperdiversification and the roles of these different lineages in the ecosystem remain unknown. Specific trophic interactions, such as bacterivory or parasitism, appear most probable96,115.
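The power-law exponent discussed above can be estimated in several ways. The sketch below uses the standard continuous maximum-likelihood estimator, alpha = 1 + n / sum(ln(x_i / x_min)), and not the adaptive algorithm of the cited study92. It assumes the fitted form p(x) ∝ x^(-alpha) and that x_min, the lower bound of the fitted range, is already known. It also treats read counts as continuous, which is only an approximation for discrete OTU counts. The function names are hypothetical. The synthetic data are drawn with alpha = 3/2 purely for illustration.

```python
import math
import random


def powerlaw_alpha(abundances, x_min):
    """Continuous maximum-likelihood estimate of a power-law exponent.

    Fits p(x) ~ x**(-alpha) for x >= x_min using
    alpha = 1 + n / sum(ln(x / x_min)); values below x_min are ignored.
    """
    tail = [x for x in abundances if x >= x_min]
    if len(tail) < 2:
        raise ValueError("need at least two values >= x_min")
    log_sum = sum(math.log(x / x_min) for x in tail)
    if log_sum == 0:
        raise ValueError("all values equal x_min; exponent undefined")
    return 1.0 + len(tail) / log_sum


def relative_spread(values):
    """(max - min) / mean: a crude measure of how much exponents vary."""
    mean = sum(values) / len(values)
    return (max(values) - min(values)) / mean


def sample_powerlaw(n, alpha, x_min, rng):
    """Draw n values from a continuous power law by inverse-transform sampling."""
    # 1 - rng.random() lies in (0, 1], so the power is always finite
    return [x_min * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) for _ in range(n)]
```

Fit the exponent independently for several simulated "sites" drawn from the same alpha. The estimates cluster tightly and `relative_spread` stays well under 10%. That is the sort of regularity described for the rare protist biosphere, although real data would also require choosing x_min and testing goodness of fit.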
Barcodes were sequenced from individual protists isolated from ethanol-preserved plankton samples (Supplementary Box 3). This revealed widespread symbiotic associations, such as those between the coral-associated dinoflagellate Symbiodinium and the calcified ciliate Tiarina116 (Fig. 5A) and between the chain-forming pennate diatom Fragilariopsis doliolus and tintinnid ciliates99 (Fig. 5B). Imaging of fragile plankton from the surface to 1,000-m depth highlighted the abundance of giant photosymbiotic rhizarian protists (order Collodaria; Fig. 5G). These protists had been detected by metabarcoding in mesoplankton size fractions95, and imaging showed that their biomass exceeds that of all zooplankton in (sub)tropical oceans117.

Bacterivory Feeding on bacteria, by which organisms obtain carbon and energy primarily from the consumption of bacteria.

[Figure 4 image: a, eukaryotic plankton from pico- or nanoplankton (0.8 µm) to macro- or megaplankton (>20 cm), including parasites, symbionts, phototrophs, heterotrophs and mixotrophs, haptophytes (Phaeocystis), giant mixotrophs, dinoflagellates, Radiolaria, copepods, armoured swimmers, colonial phototrophs and gelatinous predators such as jellyfish; b, the HTS (metaB, metaT, metaG, SAGs) and HTI (eHCFM, IFCB, FlowCam, ZooScan, UVP) methods applied across organismal sizes; c, network of symbiosis and predation among protists and metazoans; d, abundance plotted against diameter (µm) for phototrophic and heterotrophic protists, photohosts, endophotosymbionts, parasitic protists and metazoans, measured by flow cytometry, IFCB, FlowCam, ZooScan and UVP.]

Miocene epoch First geological epoch of the Neogene period (23 million to 2.6 million years ago), extending from about 23 million to 5 million years ago.

Single-cell genomics, metaG and metaT analyses of Tara Oceans eukaryote-enriched samples confirmed the hyperdiversification of heterotrophic and symbiotrophic protists and suggested potential mechanisms underlying these processes.
A single-cell genomics survey revealed hidden functional complexity and niche differentiation in unculturable heterotrophic protists, partially explaining their unforeseen diversity93,94,118. A direct comparison with bacterial and archaeal gene diversity is difficult18. Nevertheless, global eukaryotic metaT data from 441 communities yielded an extreme richness of more than 116 million eukaryotic transcripts (including metazoan transcripts) without apparent saturation49. Many unknown genes were detected, and the biogeography of their expression revealed a potential link to niche adaptation. On the basis of these findings, protists have arguably emerged as the group of organisms that drives today's plankton complexity (Fig. 4a). An initiative to build a universal taxonomic framework for eukaryotes (UniEuk) has been launched to unify the analyses of the emerging complexity of protist data under a single ontology. The Tara Oceans metaT data have yielded the largest available gene collection for eukaryotes49.

Zooplankton. Zooplankton have a central role in the ocean, transferring energy, nutrients and biomass from lower to higher trophic levels5,119. Biodiversity patterns in planktonic metazoans are far less understood than those of their terrestrial counterparts. In Tara Oceans, five different types of nets (Supplementary Table 3) were used to collect nearly 1,500 standardized zooplankton samples at depths from the surface to a few hundred metres. Imaging and HTS were then used (Fig. 4b) to assess the morphogenetic complexity of zooplankton communities in well-defined oceanographic provinces.

Fig. 4 | Tara Oceans analysis of eukaryotic plankton complexity and highlighted discoveries. a | This illustration shows the biological and functional complexity of eukaryotes across the plankton organismal size fractions analysed in Tara Oceans.
Tiny phytoplanktonic organisms (for example, Phaeocystis) can assemble into visible colonies. Heterotrophic protists in association with phytoplanktonic organisms (for example, Collodaria) can form giant holobionts, which outweigh all animals in (sub)tropical sunlit oceans117. Conversely, animals produce gametes, juvenile stages and debris, which might be important components of microbial plankton size fractions. Overall, eukaryotes show diverse and complex interactions and behaviours along the symbiosis and predation axes, reflecting their immense and non-saturating gene repertoire49. Note that the viral, bacterial and archaeal diversity associated with eukaryotes is not represented in this scheme. b | Tara Oceans developed and/or deployed different molecular and imaging methods to explore and assess unicellular and multicellular eukaryotes across their ontogenetic, life-cycle and symbiotic complexity. c | This schematic network synthesizes the relative importance of the main eukaryotic taxonomic and functional groups36 and their interactions (symbiosis sensu lato in green and predation in red). Metabarcoding (metaB) data highlighted the dominant diversity of heterotrophic and parasitic protists95 and their central role in shaping the global plankton interactome98. d | The suite of Tara Oceans automated imaging devices makes it possible to quantify the abundance of organisms ranging from 0.8 µm to several centimetres in size. The devices shown are flow cytometry, the Imaging FlowCytobot (IFCB), FlowCam, ZooScan and the underwater vision profiler (UVP). The resulting size spectra can then be used to estimate how biomass is distributed across plankton size classes or functional groups. eHCFM, environmental high-content fluorescence microscopy; HTI, high-throughput imaging; HTS, high-throughput sequencing; metaG, metagenomic; metaT, metatranscriptomic; SAG, single amplified genome. Part d adapted from Ref. 132, CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/).
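As the caption notes, abundance-by-size data from imaging can be converted into estimates of how biomass is distributed across size classes. The minimal sketch below assumes that every imaged organism is approximated by a sphere of its equivalent spherical diameter, and that the volume of water represented by the images is known. It sums biovolume (π d³ / 6) in octave (log2) size classes. Converting biovolume to carbon would need group-specific conversion factors, which are not shown. The function name and inputs are hypothetical.

```python
import math
from collections import defaultdict


def biovolume_spectrum(diameters_um, sampled_volume_l):
    """Sum spherical biovolume per octave (log2) size class.

    diameters_um: equivalent spherical diameters (µm) of imaged organisms.
    sampled_volume_l: volume of water the images represent (litres).
    Returns {lower bin edge in µm: biovolume in µm^3 per litre}.
    """
    spectrum = defaultdict(float)
    for d in diameters_um:
        if d <= 0:
            continue  # skip invalid or missing measurements
        lower = 2.0 ** math.floor(math.log2(d))    # bin [2^k, 2^(k+1))
        spectrum[lower] += math.pi * d ** 3 / 6.0  # volume of a sphere of diameter d
    return {edge: v / sampled_volume_l for edge, v in sorted(spectrum.items())}
```

Octave bins keep the number of classes manageable across the more than five orders of magnitude in size covered by the imaging devices. Because volume scales with d³, a few large organisms can dominate a bin.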
In terms of imaging, all zooplankton samples have been processed, and experts have validated most of them using EcoTaxa (Supplementary Table 2). Several Tara Oceans imaging data sets were used to assess the mechanisms that limit the dispersal of Indian Ocean plankton populations into the Atlantic52. Imaging data have also demonstrated substantially reduced abundances of metazoan plankton in the Indian Ocean oxygen minimum zone, and a positive effect of this reduction on carbon flux120. Analyses were also performed on targeted zooplankton taxa. For example, combined morphological and phylogenetic data revealed that sea snails (clade Thecosomata) diversified through four major morphogenetic events. These events coincide with climate events from the Eocene epoch to the Miocene epoch, and this evolutionary scenario may have been driven by skeleton selection to avoid predation or to increase buoyancy121. Additionally, a comprehensive phylogenetic study of chaetognaths was based on complete ribosomal DNA genes amplified from preserved specimens. It showed that their evolution corresponds to simplification of a pre-existing body plan rather than to an increase in morphological complexity122. Finally, metaG data have been used to study the population structure of the abundant copepod Oithona in the Mediterranean Sea123. This work provided evidence for genes under selection in specific contexts and allowed the creation of a collection of single-nucleotide polymorphisms in a reference-independent manner124. The current data pave the way for studies of diversity and expression at the gene level for the main groups of zooplankton. However, these studies merely scratch the surface of the morphogenetic information contained in the Tara Oceans collection of zooplankton samples and data.
Future efforts could generate more metaB data and sequence the genomes of the major organisms, use metaG and metaT information to detect genes under active selection, and correlate genetic data with imaging information. Such efforts will undoubtedly advance our knowledge of these important planktonic organisms and their role in the ocean ecosystem.

Integrative ocean ecosystems biology

A unique feature of the growing Tara Oceans data set is its relatively uniform and deep coverage across spatial and taxonomic scales, encompassing the variability of the global plankton ecosystem from the surface to mesopelagic depths (Figs 1,2). This scope, and the large data sets derived from it, facilitate data-driven analyses that extract information within a comprehensive eco-evolutionary framework. Tara Oceans thus attempted to integrate the different layers of ecosystem organization, from genes to organismal populations, across environmental and spatial variation and beyond analyses of plankton within specific size fractions (Fig. 6). To decipher the structure of the plankton metacommunity, a global plankton co-occurrence network was drafted that included both biotic and abiotic information98. The results showed that biotic co-occurrences, most of them positive, predominated over environmental influences on community structure. Furthermore, this network revealed the prevalence of parasitic and photosymbiotic protists95,117 as keystone taxa that increase the connectivity of plankton food webs (Fig. 4c). Together, plankton metabarcoding95, 3D fluorescence microscopy90,99,116 and underwater imaging117 confirmed that putative symbioses derived from global co-occurrences, including parasitism and mutualism, are ubiquitous in plankton ecology (Fig. 5). These symbioses may underlie the hyperdiversification of protists95 through evolutionary mechanisms that remain elusive. Overall, this notion challenges traditional ocean ecology, which focuses on negative relationships between producers and consumers. It also challenges most models of nutrient and energy flow in the ocean, which typically ignore symbioses.

[Figure 5 image: panels Aa–Ac, Ba–Bc, Ca–Cb, Da–Db, E, Fa–Fb and Ga–Gb.]

Fig. 5 | Eukaryotic shapes and symbioses explored by Tara Oceans plankton imaging. All eukaryotes host a cohort of more or less specific or beneficial viral, bacterial, archaeal or eukaryotic symbionts. The staining strategy developed for automated confocal microscopy of aquatic microbial eukaryotes (environmental high-content fluorescence microscopy)90 revealed symbiotic interactions in marine protists. A | Photosymbiosis between the calcareous ciliate Tiarina sp. and Symbiodinium dinoflagellates. Confocal laser scanning microscopy (CLSM; panels Aa,Ab) and scanning electron microscopy (panel Ac) reconstructions of the ciliate host are shown. False colours show nuclei of the ciliate in cyan, nuclei of the symbiotic microalgae in blue and Symbiodinium chloroplasts in red (scale bars 20 μm). B | Diatoms (Fragilariopsis sp. cells assembled in a chain) form a symbiotic relationship with a heterotrophic tintinnid ciliate (Salpingella sp., shell with trumpet-shaped oral opening) (scale bars 10 μm). C | Intracellular cyanobacterial symbionts (Richelia sp.) within the pennate diatom Rhizosolenia (panel Ca; scale bar 20 μm). Two cyanobacterial trichomes are visible with their heterocysts (panel Cb; scale bar 10 μm). D | Association between the heterotrophic dinoflagellate Amphisolenia and unidentified cyanobacteria hosted inside the cell wall (arrowhead) (scale bar 30 μm). E | The diatom Corethron sp. harbours several epiphytic nanoflagellates living in small shells and attached to the diatom cell wall (scale bar 30 μm). F | Dinoflagellates from the genus Ornithocercus host extracellular cyanobacterial symbionts (OmCyn cyanobacteria) in their ‘symbiotic chamber’151 (dinoflagellate cell size between 80 and 100 μm). G | Giant colonial protists (Collodaria) harbour intracellular dinoflagellate symbionts (Brandtodinium sp.)152. The light stereoscope image (panel Ga) shows an entire colony (scale bar 1 mm). The CLSM image (panel Gb) within a colony shows collodarian cells (blue, 200 μm), endosymbiotic dinoflagellates (red, 20–30 μm) and a reticulate cytoplasmic network (green filaments). All images except those in panels Ac and Ga are CLSM reconstructions from single Tara Oceans cells. Panel A adapted from Ref. 116, Springer Nature Limited; panel B from Ref. 99, CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/); panels C and F from Ref. 153, Springer Nature Limited; and panels D and E from Ref. 90, CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/). Panel G provided by C.d.V.

Agulhas choke point Oceanic system south of Africa where warm and salty Indian Ocean waters leak into the South Atlantic Ocean, affecting the global oceanic circulation.

Tara Oceans data were also used to address basin-scale oceanographic questions, as in a study of the impact of the Agulhas choke point on plankton communities52. Connectivity between the Indian Ocean and the Atlantic Ocean influenced different planktonic groups in different ways, largely as a function of their size. Agulhas rings were important conduits for transporting plankton between the two oceans. In particular, the retroflection of the Agulhas Current was a key factor constraining diatom diversity, in line with previous palaeo-oceanographic studies based on diatom microfossils100,112. These findings highlight the need to investigate the relationship between plankton diversity and global ocean circulation.
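A global co-occurrence network of the kind described above links taxa, and environmental variables, whose abundances covary across samples. The sketch below is a deliberately simplified stand-in that uses a single association measure: Spearman rank correlation with an arbitrary |rho| threshold. The cited study98 combined several measures and also filtered out associations explained by environmental drivers, which is not reproduced here. The function names, the 0.7 threshold and the example profiles are hypothetical.

```python
from itertools import combinations


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def cooccurrence_edges(profiles, threshold=0.7):
    """Edges between variables whose |Spearman rho| >= threshold.

    profiles maps a name (taxon or environmental variable) to its values
    across the same ordered samples. Returns (a, b, rho) tuples; positive rho
    suggests co-presence, negative rho mutual exclusion.
    """
    edges = []
    for a, b in combinations(sorted(profiles), 2):
        rho = spearman(profiles[a], profiles[b])
        if abs(rho) >= threshold:
            edges.append((a, b, rho))
    return edges
```

Rank correlation is a common choice for compositional count data because it is insensitive to monotone transformations. A real analysis would still need significance testing, correction for multiple comparisons and compositional effects. None of these is handled here.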
Additionally, Tara Oceans data have been used in graph-based methods from systems biology to integrate the full suite of ecological, morphological and genetic information, so that biological complexity can be included in ocean modelling (Fig. 6). A study using network partitioning45 identified plankton subcommunities and gene modules associated with carbon export from the upper epipelagic zone to the ocean interior. It demonstrated that it is possible to scale up from genes to ecosystems and to derive insightful models of key ocean biogeochemical processes. That study found viruses to be the best predictors of the variability of carbon export in the oligotrophic ocean45. The same graph-based methods also showed that plankton subcommunities varied with the iron distributions predicted by two global-scale biogeochemical models125. Within these subcommunities, genomic adaptation based on gene-copy numbers was disentangled from transcriptomic adaptation based on gene expression for specific groups and functions. For example, many photosynthetic protists respond to iron limitation by shifting from a key gene coding for ferredoxin to an iron-independent analogue, flavodoxin. Rapidly responding groups, such as diatoms, are frequently adapted at the genomic level, harbouring variable numbers of each gene depending on optimal growth conditions. In contrast, other organisms, such as haptophytes or pelagophytes, rely on differential transcription to shift to the better analogue49,125. Such meta-omics analyses were used to explore the underpinnings of recurrent phytoplankton blooms in the Marquesas archipelago in the central equatorial Pacific Ocean. They revealed that an increase in iron bioavailability is likely to be the underlying cause of the blooms125. This example demonstrates that the field of ocean meta-omics is now sufficiently mature to provide an independent, biologically based validation of ecosystem models.
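The network-partitioning step described above finds subcommunities and tests whether they track carbon export; a simplified sketch follows. As an assumption, modules here are the connected components of a co-occurrence edge list. Each module is summarized by its summed abundance per sample, which is then correlated (Pearson) with a per-sample trait such as carbon export. The cited work45 used a more elaborate weighted network analysis. All names and values below are hypothetical.

```python
def connected_modules(nodes, edges):
    """Partition nodes into modules: connected components of (a, b) edges (union-find)."""
    parent = {n: n for n in nodes}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving keeps trees shallow
            n = parent[n]
        return n

    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a
    groups = {}
    for n in nodes:
        groups.setdefault(find(n), []).append(n)
    return sorted(sorted(members) for members in groups.values())


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def module_trait_correlations(abundance, modules, trait):
    """Correlate each module's summed per-sample abundance with a per-sample trait.

    abundance maps taxon -> values across the same ordered samples as `trait`.
    Returns (members, r) pairs, strongest |r| first.
    """
    results = []
    for members in modules:
        # zip(*...) groups the members' values sample by sample
        summed = [sum(vals) for vals in zip(*(abundance[m] for m in members))]
        results.append((members, pearson(summed, trait)))
    return sorted(results, key=lambda item: -abs(item[1]))
```

The `(a, b)` pairs from any co-occurrence edge list can be passed to `connected_modules`. Modules whose summed abundance correlates strongly with export are then candidates for the kind of "genomic proxies" discussed in this section, pending proper statistical testing.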
In another case, the abundance and expression of transporter genes in diatoms was determined as a function of environmental variation, and the observed variation was then used to train an algorithm to predict the functional response of diatoms to future seawater temperatures126. The combination of global biogeochemical models with genomics and community composition analysis highlights the transformative nature of integrating quantitative omics data and oceanography to better understand the functions of marine ecosystems127. More recently, latitudinal gradients and global predictors of plankton diversity across archaea, bacteria, eukaryotes and major virus clades have been explored with use of molecular and imaging data from Tara Oceans36. Latitudinal diversity gradients were previously studied primarily in terrestrial macroorganisms and typically consist of a monotonic poleward decline of local diversity128. Studies in ocean ecosystems have been fragmentary and have often led to different results129,130; thus, the availability of a single comprehensive data set that represents all planktonic organisms collected on a global scale made possible investigations of such macroecological patterns. There was a decline of diversity for most planktonic groups towards the poles, and this decline was mainly driven by temperature with input from productivity and seasonal variability36. Projections into the future using climate models of the Intergovernmental Panel on Climate Change further suggested that severe warming of the ocean in the future may lead to tropicalization of the diversity of most planktonic groups at higher latitudes. These changes may have ripple effects on marine ecosystem functioning, affecting both biogeochemical cycles and trophic interactions globally. REVIEWS Biological organization Biological complexity IV. Global ocean and seascape III. Communities and metacommunities Biological processes Nitrates Disciplines and techniques PO4- IV. 
[Fig. 6 (image not reproduced): nested levels of biological organization and complexity, I. Biomolecules, II. Organisms and holobiont, III. Communities and metacommunities, IV. Global ocean and seascape; associated biological processes, I. Molecular evolution and metabolism, II. Morphogenesis, behaviour and reproduction, III. Biotic and abiotic interactions, IV. Biogeochemical cycles; associated disciplines and techniques, I. Molecular biology and bioinformatics, II. Cell biology and automated imaging, III. Ecology and network analysis, IV. Earth system science and ocean modelling. Spatial scale: from nanometres to 40,000 km.]

Fig. 6 | Ecosystems biology and integrative analyses of the global oceans. The planetary scale and large volume of the Tara Oceans data sets for the epipelagic and mesopelagic ocean allow the extraction of emergent properties that can be successively integrated into higher levels of biological organization. This step-by-step simplification and integration of complexity can be compared with Russian dolls, and provides an eco-evolutionary, data-driven framework for modelling the Earth system. The analysis of each layer requires different techniques, which are often discipline specific, to target different biological processes. Beyond the insights gained within each layer of spatial scale and biological complexity (I to IV), Tara Oceans has integrated information across the various layers of biological organization. For example, one study98 built a co-occurrence network (III) using organismal abundance data (II) inferred from metabarcode data (I) and validated examples by imaging organisms (II). Another study52 analysed plankton communities (III) in the context of oceanographic models (IV) using DNA sequence data (I). Other studies have used the whole ecosystems biology framework (I to IV) to scale up from genes and organisms to emergent ocean ecosystem processes such as the biological carbon pump45,125.

Overall, these ecosystems biology approaches highlighted previously underrated organisms and genes that should be assessed as genomic proxies for predicting key emergent ecosystem functions.
These analyses were unprecedented in their global ecosystem scale and laid the foundation for future robust ecosystem modelling that bridges information about genes, organisms, consortia and biomes (Fig. 6).

Conclusions and perspectives

Life has evolved over billions of years, starting in the oceans; however, only recently have technologies enabled us to capture the taxonomic, genetic and morphological biodiversity of extant ocean life as a whole, from microorganisms to animals. Tara Oceans exemplifies how such a holistic approach can be used to study ocean plankton at a planetary scale. Starting as an adventurous blue-sky research initiative with no substantial core funding, the project has developed into a multinational, multidisciplinary, collaborative programme (Box 1). The comprehensive end-to-end sampling protocols developed to capture plankton from viruses to metazoans, organisms rarely studied together, have greatly expanded our knowledge of biodiversity, organismal interactions, ecological drivers of community structure and genomic proxies for key ecosystem processes in the ocean, such as carbon export. This approach has already prompted similar implementations on other oceanographic cruises and time-series studies, and may ultimately help to establish much-needed standards for biological sampling in oceanography131–133. Furthermore, the commitment to creating a consistent knowledge base has resulted in open-access resources of in situ multi-omics sequencing, imaging and environmental data, which a diverse community of researchers has been mining ever since for new insights and discoveries.

Although Tara Oceans maximized its global reach in its sampling design, there remains a need to increase the geographic coverage and granularity of ocean sampling, including across depth.
In the meantime, additional Tara expeditions have sampled transects across parts of the North Atlantic and Pacific Oceans134, including coastal reef waters135. However, the subarctic North Pacific, equatorial sections and the Southern Ocean remain priority areas with insufficient coverage. In addition to complementary ocean sampling campaigns (for example, Ocean Sampling Day15 and the International Census of Marine Microbes16), repeated cruises and expeditions applying similar approaches have been completed, are under way or are planned136, and will help to close these gaps. One limitation inherent to the spatially distributed nature of ocean sampling expeditions is their lack of temporal resolution. To complement existing snapshots of planktonic states, it will thus be important to incorporate trajectories of community variability over time, as recorded, for example, at long-term ocean time-series stations12,14,137–141, in shorter-term studies at day-to-day resolution142 and during mesoscale process studies143. To further increase spatiotemporal information on plankton dynamics, these local measurements will ideally be complemented by future technological advances144 that provide global and multiyear coverage, for example in situ remote observatories for automated genomic, imaging and environmental data collection145.

In conclusion, considering that ocean ecosystems biology aims to gain a holistic understanding of the biodiversity and processes that govern the ocean, the field is still very much in a data-driven, phenomenological discovery phase146. Yet it must mature rapidly, because anthropogenic climate change is already altering the global ocean147. Moreover, ocean plankton will not only be affected by the outcomes of these changes but will also, to some degree, control them148. To effectively evaluate how open-ocean life will respond to environmental change, large-scale, diverse, interdisciplinary efforts (including empirical, theoretical and modelling approaches) are needed to advance our understanding of organismal abundances and biomass, physiology and interactions across space and time. To address this need, members of research teams will need to step far outside their comfort zones, and non-traditional funding schemes will be required to support the syntheses needed to make transformative advances in such a complex space. Such scientific endeavours and the resulting information must be coupled with concerted efforts to inform policy and management decisions, and to provide diverse outreach programmes, including technology transfer and capacity building, to truly foster a societal impact on Earth (Box 2).

Furthermore, the ocean biome is intricately linked to other biomes on Earth, including host-associated systems. The crucial need for an integrated understanding of ecosystem processes across the ocean, atmosphere and terrestrial systems could be met through discoverable, accessible, interoperable and reusable data, and toolkits that facilitate global-scale synthetic analyses, similar to those already in use in ongoing international efforts in physical oceanography (for example, the Argo programme and NASA) and the health sector (for example, the Global Alliance for Genomics and Health and the International Cancer Genome Consortium). Together, such efforts should ultimately help us to better understand and predict the effect of climate change on extant life and the future habitability of our planet in the Anthropocene epoch.

1. Field, C. B., Behrenfeld, M. J., Randerson, J. T. & Falkowski, P. Primary production of the biosphere: integrating terrestrial and oceanic components. Science 281, 237–240 (1998).
2. Guidi, L. et al. A new look at ocean carbon remineralization for estimating deepwater sequestration. Global Biogeochem. Cycles 29, 1044–1059 (2015).
3. Henson, S. A., Sanders, R. & Madsen, E. Global patterns in efficiency of particulate organic carbon export and transfer to the deep ocean. Global Biogeochem. Cycles 26, GB1028 (2012).
4. Kwon, E. Y., Primeau, F. & Sarmiento, J. L. The impact of remineralization depth on the air-sea carbon balance. Nat. Geosci. 2, 630–635 (2009).
5. Azam, F. et al. The ecological role of water-column microbes in the sea. Mar. Ecol. Prog. Ser. 10, 257–263 (1983).
6. Raes, J. & Bork, P. Molecular eco-systems biology: towards an understanding of community function. Nat. Rev. Microbiol. 6, 693–699 (2008).
7. Karsenti, E. et al. A holistic approach to marine eco-systems biology. PLoS Biol. 9, e1001177 (2011).
8. Rusch, D. B. et al. The Sorcerer II global ocean sampling expedition: northwest Atlantic through eastern tropical Pacific. PLoS Biol. 5, e77 (2007).
9. Venter, J. C. et al. Environmental genome shotgun sequencing of the Sargasso Sea. Science 304, 66–74 (2004). This study applies high-throughput DNA sequencing to produce a large data set of microbial community genome fragments from surface seawaters of the Sargasso Sea and identifies more than 1.2 million previously unknown genes, illustrating the diversity of ocean microbial life.
10. Yooseph, S. et al. The Sorcerer II global ocean sampling expedition: expanding the universe of protein families. PLoS Biol. 5, e16 (2007).
11. Pesant, S. et al. Open science resources for the discovery and analysis of Tara Oceans data. Sci. Data 2, 150023 (2015).
12. Biller, S. J. et al. Marine microbial metagenomes sampled across space and time. Sci. Data 5, 180176 (2018).
13. Duarte, C. M. Seafaring in the 21st century: the Malaspina 2010 Circumnavigation Expedition. Limnol. Oceanogr. Bull. 24, 11–14 (2015).
14. Karl, D. M. & Church, M. J.
Microbial oceanography and the Hawaii ocean time-series programme. Nat. Rev. Microbiol. 12, 699–713 (2014).
15. Kopf, A. et al. The ocean sampling day consortium. Gigascience 4, 27 (2015).
16. Amaral-Zettler, L. et al. in Life in the World’s Oceans (ed. McIntyre, A. D.) 221–245 (Wiley, 2010).
17. Longhurst, A. Seasonal cycles of pelagic production and consumption. Prog. Oceanogr. 36, 77–167 (1995).
18. Sunagawa, S., Karsenti, E., Bowler, C. & Bork, P. Computational eco-systems biology in Tara Oceans: translating data into knowledge. Mol. Syst. Biol. 11, 809 (2015).
19. Fuhrman, J. A. Marine viruses and their biogeochemical and ecological effects. Nature 399, 541–548 (1999).
20. Suttle, C. A. Marine viruses-major players in the global ecosystem. Nat. Rev. Microbiol. 5, 801–812 (2007).
21. Wommack, K. E. & Colwell, R. R. Virioplankton: viruses in aquatic ecosystems. Microbiol. Mol. Biol. Rev. 64, 69–114 (2000).
22. Brum, J. R. & Sullivan, M. B. Rising to the challenge: accelerated pace of discovery transforms marine virology. Nat. Rev. Microbiol. 13, 147–159 (2015).
23. Brum, J. R. et al. Patterns and ecological drivers of ocean viral communities. Science 348, 1261498 (2015). This article describes the first of the Tara Oceans efforts to investigate the diversity and structure of double-stranded DNA viral communities in the oceans, supporting a model of passive global transport by ocean currents and selection by local environmental conditions.
24. Gregory, A. C. et al. Marine DNA viral macro- and microdiversity from pole to pole. Cell 177, 1109–1123.e14 (2019).
25. Roux, S. et al. Ecogenomics and potential biogeochemical impacts of globally abundant ocean viruses. Nature 537, 689–693 (2016).
26. Duhaime, M. B. et al. Comparative omics and trait analyses of marine pseudoalteromonas phages advance the phage OTU concept. Front. Microbiol. 8, 1241 (2017).
27. Martinez-Hernandez, F. et al. Single-virus genomics reveals hidden cosmopolitan and abundant viruses. Nat.
Commun. 8, 15892 (2017).
28. Warwick-Dugdale, J. et al. Long-read viral metagenomics captures abundant and microdiverse viral populations and their niche-defining genomic islands. PeerJ 7, e6800 (2019).
29. Nishimura, Y. et al. Environmental viral genomes shed new light on virus-host interactions in the ocean. mSphere 2, e00359-16 (2017).
30. Nishimura, Y. et al. ViPTree: the viral proteomic tree server. Bioinformatics 33, 2379–2380 (2017).
31. Bin Jang, H. et al. Taxonomic assignment of uncultivated prokaryotic virus genomes is enabled by gene-sharing networks. Nat. Biotechnol. 37, 632–639 (2019).
32. Bolduc, B. et al. vConTACT: an iVirus tool to classify double-stranded DNA viruses that infect Archaea and bacteria. PeerJ 5, e3243 (2017).
33. Roux, S. et al. Minimum information about an uncultivated virus genome (MIUViG). Nat. Biotechnol. 37, 29–37 (2019).
34. Simmonds, P. et al. Consensus statement: virus taxonomy in the age of metagenomics. Nat. Rev. Microbiol. 15, 161–168 (2017).
35. Baas-Becking, L. G. M. Geobiologie of Inleiding tot de Milieukunde (Van Stockum & Zoon, 1934).
36. Ibarbalz, F. M. et al. Global trends in marine plankton diversity across kingdoms of life. Cell 179, 1084–1097.e21 (2019).
37. Jia, Y., Shan, J., Millard, A., Clokie, M. R. & Mann, N. H. Light-dependent adsorption of photosynthetic cyanophages to Synechococcus sp. WH7803. FEMS Microbiol. Lett. 310, 120–126 (2010).
38. Ribalet, F. et al. Light-driven synchrony of Prochlorococcus growth and mortality in the subtropical Pacific gyre. Proc. Natl Acad. Sci. USA 112, 8008–8012 (2015).
39. Yoshida, T. et al. Locality and diel cycling of viral production revealed by a 24 h time course cross-omics analysis in a coastal region of Japan. ISME J. 12, 1287–1295 (2018).
40. Fridman, S. et al. A myovirus encoding both photosystem I and II proteins enhances cyclic electron flow in infected Prochlorococcus cells. Nat. Microbiol.
2, 1350–1357 (2017).
41. Mann, N. H., Cook, A., Millard, A., Bailey, S. & Clokie, M. Marine ecosystems: bacterial photosynthesis genes in a virus. Nature 424, 741 (2003).
42. Sullivan, M. B. et al. Prevalence and evolution of core photosystem II genes in marine cyanobacterial viruses and their hosts. PLoS Biol. 4, e234 (2006).
43. Hurwitz, B. L., Hallam, S. J. & Sullivan, M. B. Metabolic reprogramming by viruses in the sunlit and dark ocean. Genome Biol. 14, R123 (2013).
44. Howard-Varona, C. et al. Regulation of infection efficiency in a globally abundant marine Bacteriodetes virus. ISME J. 11, 284–295 (2017).
45. Guidi, L. et al. Plankton networks driving carbon export in the oligotrophic ocean. Nature 532, 465–470 (2016). This study integrates Tara Oceans data across organismal size classes from epipelagic depths, revealing that unexpected taxa can predict the downward export of carbon by biological processes in subtropical, nutrient-depleted oceans.
46. Howard-Varona, C. et al. Phage-specific metabolic reprogramming of virocells. ISME J. 14, 881–895 (2020).
47. Wilhelm, S. W. & Suttle, C. A. Viruses and nutrient cycles in the sea - viruses play critical roles in the structure and function of aquatic food webs. Bioscience 49, 781–788 (1999).
48. Hurwitz, B. L., Brum, J. R. & Sullivan, M. B. Depth-stratified functional and taxonomic niche specialization in the ‘core’ and ‘flexible’ Pacific Ocean virome. ISME J. 9, 472–484 (2015).
49. Carradec, Q. et al. A global ocean atlas of eukaryotic genes. Nat. Commun. 9, 373 (2018).
50. Hingamp, P. et al. Exploring nucleo-cytoplasmic large DNA viruses in Tara Oceans microbial metagenomes. ISME J. 7, 1678–1695 (2013).
51. Lescot, M. et al. Reverse transcriptase genes are highly abundant and transcriptionally active in marine plankton assemblages. ISME J. 10, 1134–1146 (2016).
52. Villar, E. et al. Environmental characteristics of Agulhas rings affect interocean plankton transport. Science 348, 1261447 (2015).
53. Li, Y. et al.
The earth is small for “Leviathans”: long distance dispersal of giant viruses across aquatic environments. Microbes Environ. 34, 334–339 (2019).
54. Mihara, T. et al. Taxon richness of “Megaviridae” exceeds those of bacteria and archaea in the ocean. Microbes Environ. 33, 162–171 (2018).
55. von Dassow, P. et al. Life-cycle modification in open oceans accounts for genome variability in a cosmopolitan phytoplankton. ISME J. 9, 1365–1377 (2015).
56. Clerissi, C. et al. Deep sequencing of amplified Prasinovirus and host green algal genes from an Indian Ocean transect reveals interacting trophic dependencies and new genotypes. Environ. Microbiol. Rep. 7, 979–989 (2015).
57. Clerissi, C. et al. Unveiling of the diversity of prasinoviruses (Phycodnaviridae) in marine samples by using high-throughput sequencing analyses of PCR-amplified DNA polymerase and major capsid protein genes. Appl. Environ. Microbiol. 80, 3150–3160 (2014).
58. Clerissi, C. et al. Prasinovirus distribution in the northwest Mediterranean Sea is affected by the environment and particularly by phosphate availability. Virology 466–467, 146–157 (2014).
59. Li, Y. et al. Degenerate PCR primers to reveal the diversity of giant viruses in coastal waters. Viruses 10 (2018).
60. Blanc-Mathieu, R. et al. Viruses of the eukaryotic plankton are predicted to increase carbon export efficiency in the global sunlit ocean. Preprint at bioRxiv https://doi.org/10.1101/710228 (2019).
61. Bolduc, B., Youens-Clark, K., Roux, S., Hurwitz, B. L. & Sullivan, M. B. iVirus: facilitating new insights in viral ecology with software and community data sets imbedded in a cyberinfrastructure. ISME J. 11, 7–14 (2017).
62. Mihara, T. et al. Linking virus genomes with host taxonomy. Viruses 8, 66 (2016).
63. Steward, G. F. et al. Are we missing half of the viruses in the ocean? ISME J. 7, 672–679 (2013).
64. Salazar, G. et al. Gene expression changes and community turnover differentially shape the global ocean metatranscriptome.
Cell 179, 1068–1083.e21 (2019).
65. Sunagawa, S. et al. Structure and function of the global ocean microbiome. Science 348, 1261359 (2015). This study catalogues 40 million ocean microbial genes and shows temperature to be a main driver of open-ocean microbial community composition in the epipelagic zone at a global scale.
66. Kultima, J. R. et al. MOCAT: a metagenomics assembly and gene prediction toolkit. PLoS One 7, e47656 (2012).
67. Li, J. et al. An integrated catalog of reference genes in the human gut microbiome. Nat. Biotechnol. 32, 834–841 (2014).
68. Qin, J. et al. A human gut microbial gene catalogue established by metagenomic sequencing. Nature 464, 59–65 (2010).
69. DeLong, E. F. et al. Community genomics among stratified microbial assemblages in the ocean’s interior. Science 311, 496–503 (2006).
70. Giovannoni, S. J. & Stingl, U. Molecular diversity and ecology of microbial plankton. Nature 437, 343–348 (2005).
71. Fuhrman, J. A. et al. Annually reoccurring bacterial communities are predictable from ocean conditions. Proc. Natl Acad. Sci. USA 103, 13104–13109 (2006).
72. Milanese, A. et al. Microbial abundance, activity and population genomic profiling with mOTUs2. Nat. Commun. 10, 1014 (2019).
73. Farrant, G. K. et al. Delineating ecologically significant taxonomic units from global patterns of marine picocyanobacteria. Proc. Natl Acad. Sci. USA 113, E3365–E3374 (2016).
74. Grebert, T. et al. Light color acclimation is a key process in the global ocean distribution of Synechococcus cyanobacteria. Proc. Natl Acad. Sci. USA 115, E2010–E2019 (2018).
75. Yelton, A. P. et al. Global genetic capacity for mixotrophy in marine picocyanobacteria. ISME J. 10, 2946–2957 (2016).
76. Royo-Llonch, M. et al. Exploring microdiversity in novel Kordia sp. (Bacteroidetes) with proteorhodopsin from the tropical Indian Ocean via single amplified genomes. Front. Microbiol. 8, 1317 (2017).
77. Royo-Llonch, M., Sánchez, P., González, J. M., Pedrós-Alió, C. & Acinas, S. G.
Ecological and functional capabilities of an uncultured Kordia sp. Syst. Appl. Microbiol. 43, 126045 (2020).
78. Cabello, A. M. et al. Global distribution and vertical patterns of a prymnesiophyte-cyanobacteria obligate symbiosis. ISME J. 10, 693–706 (2016).
79. Cornejo-Castillo, F. M. et al. Cyanobacterial symbionts diverged in the late Cretaceous towards lineage-specific nitrogen fixation factories in single-celled phytoplankton. Nat. Commun. 7, 11071 (2016).
80. Cornejo-Castillo, F. M. et al. UCYN-A3, a newly characterized open ocean sublineage of the symbiotic N2-fixing cyanobacterium Candidatus Atelocyanobacterium thalassa. Environ. Microbiol. 21, 111–124 (2019).
81. Delmont, T. O. et al. Nitrogen-fixing populations of Planctomycetes and Proteobacteria are abundant in surface ocean metagenomes. Nat. Microbiol. 3, 804–813 (2018).
82. Martijn, J., Vosseberg, J., Guy, L., Offre, P. & Ettema, T. J. G. Deep mitochondrial origin outside the sampled alphaproteobacteria. Nature 557, 101–105 (2018). This study exemplifies the use of Tara Oceans data to formulate new hypotheses by reconstructing genomes that support a mitochondrial origin before the divergence of all Alphaproteobacteria sampled to date.
83. Parks, D. H. et al. Recovery of nearly 8,000 metagenome-assembled genomes substantially expands the tree of life. Nat. Microbiol. 2, 1533–1542 (2017).
84. Tully, B. J., Graham, E. D. & Heidelberg, J. F. The reconstruction of 2,631 draft metagenome-assembled genomes from the global oceans. Sci. Data 5, 170203 (2018).
85. Pushkarev, A. et al. A distinct abundant group of microbial rhodopsins discovered using functional metagenomics. Nature 558, 595–599 (2018).
86. Oppermann, J. et al. MerMAIDs: a family of metagenomically discovered marine anion-conducting and intensely desensitizing channelrhodopsins. Nat. Commun. 10, 3315 (2019).
87. Louca, S., Parfrey, L. W. & Doebeli, M. Decoupling function and taxonomy in the global ocean microbiome.
Science 353, 1272–1277 (2016).
88. Bar-On, Y. M., Phillips, R. & Milo, R. The biomass distribution on Earth. Proc. Natl Acad. Sci. USA 115, 6506–6511 (2018).
89. Caron, D. A., Countway, P. D., Jones, A. C., Kim, D. Y. & Schnetzer, A. Marine protistan diversity. Ann. Rev. Mar. Sci. 4, 467–493 (2012).
90. Colin, S. et al. Quantitative 3D-imaging for cell biology and ecology of environmental microbial eukaryotes. eLife 6, e26066 (2017).
91. Decelle, J. et al. PhytoREF: a reference database of the plastidial 16S rRNA gene of photosynthetic eukaryotes with curated taxonomy. Mol. Ecol. Resour. 15, 1435–1445 (2015).
92. Guillou, L. et al. The protist ribosomal reference database (PR2): a catalog of unicellular eukaryote small sub-unit rRNA sequences with curated taxonomy. Nucleic Acids Res. 41, D597–D604 (2013).
93. Seeleuthner, Y. et al. Single-cell genomics of multiple uncultured stramenopiles reveals underestimated functional diversity across oceans. Nat. Commun. 9, 310 (2018).
94. Sieracki, M. E. et al. Single cell genomics yields a wide diversity of small planktonic protists across major ocean ecosystems. Sci. Rep. 9, 6025 (2019).
95. de Vargas, C. et al. Eukaryotic plankton diversity in the sunlit ocean. Science 348, 1261605 (2015). This study surveys the eukaryotic diversity of ocean plankton from the smallest protists to millimetre-sized animals by 18S ribosomal RNA gene amplicon sequencing, revealing 150,000 taxonomic groups dominated by protistan parasites and symbiotic hosts.
96. Flegontova, O. et al. Extreme diversity of diplonemid eukaryotes in the ocean. Curr. Biol. 26, 3060–3065 (2016).
97. Decelle, J. et al. Worldwide occurrence and activity of the reef-building coral symbiont Symbiodinium in the open ocean. Curr. Biol. 28, 3625–3633 e3623 (2018).
98. Lima-Mendez, G. et al. Determinants of community structure in the global plankton interactome. Science 348, 1262073 (2015).
This study evaluates the effect of abiotic and biotic factors on organismal interactions among bacteria, archaea, eukaryotes and viruses, emphasizing the role of grazing, pathogenicity and parasitism as predictors of plankton community structure.
99. Vincent, F. J. et al. The epibiotic life of the cosmopolitan diatom Fragilariopsis doliolus on heterotrophic ciliates in the open ocean. ISME J. 12, 1094–1108 (2018).
100. Malviya, S. et al. Insights into global diatom distribution and diversity in the world’s ocean. Proc. Natl Acad. Sci. USA 113, E1516–E1525 (2016).
101. Le Bescot, N. et al. Global patterns of pelagic dinoflagellate diversity across protist size classes unveiled by metabarcoding. Environ. Microbiol. 18, 609–626 (2016).
102. Lopes Dos Santos, A. et al. Diversity and oceanic distribution of prasinophytes clade VII, the dominant group of green algae in oceanic waters. ISME J. 11, 512–528 (2017).
103. Gimmler, A., Korn, R., de Vargas, C., Audic, S. & Stoeck, T. The Tara Oceans voyage reveals global diversity and distribution patterns of marine planktonic ciliates. Sci. Rep. 6, 33555 (2016).
104. Beaugrand, G., Luczak, C., Goberville, E. & Kirby, R. R. Marine biodiversity and the chessboard of life. PLoS One 13, e0194006 (2018).
105. Biard, T. et al. Biogeography and diversity of Collodaria (Radiolaria) in the global ocean. ISME J. 11, 1331–1344 (2017).
106. Del Campo, J. et al. Assessing the diversity and distribution of apicomplexans in host and free-living environments using high-throughput amplicon data and a phylogenetically informed reference framework. Front. Microbiol. 10, 2373 (2019).
107. Callahan, B. J., McMurdie, P. J. & Holmes, S. P. Exact sequence variants should replace operational taxonomic units in marker-gene data analysis. ISME J. 11, 2639–2643 (2017).
108. Foster, Z. S., Sharpton, T. J. & Grunwald, N. J. Metacoder: an R package for visualization and manipulation of community taxonomic diversity data.
PLoS Comput. Biol. 13, e1005404 (2017).
109. Pierella Karlusich, J. J., Ibarbalz, F. M. & Bowler, C. Phytoplankton in the Tara Ocean. Annu. Rev. Mar. Sci. 12, 233–265 (2020).
110. Leblanc, K. et al. Nanoplanktonic diatoms are globally overlooked but play a role in spring blooms and carbon export. Nat. Commun. 9, 953 (2018).
111. Treguer, P. et al. Influence of diatom diversity on the ocean biological carbon pump. Nat. Geosci. 11, 27–37 (2018).
112. Rabosky, D. L. & Sorhannus, U. Diversity dynamics of marine planktonic diatoms across the Cenozoic. Nature 457, 183–186 (2009).
113. Azaele, S., Pigolotti, S., Banavar, J. R. & Maritan, A. Dynamical evolution of ecosystems. Nature 444, 926–928 (2006).
114. Ferriere, R. & Cazelles, B. Universal power laws govern intermittent rarity in communities of interacting species. Ecology 80, 1505–1521 (1999).
115. Gawryluk, R. M. R. et al. Morphological identification and single-cell genomics of marine diplonemids. Curr. Biol. 26, 3053–3059 (2016).
116. Mordret, S. et al. The symbiotic life of Symbiodinium in the open ocean within a new species of calcifying ciliate (Tiarina sp.). ISME J. 10, 1424–1436 (2016).
117. Biard, T. et al. In situ imaging reveals the biomass of giant protists in the global ocean. Nature 532, 504–507 (2016).
118. Vannier, T. et al. Survey of the green picoalga Bathycoccus genomes in the global ocean. Sci. Rep. 6, 37900 (2016).
119. Steinberg, D. K. & Landry, M. R. Zooplankton and the ocean carbon cycle. Ann. Rev. Mar. Sci. 9, 413–444 (2017).
120. Roullier, F. et al. Particle size distribution and estimated carbon flux across the Arabian Sea oxygen minimum zone. Biogeosciences 11, 4541–4557 (2014).
121. Corse, E. et al. Phylogenetic analysis of Thecosomata Blainville, 1824 (holoplanktonic opisthobranchia) using morphological and molecular data. PLoS One 8, e59439 (2013).
122. Gasmi, S. et al.
Evolutionary history of Chaetognatha inferred from molecular and morphological data: a case study for body plan simplification. Front. Zool. 11, 84 (2014).
123. Madoui, M. A. et al. New insights into global biogeography, population structure and natural selection from the genome of the epipelagic copepod Oithona. Mol. Ecol. 26, 4467–4482 (2017).
124. Arif, M. et al. Discovering millions of plankton genomic markers from the Atlantic Ocean and the Mediterranean Sea. Mol. Ecol. Resour. 19, 526–535 (2019).
125. Caputi, L. et al. Community-level responses to iron availability in open ocean plankton ecosystems. Global Biogeochem. Cycles 33, 391–419 (2019).
126. Busseni, G. et al. Meta-omics reveals genetic flexibility of diatom nitrogen transporters in response to environmental changes. Mol. Biol. Evol. 36, 2522–2535 (2019).
127. D’Alelio, D. et al. Modelling the complexity of plankton communities exploiting omics potential: From present challenges to an integrative pipeline. Curr. Opin. Syst. Biol. 13, 68–74 (2019).
128. Whittaker, R. H. Evolution and measurement of species diversity. Taxon 21, 213–251 (1972).
129. Fuhrman, J. A. et al. A latitudinal diversity gradient in planktonic marine bacteria. Proc. Natl Acad. Sci. USA 105, 7774–7778 (2008).
130. Raes, E. J. et al. Oceanographic boundaries constrain microbial diversity gradients in the South Pacific Ocean. Proc. Natl Acad. Sci. USA 115, E8266–E8275 (2018).
131. Capotondi, A. et al. Observational needs supporting marine ecosystems modeling and forecasting: from the global ocean to regional and coastal systems. Front. Mar. Sci. 6, 623 (2019).
132. Lombard, F. et al. Globally consistent quantitative observations of planktonic ecosystems. Front. Mar. Sci. 6, 196 (2019).
133. Ten Hoopen, P. et al. Marine microbial biodiversity, bioinformatics and biotechnology (M2B3) data reporting and service standards. Stand. Genomic Sci. 10, 20 (2015).
134. Gorsky, G. et al.
Expanding Tara Oceans protocols for underway, ecosystemic sampling of the ocean-atmosphere interface during Tara Pacific expedition (2016–2018). Front. Mar. Sci. 6, 750 (2019).
135. Planes, S. et al. The Tara Pacific expedition — a pan-ecosystemic approach of the “-omics” complexity of coral reef holobionts across the Pacific Ocean. PLoS Biol. 17, e3000483 (2019).
136. Bolhuis, H. et al. Atlantic Ocean Research Alliance — marine microbiome roadmap (AORA, 2020).
137. Cram, J. A. et al. Seasonal and interannual variability of the marine bacterioplankton community throughout the water column over ten years. ISME J. 9, 563–580 (2015).
138. D’Alcala, M. R. et al. Seasonal patterns in plankton communities in a pluriannual time series at a coastal Mediterranean site (Gulf of Naples): an attempt to discern recurrences and trends. Sci. Mar. 68, 65–83 (2004).
139. Gilbert, J. A. et al. The taxonomic and functional diversity of microbes at a temperate coastal site: a ‘multi-omic’ study of seasonal and diel temporal variation. PLoS One 5, e15545 (2010).
140. Romagnan, J. B. et al. Comprehensive model of annual plankton succession based on the whole-plankton time series approach. PLoS One 10, e0119219 (2015).
141. Gasol, J. M. et al. ICES phytoplankton and microbial plankton status report 2009/2010 (eds O’Brien, T. D., Li, W. K. W. & Morán, X. A. G.) 138–141 (ICES, 2012).
142. Martin-Platero, A. M. et al. High resolution time series reveals cohesive but short-lived communities in coastal plankton. Nat. Commun. 9, 266 (2018).
143. Laber, C. P. et al. Coccolithovirus facilitation of carbon export in the North Atlantic. Nat. Microbiol. 3, 537–547 (2018).
144. Marx, V. When microbiologists plunge into the ocean. Nat. Methods 17, 133–136 (2020).
145. Buttigieg, P. L. et al. Marine microbes in 4D-using time series observation to assess the dynamics of the ocean microbiome and its links to ocean health. Curr. Opin. Microbiol. 43, 169–185 (2018).
146. Shneider, A. M.
Four stages of a scientific discipline; four types of scientist. Trends Biochem. Sci. 34, 217–223 (2009).
147. Karl, D. M. A sea of change: biogeochemical variability in the North Pacific Subtropical Gyre. Ecosystems 2, 181–214 (1999).
148. Cavicchioli, R. et al. Scientists’ warning to humanity: microorganisms and climate change. Nat. Rev. Microbiol. 17, 569–586 (2019). This review article provides a consensus statement, the ‘microbiologists’ warning to humanity’, documenting how microorganisms will affect and will be affected by climate change.
149. Bork, P. et al. Tara Oceans studies plankton at planetary scale. Introduction. Science 348, 873 (2015).
150. Logares, R. et al. Metagenomic 16S rDNA Illumina tags are a powerful alternative to amplicon sequencing to explore diversity and structure of microbial communities. Environ. Microbiol. 16, 2659–2671 (2014).
151. Nakayama, T. et al. Single-cell genomics unveiled a cryptic cyanobacterial lineage with a worldwide distribution hidden by a dinoflagellate host. Proc. Natl Acad. Sci. USA 116, 15973–15978 (2019).
152. Probert, I. et al. Brandtodinium gen. nov. and B. nutricula comb. Nov. (Dinophyceae), a dinoflagellate commonly found in symbiosis with polycystine radiolarians. J. Phycol. 50, 388–399 (2014).
153. Decelle, J., Colin, S. & Foster, R. A. in Marine Protists: Diversity and Dynamics (eds Ohtsuka, S. et al.) 465–500 (Springer, 2015).

Acknowledgements

Tara Oceans (which includes the Tara Oceans and Tara Oceans Polar Circle expeditions) would not exist without the leadership of the Tara Ocean Foundation and the continuous support of 23 institutes (https://oceans.taraexpeditions.org/).
The authors further thank the following sponsors for their commitment: the French CNRS (in particular Groupement de Recherche GDR3280 and the Research Federation for the Study of Global Ocean Systems Ecology and Evolution FR2022/Tara GOSEE), the French Facility for Global Environment (FFEM), the European Molecular Biology Laboratory, Genoscope/CEA, the French Ministry of Research and the French Government Investissements d’Avenir programmes OCEANOMICS (ANR-11-BTBR-0008), FRANCE GENOMIQUE (ANR-10-INBS-09-08) and MEMO LIFE (ANR-10-LABX-54), the PSL research university (ANR-11-IDEX-0001-02) and EMBRC-France (ANR-10-INBS-02). Funding for the collection and processing of the Tara Oceans data set was provided by the NASA Ocean Biology and Biogeochemistry Program under grants NNX11AQ14G, NNX09AU43G, NNX13AE58G and NNX15AC08G (to the University of Maine), the Canada Excellence Research Chair in Remote Sensing of Canada’s New Arctic Frontier and the Canada Foundation for Innovation. The authors also thank agnès b. and E. Bourgois, the Prince Albert II de Monaco Foundation, the Veolia Foundation, Region Bretagne, Lorient Agglomeration, Serge Ferrari, Worldcourier and KAUST for support and commitment. The global sampling effort was made possible by countless scientists and crew who performed sampling aboard the Tara from 2009 to 2013, and the authors thank MERCATOR-CORIOLIS and ACRI-ST for providing daily satellite data during the expeditions. The authors are also grateful to the countries that graciously granted sampling permission. The authors thank N. Le Bescot and N. Henry for their help in designing the figures in this article. C.d.V. thanks the Roscoff Bioinformatics platform ABiMS (http://abims.sb-roscoff.fr). S. Sunagawa thanks the European Molecular Biology Laboratory and ETH Zürich’s high-performance computing facilities for computational support. C.B.
acknowledges funding from the European Research Council under the European Union’s Horizon 2020 research and innovation programme (grant agreement 835067) as well as the Radcliffe Institute for Advanced Study at Harvard University for a scholar’s fellowship during the 2016–2017 academic year. M.B.S. thanks the Gordon and Betty Moore Foundation (award 3790) and the US National Science Foundation (awards OCE#1536989 and OCE#1829831) as well as the Ohio Supercomputer Center for computational support. S.G.A. thanks the Spanish Ministry of Economy and Competitiveness (CTM2017-87736-R). F.L. thanks the Institut Universitaire de France as well as the EMBRC platform PIQv for image analysis. S. Sunagawa is supported by ETH Zürich and the Helmut Horten Foundation and by funding from the Swiss National Science Foundation (205321_184955). The authors declare that all data reported herein are fully and freely available from the date of publication, with no restrictions, and that all of the analyses, publications and ownership of data are free from legal entanglement or restriction by the various nations in whose waters the Tara Oceans expeditions conducted sampling. This article is contribution number 100 of Tara Oceans. Author contributions S. Sunagawa and C.d.V. are the lead authors of the article and all other authors contributed to discussion of the content, writing and editing of the article. Competing interests The authors declare no competing interests. Peer review information Nature Reviews Microbiology thanks David Hutchins, Maria Pachiadaki and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. Supplementary information Supplementary information is available for this paper at https://doi.org/10.1038/s41579-020-0364-5.
RELATED LINKS Artist profiles: https://oceans.taraexpeditions.org/en/m/art/artists/ Plankton chronicles: http://planktonchronicles.org Tara Ocean Foundation: https://oceans.taraexpeditions.org/en Tara Oceans Sample Registry: https://doi.pangaea.de/10.1594/PANGAEA.875582 Tara Oceans Sequencing: https://www.ebi.ac.uk/ena/data/view/PRJEB402 UniEuk: http://unieuk.org © Springer Nature Limited 2020 Tara Oceans Coordinators Silvia G. Acinas2, Marcel Babin7,20, Peer Bork3,4,5, Emmanuel Boss21, Chris Bowler6,7, Guy Cochrane22, Colomban de Vargas7,19, Michael Follows23, Gabriel Gorsky7,9, Nigel Grimsley7,24,25, Lionel Guidi7,9, Pascal Hingamp7,26, Daniele Iudicone10, Olivier Jaillon7,18, Stefanie Kandels3,7, Lee Karp-Boss21, Eric Karsenti6,7,11, Magali Lescot7,26, Fabrice Not19, Hiroyuki Ogata12, Stéphane Pesant13,14, Nicole Poulton27, Jeroen Raes28,29,30, Christian Sardet7,9, Mike Sieracki27, Sabrina Speich31,32, Lars Stemmann7,9, Matthew B. Sullivan15,16,17, Shinichi Sunagawa1 and Patrick Wincker7,18 20Département de Biologie, Québec Océan and Takuvik Joint International Laboratory (UMI 3376), Université Laval (Canada)–CNRS (France), Université Laval, Quebec, QC, Canada. 21School of Marine Sciences, University of Maine, Orono, ME, USA. 22European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK. 23Department of Earth, Atmospheric, and Planetary Sciences, Massachusetts Institute of Technology, Cambridge, MA, USA. 24CNRS UMR 7232, Biologie Intégrative des Organismes Marins, Banyuls-sur-Mer, France. 25Sorbonne Universités Paris 06, OOB UPMC, Banyuls-sur-Mer, France. 26Aix Marseille Université, Université de Toulon, CNRS, IRD, MIO UM 110, Marseille, France. 27Bigelow Laboratory for Ocean Sciences, East Boothbay, ME, USA. 28Department of Microbiology and Immunology, Rega Institute, KU Leuven, Leuven, Belgium. 29Center for the Biology of Disease, VIB KU Leuven, Leuven, Belgium.
30Department of Applied Biological Sciences, Vrije Universiteit Brussel, Brussels, Belgium. 31Department of Geosciences, Laboratoire de Météorologie Dynamique, École Normale Supérieure, Paris, France. 32Ocean Physics Laboratory, University of Western Brittany, Brest, France. www.nature.com/nrmicro