
AI for Science

Report on the Department of Energy (DOE) Town Halls on
Artificial Intelligence (AI) for Science

Town Hall Co-Chairs


Rick Stevens Associate Laboratory Director, Argonne National Laboratory
Jeffrey Nichols Associate Laboratory Director, Oak Ridge National Laboratory
Katherine Yelick Associate Laboratory Director, Lawrence Berkeley National Laboratory

Department of Energy Contact
Barbara Helland Program Manager, Department of Energy

Special Assistance
Chapter Leads: Argonne National Laboratory
Valerie Taylor, Director, Mathematics and Computer Science Division

Mihai Anitescu, Prasanna Balaprakash, Pete Beckman,
Thomas S. Brettin, Charles E. Catlett, Andrew Chien,
Santanu Chaudhuri, Ian Foster, Dogan Gursoy, Salman Habib,
Cynthia Jenks, Rao Kotamarthi, Zein-Eddine Meziani,
Michael E. Papka, Robert Ross, Stefan Wild

Lawrence Berkeley National Laboratory
David Brown, Director, Computational Research Division

Katerina Antypas, Wes Bethel, Ben Brown, Paolo Calafiura,
Wibe de Jong, Sudip Dosanjh, Inder Monga, Peter Nugent,
Mary Ann Piette, Prabhat, Brian Quiter, Lavanya Ramakrishnan,
John Shalf, Haruko Wainwright, John Wu, Petrus Zwart

Oak Ridge National Laboratory
Arthur Barney Maccabe, Director, Computer Science and Mathematics Division

David Dean, James Hack, Kenneth Herwig, Judith Hill,
Forrest M. Hoffman, Teja Kuruganti, Bronson Messer,
Nageswara Rao, Arjun Shankar, Bobby G. Sumpter,
Georgia Tourassi, John Turner, Jeffrey Vetter, David Womble,
Steven Young

Lawrence Livermore National Laboratory
Ana Kupresanin

General Atomics
David Humphreys

Administrative: Argonne National Laboratory: Silvia Mulligan
Lawrence Berkeley National Laboratory: Hellen Cademartori
Oak Ridge National Laboratory: Becky Verastegui

Publication: Argonne National Laboratory: Linda Conlin, Kristen Dean,
Lorenza Salinas, John W. Schneider, Sonya Soroko

Editorial: Argonne National Laboratory: Emily M. Dietrich, Laura Wolf
Lawrence Berkeley National Laboratory: Carol Pott
Oak Ridge National Laboratory: Scott Jones, Elizabeth Rosenthal

Contents
Executive Summary ...................................................................................................... 1

Introduction: AI for Science ......................................................................................... 5


Materials, Environmental, and Life Sciences ....................................................................... 5
High-Energy, Nuclear, and Plasma Physics ........................................................................ 7
Engineering, Instruments, and Infrastructure....................................................................... 9
Foundations, Software, Data Infrastructure, and Hardware ................................................11
Conclusions .......................................................................................................................14

01. Chemistry, Materials, and Nanoscience .............................................................. 17


1. State of the Art ...............................................................................................................17
2. Major (Grand) Challenges ..............................................................................................18
3. Advances in the Next Decade ........................................................................................21
4. Accelerating Development .............................................................................................23
5. Expected Outcomes.......................................................................................................25
6. References ....................................................................................................................25

02. Earth and Environmental Sciences ..................................................................... 27


1. State of the Art ...............................................................................................................27
2. Major (Grand) Challenges ..............................................................................................28
3. Advances in the Next Decade ........................................................................................31
4. Accelerating Development .............................................................................................31
5. Expected Outcomes.......................................................................................................34
6. References ....................................................................................................................34

03. Biology and Life Sciences .................................................................................... 37


1. State of the Art ...............................................................................................................37
2. Major (Grand) Challenges ..............................................................................................38
3. Advances in the Next Decade ........................................................................................41
4. Accelerating Development .............................................................................................42
5. Expected Outcomes.......................................................................................................43
6. References ....................................................................................................................43

04. High Energy Physics............................................................................................. 45


1. State of the Art ...............................................................................................................46
2. Major (Grand) Challenges ..............................................................................................47
3. Advances in the Next Decade ........................................................................................50
4. Accelerating Development .............................................................................................51
5. Expected Outcomes.......................................................................................................52
6. References ....................................................................................................................52

05. Nuclear Physics..................................................................................................... 55


1. State of the Art ...............................................................................................................56
2. Major (Grand) Challenges ..............................................................................................58
3. Advances in the Next Decade ........................................................................................61
4. Accelerating Development .............................................................................................62
5. Expected Outcomes.......................................................................................................63
6. References ....................................................................................................................63

06. Fusion .................................................................................................................... 65
1. State of the Art ...............................................................................................................65
2. Major (Grand) Challenges ..............................................................................................66
3. Advances in the Next Decade ........................................................................................69
4. Accelerating Development .............................................................................................70
5. Expected Outcomes.......................................................................................................70
6. References ....................................................................................................................71

07. Engineering and Manufacturing........................................................................... 73


1. State of the Art ...............................................................................................................73
2. Major (Grand) Challenges ..............................................................................................74
3. Advances in the Next Decade ........................................................................................77
4. Accelerating Development .............................................................................................78
6. References ....................................................................................................................79

08. Smart Energy Infrastructure ................................................................................. 81


1. State of the Art ...............................................................................................................81
2. Major (Grand) Challenges ..............................................................................................83
3. Advances in the Next Decade ........................................................................................85
4. Accelerating Development .............................................................................................86
5. Expected Outcomes.......................................................................................................87
6. References ....................................................................................................................87

09. AI for Computer Science ...................................................................................... 89


1. State of the Art ...............................................................................................................89
2. Major (Grand) Challenges ..............................................................................................92
3. Advances in the Next Decade ........................................................................................94
4. Accelerating Development .............................................................................................95
5. Expected Outcomes.......................................................................................................96
6. References ....................................................................................................................96

10. AI Foundations and Open Problems ................................................................... 99


1. State of the Art ...............................................................................................................99
2. Major (Grand) Challenges ............................................................................................100
3. Advances in the Next Decade ......................................................................................102
4. Accelerating Development ...........................................................................................105
5. Expected Outcomes.....................................................................................................105
6. References ..................................................................................................................106

11. Software Environments and Software Research .............................................. 109


1. State of the Art .............................................................................................................109
2. Major (Grand) Challenges ............................................................................................109
3. Advances in the Next Decade ......................................................................................113
4. Accelerating Development ...........................................................................................114
5. Expected Outcomes.....................................................................................................114
6. References ..................................................................................................................115

12. Data Life Cycle and Infrastructure ..................................................................... 117
1. State of the Art .............................................................................................................118
2. Major (Grand) Challenges ............................................................................................119
3. Advances in Next Decade ............................................................................................122
4. Accelerating Development ...........................................................................................123
5. Expected Outcomes.....................................................................................................123
6. References ..................................................................................................................123

13. Hardware Architectures ...................................................................................... 125


1. State of the Art .............................................................................................................125
2. Major (Grand) Challenges ............................................................................................126
3. Advances in the Next Decade ......................................................................................129
4. Accelerating Development ...........................................................................................129
5. Expected Outcomes.....................................................................................................131
6. References ..................................................................................................................131

14. AI for Imaging ...................................................................................................... 133


1. State of the Art .............................................................................................................133
2. Major (Grand) Challenges ............................................................................................135
3. Advances in the Next Decade ......................................................................................136
4. Accelerating Development ...........................................................................................137
5. Expected Outcomes.....................................................................................................138
6. References ..................................................................................................................138

15. AI at the Edge ...................................................................................................... 141


1. State of the Art .............................................................................................................142
2. Major (Grand) Challenges ............................................................................................143
3. Advances in the Next Decade ......................................................................................145
4. Accelerating Development ...........................................................................................146
5. Expected Outcomes.....................................................................................................147
6. References ..................................................................................................................147

16. Facilities Integration and AI Ecosystem ............................................................ 149


1. State of the Art .............................................................................................................149
2. Major (Grand) Challenges ............................................................................................149
3. Advances in the Next Decade ......................................................................................152
4. Accelerating Development ...........................................................................................153
5. Expected Outcomes.....................................................................................................153

AA. Report Writing Team .......................................................................................... 155

AB. Agendas .............................................................................................................. 157


Argonne National Laboratory ...........................................................................................157
Oak Ridge National Laboratory ........................................................................................160
Lawrence Berkeley National Laboratory...........................................................................163
Washington, DC...............................................................................................................167

AC. Combined Town Hall Registrants ..................................................................... 171

AD. Abbreviations and Terminology ....................................................................... 197

AE. References .......................................................................................................... 201

Executive Summary
From July to October 2019, the Argonne, Oak Ridge, and Berkeley National Laboratories hosted a
series of four town hall meetings attended by more than 1,000 U.S. scientists and engineers. The
goal of the town hall series was to examine scientific opportunities in the areas of artificial
intelligence (AI), Big Data, and high-performance computing (HPC) in the next decade, and to
capture the big ideas, grand challenges, and next steps to realizing these opportunities.

In this report and in the Department of Energy (DOE) laboratory community, we use the term “AI
for Science” to broadly represent the next generation of methods and scientific opportunities in
computing, including the development and application of AI methods (e.g., machine learning, deep
learning, statistical methods, data analytics, automated control, and related areas) to build models
from data and to use these models alone or in conjunction with simulation and scalable computing
to advance scientific research.

The AI for Science town hall discussions focused on capturing the transformational uses of AI that
employ HPC and/or data analysis, leveraging data sets from HPC simulations or instruments and
user facilities, and addressing scientific challenges unique to DOE user facilities and the agency’s
wide-ranging fundamental and applied science enterprise.

The town halls engaged diverse science and user facility communities, with both discipline- and
infrastructure-specific representation. The discussions, captured in the 16 chapters of this report,
contain common arcs revealing classes of opportunities to develop and exploit AI techniques and
methods to improve not only the efficacy and efficiency of science but also the operation and
optimization of scientific infrastructure.

The community’s experience with machine learning (ML), HPC simulation, data analysis methods,
and the consideration of long-term science objectives revealed a growing collection of unique and
novel opportunities for breakthrough science, unforeseeable discoveries, and more powerful
methods that will accelerate science and its application to benefit the nation and, ultimately,
the world.

New AI techniques will be indispensable to supporting the continued growth and expansion of
DOE science infrastructure from ESnet to new light sources to exascale systems, where system
scale and complexity demand AI-assisted design, operation, and optimization. Toward this end,
novel AI approaches to experiment design, in-situ analysis of intermediate results, experiment
steering, and instrument control systems will be required.

DOE’s co-design culture involving teams of scientific users, instrument providers, mathematicians
and computer scientists can be leveraged to develop new capabilities and tools such that they can
be readily applied across the agency’s (and indeed the nation’s) diversity of instruments, facilities,
and infrastructure. This report captures some early opportunities in this direction, but much more
needs to be explored.

From chemistry to materials sciences to biology, the use of ML and deep learning (DL) techniques
opens the potential to move beyond today’s heuristics-based experimental design and discovery to
AI-enhanced strategies of the future.

Early use of generative models in materials exploration suggests that millions of possible materials
could be identified with desired properties and functions and evaluated with respect to
synthesizability. The synthesis and testing stages necessary for such scales will in turn rely on ML
and adaptive, autonomous robotic control of high-throughput synthesis and testing lines, creating
“self-driving” laboratories.

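A deliberately simplified sketch of this screening pattern, in Python with NumPy assumed, is shown below. The Gaussian density stands in for a learned generative model, and the property and synthesizability scores stand in for trained predictors, so all names, functions, and numbers here are illustrative only.

import numpy as np

rng = np.random.default_rng(0)

# Descriptor vectors of "known" materials (synthetic stand-in data).
known = rng.normal(loc=0.0, scale=1.0, size=(500, 8))

# Fit a simple density model to the known descriptors (proxy for a learned generative model).
mean, cov = known.mean(axis=0), np.cov(known, rowvar=False)

def predicted_property(x):
    # Hypothetical surrogate for a desired property (higher is better).
    return -np.sum((x - 0.5) ** 2, axis=1)

def synthesizability_score(x):
    # Hypothetical classifier output in [0, 1]; here a smooth stand-in.
    return 1.0 / (1.0 + np.exp(np.linalg.norm(x, axis=1) - 3.0))

# Sample a large batch of candidates and short-list those predicted to be both
# high-performing and likely synthesizable, for downstream synthesis and testing.
candidates = rng.multivariate_normal(mean, cov, size=100_000)
score = predicted_property(candidates)
keep = synthesizability_score(candidates) > 0.5
top = candidates[keep][np.argsort(score[keep])[-10:]]
print("short-listed candidates:", top.shape)
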
The same complexity challenge and concomitant need to move from human-in-the-loop to AI-
driven design, discovery, and evaluation also manifests across the design of scientific workflows,
optimization of large-scale simulation codes, and operation of next generation instruments.

Exascale systems and new scientific instruments, such as upgraded light sources and
accelerators, are increasing the velocity of data beyond the capabilities of existing instrument data
transmission and storage technologies. Consequently, real-time hardware is needed to detect
events and anomalies in order to reduce the raw instrument data rates to manageable levels. New
ML, including DL, capabilities will be critically important in order to fully exploit these instruments,
replacing pre-programmed hardware event triggers with algorithms that can learn and adapt, as
well as discover unforeseen or rare phenomena that would otherwise be lost in compression.

In recent years, the success of DL models has resulted in enormous computational workloads for
training AI models, representing a new genre of HPC resource demand. Here, the use of AI
techniques to optimize learning algorithms and implementation will be necessary with respect to both the energy cost of large-scale computation and the exploitation of new computing hardware architectures. AI in HPC has already taken the form of neural networks trained as surrogates to computational functions (or even entire simulations), demonstrating the potential for AI to provide non-linear improvements of multiple orders of magnitude in time-to-solution for HPC applications (and, correspondingly, reductions in their cost).
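
As a concrete, heavily simplified illustration of the surrogate idea, the Python sketch below (scikit-learn assumed) trains a neural network on sampled outputs of a placeholder simulation function and then uses it in place of that function; both the function and the sampling scheme are hypothetical stand-ins for an HPC code and its training-set design.

import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(1)

def expensive_simulation(x):
    # Stand-in for a costly first-principles calculation.
    return np.sin(3.0 * x[:, 0]) * np.exp(-x[:, 1] ** 2) + 0.1 * x[:, 2]

# Sample the simulation to build training data (done once, offline).
X = rng.uniform(-1.0, 1.0, size=(5000, 3))
y = expensive_simulation(X)

# Train the surrogate; at inference time it replaces the simulation call.
surrogate = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000, random_state=0)
surrogate.fit(X, y)

X_new = rng.uniform(-1.0, 1.0, size=(5, 3))
print("simulation:", np.round(expensive_simulation(X_new), 3))
print("surrogate: ", np.round(surrogate.predict(X_new), 3))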

Similarly, scientific infrastructure—accelerators, light sources, networks, computation and data resources—has reached scales and complexities that require the use of ML for tasks such as anomaly detection in operational data (e.g., for cybersecurity). Moving from today's fixed rules-based operating procedures to the use of AI algorithms that factor in real-time analysis will be indispensable for optimizing performance and energy use of increasingly complex, large-scale infrastructures. New DL methods are required to detect anomalies and optimize operating parameters, with additional potential to predict failures as well as to discover new optimization algorithms and novel mechanical or externally induced threats.
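
One minimal form such detection could take is a model trained to reconstruct normal operating data, flagging records it reconstructs poorly. The Python sketch below (scikit-learn assumed) uses synthetic telemetry and an illustrative threshold; a production system would use a deeper model and carefully validated alarm limits.

import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(2)

# Synthetic "normal" telemetry: correlated sensor channels (temperatures, flows, ...).
latent = rng.normal(size=(4000, 3))
normal = latent @ rng.normal(size=(3, 10)) + 0.05 * rng.normal(size=(4000, 10))

scaler = StandardScaler().fit(normal)
X = scaler.transform(normal)

# Train an undercomplete network to reproduce normal data through a narrow bottleneck.
autoencoder = MLPRegressor(hidden_layer_sizes=(3,), max_iter=1500, random_state=0)
autoencoder.fit(X, X)

def anomaly_score(records):
    z = scaler.transform(records)
    return np.mean((autoencoder.predict(z) - z) ** 2, axis=1)

threshold = np.percentile(anomaly_score(normal), 99)
suspect = 4.0 * rng.normal(size=(5, 10))        # uncorrelated, out-of-family records
print(anomaly_score(suspect) > threshold)        # expected to be mostly True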

The DOE computing facilities such as Summit, Perlmutter, Aurora, and Frontier will simultaneously support the development of existing large-scale simulations, new hybrid HPC models with AI surrogates, and the exploration of new types of generative models emerging from multimodal data streams and sources. Future systems envisioned over the next decade may need to support even richer workloads of traditional HPC and next-generation AI-driven scientific models.

AI will not magically address these and the other opportunities and challenges discussed in this
report. Much work will be required within all science disciplines, across science infrastructure, and
in the theory, methods, software, and hardware that underpin AI methods. The use of AI to design
and tune hardware systems—whether exascale workflows, national networks, or smart energy
infrastructure—will require the development and evaluation of a new generation of AI frameworks
and tools that can serve as building blocks that can be adapted and reused across disciplines and
across heterogeneous infrastructure. Bringing AI to any specific domain—whether it is nuclear
physics or biology and life sciences—will demand significant effort to incorporate domain
knowledge into AI systems, quantify uncertainty, error, and precision, and appropriately integrate
these new mechanisms into state-of-the-art computational and laboratory systems.

The overflowing attendance at the AI for Science town halls, the level of enthusiasm and the
engagement of attendees, the number of spontaneous AI projects throughout every scientific
discipline, and the commitment to growth in this area at the nation's premier laboratories all
combine to indicate that the DOE scientific community is ready to explore and further the
transformational potential of AI through 2030 and beyond.

Introduction: AI for Science
The AI for Science town halls brought together more than a thousand researchers from DOE National Laboratories, industry, and academia to identify opportunities for AI to impact the national science enterprise supported by DOE. The teams also outlined the research and infrastructure needed to advance AI methods and techniques for science applications.

Sixteen topical expert teams summarized the state of the art, outlined challenges, developed an AI roadmap for the coming decade, and explored opportunities for accelerating progress on that roadmap.

Important themes emerged for AI applications in science. For example, participants anticipate the use of AI methods to accelerate the design, discovery, and evaluation of new materials, and to advance the development of new hardware and software systems; to identify new science and theories within increasingly high-bandwidth instrument data streams; to improve experiments by inserting inference capabilities in control and analysis loops; to enable the design, evaluation, autonomous operation, and optimization of complex systems from light sources to HPC data centers; and to advance the development of self-driving laboratories and scientific workflows.

Important themes also emerged with respect to outlining the research needed to advance AI. For example, participants highlighted the need to incorporate domain knowledge into AI methods to improve the quality and interpretability of the models; the need to develop software environments to enable AI capabilities to seamlessly integrate with large-scale HPC models; and the need to automate the large-scale creation of "FAIR" (findable, accessible, interoperable, and reusable) data, given the central role of data in an AI-centric future science landscape.

Below, we briefly outline the principal findings of the main sections of the report.

Materials, Environmental, and Life Sciences
Chapters 1–3

Finding new materials, chemical compounds, and biological agents able to address contemporary challenges—for example, batteries with 10 times more storage capacity, materials that capture more solar energy at greater efficiency, and new drugs targeting emerging pathogens—is a grand challenge due to the nearly infinite chemical, biological, and atomic design spaces to which scientists have access. Such discovery requires pervasive AI-enabled automation, from experiment design to execution and analysis.

Projecting environmental risk and developing resiliency in a changing environment are central challenges to earth and environmental sciences, encompassing atmosphere, land, and subsurface systems along with their interdependencies. From large-scale observatories such as the Atmospheric Radiation Measurement (ARM) facility, AI methods will be essential to obtaining the data needed to refine complex earth and environmental systems models, and to developing new models with unprecedented fidelity and resolution. AI "at the edge"—where people and things meet—will enable autonomous observatories to detect anomalies and outliers, adapting instrument settings and algorithms to provide detailed measurement of events and conditions that would otherwise go unnoticed.

Biology and life sciences are at the vanguard of AI applications, for instance using population genomics data to learn the bases of complex traits and discovering or building workflows that automate the inverse design of microbial and plant cells. "Self-driving" laboratories will leverage new generative models and reinforcement learning to explore potential compounds for cancer drugs, evaluate their synthesizability, or model their response in target tumors.

Discovery and Data

Scientists have used computational approaches to explore materials and chemical compounds virtually, leveraging new data sources containing the simulated properties of millions of simple materials and chemical compounds. Deep learning approaches are being developed to explore more deeply inside vast molecular and biological design spaces. Molecular scientists are using AI to learn force fields to enable near-exact molecular dynamics (MD) simulations with fully quantized electrons and nuclei. Such analyses, intractable only a few years ago, must now be captured and advanced in the form of AI software toolkits and services.

Across the sciences, rapidly growing data sources can, in principle, be used to train ML models provided that the data can be "found, accessed, and are interoperable and reusable," or "FAIR." The use of DL and unsupervised learning for automatic labeling and reduction of data also needs to be captured as adaptable software services that can be applied to data sources ranging from environmental datasets at broad spatial and time scales, to instrument data from materials testing, to genomics data.

For life sciences, energy infrastructure sciences, and even national security, access is needed to protected, sensitive data. We must establish new infrastructure to enable shared use of data that cannot be moved or revealed due to privacy concerns. Similar challenges arise with respect to proprietary manufacturing, mobility, and private energy data.

Learning and Integrating Domain Knowledge

Today's computational learning frameworks are not yet able to realize the full potential of AI-enabled materials, chemical, environmental, and biological sciences. We need new AI methods that can both predict complex phenomena and provide insights into underlying processes. Such methods will be foundational to our capacity to design custom biological systems capable of addressing major global health and environmental challenges—that is, ultimately to "build life to spec." Here, as with materials design, AI-enabled, self-driving laboratories (through new automation and decision support services) can fuel game-changing advances in the understanding and deployment of biological, chemical, and environmental systems.

Self-Driving and Steering Laboratories

The most exciting discovery possibilities for emerging instruments, such as those for bio- or materials imaging, lie in going beyond today's human-in-the-loop experimentation and allowing embedded AI to evaluate results and steer experiments.

AI-assisted management and control of research labs, instruments, facilities, experiments, and workflows can help achieve a variety of goals, for instance by adapting workflows in response to new hypotheses generated during workflow execution, scheduling resources for more efficient use of facility hardware, and dramatically reducing the total cost of operating facilities.

Experimental science is moving rapidly toward more frequent online analysis and adaptation. In "self-driving" laboratories, AI can be used not only for analysis and hypothesis generation, but also to act on intermediate results, adapting to new data by adjusting experimental parameters or laboratory processes toward specific goals, such as protecting resources, maximizing the data gathered related to a specific phenomenon, or following up on surprising or anomalous results.
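
One very reduced illustration of such a steering loop is sketched below in Python (NumPy assumed). The measure() call is a hypothetical stand-in for an instrument acquisition; the quadratic response model, thresholds, and parameter range are illustrative rather than prescriptive.

import numpy as np

rng = np.random.default_rng(3)

def measure(setting):
    # Hypothetical instrument response with noise; true peak near setting = 0.7.
    return np.exp(-((setting - 0.7) ** 2) / 0.02) + 0.05 * rng.normal()

settings = [0.1, 0.5, 0.9]                      # initial scouting measurements
signals = [measure(s) for s in settings]

for step in range(10):
    # Refit a simple response surface to everything measured so far.
    coeffs = np.polyfit(settings, signals, deg=2)
    grid = np.linspace(0.0, 1.0, 201)
    proposal = float(grid[np.argmax(np.polyval(coeffs, grid))])

    reading = measure(proposal)
    predicted = float(np.polyval(coeffs, proposal))
    if abs(reading - predicted) > 0.3:          # surprising result: follow up immediately
        print(f"step {step}: anomalous reading at {proposal:.2f}, repeating measurement")
        reading = measure(proposal)

    settings.append(proposal)
    signals.append(reading)

print("final setting:", round(settings[-1], 3), "signal:", round(signals[-1], 3))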

AI-guided self-driving laboratories are envisioned that can automate the design, synthesis, and evaluation of materials and increase the pace of discovery by orders of magnitude.

AI in HPC

Multi-scale models are needed to understand the underlying systems affecting phenomena associated with the growing global demand for fuel, food, water, and predictable weather. AI technologies can reveal the emergent controls of these enormously complex environmental, plant, and microbial biosystems, enabling us to engineer our environment, for instance to expand the range of arable lands while improving water availability and quality. In order to enable such discovery capabilities, we must not only improve the performance and quality of HPC models (e.g., using ML surrogates) but also make it possible to build generative models from diverse observations (e.g., time series measurements) and computational simulations. This will need to be aligned with AI-based inverse problem solvers, such as for image-to-phase or waveform-to-source problems, to explore novel geoengineered solutions.

Such simulation models represent another domain where AI is already showing transformative results. Time-to-solution of modeling systems and the associated reduction in computational needs (and associated energy use) can be improved by combining data-informed AI approximations with physical principles for earth systems, ecosystems, soil microbiology, watershed, and other models. The use of such AI "surrogate" functions will require robust, explainable AI methods for training and validating hybrid models, and the integration of uncertainty quantification into AI workflows.
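
A minimal sketch of one way to attach uncertainty estimates to such a surrogate, assuming Python with scikit-learn, is shown below: a small ensemble trained with different random seeds supplies a spread that can gate whether the surrogate's output is trusted inside a hybrid model. The target function and test points are synthetic stand-ins.

import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(4)
X = rng.uniform(-1.0, 1.0, size=(2000, 2))
y = np.sin(4.0 * X[:, 0]) + X[:, 1] ** 2        # stand-in physics quantity

ensemble = [
    MLPRegressor(hidden_layer_sizes=(64,), max_iter=2000, random_state=seed).fit(X, y)
    for seed in range(5)
]

def predict_with_uncertainty(x):
    preds = np.stack([member.predict(x) for member in ensemble])
    return preds.mean(axis=0), preds.std(axis=0)

inside = np.array([[0.2, -0.3]])                # within the training domain
outside = np.array([[3.0, 3.0]])                # extrapolation: expect larger spread
for name, point in [("interpolation", inside), ("extrapolation", outside)]:
    mean, spread = predict_with_uncertainty(point)
    print(f"{name}: prediction={mean[0]:.3f}, ensemble spread={spread[0]:.3f}")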

A secure environment for objective benchmarking of AI algorithms against community consensus metrics is needed to detect, monitor, and possibly correct dataset biases or inconsistent AI performance. Foundational technologies are needed to promote a rigorous statistical framework to monitor for potential biases or inaccuracies in collected data, and to monitor AI performance to confirm robust performance or identify performance gaps. These topics are detailed in Foundations, Software, Data Infrastructure, and Hardware (page 11).

High-Energy, Nuclear, and Plasma Physics
Chapters 4–6

In cosmology, high-energy physics, fusion, and nuclear physics, the next decade will bring new, enormous, and rich data sets from new light sources, accelerators, tokamak facilities, and advanced survey telescopes, unparalleled in depth and resolution at the observed scales. These observations will be combined with exascale-enabled simulations modeling structure formation in unprecedented detail to enable major scientific advances. ML, including DL, techniques will be crucial in the analysis of multi-spectral observational data sets. "AI-in-HPC" approaches to simulation that use fast AI-based surrogates will allow the reconstruction of the history of the universe from the Big Bang until today at unprecedented fidelity, from the largest scales down to our own galaxy.

The multiscale, highly correlated, and high-dimensional nature of the physics of the nuclear force also leads to a rich set of phenomena in nuclear physics. AI techniques offer the possibility of increased understanding and new discoveries via DL analyses of light source experimental data, especially given recent and planned upgrades and the resulting increased data volumes and rates.

Fusion scientists look to AI/ML techniques for breakthroughs ranging from maximizing predictive understanding of fusion plasmas and the burning plasma state to enabling real-time control in long-pulse tokamak experiments, and ultimately AI-in-the-loop plasma prediction and control solutions necessary for sustained, safe, and efficient fusion power plant operation.

Discovery and Data

In coming years, the global high-energy physics community will deploy AI-controlled, city-size scientific instruments (particle accelerators and particle detectors) that produce zettabytes of data. Similarly, high-bandwidth streams will come from new survey telescopes, upgraded light sources, and tokamak experiments. AI-powered hardware will be required to filter detector data in microseconds. AI inference systems trained by data and simulations of detector response will be needed to enable high-precision studies, while unsupervised AI-based searches for anomalies and rare events, indeed even for "New Physics," will open new windows for discovery.

Learning and Integrating Domain Knowledge

AI methods are critically important if we are to fully exploit data from new or upgraded large-scale instruments and complex experiments—facilitating the collection, evaluation, and analysis of metadata; improving data reduction and documentation of experimental conditions; and facilitating data interoperability.

To achieve such capabilities across diverse instruments, we must create usable tools for the large-scale training and optimization of ML models, training methodologies that can detect rare features in high-dimensional spaces, and tools to quantify the impact of systematic effects on the accuracy and stability of complex ML models. However, one of the obstacles to applying data science to hypothesis generation and experimental design is the availability (to the general community) and the lack of uniformity of data. A significant need in the coming decade will be to develop ML methods to automatically annotate and structure data from computational models and experimental facilities such as the international ITER tokamak and upgraded light sources such as the Advanced Photon Source (APS) and the Advanced Light Source (ALS).

Designing and Steering Experiments

The introduction of ML and AI into the scientific process for hypothesis generation and the design of experiments promises to significantly accelerate the scientific process by automating and accelerating the development of models and the testing of hypotheses. For this to become reality, domain knowledge must be integrated into ML models, moving beyond current models that are either purely data-driven or that incorporate only simple algorithms, laws, and constraints. ML techniques that combine theoretical and data-driven models in hybrid systems that better represent the underlying dynamics specific to phenomena will be especially key.

Across experimental sciences, AI-aware experimental design, construction, and operation of scientific instruments offer transformative improvements. For detectors and accelerators, the use of reinforcement learning (RL) will both reduce beam generation times and improve the quality of beams delivered to end stations. Improving particle tracking will also rely on ML techniques, but these techniques must be sufficiently validated to ensure the tracking performs on data in the energy region of interest. AI-centric workflows using deep neural networks (DNNs) trained by detector signals will improve our ability to distinguish event candidates from background data.
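
The signal-versus-background idea can be illustrated very simply, assuming Python with scikit-learn. Here synthetic features stand in for detector-level quantities, and in practice the training labels would come from simulations of the detector response.

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(5)

n = 20_000
background = rng.normal(loc=0.0, scale=1.0, size=(n, 6))    # stand-in background events
signal = rng.normal(loc=0.6, scale=1.0, size=(n, 6))        # stand-in candidate events
X = np.vstack([background, signal])
y = np.concatenate([np.zeros(n), np.ones(n)])

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
classifier = MLPClassifier(hidden_layer_sizes=(64, 64), max_iter=300, random_state=0)
classifier.fit(X_train, y_train)

scores = classifier.predict_proba(X_test)[:, 1]
print("ROC AUC on held-out events:", round(roc_auc_score(y_test, scores), 3))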

AI algorithms have demonstrated powerful anomaly detection capabilities and will also provide the necessary performance for intelligent instrument operation and experiment-steering. ML inference with microsecond latency will be required to support particle physics trigger applications in large detectors and associated event processing operations.

The use of AI for real-time experiment-steering will increasingly become indispensable, whether for light source instruments or tokamak experiments, and will become equally critical for orchestrating the coupling of cosmological models with the steering mechanisms of a new generation of multi-spectral telescopes.

Engineering, Instruments, and Infrastructure
Chapters 7–9 and 14–16

Terms such as "smart manufacturing" and "digital twins" reference transformational approaches for expanding optimization to include an entire manufacturing lifespan, from raw materials to shape/topology to manufacturing process to end use. Concurrently, AI has been used in generative design, a two-step iterative process based on design goals that first generates possible outputs that meet specified constraints and then allows a designer to tune variables to meet constraints. Generative adversarial networks are often used to drive the underlying optimal design.

The nation's energy infrastructure is moving increasingly from traditional loads (non-digital, invisible) to many more and smaller loads that expose data (are visible) and have communication and intelligence features amenable to a cooperative load-management approach. Combined with increasingly intelligent energy distribution and generation infrastructure, the complexity, nonlinearity, and emergent behaviors of these systems will require AI-enabled, distributed and cooperative configuration, optimization, threat detection and avoidance, and control.

Designing and Steering Infrastructure

Just as AI will enable breakthroughs in automation (such as designing experiments, self-driving laboratories, or steering instruments), it will make it possible for the same techniques to be applied to designing and operating complex infrastructure. From electrical generation to transmission to distribution systems, increasingly powerful sensors—with edge computation enabling AI in-situ for anomaly detection, predictive analytics, and controls/optimization—will improve resilience as well as restoration by enabling predictive capabilities of after-event states and sharper awareness during the restoration process. AI-driven, real-time intelligence in this context can perform information fusion from disparate sources, coupling real-time infrastructure data with infrastructure models (e.g., a "digital twin"). Similarly, AI/ML-enabled predictive models trained by infrastructure data will be indispensable for exploring the design spaces for smart energy—as well as transportation—infrastructure, HPC computing systems and data centers, and communications networks.

In similar fashion, particle accelerators, light sources, and complex instruments such as ITER comprise many interconnected subsystems of magnets; mechanical, vacuum, and cooling equipment; power supplies; and other components. These instruments have thousands of control points and require high levels of stability, making their operation a complex optimization problem. The operation of these instruments has benefited from AI/ML-based solutions but remains extremely difficult due to the lack of a priori models for reliable and safe control. In the absence of such models, learning models based on raw data and other AI/ML-based solutions have been explored, with promising results.
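
A toy version of learning a control model directly from operational logs, assuming Python with NumPy and scikit-learn, might look like the following. The simulated plant, logged records, and setpoint are synthetic stand-ins, and a real deployment would require far more careful validation before being trusted with control.

import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(6)

# Unknown "true" plant, used here only to generate a synthetic operations log.
A_true = np.array([[0.95, 0.05], [0.00, 0.90]])
B_true = np.array([0.1, 0.2])

states, controls, next_states = [], [], []
x = np.zeros(2)
for _ in range(5000):
    u = rng.uniform(-1.0, 1.0)
    x_next = A_true @ x + B_true * u + 0.01 * rng.normal(size=2)
    states.append(x)
    controls.append([u])
    next_states.append(x_next)
    x = x_next

# Fit x_next ~ f(x, u) from the raw log; no a priori model of the subsystem is assumed.
features = np.hstack([np.array(states), np.array(controls)])
model = Ridge(alpha=1e-3).fit(features, np.array(next_states))

def choose_control(state, setpoint, candidates=np.linspace(-1.0, 1.0, 41)):
    # One-step lookahead: pick the action whose predicted next state is closest to the setpoint.
    trials = np.hstack([np.tile(state, (len(candidates), 1)), candidates[:, None]])
    predicted = model.predict(trials)
    return candidates[np.argmin(np.linalg.norm(predicted - setpoint, axis=1))]

print("suggested control:", choose_control(np.array([0.5, -0.2]), np.array([0.0, 0.0])))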

Even smaller scales, such as manufacturers of limited-volume batches of materials and those that produce many variants of similar designs for customized products, are limited by mere automation with heuristics-based operational rules on robotic assembly lines. As with self-driving laboratories, this widespread class of manufacturing must move to robotics with AI-at-the-edge to perform tasks autonomously (in similar fashion, as noted earlier, with respect to self-driving laboratories or remote observatories).

These data-driven methods for control-level modeling, management, and interpretation of real-time data for control, optimal trajectory determination, and real-time prediction to support continuous and asynchronous actions and prevent faults will also accelerate the development of approaches to the operation of new types of infrastructure such as fusion power plants.

DOE also operates instruments with components distributed over distances of hundreds of kilometers (e.g., ESnet or the ARM facility). Moving to autonomy and adaptive measurement makes the current practice of centralized control intractable. Whether in laboratory experiment lines, on city-sized accelerator facilities, or for continental-scale infrastructure, AI will be needed to support infrastructure as autonomous, self-tuning, and self-healing complex systems with emergent properties and non-linear behavior, relying on AI-at-the-edge due to complexity as well as latency and data communications bandwidth.

Commercial AI hardware and system-on-chip (SoC) systems also have a key role to play, given DOE's billions of dollars of investment in experimental facilities. Ultra-low latency and low-power inference for scientific experimental control in these facilities can enable more complex, intelligent experiments and more efficient operation. Again, co-design and overall system architecture are critical, as even the most time-sensitive commercial applications fueling the AI hardware industry, such as autonomous driving, require millisecond response, while DOE instruments such as electron microscopes and light sources can require responses in the 100 nanosecond range—over 100,000 times faster.

All of DOE's current scientific facilities—ESnet, exascale machines, the continentally distributed ARM facility, individual light sources, data sources from field-deployed sensors, and instrument and HPC data repositories—have been designed for traditional scientific workflows. Every link in this chain, from data portals and networks to edge systems, HPC resources, and input/output (I/O) systems, must evolve to support the new demands of AI applications and workflows.

Infrastructure Security

As critical infrastructures increasingly rely on information systems, AI applications will offer the best approach to detecting and diagnosing cyber and physical attacks and threats in real time. Removing the human-in-the-loop is increasingly necessary for defensive responses on the same millisecond timescales as digital attacks. Here AI can offer novel techniques, including surrogate models, closure models, and learning-driven compute acceleration of high-fidelity models and solvers.

AI in HPC

As noted above, use of AI surrogates within HPC models has the potential to improve time-to-solution by orders of magnitude, albeit replacing first-principles functions with approximations. AI-based surrogate models can play at least three roles in manufacturing systems, including a priori optimization, in situ real-time process control, and heterogeneous manufacturing through the transfer of AI models between different devices and/or feedstocks.

With infrastructure and manufacturing, surrogate models could form the basis for digital twins that guide design and operation. Determining the best AI techniques to generate and validate surrogates that are robust and have minimal bias will be important, along with research to explore, for at least several exemplar manufacturing and infrastructure processes, the optimal type and quantity of data to improve design optimization.

Surrogates must also incorporate an understanding of emergent behaviors of interacting AI agents while capturing the multi-physics of complex infrastructure and energy systems, learning from the combination of measured data and physics- or model-based simulation data for rapid prediction. Critically, new methods for validating and testing AI-based models, controls, and optimization will be required in order to entrust critical services to their control, with verifiable trust being as important as the capabilities themselves.

AI-Driven Leadership Computing

The DOE's Aurora, Frontier, and Perlmutter architectures are already designed to optimize for AI workflows. Follow-on systems at the LCFs and NERSC will have upgrades and enhancements informed by the new AI services, workflows, and toolkits discussed throughout this report. Aligning the future path-forward efforts with the development of these new capabilities will be critical to enabling an AI-based instrument approach to future infrastructure and experiment design.

Instrument-to-Edge

Existing large-scale instruments, upgrades such as those to APS or ALS, and new instruments such as ITER all share the need for AI services that can exploit their capabilities and plumb the unprecedented volumes of data they produce. The envisioned AI-based services and toolkits described earlier will have the most impact if undertaken in concert with an "Instrument-to-Edge" hardware and software infrastructure that is developed and incrementally deployed to grow a common control and analysis architecture across DOE's major instruments. Experiments using these instruments will rely on these new AI design, optimization, and control services while also providing data that new AI-based services can use to create and refine generative models that can guide the optimization and safe operation of the instruments themselves.

Foundations, Software, Data Infrastructure, and Hardware
Chapters 10–13

As noted throughout the disciplinary, engineering, and infrastructure discussions, research and infrastructure are needed to advance AI methods and techniques to address the complex challenges of using AI to advance science discovery. It is recognized that research is needed in areas such as processor and memory design, mathematical AI foundations, software environments, data infrastructure, and hardware.

Training Models

The core of any ML-based AI system is the creation of an abstract model, and the training of that model is based on data. Data-efficient learning in ML systems must be studied with respect to algorithms and efficiency of implementation, and especially with respect to exploiting new architectures, whether through the use of AI-oriented, reduced-precision accelerator hardware or novel computing systems (e.g., quantum, neuromorphic) and associated programming paradigms.

Today's approaches to ML and AI are generally domain-agnostic, ignoring domain knowledge that extends far beyond the raw data itself. For example, current approaches ignore physical laws, available forward simulations, and established invariances and symmetries. Incorporating modeling and simulation capabilities to generate use-case-specific training data leverages decades of HPC improvements to accelerate learning; incorporating mathematical equations and scientific literature leverages centuries of advances in theory.
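
One common way to fold such knowledge into training is to penalize violations of a governing equation alongside the misfit to data. The sketch below, assuming PyTorch is available, fits sparse observations of a decaying quantity while enforcing du/dt = -k*u at collocation points; the equation, data, and network are illustrative stand-ins rather than a recommended formulation.

import torch

torch.manual_seed(0)
k = 1.5  # known rate constant in the assumed physical law du/dt = -k * u

net = torch.nn.Sequential(
    torch.nn.Linear(1, 32), torch.nn.Tanh(),
    torch.nn.Linear(32, 32), torch.nn.Tanh(),
    torch.nn.Linear(32, 1),
)

# Sparse, noisy observations of the true solution u(t) = exp(-k t).
t_obs = torch.tensor([[0.0], [0.5], [1.5], [2.0]])
u_obs = torch.exp(-k * t_obs) + 0.01 * torch.randn_like(t_obs)

optimizer = torch.optim.Adam(net.parameters(), lr=1e-3)
for step in range(3000):
    optimizer.zero_grad()

    data_loss = torch.mean((net(t_obs) - u_obs) ** 2)

    # Physics residual evaluated at unlabeled collocation points.
    t_col = torch.rand(64, 1) * 2.0
    t_col.requires_grad_(True)
    u_col = net(t_col)
    du_dt = torch.autograd.grad(u_col.sum(), t_col, create_graph=True)[0]
    physics_loss = torch.mean((du_dt + k * u_col) ** 2)

    loss = data_loss + physics_loss
    loss.backward()
    optimizer.step()

print("u(1.0) ~", net(torch.tensor([[1.0]])).item(), "exact:", torch.exp(torch.tensor(-k)).item())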

New AI Hardware and Systems Components

There is an explosion of new AI hardware in industry; however, the target applications driving these devices largely comprise consumer or enterprise areas such as autonomous driving, social networks, e-commerce, and gaming. As evidenced in DOE's Exascale Computing Project, there are significant opportunities to co-design heterogeneous compute nodes that leverage these new architectures and commodity SoC ecosystems.

A set of integrated new AI workflow frameworks and exemplar applications will be needed to evaluate emerging AI architectures from edge SoCs to HPC data centers. This would effectively create both an evaluation tool set and a simultaneous series of specific science-based challenges to drive and shape new AI technologies, including those that fuse explicit knowledge and learned function.

Programming Models and Workflows

The design of next-generation hardware and software systems—from new chips to entire HPC systems—and the mapping of application codes to target systems is currently a static process that involves human-in-the-loop design with repeated experiments, modeling, and design space exploration. As these systems increase in complexity and heterogeneity, current strategies will be impractical.

Early work demonstrating systems and workflows that integrate AI capabilities with traditional HPC simulation has largely involved bespoke capabilities for each experiment. The frameworks, software, and data structures are distinct, and APIs do not exist that would enable even simple coupling of simulation and modeling codes with AI libraries and frameworks. In situ data analysis requiring ML capabilities suffers from the same limitations.

To fully realize capabilities ranging from self-driving laboratories to AI-designed, implemented, and operated scientific workflows, new programming and run-time models must also be developed. For example, scientists might ideally describe workflows as high-level goals and itemize building-block tasks (i.e., experiments, simulations) and rough models of the costs of those tasks. An AI system could then generate a specific workflow, incorporating expert knowledge, to accomplish those tasks, adapting as results are uncovered or new data become available and refining the models of costs (e.g., in energy use or time). Such workflows will need to operate across orders-of-magnitude variations in communications latency and bandwidth and in computational power and storage, especially in cases of specialized edge devices designed for low-power deployments in the field. These programming frameworks will need to provide resource discovery, matching, negotiation, and complex optimizations of these new forms of heterogeneous distributed computing infrastructure, including the integration of inference on low-power edge systems with iterative learning systems within a few milliseconds of the edge (e.g., in 5G telecommunications stations) and deep learning in data centers.
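
One possible, deliberately simplified shape for such a goal-level description is sketched below in Python. The task names, cost figures, and the trivial planner are illustrative placeholders for the richer, adaptive AI planning described above.

from dataclasses import dataclass, field

@dataclass
class Task:
    name: str
    kind: str                     # "experiment" | "simulation" | "analysis"
    est_cost_node_hours: float    # rough cost model supplied by the scientist
    depends_on: list = field(default_factory=list)

@dataclass
class Workflow:
    goal: str
    tasks: list

def plan(workflow):
    """Order tasks respecting dependencies, preferring cheaper ready tasks first."""
    done, ordered, remaining = set(), [], list(workflow.tasks)
    while remaining:
        ready = [t for t in remaining if all(d in done for d in t.depends_on)]
        nxt = min(ready, key=lambda t: t.est_cost_node_hours)
        ordered.append(nxt)
        done.add(nxt.name)
        remaining.remove(nxt)
    return ordered

wf = Workflow(
    goal="identify stable candidate alloys below a target formation energy",
    tasks=[
        Task("screen_candidates", "simulation", 200.0),
        Task("dft_refinement", "simulation", 1200.0, depends_on=["screen_candidates"]),
        Task("synthesize_top_10", "experiment", 50.0, depends_on=["dft_refinement"]),
        Task("xrd_analysis", "analysis", 5.0, depends_on=["synthesize_top_10"]),
    ],
)

for task in plan(wf):
    print(f"{task.kind:10s} {task.name:20s} ~{task.est_cost_node_hours} node-hours")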
AI Foundations

AI presents a unique opportunity for creating data-driven surrogate models that are potentially orders of magnitude faster to run than first-principles simulation codes and that can be particularly effective in the ability to simulate physical processes that span many spatial and temporal scales. Rigorously understanding tradeoffs such as generalization limits, proofs of interpolation/extrapolation, robustness, assessment of confidence associated with predictions, and effects of the input data will impact not only model selection in AI systems, but also the creation and investigation of new classes and types of models.

At the most basic level, frameworks and tools are needed to establish that a given problem is effectively solvable by AI/ML methods and is not subject to limits such as extreme complexity, unbounded problems, or explainability. Principles of theoretical computer science provide a rigorous framework to establish critical properties of AI/ML codes, namely computability, learnability, explainability, and provability.

To become an accepted part of the toolboxes used by scientists and engineers, the validity and robustness of AI techniques need to be trusted. What are the limits of AI techniques, and what assumptions and circumstances can lead to establishing assurance of AI predictions and decisions? Which AI techniques can best address different sampling scenarios and enable efficient AI on various computing and sensing environments? Resulting AI systems must similarly address assurance: whether and when an AI model can be trusted. Why does the AI model work for a problem? What are the internal representations of data that the AI model has learned during training? How can the behavior of the AI model be explained? How confident are the AI models on their predictions given the different sources of uncertainties and inductive biases involved? For such an AI model to be accepted as a well-characterized tool for science, the research community will need to address these questions and develop advanced capabilities to explain the behavior of the AI model.

Especially for systems operating experiments, instruments, or critical infrastructure, validation is vital regardless of whether the AI model is making the right decision for the right reason. Has the AI model learned spurious correlations, or can the model determine the control variables? Can AI be used to identify causal variables or distinguish between cause and effect? Typically this cannot be done with a single training dataset. Instead, the AI model needs to be trained to construct a hypothesis, typically a counterfactual one, and to design an experiment—including the collection of data (and the suitability of that data)—to test that hypothesis.

Opportunities exist for fundamental advances in optimization algorithms, differentiation techniques, and models—foundational to training in AI. Additionally, an important aspect in the development and application of AI is the quantification of uncertainties. Where AI and ML are used in physics-based applications, established approaches to UQ are applicable. In other cases, particularly in classification problems, ML models tend to be highly nonlinear systems that are extremely sensitive to input data, and small (e.g., undetectable to the human eye) changes can lead to misclassification.

Addressing the computer science challenges will require a comprehensive AI/ML science program to develop and refine foundational limits and solvable problems and to sharpen the solutions for solvable classes to ensure effective computation, performance guarantees and explanations. This is an urgent issue, as work on the foundations of AI and ML has been far outpaced by the empirical exploration and use of such techniques—often in the form of bespoke systems with disparate architectures. Consequently, the principles underlying the use and understanding of these and other
techniques tend to be scattered across disciplines, from theoretical computer science to signal processing to statistics.

Discovery and Data

Accelerating science, engineering, and manufacturing through AI methods requires large and diverse sources of data. At the same time, AI may hold the key to the limitations associated with that data. That is, applying data sources—from instruments, simulations, sensor networks, satellites, the scientific literature, and research results—is inherently challenging with respect to data being “FAIR” (findable, accessible, interoperable, reusable). AI systems can be employed to automate the creation of FAIR data and integrate it into knowledge repositories, in turn providing the architectural basis for new data infrastructure necessary to accelerate AI training and model development.

This high-volume data acquisition not only extends the end-to-end experimentation time but also limits experiments with time-sensitive phenomena. Smart data reduction techniques (e.g., filtering relevant data or point-of-interest data acquisition) will be necessities rather than features with the upcoming instruments such as those mentioned earlier.

Data produced by instruments, manufacturing systems, or engineered products (e.g., vehicles) often cannot be shared due to regulations (e.g., medical records or energy usage data) or the competitive nature of the data (e.g., factory or mobility data). AI-based federated learning techniques can accelerate model development, for instance, by harnessing proprietary manufacturing data from multiple sources. These techniques enable the development and training of models with data from many sources without requiring data sharing among them.

AI-based data services that leverage success to date of new DL and unsupervised learning techniques are vital to designing and operating increasingly large scale, complex infrastructure. These services will, in turn, require AI-based functions that can integrate and augment multimodal data sources including metadata, such as scientific instrument responses (e.g., flux and focus) in combination with a record of instrument configurations (e.g., motor positions, neutron chopper phases, monochromator bending parameters), and measurable instrument and environmental parameters (e.g., ring current, cooling water flow, and temperature). The integrated data will underpin AI services for developing generative models and decision-making functions that will be required to build advanced predictive models of accelerators, end stations, and sample delivery systems. Such services and models will also aid in automated alignment and calibration of instruments, stabilizing user operations, predicting and preventing catastrophic failures, and/or reducing the total downtime of the instrument.

While the infrastructure and methods needed to enable AI methods to access, learn from, and add to the broad body of knowledge are nascent, there are promising examples, such as the use of reinforcement learning, unsupervised learning, and classification techniques to automate labeling and creation of metadata.

Conclusions

Realizing the scientific capabilities discussed throughout this report will require extensive co-design work for domain scientists, facility designers, AI experts, mathematicians, computer scientists, and software research teams. Across the 16 chapters are scientific requirements that suggest a suite of new AI-capability building blocks and services, from design to control, augmented simulations to generative models, decision making to inverse problems, and the ability to learn not only from multimodal data (e.g., text, graphics, images,
waveforms, structured, time series) but from the domain knowledge embodied in the scientific literature.

To achieve the grand challenge of developing self-improving and self-adaptive hardware-software systems and applications, the services, applications, and software infrastructure must be both grounded by mathematical and AI foundations research and also implemented, evaluated, and adjusted over the coming decade. While this report is not a detailed implementation plan, we can see possibilities for how to accelerate the opportunities identified by the community. One potential path is to partner with industry along at least two roadmaps.

The first is an “Instrument-to-Edge” activity that charts the course toward common tools and services for instrument, experiment, and infrastructure design, evaluation, optimization and steering, and safer operation across the DOE enterprise.

The second entails continuing efforts to advance a leadership computing, data, and analysis infrastructure that fully exploits and optimally supports new, AI-enabled software, data lifecycle, workflow, and modeling services and toolkits.

DOE’s programmatic approaches, such as co-design or SciDAC programs, are ideal for developing the new AI services—packaged and supported as reusable toolkits and building blocks—that are required for self-driving laboratories and for steering scientific instruments. AI-based components including design, decision-making and evaluation, control and optimization, or the creation of generative models from instrument data and simulations are necessary to move from “AI has potential for…” to “AI is enabling…”.

International leadership in AI over the coming decade will hinge on an integrated set of programs across four interdependent areas—new applications, software infrastructure, foundations, and hardware tools and technologies—feeding into and informed concurrently by DOE’s scientific instrument facilities and by DOE’s leadership class computing infrastructure.
01. Chemistry, Materials, and Nanoscience
The ability to design and refine materials and chemical compounds has always been key to the rapid advancement of society’s technology and infrastructure. Today’s complex technologies impose a broad spectrum of requirements when developing and optimizing materials and chemicals with desired performance [1–3], such as mechanical, electronic, optical, and magnetic properties (e.g., smartphones use up to 75 different elements compared to the twentieth-century version that had only ~30). This new level of technological complexity, combined with the need to search undiscovered areas of the chemical and materials landscape without clear theories or synthesis directions [4], requires new paradigms that utilize artificial intelligence (AI).

AI will become an integral part of a scientist’s arsenal, alongside pen and paper, and experimental and computational tools. It will accelerate the next scientific discoveries and the design and development of revolutionary technologies benefiting society. AI will identify both promising materials and chemicals, and the reaction pathways to make them [5]. Scientists will use AI to generate scientific data in a rational way, formulating new physical models and theoretical insights that drive new paths for rational design of materials and chemicals, and exploring atomic design spaces currently unimaginable.

1. State of the Art

Our ability to discover new materials and chemical reactions is driven by intuition, design rules, models, and theories derived from scientific data generated by experiments and simulation. The number of materials and chemical compounds that can be derived is astronomical, so finding the desired ones can be like looking for a needle in a haystack. Currently, various machine learning (ML) approaches are used to help scientists explore complex information and data sets with the goal of gaining new insights that lead to scientific discoveries. Future discoveries of advanced materials could be greatly accelerated through ML. Note, for example, the timeline from discovery of LiMn2O4 to nickel-manganese-cobalt (NMC) materials for batteries. Using known data, we could use ML to accelerate discovery of new material classes for batteries from 14 years to less than 5 years (Figure 1.1).

Figure 1.1 Timeline from discovery of LiMn2O4 to NMC materials for batteries.

Nowadays, experimental characterization tools routinely provide picometer/picosecond resolved images at an ever-increasing rate, and, when coupled with a modern camera, are capable of providing several hundreds of frames per second. This pushes the data size into the several hundreds of terabytes (TB) per experiment for a single microscope [6]. Real-time analysis of this data, aided by AI, is
needed to provide rapid feedback to and from models and simulations that can both inform and validate decisions. Such rapid feedback would also enable experimental adjustments on the fly. Progress has begun to address two major gaps in the current paradigm of materials design and discovery that typically proceeds via synthesis ⇒ characterization ⇒ theory.

First, continuous growth in high-performance computing (HPC) capabilities, combined with the development of efficient and scalable electronic structure calculation methods, is enabling scientists to virtually explore materials and chemical compounds. Large databases have come online containing the simulated properties of millions of relatively simple materials and chemical compounds. Deep learning (DL) approaches are being developed for various tasks, such as predicting properties or structure, but this barely scratches the surface of the full atomic design space available to us. Even more, the real world is far more complicated than the simple structures often studied by electronic structure calculations, and simulations investigating systems under device-relevant conditions are still prohibitively expensive. Advances are needed in reliable and precise computational techniques that accurately (and rapidly) address the increasingly complex functionalities required for today’s technological applications.

Second, significant progress has been made toward fully exploiting all of the information contained in experimental and computational data to predict and understand new materials. An example is the automated image analysis and recognition based on DL networks that was successfully developed to identify and enumerate defects, and that created a library of (meta) stable defect configurations (Figure 1.2). The electronic properties of the sample surface were further explored by atomically resolved scanning tunneling microscopy (STM). Density functional theory (DFT) was used to estimate the STM signatures of the classified defects from the created library, allowing for the identification of several defect types across multiple imaging platforms. This approach now allows automatic creation of defect libraries in solids, explores the metastable configurations that are always present in real materials, and provides correlative studies with other atomically resolved techniques that can provide comprehensive insight into defect functionalities.

It is this integration and analysis of multiple, complex data sources combined with current state-of-the-art ML approaches that holds great promise for a drastic acceleration of materials and chemical compound discovery.

2. Major (Grand) Challenges

Finding new materials or chemical compounds that have unique properties needed for real-world applications—for example, batteries that hold 10x the storage capacity compared to today’s batteries, or materials that capture more solar energy at greater efficiency—is a grand challenge due to the nearly infinite chemical or atomic design space to which scientists have access. To date, our modern chemical and materials synthesis and discovery process incorporates a wide range of design rules and theories, alongside advanced characterization tools capable of observing synthesis processes on the size and time scales at which they occur. At the same time, high-throughput screening via theory-driven approaches, per the materials genome, has provided guidance in identifying promising candidates optimized for particular properties. Early work in ML shows the potential for AI to start to provide guidance on the synthesis pathways to make a material or chemical. The underlying grand challenge as outlined by the Basic Energy Sciences Advisory Committee (BESAC) is how to design and perfect atom- and energy-efficient synthesis of revolutionary new forms of matter with tailored properties. This requires us to explore materials and chemical compound compositions that are entirely unknown, driving questions such as, where in our atomic design space do we look? How do
we search the chosen space in the most efficient way or decide to move on to other areas? Can we develop new design rules? Aiding this would be the ability to understand the length- and time-scale evolution of functional chemical and materials systems.

Figure 1.2 A scanning transmission electron microscope (STEM) images materials where there are defects present or intentionally induced by the electron beam in the STEM. DL via convolutional neural networks is used to process the data to recognize and categorize defects. These data are populated into a database hosted by CITRINE Informatics. DFT calculations via HPC are used to predict STM images for the different defect classes, which then are used to train the DL in a similar fashion to the STEM, and then deposited into the database [7].

The primary challenges are concisely described by BESAC’s 2015 report, Challenges at the Frontiers of Matter and Energy: Transformative Opportunities for Discovery Science.

• Mastering Hierarchical Architectures and Beyond-Equilibrium Matter
• Beyond Ideal Materials and Systems: Understanding the Critical Roles of Heterogeneity, Interfaces, and Disorder
• Revolutionary Advances in Models, Mathematics, Algorithms, Data, and Computing
• Harnessing Coherence in Light and Matter
• Exploiting Transformative Advances in Imaging Capabilities across Multiple Scales

Specifically, gaps/challenges that need to be addressed by AI/ML are listed below.

Design metastable phases and materials that persist out of equilibrium. These materials enable access to a diversity of properties beyond the limits drawn by equilibrium thermodynamics. For example, optically driven processes of materials could provide more control over the chemical processes and lead to new materials, such as metastable phases or new low-dimensional materials with dynamics controlled by in-plane heterogeneity rather than layer stacking order. Another example is self-assembly, where transient (non-equilibrium) intermediate states frequently appear, and control of assembly pathways can enable improved structural control. Modern characterization systems such as electron and scanning probe microscopies may allow “bottom-up” fabrication of new structures that are metastable, which allows arrays, for example, of topological defects to be created with nanometer precision for desired properties. The challenge is to do this in an efficient and reproducible fashion; this requires in-line analytics and feedback of very high velocity and volume data streams.
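The in-line analytics and feedback called for above can be illustrated with a minimal sketch of streaming change detection on a synthetic frame stream; the frame generator, window length, and threshold are illustrative assumptions, not a recommendation for any particular instrument.

```python
# Minimal sketch of in-line analytics on a high-velocity data stream: a rolling
# z-score on a per-frame summary statistic triggers a (placeholder) feedback
# action. The synthetic frame generator and the threshold are illustrative only.
import numpy as np
from collections import deque

def frame_stream(n_frames=2000, drift_at=1200, rng=np.random.default_rng(1)):
    """Yield synthetic detector frames; intensity drifts after `drift_at`."""
    for i in range(n_frames):
        offset = 0.5 if i >= drift_at else 0.0
        yield rng.normal(loc=offset, scale=1.0, size=(64, 64))

history = deque(maxlen=200)           # rolling window of recent frame means
for i, frame in enumerate(frame_stream()):
    stat = frame.mean()               # cheap per-frame reduction
    if len(history) == history.maxlen:
        mu, sigma = np.mean(history), np.std(history) + 1e-9
        z = (stat - mu) / sigma
        if abs(z) > 5.0:              # in-line decision: flag and "steer"
            print(f"frame {i}: change detected (z={z:.1f}); request new setpoint")
            history.clear()           # reset the baseline after acting
    history.append(stat)
```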
In January 2018, the U.S. Department of Energy’s (DOE’s) Office of Advanced Scientific Computing Research (ASCR) hosted a Basic Research Needs workshop focused on ML for science. This workshop resulted in development of priority research directions (PRDs) for interpretability, domain awareness, robustness, and needed capabilities (Workshop report on Basic Research Needs for Scientific Machine Learning: Core Technologies for Artificial Intelligence, https://www.osti.gov/servlets/purl/1478744). Although the workshop highlighted significant investment in ML for the analysis of big data, there has been less activity on the generation of such data sets—a critical need as DOE’s major experimental facility upgrades begin commissioning. PRD-6 from the workshop, intelligent automation and decision-support, is highly relevant as timely advances in AI and ML will be critical to enable the full scientific potential. To make AI/ML successful for the large experimental and computational data from our facilities, there are challenges in terms of archiving metadata and preserving provenance, workflows to manage data transfer to and from instruments and integration with HPC facilities, development of software stacks (federated), and uncertainty quantification to identify regions of model validity.

Understand and control interfacial processes and properties. Controlling interfaces (liquid/liquid, gas/solid, etc.) often relies on precise control of atomic bonding and molecular interactions between two dissimilar phases. The ideal strategy to avoid performance-limiting defects in materials, for example, is to minimize perturbation of the atomic order at the interface by preserving a high degree of crystallographic order (e.g., epitaxy). However, atomic scale insights into grown structures present significant inverse problems that have been difficult to address. This may potentially be tackled using combined physics-ML methodologies (Figure 1.3). Additionally, chemical separations, an area which is fundamentally important to almost every aspect of our daily lives, from the energy we utilize to our medications to chemical purification, including water, can see transformative advances with AI in terms of refining and optimizing experimental approaches. The use of AI will aid the pursuit of grand challenges such as understanding complex hierarchical correlations, from molecular-scale interactions up to transport phenomena, and mapping energy landscapes for the chemical and materials transformations that occur during aging of separation materials/chemicals.

Figure 1.3 An integrated approach for future design of materials interfaces tailored for performance. Key to this vision is inclusion of multi-modal operando experiments enabled by AI/ML.

Design materials and molecules for quantum information sciences (QIS). Much of the transformative success of technologies underlying the information age was built on our ability to manipulate chemical composition and doping, and hence electronic band structure and electrochemical potential, within materials at tiny length scales, encode local electronic properties as the physical instantiation of information, and thus control the storage, flow, and processing of information. We now stand on the brink of a quantum information revolution. Here, breakthroughs will be driven by the ability to harness the interplay and evolution of quantum entangled and coherent ensembles as the physical representation and processing of information. This will provide radically new opportunities in computation, enabling exponentially higher speeds and efficiencies and the ability to solve problems that are currently intractable. As such, there is a desperate need to deliver systems for potential solid-state qubits, photon sources, and quantum sensing systems [BES Roundtable, Opportunities for Basic Research
for Next-Generation Quantum Systems, Oct. 30–31 (2017); Roundtable, Opportunities for Quantum Computing in Chemical and Materials Sciences, Oct. 31–Nov. 1 (2017)]. Promising advances at DOE facilities in layered materials stamping and a new pulsed laser deposition (PLD) system will generate rich structural, heterointerface, and functional property datasets that will require deep AI/ML analysis and real time control. This analysis/control will need to be done in situ and on the timeframe of the experiments to enable smart-steering of the synthesis processes toward successful quantum materials.

Understand the critical roles of heterogeneity in complex systems. Heterogeneities and interfaces underlie novel functionalities and drive dynamical processes, such as charge and exciton transport (e.g., along grain boundaries), charge separation (at Type II heterojunctions) and recombination (at Type I heterojunctions), spin evolution, and transport of ions or molecules through ordered and disordered systems (e.g., at battery interfaces or through metal organic frameworks). However, understanding transient and time-dependent processes in material and chemical systems is enormously challenging; examples include identifying chemical reaction pathways, visualizing electronic and optoelectronic processes at their native lengths (single atoms to many nanometers) and time scales (femto to nanoseconds and beyond) in heterogeneous materials, and studying exchange processes between excitations on various length scales. Progress can be made via high-throughput materials synthesis and automated atomic-scale/multimodal characterization. Here the aim is to broadly understand how population diversity influences growth and behavior, with the ultimate goal of creating a closed-loop materials property prediction, synthesis, and characterization loop. By understanding and controlling heterogeneity, it may be finally possible to design multifunctional and self-regenerating catalytic systems.

Understand and master energy and information with capabilities rivaling those of biological systems. Biological systems naturally transform and distribute energy through photosynthesis and subsequent decomposition of photosynthetic material. Conversion of energy to biomass can occur via various mechanisms, including photosynthetic and chemical pathways with oxygen (i.e., aerobic) and without oxygen (i.e., anaerobic). Greater insights are needed into the regulation of these pathways, the mechanisms responsible for the reactions, and environmental influences on the reactions. This improved understanding is a precursor to enabling changes in pathways that may uncover new or more efficient energy sources.

3. Advances in the Next Decade

In the next five to 10 years, AI will be an integral part of a scientist’s discovery and design arsenal. Scientists will use AI to generate scientific data in a rational way, formulating new physical models and theoretical insights that drive new paths of rational design of materials and chemicals, exploring atomic design spaces currently unimaginable.

The ultimate form of AI for materials, chemistry, and nanoscience constitutes autonomous-smart experiments and simulations, including synthesis and automated discovery, that integrate all aspects of the materials and chemistry discovery loop—from preparation through characterization, to data interpretation and feedback—in order to minimize the experimental trials needed to achieve a desired property or set of properties. This could allow vastly more challenging materials and chemical compound problems to be tackled. However, such an autonomous process will still require expert scientists in the loop to ensure viability and success. Overall, the vision of “autonomous-smart experiments” is an as-yet unrealized grand challenge, as the parameter space is simply too large to manage in traditional ways. AI/ML can clearly be a
transformative key to bridge this gap, but it will require addressing a number of challenges, ranging from teaching the AI physical concepts and rational design decisions to making experimental instruments “smart,” integrating experimental and simulation data, working with large and diverse sets of streaming data, and having precise control over the experiments. AI/ML can be transformative in terms of high-throughput screening, drastically accelerating simulation capabilities to achieve desired precision with very low computational cost and opening the door to virtually explore a much larger part of the available design space.

Efficient materials, chemical, and device characterization are critical elements in the scientific discovery workflow. As such, the characterization capabilities are constantly used for the determination of chemical composition, structure, physical properties, and overall functionality. In general, this involves (1) an analytical step to confirm that the target chemicals and/or materials are produced; (2) characterization of the physical properties, morphologies, defects, and interfaces of the functional materials and chemicals by multiple probes/techniques; (3) characterization of the functional properties, in situ/operando, in devices. This means it will require new analysis across all of these platforms, including registration of data from different instruments (e.g., pan sharpening) and scaling for structure-property mapping. It will be important to fully enable in situ multimodal analysis with streaming data, for example, implementing online analysis and active learning during an experiment when more than one type of probe is being used (as data will be streaming at potentially very high velocity and volume).

With AI and ML automation of model-building and decision-making in experimental loops, machine-guided synthesis, processing, and ultimately materials and chemistry discovery can be achieved, enabling discovery, synthesis, and control of novel processes and properties (Figure 1.4).

In the next decade, all the upgrades to DOE’s light sources will be completed alongside the proton power upgrade at the neutron source. Thus, there will be significant advances and new information in the following areas.

New data sets/instruments online. There will be a continued increase in the capabilities in detectors/cameras alongside accelerators that will lead to a tremendous increase in potentially high-quality information from microscopes and light sources. Those instrument advances will provide extreme volumes and velocities of data that contain deep information regarding materials/chemistry processes alongside a modality that enables manipulation and control of the materials.
Figure 1.4 Schematic illustration of the elements of experiments and computations that are required to enable autonomous-smart experiments for materials/chemical design/synthesis.
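As a concrete illustration of the machine-guided experimental loops described above, the following is a minimal sketch of an active-learning campaign in which a Gaussian-process model proposes the next measurement. The synthetic run_experiment response stands in for a real synthesis or characterization step, and the acquisition rule and budget are illustrative assumptions.

```python
# Minimal sketch of a machine-guided experimental loop: a Gaussian-process
# model proposes the next measurement by maximizing an upper-confidence-bound
# acquisition. The synthetic "run_experiment" response stands in for a real
# instrument and is purely illustrative.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

def run_experiment(x):
    """Placeholder for a synthesis/characterization measurement at setting x."""
    return float(np.sin(3.0 * x) * np.exp(-0.3 * x) + 0.05 * np.random.randn())

candidates = np.linspace(0.0, 5.0, 201).reshape(-1, 1)   # settable conditions
X = list(np.random.uniform(0.0, 5.0, size=3))            # a few seed experiments
y = [run_experiment(x) for x in X]

gp = GaussianProcessRegressor(kernel=RBF(1.0) + WhiteKernel(1e-2),
                              normalize_y=True)
for step in range(10):                                    # budgeted campaign
    gp.fit(np.array(X).reshape(-1, 1), np.array(y))
    mean, std = gp.predict(candidates, return_std=True)
    ucb = mean + 2.0 * std                                # favor high and uncertain
    x_next = float(candidates[np.argmax(ucb), 0])
    X.append(x_next)
    y.append(run_experiment(x_next))
    print(f"iteration {step}: measured x={x_next:.2f}, best so far {max(y):.3f}")
```

In a real autonomous workflow the proposal step would be constrained by instrument limits and reviewed by the expert-in-the-loop noted earlier.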
Enhancement in big data and data curation. There must be a focused effort to link major facilities and capabilities, such as our leadership computing facilities and our microscopy, light, and neutron sources, to characterize and fully understand the new materials. We need a radical improvement in data sharing, analysis, and curation that will catalyze scientific discovery. This requires the development of protocols, common data formats, and complete metadata to document and curate the full history and knowledge of the synthesized material. Furthermore, workflows to integrate knowledge across multiple facilities, and the ability to create and draw on knowledge graphs to better inform modeling and propose new experiments, should be expected. Ultimately, a shared and curated source of data that is easily searchable and minable will be a fundamentally needed infrastructure. Progress is expected along the lines of new AI platforms that integrate diverse scientific data resources, including the literature, and respective mining engines, which will enable automatic development of training sets from heterogeneous experimental and simulated data (see Chapter 12, Data Lifecycle and Infrastructure).

Rare events detection and identification. Rare events are events that occur very infrequently, i.e., their frequency ranges from 0.1 percent to less than 10 percent. While these events are low probability, they can have high impact.

Events such as failure in materials under stress, or side reactions in gas phase chemistry that may occur on time scales too short for humans to observe, are very important to identify. Near-term adaptive control of some experiments—when implemented as real-time decision-making during an experiment—can identify regions of interest and save the relevant data. The introduction of AI into instrument control systems will allow instruments to detect when their alignment has drifted and then perform automated alignment and recalibration.

Computers and algorithms. There will continue to be major advances in computer capacity and mathematical algorithms which will further enhance the ability to perform in-line and real-time analysis of experimental and computational data.

Accelerated simulation. Continued advances in computing capacity and computational chemistry and materials methodologies, combined with ML network development, will provide new sets of data for AI/ML and decision making.

New AI/ML techniques. Advances are expected in reinforcement learning (algorithms that employ reward/punishment), active learning, and neuromorphic computing that may be used “at the edge”—where people and things meet (AI/ML at the edge)—as well as in explainable and interpretable AI/ML (see Chapter 10, AI Foundations and Open Problems). Particularly important will be advances in AI/ML approaches that can deal effectively with sparse, unlabeled data.

4. Accelerating Development

To achieve the vision of autonomous-smart experiments/discovery, a number of technical challenges must be addressed. It will be critical to accelerate development in the following areas.

Advance edge computing and integrated experimental instruments. Computing at the experimental instrument(s) for on-the-fly analysis with feedback during an experiment will need to be implemented to maximize information gain and efficient control. This will be particularly important for multimodal experimental probes that require analysis across different platforms. Edge computing for automating aspects of experiments, such as for AI/ML-assisted tuning of the environment, importance sampling, next-experiment recommendation, etc., will be critical. Additionally, on-demand pipelines to HPC for automatic spawning of jobs directly related to
discoveries at the instrument are needed. This can be important for forming databases based on higher levels of ML models trained on simulated data, where the simulations would require an HPC environment. The goal is to provide fast on-the-fly analysis of “streaming” experimental data.

Enable in situ multimodal analysis. Characterization capabilities are constantly used for the determination of chemical composition, materials structure, physical properties, and how such properties correlate with functionality. In general, this involves (1) an analytical step to confirm that the target chemicals and/or materials are produced; (2) characterization of the physical properties, morphologies, defects, and interfaces of the functional materials, by multiple probes/techniques; (3) characterization of the multi-functional properties, in situ/operando, in devices (in vacuo, in solute, in atmosphere) across a broad frequency range. Achieving acceleration will require new in situ multimodal diagnostic approaches which incorporate all of these analytical platforms in one experiment. These include registration of data from different instruments/in situ probes and scaling (e.g., pan sharpening) for structure-property mapping, multimodal cross-correlation, and building of frameworks to integrate knowledge in a rigorous physics-based framework that incorporates uncertainty quantification and metadata analytics. In situ data analytics, including cross modeling, will be approached on two levels: the first at the point of experiment using edge computing, and the second using HPC. The ML algorithms will be incorporated as a part of in situ multimodal analysis. This will lead to machine-guided decision-making algorithms for selection of optimal experimental conditions, a minimal number of experiments, and reduced model error.

Enable automated smart characterization. The use of active learning and Bayesian methodologies in combination with predictive modeling during experimental characterization can enable the efficient exploration of heterogeneities in materials and the delicate balance in chemical compounds and reactions. The goal is to minimize the uncertainty and to maximize physics knowledge gain.

Enable AI/ML approaches to represent physics. Dictated by the laws of physics, only discretized structures exist in nature. This “discreteness” needs to be represented properly in the encoding space to control erroneous predictions and misclassifications. New and novel mathematical approaches are needed to incorporate physical constraints and symmetries into the representation and encoding of chemical and materials data, feature detection, and the learning process itself. New kernels that can operate on hierarchical structured data for similarity quantification, enabling the application of uncertainty-aware regression methods, are also needed.

Enable big-fast data at the signal-noise edge. Use of ML models for characterization at the dose-limited range is critical for autonomous experiments. Big-data-based techniques, such as four-dimensional scanning transmission electron microscopy (4D-STEM), are limited by how fast the data can be collected, with the bottlenecks arising from detector readout times and data transfer rates. This imposes constraints on the sample since it rules out beam-sensitive samples that will not be stable under the comparatively slower imaging conditions, and also dynamic in situ experiments. Fast detection is possible, but the data is noisier. Current state-of-the-art iterative analysis protocols are more susceptible to noise, and next-generation ML models trained on HPC-simulated datasets can be a way to bridge this gap between big data in microscopy and dynamic microscopy.

This includes integrating data efficiently from different characterization techniques to provide a more complete perspective on materials structure and function. Even with this promising progress, there is still tremendous need for work that can bridge a number of critical gaps,
including delivering a set of open-source petascale quantum simulation, data assimilation, and data analysis tools for functional materials design, within an approach that includes uncertainty quantification and experimental validation and verification of AI models (see Chapter 10, AI Foundations and Open Problems).

Develop a workforce that can work across domains. Existing and emerging training programs in chemistry and materials need to be expanded to ensure a workforce that understands AI approaches and how they can best benefit problems in chemistry and materials discovery.

5. Expected Outcomes

Success in achieving autonomous-smart experiments will lead to transformative advances in:

• The diversity of materials properties possible beyond the limits drawn by equilibrium thermodynamics or our imagination based on discovered design rules.
• The realization of multifunctional and self-regenerating catalytic systems.
• The control of interfaces optimized to perform desired functions.
• On-the-fly materials and (bio)chemical design and synthesis.
• The discovery of unknown synthesizable materials and complex chemical species 1000x faster and with desired properties.

6. References

1. Riordan, M. & Hoddeson, L., Crystal Fire: The Invention of the Transistor and the Birth of the Information Age, W. W. Norton & Company, 1998.
2. Sze, S. M., Physics of Semiconductor Devices, 2nd Edition, John Wiley and Sons, New York, 1981.
3. Shockley, W., Electrons and Holes in Semiconductors: With Applications to Transistor Electronics, D. Van Nostrand Company, Inc., 1950.
4. Fuechsle, M. et al., A single-atom transistor. Nat. Nanotechnol. 7, 242–246 (2012).
5. Sumpter, B. G., Vasudevan, R. K., Potok, T., & Kalinin, S. V., A bridge for accelerating materials design. npj Comp. Mat. 1: 15008 (2015). DOI: 10.1038/npjcompumats.2015.8.
6. Kalinin, S. V., Sumpter, B. G., & Archibald, R. K., Big-deep-smart data in imaging for guiding materials design. Nat. Mater. 14, 973–980 (2015).
7. Ziatdinov, M., et al., Building and exploring libraries of atomic defects in graphene: Scanning transmission electron and scanning tunneling microscopy study. Sci. Adv. 5: eaaw8989 (2019). DOI: 10.1126/sciadv.aaw8989.
02. Earth and Environmental Sciences
Earth and Environmental Sciences addresses some of the most pressing challenges in the nation, from natural resource utilization to maintaining our infrastructure and environment. In particular, recent events have highlighted the fact that our society is vulnerable to increasingly frequent natural hazards, including wildfire, drought, and extreme precipitation events (Figure 2.1). An urgent need exists for improving our predictive capabilities of Earth and environmental systems, including the physical, chemical, and biological processes that govern the complex interactions among the land, atmosphere, subsurface, and ocean components from molecular to global scales, and from daily to decadal time scales.

Figure 2.1 Billion-dollar weather and climate disasters for the year 2018 [32].

In recent decades, Earth observation capabilities have been revolutionized, based on a suite of novel sensor, analytics, and telecommunication technologies. In particular, DOE has pioneered integrated observational capabilities at the laboratory scale (e.g., EMSL, SNS, ALS) and at field scales (e.g., NGEEs, ARM), as well as developed systems biology databases (e.g., KBase) and data archives (e.g., ESS-DIVE and ESGF). We now have access to several hundred petabytes of observational data of the Earth system in the U.S. alone, most of it in real time. In parallel, predictive modeling capabilities have advanced significantly to simulate complex Earth systems, facilitated by HPC capabilities. Together, these vast observation and simulation data offer unique opportunities to apply AI approaches for improved understanding and scientific discovery in Earth and environmental sciences. AI methods offer the promise to accelerate development of advanced tools and the next generation of technology for assimilating observations and data-driven forecasting.

1. State of the Art

Applications of AI methods for Earth, environment, and climate research are in their infancy, but interest is growing rapidly as our ability to collect and create data outpaces our ability to assimilate, interpret, and understand it [24,3]. Primary applications include (1) knowledge discovery and estimation; (2) data assimilation and data-driven models; (3) model emulators, and (4) hybrid process-/ML-based models that integrate process scale data. Artificial neural networks (ANNs) and deep neural networks (DNNs) have been widely used for producing weather forecasts (e.g., [8]), spatiotemporal gap filling (e.g., [13]), and various remote sensing and geophysical image processing and analysis [3,21]. Random forest (RF) methods are widely used to understand and interpret complex environmental data [1], as well as to estimate environmental parameters such as soil properties at the global scale [12]. In addition, unsupervised learning and clustering methods have been used to discover key spatiotemporal patterns in large remote sensing and simulation datasets (e.g., [14]).

More recently, increasing interest in ML applications has fueled development of emulators for environmental process models, particularly in the subsurface and atmospheric sciences (e.g., [25,17,29]). New parameterizations based on ANNs have been developed for representing stochastic
convection based on simulations from fully resolved cloud models [6,23,22], with similar efforts for ocean modeling [4]. Earth, environment, and climate research is seeing rapid acceleration in the use of AI for data assimilation and for producing hybrid process-/ML-based models and physics-informed ML, including “active learning” methods and GANs [26,27].

2. Major (Grand) Challenges

Four grand challenges have emerged in the Earth, environment, and climate disciplines that could be revolutionized through application of AI methods and incorporation of burgeoning data, leading to new scientific discoveries and advances in energy security, national security, and adaptation and resilience to extremes in our changing environment.

Project environmental risk and develop resiliency in a changing environment. Increasing risks are posed by changing environmental conditions and increasing frequency of weather extremes on various aspects of our society and energy sector, including detrimental effects of wildfires, floods, droughts, wind, solar energy production and contamination (Figure 2.2). Our ability to assess the vulnerability from such changing conditions, mitigate imposed risks, and respond rapidly to such events is limited by the fidelity of modeling and observational tools. New advanced sensors coupled with edge computing capacity are now available for rapid data acquisition, but many challenges still exist for real-time data-model assimilation. New tools are needed to accelerate the projection of weather extremes and their resulting impacts on energy infrastructure and the built environment (i.e., buildings, roads, utilities) under changing environmental conditions. Efforts to build resiliency to address evolving risks will benefit from data-driven approaches that integrate smart sensing systems, built-for-purpose models, large ensemble forecasts to quantify uncertainty, and dynamic decision support systems for critical infrastructure. The 10-year goals include (1) development and understanding of predictive capabilities of Earth, environment, and climate models from sub-seasonal to decadal scales; (2) development of coupled datasets that are consistent across all components of the Earth, environment, and climate; (3) development of purpose-built and point-of-action forecast models of Earth, environment, and climate that are usable for
Figure 2.2 There are many ways environmental conditions and changes in the environment affect energy systems [33].
estimating risk and resilience, and (4) the scaling up of the observational capabilities of extreme events.

Develop adaptive subsurface management strategies for energy production and storage, and waste isolation. The energy security of our nation relies on the utilization of subsurface reservoirs for energy production and storage, carbon storage, and spent nuclear fuel storage. We need to substantially increase hydrocarbon extraction efficiency from unconventional reservoirs; discover and exploit hidden geothermal resources; reduce environmental impacts, including induced seismicity; dramatically increase geologic CO2 storage; and improve prediction of the long-term fate and transport of contaminants. However, our capabilities to assimilate existing data to understand, reliably predict, and adaptively control subsurface processes are extremely limited (Figure 2.3). The subsurface datasets and real-time data streams are typically uncertain, disparate, diverse, sparse, and affected by scaling issues. The physical models of subsurface processes (e.g., flow, storage, stress, chemistry) are incomplete, uncertain, and frequently unreliable for making predictions. The 10-year goal would be seamless integration of multivariate data with real-time data streams into forecasts of system behavior with innovative visualization, including the capability for predictive models to test various hypothetical operational and economic scenarios, as needed, to guide operational decisions in near real-time.

Figure 2.3 AI/ML is required to connect multiscale data of geomechanical-chemical-transport trapping mechanisms in Geological Carbon Capture and Sequestration for the case of a deep saline reservoir [34].

Develop a predictive understanding of the Earth system under a changing environment. To advance the nation’s energy and infrastructure security, a foundational scientific understanding of complex and dynamic biological, geochemical, and hydrological processes, and their interactions under environmental change, is required (Figure 2.4). The knowledge gained through this research must be incorporated into models of the Earth system—designed to simulate atmospheric, land surface, oceans, sea ice, land ice, and subsurface processes—to yield predictions of future climate and Earth system conditions under various scenarios of human factors such as population, socioeconomics, and energy production and use. Accurate predictions needed to close the carbon cycle require understanding the responses of terrestrial and marine ecosystems to changes in temperature and atmospheric composition and the feedbacks of those responses on the climate system. Integral to this research is characterizing the influence of water in mediating biological responses and in transferring energy, carbon, and nutrients across all components of the Earth system. In addition, leveraging advances in genomics and bioscience data promises to provide detailed understanding of plant/microbial functions and their adaptation and feedbacks to the changing environment. The 10-year goal would be leveraging AI methods for (1) assimilating large volumes of continuous observations into
Figure 2.4 Earth system models (ESMs) are designed to capture the behavior of interacting natural and anthropogenic processes and to project future behavior as a result of changes in population, economics and policy, and strategies for future energy production and use [31]. (Figure panels: Earth System Model (ESM) Simulations; Energy and Water Cycles; Carbon and Biogeochemical Cycles.)
data-driven models and for optimizing model parameters; (2) extrapolating sparse measurements across space and through time to characterize functional traits of biological systems and dynamic processes important for closing the carbon cycle, and (3) developing hybrid process-based/ML models that improve climate predictability and reduce uncertainty in future projections.

Ensure water security under a changing environment. Water resources are critical for human health, energy production, food security, and economic growth. The demand for fresh water is increasing because of the growing population and corresponding consumption practices. However, water availability and water quality are being impacted by climate change, extreme weather, and disturbances such as wildfire, droughts, floods, and land-use change. Processes affecting water quality and water availability span multiple spatiotemporal scales from soil microbiology to individual watersheds to continental scale hydrology (Figure 2.5). Therefore, water availability and water quality cannot be adequately addressed locally or regionally or within a single compartment. Methods are needed to integrate disparate and diverse multi-scale data with models of watersheds, rivers, and water utility infrastructure for near-real time prediction and water management. The 10-year goals are (1) development and understanding of predictive capabilities of water availability and water quality at the continental scale; (2) development of modeling and sensing systems to obtain representative data across the range of scales and compartments; (3) development of scale-relevant theories to bridge scales for prediction and control, and (4) development of faster execution capabilities to predict water availability and water quality across scales.

Figure 2.5 Achieving predictive capabilities toward water security requires the integration of diverse data from multidisciplinary Earth sciences, including hydrology, ecology, climate, geology, geophysics, geochemistry, and microbiology. Adapted from [28].
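To give a flavor of the data-driven prediction workflows this chapter calls for, the following is a minimal sketch of a random-forest estimator of a water-quality indicator built from heterogeneous inputs, in the spirit of the random-forest applications noted in the State of the Art. The synthetic data, feature names, and target are illustrative assumptions; a real workflow would ingest curated multi-source observations.

```python
# Minimal sketch of a data-driven predictor of a water-quality indicator from
# heterogeneous inputs. The synthetic data, feature names, and target are
# illustrative stand-ins for curated multi-source observations.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
n = 2000
data = pd.DataFrame({
    "precip_mm": rng.gamma(2.0, 10.0, n),          # satellite-derived rainfall
    "soil_moisture": rng.uniform(0.05, 0.45, n),   # in situ sensor network
    "air_temp_c": rng.normal(15.0, 8.0, n),        # ground station
    "land_use_frac": rng.uniform(0.0, 1.0, n),     # remote-sensing product
})
# Synthetic nonlinear response standing in for, e.g., a nutrient concentration.
y = (0.02 * data["precip_mm"] * data["land_use_frac"]
     + 5.0 * data["soil_moisture"] ** 2
     + 0.1 * rng.normal(size=n))

X_train, X_test, y_train, y_test = train_test_split(data, y, random_state=0)
model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(X_train, y_train)
print("held-out R^2:", round(model.score(X_test, y_test), 3))
print("feature importances:", dict(zip(data.columns,
                                       model.feature_importances_.round(3))))
```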
3. Advances in the Next Decade

Observations of the atmosphere, biosphere, ocean, land, and subsurface are expected to improve considerably over the next five to 10 years. Increasingly, remote sensing data from satellite platforms is providing profiles through the depth of the atmosphere, oceans, and even soil layers over land, as well as surface and subsurface deformation. This increases the volume of data available several fold. Ground-based measurements are also rapidly increasing in ubiquity and the variety of geophysical variables that are sensed. Geophysical methods continue to advance their capabilities, particularly due to massive amounts of data collected by dense sensor deployments, or rapid advancements in fiber optic sensing. Observations collected on robotic platforms that navigate remote oceans and cheap sensors in everyday devices (Internet of Things) are becoming increasingly common for collecting environmental data (see Chapter 15, AI at the Edge). These types of sensors excel in the frequency of data collection and produce copious amounts of data. Data from the next generation of accelerators and light sources will be another source of large datasets in the next few years (see Chapter 14, AI for Imaging). Characterizing the properties of atmospheric particles, biogeochemistry of soil cores, and geochemical processes at nano and atomic scale in controlled laboratory settings will be widely available [20]. Recently, the ability to collect extensive, multimodal genomic and microbial data across spatiotemporal scales, and tools enabling precise modification and control of biological (see Chapter 3, Biology and Life Sciences) and environmental systems, have significantly expanded and are expected to expand multifold in the future.

Below ground datasets are challenging to collect as most of the available data are based on in situ measurements, soil core collection, and subsequent laboratory analysis, and limited remote sensing technologies [16].

Synchrotron light sources, neutron scattering facilities, and electron beam imaging devices are becoming sufficiently penetrating that they allow us to observe directly the reactivity occurring inside a rock or other porous matrix [18,30]. Increasing flux has improved the time resolution dramatically; however, this creates a challenge in that the data sets derived are massive and unwieldy [11]. Artificial intelligence has the potential to allow us to quantify pore-fluid accessible surface areas of different mineral phases and fluid-fluid contact areas, which allow us to measure residual trapping efficiencies for carbon dioxide. The increasing complexity of systems capable of being measured is directly matched by dramatic increases in the size and complexity of the associated data, necessitating new approaches to analysis and interpretation [9,7].

Simulation data sets are also increasing in size. Earth system modeling is a key application targeted by upcoming DOE exascale computing platforms. The resolution of Earth system models, such as the DOE E3SM model, is increasing toward resolving mesoscale extreme weather phenomena and eddy processes at the ice–ocean interface, and ultimately toward full cloud resolving models. Resolving detailed processes in these increasingly complex models produces large increases in model output; however, I/O bandwidth limitations of high-performance computing platforms will increasingly limit these high-frequency, highly resolved data from being saved for later analysis. This points both to the need for online intelligent feature extraction to reduce simulation data [15] and to the need to deploy ML and statistical analysis algorithms in situ within simulation codes to analyze output as it is being generated [2].

4. Accelerating Development

While the amount of data collected on the various components of the Earth system is increasing, the ability to use these data in developing improved forecasts and model
development has not kept pace. Numerical model development is constrained by the ability of scientists to evaluate the data, develop and test hypotheses, and produce new models. Earth and environmental data for global change presents a challenge to ML methods because the dimension of the data (e.g., spatial resolution) can be much greater than the number of data samples (e.g., time slices). Data are often multiscale, can be irregularly distributed (point cloud or unstructured mesh data), and in some cases can be sparse or missing in random ways (measurement bias). Data are usually correlated across large distances in space and time, which presents challenges for traditional ML methods that assume independent samples or that assume spatial regions can be analyzed independently. Computer science innovations will be required both in training algorithms and in distributed computation. Thus, to accelerate development, the following issues specific to environmental datasets will have to be addressed.

Multi-scale data. Earth and environmental data are often available from different sources (such as satellite sensors, in situ measurements and model simulations, and, increasingly, robotic sensors) and at varying spatial and temporal resolutions, exhibiting different characteristics (such as sampling frequency and accuracy).

Noisy, missing, and uncertain data. Earth and environmental data show different degrees of noise, incompleteness, and uncertainty. Satellite sensors can be noisy with clouds and snow cover; sensors may temporarily fail, causing missing data; and some environmental variables can be measured only indirectly from other observations or model simulations with uncertainty.

Shortage of labeled data with ground truth. Collecting high-quality and high-resolution Earth and environmental data is very expensive and time-consuming, and for some environmental variables and processes (e.g., subsurface structure and subsurface flow) there are no ground truth observations.

Spatial and temporal heterogeneity. Earth and environmental processes have large spatiotemporal variability, which is highly correlated and structured. The data often have nonlinear relationships, feedbacks, non-stationary features, and low frequency high impact events.

Environmental forecasting is complex and uncertain. Environmental projections are developed using complex, coupled, nonlinear systems representing different components of the Earth system. This makes the projections from these models uncertain, with uncertainties propagated from data, model structure, and model parameters. It is necessary to characterize these uncertainties and increase the credibility of the projections to support decision making.

What must we do to accelerate development?

a) Develop AI approaches to improve and optimize data acquisition, including sensor network optimization, data compression, and edge computing (see Chapter 10, AI Foundations and Open Problems).
b) Establish the protocols and tools to allow access, transfer, curation, quality control, and maintenance of public datasets that can dynamically be coupled with the model/simulation systems (see Chapter 12, Data Life Cycle and Infrastructure).
c) Develop supervised, semi-supervised, and unsupervised AI systems for multiscale multi-type data (see Chapter 10, AI Foundations and Open Problems).
d) Relieve the bottlenecks in processing petabytes of data and speed up the entire model development and training algorithms by exploiting effective CPU/GPU communication patterns (see Chapter 13, Hardware Architectures).
02. EARTH AND ENVIRONMENTAL SCIENCES 32


e) Develop AI-enabled automated approaches for model development and hypothesis testing that can provide improved insights on physics, chemistry, and biogeochemistry (see Chapter 10, AI Foundations and Open Problems).

f) Develop fully AI/physics-coupled models that can ingest massive data and honor mass/energy conservation and other physical principles (see Chapter 10, AI Foundations and Open Problems). A minimal illustration of this hybrid coupling is sketched after the list and figure below.

What are the top priorities?

• Develop AI-assisted data acquisition strategies associated with new robotics and in situ sensors.
• Develop consistent and high-throughput data access, compression, and transfer software for the variety of Earth and environmental science datasets.
• Apply automatic labeling and reduction of environmental datasets at various spatial and time scales.
• Advance explainability of AI approaches for modeling EC phenomena, avoiding the "black box" conundrum.
• Develop a hybrid approach to combine AI with physical principles for EC models, and develop robust, explainable AI software for training and validating hybrid models (Figure 2.6).
• Develop robust and consistent protocols for testing the transferability and reproducibility of AI models across a wide range of conditions.
• Advance uncertainty quantification methodology as an integral part of the AI workflow.

Figure 2.6 Hybrid approach that combines AI with physical understanding to address some of the black-box issues and make the models physically consistent: (a) a multilayer neural network, with n the number of neural layers and m the number of physical layers; (b) and (c) are concrete examples of hybrid modeling: (b) prediction of sea-surface temperatures from past temperature fields; (c) a biological regulation process (opening of the stomatal "valves" controlling water vapor flux from the leaves) modeled with a recurrent neural network [24]. Hybrid models are useful for replacing poorly understood or unresolved (sub-grid-scale) phenomena. Challenges include (a) obeying physical constraints, (b) quantifying uncertainties in the parameters of the network models, and (c) developing methods for adding explanation to the network models and parameters. Training hybrid models using offline or online methods needs exploration.
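As a purely illustrative companion to item (f) and Figure 2.6, the sketch below shows one way a hybrid loss could softly enforce a conservation law during training. The water-budget emulator, the forcing layout, and all variable names are hypothetical placeholders rather than a prescribed implementation.

```python
# Illustrative sketch only: a hybrid loss coupling a neural emulator of a
# land-surface water budget to a soft mass-conservation constraint.
# The variables (precip, ET, runoff, storage change) are hypothetical placeholders.
import torch
import torch.nn as nn

class FluxEmulator(nn.Module):
    """Small MLP mapping atmospheric forcings to [ET, runoff, storage change]."""
    def __init__(self, n_forcings=8, n_hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_forcings, n_hidden), nn.ReLU(),
            nn.Linear(n_hidden, n_hidden), nn.ReLU(),
            nn.Linear(n_hidden, 3),
        )

    def forward(self, x):
        return self.net(x)

def hybrid_loss(pred, obs, precip, lam=0.1):
    """Data misfit plus a penalty on violating water-balance closure:
    precipitation - ET - runoff should equal the change in storage."""
    data_term = nn.functional.mse_loss(pred, obs)
    et, runoff, d_storage = pred[:, 0], pred[:, 1], pred[:, 2]
    residual = precip - et - runoff - d_storage
    physics_term = (residual ** 2).mean()
    return data_term + lam * physics_term

model = FluxEmulator()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
forcings = torch.randn(256, 8)   # stand-in for real forcing data
obs = torch.randn(256, 3)        # stand-in for observed fluxes
precip = forcings[:, 0]          # assume the first forcing column is precipitation
for step in range(1000):
    opt.zero_grad()
    loss = hybrid_loss(model(forcings), obs, precip)
    loss.backward()
    opt.step()
```

The weight lam trades off data fidelity against physical consistency; in practice it would be tuned, or the conservation constraint enforced architecturally rather than as a penalty.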


How do we improve scale?

Most of the applications in the literature were performed with small datasets in the range of gigabytes. Handling large data throughputs will be necessary to fully realize the potential of AI, requiring scaling up of computational infrastructure and the ability of the AI algorithms to handle large volumes of data. As these data volumes grow over time, data cannot be kept in memory continuously for retraining ANNs. Thus, we need AI algorithms that scale in terms of intelligence while processing very large data volumes out-of-core.

Scalability will be an important challenge, potentially requiring a move toward streaming analysis methods adapted to spatially and temporally correlated data. When conducted online in conjunction with physics simulations, additional scalability challenges will arise due to incompatibilities between traditional AI distributed training techniques and distributed computation for physics simulations, requiring new, potentially domain-specific algorithms.
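One hedged reading of "processing very large data volumes out-of-core" is to stream minibatches from chunked files on disk rather than holding the archive in memory. The sketch below assumes a hypothetical directory of NumPy chunks with 16 feature columns and one target column; the file layout, names, and model are illustrative only.

```python
# Minimal out-of-core training sketch: stream minibatches from .npy chunks on disk.
# The chunk directory, array layout, and model are hypothetical placeholders.
import glob
import numpy as np
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 1))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

def stream_chunks(pattern, batch_size=512):
    """Yield minibatches one chunk at a time; memory-map so a chunk is never fully loaded."""
    for path in sorted(glob.glob(pattern)):
        chunk = np.load(path, mmap_mode="r")   # columns: 16 features + 1 target
        for start in range(0, chunk.shape[0], batch_size):
            batch = np.asarray(chunk[start:start + batch_size], dtype=np.float32)
            yield torch.from_numpy(batch[:, :16]), torch.from_numpy(batch[:, 16:])

for epoch in range(3):                         # several passes over the on-disk archive
    for x, y in stream_chunks("data/chunk_*.npy"):
        opt.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        opt.step()
```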
5. Expected Outcomes

Success in these areas means:

• AI will revolutionize the development of process-scale models by accelerating the process of discovery and model creation.
• AI will enable rapid prototyping of purpose-built models of Earth system processes and energy/built infrastructure that will enhance national energy and water security preparedness.
• AI will make it feasible to merge large datasets with numerical models for a new generation of predictive models that can span the forecast scale from daily to decadal and local to global.

6. References

1. Basu, S., Kumbier, K., Brown, J. B., & Yu, B. (2018). Iterative random forests to discover predictive and stable high-order interactions. Proceedings of the National Academy of Sciences, 115(8), 1943-1948.
2. Baydin, A. G., Shao, L., Bhimji, W., Heinrich, L., Meadows, L., Liu, J., & Ma, M. (2019). Etalumis: Bringing Probabilistic Programming to Scientific Simulators at Scale. arXiv preprint arXiv:1907.03382.
3. Bergen, K. J., Johnson, P. A., Maarten, V., & Beroza, G. C. (2019). Machine learning for data-driven discovery in solid Earth geoscience. Science, 363(6433), eaau0323.
4. Bolton, T., & Zanna, L. (2019). Applications of deep learning to ocean data inference and subgrid parameterization. Journal of Advances in Modeling Earth Systems, 11(1), 376-399.
5. Brantley, S. L. (2018). Shale Network Database, Consortium for Universities for the Advancement of Hydrologic Sciences, Inc. (CUAHSI). DOI: 10.4211/his-data-shalenetwork
6. Brenowitz, N. D., & Bretherton, C. S. (2018). Prognostic validation of a neural network unified physics parameterization. Geophysical Research Letters, 45, 6289-6298. https://doi.org/10.1029/2018GL07851
7. Cherukara, M. J., Nashed, Y. S. G., & Harder, R. J. (2018). Real-time coherent diffraction inversion using deep generative networks. Scientific Reports, 8(1), 165230.
8. Collins, W., & Tissot, P. (2015). An artificial neural network model to predict thunderstorms within 400 km2 South Texas domain. Meteorological Applications, 22(3), 650-665.
9. Deng, J., et al. (2018). Correlative 3D x-ray fluorescence and ptychographic tomography of frozen-hydrated green algae. Sci. Adv., 4(11), eaau4548 (1-10).
10. Flinchum, B. A., et al. (2018). Critical Zone Structure Under a Granite Ridge Inferred From Drilling and Three-Dimensional Seismic Refraction Data. J. Geophys. Res.: Earth Surf., 123(6), 1317-1343.
11. Godinho, J. R. A., Gehrke, K. M., Stack, A. G., & Lee, P. D. (2016). The dynamic nature of crystal growth in pores. Sci. Rep., 6:33086. DOI: 10.1038/srep33086
12. Hengl, T., et al. (2017). SoilGrids250m: Global gridded soil information based on machine learning. PLoS ONE, 12(2), e0169748.
13. Krasnopolsky, V., Nadiga, S., Mehra, A., Bayler, E., & Behringer, D. (2016). Neural networks technique for filling gaps in satellite measurements: Application to ocean color observations. Computational Intelligence and Neuroscience, 2016: 29.
14. Kumar, J., Mills, R. T., Hoffman, F. M., & Hargrove, W. W. (2011). Parallel k-means clustering for quantitative ecoregion delineation using large data sets. Procedia Computer Science, 4, 1602-1611.
15. Kurth, T., et al. (2018). Exascale deep learning for climate analytics. In Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis, p. 51. IEEE Press.
16. Laanait, N., He, Q., & Borisevich, A. Y. (2019). Reconstruction of 3-D Atomic Distortions from Electron Microscopy with Deep Learning. arXiv:1902.06876v1 [cond-mat.mtrl-sci], 19 Feb 2019.
17. Liu, Y., Sun, W., & Durlofsky, L. J. (2019). A deep-learning-based geological parameterization for history matching complex models. Mathematical Geosciences, 51(6), 725-766.
18. Li, Z., et al. (2016). Searching for anomalous methane in shallow groundwater near shale gas wells. J. Contam. Hydrol., 195, 23-30. DOI: 10.1016/j.jconhyd.2016.10.005
19. Lin, H. W., Tegmark, M., & Rolnick, D. (2017). Why Does Deep and Cheap Learning Work So Well? J. Stat. Phys., 168: 1223.
20. Ling, F. T., et al. (2018). Nanospectroscopy Captures Nanoscale Compositional Zonation in Barite Solid Solutions. Sci. Reports, 8:13041. DOI: 10.1038/s41598-018-31335-3
21. Nogueira, K., Penatti, O. A., & dos Santos, J. A. (2017). Towards better exploiting convolutional neural networks for remote sensing scene classification. Pattern Recognition, 61, 539-556.
22. O'Gorman, P. A., & Dwyer, J. G. (2018). Using machine learning to parameterize moist convection: Potential for modeling of climate, climate change, and extreme events. Journal of Advances in Modeling Earth Systems, 10, 2548-2563. https://doi.org/10.1029/2018MS001351
23. Rasp, S., Pritchard, M. S., & Gentine, P. (2018). Deep learning to represent subgrid processes in climate models. Proceedings of the National Academy of Sciences, 115(39), 9684-9689.
24. Reichstein, M., et al. (2019). Deep learning and process understanding for data-driven Earth system science. Nature, 566(7743), 195.
25. Scher, S. (2018). Toward Data-Driven Weather and Climate Forecasting: Approximating a Simple General Circulation Model With Deep Learning. Geophysical Research Letters, 45(22), 12,616.
26. Schneider, T., Lan, S., Stuart, A., & Teixeira, J. (2017). Earth system modeling 2.0: A blueprint for models that learn from observations and targeted high-resolution simulations. Geophysical Research Letters, 44, 12,396-12,417. https://doi.org/10.1002/2017GL076101
27. Tartakovsky, A. M., Ortiz Marrero, C., Perdikaris, P., Tartakovsky, G. D., & Barajas-Solano, D. (2018). Learning Parameters and Constitutive Relationships with Physics Informed Deep Neural Networks. arXiv e-prints, arXiv:1808.03398, Aug 2018.
28. Varadharajan, C., et al. (2019). Launching an Accessible Archive of Environmental Data. Eos, 100.
29. Wang, J., Balaprakash, P., & Kotamarthi, R. (2019). Fast domain-aware neural network emulation of a planetary boundary layer parameterization in a numerical weather forecast model. Geosci. Model Dev., 12, 4261-4274. https://doi.org/10.5194/gmd-12-4261-2019
30. Zachara, J., et al. (2016). Internal Domains of Natural Porous Media Revealed: Critical Locations for Transport, Storage, and Chemical Reaction. Environ. Sci. Technol., 50, 2811-2829. DOI: 10.1021/acs.est.5b05015
31. Hoffman, F. M., et al. (2017). International Land Model Benchmarking (ILAMB) 2016 Workshop Report. Technical Report DOE/SC-0186, U.S. Department of Energy, Office of Science, Germantown, Maryland, USA. doi:10.2172/1330803
32. https://www.ncdc.noaa.gov/billions/
33. http://www.energy.gov/downloads/us-energy-sector-vulnerabilities-climate-change-and-extreme-weather
34. Zarzycki, P. (2015). Towards understanding of Reactive Interfaces in Geological CO2 Sequestration. RIGECO, ERC-2015-CoG Proposal 682274, September 2015.
03. Biology and Life Sciences

The capacity to predict, control, and understand biological systems in mechanistic, often molecular, detail is on the horizon. Biology is being transformed by the ability to collect large, multimodal data across spatiotemporal scales, as well as by tools that enable precise modification and control of biological and environmental systems.

Concomitant advances in data analysis, ML, and new hardware architectures, coupled with HPC-enabled simulations, are transforming our capacity to connect molecular interactions to higher levels of organization, from cells to ecosystems. Delivering on the promise of emerging technologies—to offer personalized medical solutions by developing and testing mechanistic hypotheses in tractable laboratory or in silico settings—requires fundamental advances in statistical ML and AI that integrate massively multiscale and multimodal sensing modalities.

1. State of the Art

The dawn of AI-enabled discovery in biology has occurred. Population genomics data are being used to learn the bases of complex traits, enabling researchers to discover non-linear molecular and gene-regulatory interactions along with the architecture of their phenotypic manifestations [1]. Elsewhere, neuroscientists are learning the dynamics of the thousands of neurons that control behavior from electrical and imaging data [2]. Synthetic biologists are building workflows that automate the inverse design of microbial and plant cells [3]. Computational biologists are using AI to learn force fields that enable near-exact molecular dynamics (MD) simulations with fully quantized electrons and nuclei [4]. Such analyses were intractable only a few years ago, and now the pace of innovation driven by AI technologies is accelerating.

However, realizing the future potential of AI-enabled bioscience is impeded by limitations of the computational learning frameworks that exist today. AI must be predictive of complex phenomena and simultaneously provide insight into the underlying biophysical processes it models [5]. Analyses that enable understanding have something in common: they are amenable to human exploration, statistical inference, and model discovery and selection. For example, a perfect, atomistic generative model of a bioreactor could be available, yet, if that model is not amenable to goal-based optimization—to inverse design—its utility is limited to "guess and check" prediction. In biology, guess-and-check prediction is often useless (if we had a strong hypothesis about the system as a whole, we would not be resorting to AI in the first place). Furthermore, if the model is as complex as the system itself, have we really learned how it works, or merely demonstrated the ability to replicate it in silico?

These challenges are particularly clear in healthcare, one of the fastest growing segments of the digital universe, which is expected to reach 2,314 exabytes of data by 2020 [6]. While the average lifespan in the U.S. (79 years) has increased 30 years over the past century, medical research has been less successful at prolonging healthy life (i.e., health span). Prolonging our lifespan without prolonging our health span is financially unsustainable for our nation (total costs of age-related diseases are expected to skyrocket, exceeding $1.5 trillion in the U.S. by 2030). AI could offer powerful solutions to these challenges by enabling effective utilization of rapidly accumulating health data. This ambitious endeavor requires data-driven mapping of the human genome (i.e., genomic profile), phenome (i.e., physiologic status), and exposome (i.e., physical and social environment) in real time and across the human lifetime. It is clear that the state of the art falls far short of what the economy and society require to survive.

2. Major (Grand) Challenges

Biological systems are dynamic processes characterized by combinatorially vast configuration spaces and the presence of emergent control principles at multiple levels of spatiotemporal organization. The overarching challenge before us is to enable the mechanistic characterization of biological systems through increasingly automated cycles of multimodal observation followed by experimentation. We see this manifest in three grand challenges.

Build the capacity to design custom biological systems capable of addressing major global health and environmental challenges – "build life to spec." Synthetic biology leverages engineering approaches to produce biological systems to a given specification (e.g., producing a target drug [7] or the capacity to invade cancer cells [8]). Tools are available that promise to disrupt this field: clustered regularly interspaced short palindromic repeats (CRISPR)-enabled genetic editing, high-throughput multi-omics phenotyping, and exponentially growing DNA synthesis capabilities, among others. However, synthetic biology will only reach its full potential when we have developed the capability to predict the behavior of biological systems, to develop first-principles models, and to observe biological systems with much finer spatial and temporal resolution [9]. AI can provide the required predictive power, and improved methods for interrogating fitted AI learners may ultimately facilitate the detailed mechanistic understanding needed to support synthetic biology (Figure 3.1).

Figure 3.1 AI can revolutionize synthetic biology if applied wisely. AI can help systematically choose molecules that fit a desired specification, and propose possible pathways and hosts to synthesize them. AI can help power self-driving labs able to collect the high-quality, abundant data needed for ML to be effective. AI can complement mechanistic models to accurately simulate and model cells in a variety of environments. This can make production scale-up a predictable endeavor, a process that is presently more an art than a science.

"Digital twins" of organisms will be a key enabling technology toward the capacity to design organisms to specifications—quantitatively modeling and simulating the behavior of complex biosystems [10] (see also Chapter 7: Engineering and Manufacturing, and Chapter 11: Software Environments and Software Research). Digital twins of cells could be created by combining traditional mechanistic models with AI algorithms, leveraging the predictive capabilities of the latter and the insight of the former. Importantly, the ability to modify cells through synthetic biology tools brings about the possibility of validating and sequentially constraining such models systematically. It is reasonable to anticipate that obtaining first-principles models for biological systems will soon be on the horizon.

Foundational challenges remain—even the "vocabulary" of biological systems is resplendent with "dark matter" [11]. Despite progress in the past decade, the fundamental challenge of systematically exploring small-molecule chemical space to find new applications and biological knowledge remains largely unsolved. At least a third of sequenced genes across organisms are of completely unknown function. In meta-metabolomics experiments, it is rare that more than 5 percent of mass spectra can be identified. To understand how organisms operate in ecological and environmental contexts, learning the molecular vocabulary of life is a prerequisite. Using AI to integrate multi-omics
data constitutes an opportunity to accelerate the discovery of function for these "dark" molecules.

Bioimaging technologies are rapidly improving in resolution and dynamic range. CryoEM tomography enables atomistic modeling of complex macromolecules [12]. However, extant tools for learning these models from low signal-to-noise cryoEM data rely heavily on real-time inputs from human operators. Workstations for state-of-the-art microscopes must be proximal to the scopes themselves so that human insight and intuition can guide experiments. Beyond cryoEM, hyperspectral imaging provides increasing power to discover biomarkers and elucidate chemistry without appeal to destructive omics modalities. With advances in AI, real-time imaging of biochemical processes and landscapes in living samples is on the horizon [13]. Machine vision for low signal-to-noise technologies, as well as tensor-on-tensor regression strategies to translate between hyperspectral and omics modalities, are needed to radically increase the automation, throughput, and discovery power of bioimaging technologies.

The modeling of biomechanical systems poses similar challenges—and opportunities. For example, vascular flow simulations to understand the fluid dynamics that result in aneurysms and other anatomical anomalies are poised to deliver early prognosis of patient risks that, today, are rarely detected prior to pathogenesis. Coupling physical simulations to AI "hypervisors" that guide variable mesh resolution stands to radically accelerate the modeling of complex biophysical systems. The same technologies will enable the study of fluid dynamics in bioreactors, or cell-free systems for chemical or pharmaceutical production, where understanding fluid dynamics and diffusion is essential to achieving efficiency.

More generally, using AI to design organisms to a given specification requires large amounts of high-quality data. We cannot produce these data without leveraging automation. The codesign of algorithms and automated systems for data collection therefore arises as a need rather than a luxury [14] (see "self-driving labs" below).

Learn to systematically manage and engineer global environmental systems by obtaining a predictive understanding of ecosystems and their services. Attempting to understand how ecosystem services emerge from organismal and environmental interactions is central in environmental and biomedical sciences. The mechanisms behind carbon, nitrogen, phosphorus, potassium, and micronutrient cycles are determined by the integration of microbial, plant, fungal, metazoan, and viral interactions that, despite decades of quantitative ecology, remain challenging to predict, or even quantify. In recent years, our capacity to measure the ecology, chemistry, and hydrology that give rise to nutrient dynamics has evolved exponentially—metagenomics, untargeted chemistry, hyperspectral imaging, satellite imaging, in situ sensing, and soon quantum sensing systems. The challenge is discovering mechanistic models that are amenable to inverse design, thus enabling intervention at scales relevant to engineering our troposphere.

With growing global demand for fuel, food, water, and predictable weather, learning to engineer ecosystems has become urgent. In the U.S. alone, there are more than 1.1B acres of managed lands [15]. In the last 100 years, some 50% of soil carbon has been depleted through land use practices [16]. Before us is the unprecedented opportunity to transform our managed lands into engines for environmental control. Atmospheric carbon is rising; mining this carbon presents a trillion-dollar opportunity to enrich our soils with labile carbon, enhancing the fertility, and therefore the value, of our farmlands; to render marginal lands fertile; to grow our economy; and to feed our future population. In our depleted soils, prescriptions of chemical fertilizers are overused, polluting our fresh and marine waters and leading to algal blooms and marine dead zones. The soil-water interface is equally important, and currently extremely difficult to simulate or model with any accuracy—and it is essential to understand if we are to intelligently manage marine and freshwater algal blooms and to ameliorate marine dead zones.

AI technologies can reveal the emergent controls of these enormously complex systems, and enable us to engineer our environment to radically expand the range of arable lands while improving our freshwater availability and quality—in part, by replacing our dependence on chemical fertilizers with designed plant and microbial biosystems. The modeling of macroecology, and cognizance of impacts on species distributions and clines, are required for the intentional design and engineering of ecological processes. Such models may ultimately reveal control principles for natural ecosystems, enabling the responsible stewardship of our managed wildlands. Moreover, these ambitions require rigorous biosecurity, which is itself a design problem ripe for AI-powered guidance. AI for biosecurity will be under exquisite scrutiny—these applications are likely to push the development of secure, explainable AI systems with rigorous statistical guarantees.

Integration of AI with experimentally constrained, large-scale, biophysically detailed simulations will be required to refine and construct forward and inverse models—a particular challenge in the biological and environmental sciences, where most knowledge is stored only in the literature. Novel methods are required for extracting and organizing knowledge in constructs compatible with guiding learning in AI architectures, resulting in biologically meaningful discoveries.

Throughout the biological and environmental sciences, forward and inverse models of meso/macroscale measurements need to be constructed that provide multiscale understanding of cellular and community functions. These capabilities are required to predict, control, and understand the biological processes underlying productivity, health, disease, and bio-resilience to environmental conditions.

AI technologies aimed at ecosystem control have enormous implications for human health and biomanufacturing as well. For example, the challenge of efficiently scaling up reactor results from lab scale (50 ml) to commercial volumes (10,000 l) requires understanding the biodynamics that lead to stable production. The need to identify and understand meaningful levels of organization in biological systems from a control perspective is exceptionally clear in the biomedical sciences. Diseases are caused by small-scale disruptions (e.g., genetic mutations) that manifest at larger scales. Effective medical treatments require identification, prediction, and control of biological processes. However, most small-scale processes are currently immeasurable in humans at the required time scales (Figure 3.2). For example, in the brain, while meso/macroscale measurements (e.g., electrocorticography (ECoG), functional magnetic resonance imaging (fMRI)) have revealed principles of global processing across brain areas in humans, the precise biophysical mechanisms that relate these signals to the activity of individual neurons are unclear. This impedes translation between basic neuroscience findings and our understanding of the human brain in health and disease, including dementia. To overcome these challenges, AI is needed that can discover nonlinear "governing equations" from high-dimensional, noisy time-series data with unobserved influences to bridge the gap between observed processes and those that require control.
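The search for nonlinear "governing equations" from noisy time series is often prototyped with sparse regression over a library of candidate terms (in the spirit of SINDy-type methods). The sketch below applies this idea to a synthetic damped oscillator; the system, the term library, and the thresholds are illustrative assumptions, not a method taken from this report.

```python
# Minimal sparse-regression sketch for governing-equation discovery (SINDy-style).
# The synthetic system and the candidate-term library are illustrative assumptions.
import numpy as np
from sklearn.linear_model import Lasso

# Simulate a noisy damped oscillator: dv/dt = -x - 0.1*v
dt, n = 0.01, 5000
x = np.zeros((n, 2))
x[0] = [1.0, 0.0]
for k in range(n - 1):
    dxdt = np.array([x[k, 1], -x[k, 0] - 0.1 * x[k, 1]])
    x[k + 1] = x[k] + dt * dxdt
x_noisy = x + 0.001 * np.random.default_rng(0).normal(size=x.shape)

# Estimate derivatives and build a library of candidate terms.
dx = np.gradient(x_noisy, dt, axis=0)
library = np.column_stack([
    np.ones(n), x_noisy[:, 0], x_noisy[:, 1],
    x_noisy[:, 0] ** 2, x_noisy[:, 0] * x_noisy[:, 1], x_noisy[:, 1] ** 2,
])
names = ["1", "x", "v", "x^2", "x*v", "v^2"]

# Sparse fit: most coefficients should vanish, recovering dv/dt ~ -x - 0.1*v.
coef = Lasso(alpha=1e-3, fit_intercept=False).fit(library, dx[:, 1]).coef_
for name, c in zip(names, coef):
    if abs(c) > 1e-2:
        print(f"dv/dt term: {c:+.3f} * {name}")
```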


Figure 3.2 Biological systems, including humans, constitute the integration of many levels of spatiotemporal organization. AI technologies hold promise to enable the systematic discovery of the manifestations of molecular interactions and processes on higher levels of physiological organization.

Develop AI-enabled, self-driving laboratories to enable game-changing advances in the understanding and deployment of biological, chemical, and environmental systems. Fundamental to the role of AI in science, and in particular the biological, chemical, and environmental sciences, is the advancement of laboratories through automation and decision support (Figure 3.3).

Figure 3.3 AI-enabled self-driving laboratories couple automated robotics platforms for experimentation and data collection with AI systems that choose not only the parameters for the next experiment but also the hypotheses to be tested. Figure adapted from Häse et al., Trends in Chemistry 2019.

However, laboratory automation without carefully guided experimental design will contribute to the aggregation of low-value data.

As in the manufacturing sector, we will soon see the effects of automation and robotics throughout biology. Industrial robots featuring high-quality computing capabilities, improved operational mobility, and machine vision systems are needed for future laboratories, particularly in synthetic biology, where goals will include genetic engineering toward an optimized design specification.

The future of these highly automated laboratories, coupled with autonomous robots with enhanced dexterity, must be intricately connected with the advancement of our application of AI to the challenges outlined above. Self-driving laboratories will require tight coupling with advanced AI models capable of representing complex biology far beyond what is possible today. Current AI approaches, such as model validation, uncertainty quantification, and active learning, are relatively immature and will need to become common throughout our science in the coming years to drive the execution of laboratory experiments—for example, molecular biology reactions, chemical reactions, and high-resolution imaging—within a continuous feedback loop of data.

3. Advances in the Next Decade

In the next 10 years, it will be possible to automate the process of biological discovery on unprecedented scales. The promise of self-driving laboratories is exceptional, and may underlie our capacity to achieve many other grand challenges: AI algorithms that design optimal experiments to reduce model uncertainty, constrain their own constructs toward learning mechanisms, and robotically perform the experiments, improving reproducibility and reducing cost and time-to-discovery. Equally importantly, AI amenable to inverse design will enable "hypothesis discovery" and reduce our collective reliance on human intuition, with the potential to accelerate the pace of the biological and environmental sciences by orders of magnitude.
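As a toy illustration of AI algorithms that design optimal experiments to reduce model uncertainty, the sketch below runs a simple active-learning loop in which a Gaussian process surrogate proposes the next assay condition. The function run_experiment is a hypothetical stand-in for a robotic experiment, and the one-dimensional search space is purely illustrative.

```python
# Toy active-learning loop for a self-driving laboratory: a Gaussian process
# surrogate proposes the next experiment where its predicted uncertainty is largest.
# run_experiment() is a hypothetical stand-in for a robotic assay.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(1)

def run_experiment(condition):
    """Pretend assay: yield as a function of, e.g., inducer concentration."""
    return np.exp(-(condition - 0.6) ** 2 / 0.02) + 0.05 * rng.normal()

candidates = np.linspace(0.0, 1.0, 200).reshape(-1, 1)   # feasible conditions
X = list(rng.uniform(0, 1, 3).reshape(-1, 1))            # a few seed experiments
y = [run_experiment(x[0]) for x in X]

gp = GaussianProcessRegressor(kernel=RBF(0.1) + WhiteKernel(0.01), normalize_y=True)
for round_ in range(10):
    gp.fit(np.vstack(X), np.array(y))
    mean, std = gp.predict(candidates, return_std=True)
    nxt = candidates[np.argmax(std)]       # pure exploration: largest uncertainty
    X.append(nxt)
    y.append(run_experiment(nxt[0]))

print("best observed condition:", X[int(np.argmax(y))][0])
```

A production loop would use a richer acquisition function and propagate model uncertainty into the choice of which hypotheses to test, as described above.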


However, there are two significant advances that need to be achieved. Because much of modern biology is not in the "big data" regime, model training must become more data-efficient than it is today. Additionally, scientifically meaningful insights must be extracted from the fitted AI.

First, there is the need for data efficiency. Alternative learning approaches need to be developed that do not require highly overparameterized models for optimization, but that still admit the capabilities of neural networks—the ability to extract hierarchical representations from raw data (essential when data inputs lack semantic meaning, as with imaging data) and exquisite generalization accuracy. Here, biology may provide inspiration for AI methods development: biomimetic systems are needed that radically expand the domain of transfer learning and one-shot or few-shot learning. For the foreseeable future, there will remain scientific regimes in which dozens or hundreds of observations derive only from enormous community efforts. In ecosystems biology, for example, metagenomics and other molecular surveys will likely remain ultra-sparse at landscape scales. In the biomedical sciences, in which phase 1 clinical trials are an essential data point in the lifecycle of a novel therapeutic or procedure, methods amenable to "small data" regimes will continue to be required.

Second, scientists will need methods to extract scientifically meaningful insights from what an AI model has learned from the data. Two complementary approaches are emerging. In one approach, human-understandable reduced order surrogate models (ROSMs) are extracted from more complicated models that accurately represent what an AI algorithm has learned during training. In a second approach, scientific knowledge and constraints are imparted to the architecture or objective function to 'focus' the learned representations so that they are scientifically interpretable. There has been initial success in both the physical and biological sciences in this direction. The mathematical foundations of constrained representation learning as it relates to the geometry of loss surfaces during training (hence the learnability) and inference (hence the generalizability) need substantial attention. While these initial steps are promising, much more work is required in these and other areas of AI.
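One concrete, hedged reading of the first approach is distillation: query the complex model and fit a small, inspectable surrogate to its predictions. In the sketch below, the "complex model" is a stand-in multilayer perceptron and the ROSM is a depth-3 decision tree; both are illustrative choices, not examples drawn from this report.

```python
# Illustrative ROSM-style distillation: fit a shallow, human-readable decision tree
# to the predictions of a (stand-in) complex model over the region of interest.
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.tree import DecisionTreeRegressor, export_text

rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(5000, 3))              # hypothetical input features
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2            # hypothetical target mechanism

complex_model = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=500).fit(X, y)

# Distill: query the complex model densely and fit a small surrogate to its outputs.
X_query = rng.uniform(-2, 2, size=(20000, 3))
rosm = DecisionTreeRegressor(max_depth=3).fit(X_query, complex_model.predict(X_query))

print(export_text(rosm, feature_names=["x0", "x1", "x2"]))   # inspectable rules
print("surrogate fidelity R^2:", rosm.score(X_query, complex_model.predict(X_query)))
```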
To obtain complete descriptions of what class-leading deep neural networks learn from data during training will require new mathematics. This is a daunting prospect for a community built on yoking tools from statistics, numerical optimization, linear algebra, and dozens of other areas, not on developing novel theory. Here, biology may be as useful to the future of AI as AI is to biology. There is an opportunity to draw inspiration from the remarkable adaptability and self-regularization of biology to produce the next generation of AI algorithms and hardware. Blue-sky research into alternative learning automata that reduce the initial ambient dimension of high-performing learning architectures is urgently needed for applications in the biological sciences. New learning "atoms" will undoubtedly come with new hardware requirements, and blue-sky research has the potential to advance in both directions simultaneously.

4. Accelerating Development

Biological datasets must scale in their quantity, quality, and provenance. We need increased standardization of measurement techniques and metadata collection across the biosciences, and reconceptualized data sources as streams rather than the results of single experiments. The lack of data is by far the largest threat to the dawn of strongly AI-enabled biology.

Further, data availability faces special challenges in the biomedical sciences. We must establish the infrastructure required to make communal use of data that cannot be moved or revealed due to privacy concerns. An outstanding issue for sensitive domains, such as health and medicine, is how to preserve privacy while computing with shared data to obtain insights. Removing personal identifiers and confidential details is insufficient, as an attacker can still make inferences to recover aspects of the missing data. Inference attacks can also jeopardize AI algorithms over shared data by targeting the shared AI model training process and the trained model itself. Indeed, serious threats are encountered in collective AI endeavors that aggregate data from different sources, since the most vulnerable source establishes the overall security level. This is an underdeveloped field of AI research in which Research & Development (R&D) investments are well warranted to develop new solutions so that the community can responsibly and privately share sensitive data for aggregated analysis, including training shared AI models and performing transfer learning with sensitive data.

With the support of other federal agencies, the DOE national laboratories could provide a secure environment for objective benchmarking of AI algorithms against community consensus metrics to detect, monitor, and possibly correct dataset biases or inconsistent AI technology performance. First, investment is needed in foundational technologies to promote a rigorous statistical framework to monitor for potential biases or inaccuracies in collected data. During the deployment phase, rigorous quality control should be implemented, monitoring AI performance across subgroups to confirm robust performance or identify performance gaps.

5. Expected Outcomes

Throughout the biosciences, ultimately, the expected outcome is an understanding of life, from the ground up.

• Mining excess carbon from the atmosphere, revolutionizing human health, and engineering microbes and ecosystems to given specifications are within reach. Success in the development of AI for biology can transform our farmlands into an engine for soil security and the economic development of rural America. AI has the potential to extend the average human life while significantly reducing healthcare costs.
• The potential impacts of AI technologies for health are difficult to overstate. Studies estimate that every federal dollar invested to map the human genome returned $60–$140 to the U.S. economy [17]. By leveraging federal health data assets, DOE's computing capabilities, and AI, novel solutions can be developed to extend health span and rein in costs by understanding the broad spectrum of factors impacting well-being and discovering cost-effective approaches to scale promising precision medicine solutions.
• Impacts on synthetic and environmental biology will become increasingly apparent as AI technologies are developed to understand how ecosystem services emerge from biological processes. AI capabilities, coupled with retrobiosynthesis tools from synthetic biology pinpointing the genetic and molecular controls of complex traits, can dramatically change the time scales for product realization, whether that product is a biofuel, a soil amendment, or the foundational understanding of a natural ecosystem.

6. References

1. Garcia, B. J., et al. Phytobiome and Transcriptional Adaptation of Populus deltoides to Acute Progressive Drought and Cyclic Drought. Phytobiomes Journal (2018) 2(4), 249-60.
2. Bouchard, K. E., et al. Union of Intersections (UoI) for Interpretable Data Driven Discovery and Prediction. Advances in Neural Information Processing Systems (2017) 30:1078-86.
3. Lawson, C. E., et al. Common principles and best practices for engineering microbiomes. Nat. Rev. Microbiol. (2019). Epub 2019/09/25. doi: 10.1038/s41579-019-0255-9. PubMed PMID: 31548653.
4. Chmiela, S., et al. Towards exact molecular dynamics simulations with machine-learned force fields. Nat. Commun. 9, 3887 (2018).
5. Murdoch, W. J., et al. Interpretable machine learning: definitions, methods, and applications. arXiv preprint (2019).
6. Harnessing the Power of Data in Health. Stanford Medicine Health Trends Report (2017).
7. Paddon, C. J., et al. High-level semi-synthetic production of the potent antimalarial artemisinin. Nature 496, 528-532 (25 April 2013).
8. Anderson, J. C., et al. Environmentally controlled invasion of cancer cells by engineered bacteria. J. Mol. Biol. 355(4):619-27 (27 Jan 2006).
9. Gardner, T. S. Synthetic biology: from hype to impact. Trends Biotechnol. 31(3):123-5 (Mar 2013).
10. Gambhir, S. S., et al. Toward achieving precision health. Sci. Transl. Med. 10(430) (28 Feb 2018).
11. Blaser, M. J., et al. Toward a predictive understanding of Earth's microbiomes to address 21st century challenges. Am. Soc. Microbiol. (2016). doi: 10.1128/mBio.00714-16.
12. Allegretti, M., et al. Horizontal membrane-intrinsic α-helices in the stator a-subunit of an F-type ATP synthase. Nature 521, 237-240 (14 May 2015).
13. Hermes, M., et al. Mid-IR hyperspectral imaging for label-free histopathology and cytology. J. Optics 20(2) (24 Jan 2018).
14. Carbonell, P., Radivojevic, T., & Martin, H. G. Opportunities at the Intersection of Synthetic Biology, Machine Learning, and Automation. ACS Synth. Biol. 8(7), 1474-1477 (19 July 2019).
15. Census of Agriculture: Summary and State Data. United States Department of Agriculture (2007).
16. Lal, R. Soil carbon sequestration to mitigate climate change. Geoderma 123(1-2):1-22 (Nov 2004).
17. Hood, L., & Rowen, L. The Human Genome Project: big science transforms biology and medicine. Genome Med. 5(9):79 (13 Sep 2013).
04. High Energy Physics

High Energy Physics (HEP) is concerned with discovering the ultimate constituents of matter and uncovering the nature of space and time. The underlying theory and associated experiments cover the smallest scales in all of science to the very largest. In the DOE context, this research quest is divided into three Frontiers: Cosmic, Energy, and Intensity [1].

The Cosmic Frontier uses probes relying on multi-wavelength surveys of the sky. The probes treat the universe itself as an experimental apparatus to investigate the mysteries of dark energy and dark matter—the primordial fluctuations from which all cosmic structure came to be—and to determine the masses of neutrinos, the lightest known material particles. In addition, experiments searching for direct evidence for dark matter fall within the purview of the Cosmic Frontier.

The Energy Frontier studies the fundamental constituents of matter by accelerating and colliding charged particles at very high energies in particle accelerators and by recreating conditions that existed only in the very early universe. The massive detectors that are used to study the collision events are among the most complex scientific devices ever constructed by humans (Figure 4.1). Work in the Energy Frontier is centered on searches for physics beyond the particle physics Standard Model, and on investigation of the properties of the Higgs boson, discovered in 2012.

Figure 4.1 The ATLAS detector at the LHC under construction in 2007.

Intensity Frontier experiments require very sensitive detectors to study rare processes, and intense particle beams are often needed for this purpose. The primary area of interest here is the neutrino sector. Neutrinos are known to exist in three types ('flavors') and change flavor via quantum oscillations as they propagate in space and time. The oscillations imply the existence of neutrino mass. The origins of neutrino mass, the mass ordering, and whether neutrinos are their own antiparticles are just a few of the questions being addressed by Intensity Frontier experiments.

A defining characteristic of all experiments in this field is the generation of large, complex datasets that can range from hundreds of petabytes to exabytes. In addition, simulation data, required to interpret the experiments, can reach similar scales. The experiments also feature high data throughputs. Because of both the volume and velocity of the data, AI approaches are needed at multiple levels in the data management chain to improve the understanding of subtle systematic effects and to open new avenues for scientific discovery (see Chapter 10, AI Foundations and Open Problems).
The deployment of AI techniques in HEP has much in common with strategies and use cases discussed in other chapters. Certain general notions such as automated discovery, end-to-end workflows, explainability, and integration of data and theory are, of course, ubiquitous. More specifically, the idea of digital twins (see Chapter 8, Smart Energy Infrastructure) strongly resonates with the modeling-intensive approach characteristic of HEP. Data curation (see Chapter 12, Data Life Cycle and Infrastructure) is an essential aspect of HEP science. AI with edge systems (see Chapter 15, AI at the Edge) is analogous to HEP detector online computing tasks. Finally, employing AI in reconstruction and tracking is highly applicable, as HEP could profit from advances in physics-informed AI models for sparse, high-precision measurements (see Chapter 10, AI Foundations and Open Problems).

1. State of the Art

Advanced statistical methods and classical ML approaches have a long and productive history in particle physics, and crowd-sourcing techniques have been put to excellent use by cosmologists, leading to new discoveries. There are, therefore, a great number of natural applications of AI methods, a large fraction of which can potentially exploit the burgeoning activity in deep learning. Though in early stages, many ideas are being actively investigated with a view to addressing a number of crucial problems.

Cosmology offers multiple challenges being tackled today using AI approaches. Examples can be found in areas such as (a) photometric redshift estimation [2], (b) image analysis and feature extraction [3], (c) reconstruction methodologies [4] (including gap-filling), (d) object and real-time transient classification [5], (e) inference frameworks [6], and (f) fast predictions derived from expensive simulations (emulators) [7].
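For the first of these examples, photometric redshift estimation, a common baseline is tree-ensemble regression from broadband colors to redshift, scored with the conventional sigma_z/(1+z) statistic. The sketch below uses a synthetic stand-in catalog; real applications would train on survey photometry with a spectroscopic redshift sample.

```python
# Baseline photometric-redshift sketch: regress redshift from broadband colors.
# The synthetic catalog below is a stand-in for real survey photometry.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 20000
z = rng.uniform(0.0, 2.0, n)                                 # "true" redshifts
colors = np.column_stack([z + 0.1 * rng.normal(size=n)       # toy color-redshift trends
                          for _ in range(4)])

X_tr, X_te, z_tr, z_te = train_test_split(colors, z, test_size=0.25, random_state=0)
model = RandomForestRegressor(n_estimators=200, n_jobs=-1).fit(X_tr, z_tr)
z_hat = model.predict(X_te)
sigma = np.std((z_hat - z_te) / (1 + z_te))                  # standard photo-z metric
print(f"sigma_z/(1+z) = {sigma:.3f}")
```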
The AI methodologies employed are as broad as the problems to be solved. They range from deep learning and active learning methods to random forest classifications, and they include more traditional machine learning approaches such as Gaussian process modeling. A noteworthy feature is the close connection with statistics—in particular, sampling theory and Bayesian methods—because of a focus on topics such as detailed verification and validation, which are typically not considered in non-scientific applications.

The Cosmic Frontier provides a rich application area for several reasons (Figure 4.2). First, cosmology is based on large observational datasets rather than on isolated experiments. The observational nature of the field makes it oftentimes impossible to extract the full information content from the data without the use of optimized learning algorithms. Image analysis approaches to disentangle images of galaxies (deblending), analysis of photometric data for redshift estimation, and feature extraction to identify features such as strong lenses are just a handful of examples that have been actively developed in cosmology.

Figure 4.2 Cosmological inference problem: AI methods will contribute in all individual phases as well as in a full end-to-end analysis.

Second, object classification plays a critical role in cosmology, and AI offers manifold approaches in this area. In particular, AI has been used for the classification of transient objects, such as supernovae.

Third, the ultimate goal of cosmology is to infer the underlying physics of the universe from a complex set of data products across multiple wavebands spanning many orders of magnitude. This endeavor must necessarily combine sophisticated data analysis with the best possible, and very computationally intensive, cosmological simulations. Here, ML approaches have been successfully used to develop precision emulators and to help mitigate systematic effects. The already successful ongoing efforts using AI in cosmology, along with new—and possibly unexpected—approaches, will come together in the next 10 to 15 years to revolutionize our understanding of the universe and help answer some of the deepest questions in physics.

ML-based methods already define the state of the art in a number of areas in particle physics experiments, including event and particle identification and energy estimation. The dominant algorithms are boosted decision trees and neural networks [8,9]. Since model training is the most computationally expensive component, particle physics experiments are increasingly employing sophisticated ML applications and reaping high value from them through the rapid turnaround of training and optimization tasks, by spiking large-scale but relatively brief workloads into a variety of large-scale computational resources, including the ASCR LCFs and NERSC, other HPC systems, Grids, and Clouds. Highly scalable distributed workload management systems already exist due to the use of the Grid paradigm in particle physics, and these systems (e.g., PanDA and HEPCloud) can be used to provide an integrated capability that runs seamlessly over a heterogeneous resource environment.

Accurate detector simulation using known interactions is necessary to compare with actual data in order to search for new physics. This complex task involves modeling the events via event generators, followed by detailed, time-consuming Monte Carlo simulations of the interactions within the detectors. ML techniques for replacing the slow pieces of the simulations hold significant future promise, and work on these is currently underway. Event generators have a large number of parameters, and tuning these in a high-dimensional space is another obvious AI application (e.g., using Bayesian optimization [10]). ML techniques have been used for a long time to reconstruct certain characteristics of collision events from detector raw data [11] via pattern recognition and classification methods (including boosted decision trees and neural networks).

More recently, unsupervised or weakly supervised anomaly detection models (e.g., [12]) have been applied to model-independent resonance searches, opening new opportunities to detect physics Beyond the Standard Model (BSM). ML applications also have a place in theoretical approaches, such as the estimation of parton distribution functions, which cannot be computed from first-principles QCD alone and which need to be determined using experimental data [13].

AI techniques have been used successfully in the Intensity Frontier experiments NOvA [14] and MicroBooNE [15], which employed convolutional neural networks (CNNs), as these are particularly suited for applications to the large, homogeneous detectors that are characteristic of neutrino experiments. These techniques have been shown to outperform algorithms used previously, in part because they can exploit the suitability of GPUs for ameliorating the high training costs of CNNs.

2. Major (Grand) Challenges

The grand challenges in high energy physics, described as follows, are driven by the availability of high-volume, high-throughput data with significantly enhanced scientific value in resolution, sensitivity, and physics coverage.
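The unsupervised, model-independent searches noted above, and the anomaly-detection needs that reappear in the grand challenges below, are often prototyped with an autoencoder trained on background-like events, flagging events that reconstruct poorly. The event features, network size, and threshold in the sketch below are placeholders, not an experiment's actual pipeline.

```python
# Schematic anomaly-detection sketch: an autoencoder trained on "background" events;
# events that reconstruct poorly are candidate anomalies. Feature vectors are placeholders.
import torch
import torch.nn as nn

n_features = 20
autoencoder = nn.Sequential(
    nn.Linear(n_features, 64), nn.ReLU(),
    nn.Linear(64, 4),                        # low-dimensional bottleneck
    nn.ReLU(),
    nn.Linear(4, 64), nn.ReLU(),
    nn.Linear(64, n_features),
)
opt = torch.optim.Adam(autoencoder.parameters(), lr=1e-3)

background = torch.randn(10000, n_features)  # stand-in for background-like events
for epoch in range(20):
    for batch in background.split(256):
        opt.zero_grad()
        loss = nn.functional.mse_loss(autoencoder(batch), batch)
        loss.backward()
        opt.step()

# Score a stream of new events: large reconstruction error suggests an anomaly.
new_events = torch.randn(1000, n_features)
with torch.no_grad():
    errors = ((autoencoder(new_events) - new_events) ** 2).mean(dim=1)
threshold = errors.quantile(0.999)            # retain only the rarest events
print("flagged events:", int((errors > threshold).sum()))
```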


Reconstruct the history of the universe unprecedented simulations generated by
using AI techniques. During the next decade, exascale and beyond computing capabilities,
rich data sets will appear from advanced AI will enable a fully optimized experimental
survey telescopes. At the same time, the set-up. This set-up will include (1) the
advent of exascale computing will enable the development of an optimal observing strategy,
next generation of sophisticated cosmological given the scientific focus of the observations
simulations, modeling structure formation in (for example, finding the best compromise for
unprecedented detail. The new observations— deep versus wide field observations to deliver
unparalleled in depth and resolution at the the highest-accuracy dark energy constraints);
observed scales—combined with the (2) best-possible methods to remove system-
simulations and advances in AI, will allow the atics from the data; (3) increased processing
reconstruction of the history of the universe speed; (4) optimal calibration, and (5) the
from the Big Bang until today, from the largest search of new observable features and
scales down to our own Galaxy. We will anomalies and, therefore, the identification of
advance our understanding of the nature of new observing opportunities. ACE would be a
dark energy and dark matter, gain insight to the combination of survey telescopes with follow-
earliest moments of the universe as currently up instruments to enable fast tracking of
described by inflation, and measure the mass unexpected objects as well as transient follow-
of the neutrino. AI will play a pivotal role in this ups important for cosmology. The continuous
endeavor. Conventional methods, such as analysis of the data stream in combination with
2-point correlation function measurements, fail the predictions from the simulations will allow
to extract all of the information encoded in the further on-the-fly optimization of the survey.
data. To optimally extract information while ACE will be the cosmological equivalent of the
maintaining robustness, new AI techniques Event Horizon Telescope (EHT) [16], a
combined with statistical methods and HPC concerted effort between a network of radio
simulations will need to be developed. This telescopes that captured the image of a black
combination will enable predictions deep into hole and its shadow for the first time. In a
the nonlinear regime of structure formation, similar way, ACE will capture the structures in
spanning a large mass and spatiotemporal the universe in a concerted effort of optical
dynamic range. Not only will this sharply telescopes to shed light on the dark universe.
determine the cosmological parameters casting
light on fundamental physics, it will enable us Zettascale AI to uncover new fundamental
to run the movie of our own universe back to physics. Over the next decade, we will deploy
the far past—the era of primordial fluctua- AI-controlled, city-size scientific instruments
tions—as well as forward, enabling a glimpse (particle accelerators and particle detectors)
into the future evolution of our local universe. that produce zettabytes of detector data. AI-
powered hardware will filter the detector data in
Advance knowledge of cosmic structure microseconds. AI-simulations of the detector
formation with the AI-driven Automated response will enable high-precision studies,
Cosmology Experiment (ACE). Based on while completely unsupervised AI-searches for
advances in the next decade and driven by “New Physics” will open new windows for
observational facilities such as the Large discovery (Figure 4.3).
Synoptic Survey Telescope (LSST) and the
Dark Energy Survey Instrument (DESI), the To make this vision reality, we need:
next generation of cosmological surveys will
enable a new approach to cosmology via i) Intelligent Operations. AI algorithms for
a fully automated, AI-driven, cosmological anomaly detection will monitor the performance
experiment, ACE. By combining already of particle accelerators, detectors, and
available cosmological data with computing systems, looking for early signs of

04. HIGH ENERGY PHYSICS 48


Figure 4.3 AI-enabled ultra-fast event processing chain for HEP experiments.

potential problems (see Chapter 10, AI Hadron Collider (HL-LHC), and the Deep
Foundations and Open Problems). These Underground Neutrino Experiment (DUNE)—
techniques will allow for optimization of the will transform high energy physics. These
operations of these complex systems to facilities will be precise and powerful tools that
prevent or mitigate the impact of certain faults will enable both the discovery of new particles
and to accelerate the return to normal and in-depth studies of known particles and
operations if a fault occurs, increasing the fundamental interactions. They will produce
instrument science output. hundreds of petabytes of raw data every year,
and exabytes of simulated and secondary data
ii) ML inference with microsecond-latency streams. These data volumes will preclude
in particle physics trigger applications. At straightforward extensions of current
HL-LHC, each detector will produce petabytes approaches for detector data analysis. Collider
of detector data per second. The experiments physics can be described as a massive inverse
will rely on a “trigger” system built from custom problem, requiring techniques from data
hardware, plus FPGAs, CPUs, and GPUs merging, data visualization, and large-scale
processors to reduce these data rates to a inference, first to “deconvolute” the detector
more manageable 10GB/s. The first level of signals from thousands of particles traversing
this trigger system will reduce the detector data it, and then to reconstruct the primary collision
rate by three orders of magnitude in event from the particle measurements.
10 microseconds or less. The challenge is to
do that without throwing away any collision Key to the success of detector deconvolution
event resulting from rare or new physics and, in general, to the analysis of any particle
processes. AI advances will allow us to detect detector dataset is the availability of accurate,
and preserve these precious events that would high-statistics simulations of the detector
otherwise be lost forever, while still meeting the response to particle traversal. Currently, high-
stringent data rejection and latency accuracy detector simulation is performed
requirements. Advances in AI model using the Geant4 toolkit. As an example, the
architectures and in the use of inference availability of datasets with trillions of simulated
hardware (e.g., FPGAs) will be needed (see collision events could significantly increase the
Chapter 13, Hardware Architectures). sensitivity of precision measurements in the
Higgs and W boson sectors at the HL-LHC and
iii) AI-enabled, ultra-fast event processing help provide the first evidence for physics
chain. Over the next decade, accelerator beyond the Standard Model. Simulating a
facilities—such as the High-Luminosity Large collision event at HL-LHC can take up to

04. HIGH ENERGY PHYSICS 49


O(1 Tflop) of computation and output O(1 MB) of data. Producing and storing one trillion Geant4 collision events would be a challenge even on an exascale system. To achieve their physics goals, the next generation of facilities needs an ambitious R&D program in generative models, with the goal of simulating the detector response for one collision (or interaction) event with O(1 Gflop) or less while maintaining an accuracy comparable to the one achievable using Geant4 (see Chapter 10, AI Foundations and Open Problems). Once this goal is achieved, the next challenge will be to integrate the AI-accelerated simulation into a fast in situ data processing chain of AI models for pattern recognition, particle classification, signal/background discrimination, anomaly detection, and model-free searches. Running this “fast chain” on massively parallel systems (see Chapter 16, Facilities Integration and AI Ecosystem) will be vital to maximizing the discovery potential of the next generation of particle-physics experiments.
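The sketch below illustrates, at toy scale, the generative-model pattern described above: a generator network learns to produce shower-like images that a discriminator cannot distinguish from "reference" samples. The image size, network widths, and the stand-in "Geant4" data are assumptions for illustration only; a real surrogate would be trained against full-simulation showers and validated against the tails of the detector response.

```python
# Illustrative sketch only: a tiny GAN of the kind one might prototype for fast
# surrogate simulation of calorimeter shower images. Image size, network widths,
# and the synthetic "reference" data are placeholders, not an HEP result.
import torch
import torch.nn as nn

LATENT, PIXELS = 32, 16 * 16

G = nn.Sequential(nn.Linear(LATENT, 128), nn.ReLU(),
                  nn.Linear(128, PIXELS), nn.Sigmoid())      # generator
D = nn.Sequential(nn.Linear(PIXELS, 128), nn.LeakyReLU(0.2),
                  nn.Linear(128, 1))                          # discriminator

opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

def reference_showers(n):
    # Stand-in for full-simulation shower images: sparse random "energy deposits".
    img = torch.zeros(n, PIXELS)
    img[torch.arange(n), torch.randint(0, PIXELS, (n,))] = 1.0
    return img

for step in range(1000):
    real = reference_showers(64)
    fake = G(torch.randn(64, LATENT))
    # Discriminator update: label reference samples 1, generated samples 0.
    opt_d.zero_grad()
    d_loss = bce(D(real), torch.ones(64, 1)) + bce(D(fake.detach()), torch.zeros(64, 1))
    d_loss.backward()
    opt_d.step()
    # Generator update: try to fool the discriminator.
    opt_g.zero_grad()
    g_loss = bce(D(fake), torch.ones(64, 1))
    g_loss.backward()
    opt_g.step()
```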
3. Advances in the Next Decade

Major advances in experiments are expected in the next decade. The Cosmic Microwave Background Stage-4 (CMB-S4), DESI, and LSST surveys will be operating on the ground, and Euclid, eROSITA, SPHEREx, and WFIRST will be sending data from space. HL-LHC and DUNE will be taking data by the middle of the decade.

The cosmological survey landscape in the coming decade will offer exciting challenges on the data analysis front. Interestingly (from the AI perspective), the challenge is not only the increase in data size compared to contemporary surveys but also the increased complexity of the data due to the enhanced resolution and depth of the telescopes. In particular, DOE is interested in extracting fundamental physics knowledge from cosmological surveys: answering questions about the nature of dark matter and dark energy, constraining the mass of neutrinos and the number of non-relativistic species, and investigating the physics of the very early universe. These questions have led to a rich observational program, currently focused in the optical and microwave bands. Specifically, during the next 10 to 15 years, data will be obtained and analyzed from the following DOE-supported surveys: DESI, LSST, and the CMB-S4 experiment. The data will provide many AI challenges, from understanding and reducing systematic errors, to the determination of the most valuable follow-up observations, to image analyses, and to the inference of the cosmological parameters that describe the physics of our universe.

HL-LHC will be a major upgrade of the LHC and of its detectors. The experiments will observe at least 50 times more proton-proton collisions per unit time. The increased statistics will push the precision of most measurements of the properties of the Higgs boson up against the detectors’ systematic accuracy. If the experiments can reduce their systematic errors, particularly by simulating the response of their detectors with high accuracy, the following may be enabled: (1) understanding the nature of the Higgs boson (is it a fundamental particle or a composite?); (2) probing, directly or indirectly, the existence of new Beyond-Standard-Model particles and interactions; and (3) probing the existence of heavy, weakly interacting particles that may be dark matter constituents.

DUNE will study with unprecedented precision and accuracy the physics of neutrinos and offer new windows into the origin of the universe’s matter-antimatter asymmetry. The DUNE detector is also capable of studying neutrino bursts from exotic cosmic events, such as the formation of a black hole. DUNE may also be the first detector capable of observing the exceedingly rare decay of a proton, allowing it to constrain the energy scale at which the three gauge interactions are unified in a single theory.

The new HEP experimental facilities will be some of the world’s largest sources of
high-quality scientific data. Exploitation and analysis of these data sets will greatly benefit from integration within the larger AI ecosystem consisting of DOE’s high-performance computing (HPC) and high-performance networking (HPN) facilities. The data itself will be generated within HEP facilities and instruments whose operation will also avail of a number of AI capabilities in the sphere of high-speed data classification, selection, and reduction, and in real-time control and optimization. In some contrast to the situation described in Chapters 14 (AI for Imaging) and 16 (Facilities Integration and AI Ecosystem), HEP data sets are already subject to well-defined quality standards, and the field has a long history of established practice in large-scale data management and the exploitation of ML techniques. For these reasons, HEP facilities are in an excellent position to take immediate advantage of AI-enabled methodologies as they become available. Because large-scale HEP experiments have already built a sophisticated infrastructure for distributed data management and analysis based on a hierarchy of storage and analysis hubs and platforms, an exciting opportunity for greatly enhanced scientific returns exists in embedding this capability within DOE’s broader HPC and HPN infrastructure via AI-enabled smart edge services to HPC systems and AI-enhanced “just-in-time” HPN-based data delivery systems.

4. Accelerating Development

The amount and complexity of the next-generation data stream will require a concerted effort to combine new analysis, modeling, and simulation methods that effectively leverage AI technologies. New cosmological surveys are rapidly coming online. Given the aim of these surveys to deliver cosmological parameter constraints at percent-level accuracy, AI will play a central role in the analysis and interpretation of the data. The cosmology community has embraced this opportunity fully and is already developing approaches for numerous tasks. In the near future, the focus will be to establish the reliability and robustness of AI-based methods for the different application areas (see Chapter 10, AI Foundations and Open Problems). In particular, whenever high precision is required, it must be ensured that the AI approaches do not lead to undesirable biases due to, for example, misclassification of objects. With LSST and DESI coming online very soon, many new approaches will be applied and tested. In particular, for LSST’s Dark Energy Science Collaboration (DESC), simulated data challenges are being created that will provide excellent testbeds for many of these projects. Therefore, the highest priority in cosmology over the next few years will be to develop a roadmap that clearly establishes the best application areas for AI and a solid understanding of their error properties. Based on the findings, the cosmology community will be able to fully integrate AI into their data analysis approaches and pipelines and then take the next major steps, as outlined in the Grand Challenge problems, to create a well-integrated overarching approach for the use of AI to enable major advances in cosmological inference, extract the maximum possible information from the data, and inform and optimize new observational strategies.

For AI to play a critical role in DUNE and HL-LHC data simulation, processing, and analyses, new AI models are needed which are well suited to the sparse, high-precision nature of the measurements from most HEP detectors. Pattern recognition algorithms developed for a 10-megapixel photo camera do not work out of the box for a detector with 10 million active pixels that are millimeters or even meters apart. Likewise, AI image-generation techniques commonly used to simulate new images from a library of existing ones do not meet the stringent accuracy requirements of HEP detector simulation, particularly when it comes to simulating the tails of the detector response. In general, for AI to address the HEP data challenges of the next decade, we will need to identify resource-critical applications (such as detector
simulation) and to develop ML models that are good at simulating and detecting extremely rare phenomena¹ with high efficiency and accuracy. Besides supporting a robust R&D program targeting select HEP grand challenges, the development needed to meet these grand challenges includes:

(1) Create usable tools for large-scale distributed training and optimization of ML models to enable physicists to scale up the complexity of their models orders of magnitude above the current “laptop-size.”

(2) Develop training methodologies that are able to detect rare features in high-dimensional spaces while being robust against systematic effects.

(3) Design tools to quantify the impact of systematic effects on the accuracy and stability of complex ML models.

¹ For example, DUNE expects to observe O(1) proton decay candidate per year while processing O(PB/s) of detector raw data.
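As a concrete illustration of the sparse-data point raised above—detectors whose "pixels" are few, irregular, and widely separated—one architecture family that handles variable-length, permutation-invariant hit collections is a Deep Sets-style network. The sketch below is illustrative only; the hit features (position and deposited energy), dimensions, and random data are assumptions, not any experiment's reconstruction code.

```python
# Illustrative Deep Sets-style classifier for variable-length collections of
# sparse detector hits (hypothetical features: x, y, z, deposited energy).
import torch
import torch.nn as nn

class HitSetClassifier(nn.Module):
    def __init__(self, hit_dim=4, embed=64):
        super().__init__()
        self.phi = nn.Sequential(nn.Linear(hit_dim, embed), nn.ReLU(),
                                 nn.Linear(embed, embed), nn.ReLU())
        self.rho = nn.Sequential(nn.Linear(embed, embed), nn.ReLU(),
                                 nn.Linear(embed, 1))

    def forward(self, hits, mask):
        # hits: (batch, max_hits, hit_dim); mask: (batch, max_hits), 1 for real hits.
        per_hit = self.phi(hits) * mask.unsqueeze(-1)   # embed each hit
        pooled = per_hit.sum(dim=1)                     # permutation-invariant sum
        return self.rho(pooled)                         # event-level logit

model = HitSetClassifier()
hits = torch.randn(8, 200, 4)                # 8 events, up to 200 hits each
mask = (torch.rand(8, 200) > 0.7).float()    # most entries empty, i.e., sparse
logits = model(hits, mask)
```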
5. Expected Outcomes

The primary aim of ongoing and upcoming cosmological experiments is to further our understanding of the dark universe (dark matter and dark energy), the very early moments of cosmic evolution (inflation), and the make-up of the universe. These are profound questions in the area of fundamental physics. AI will enable the exploration of the data from the next-generation surveys in new and unexpected ways. The amount and complexity of the data will increase immensely in the coming years, and in some areas, traditional methods will break down due to the sheer data volume (e.g., no human will be able to look at each image taken by the new surveys). The ability to make a movie of the universe from its earliest moments until today and into the future will have a profound impact—AI in cosmology will be central to our quest to understand the universe in which we live.

AI algorithms will play a vital role in the next generation of particle physics detectors and accelerators, from intelligent operations to fast AI for data selection. Simulating and processing the DUNE and LHC detector data with high statistics and high accuracy will usher in a new era of precision physics at the Energy and Intensity frontiers that may shed light on fundamental HEP questions such as the scale at which nature’s fundamental forces are unified, the origin of the universe’s matter-antimatter asymmetry, and the constituents of dark matter. The introduction of model-free, unsupervised AI searches will further push the potential for discoveries that may transform our understanding of fundamental physics over the next decade.

6. References

1. Rosner, J., et al., Planning the Future of US Particle Physics, arXiv:1401.6075.
2. Cavuoti, S., et al., Machine-learning-based photometric redshifts for galaxies of the ESO Kilo-Degree Survey data release 2, MNRAS 452, 3100 (2015).
3. Kremer, J., et al., Big Universe, Big Data: Machine Learning and Image Analysis for Astronomy, IEEE Intelligent Systems 32, 16 (2017).
4. Higson, E., Handley, W., Hobson, M., and Lasenby, A., Bayesian sparse reconstruction: a brute force approach to astronomical imaging and machine learning, MNRAS 483, 4828 (2019).
5. Lanusse, F., et al., CMU DeepLens: deep learning for automatic image-based galaxy-galaxy strong lens finding, MNRAS 473, 3895 (2018).
6. Krause, E. and Eifler, T., CosmoLike – Cosmological Likelihood Analyses for Photometric Galaxy Surveys, MNRAS 470, 2100 (2017).
7. Heitmann, K., et al., Cosmic Calibration, Astrophys. J. 646, L1 (2006).
8. Albertsson, K., et al., Machine Learning in High Energy Physics Community White Paper, arXiv:1807.02876.
9. Radovic, A., et al., Machine learning at the energy and intensity frontiers of particle physics, Nature 560, 41 (2018).
10. Ilten, P., Williams, M., and Yang, Y., Event generator tuning using Bayesian optimization, JINST 12.04 (2017).
11. Albrecht, J., HEP Community White Paper on Software trigger and event reconstruction, arXiv:1802.08638.
12. Collins, J. H., et al., Extending the Bump Hunt with Machine Learning, arXiv:1902.02634.
13. Ball, R.D., et al., Parton distributions for the LHC Run II, JHEP 04, 40 (2015).
14. Aurisano, A., et al., A Convolutional Neural Network Neutrino Event Classifier, JINST 11.09 (2016).
15. Acciarri, R., et al., Convolutional neural networks applied to neutrino events in a liquid argon time projection chamber, JINST 12.03 (2017).
16. Akiyama, K., et al. (Event Horizon Telescope Collaboration), First M87 Event Horizon Telescope Results. I. The Shadow of the Supermassive Black Hole, Astrophys. J. 875, L1 (2019).

05. Nuclear Physics
The nature of matter is the fundamental question in nuclear physics: what are the basic components of matter, and how do they interact to form the elements that make up our universe? This question is not limited to familiar forms of matter but also includes exotic forms, such as those that existed in the first moments after the Big Bang and those that exist today inside neutron stars. In addition to the fundamental questions of how and why matter takes on specific forms, it is also important to understand how that knowledge can benefit society in the areas of medicine, nuclear energy, and national security. Nuclear experiments include a range of devices, from small- and intermediate-scale devices to very large detector programs at accelerator laboratories like the Relativistic Heavy Ion Collider (RHIC) at Brookhaven National Laboratory (BNL), the Continuous Electron Beam Accelerator Facility at Thomas Jefferson National Accelerator Facility (Jefferson Lab), and the Argonne Tandem Linac Accelerator System at Argonne National Laboratory (Argonne). Nuclear physicists also lead experiments at other user facilities such as the Large Hadron Collider (LHC) at the European Organization for Nuclear Research (CERN) (Figure 5.1), the Japan Proton Accelerator Research Complex (J-PARC), and the Spallation Neutron Source (SNS) at Oak Ridge National Laboratory (ORNL).

Figure 5.1 An event display shows particle tracks from a lead-on-lead collision in the ALICE detector. Image courtesy of CERN, ALICE Collaboration [taken from https://www.energy.gov/science/np/articles/explaining-light-nuclei-production-heavy-ion-nuclear-collisions].

Nuclear theory is concerned with how quarks and gluons interact to form protons, neutrons, and other hadrons, as well as how those hadrons interact to form and determine the behavior of atomic nuclei. Studies of the formation and characteristics of nuclear matter in stellar explosions (i.e., supernovae) and neutron stars are among the most computationally intensive investigations currently underway.

The Nuclear Physics Long Range Plan identifies the priorities for the field. These are:

• Utilize investments in accelerators, detectors, and computational infrastructure.
• Develop a U.S.-led, ton-scale neutrinoless double beta decay experiment.
• An electron-ion collider is the highest priority for new facility construction.
• Invest in small and mid-scale projects and initiatives enabling forefront research, including theory.

Applications of nuclear physics for societal benefit are also important.

The multiscale, highly correlated, and high-dimensional nature of the physics of the nuclear force leads to a rich set of phenomena in nuclear physics. AI techniques offer the possibility of increasing our understanding of this physics and making new discoveries through a number of applications, detailed here.

1. State of the Art

Increasing data volumes from nuclear experiments and simulations have already led to a variety of AI approaches being employed in the field. These span nuclear theory, experiments at various scales, accelerator optimization and controls, and applied nuclear physics.

Nuclear binding energy, for example, is an essential property for understanding the production of nuclear species in astrophysical events such as supernovae and neutron star mergers. Some relevant binding energies cannot be measured directly and rely on nuclear models. Supercomputer calculations based on fundamental theory provide our best predictions for these binding energies and other important nuclear properties, but to reach the needed precision, these calculations become very computationally expensive. A team led by researchers from Iowa State University and Lawrence Berkeley National Laboratory (LBNL) developed a DL approach using a neural network trained with state-of-the-art supercomputer calculations [5]. The trained network estimates binding energies and other properties with precision beyond expectations from the available calculations. The researchers validated their approach by demonstrating consistency with available analytic and phenomenological extrapolation tools.
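The sketch below illustrates the general extrapolation pattern behind this kind of work—train a small regression network on a limited set of expensive calculations and query it at configurations that were not computed directly. The inputs, synthetic targets, and architecture are stand-in assumptions, not the published model of [5].

```python
# Sketch of the extrapolation pattern (synthetic data; not the model of [5]):
# fit a small regression network to "expensive calculation" results, then
# evaluate it at configurations that were never computed directly.
import torch
import torch.nn as nn

# Hypothetical inputs: (proton number Z, neutron number N, basis-size parameter).
X = torch.rand(500, 3) * torch.tensor([50.0, 60.0, 20.0])
# Synthetic "binding energy" target standing in for ab initio results.
y = 8.0 * (X[:, 0] + X[:, 1]) - 0.01 * (X[:, 0] - X[:, 1]) ** 2 + torch.randn(500)

mean, std = X.mean(0), X.std(0)          # normalize inputs for stable training
Xn = (X - mean) / std

model = nn.Sequential(nn.Linear(3, 64), nn.Tanh(),
                      nn.Linear(64, 64), nn.Tanh(),
                      nn.Linear(64, 1))
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
for _ in range(2000):
    opt.zero_grad()
    loss = nn.functional.mse_loss(model(Xn).squeeze(1), y)
    loss.backward()
    opt.step()

# Query a configuration that was not part of the "calculated" training set.
query = (torch.tensor([[28.0, 36.0, 18.0]]) - mean) / std
print(model(query).item())
```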

Experimental groups in all areas of nuclear physics are using AI techniques to characterize features in their data more quickly, more efficiently, and with increasing sensitivity.

Experimental nuclear astrophysicists use the MUlti-Sampling Ionization Chamber (MUSIC) detector at Argonne to study the fusion of nuclei in stars and to understand explosive stellar phenomena such as Type I X-ray bursts and superbursts. Standard data analysis techniques require months to select relevant events. Data that were previously analyzed with standard techniques were passed through algorithms based on the t-distributed stochastic neighbor embedding (t-SNE) approach [6] for unsupervised ML. The t-SNE approach was able to find clusters in 35 dimensions corresponding to different detector signals, clearly delineating previously identified 17F(α,p) reactions and providing a proof of concept for use in other reactions.
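For readers unfamiliar with the technique, the snippet below shows how a t-SNE embedding of high-dimensional event signals can be computed; the clusters that emerge can then be inspected and mapped onto candidate reaction channels. The 35-dimensional data here are synthetic stand-ins, not MUSIC measurements.

```python
# Illustrative use of t-SNE to look for event clusters in high-dimensional
# detector signals. The 35-dimensional data are synthetic stand-ins; a real
# analysis would start from calibrated detector signal features.
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
# Three synthetic "reaction classes", each a Gaussian blob in 35 dimensions.
centers = rng.normal(size=(3, 35)) * 4.0
events = np.vstack([rng.normal(loc=c, scale=1.0, size=(300, 35)) for c in centers])

embedding = TSNE(n_components=2, perplexity=30, init="pca",
                 random_state=0).fit_transform(events)

# 'embedding' is (900, 2); plotting it would reveal the three clusters, which an
# analyst could then inspect and relate to known or unknown reaction channels.
print(embedding.shape)
```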
Analysis of the very complex data sets from heavy ion collisions at RHIC and the LHC already benefits from AI. Deep neural networks can connect specific moments of the complex particle correlations inside jets of hadrons with properties of the quark-gluon plasma produced in the collision—in ways not previously predictable [8].

The GlueX experiment at Jefferson Lab utilizes a high-intensity photon beam and a large-acceptance particle detector to search for exotic hadrons. Individual collisions are reconstructed from fine-grained detector systems. A key use case of ML at GlueX thus far is in filtering those events containing rare reactions. GlueX demonstrated that boosted decision trees achieved the required performance [7]. Another recent development in GlueX is a system of data quality monitoring that uses ML to evaluate images of data quality histograms in real time to identify problematic regions of the detector during the experiment’s operation.

It is now known that neutrinos have mass. However, it is not known whether the neutrino is a Dirac or Majorana particle (i.e., whether the neutrino and the antineutrino are the same particle). To answer this question, nuclear physicists search for the lepton-number-violating process of neutrinoless double beta decay, wherein two neutrons in an atomic nucleus are transformed into two protons without the usual emission of two antineutrinos. In such searches, it is paramount to differentiate a very small signal from background events that occur at rates orders of
magnitude larger. The backgrounds are dominated by intrinsic radioactivity in the detector along with instrumental backgrounds. Current neutrinoless double beta decay demonstrator experiments are exploring different techniques to classify and separate the two-beta-electron signal from other classes of events in detectors, including large-scale liquid scintillators, semiconductor ionization detectors, bolometers, and high-pressure gaseous xenon TPCs. Geometric patterns of fired photomultiplier tubes are examined, or the pulses from charge or phonon collection are used. Most developments began with decision tree techniques, as event classification is the primary goal. Now, experiments are implementing DNNs and other DL methods. Improvements to the sensitivity to neutrinoless double beta decay have been demonstrated through more effective identification of background events.

As an example, the high spatial resolution of xenon TPCs offers an additional handle for neutrinoless double beta decay searches beyond the excellent energy resolution at the 0νββ region of interest. The addition of spatial tracking information provided by the detector offers a topological separation of two-electron events (resulting from a neutrinoless double beta decay event) from a single-electron event of the same energy resulting from a background event. Calorimetric resolution of the Bragg peak of stopping particles differentiates between the start and end point of an electron track, and the topological signature (one Bragg peak or two Bragg peaks) differentiates between single- and double-electron events (Figure 5.2). Topological signatures are important for reducing background rates and reaching the experimental sensitivity needed to learn the nature of the neutrino.

Figure 5.2 (left) 3D rendering of double beta decay-like data in a high-pressure TPC-type detector. (right a, b) Simulated neutrinoless double beta decay interactions (right a) have two Bragg peaks, while energetically indistinguishable background events have just one (right b). The right a, b images are 2D projections [9].

Spatially sparse image data, such as that found in high-pressure xenon TPCs, naturally lends itself to the application of CNNs for topological discrimination. ML (including DL) methods have shown excellent promise in this task of resolving signal and background events at the same energy in simulation and data from a high-pressure xenon TPC neutrinoless double beta decay prototype, through the use of CNNs in three dimensions. Additionally, these networks were trained using scalable distributed learning techniques with spatially sparse convolutional networks and achieved the state of the art in less than 30 minutes of computational time.

Accelerator facilities are improving operations using AI technologies. At RHIC, efforts are under way to implement anomaly detection in the controls and AI methods in data mining. In addition, reinforcement learning is used along
with game theory to analyze client activities in the RHIC control systems [4]. Prognostics and errant beam prevention are becoming increasingly important in an age where we have many superconducting accelerators (superconducting magnets and superconducting radio frequency), with high repetition rates, high power, and complex, sensitive components. There is a much greater need for improved prognostics to avoid faults and to improve recovery from faults. Many groups have efforts focusing on these areas, including improved mining of large repositories of accelerator engineering data and the introduction of methods for real-time anomaly detection in operating systems.

An ongoing project at Jefferson Lab leverages ML to automate cavity trip classification. Traditional methods have been effective at identifying superconducting radiofrequency (SRF) trip causes, but they are labor intensive and generate results in an asynchronous fashion. Identifying and correcting faults in real time will have numerous benefits, including improving the stability of the SRF system, providing a more reliable and available accelerator, and extending the energy reach. It will also provide important statistics and insights on cryomodule operations to engineering and SRF R&D staff while freeing them to focus on the future design and fabrication of SRF cryomodules. The project established a prototype system that reads data from the control system as faults occur, classifies them with a trained ML model, and outputs the result to subject matter experts. The system provides a cavity trip type, identifies the cavity causing the instability, and, potentially, can predict a trip before it occurs. It is a first step toward a diagnostic tool for daily use by operators to accurately identify the cause of a trip and apply precise response measures, avoiding unnecessary gradient reduction [10,11].
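The sketch below conveys the shape of such a fault-classification task: a multiclass classifier trained on summary features extracted from RF waveforms. The feature names, fault classes, and data are hypothetical illustrations, not the CEBAF system described in [10,11].

```python
# Illustrative sketch of cavity-fault classification (not the CEBAF system of
# [10,11]): a multiclass classifier over summary features extracted from RF
# waveforms. Features, classes, and data here are hypothetical.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

rng = np.random.default_rng(1)
n = 2000
# Hypothetical per-trip features: gradient drop, detune amplitude, decay
# constant, forward/reflected power ratio, and two other waveform summaries.
X = rng.normal(size=(n, 6))
fault_type = rng.integers(0, 4, size=n)    # e.g., microphonics, quench, arc, other
X[np.arange(n), fault_type] += 2.5         # make classes separable in the toy data

X_tr, X_te, y_tr, y_te = train_test_split(X, fault_type, test_size=0.25, random_state=0)
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
print(classification_report(y_te, clf.predict(X_te)))
```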
2. Major (Grand) Challenges

Advances in the use of AI/ML/DL techniques in nuclear physics will be driven by the volume and complexity of new data—both from experimental facilities (as described above) and from theory and simulation. The ability to discern physical causality and discover new phenomena will require the application of new technologies to augment human understanding. We note several grand challenges for better understanding the nature of matter in this section.

Generate detailed tomography of the proton/nuclei. This 3D tomography of hadrons and nuclear structure is not directly accessible in experiments. Obtaining the quantities of interest, such as generalized and transverse momentum dependent parton distribution functions (Generalized Parton Distributions (GPDs) and Transverse Momentum Distributions (TMDs)), involves an inverse problem. This is because these objects are inferred from experimental data using theoretical frameworks such as quantum chromodynamics (QCD) factorization theorems (e.g., collinear factorization, TMD factorization). Such a procedure allows one to connect experimental data to quantum probability distributions that characterize hadron and nuclear structure and the emergence of hadrons in terms of quark and gluon degrees of freedom.

Existing techniques to extract probability distributions from data have primarily been used to obtain a 1D tomography of hadrons, provided by parton distribution and fragmentation functions. These techniques usually rely on Bayesian likelihood techniques and Monte Carlo sampling methods, which are coupled with suitable parametrizations for the distribution functions of interest (Figure 5.3).
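To make the "parametrization plus Bayesian likelihood plus Monte Carlo sampling" pattern concrete, the minimal sketch below fits a toy PDF-like form f(x) = x^a (1-x)^b to pseudo-data with a random-walk Metropolis sampler. It is illustrative only—the parametrization, pseudo-data, and flat prior are assumptions, not a QCD global analysis.

```python
# Minimal sketch of the parametrization + likelihood + Monte Carlo sampling
# pattern, using a toy form f(x) ~ x^a (1-x)^b fitted to pseudo-data.
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.05, 0.95, 30)
a_true, b_true = 0.5, 3.0
data = x**a_true * (1 - x)**b_true + rng.normal(0, 0.01, size=x.size)
sigma = 0.01

def log_likelihood(theta):
    a, b = theta
    if a <= 0 or b <= 0:           # flat prior restricted to positive exponents
        return -np.inf
    model = x**a * (1 - x)**b
    return -0.5 * np.sum(((data - model) / sigma) ** 2)

# Random-walk Metropolis sampler over (a, b).
theta = np.array([1.0, 1.0])
logp = log_likelihood(theta)
samples = []
for _ in range(20000):
    proposal = theta + rng.normal(0, 0.05, size=2)
    logp_new = log_likelihood(proposal)
    if np.log(rng.uniform()) < logp_new - logp:
        theta, logp = proposal, logp_new
    samples.append(theta.copy())

samples = np.array(samples[5000:])                # discard burn-in
print(samples.mean(axis=0), samples.std(axis=0))  # posterior mean and spread
```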

Figure 5.3 A momentum space tomography of a hadron at different slices in Bjorken x, for u and d anti-quarks. The images show how the variable x provides a filter to select different aspects of nucleon or nuclear partonic structure.

In the Electron-Ion Collider (EIC) era, such methods need to be dramatically improved upon so that the full impact of the science can be assessed in real time. This provides an important opportunity to utilize AI/ML techniques to obtain approximate solutions to the associated inverse problems—that is, to find an efficient mapping between the exabytes of experimental cross-section data and the theoretical objects of interest, namely the quantum probability distributions. Such a project will produce the next generation of QCD analysis tools that will provide rapid feedback between experimental data and a deeper understanding of strong interaction dynamics. Therefore, AI/ML methods will help guarantee maximum science output from the EIC.

Increase the understanding of matter/antimatter in the universe. A better understanding of electroweak interactions is fundamental to understanding the matter/antimatter asymmetry in the universe, and neutrinoless double beta decay offers a window into these phenomena. CNNs offer the ability to reach beyond current technologies for neutrinoless double beta decay, thanks to their ability to quickly learn pattern recognition and discriminate important topological features. A significant challenge, however, will be validating an ML technique sufficiently well to ensure it performs on data in the energy region of interest.

With the availability of radioactive sources for calibration, such as thallium in high-pressure xenon TPCs, researchers have access to a dataset with signal-like and background-like events that have a very similar topological signature to neutrinoless double beta decay signal and background events, but at a different energy and with high statistics. The combination of available simulation, validation datasets, and very fast training times will allow experiments to perform an optimization campaign to build a robust neural architecture for fast analysis of neutrinoless double beta decay data, with high confidence of similar performance on data and simulation. Additionally, the introduction of Generative Adversarial Networks (GANs) to model data/simulation discrepancies, with the ability to validate over large energy regimes, increases confidence in a network trained on simulation + GAN datasets. The grand challenge in this space is to create an AI-centric workflow that distinguishes neutrinoless double beta decay candidates from background, while using AI to validate simulations and ensure high-quality inference results on data.
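The sketch below shows the general shape of the topological discrimination task described above—a 3D CNN classifying voxelized tracks as single-peak or double-peak events. The grid size, architecture, and toy inputs are assumptions; production work would use sparse convolutions on real hit data.

```python
# Illustrative 3D CNN for topological discrimination of voxelized TPC tracks
# (one Bragg peak vs. two). Grid size, architecture, and data are toy choices.
import torch
import torch.nn as nn

class TopologyCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(1, 8, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool3d(2),
            nn.Conv3d(8, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool3d(2),
        )
        self.head = nn.Linear(16 * 8 * 8 * 8, 2)   # logits: [single-peak, double-peak]

    def forward(self, voxels):                      # voxels: (batch, 1, 32, 32, 32)
        h = self.features(voxels)
        return self.head(h.flatten(start_dim=1))

model = TopologyCNN()
voxels = torch.zeros(4, 1, 32, 32, 32)
voxels[:, 0, 16, 16, 16] = 1.0                      # toy energy deposits
logits = model(voxels)
print(logits.shape)                                 # (4, 2)
```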
Advance the understanding of nucleosynthesis. Our understanding of nucleosynthesis is growing through studies of astronomical measurements, theoretical calculations, and experimental measurements of exotic nuclei generated at advanced experimental facilities.

Researchers are now working to extend deep learning to a wide range of important properties that govern the production of nuclei in the cosmos. Further developments include applications to measure electromagnetic and weak transition rates in both stable and unstable nuclei. In addition, applications to improve scattering and reaction cross-sections based on fundamental theory appear feasible in light of the initial successes with binding energies. For example, incompletely converged supercomputer calculations of nucleon-nucleus cross-sections based on microscopic theory have appeared recently and, as with the binding energy example, a DL approach could extend those results to produce cross-sections at convergence with quantified uncertainties.

Nuclear astrophysics simulations—including core-collapse supernovae, X-ray bursts, and neutron star mergers—continue an inexorable march towards higher computational intensity, as increased physical fidelity is realized using higher spatial resolutions, longer physical times, and more complete microphysical descriptions. Anomaly detection for these very expensive calculations (i.e., of order tens of millions of LCF node-hours) becomes essential to ensure that scarce computational resources are not consumed in error. In addition, much of the requisite microphysics in these simulations (e.g., neutrino-matter interaction rates, thermonuclear reaction rates, and high-density equations of state) is recovered via the use of high-dimensional interpolation tables. ML techniques such as Gaussian process models and deep neural networks can replace traditional interpolation techniques while providing superior robustness.
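As a small illustration of that replacement, the sketch below trains a Gaussian process surrogate in place of a tabulated microphysics quantity and returns both a value and an uncertainty at off-grid conditions. The "reaction rate" is a made-up function of temperature and density; a real application would train on the actual table entries.

```python
# Sketch of replacing a tabulated microphysics quantity with a Gaussian process
# surrogate. The "rate" here is a toy function, not a physical table.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel

rng = np.random.default_rng(0)
# Training points: (log10 T, log10 rho) pairs where the expensive rate is "tabulated".
X = rng.uniform(low=[8.0, 5.0], high=[10.0, 9.0], size=(200, 2))
log_rate = 3.0 * X[:, 0] - 1.5 * X[:, 1] + 0.2 * np.sin(4 * X[:, 0])   # toy table values

gp = GaussianProcessRegressor(kernel=ConstantKernel() * RBF(length_scale=[1.0, 1.0]),
                              normalize_y=True).fit(X, log_rate)

# Query off-grid conditions; the GP returns a value and an uncertainty, which
# can flag regions where the underlying table (training data) is too sparse.
query = np.array([[9.3, 7.2]])
mean, std = gp.predict(query, return_std=True)
print(mean, std)
```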
When completed in 2022, the Facility for Rare Isotope Beams (FRIB) will be the world’s most powerful rare isotope research laboratory. By producing intense beams of nearly 80 percent of the predicted isotopes for elements up to uranium, FRIB will enable researchers to make major advances in the structure, stability, and limits of nuclear matter, as well as in their interactions and decays (Figure 5.4). We anticipate that a variety of AI/ML approaches will be developed to address specific needs at FRIB, including beam generation, event characterization, detector response, experiment optimization, and data analysis.

Figure 5.4 The Facility for Rare Isotope Beams (FRIB) will provide unparalleled beam intensities of the most exotic nuclei.

Transform the operation of accelerators and detector systems. In data analysis, experimental design and optimization, and even facility operation, AI/ML may provide approaches that are complementary to and offer improvement over traditional techniques. AI/ML studies can offer transformative progress in the optimal operation of accelerators. In addition to the ongoing work at BNL and Jefferson Lab, FRIB operations will surely benefit. Production of high-purity, high-intensity beams of unstable nuclei and their delivery with high efficiency to the FRIB experimental end stations present a daunting challenge. As data-taking runs for each measurement can be short, tuning time is important.

Time-consuming, multi-step beam generation efforts potentially limit the overall scientific productivity of the facility, as will the need to (on occasion) use sub-optimal beams with lower intensity. By utilizing supervised ML methods or reinforcement learning, it is anticipated that beam generation times can be significantly reduced compared to manual efforts, while simultaneously improving the quality of the beams delivered to the end stations.
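One simple relative of the supervised-learning and reinforcement-learning tuning approaches mentioned above is surrogate-assisted optimization, sketched below: a Gaussian process models beam transmission as a function of a single, hypothetical magnet setting, and each iteration probes the setting with the best predicted upper confidence bound. The transmission function and knob are synthetic stand-ins, not an actual facility tuning procedure.

```python
# Sketch of surrogate-assisted beam tuning over one hypothetical knob.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def transmission(setting):
    # Stand-in for a slow machine measurement of beam transmission vs. setting.
    return np.exp(-0.5 * ((setting - 0.37) / 0.08) ** 2) + np.random.normal(0, 0.01)

settings = list(np.random.uniform(0, 1, size=3))          # initial random probes
values = [transmission(s) for s in settings]
candidates = np.linspace(0, 1, 201).reshape(-1, 1)

for _ in range(15):
    gp = GaussianProcessRegressor(kernel=RBF(0.1), alpha=1e-4).fit(
        np.array(settings).reshape(-1, 1), values)
    mean, std = gp.predict(candidates, return_std=True)
    best = candidates[np.argmax(mean + 2.0 * std), 0]      # upper confidence bound
    settings.append(best)
    values.append(transmission(best))

print("best setting found:", settings[int(np.argmax(values))])
```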

Detector systems used in nuclear physics experiments and nuclear physics applications will continue to generate higher-fidelity data, which will drive needs for better data analysis methods and, in some cases, for faster, high-fidelity, edge-driven analysis.

AI techniques are being developed for event characterization, particle and photon tracking, particle identification, and energy reconstruction. Reconstruction of tracks in time projection chambers could be greatly improved with such approaches. At FRIB, logistic regression, fully connected neural networks, convolutional neural networks, and other approaches are being explored to identify event tracks in the Active Target Time Projection Chamber (AT-TPC). This step could be decoupled from fitting the tracks to determine reaction kinematics.

The enormous particle multiplicities in TPCs at heavy ion colliders cause track reconstruction to be slow and require complex corrections for distortions due to the large charge load in the TPC. Application of ML to this problem would greatly simplify calibration and track reconstruction.

Methods to improve particle tracking through sophisticated magnetic spectrometers are also being developed through AI/ML. While the exact technique differs for different magnet configurations, room for improvement exists at all DOE Nuclear Physics (NP) accelerator facilities. At FRIB, correlating signals in the focal plane detectors of the magnetic spectrometers using a series of masks at the target location could be used to train corrections for offsets in initial particle angle and position. This could markedly improve the energy/momentum resolution of the focal plane spectra.

DNNs are being applied to complement existing Monte Carlo approaches for particle identification. Event shapes in multi-dimensional (detector signal) space can be used to train ML algorithms to recognize the location of foreground events in the presence of significant backgrounds. In calorimeters, DNNs allow sophisticated analysis of shower shapes to separate single photons, hadrons, and their decays.

Many modern detectors digitize the signals (waveforms) from each event. For example, new large-volume germanium detectors for gamma-ray spectroscopy will enable position sensitivity, i.e., determining not only the total energy deposited via gamma rays but also the energy and position of the individual interactions within the detector. Spatial resolutions of a few millimeters will be possible, enabling so-called gamma-ray tracking, another area where ML is applicable. Gamma-ray tracking is the core operating principle of the Gamma-Ray Energy Tracking Array (GRETA) spectrometer, and AI/ML methods may transform the current approaches, which use deterministic and probabilistic methods to reconstruct the paths of multiple gamma rays from measured interaction positions and corresponding deposited energies. ML algorithms could be trained on the pattern of interaction points and energies with no assumptions about the underlying scattering processes. By focusing on differentiating events that are completely absorbed versus those that are partially absorbed, significant improvements are anticipated in the determination of the peak-to-total ratio, Doppler correction, angular distributions, and linear polarizations of events in GRETA. Improving the determination of gamma-ray transport parameters and transfer functions will improve the position resolution of the detector, especially for lower-energy interaction points. Among other more established approaches, the use of GANs [3] for the discovery of these transfer functions is an attractive avenue of investigation. These techniques will be applicable to other detector systems as well.
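A minimal version of the fully absorbed versus partially absorbed discrimination discussed above might look like the sketch below: a gradient-boosted classifier trained on simple per-event summaries of the interaction points. The features, labels, and "truth" rule are hypothetical stand-ins for simulated training data, not GRETA analysis code.

```python
# Illustrative classifier for fully absorbed vs. partially absorbed gamma-ray
# events, using simple per-event summaries of interaction points.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(2)
n = 5000
n_interactions = rng.integers(1, 6, size=n)
total_energy = rng.uniform(0.2, 2.5, size=n)      # MeV, toy values
first_fraction = rng.uniform(0.1, 1.0, size=n)    # energy fraction of the first hit
# Toy rule standing in for simulated truth: partial absorption more likely when
# the first interaction carries most of the energy but few interactions follow.
partially_absorbed = ((first_fraction > 0.7) & (n_interactions < 3)).astype(int)

X = np.column_stack([n_interactions, total_energy, first_fraction])
clf = GradientBoostingClassifier(random_state=0)
print(cross_val_score(clf, X, partially_absorbed, cv=5).mean())
```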
3. Advances in the Next Decade

The growth of AI techniques and the familiarization of nuclear physicists with those
techniques are anticipated to result in substantial advances in the next decade, which is particularly important given the planned increase in data volume and fidelity resulting from new experiments and facilities. In particular, the following advances are anticipated.

Extracting physics from simulations and other large-scale inverse problems. The coupling of higher-fidelity simulations that leverage HPC environments with the ability to conduct an ever-increasing number of simulations provides a great opportunity to leverage AI to infer physics, manage and plan simulations, and tackle many other large-scale inverse problems, including 3D tomography, which relates to precision medicine (see Chapter 10, AI Foundations and Open Problems).

Data analysis. The AI methods being leveraged for data analysis, in both online and offline scenarios, will continue to advance, with online AI activities pushed closer to the sensor edge. Advances in particle tracking, particle identification, data fusion, and background reduction, as well as in methods such as using shallow neural networks for curve fitting and other data analysis tasks, will continue (see Chapter 10, AI Foundations and Open Problems).

Data management. Similar to data analysis methods, the methods used to provide metadata, facilitate data discovery and retrieval, and enable cross-experiment analyses will evolve thanks to AI methods that can reduce the now human-intensive task of curating data (see Chapter 12, Data Life Cycle and Infrastructure).

Facility operation. Experimental facilities are major capital investments. Operating these facilities with minimal downtime and maximal user value provides the best return on investment and scientific outcomes. Improvements in beam diagnostics and control and in beam-line planning will save human effort and produce better stewardship of the major investments (see Chapter 14, AI for Imaging).

Experimental design. More capable, AI-driven computing at the sensor edge will enable higher-precision instruments to be developed and fielded at NP experiments. These advances may result in near-real-time tuning of detector parameters and better data acquisition decisions (see Chapter 15, AI at the Edge).

4. Accelerating Development

As outlined above, the sheer volume and complexity of nuclear physics data are increasing at a rapid pace. These increases are occurring across the enterprise of nuclear physics, from nuclear theory to experiment, and to the operation of facilities and the collection of data in support of nuclear science applications. Inference from these increasingly complex sources, and therefore physical understanding, is constrained even now by physicists’ ability to examine, analyze, and interrogate data. The effective continued adoption of AI techniques into the nuclear physics workflow depends most critically on several factors:

• The development of AI/ML/DL techniques that are scalable from modest or scarce data volumes to data volumes that can be exponentially larger (see Chapter 10, AI Foundations and Open Problems).
• AI approaches for anomaly detection and decision support that can be used in operating environments where expensive resources (e.g., accelerator beamlines and leadership-class supercomputers) are being used (see Chapter 15, AI at the Edge).
• The creation of new data analysis techniques for analyzing and interpreting the large multidimensional data sets produced by heterogeneous sensor networks, and methods of performing online sensor and sensor network reconfiguration to optimize performance. Two techniques of particular interest are the use of unsupervised learning
methods for the discovery of multi-dimensional patterns and the development of underlying models, as well as online learning techniques that are able to use streaming data to adapt to changing conditions across a network in real time (see Chapter 10, AI Foundations and Open Problems, and Chapter 15, AI at the Edge).
• AI techniques that can optimize the design of complex, larger-scale experiments could completely revolutionize the way experimental nuclear physics is done (see Chapter 10, AI Foundations and Open Problems).
• AI techniques that can facilitate the collection and analysis of metadata, supporting data reduction tasks, better documentation of experimental conditions, nuclear data evaluation, and the ‘interoperability’ of data resulting from complex experiments (see Chapter 12, Data Life Cycle and Infrastructure).

5. Expected Outcomes

• One of the fundamental goals of nuclear physics is to understand how interactions between quarks and gluons ultimately manifest in the structure and binding of nucleons and nuclei. Approximate symmetries found in nuclear physics are thought to have origins not only in the underlying interaction but also in the complicated many-body physics of the problems. AI has the potential to aid human understanding of these complex systems through improved methods that discern the origins of these symmetries and the emergent behavior that is often observed.
• Applications of AI in nuclear physics will produce a paradigm shift in the way information is gathered, stored, analyzed, and interpreted from the large amount of data obtained from scattering and decay experiments. With the aid of AI, experiments that require years of analysis will see decisions on optimization and results in near real time. The accessibility of the data to the wider nuclear physics community would create a connectivity across experiments not seen before. This connectivity will become the standard rather than the exception in understanding nuclear phenomena from the laboratory to the universe.
• In a practical sense, radioactive and stable isotopes are critical to several societal needs. They are essential for energy exploration and innovation, medical applications, national security, and basic research. The utilization of AI to optimize the choice of reactor parameters, exposure time, and sample composition has the potential to significantly increase the reliable and cost-effective production of isotopes, thereby addressing national needs in these areas.

6. References

1. Lee, I. Y., Gamma-ray tracking detectors. Nucl. Instrum. Meth. A 422, 1-3 (1999), 195-200.
2. Deleplanque, M. A., et al., GRETA: utilizing new concepts in gamma-ray detection. Nucl. Instrum. Meth. A 430, 2-3 (1999), 292-310.
3. Goodfellow, I., et al., Generative Adversarial Nets. Adv. Neural Inf. Process. Syst. 27 (2014), 2672–2680.
4. Gao, Y., Chen, J., Robertazzi, T., and Brown, K. A., Reinforcement learning based schemes to manage client activities in large distributed control systems. Phys. Rev. Accel. Beams 22, 014601 (2019).
5. Negoita, G. A., et al., Deep learning: Extrapolation tool for ab initio nuclear theory. Phys. Rev. C 99 (2019).
6. van der Maaten, L., and Hinton, G., Visualizing data using t-SNE. J. Mach. Learn. Res. 9 (2008), 2579-2605.
7. Dugger, M., et al., A study of decays to strange final states with GlueX in Hall D using components of the BaBar DIRC, arXiv:1408.0215 [physics.ins-det].
8. Lai, Y. S., arXiv:1810.00835.
9. Ferrario, P., et al., Demonstration of the event identification capabilities of the NEXT-White detector, arXiv:1905.13141 [physics.ins-det], accepted to JHEP (2019).
10. Solopova, A. D., et al., SRF Cavity Fault Classification Using Machine Learning at CEBAF. Proc. 10th Int. Particle Accelerator Conf. (IPAC’19), Melbourne, Australia, May 2019, pp. 1167-1170.
11. Carpenter, A., et al., Initial Implementation of a Machine Learning System for SRF Cavity Fault Classification at CEBAF. 17th Int. Conf. on Accelerator and Large Experimental Physics Control Systems (ICALEPCS’19), New York, NY, USA, Oct. 2019, paper WEPHA025.

06. Fusion
The pursuit of fusion energy has required extensive experimental and theoretical science activities to develop the knowledge that will be needed to enable the design of successful fusion power plants. Even today, following decades of research in many key areas, including plasma physics and materials science, much remains to be learned about the optimization of the tokamak—an experimental magnetic confinement machine that has the potential to produce controlled thermonuclear fusion power—or other paths toward fusion energy.

Data science methods from the fields of ML and AI offer opportunities for enabling or accelerating progress toward the realization of fusion energy by maximizing the amount and usefulness of information extracted from experimental and simulation output data (see Chapter 10, AI Foundations and Open Problems). While data-driven methods have long been used for specific roles in fusion research, such as real-time prediction of disruption risk in tokamaks [7], there is significant potential for impactful application to other areas, such as hypothesis generation and testing, optimization and acceleration of scientific workflows (Figure 6.1), boosting of experimental diagnostic data interpretability, model extraction and reduction, augmentation of plasma control effectiveness, and data-enhanced event and state prediction algorithms. The DOE recently sponsored workshops and assessments to determine optimal approaches and priority research opportunities for advancing fusion with ML and AI [2,8,18]. The grand challenges, anticipated advances, and expected outcomes discussed herein are abstracted from these and related assessment efforts.

Figure 6.1 Scientific discovery with ML includes approaches to bridging gaps in theoretical understanding through the identification of missing effects using large datasets, accelerating hypothesis generation and testing, and optimizing experimental planning to help speed progress in gaining new knowledge.

1. State of the Art

ML models trained on large datasets have been employed in fusion energy research since the early 1990s. The most extensive applications have been to predict disruptions (catastrophic, sudden losses of plasma confinement due to growing instabilities) in tokamak magnetic confinement devices. Tokamaks have confined plasmas with temperatures in excess of 150 million degrees Celsius for many seconds and are the present leading candidate for a fusion power plant. There are a multitude of examples of ML applications for disruption prediction, such as employing a neural network to predict high-beta disruptions in real time from many axisymmetric-only input signals [21]; producing a multi-machine-applicable disruption predictor for the Joint European Torus (JET) [20,17] and the Axially Symmetric Divertor Experiment Upgrade (ASDEX-UG) [4,15,16]; demonstrating the use of time series data and explicit look-ahead time windows for disruption predictability in Alcator C-Mod [5], DIII-D [11], and the Experimental Advanced Superconducting Tokamak (EAST) [13] (Figure 6.2); and demonstrating the use of extensive profile measurements in multi-machine disruption prediction for the JET and DIII-D tokamaks with convolutional and recurrent neural networks [12]. Even with the growing use of ML methods
in fusion energy science applications, very little attention has been given to uncertainty quantification. Due to the inherent statistical nature of ML algorithms, comparing model predictions to data is nontrivial, since uncertainty must be considered [19]. The predictive capabilities of an ML model are assessed using the model response as well as its uncertainty, and each aspect is critical to the combined effectiveness of real-time and offline applications.

Figure 6.2 The left two plots compare the performance of machine-specific disruption predictors on three different tokamaks (EAST, DIII-D, C-Mod). The rightmost plot shows the output of a real-time predictor installed in the DIII-D plasma control system, demonstrating an effective warning time of several hundred milliseconds before disruption [15].
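To indicate the general shape of the recurrent disruption predictors cited above, the sketch below trains an LSTM on windows of diagnostic time series and outputs the probability that a disruption occurs within a look-ahead window. The signals, labels, and dimensions are placeholders, not data or models from any particular tokamak.

```python
# Illustrative recurrent disruption-predictor sketch: an LSTM consumes a window
# of (synthetic) diagnostic signals and emits a disruption-risk logit.
import torch
import torch.nn as nn

class DisruptionLSTM(nn.Module):
    def __init__(self, n_signals=8, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(n_signals, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):                 # x: (batch, time_steps, n_signals)
        out, _ = self.lstm(x)
        return self.head(out[:, -1])      # logit for "disruption within look-ahead"

model = DisruptionLSTM()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.BCEWithLogitsLoss()

signals = torch.randn(128, 50, 8)                          # fake diagnostic windows
labels = (signals[:, -10:, 0].mean(dim=1) > 0.3).float()   # fake disruption labels
for _ in range(100):
    opt.zero_grad()
    loss = loss_fn(model(signals).squeeze(1), labels)
    loss.backward()
    opt.step()
```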
In addition to the rapid growth in tokamak disruption predictors, in recent years applications of ML and statistical inference to fusion research have expanded to include model reduction for code acceleration [14], plasma control [6], and physics discovery [3,10].

2. Major (Grand) Challenges

The principal challenge in fusion energy research for the coming decades is to determine the key solutions that would establish the viability of a fusion power plant. The work on components of this overarching challenge is expected to grow, developing in perhaps unanticipated directions with the arrival of new burning plasma experiments such as ITER [1]. A recent joint Fusion Energy Sciences (FES)/Advanced Scientific Computing Research (ASCR)-sponsored workshop [2] identified a set of seven priority research opportunities for the application of ML to accelerate this process. These priorities were used to formulate the four Grand Challenges in this area.

Maximize predictive understanding of fusion plasmas and the burning plasma state. A central challenge for the advancement of fusion science toward the realization of fusion energy is the achievement of a sufficiently predictive understanding of confined plasmas and, in particular, the burning plasma state. While both computational/theoretical and experimental studies have produced substantial understanding of fundamental plasma phenomena, significant progress is needed to enable high-confidence design of operational power plants. For example, further understanding of energetic particle behavior in tokamak burning plasmas is needed to enable calculation of power plant performance and first wall impacts. Divertor function in self-heated tokamak plasmas must be projected to enable design of waste heat and exhaust handling solutions in a power plant. Much of this predictive understanding may still be undiscovered in data collected from fusion experiments and produced by simulations over the last ~50 years. Maximizing predictive understanding from data, both already available and produced in the future, will be significantly aided by the design and application of specialized ML methods.

This challenge can be addressed further through the development of specialized infrastructure, for which requirements are
tightly coupled to the unique nature of fusion experimental and computational resources. For example, neither experimental nor simulation data produced today are typically archived or made accessible in ways appropriate for large-scale application of ML methods. The Fusion Data Machine Learning Platform [2] is envisioned as a novel system for managing, formatting, and curating fusion experimental and simulation data, with the goal of dramatically improving the usability of data for ML algorithms. Such a platform is needed to enable unified management of both experimental and simulation workflows for ML, by supporting sufficiently rapid access to data from multiple experimental and computational sources (Figure 6.3). Fusion-specialized tools will be needed to enable efficient access to multi-machine and simulated data, either centralized or distributed, and to enable automated generation of fusion metadata for supervised learning.

Figure 6.3 Vision for a future Fusion Data Machine Learning Platform that connects tokamak experiments with an advanced storage and data streaming infrastructure that is immediately queryable and enables efficient processing by ML/AI algorithms.

Key goals in this area for the next 10 to 15 years include the deployment of an effective Fusion Data Machine Learning Platform, characterized by extensive integration into the U.S. and international fusion workflow, and the development of the relevant enabling algorithmic and computer science solutions specific to maximizing fusion plasma predictive understanding from plasma confinement experiments and simulations.

Enable real-time understanding in long-pulse tokamak experiments. The advent of long-pulse, burning plasma, large-scale international fusion experimental devices will drive unique needs to extract the maximum amount of information from increasingly large and rapid real-time streams of data (Figure 6.4). These long-pulse experimental devices will provide the first examples of the unique real-time data streaming and analysis requirements that will be posed by an operational fusion power plant.

Addressing this challenge will require interpreting and reducing fusion data at the source, as well as along the processing pipeline. The requirements for generating real-time understanding and the nature of long-pulse tokamak data streams are largely unique to fusion experiments and the burning plasma devices soon to be online. As such, they demand unique solutions and specific deployments of analysis systems. The effort will include integrating large numbers of fusion-specific data sources (multi-code, multi-machine, multi-diagnostic) to produce statistically supported interpretations, quantify uncertainties, and yield more understanding than the sum of the individual sources. In particular, enabling federated, multi-institution collaborations on very large scales will pose unique problems. AI and ML methods are expected to be instrumental in addressing this challenge by providing methods for managing the increased data scales and unique fusion data types, as well as fusion-specific tools for enhancing interpretability.

Key goals in this area for the next 10 to 15 years include the development of AI methods that will enable: a) in situ, in-memory analysis and reduction of extreme-scale simulation data as part of a federated, multi-institutional workflow, and b) ingestion into the new Fusion Data Machine Learning Platform and analysis of extreme-scale fusion experimental data
for real- or near-real-time collaborative experimental research.

Figure 6.4 The shot cycle in tokamak experiments includes many diagnostic data handling and analysis steps that could be enhanced or enabled by ML methods. These processes include interpretation of profile data, interpretation of fluctuation spectra, determination of particle and energy balances, and mapping of MHD stability throughout the discharge.

Develop models that bridge gaps in fusion plasma confinement and stability prediction. Fusion energy science is significantly challenged by existing gaps and uncertainties in the understanding of fusion-specific plasma physics, coupled with the increasing importance of simulations and analyses in closing these gaps. For example, while great strides have been made in modeling the plasma phenomena that contribute to energy and particle transport in a tokamak, sufficient predictability has not been achieved, and the yet-unseen burning plasma regime is expected to yield further new phenomena that must be represented in models. Sufficient predictability of crucial performance-limiting and potentially disruptive instabilities, such as tearing modes in tokamaks, must also be achieved to enable operational scenarios and control for a reliable power plant.

ML offers techniques that can combine theoretical and data-driven models in hybrid systems that better represent the underlying dynamics specific to such fusion plasma phenomena. This approach has already been used successfully in fusion research [3,10] and is expected to play an increasingly important role in managing uncertainties and knowledge gaps in the coming era of long-pulse burning plasma experiments.
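A minimal sketch of the hybrid theory/data idea is shown below: a small network is fit to noisy measurements while a penalty term softly enforces a known physical law. Here the "law" is a simple exponential-decay ODE standing in for whatever theory-based constraint is appropriate to a given fusion problem; the data and constraint are illustrative assumptions only.

```python
# Minimal hybrid physics/data-driven model sketch: fit a network to synthetic
# measurements while penalizing the residual of a known ODE, dy/dt = -y/tau.
import torch
import torch.nn as nn

tau = 2.0
t_data = torch.linspace(0, 5, 40).unsqueeze(1)
y_data = torch.exp(-t_data / tau) + 0.02 * torch.randn_like(t_data)   # noisy "measurements"

net = nn.Sequential(nn.Linear(1, 32), nn.Tanh(),
                    nn.Linear(32, 32), nn.Tanh(),
                    nn.Linear(32, 1))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

t_phys = torch.linspace(0, 5, 100).unsqueeze(1).requires_grad_(True)   # collocation points

for _ in range(3000):
    opt.zero_grad()
    data_loss = nn.functional.mse_loss(net(t_data), y_data)
    y_phys = net(t_phys)
    dy_dt = torch.autograd.grad(y_phys.sum(), t_phys, create_graph=True)[0]
    physics_loss = ((dy_dt + y_phys / tau) ** 2).mean()   # residual of the known ODE
    (data_loss + physics_loss).backward()
    opt.step()
```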
Key goals in this area for the next 10 to 15 years include the development of interpretable ML methods and model extraction and reduction techniques that will help guide future experimental campaigns and help close gaps in the understanding of the physics. Hybrid or other ML-informed models will be developed to enable sufficient predictability, with quantified uncertainties, for fusion plasma confinement, instabilities, plasma-wall interaction, and other critical physics areas.

Establish the plasma prediction and control solutions for sustained fusion power plant operation. A viable tokamak-based fusion power plant must have high-reliability, high-performance plasma control to ensure very low rates of operational interruption and system failure. Both the control physics and the control algorithm mathematics requirements for fusion plasma control are uniquely challenging due to their extreme nonlinearity, degree of multiphysics overlap, resource limitations,
reliability requirements, and range of bandwidths involved. A key requirement is therefore to use data-driven methods to contribute to control-level modeling, management and interpretation of real-time data for control, optimal trajectory determination, and real-time prediction to support continuous and asynchronous actions and to prevent faults (Figure 6.5).

Figure 6.5 The ITER Plasma Control System (PCS) Forecasting System will include functions to predict plasma evolution under planned control, plant system health and certain classes of impending faults, as well as real-time and projected plasma stability/controllability, including the likelihood of pre-disruptive and disruptive conditions. Many or all of these functions will be aided or enabled by the application of ML methods.

Use of data-driven methods in control modeling and algorithm design always poses a challenge to operational application, due to the difficulty of quantifying uncertainty and reliability of performance with such approaches. The challenges specific to fusion are particularly demanding of advances in scientific understanding, as well as in mathematical control theorems, due to the combination of multiphysics and the range of bandwidth and plant integration scales. These characteristics dramatically amplify the fundamental challenge of operating the most complex, control-intensive power plant ever envisioned by mankind reliably for months at a time, with extremely limited sensor and actuator resources (compared with present-day fusion devices).

Key goals in this area for the next 10 to 15 years include the identification of the areas of fusion plasma control research that will most significantly benefit from ML/AI-augmented control algorithms, including data-driven methods that enable the prediction of key plasma phenomena and plant system states, allowing critical real-time and offline health monitoring and fault prediction. Mathematical approaches must be developed for quantifying the uncertainty of the identified data-driven fusion plasma models and the reliability of the corresponding plasma control algorithms. Methods must be developed and qualified for extracting the required level of real-time control knowledge from limited diagnostics in a fusion power plant environment, while accomplishing the required level of control authority with limited actuators.
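One simple way to attach uncertainty estimates to a data-driven model of the kind discussed here is a deep ensemble, sketched below: several small networks are trained on bootstrap samples of the same data, and the spread of their predictions serves as an uncertainty estimate. The inputs and targets are synthetic placeholders, not actual control-level plasma quantities.

```python
# Deep-ensemble uncertainty sketch: train several small networks on bootstrap
# samples and report the mean and spread of their predictions.
import torch
import torch.nn as nn

def make_net():
    return nn.Sequential(nn.Linear(5, 32), nn.ReLU(), nn.Linear(32, 1))

X = torch.randn(1000, 5)                                    # fake model inputs
y = X[:, :2].sum(dim=1, keepdim=True) + 0.1 * torch.randn(1000, 1)

ensemble = [make_net() for _ in range(5)]
for net in ensemble:                        # each member sees a different bootstrap sample
    idx = torch.randint(0, 1000, (1000,))
    opt = torch.optim.Adam(net.parameters(), lr=1e-3)
    for _ in range(500):
        opt.zero_grad()
        loss = nn.functional.mse_loss(net(X[idx]), y[idx])
        loss.backward()
        opt.step()

x_new = torch.randn(1, 5)
preds = torch.stack([net(x_new) for net in ensemble])
print("prediction:", preds.mean().item(), "+/-", preds.std().item())
```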

Addressing these four grand challenges for the


application of statistical inference, AI, and ML
Figure 6.5 The ITER Plasma Control System (PCS) Forecast- methods to fusion research will contribute
ing System will include functions to predict plasma evolution significantly to accelerating the development of
under planned control, plant system health and certain classes
of impending faults, as well as real-time and projected plasma
solutions to many key problems on the path to
stability/controllability, including likelihood of pre-disruptive and fusion energy.
disruptive conditions. Many or all of these functions will be
aided or enabled by application of ML methods.
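To make the forecasting functions sketched in Figure 6.5 concrete, the fragment below trains a classifier that maps a sliding window of plasma diagnostics to a probability that a disruption or other fault will follow within a short warning horizon. It is a minimal sketch only, not the ITER PCS design: the diagnostic channels, window length, labels, and alarm threshold are hypothetical placeholders, and published disruption predictors (e.g., refs. [12], [15], [16]) use far richer inputs and models.

import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)

# Hypothetical training set: each shot contributes fixed-length windows of
# diagnostic signals (e.g., locked-mode amplitude, density, radiated power),
# flattened into feature vectors and labeled 1 if a disruption followed
# within a short warning horizon. Real predictors use curated shot databases.
n_windows, window_len, n_signals = 2000, 32, 4
X = rng.normal(size=(n_windows, window_len * n_signals))
y = rng.integers(0, 2, size=n_windows)            # placeholder labels

model = GradientBoostingClassifier(n_estimators=200, max_depth=3)
model.fit(X, y)

# In real-time use, the most recent diagnostic window would be scored every
# control cycle and compared against an alarm threshold.
latest_window = rng.normal(size=(1, window_len * n_signals))
risk = model.predict_proba(latest_window)[0, 1]
if risk > 0.8:
    print(f"disruption risk {risk:.2f}: trigger avoidance or mitigation")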
Use of data-driven methods in control modeling and algorithm design always poses a challenge to operational application, due to the difficulty of quantifying uncertainty and reliability of performance with such approaches. The challenges specific to fusion are particularly demanding of advances in scientific understanding, as well as mathematical control theorems, due to the combination of multiphysics and range of bandwidth and plant integration scales. These characteristics dramatically amplify the fundamental challenge of operating the most complex, control-intensive power plant ever envisioned by mankind reliably for months at a time with extremely limited sensor and actuator resources (compared with present-day fusion devices).

Key goals in this area for the next 10 to 15 years include the identification of areas of fusion plasma control research that will most significantly benefit from ML/AI-augmented control algorithms, including data-driven methods that enable the prediction of key plasma phenomena and plant system states, allowing critical real-time and offline health monitoring and fault prediction. Mathematical approaches must be developed for quantifying the uncertainty of the data-driven fusion plasma models identified and the reliability of corresponding plasma control algorithms. Methods must be developed and qualified for extracting the required level of real-time control knowledge from limited diagnostics in a fusion power plant environment, while accomplishing the required level of control authority from limited actuators.

Addressing these four grand challenges for the application of statistical inference, AI, and ML methods to fusion research will contribute significantly to accelerating the development of solutions to many key problems on the path to fusion energy.

3. Advances in the Next Decade

Presently operating fusion experimental facilities will make significant advances in diagnostics, actuators, and accessible regimes in the coming decade, which will have equally significant impact on the data available for AI/ML applications (see Chapter 10, AI Foundations and Open Problems).

The advent of exascale high performance computing resources will provide a revolution in processing capabilities, enabling a similar leap forward in the effectiveness of large-scale data-driven algorithms (see Chapter 16, Facilities Integration and AI Ecosystem).

The most significant impact on the application of AI/ML methods to fusion problems in the coming decade is expected to be the availability of data from several key experimental facilities. ITER, the world’s first
burning plasma experiment, will provide unique opportunities to study self-heated plasmas on a size and power scale relevant to a fusion power plant. JT-60SA [9], the largest long pulse superconducting tokamak in the world (until ITER operates), will explore advanced tokamak regimes not accessible by ITER. Data from these devices will provide extensive, novel groundwork for application of AI/ML techniques that maximize the information and understanding extracted. The amount and quality of these data will help better validate key components of plasma physics codes and reveal gaps in the understanding of the physics behind the models, thus suggesting improvements to the implementation of codes as well as the theory.

The deployment of a Fusion Data Machine Learning Platform could in itself prove a transformational advance, dramatically increasing the ability of the fusion science, mathematics, and computer science communities to combine their areas of expertise in accelerating the solution of fusion energy problems.

4. Accelerating Development

The introduction of ML and AI into the scientific process for hypothesis generation and the design of experiments promises to significantly accelerate the scientific process by automating and accelerating the development of models and the testing of hypotheses (see Chapter 10, AI Foundations and Open Problems).

Perhaps the biggest obstacle in applying data science to hypothesis generation and experimental design is the availability of data and its lack of uniformity. In fusion, experimental data is limited by available diagnostics, experiments that cannot be reproduced at a sufficient frequency, and a lack of infrastructure and policies to easily share data. Furthermore, even with access to the existing data, there is still the obstacle that these data have not been properly curated for easy use by others. The Fusion Data Machine Learning Platform is envisioned as a step toward solving these problems (see Chapter 12, Data Life Cycle and Infrastructure).

Despite these gaps, we believe a research direction with the potentially highest payoff may be the integration of our knowledge of physics into ML models. Most existing AI/ML models are either purely data-driven or incorporate very simple physical laws and constraints. Without building the structure of physical laws into ML methods, it is difficult to interpret the predictions from data-driven models.
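As one simple illustration of building physical structure into an ML method, the sketch below adds a physics-based penalty to an otherwise purely data-driven training loss, so that predictions violating a known constraint are discouraged. The network, data, and "physics penalty" here are placeholders (a simple bound rather than a real transport or conservation law), intended only to show the pattern of a hybrid data-plus-physics objective.

import torch
import torch.nn as nn

torch.manual_seed(0)
net = nn.Sequential(nn.Linear(3, 64), nn.Tanh(), nn.Linear(64, 1))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

# Hypothetical training data: inputs x (e.g., profile and control parameters)
# and measured outputs y (e.g., a confinement-related quantity).
x = torch.rand(256, 3)
y = torch.sin(x.sum(dim=1, keepdim=True))          # placeholder measurements

def physics_penalty(pred):
    # Placeholder constraint: penalize predictions above a known bound.
    # A real application would penalize the residual of a transport or
    # conservation equation instead.
    return torch.relu(pred - 1.0).pow(2).mean()

for step in range(2000):
    opt.zero_grad()
    pred = net(x)
    loss = nn.functional.mse_loss(pred, y) + 10.0 * physics_penalty(pred)
    loss.backward()
    opt.step()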
5. Expected Outcomes

Application of AI/ML methods to fusion energy research will accelerate progress toward realization of a commercial fusion power plant. It is very possible that the new capabilities offered will actually enable practical solution of problems not otherwise tractable even on a timescale of decades without use of data-driven methods.

Fusion energy offers an essentially infinite energy source with minimal environmental impacts, and high power density compatible with siting near high-demand population centers. Large-scale deployment of economically viable fusion power plants on worldwide grids has the potential to minimize the impacts of climate change and address the energy demands of the coming centuries. Fusion remains the only known energy option with virtually unlimited scalability to match growth in demand.

The long-term impacts of solving the relevant scientific challenges and achieving routine and widespread deployment of fusion power plants are potentially transformational, for society as a whole, and for the enterprise of science in particular.
6. References

1. Gribov, Y., et al., “ITER Physics Basis,” Nuclear Fusion, 47 (2007).
2. Report of the Workshop on Advancing Fusion with Machine Learning, April 30 – May 2, 2019. https://science.osti.gov/-/media/fes/pdf/workshop-reports/FES_ASCR_Machine_Learning_Report.pdf
3. Baltz, E. A., et al., “Achievement of Sustained Net Plasma Heating in a Fusion Experiment with the Optometrist Algorithm,” Nature Scientific Reports, 7 (2017). doi:10.1038/s41598-017-06645-7
4. Bock, A., et al., “Advanced Tokamak Investigations in Full-Tungsten ASDEX Upgrade,” Physics of Plasmas, 25 (2018).
5. Bonoli, P. T., et al., “Lower Hybrid Current Drive Experiments on Alcator C-Mod: Comparison with Theory and Simulation,” Physics of Plasmas, 15 (2008).
6. Boyer, M. D., Kaye, S., Erickson, K., “Real-Time Capable Modeling of Neutral Beam Injection on NSTX-U Using Neural Networks,” Nuclear Fusion, 59 (2019).
7. Cannas, B., Cau, F., Fanni, A., Sonato, P., Zedda, M. K., and JET-EFDA Contributors, “Automatic Disruption Classification at JET: Comparison of Different Pattern Recognition Techniques,” Nuclear Fusion, 46 (2006).
8. Maingi, R., et al., “Summary of the FESAC Transformative Enabling Capabilities Panel Report,” Fusion Science and Technology, 75 (2019).
9. Giruzzi, G., et al., “Physics and Operation Oriented Activities in Preparation of the JT-60SA Tokamak Exploitation,” Nuclear Fusion, 57 (2017).
10. Gopalaswamy, V., et al., “Tripled Yield in Direct-Drive Laser Fusion through Statistical Modelling,” Nature, 565 (2019).
11. Hill, D. N., et al., “DIII-D Research Towards Resolving Key Issues for ITER and Steady State Tokamaks,” Nuclear Fusion, 53 (2013).
12. Kates-Harbeck, J., Svyatkovskiy, A., Tang, W., “Predicting Disruptive Instabilities in Controlled Fusion Plasmas Through Deep Learning,” Nature, 568 (2019).
13. Li, J., et al., “A Long-Pulse High Confinement Plasma Regime in the Experimental Advanced Superconducting Tokamak,” Nature Physics, 9 (2013).
14. Meneghini, O., et al., “Self-Consistent Core-Pedestal Transport Simulations with Neural Network Accelerated Models,” Nuclear Fusion, 57 (2017).
15. Montes, K. J., et al., “Machine Learning for Disruption Warning on Alcator C-Mod, DIII-D, and EAST,” Nuclear Fusion, 59 (2019).
16. Rea, C., et al., “Disruption Prediction Investigations Using Machine Learning Tools on DIII-D and Alcator C-Mod,” Plasma Physics and Controlled Fusion, 60 (2018).
17. Rebut, P.-H., “The Joint European Torus (JET),” European Physical Journal, 43 (2018).
18. Baker, N., et al., Workshop Report on Basic Research Needs for Scientific Machine Learning: Core Technologies for Artificial Intelligence. doi:10.2172/1478744 (2019).
19. Smith, R. C., “Uncertainty Quantification: Theory, Implementation, and Applications,” SIAM, Philadelphia (2014).
20. Windsor, C. G., Pautasso, G., Tichmann, C., Buttery, R. J., Hender, T. C., JET EFDA Contributors and the ASDEX-UG team, “A Cross-Tokamak Neural Network Disruption Predictor for the JET and ASDEX Upgrade Tokamaks,” Nuclear Fusion, 45 (2005).
21. Wroblewski, D., Jahns, G. L., Leuer, J. A., “Tokamak Disruption Alarm Based on a Neural Network Model of the High-Beta Limit,” Nuclear Fusion, 37 (1997).
07. Engineering and Manufacturing
Over the last decade, advances in technologies, such as sensors, networks, and control systems, along with the rise of data analytics and artificial intelligence (AI) approaches, such as machine learning (ML), have led to increasing discussion of holistic approaches to manufacturing and engineering (see Chapter 15, AI at the Edge). Terms such as “smart manufacturing,” “the Internet of things,” and “digital twins” are used to refer to these types of transformational approaches, with the concept of optimization expanding to include an entire lifespan, from raw materials to shape/topology to manufacturing process to end use.

The future of manufacturing hinges on the ability to bring new ideas and custom products to market faster than ever before while reducing cost, energy use, and waste products. A major effort is under way to use distributed manufacturing and products designed for a circular economy to shrink the supply chain to the benefit of local communities. Obstacles include: disruptions in the supply chain due to natural disasters; changing economic costs (tariffs, transportation costs, etc.) or new regulations; inability to optimally utilize differing raw materials; appropriate data collection; weakness in altering processes in real time; and cybersecurity threats, among others. The goal is to overcome these obstacles in an optimal way to the benefit of the manufacturer, consumer, and environment.

1. State of the Art

AI has yet to have a major impact in manufacturing and engineering, but in the handful of examples provided here one can easily see the potential it has to change industry. To date, some of the initial forays into using AI have focused on smart manufacturing (improving efficiency and reducing waste), generative design, and autonomous robotic assembly.

Manufacturers of smaller batches, and ones who produce many different variants of similar designs for consumers who want a customized product, need robots on the assembly line to perform tasks autonomously rather than automatically. Typical automation is not profitable at this level, and this is often referred to as the “Batch Size 1” or “Order of One” problem. What is meant by “autonomous” is that the robots are not reprogrammed step-by-step to complete the new assembly; rather, they independently learn how to optimally assemble one variant or another. Basically, the robots are provided the fundamentals to learn how to assemble on their own.

Siemens Corporate Technology has managed to solve this problem for some simple assemblies [1]. They have done this by semantically converting the parts and process information into ontologies and knowledge graphs, thereby making implicit information explicit. Previously, the robots had to be taught through code, but now the robots analyze the CAD drawings and find the corresponding solution to assembly (Figure 7.1). An added benefit is that the robots are also able to correct some faults without having this option explicitly instructed beforehand. If a part slips and falls or is needed on the other side of the assembly, one robotic arm can stop and pick it up or pass it off to its partner and the assembly can continue on unimpeded.

Figure 7.1 This Siemens two-armed robot uses AI to interpret CAD instructions and assemble parts.

In Korea, LG CNS, the country’s largest IT service provider, works across a variety of industries using
its cloud-based smart factory service. This service helps manufacturers automate production and keep track of efficiency throughout the entire process. Collecting data over large swaths of production history and placing it into easily accessible databases, manufacturers have used Microsoft’s Azure Machine Learning to predict defects before they happen [2]. While not perfect, this approach greatly minimizes costs due to delays on the production line and the subsequent waste these defects cause.

Generative design is one of the newest areas through which AI has had an impact on manufacturing. This is a two-step iterative process that, based on design goals, generates a number of possible outputs that meet the specified constraints. A designer then tunes variables in these outputs (over previously set minimal and maximal values) to reduce and optimize potential outputs that meet the aforementioned constraints. Generative adversarial networks are often used to drive the underlying optimal design. Airbus has employed this technique to improve the design of the partition that separates the passenger compartment from the galley in the Airbus A320 cabin (Figures 7.2 and 7.3). Their design goals focused on a reduction in weight with constraints on maximal width, strength to support two jump seats during takeoffs and landings, and number of airframe attachment points [3]. Note that the models used to assess designs are typically reduced-order surrogates, and the impact of their fidelity on the optimality of the final design is an open question.

2. Major (Grand) Challenges

Since additive manufacturing (AM) is in relatively early stages of development, it can simultaneously gain the greatest benefit from advanced simulation, data analytics, and AI approaches and offers the greatest flexibility and research resources for implementing those ideas. So, although the spectrum of engineering and manufacturing processes that can be impacted is much broader than AM, it will serve as our exemplar.

AM is revolutionizing manufacturing, allowing construction of complex parts not readily fabricated by traditional techniques. In addition, AM offers the possibility of constructing “designer materials” by adjusting process control variables to achieve spatially varying physical properties. AM is a unique application area due to its strategic importance to both U.S. industry and federal agencies (DOE, NNSA, DOD, NASA). Although there has been significant interest and investment in AM, the fraction of this investment devoted to modeling and simulation—not to mention data analytics, ML, and AI—is relatively small.

Modeling of the AM process allows both prediction of how the AM process variables (the “machine knobs”) impact the resulting material microstructure (the forward problem) and the ability to control the AM process to manufacture parts with desired properties (the optimization problem). Our grand challenges tackle some of the most pressing problems in these areas, and the ones with the most potential to accelerate manufacturing.

We would be remiss if we failed to reference reports from two workshops held by the National Academies of Sciences, Engineering, and Medicine. The first, held in 2016, was titled “Predictive Theoretical and Computational Approaches for Additive Manufacturing” [4]. The second, held in 2018, was titled “Data-Driven Modeling for Additive Manufacturing of Metals” [5] and is of particular relevance to the current topic.

Optimally solve the Batch Size 1 problem in additive manufacturing. The ability to quickly design a new product, optimally, without going through an expensive simulation (let alone trial and error), is the path to solving the “Batch Size 1” problem in AM. This can be carried out through the creation of a high-quality surrogate model.
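A minimal sketch of what such a surrogate might look like is shown below: a fast statistical model is trained on a modest set of (hypothetical) process-parameter/property pairs and then queried thousands of times to suggest promising settings, in place of running an expensive simulation for every candidate. The parameter names, ranges, and synthetic response are illustrative only.

import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)

# Hypothetical training set: (beam power [W], scan speed [mm/s], hatch
# spacing [mm]) -> porosity fraction from simulations or test builds.
low, high = [150, 500, 0.08], [400, 2000, 0.16]
X = rng.uniform(low, high, size=(200, 3))
y = 0.02 + 1e-5 * np.abs(X[:, 1] - 3.0 * X[:, 0])   # placeholder response

surrogate = RandomForestRegressor(n_estimators=300, random_state=0)
surrogate.fit(X, y)

# Cheap surrogate evaluations stand in for expensive simulations: scan many
# candidate settings and keep the one with the lowest predicted porosity.
candidates = rng.uniform(low, high, size=(10000, 3))
pred = surrogate.predict(candidates)
best = candidates[np.argmin(pred)]
print("suggested (power, speed, hatch):", np.round(best, 3),
      "predicted porosity:", round(float(pred.min()), 4))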


Figure 7.2 This generative designed partition for the Airbus A320, with its seemingly random construction, has been optimally designed to be both lightweight and strong.

Figure 7.3 Left: In 2010, an Airbus A380 sustained an uncontained engine rotor failure (UERF) of the No. 2 engine as it departed from Singapore while climbing through 7,000 ft. Debris from the UERF hit the aircraft, which led to significant structural damage. Bottom: The failure was caused by metal fatigue in an oil feed stub pipe due to slightly misaligned machining that left the pipe a little thinner on one side. [Australian Transport Safety Bureau Investigation #: A0-2010-089]

Surrogate models (which include reduced-order models; see Chapter 10, AI Foundations and Open Problems) can play at least three roles in AM: (1) a priori optimization; (2) in situ, real-time process control; and (3) transferability of AI models between different devices and/or feedstocks—heterogeneous manufacturing. The first role encompasses both design and process optimization. Design optimization is the outermost loop and includes both shape and topology and, implicitly, local control of microstructure and properties. Although there has been significant research over the last few years in shape and topology optimization, it typically relies on extremely simplistic physical models. The extent to which model fidelity impacts “optimal” design is unknown. Similarly, process optimization (selection of parameters such as beam diameter, beam power, preheat, and scan strategy) also relies on approximate models. Improved surrogate models based on physics-informed AI models would enable more extensive exploration of both design and parameter space, ultimately accelerating qualification of AM parts.

The second role would have even more impact, but is also significantly more difficult. It requires access to the AM control system, an extensive array of sensors, and the ability to process data from the sensors during a build, analyze it in real time, and determine whether and how to alter any process control parameters. Since this would have to happen in a matter of seconds (between layers of a build), the data processing and analysis requirements are significant, and accurate, fast-running surrogate models are essential (see Chapter 15, AI at the Edge).
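The between-layer control loop described above might be organized roughly as in the sketch below, in which a fast, pre-trained surrogate scores each layer’s sensor summary and a simple rule adjusts a process parameter before the next layer begins. The sensor features, surrogate, and adjustment rule are stand-ins; a real implementation would run against the machine’s control interface within the seconds available between layers.

import numpy as np

rng = np.random.default_rng(2)

def read_layer_sensors(layer):
    # Stand-in for melt-pool imaging / pyrometry features for one layer.
    return {"melt_pool_temp": 1900 + rng.normal(0, 40),
            "spatter_index": abs(rng.normal(0.1, 0.05))}

def surrogate_defect_risk(features):
    # Stand-in for a fast-running, pre-trained surrogate model.
    overheat = max(0.0, features["melt_pool_temp"] - 1950) / 100
    return min(1.0, 2.0 * features["spatter_index"] + 0.5 * overheat)

beam_power = 280.0                       # W, nominal setting
for layer in range(1, 51):
    feats = read_layer_sensors(layer)
    risk = surrogate_defect_risk(feats)
    if risk > 0.5:                       # adjust before the next layer starts
        beam_power *= 0.97
        print(f"layer {layer}: defect risk {risk:.2f}, "
              f"reducing beam power to {beam_power:.0f} W")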


An additional twist to this entire process would be the ability to transfer a surrogate model trained on one AM system to another of a different design, or equally from one feedstock to another on the same AM system. Heterogeneous manufacturing, where different systems build the same part to the same specifications, would open up the possibility of distributed manufacturing at an entirely new level. Here a CAD drawing for a given part, along with the system configuration and feedstock, would be used as inputs to the AI process to produce the optimal design for the system in question.

Couple material design with prediction and control of the additive manufacturing design process. Microstructures produced by AM processes are very different from those that arise from traditional manufacturing processes, such as casting and forging, and these differences can lead to extremely poor properties (strength, ductility, etc.) and unsatisfactory performance (likelihood of fatigue/cracking, lifetime, etc.). The good news is that the microstructures produced are strongly dependent on process parameters and conditions such as the beam power, scan speed, scan pattern, etc., and on geometry.

It should be noted that the microstructures produced by AM can be better or worse than those produced by traditional processes. Part of the difference is due to the unpredictability of response to process parameters. But another important factor is that most alloys were designed for traditional processes, and we cannot expect them to respond in the same way when manufactured using AM. Most of the alloys we use today were invented years or decades ago. The best high-performance aluminum alloy for pistons was invented after World War II and hasn’t been improved upon for more than 70 years! We have to relearn how to innovate in metallurgy and manufacturing to be successful. Coupling an understanding of the fundamental materials science with prediction and control of the process dynamics through AI will allow us to design new materials and process characteristics to achieve desirable microstructures and properties (see Chapter 1, Chemistry, Materials, and Nanoscience).

Along this line, a very important component of manufacturing is a full understanding of part tolerances and lifetimes. Folding this into the entire aforementioned design process is a major future goal. Parts are precisely designed to meet specific tolerances and are engineered to function perfectly over a specified lifetime. Failure to meet these specifications can often produce catastrophic consequences. For these reasons, many potential applications of AI in manufacturing will need to be explainable (or interpretable). Parts designed by AI for use in the automotive, aircraft, or medical device industry (among many others) will need to have a full cost accounting of how they met the design specifications and to what tolerances.

Securely aggregate data across the manufacturing industry. A key challenge in maximizing our knowledge of the manufacturing process, and doing so robustly, is the collection of vast quantities of data across many different systems. This is hindered by the fact that companies do not want to share this data for fear of losing their IP and subsequent competitive advantage in the field. A path toward solving this challenge can be found in federated learning. This is an ML technique where the goal is to train a high-quality, centralized model while the training data remains distributed over a large number of clients. For every iteration during the learning, each client independently computes an update to the current model based on its own data and then pushes this update to a central server, where it is aggregated to compute a new globally optimized model [6]. Through secure, federated learning, it is now possible in several AI applications to train models without exposing the underlying data, even when attacked by a variety of adversaries [7].
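The federated averaging scheme described above [6] can be summarized in a few lines: each client fits a local update on data that never leaves its site, and the server averages the resulting parameters into a new global model. The sketch below uses a linear model and synthetic per-client data for clarity, and it omits the secure-aggregation and robustness machinery [7] that production systems require.

import numpy as np

rng = np.random.default_rng(3)

def local_data(client):
    # Each manufacturer's private process data stays on site.
    X = rng.normal(size=(200, 5))
    w_true = np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + 0.1 * client
    return X, X @ w_true + rng.normal(0, 0.1, size=200)

def local_update(w_global, X, y, lr=0.05, epochs=20):
    w = w_global.copy()
    for _ in range(epochs):                    # local gradient steps
        w -= lr * 2 * X.T @ (X @ w - y) / len(y)
    return w

n_clients, w_global = 5, np.zeros(5)
for rnd in range(30):                          # federated rounds
    updates = []
    for c in range(n_clients):
        X, y = local_data(c)
        updates.append(local_update(w_global, X, y))
    w_global = np.mean(updates, axis=0)        # server-side aggregation

print("aggregated model weights:", np.round(w_global, 2))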


Develop data and design tools for manufacturing in a circular economy. By 2050, the world’s population will likely pass 10 billion. As the Earth’s raw materials are not limitless, and global labor and the costs for these materials are on the rise, new solutions are needed to mitigate this emerging challenge. Circular economy business opportunities are one way manufacturing can grow and diversify under these pressures. In a circular economy, materials, and the resultant products, keep circulating in a high-value state of use, through supply chains, for as long as possible. The key challenge of transforming the current manufacturing ecosystem is providing design tools to develop products that are easy to remanufacture, recycle, or capture critical materials for reuse. Many locally sourced and sustainable materials are hard to integrate in product design and prototyping. The linear economy depends on mines and materials scale-up facilities across the globe for dependable supply. The challenging aspect of manufacturing for a circular economy is the ability to identify and optimize a supply chain using massive amounts of customer data on products currently in use, and incentivizing consumers to engage in the supply chain. New models need to be developed for tagging products nearing end-of-life, and they need to use AI to optimize supply chain models to reduce fluctuations and disruptions.

Transition to smart engineering of products, services, and operations. Finally, engineering functions across the value chain need to evolve to integrate AI to create better products as well as the next wave of products. There are four major tasks here: (1) AI to reduce cost and accelerate time-to-market; (2) AI for optimization under uncertainty and constraints; (3) AI for real-time control and steering; and (4) AI for cradle-to-grave system state awareness.

The challenges here are large. From jet engines to consumer products, designers will need access to innovative tools and accessible computing services to partner with AI without a steep learning curve. The bootstrapping of industrial machines, with the ability to perform optimization under uncertainty and constraints, will require a range of new method development to respond to streaming data from machines and the fusion of complex datasets in real time. Here there is an additional need to provide robust safeguards such as physics constraints and strong cybersecurity. This can alter the trajectory of research in control systems for many domains where we need a cost-effective way to transform legacy machines into smart machines. Ultimately, from large machines to connected products in the hands of consumers, like smart phones, cradle-to-grave awareness of the system state will create a new engineering ecosystem.

The four cases described above have commonality in the need for the integration of datasets across different engineering functions in an ecosystem of smart machines and smart products. This will enable lower cost, energy-efficient operations, and, ultimately, a sustainable products and consumption economy with responsible use of resources.

3. Advances in the Next Decade

Building an integrated software environment for manufacturing data and AI will provide researchers with the ability to merge data, ML models, computer vision, simulations, and knowledge to accelerate the state of the art in manufacturing. Leadership from the national labs can improve the edge-to-exascale infrastructure needed to advance response time. For manufacturing, the ability to provide real-time quality control for products such as advanced batteries, complex parts, electronics, and sensors will be unique in the nation.

The future of manufacturing is closely linked to advances in intelligent cyber-physical systems for bringing new ideas and custom products to market faster than ever before. In addition, enabling creative but market-conscious design where one includes constraints based on cost, quality, lifetime, aesthetics, manufacturability,
recyclability, and supply-chain logistics will be critical for gaining an edge in manufacturing. Given the sheer complexity of this system, an AI-driven design process is a natural solution for this optimization problem.

The response time and data needs can skyrocket if decisions are not made locally and models for the manufacturing process are not trained in smart ways. The need for software-defined sensors and edge computing is paramount for making progress in improving AI models for a digital and custom manufacturing future. It is also likely that a new generation of low-cost imaging and metrology will need to be developed.

DOE recently launched the ReCell Center to capture critical materials such as cobalt from electric vehicle (EV) batteries. EV batteries that are 100% recyclable will be needed to meet the demand of a rapidly growing market, with few suppliers for critical materials for mobility. Application of similar concepts to consumer electronics can be transformational for design and reduction of e-waste and is estimated to unlock $90 billion in economic value by 2030 [8,9]. AI can be used to connect manufacturing processes to adjust to changes in supply, develop intelligent process optimization to increase efficiency, and allow continuous improvement of products for a circular economy.

4. Accelerating Development

The main bottleneck in a competitive labor market is developing a fully functional model for integrating AI across the design, engineering, supply-chain management, resource planning, and manufacturing sectors. A good way to accelerate the transition is to allow development and testing of applications in a secured environment. The ideal framework will be to build pre-competitive tools and benchmarks for manufacturing with strong adherence to standards and safekeeping of proprietary data.

More research in secure, federated learning will be needed not only to accelerate AI’s adoption by more users, but also to establish which AI methods can remain secure. A public-private partnership supporting the creation of such a hub for training and testing accessible AI tools will increase industry’s access to the expertise in the national labs and academia. This will allow industries in the U.S., struggling to take advantage of AI for improving business efficiency for engineering services and manufacturing, to leverage these new capabilities and know that their data is secure. Another major challenge for applying AI to manufacturing is that the data is both noisy and expensive to collect. We have to efficiently design experiments to collect the most valuable data, design high-quality characterization and sensing modalities to get clean, pedigreed data, and then properly label that data. In addition, there is significant need for curated, publicly accessible datasets—both experimental and simulation. Efforts such as the NIST Additive Manufacturing Benchmark Test Series (AM-Bench) are a huge step in the right direction, but these efforts need more widespread support with an additional focus on community data formats.

Finally, AM systems themselves need to be more open. APIs for the control systems should be available to researchers to explore more advanced, real-time analysis, feedback, and process control, all of which is necessary before AI can be effectively deployed on these systems.

What must we do to accelerate development?

a) Automate the entire learning pipeline: The goal is for the machines to be intelligent and learn the production process with full awareness of the intent of the designer and certification/qualification goals (see Chapter 9, AI for Computer Science).


b) Determine the best AI techniques for developing surrogate models for the manufacturing process.

c) Combine sensor modalities and incorporate data from a fleet of machines instead of a single machine, as the data from a single machine is prone to variability and often provides poor statistics for creating an intelligent learning environment (see Chapter 10, AI Foundations and Open Problems).

d) Develop an open framework with standardized data formats and plug-and-play module capabilities, but designed with protection through secure, federated learning to make use of exact design specifications while maintaining proprietary knowledge.

e) Provide access to curated datasets so that we can accelerate reinforcement learning across the manufacturing industry.

f) Incorporate physics-informed AI to reduce simulation and data requirements in training (see Chapter 10, AI Foundations and Open Problems).

What are the top priorities?

• Couple current mod-sim efforts in manufacturing with a variety of AI-generated surrogate models to determine which are the most robust and least biased.

• Determine which AI techniques are amenable to secure, federated learning.

• Design experiments to determine which data is needed, and of what quality, in a few exemplar manufacturing processes to improve design optimization through AI.

How do we improve scale?

Several of the potential methods one might employ for the creation of surrogate models, for both design and optimization, will require a major scale-up effort. These simulations are already pushing toward the exascale level. Running hundreds or thousands of higher fidelity physics models to develop the appropriate training data, understand bounds, etc., will be a major challenge. Efforts to reduce this cost, or enable the transfer of trained models from one system setup to another, will likely be critical to the success of this effort.
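One common way to transfer a trained model between system setups is sketched below: a surrogate network is trained on abundant data from one machine, its shared feature layers are frozen, and only the final layer is fine-tuned on a small dataset from the new machine. The architecture and synthetic response surfaces are placeholders, meant only to show the warm-start pattern rather than a validated workflow.

import torch
import torch.nn as nn

torch.manual_seed(0)
net = nn.Sequential(nn.Linear(3, 64), nn.ReLU(),
                    nn.Linear(64, 64), nn.ReLU(),
                    nn.Linear(64, 1))

def train(model, X, y, steps):
    opt = torch.optim.Adam([p for p in model.parameters() if p.requires_grad],
                           lr=1e-3)
    for _ in range(steps):
        opt.zero_grad()
        loss = nn.functional.mse_loss(model(X), y)
        loss.backward()
        opt.step()

# 1) Train on abundant data from system A (placeholder response surface).
XA = torch.rand(2000, 3)
yA = (XA ** 2).sum(dim=1, keepdim=True)
train(net, XA, yA, steps=2000)

# 2) Freeze the shared feature layers, then fine-tune only the final layer
#    on a small dataset from system B, whose response is shifted slightly.
for p in list(net.parameters())[:-2]:
    p.requires_grad = False
XB = torch.rand(50, 3)
yB = (XB ** 2).sum(dim=1, keepdim=True) + 0.3
train(net, XB, yB, steps=500)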
Equally important will be the confrontation of these models with data, which is currently limited in nature, quality, and size, to constrain parameter space and perform effective optimization. Major efforts need to be made in increasing data collection, determining which data needs to be collected, and improving the quality of the data, as well as designing the data formats to be optimized for AI training.

5. Expected Outcomes

The domains of engineering and manufacturing span a large portion of the U.S. Gross Domestic Product (GDP) and investment in R&D. While U.S. consumption is high, much of the manufacturing is currently done outside the U.S. The supremacy of products and the ability to compete in a global marketplace can be accelerated through technological leadership and dominance in AI in manufacturing. Use of AI will become the primary way in which future workforces can participate in a distributed manufacturing ecosystem where design, supply-chain management, prototyping, and production will be managed by people with diverse skill sets, distributed geographically, but connected by a digital manufacturing backbone with strong integration of training data, accessible knowledge, and AI-enabled tools. This will allow entrepreneurs and small businesses to successfully compete on the world stage.

6. References

1. Zistl, S., “The Future of Manufacturing: Prototype Robot Solves Problems without Programming,” Siemens.com Global Website.


2. Microsoft UK Enterprise Team, “Better, faster, more efficient: AI meets manufacturing,” Microsoft Industry Blog – United Kingdom (6 June 2018).
3. “Airbus: Reimagining the future of air travel,” Autodesk Website.
4. U.S. National Committee on Theoretical and Applied Mechanics, Board on International Scientific Organizations, Policy and Global Affairs, and National Academies of Sciences, Engineering, and Medicine, Predictive Theoretical and Computational Approaches for Additive Manufacturing: Proceedings of a Workshop. Washington, D.C.: National Academies Press, 2016. DOI: 10.17226/23646.
5. Board on Mathematical Sciences and Analytics, National Materials and Manufacturing Board, Division on Engineering and Physical Sciences, and National Academies of Sciences, Engineering, and Medicine, Data-Driven Modeling for Additive Manufacturing of Metals: Proceedings of a Workshop. Washington, D.C.: National Academies Press, 2019. DOI: 10.17226/25481.
6. Bonawitz, K., et al., Practical Secure Aggregation for Privacy-Preserving Machine Learning. Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, 1175-1191. Oct 30-Nov 3, Dallas, TX, 2017.
7. Kasiviswanathan, S. P., et al., What Can We Learn Privately? The 49th Annual IEEE Symposium on Foundations of Computer Science, 531-540. Oct. 25-28, Philadelphia, PA (2008).
8. Balde, C. P., et al., The Global E-waste Monitor 2017: Quantities, Flows, and Resources (Bonn, Geneva, and Vienna: United Nations University, International Telecommunication Union, and International Solid Waste Association, 2017).
9. Ellen MacArthur Foundation, Circular Consumer Electronics: An Initial Exploration, 2018.


08. Smart Energy Infrastructure
Cities, local governments, and communities are trying to manage growth and build for resilience, while much of America’s aging energy infrastructure—the electrical grid and gas pipelines, as well as many buildings and transportation systems—needs to be repaired or replaced.

Resilience is a primary concern for the energy infrastructure. It entails the ability to recover rapidly, with minimal interruptions and damage to infrastructure and consumers, when subjected to external stresses such as extreme weather, unexpected outages, or malicious attacks. Such problems dominated the news in the cases of Puerto Rico’s power system [1] being crippled by Hurricane Maria in 2017 (the largest blackout in U.S. history) and the destructive 2018 Camp Fire in California (the deadliest U.S. fire in the past 100 years), which was likely started by power lines built in the early 1900s. In 2003, the Northeast blackout [2] left 55 million people in the U.S. and Canada without power for up to 14 days.

Reliable delivery of electricity requires an instantaneous and continuous balance between supply and demand at multiple scales [3]. However, a number of factors are adding increasing uncertainty to the situation, including intermittent renewable energy sources (e.g., solar, wind), more dynamic and unpredictable demand from buildings, increasing use of electric vehicles and evolving charging patterns, and deployment of decentralized power generation/storage facilities [4]. This poses a significant challenge for wide-area coordinated operation of the nation’s power grid. These challenges also stem from a lack of flexibility by traditional generation facilities, such as coal-fired and nuclear power plants, to accommodate rapid changes in the supply and demand balance. Moreover, impacts of long-term climate change and short-term extreme weather on the energy infrastructure are intensifying [5]. As the grid continues to evolve at the edge, stationary electrical energy storage has played an important role in the U.S. electricity system. Energy storage solutions currently deployed in the grid, while developed to smooth out peaks or support intra-day shifts in energy consumption patterns, can also be used to integrate electricity from intermittent renewables. Additionally, urban planners have an increasing need for better tools to plan and improve their overburdened transportation infrastructure and co-optimize with electrical infrastructure operators to be ready for future demands, including connected, mixed autonomous, shared, and electrified vehicle fleets.

A smart energy infrastructure that meets energy demands at multiple spatial and temporal scales and operates in an intelligent manner to achieve energy efficiency, flexibility, and resilience is needed to help local, state, and federal governments achieve their energy, economic, and environmental goals. Artificial intelligence can contribute to meeting these objectives.

1. State of the Art

Novel opportunities for AI in this area stem from the rapid deployment of connected devices in the energy infrastructure. Smart energy systems comprise interconnected systems of buildings, urban microclimates, vehicles, power and water supplies, and humans [6]. Urban-scale smart energy infrastructure research offers insights into efficiency, sustainability, and resilience, leveraging emerging opportunities in the Internet of Things (IoT), big data, machine learning, and exascale computing [7]. Modern infrastructure and technologies applied to urban systems include wide-area monitoring, distributed control, advanced communication systems, and varying levels of AI at the edge. IoT has become a critical part of the daily operation of smart buildings, mobility, and the
electric grid, with deployments happening at the city scale. On the transportation side, new data streams from infrastructure sensors and geospatial positioning devices paint a noisy, complex picture of the demands on the transportation system. As a result, large volumes of data are flowing into cities and communities.

Technologies developed for the IoT and smart devices offer an unprecedented opportunity to observe and reliably operate the electrical power system through dynamic control of demand. IoT devices and technologies have effectively provided a “software interface” to energy generation, consumption, monitoring, and control assets that drive the electrical grid, enabling an unprecedented opportunity to federate a large number of heterogeneous devices for performing a decentralized, coordinated control toward a next-generation smart energy infrastructure. Similarly, distributed control of vehicles presents a huge challenge for cities, where currently multiple agencies operate independently with different views of the system and different objectives to optimize.

To design control algorithms that fulfill this potential while accounting for the large numbers of small but now visible and possibly controllable loads, it is necessary to have scalable, data-driven models and understanding that represent the primary elements of generation and transmission, distribution, and the interaction with the primary features of smart buildings. Despite these opportunities, and particularly the data deluge from these devices, AI has been used in a limited way when it comes to energy infrastructure. Typically, various ML techniques were applied to individual buildings or their energy systems, such as virtual sensing (e.g., data-driven models to estimate operational parameters), prediction of thermal and electrical loads, modeling of building energy systems and human-building interactions, detection and diagnosis of system operational faults, optimization of control systems, and analyzing human mobility patterns. These efforts are limited to the size and quality of available data, certain energy end uses or single buildings, and single or simplified objective functions. For example, the CityBES [8] tool developed by DOE researchers allows for energy-saving retrofit analysis of hundreds of buildings in a model that considers how the buildings interact (Figure 8.1) [9]. Further, evaluating the impacts of climate change, extreme weather, changing energy usage patterns in buildings, and interactions of transportation and electrical grid are still missing.

Traditionally, resilience (and, consequently, reliability over longer time scales) has been assured by capital- and staffing-intensive activity and massive, repeated offline and online analyses both before and after a major event has occurred. The key “before event” system metrics that can prevent or reduce the severity of an event’s impact have been safety margins, redundancy-by-design capacity (including spare generation and infrastructure), and intense overall situational awareness that includes multiple layers of sensors, external data streams (such as weather forecasts), and state/load estimators. The “after event” metrics have been mostly qualitative indicators of readiness of the local utilities that may be involved in restoration, such as how many training exercises they have participated in and of what type. Since existing data on major event outcomes are exceedingly rare, most of the before-event key metric evaluations and selection are done by means of synthetic data generated by simulation with standard physics and are optimization-based (to emulate the financially driven decision process that takes place outside emergency situations). For system state/load estimation, the models are very crude, typically classical time series models with very simple and often inaccurate models that include a large amount of coarsening and aggregation. The after-event readiness factors are assessed mostly in an expert-estimated mode rather than in any predictive fashion.
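As a small illustration of the data-driven alternative to such simple estimators, and of the load-prediction techniques mentioned above, the sketch below trains a one-hour-ahead electricity load forecaster from lagged loads and calendar features. The synthetic load series, feature set, and horizon are placeholders; operational forecasters would draw on weather, metering, and many other data streams.

import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(4)
hours = np.arange(24 * 365)
load = (100 + 20 * np.sin(2 * np.pi * hours / 24)        # daily cycle
        + 10 * np.sin(2 * np.pi * hours / (24 * 7))      # weekly cycle
        + rng.normal(0, 3, size=hours.size))             # noise

def features(t):
    # Lagged loads plus simple calendar features for hour t.
    return [load[t - 1], load[t - 24], load[t - 168], t % 24, (t // 24) % 7]

t_idx = np.arange(168, hours.size - 1)
X = np.array([features(t) for t in t_idx])
y = load[t_idx + 1]                                      # one-hour-ahead target

model = GradientBoostingRegressor(n_estimators=300, max_depth=3)
model.fit(X[:-500], y[:-500])
mae = np.mean(np.abs(model.predict(X[-500:]) - y[-500:]))
print(f"holdout mean absolute error: {mae:.1f} load units")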


Figure 8.1 Screenshot from CityBES, a city building energy modeling and analysis tool that considers how the buildings interact and allows for large-scale energy retrofit analysis.

2. Major (Grand) Challenges

The overarching challenge of the energy sector is providing safe, secure, cost-effective, and clean energy. In the presence of the rapid change drivers of increased variability of supply, reduced inertia, and significantly increased end-user complexity, this requires vastly superior situational awareness and novel tools for rapidly estimating and optimizing the system resilience. Moreover, the multi-stakeholder nature of the energy infrastructure operation, combined with the enormous representation complexity (in the context of distributed urban assets), requires the development of scalable virtualized computational energy infrastructure models (commonly referred to as digital twins) of multiple interdependent energy and consumption infrastructures. These models can serve as support for defining the stakeholder interfaces, help promote optimal usage policies, and enable exploration of scenarios for optimized energy operation and sustainable planning.

Three key grand challenges have emerged for the application of AI in improving the resilience of the next-generation energy infrastructure as data becomes more pervasive. These grand challenges are:

Wide-area situational awareness to enable energy resilience. Maintaining and increasing resilience levels in the presence of increased variability of supply, reduced inertia, and novel edge-network structure (such as increased use of microgrids, which present the opportunity of increasingly independent operation, but also the challenge of coordination) requires much better understanding and prediction of system variability and state. This will enable predictive capabilities for after-event states and sharper awareness during the restoration process. Smart energy infrastructure has the potential to leverage a variety of disparate data sources with varying spatial and temporal resolution to enable AI-driven real-time intelligence for optimal situational awareness. In particular, AI can: a) perform information fusion from disparate data sources coupled with an integrated model of the energy infrastructure at a time scale pertinent to enabling proactive responses to improve resilience, b) enable predictive models for exploring smart energy and transportation infrastructure design, and c) detect and diagnose cyber and physical attacks and threats in real time to ensure security of the energy infrastructure.
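One possible building block for item (c) is unsupervised anomaly detection over streaming grid measurements, sketched below with an isolation forest trained on normal operating data and used to flag unusual intervals for operator review. The feature set and values are hypothetical; real deployments would fuse PMU, SCADA, and cyber telemetry and combine several detectors with domain-specific rules.

import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(5)

# Hypothetical per-interval features: frequency deviation [Hz], voltage
# magnitude [p.u.], phase-angle spread [deg], and gateway message rate.
normal = rng.normal([0.0, 1.0, 5.0, 200.0], [0.02, 0.01, 0.5, 10.0],
                    size=(5000, 4))
detector = IsolationForest(contamination=0.01, random_state=0).fit(normal)

# Score a new batch; a -1 flags a suspected anomaly for operator review
# or automated containment.
new_batch = np.vstack([normal[:10], [[0.3, 0.85, 12.0, 900.0]]])
flags = detector.predict(new_batch)
print("flagged intervals:", np.where(flags == -1)[0])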


Reliable integration of renewable energy into existing infrastructure using distributed optimal control to balance supply and demand. Achieving energy reliability while on the path to clean, sustainable, secure, and affordable energy sources requires high-resolution (spatial and temporal), wide-area distributed control and optimization to balance supply and demand. Simulation and modeling, integrated with AI, provides guidance for wide-area control design. The mechanisms, reliability, and robustness required to deploy control actions for a real-world demonstration will have to be deeply vetted with stakeholders that have operational roles in this sector. A particularly important direction at the intersection of situational awareness and wide-area control is advanced, very high resolution modeling that can represent the system’s switched dynamics without drastic simplification of the generation, delivery, and consumption models. Here AI can offer novel techniques, including surrogate models, closure models, and learning-driven compute acceleration of high-fidelity models and solvers. This research will build on current DOE programs, such as the Grid Modernization Initiative, as well as the Building Technologies Office’s research to develop grid-interactive efficient building technologies [10], and the Vehicle Technologies Office’s Energy Efficient Mobility Systems [11] for affordable and safe mobility.

Fully virtualized urban-scale infrastructure to co-optimize urban mobility and energy end-use. Creating digital replicas of urban mobility infrastructure over a geographic area that is coupled with energy infrastructures allows us to understand, predict, and co-optimize the efficiency of energy and mobility infrastructure. A geospatial visualization of deployed sensors feeding in real-time data, as well as mobile sensors that capture trajectories of vehicles and humans, enables the creation of a capability to anticipate future system state and evaluate the impact of control decisions faster than real time. Data sources include signals, sensors, safety information systems (911 and 511), and modern probe data traffic feeds from third parties. Starting with the detection of threats/disruptions, the next frontier is to anticipate and mitigate, faster than real time, the adverse effects of future system states. AI provides solutions to urban-scale challenges, including real-time model training, focus on rare event prediction performance, the multi-stakeholder nature of the data access, and decision rules and procedures that are affected by the complexity of the “what-ifs,” which are combinatorial and partially graph-indexed in nature. Opportunities for integration exist at multiple scales, from the drivetrain to vehicle-to-vehicle and vehicle-to-infrastructure exchanges at city and regional scales. A feedback loop from the real world back into the digital replica allows for a level of automated response to perturbations in the network. This requires a deep understanding of how technological disruptions affect human decisions and, in turn, alter demands on mobility.

Developing digital twins of urban systems is a means to address these challenges and provides insights or solutions using AI methods in:

1. Understanding and quantifying the interdependencies between buildings, urban climate, transportation, and the grid at multiple spatial (from city block to district to neighborhood to a city to a region) and temporal (from minute to hour to day to month to year to decade) scales under typical or extreme/disrupted situations.

2. Developing strategic pathways to address grid needs beyond the daily cycling and provide backup power for several days, which could enhance resiliency by integration of long-duration distributed energy storage systems coupled with renewables.

3. Creating smart operations and controls to integrate buildings and transportation to harmonize with the smart grid for maximal productivity, energy efficiency, demand flexibility, and resilience.


4. Predicting urban systems’ dynamic evolution under extreme weather events and understanding how the urban landscape interacts with the microclimate.

5. Detecting patterns of human mobility and charging needs of future autonomous EVs.

6. Informing sustainable and resilient urban planning and policymaking, considering long-term population and economic growth as well as climate change and extreme weather events.

3. Advances in the Next Decade

More data (static and dynamic, measured and simulated, physical and human) at the scales of peta- to exabytes from diverse sources will become available and will be integrated into open and interoperable platforms to power the digital twins of smart energy infrastructure. Hardware for edge computing enables migrating the low-level “twin” to the edge for better responsiveness and uninterrupted operations at local devices, which feed information to a higher-level “twin” that implements predictive analytics more efficiently. Federated instrumentation can enable novel softwarization of energy devices to enable scalable information fusion and decentralized control of assets in a reliable fashion. Supercomputing empowered with AI engines will model and simulate smart energy and transportation infrastructure systems as a combined cyber-physical and natural-human system, capturing realistic behaviors within the digital twin. Real-time 3D GIS-integrated visualization, coupled with virtual reality and augmented reality in the digital twin, reveals real-time performance of urban systems and pinpoints hotspots (e.g., energy, heat, air pollution, traffic, population, wind). Using it to create a virtual replica of reality would help in building future cities to meet challenges such as extreme weather events and housing and transport needs.

Techniques and knowledge can transfer ML results from data-rich environments to data-poor environments. Each city will have a digital twin that evolves with time, more data, computing power, ML algorithms, and changing environmental and human needs. However, to realize this vision, it is necessary to develop engineering tools, and particularly simulation models and data analytics, that can be used to understand the effects of any particular control strategy and operating circumstance on the now-coupled parameters of consumption, generation, and power delivery performance. To enable situational awareness and solve resilience-oriented challenges, AI will enable a new paradigm of emerging technology with capabilities to:

• Expand co-simulation tools to include generation, transmission, communication, and increasingly accurate descriptions of the distribution and behind-the-meter areas for enabling fine-grained understanding of system behavior.

• Deliver increased levels of flexibility and control at the lower levels of the system hierarchy, such as the introduction and coupling of microgrids and smart energy management systems with the ability of islanding in the presence of rapidly evolving threats, e.g., fires (see Chapter 15, AI at the Edge).

• Leverage data from rapidly expanding networks of sensors, such as advanced meters, phasor measurement units (PMUs), F-NET, and other sensing and actuation technologies, to revolutionize monitoring and control of the power grid (see Chapter 12, Data Life Cycle and Infrastructure).

• Generate HPC-powered simulators of multi-mode mobility of mixed autonomous, electrified vehicles and their interaction with the power grid.

• Integrate multi-physics data sources, with a particular focus on weather measurements and higher resolution weather forecasts (see Chapter 2, Earth and Environmental Sciences).


4. Accelerating Development

The promise of AI for addressing these critical challenges can be advanced significantly through both institutional and technical acceleration pathways.

Institutional acceleration. This domain features large-scale energy infrastructure and complex systems that will require researchers to establish partnerships with cities, local governments, electric utilities, industry, and others to develop research testbeds based around public and private partnerships. The emerging AI technology needs to demonstrate that it can provide a new understanding of energy infrastructure and provide actionable information for energy system planning, design, and operations. Current demonstrations in this area tend to focus on buildings, transportation, or smart grid technology, individually. Integrating all three of these domains is difficult and will require multi-disciplinary teams.

Piloting fully virtualized, data-driven, computational digital twins of cities is needed to test and validate new technologies and the energy performance of integrated urban systems (see Chapter 15, AI at the Edge). The initial scale could be a city block, then expanding to a district and later to a small city.

One early task would be to evaluate the availability of existing data, data gaps, new sensing and measurement needs, the current state of the art in regional demonstrations, and lessons learned in recent research. It is necessary to determine how AI can and should be coupled with current and future high performance computing simulation capabilities (see Chapter 10, AI Foundations and Open Problems). Examples of the resulting enhanced capabilities are providing more realistic models of complex agent or system behavior inside the models (inner loop) or discovering more optimal control strategies (outer loop) for the system through simulation. In the long run, the research community, in collaboration with stakeholders, needs to demonstrate that it has a vision for how this research can shape the understanding of technology needs, capabilities, priorities, integration opportunities and control, energy performance, and economic value.

This research initiative should provide actionable intelligence to the energy industry to identify gaps in observability to enable deployment of key data sources, platforms for real-time situational awareness and understanding, and novel decentralized control of energy assets in real time.

Technical acceleration. Improved situational awareness across multiple interdependent energy infrastructures requires increased accuracy and resolution of external and derived data streams (heat, mass, urban structures and surfaces, vegetation, weather, and traffic flow). Novel AI algorithms are needed to perform information fusion from disparate data sources coupled with integrated models of the energy infrastructure at a time scale pertinent to enabling proactive responses that improve system-wide resilience (see Chapter 10, AI Foundations and Open Problems). The availability of data imposes an increased focus on deployable online learning algorithms, particularly for the restoration process, where new data can be extremely informative, to enable multi-scale simulation with AI for model discovery. This domain requires developing novel cooperative AI methods that can support real-time, multi-stakeholder, multi-scale decision-making for the national energy infrastructure. Key technological improvements are needed, with increased focus on characterizing and predicting human-infrastructure interaction, to transform the largely reactive approaches in use today into the proactive resilient operation of the future. Particular emphasis needs to be placed on a sharp characterization of the performance of AI tools for the rare-event portion of the prediction space, as seen in recent major natural disasters. This requires development and usage of surrogate models
and understanding emergent behaviors of interacting AI agents that capture the multi-physics of urban systems and can learn from the combination of measured data and physics- or model-based simulation data for rapid prediction. Real-time forecasting techniques have to be developed by leveraging urban sensing and monitoring (e.g., building energy use and system operation, traffic flow) to predict operational issues (e.g., traffic, power demand) and inform preventive actions. Key AI-driven optimization methods that are applicable for control deployment to operate in real time with deep reinforcement learning have to be developed to achieve optimal performance of the complex urban systems delivering multiple objectives (efficiency, flexibility, and resilience) under uncertain real-world conditions.

5. Expected Outcomes

AI will enable the development of high-resolution situational awareness and resilience-focused control of smart energy infrastructure by combining diverse data sources and creating novel models and synthetic datasets spanning multidisciplinary sciences (building science, urban science, mobility science, sensing and communication, data science, AI, computing science, behavioral/decision science). The models and datasets will deliver data-driven decision support to address grand challenges of urban energy and environment, considering interconnected systems of buildings, climate, transportation, smart grid, and humans. They will also support real-time operations and optimization of integrated urban systems by means of computationally efficient optimization through intelligent interacting AI agents. The fully virtualized data-driven models of smart energy infrastructure will enable stakeholders, decision makers, and citizens to benefit from efficient, flexible, and resilient operations under normal, stressed, and extreme conditions. We believe that AI can improve the resilience significantly—possibly by an order of magnitude compared to a business-as-usual approach.

6. References

1. Kwasinski, A., Andrade, F., Castro-Sitiriche, M. J., and O'Neill-Carrillo, E., "Hurricane Maria Effects on Puerto Rico Electric Power Infrastructure," IEEE Power and Energy Technology Systems Journal, 6, 85–94 (2019). doi: 10.1109/JPETS.2019.2900293
2. U.S.-Canada Power System Outage Task Force, Final Report on the August 14, 2003 Blackout in the United States and Canada: Causes and Recommendations, April 2004.
3. IEEE Guide for Electric Power Distribution Reliability Indices, IEEE Std 1366-2012.
4. U.S. Department of Energy website, "Confronting the Duck Curve: How to Address Over-Generation of Solar Energy," (2017).
5. National Academies of Sciences, Engineering, and Medicine, Enhancing the Resilience of the Nation's Electricity System, Washington, DC: The National Academies Press, 2017. doi.org/10.17226/24836
6. Hong, T., Chen, Y., Lee, S.H., Piette, M.A., "CityBES: A Web-based Platform to Support City-Scale Building Energy Efficiency," Urban Computing (2016).
7. Chen, Y., Hong, T., Piette, M.A., "Automatic Generation and Simulation of Urban Building Energy Models Based on City Datasets for City-Scale Building Retrofit Analysis," Applied Energy (2017).
8. U.S. Department of Energy, "Smart Grid System Report," (2018).
9. Hong, T., et al., "Ten questions on urban building energy modeling," Building and Environment (2019).
10. U.S. Department of Energy, Grid Interactive Efficient Buildings, https://www.energy.gov/eere/buildings/grid-interactive-efficient-buildings
11. U.S. Department of Energy, Energy Efficient Mobility Systems, https://www.energy.gov/eere/vehicles/energy-efficient-mobility-systems

09. AI for Computer Science
Artificial intelligence methods were originally developed to solve one of the grand challenges in computer science, namely the design of computer systems that could behave like humans. The most recent breakthroughs in AI use machine learning to address specific problems in computer vision, natural language processing, and robotics, and to outperform human players in games of strategy like chess and Go. AI has the potential to address a variety of computer science challenges where complex manual processes could be replaced by automation, including chip design, software development, online monitoring, and decision making in operating and runtime systems, database management, and infrastructure management.

The DOE Office of Science Advanced Scientific Computing Research (ASCR) program drives innovations and improvements in scientific understanding through its world-class research program and facilities—both computing and networking. The innovations in science user facilities (see Chapter 14, AI for Imaging) are expanding the boundaries of computing to include the edge (see Chapter 15, AI at the Edge), consisting of science instruments and sensor networks (see Chapter 16, Facilities Integration and AI Ecosystem). Traditional computer science will not be sufficient to address the complexity and scale of future systems and workloads arising in the DOE science mission described in Chapters 1 through 8. AI will provide solutions to the design, development, deployment, operation, and optimization of all hardware (see Chapter 13, Hardware Architectures) and software components (see Chapter 11, Software Environments and Software Research), ranging from individual elements to coordinated orchestration of the workflows over computing, networking, and experimental facilities.

In this chapter, we identify the grand challenges in computer science that can be addressed by AI. Specifically, we identify grand challenges in the areas of hardware and software system design, programming, theoretical computer science, and workflow and infrastructure automation. We do not address computer science solutions to support AI, which is covered in other chapters (see Chapters 11–13).

1. State of the Art

AI has the potential to transform many fields of computer science, from low-level hardware design to high-level programming and from the most fundamental algorithmic challenges to day-to-day operation of user facilities.

Hardware and software design. The design of next-generation hardware and software systems and mapping of application codes to target systems is currently a static process that involves human-in-the-loop design processes and consists of repeated experiments, modeling, and design space exploration. The design of new chips and HPC systems takes many years, and hardware vendors and application developers spend months mapping, porting, and tuning applications to run on new systems. As hardware and software get more complex and heterogeneous, current strategies will be impractical. DOE has been a leader in the co-design of HPC systems for science, but many hardware features are still driven by technology constraints and can be a challenge for programmers (see Chapter 11, Software Environments and Software Research). The DOE community has also spearheaded the use of automatic performance tuning (autotuning) using both brute force search and mathematical optimization [5–8]. In recent years, AI has been explored for the design of chips [1], storage management [2], hardware [3], optimizing compilers [6,7], and to improve the performance of single-node computation [5,8], communication, I/O [9,10], math libraries [15], and scheduling [11]. However, the payoffs
from AI-driven hardware and software co-design are far from complete and will require rapid and non-intrusive data collection, exploration and development of methods, and sharing of learned models.

Application development and data wrangling. Development, tuning, maintenance, and testing of software and making data ready for models and methods are manual, expensive, tedious, and error-prone processes (see Chapter 11, Software Environments and Software Research and Chapter 12, Data Life Cycle and Infrastructure). Existing techniques for developing software were mostly designed for logic-heavy control flow programs that run on a single machine with homogeneous hardware. However, the future of software demands data-driven, distributed programs that run efficiently on heterogeneous hardware. Recent work in program synthesis and automated testing has produced tools that work independently to generate and test software or as powerful aids for human developers (Figure 9.1).

Figure 9.1 AutoPandas uses neural-backed operators in program generators for program synthesis [4].

Automated program synthesis produces software solutions based on either input/output examples, demonstrations, or high-level specifications, producing software from equations, code written in a domain-specific language, or a simple unoptimized version of a program [4,12,14]. Programmers employ various approaches to tackle complex coding tasks, including Google search and online communities, and use integrated development environments that help them to autocomplete code, which can be partially automated with natural language code search [19] and code recommendation [18]. Smart fuzzing techniques, which use random or invalid input to test a computer program, have shown promising results in helping to find semantic bugs in large software systems. Programming by optimization [16] is a design paradigm that allows software developers to specify a rich and potentially large design space of software components that can be used by AI to generate programs that perform well in a given context.

Data wrangling today is largely a human-intensive task, and AI offers unique opportunities to automate or simplify the task. For example, ActiveClean [22] provides a set of optimizations to select the best data to be cleaned for an iterative cleaning framework.

Computer science foundations of AI. AI methods have been increasingly applied to solve complex science problems using codes
with provable performance and correctness properties. It has been shown that certain smoothness and boundedness properties of physical and abstract system laws can be exploited to develop domain-specific, effective ML solutions with many desirable properties (see Chapter 10, AI Foundations and Open Problems). For example, they lead to performance optimization of data transport infrastructures [13] and accurate power-level estimation of reactors [28]. Principles of theoretical computer science provide a rigorous framework to establish critical properties of AI/ML codes, namely computability, learnability, explainability, and provability, as illustrated in Figure 9.2.

Figure 9.2 Computable, learnable, explainable, and provable AI/ML solutions.

There are several known performance limits of ML methods, and many practical problems have been shown to be within them. Indeed, several critical problems—such as zero-day computer virus detection [26] or assessment of code resilience to arbitrary hardware faults [24]—are non-Turing computable and hence not solvable by black-box ML methods. In some cases, the complexity of the tasks could be too great—such as the unbounded Vapnik-Chervonenkis dimension [25]—so that no performance guarantees can be given for any ML solution, independent of the sophistication of its design or use of a supercomputer. It appears on the surface that limitations of "black-box" ML solutions can be overcome by requiring that they be explainable, but Tarski's limit prevents a machine from generating explanations in some cases even if they exist [23]. More recent results show that learnability may be undecidable [29], and similar results are expected to appear in the future that establish limits of ML and its performance.

Complementing more general considerations in Chapter 10, theoretical computer science provides the frameworks and tools to establish that a given problem indeed is effectively solvable by AI/ML methods and is not subject to the above limits.

Workflow and infrastructure management. Managing distributed infrastructure that spans multiple systems, domains, and organizations is largely achieved today by manual or ad-hoc methods for configuration, monitoring, and optimization (see Chapter 14, AI for Imaging and Chapter 16, Facilities Integration and AI Ecosystem). Challenges exist at multiple levels, from assuring the safe and secure operation of networks and systems to efficient resource allocation to users and optimal use of the distributed systems for complex scientific workflows. Neither individual users nor system operators have the global view or integrated control mechanisms needed to make efficient use of a multi-purpose, multi-facility infrastructure. AI provides the automation that can ease the burden of human-driven management of infrastructure at facilities.
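
As a concrete, minimal illustration of the learned performance models referenced above for data transport infrastructures [13], the following sketch fits a simple throughput profile to synthetic transfer logs and uses it to suggest a transfer configuration. The data, model form, and parameter names are illustrative assumptions rather than a description of any deployed system.

```python
# Minimal sketch: learn a throughput profile from past transfer logs and use it
# to pick a transfer configuration. Synthetic data stand in for measured logs
# from data transfer nodes; all names and numbers are illustrative.
import numpy as np

rng = np.random.default_rng(0)

# Observed (concurrency, throughput in Gb/s) pairs from past wide-area transfers.
# Throughput typically rises with parallelism and then flattens or degrades.
concurrency = rng.integers(1, 65, size=200)
throughput = 8.0 * np.log1p(concurrency) - 0.05 * concurrency + rng.normal(0, 0.5, 200)

# Fit a simple surrogate of the throughput profile (here, a cubic polynomial).
profile = np.poly1d(np.polyfit(concurrency, throughput, deg=3))

# Choose the concurrency level with the best predicted throughput.
candidates = np.arange(1, 65)
best = candidates[np.argmax(profile(candidates))]
print(f"suggested number of parallel streams: {best}")
print(f"predicted throughput: {profile(best):.1f} Gb/s")
```

In practice, such profiles would be learned from measurements on production data transfer nodes and would incorporate richer features such as file sizes, competing traffic, and storage-system behavior.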



Current efforts are exploring the use of reinforcement learning, unsupervised learning, and classification techniques to optimally control wide area network resources (Figure 9.3) to improve high-speed big data transfers; to analyze the performance variation of applications on supercomputers and correlate it with key users and workloads; and to leverage software-defined networking and related services exported in high-level programming interfaces, incorporating them into complex workflows [13]. The future will be automated, policy-driven management of distributed resources by both individual applications and complex-wide workflows.

Figure 9.3 The performance of data transfer infrastructure depends on all its subsystems, namely, networks, transfer hosts, file and I/O systems, storage systems, and data transfer. Custom ML methods have been developed to estimate throughput profiles [13].

2. Major (Grand) Challenges

We identify four grand challenges in the areas of hardware and software system design, programming, workflow and infrastructure management, and foundational computer science. The grand challenges together attempt to optimize productivity, performance, reliability, and portability across the DOE complex.

Develop hardware/software systems that are semi-automatically co-designed and co-tuned. In recent years, heterogeneous hardware and software infrastructures have been deployed at DOE high performance computing facilities. Additionally, a number of different accelerators are emerging in the community, including neuromorphic and quantum computers (also see Chapter 11, Software Environments and Software Research). However, the cost and time required for the design of current HPC hardware and software is still prohibitive. AI can influence the co-design of hardware and software systems at many levels to meet energy, security, resilience, and performance requirements. For example, at the transistor level, it can find an optimal set of device parameters given a set of operating constraints; at the chip level it can optimize placement of logic blocks and configure architectural-level features such as the number of floating-point units. AI can be used at runtime to monitor hardware characteristics such as thermal densities to anticipate faults that deliver incorrect results or disable parts of chips. It can also be used in compilers, programming libraries, and applications to automatically generate and search through various implementations to find one optimized for a given hardware platform, and even adapt dynamically to changing performance characteristics at runtime. For example, AI could identify common parallelization or memory optimization strategies and transfer them across applications.
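
To make the "generate and search" idea above concrete, the following minimal sketch shows the core loop of a measurement-driven autotuner: enumerate candidate configurations, time each one, and keep the best. The kernel, search space, and timing procedure are illustrative assumptions; production autotuners explore far larger spaces with model-guided search [5–8].

```python
# Minimal sketch of search-based autotuning: time a parameterized kernel over a
# small design space and keep the best configuration.
import time
import numpy as np

A = np.random.rand(2000, 2000)
x = np.random.rand(2000)

def kernel(block_rows: int) -> np.ndarray:
    """Matrix-vector product computed block_rows rows at a time."""
    out = np.empty(A.shape[0])
    for start in range(0, A.shape[0], block_rows):
        out[start:start + block_rows] = A[start:start + block_rows] @ x
    return out

def measure(block_rows: int, repeats: int = 5) -> float:
    """Average wall-clock time of the kernel for one configuration."""
    t0 = time.perf_counter()
    for _ in range(repeats):
        kernel(block_rows)
    return (time.perf_counter() - t0) / repeats

search_space = [16, 32, 64, 128, 256, 512, 1024, 2000]
timings = {b: measure(b) for b in search_space}
best = min(timings, key=timings.get)
print("best block_rows:", best, "average time (s):", timings[best])
```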



AI methods can be leveraged in operating systems and runtime systems. They can monitor applications running on large-scale HPC systems and learn a model of application performance. These models can provide feedback on how to best manually or automatically adapt the application for better performance. AI-based performance modeling can capture data flow and dynamic performance behaviors of parallel applications under various constraints; model-based analysis will extract concurrency, examine the tradeoffs among different performance factors, and make predictions about different applications on future systems. Model-based optimization will allow users to optimize the performance of applications based on target objectives. In addition, AI methods can be applied to assist model construction, facilitate performance analysis, accelerate optimization, and provide software verification.

Enable automated tools for programming and data wrangling suited for modeling and data-driven science needs. Automated and computer-aided programming tools will be developed by using AI-driven program synthesis and code recommendation, software adaptation, software testing and verification, and code optimization. This will dramatically reduce or eliminate programming efforts for high-level applications on heterogeneous architectures and will support a new generation of programmers for data analysis and learning, demonstrating that HPC novices can accomplish tasks in a tenth of the time that experts spend with traditional tools, produce programs that are 10 times faster than expert-written programs, and increase automated test coverage to 95% or more.

In the future, programmers will perform complex programming tasks by expressing their intents via high-level domain-specific languages, input-output examples, demonstrations, natural language descriptions, and formal specifications. Program synthesizers will then take such intents and search combinatorially large spaces of possible candidate programs. AI-driven program synthesis will learn heuristics by extracting probability distributions of programs from real-world corpora of programs and by remembering search strategies that worked well in the past. AI-driven code recommendation, such as automatic code completion, will help find the right libraries and APIs and synthesize or recommend new code using these libraries and APIs. Additionally, automated techniques will extract intents from user inputs and adapt them to different environments. Automated testing based on smart fuzzing code perturbations and dynamic symbolic execution will allow developers to efficiently test code.
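
The following minimal sketch illustrates synthesis from input/output examples in its simplest form: enumerative search over a tiny, hypothetical DSL until a program consistent with the examples is found. Learned guidance, as in the systems cited above [4,12,14], is what makes this tractable for realistic program spaces; none of that machinery is shown here.

```python
# Minimal sketch of synthesis from input/output examples: enumerate compositions
# of a tiny DSL of integer operations and return the first program consistent
# with all examples. The DSL and examples are illustrative only.
from itertools import product

DSL = {
    "inc":    lambda v: v + 1,
    "dec":    lambda v: v - 1,
    "double": lambda v: v * 2,
    "square": lambda v: v * v,
    "negate": lambda v: -v,
}

def synthesize(examples, max_depth=3):
    """examples: list of (input, output) pairs; returns a list of op names or None."""
    for depth in range(1, max_depth + 1):
        for ops in product(DSL, repeat=depth):
            def run(v, ops=ops):
                for name in ops:
                    v = DSL[name](v)
                return v
            if all(run(i) == o for i, o in examples):
                return list(ops)
    return None

# Target behavior: f(x) = (x + 1) * 2
examples = [(0, 2), (3, 8), (10, 22)]
print(synthesize(examples))   # e.g., ['inc', 'double']
```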



Enable automated and efficient execution of end-to-end scientific workflows processing experimental, observational, and simulation data on adaptive and resilient infrastructure. We envision a future in which a researcher at a user facility would be able to launch his or her experiment and seamlessly access the network and resources in real time at HPC facilities to process data, compare with simulation results, search other relevant data, and reproduce the workflow. Future novel workflows may include AI elements combined with simulation and experimental science.

A recent ASCR workshop report lays out the challenges and approaches for using ML to develop distributed, fault-tolerant, energy-efficient HPC applications [17]. An AI-driven autonomous workflow engine will use user input and prior learned knowledge of the system to generate optimized code, run the workflow through intelligent schedulers that use AI in addition to policies, and monitor the execution. AI can guide scientists in designing and optimizing their workflows in ways that are not possible today. These workflows will run atop a fully automated infrastructure in which AI will design, develop, deploy, monitor, diagnose, operate, and optimize computing elements, units, systems, complexes, networks, databases, and federations. The end user at the user facility and the facility staff at ASCR facilities will be notified of situations that require visualization or, more generally, humans in the loop. The autonomous workflow and all data associated with it will also be captured and made available to be published in machine-readable journals. AI provides a unique opportunity to automate the management of the underlying infrastructure and the scientific process to accelerate the pace of scientific discoveries.

Develop computable, provable, explainable, performance-guaranteed, and yet practical AI solutions for science. AI/ML methods that exploit the properties and structure of underlying systems and abstract laws lead to customized solutions that are computable, explainable, and possess proven generalization and correctness. In particular for science problems, such solutions range from inferring new inter-relationships and constructing efficient polynomial approximations to NP-hard problems to discovering new laws from measurements and simulations and obtaining optimized parameters over complex spaces. There is an immediate need for foundational frameworks and tools that enable us to assert the critical properties of AI/ML solutions by combining the rigorous theories of computing, learnability, expressibility, inference, and provability, which have been developed as highly specialized individual technical areas. They have to be refined, sharpened, and combined to address the spectrum of science areas consisting of interacting physical and cyber systems, such as simulation-driven experiments, experiment-steered computations, and optimal design and operation of smart grids and federations of computing systems and experimental facilities. The underlying laws here are hybrid, encompassing both physical and cyber systems. Establishing that the solution is indeed within ML foundational limits, and exploiting the properties of underlying systems across the myriad of DOE computational tasks, are challenges considering the diversity of science areas in which ML methods are being applied.

3. Advances in the Next Decade

In the short term, AI can be an invaluable tool for analyzing observational data in computing systems, applications, facilities, and networks. Over the next decade, AI techniques will detect and anticipate performance anomalies due to hardware failures, resource overload, intrusion, or other interactions. They will accelerate the design of hardware and software through intelligent design space exploration and improve the automated tuning of high-performance libraries and applications. Longer term, the analysis will be used for online learning and real-time control of increasingly sophisticated application workflows that cohesively tie together the facilities and other resources across DOE and the broader science complex.

The grand challenges rely on innovations at different spatial scales, including the node, machine, and facility level. At the node level, the impact of various hardware and software knobs will produce learned multi-metric performance models for runtime, power, energy, memory footprint, and more. At the machine level, AI will learn models of communication, load balancing, and I/O and try to understand the impact of resource sharing across applications. Facility-level models will capture resource utilization, power constraints, user satisfaction, and time-to-solution to optimize job scheduling, staging phases of the application, and file transfers. Model-informed decisions will be made at every level with multiple coordinated feedback loops. In particular, the coordination will happen both bottom up (node to facility) and top down (facility to node).
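
As a small illustration of the observational-data analysis described above, the following sketch flags anomalies in a synthetic telemetry stream using a rolling baseline. The signal, window length, and threshold are illustrative assumptions; facility deployments would use richer multivariate models.

```python
# Minimal sketch of data-driven anomaly detection on a facility telemetry stream:
# flag readings that deviate strongly from a rolling baseline. Synthetic data and
# thresholds are illustrative only.
import numpy as np

rng = np.random.default_rng(1)
# Simulated node temperature readings with two injected excursions.
readings = rng.normal(55.0, 1.5, size=500)
readings[200] += 12.0
readings[380:385] += 9.0

window, threshold = 50, 4.0
anomalies = []
for t in range(window, len(readings)):
    baseline = readings[t - window:t]
    z = (readings[t] - baseline.mean()) / (baseline.std() + 1e-9)
    if abs(z) > threshold:
        anomalies.append(t)

print("anomalous time steps:", anomalies)
```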



Reliance on ML in DOE science and energy applications, user facilities, and cyber-physical systems means there is a new part of the system that can be attacked, such as via tainted training data, false sensor data, and fragile AI algorithms. Any use of AI, particularly AI-automated processes, is vulnerable to such attacks. Consequently, detecting tainted training data and false sensor data, and measuring the confidence of AI algorithms in their output, become critical as these AI-enabled systems are deployed. While the existing methods are primarily ad hoc and heuristic in nature, recent methods include the development of AI-based cybersecurity methods. For example, adversarial training [20] is an approach that injects adversarial examples into training data to increase the robustness of ML models. For the cybersecurity of cyber-physical systems and DOE facilities, the conventional methods have become inadequate (for example, for zero-day threats), and new AI-based cybersecurity mechanisms are under active development [21].

4. Accelerating Development

Many ongoing efforts are using AI to address challenges in computer science. However, strategic investments and coordinated efforts at both technical and programmatic levels will be needed to realize the vision outlined in these grand challenges.

Access to curated data from many different levels of hardware and software is key. Data analogous to ImageNet is needed to feed AI for computer science. This is beyond the capacity of the individual researcher and must capture various design aspects of hardware, software, programming, workflow, and infrastructure. There are a number of challenges to collecting this data, including applicability (i.e., using data from an old system for a new system might not result in meaningful predictions) and accessibility (i.e., data needs to be extracted and available in formats that are meaningful to the models). ASCR facilities already have organized efforts to make more data available and will pave the way for more autonomous infrastructure. AI for programming today can benefit from websites such as GitHub and Stack Overflow that offer massive amounts of data and metadata about programs. Efforts will be needed to identify data and software repositories that are specifically applicable for the scientific community.

We need to develop open-source scalable modeling and simulation for the entire infrastructure to test AI algorithms. We need methods and algorithms that can operate at different scales; from instruction scheduling to pointer chasing, we need lightweight, low-latency ML methods, but for system co-design we need ML methods that can scale to full exa/zetta scales to explore the vast design space parameters. Additionally, AI will need appropriate infrastructure. Cloud computing platforms have been a cornerstone for scaling ML/DL and AI methods in industry. Similarly, the use of HPC and other platforms to support AI workloads will be critical (see Chapter 13, Hardware Architectures).

Enabling autonomous workflows on autonomous infrastructure will take years of sustained efforts across experimental, computing, and networking facilities. The realization of this vision will require a fast, dynamic, optimized, distributed software-defined ecosystem, critical measurement streams, and powerful analytics that can extract information to drive allocations, diagnosis, and broad strategies and policies. Initial efforts can focus on automated, adaptive collection of instrumented data from many devices and at multiple levels as well as AI-driven integration into a dynamic composite state. Automating parts of the workflow (e.g., resource allocation) based on historical data will enable us to lay the foundation for autonomous workflows. Additionally, sustained performance optimization using trend detection, strategy adaptation, continuous performance monitoring, predictive diagnosis, and graceful task reallocation and migration using AI methods will provide starting points for this work.

Programmatically, we suggest a number of efforts to realize the grand challenges. A hardware-software co-design effort that includes researchers, industry partners, and
the computer facilities will be needed to achieve the grand challenge of developing self-improving and self-adaptive hardware-software systems that can be designed and operated without significant human involvement. Teams of computer science researchers working closely with experimental, computational, and networking facilities to develop autonomous workflows on autonomous infrastructure will be needed.

The foundational computer science challenge will require a comprehensive AI/ML science program (across math and computing science) to develop and refine foundational limits and solvable problems and to sharpen the solutions for solvable classes to ensure effective computation, performance guarantees, and explanations. The program would benefit from a SciDAC-style consortium for domain scientists working closely with ML scientists to act as a DOE-wide, central resource to be used to analyze the ML problems, establish their solvability, and develop effective solutions. Finally, it will be critical to retrain existing staff and hire and retain new talent with expertise in various areas of computer science, including distributed infrastructure, AI, and foundational computer science.

5. Expected Outcomes

The use of AI will allow us to address hard challenges in computer science toward automating human-intensive parts and reducing time to innovations in hardware, software, workflows, and infrastructure to meet the utilization and performance needs while enhancing scientific productivity. These innovations will directly impact scientific discovery, allowing users to set up federations and execute workflows on well-oiled infrastructures. AI will directly result in optimal facility utilization and response—self-healing, self-optimizing infrastructures will handle the predictable problems while human operators will have the tools to diagnose and fix problems.

6. References

1. Ibrahim, A., Elfadel, M., Boning, D., Li, X. (Ed.), Machine Learning in VLSI Computer-Aided Design, Springer International Publishing, 2018.
2. Toigo, J., AI for Storage Management Gets Real, Tech Target (2019). https://www.google.com/amp/s/searchstorage.techtarget.com/opinion/AI-for-storage-management-gets-real%3famp=1
3. 2nd International Workshop on AI-assisted Design for Architecture, https://eecs.oregonstate.edu/aidarc/index.php/program/
4. Bavishi, R., Lemieux, C., Fox, R., Sen, K., & Stoica, I., AutoPandas: Neural-Backed Generators for Program Synthesis, Proceedings of the ACM on Programming Languages, OOPSLA'19, October 2019.
5. Ansel, J., Kamil, et al., OpenTuner: An Extensible Framework for Program Autotuning, Proceedings of the 23rd International Conference on Parallel Architectures and Compilation, 303–316, ACM, 2014.
6. Balaprakash, P., et al., Autotuning in High-Performance Computing Applications, Proceedings of the IEEE, 1–16, 2018.
7. Tiwari, A., Chen, C., Chame, J., Hall, M., & Hollingsworth, J., A Scalable Auto-tuning Framework for Compiler Optimization, Proceedings of the 2009 IEEE International Symposium on Parallel & Distributed Processing, 1–12, 2009.
8. Thiagarajan, J. J., et al., Bootstrapping Parameter Space Exploration for Fast Tuning, Proceedings of the 2018 International Conference on Supercomputing, 385–395, November 2018.
9. Marathe, A., et al., Performance Modeling Under Resource Constraints Using Deep Transfer Learning, Proceedings of the International Conference for High
Performance Computing, Networking, Storage and Analysis (SC17), 31, 2017.
10. Behzad, B., et al., Taming Parallel I/O Complexity with Auto-tuning, Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis (SC13), 68, 2013.
11. Lin, X., Wang, Y., & Pedram, M., A Reinforcement Learning-based Power Management Framework for Green Computing Data Centers, 2016 IEEE International Conference on Cloud Engineering (IC2E), 2016.
12. Kalyan, A., et al., Neural-guided Deductive Search for Real-time Program Synthesis from Examples, The Sixth International Conference on Learning Representations (ICLR 2018), 2018.
13. Rao, N. S. V., Sen, S., Liu, Z., Kettimuthu, R., & Foster, I., Learning Concave-convex Profiles of Data Transport Over Dedicated Connections, Machine Learning for Networking, Springer-Verlag, 2019.
14. Cai, J., et al., Making Neural Programming Architectures Generalize Via Recursion, The Fifth International Conference on Learning Representations (ICLR 2017), 2017.
15. Sid-Lakhdar, W., Mahmoudi Aznaveh, M., Li, X., & Demmel, J., Multitask and Transfer Learning for Autotuning Exascale Applications, submitted August 2019.
16. Hoos, H.H., Programming by Optimization, Communications of the ACM 55, 70–80 (2012). DOI: https://doi.org/10.1145/2076450.2076469
17. Berry, M., et al., Machine Learning and Understanding for Intelligent Extreme Scale Scientific Computing and Discovery, technical report, DOE ASCR Workshop Report, 2015.
18. Luan, S., Yang, D., Barnaby, C., Sen, K., & Chandra, S., Aroma: Code Recommendation via Structural Code Search, Proceedings of the ACM on Programming Languages (OOPSLA'19), October 2019.
19. Cambronero, J., Li, H., Kim, S., Sen, K., & Chandra, S., When Deep Learning Met Code Search, Industry Track of 27th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE'19), ACM, 964–974, August 2019.
20. Tramèr, F., et al., Ensemble Adversarial Training: Attacks and Defenses, arXiv preprint arXiv:1705.07204 (2017).
21. ASCR Cybersecurity for Scientific Computing Integrity - Research Pathways and Ideas Workshop, https://escholarship.org/content/qt5j00n7h2/qt5j00n7h2.pdf
22. Krishnan, S., Wang, J., Wu, E., Franklin, M. J., and Goldberg, K., ActiveClean: Interactive Data Cleaning for Statistical Modeling, Proc. VLDB Endow. 9, 948–959 (2016).
23. Tarski, A., Logic, Semantics, Metamathematics: Papers from 1923 to 1938, Oxford University Press, 1956.
24. Rao, N. S. V., On Undecidability Aspects of Resilient Computations and Implications to Exascale, Resilience 2014: Seventh Workshop on Resiliency in High Performance Computing with Clouds, Grids, and Clusters, 2014.
25. Vapnik, V. N., Statistical Learning Theory, John Wiley and Sons, New York, New York, 1998.
26. Cohen, F. B., "Computational Aspects of Computer Viruses," Computers & Security, 8, 325–344, 1989.
27. Rao, N. S. V., Reister, D. B., Barhen, J., Information Fusion Methods Based on Physical Laws, IEEE Transactions on Pattern Analysis and Machine Intelligence, 27, 66–77 (2005).



28. Rao, N. S. V., et al., Multi-Modal Sensor Fusion for Reactor Power-Level Estimation: Thermal, EM, Acoustic, Nuclear Security Applications Research & Development Program Review Meeting, 2019.
29. Ben-David, S., Hrubes, P., Moran, S., Shpilka, A., and Yehudayoff, A., Learnability Can Be Undecidable, Nature, 2019.



10. AI Foundations and Open Problems
Advancing the mathematical, statistical, and information-theoretic foundations of artificial intelligence is vital to realizing the potential of AI for science. These foundations are now a bottleneck for scientific discovery, and the practical application of AI and machine learning remains predominantly an art. Although significant progress is being made, advances in the foundations of AI will be required to complement capabilities in hardware and software and realize the full potential of AI in DOE's science and engineering mission (see Chapters 1 through 9).

One of the distinguishing characteristics of science is the existence of laws based on time-tested observations about natural phenomena. How should these governing principles and other scientific domain knowledge be incorporated in an AI era? To become an accepted part of the toolbox of scientists and engineers, the validity and robustness of AI techniques need to be trusted. What are the limits of AI techniques, and what assumptions and circumstances can lead to establishing assurance of AI predictions and decisions? Another hallmark of science and engineering is that limited training data may be available in the most complex, dynamic, and high-consequence applications. Which AI techniques can best address different sampling scenarios and enable efficient AI on various computing and sensing environments?

Addressing these and other open problems will advance the building blocks of the entire AI ecosystem.

1. State of the Art

Advances in algorithms and hardware have given scientists the tools to model and simulate nature at an unprecedented range of scales: from computing the history and fate of the cosmos and the explosion of supernovae to the evolution of the climate system and the properties of materials to the smallest of subatomic particles. These efforts have traditionally relied on mathematical, modeling, and computational building blocks whose properties are well established. Despite having access to tremendous computational resources, the fact remains that scientists cannot possibly explore all possible theories or simulate phenomena at the sub-grid scale. AI presents a unique opportunity for bridging this gap, but its building blocks and their composition are not yet sufficiently established for widespread scientific use [2–4].

Although the past decade has seen significant algorithmic and theoretical progress, work on the foundations of AI and ML has been far outpaced by the empirical exploration and use of these techniques [16]. With the increased use of AI and ML, clear trends are emerging. For example, residual network-based convolutional neural networks [10,11] are the standard for image processing; automatic differentiation and accelerated first-order optimization algorithms are pervasive in training deep networks [12–15]; and generative models (e.g., generative adversarial networks, variational autoencoders) are providing synthetic data far beyond traditional image applications [6–9,17,25]. Principles underlying the use and understanding of these and other techniques tend to be scattered across disciplines, from theoretical computer science to signal processing to statistics.

Neural networks have started to be specially designed to incorporate some types of domain knowledge—such as rotational equivariance [1,5,21] (Figure 10.1) and statistical [18], partial differential equation (PDE) [19], and stochastic PDE [20] constraints—but these efforts are in their infancy. Results are also being established in the computability of AI-related problems [26] and in exploiting graph-based representations [22–24]. Natural language processing and unsupervised learning techniques are beginning to be explored to gain additional insight from the scientific literature [27,28] and to pass eighth grade science exams [29].

Figure 10.1 Specially designed neural networks can satisfy domain properties such as 3D rotation-equivariance, allowing one to train on shapes and molecules in one orientation while still identifying shapes and molecules in any orientation. Adapted from N. Thomas, NeurIPS18 Molecules and Materials Workshop [1].

2. Major (Grand) Challenges

Three exemplar grand challenges are identified to illustrate the promise of addressing the foundations of AI.

Incorporate domain knowledge in ML and AI. ML and AI are generally domain-agnostic. Whether studying datasets from a beamline scattering experiment, a physics collision, or a climate simulation, the training procedure typically treats every labeled dataset as a point in a high-dimensional space and proceeds to apply standard convolutional and nonlinear operations. Off-the-shelf practice treats each of these datasets in the same way and ignores domain knowledge that extends far beyond the raw data itself—such as physical laws, available forward simulations, and established invariances and symmetries—that is readily available for many systems, much in the same way that early knowledge on the neural vision system led to marked improvements in image processing. Better incorporation and entirely new methods targeting these principles will improve data efficiency; quality, interpretability, and validity of the model; and generalization, transfer learning, and constraint satisfaction for new problem regimes. Incorporating modeling and simulation capabilities to generate training data leverages decades of HPC improvements to accelerate learning; incorporating mathematical equations and scientific literature leverages centuries of advances in theory. Furthermore, to complete the scientific process, incorporating domain knowledge in AI models can be used as the basis for advances in experimental design, active learning, facilities operations, formal verification, and automated theorem proving to accelerate scientific discovery.

Improving our ability to systematically incorporate diverse forms of domain knowledge can impact every aspect of AI, from selection of decision variables and architecture design to training data requirements, uncertainty quantification, and design optimization. Indeed, incorporating domain knowledge is a distinguishing feature of AI within the DOE mission, without which AI-based scientific progress is otherwise limited to that afforded by traditional AI drivers.
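
A minimal sketch of the domain-knowledge idea discussed above: a physical constraint (here, a known symmetry) is added to the training loss as a penalty term so that the learned model respects it even where data are missing. The model, data, and penalty weight are illustrative assumptions, not a prescription.

```python
# Minimal sketch of incorporating domain knowledge through a custom loss:
# fit a small polynomial model to data observed only for x >= 0 while adding a
# physics-style penalty encoding a known symmetry, f(-x) = f(x).
import numpy as np

rng = np.random.default_rng(0)

# Training data only on one side of the domain.
x = rng.uniform(0.0, 2.0, 40)
y = x**2 + rng.normal(0, 0.05, 40)

def features(x):
    return np.stack([np.ones_like(x), x, x**2], axis=1)   # basis [1, x, x^2]

X = features(x)
x_col = np.linspace(-2, 2, 21)                 # collocation points for the constraint
C = features(x_col) - features(-x_col)          # f(x) - f(-x) should be zero

w = np.zeros(3)
lam, lr = 10.0, 1e-3
for _ in range(5000):
    grad_data = 2 * X.T @ (X @ w - y) / len(y)           # data-misfit gradient
    grad_sym = 2 * lam * C.T @ (C @ w) / len(x_col)       # symmetry-penalty gradient
    w -= lr * (grad_data + grad_sym)

print("learned coefficients [1, x, x^2]:", np.round(w, 3))
# The coefficient on the odd term x is driven toward zero by the constraint.
```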



Establish assurance for AI. Assurance addresses the question of whether an AI model has been constructed, trained, and deployed so that it is appropriate for its intended use, and is one of the most challenging problems facing AI. Briefly, it addresses the question of whether and when an AI model can be trusted. Assurance is an extremely broad topic and includes the validity, robustness, reproducibility, and uncertainty quantification of both learned models and their use, as well as the topics of explainability and interpretability. It also includes the question of whether the data used in training an AI model contains sufficient information to train the model without introducing spurious correlations or bias that will invalidate AI-based decisions, as well as operational assurances in the presence of limited/noisy data or adversarial attacks. Furthermore, it includes the development of provable methods to assess whether a problem is computable, learnable, and expressible given the available data and other limitations.

As an example, establishing assurance for a particular AI model would involve clarifying and answering questions such as: Why does the AI model work for a problem? What are the internal representations of data that the AI model has learned during training? How can the behavior of the AI model be explained? How confident are the AI models in their predictions given the different sources of uncertainties and inductive biases involved? For such an AI model to be accepted as a well-characterized tool for science, the research community will need to address these questions and develop advanced capabilities to explain the behavior of the AI model and map the internal representation of the model to domain-specific concepts.

It would then follow that establishing assurance means determining whether an AI model is appropriately trained and used for the task for which it is intended, including whether it is robust against adversarial attacks or whether prediction errors can be meaningfully bounded. Explainability can also be used to provide the basis of trust in AI systems by communicating meaningful information to humans and, a posteriori, in diagnosing AI behaviors. Unique challenges for AI systems revolve around a careful characterization of the generalization limits, proofs of validity, robustness, and assessment of confidence associated with predictions. Establishing assurance is especially vital for scientific and high-consequence applications where AI models and tasks would otherwise fail to be adopted, including autonomous systems such as in advanced manufacturing, energy generation, storage and distribution, automated health diagnostics, and the control of large scientific facilities.

Achieve efficient learning for AI systems. The core of any AI system is the creation of an abstract model and the training of that model based on data. Efficient learning in ML systems must be studied along several axes. The first is algorithmic. For example, deep neural networks routinely include hundreds of layers and billions of trainable parameters. Training these models for complex applications is computationally intensive, requiring large amounts of computing power and data, and is critically dependent on factors such as the quality and quantity of labeled training data, the overall type and complexity of the model, and the application domain. A second axis is the efficiency of the implementation of a learning system on given hardware. Achieving improved efficiency—in terms of power, compute, memory usage, and quantity of data required for training—is broadly essential for scientific applications. Included in efficient implementations is the use and impact of reduced-precision arithmetic offered in current hardware, and of novel computing hardware (quantum, neuromorphic) and associated programming paradigms in future platforms (see Chapter 13, Hardware Architectures).

While there have been significant improvements in and variants of training algorithms, the grand challenge of an efficient, general-purpose algorithm for learning remains unsolved. Further, nested nonlinear ensembles of linear models are undoubtedly not the last great learning architecture that will emerge;
novel model forms may exhibit profound advantages in terms of data efficiency.

In addition to the general learning problem, significant challenges remain for specific classes of AI models. For example, AI-based control systems rely on semi-supervised and reinforcement learning, which are inefficient, produce "brittle" systems, and are non-transferable. Efficient continuous learning systems that handle data streams at the edge and remain validated must be developed. And it will be necessary to rethink the learning process—and artificial reasoning in general—for systems, including approximate, neuromorphic, and probabilistic computing, to make them computationally tractable for many real-world problems. Further study of human neural systems and the learning process may yield significant insights, new abstractions, and complexity classes beyond those conventionally in use at present.

3. Advances in the Next Decade

Many opportunities exist to advance the foundational building blocks of AI over the next decade. We highlight a few of the areas where mathematical, statistical, and information-theoretic advances are required to address the above grand challenges. These advances entail the development of new algorithms, theory, and modeling paradigms.

Exploiting scientific knowledge. Approaches to leveraging domain knowledge include using custom loss functions; selecting decision or latent variables; applying physical constraints (e.g., conservation laws); leveraging Bayesian or probabilistic graphical models; using simulations to augment or generate training data; and exploiting known smoothness, sparsity, or other low-dimensional structures. Many of these approaches have been tested in particular areas of ML, but mathematical advances are required to establish principled ways for the incorporation of domain knowledge throughout AI and to understand the induced tradeoffs. Each of these approaches has limitations and requires significant foundational research.

Creation of surrogates. AI presents a unique opportunity for creating data-driven surrogate models that are potentially orders of magnitude faster to run than first-principles simulation codes and can be particularly effective in the ability to simulate physical processes that span many spatial and temporal scales. Some of the unique challenges for AI systems revolve around a careful characterization of the generalization limits, proofs of interpolation/extrapolation, robustness, assessment of confidence associated with predictions, and effects of the input data. Rigorously understanding these tradeoffs will impact not only model selection in AI systems, but also the creation and investigation of new classes and types of models.

Numerical optimization. Optimization algorithms, differentiation techniques, and models form the foundation of training in AI. Both the loss landscape of these models and the traversal of this landscape by algorithms are poorly understood. There is a significant opportunity to improve understanding about the effect of incorporating domain knowledge in the form of constraints or regularization terms. How do these approaches affect the solution manifold and the ability of fast algorithms to consistently find this manifold? What principles about network and model selection does this inform? What accuracy is needed in derivative and loss evaluations? What guarantees of optimality can be established? Opportunities exist for fundamental advances in this area to impact AI for science from the HPC facility to the edge.
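
The following minimal sketch combines the surrogate and optimization ideas above: a cheap model is fit to a handful of runs of a stand-in "expensive" simulation and is then searched in place of the simulator. The simulator, model class, and search strategy are illustrative assumptions; real surrogate workflows add careful validation and uncertainty estimates.

```python
# Minimal sketch of a data-driven surrogate: fit an inexpensive model to a few
# runs of an "expensive" simulation and search the surrogate for a promising
# design. Everything here is a stand-in for illustration.
import numpy as np

def expensive_simulation(x):
    """Placeholder for a costly first-principles code."""
    return np.sin(3 * x) + 0.5 * x**2

rng = np.random.default_rng(0)
x_train = np.sort(rng.uniform(-2, 2, 12))          # only a dozen simulation runs
y_train = expensive_simulation(x_train)

# Cheap surrogate: least-squares polynomial fit to the sampled runs.
surrogate = np.poly1d(np.polyfit(x_train, y_train, deg=6))

# Optimize over the surrogate instead of the simulator (dense grid search here).
grid = np.linspace(-2, 2, 2001)
x_best = grid[np.argmin(surrogate(grid))]

print(f"surrogate minimum at x = {x_best:.3f}")
print(f"true objective there  = {expensive_simulation(x_best):.3f}")
```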



Uncertainty quantification (UQ). An important aspect in the development and application of AI is the quantification of uncertainties. Where AI and ML are used in physics-based applications, established approaches to UQ are applicable. In other cases, particularly in classification problems, ML models tend to be highly nonlinear systems that are extremely sensitive to input data, and small (e.g., undetectable to the human eye) changes can lead to misclassification. Several approaches to dealing with uncertainty (e.g., Bayesian neural networks) are computationally intractable for many AI problems; significant expansion of these approaches or new, more efficient alternatives are needed. Known and emerging UQ techniques can also be used to detect overfitting and select the simplest possible model.

Graph-based ML and AI. Graphs arise naturally in many scientific domains (e.g., molecules, protein interaction networks, community networks). Structuring data and knowledge representations in terms of graphs and exploiting the topology information available from a graph representation can be critical to realizing tractable algorithms and obtaining better outcomes in tasks such as classification, clustering, and prediction of missing data. Important questions need to be addressed, including: What is the most relevant graph representation obtainable from noisy data for a problem, and how can it be computed efficiently? How can the topological information available in graphs be best exploited within an AI model? How can the time complexity of AI computations involving massive graphs be tamed? How can algorithms be adapted to work with dynamic graphs, and how can streaming algorithms be designed when the graph cannot be stored?

Data/model fusion and representation. Current AI and ML systems tend to analyze one type (mode) of data. However, most physical systems include data of different types or modes. For example, environmental sensor input must be combined with video streams for effective control of manufacturing processes; audio and text input must be combined in sentiment analysis; and multispectral sensors must be incorporated into environmental monitoring systems. The different modes often have fundamentally different characteristics and represent different types of information. This leads to challenges in fusing data and models across the different modes, such as the encoding and representation of knowledge or events in ways that allow an AI model to establish correlations across the different modes, and the transferability of knowledge from one mode to enable more efficient learning in other modes. Furthermore, DOE is unique in the breadth of diverse datasets and representations produced by various simulations, experimental and observational devices, and computing and networking facilities. Developing and applying AI methods successfully will require that abstractions and algorithms are aware of and target the intrinsic properties of datasets and representations (e.g., big vs. small, structured vs. semi-structured vs. unstructured, sparse vs. dense, space vs. time vs. space-time, graphs, noisy/missing/mislabeled data, multi-variate/-physics/-scale/-modal) to achieve optimal results.

Interpretable and explainable AI. While the ultimate goal of AI research may be fully autonomous systems and artificial general intelligence, the larger potential for the near future is augmenting human intelligence—including, for example, accelerated scientific discovery and engineering design, engineered safety systems, and improved medical diagnoses. In the context of accelerating science and engineering, as AI methods make inroads and produce state-of-the-art results for data analytics, surrogate modeling, inverse design, and control applications, advanced capabilities are needed to explain the behavior of AI models and to map the internal representation of AI models to domain-specific concepts.
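
As one simple, model-agnostic flavor of the explanation capabilities called for above, the following sketch computes permutation feature importance for a fitted model: each input is shuffled in turn, and the resulting loss in accuracy is attributed to that input. The data and model are illustrative assumptions; real explainability work goes far beyond this.

```python
# Minimal sketch of permutation feature importance: score each input feature by
# how much shuffling it degrades a trained model's fit. Synthetic data only.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(0, 0.1, 500)   # features 2, 3 irrelevant

# "Trained model": ordinary least squares fit.
w, *_ = np.linalg.lstsq(X, y, rcond=None)

def mse(Xm):
    return np.mean((Xm @ w - y) ** 2)

baseline = mse(X)
for j in range(X.shape[1]):
    Xp = X.copy()
    Xp[:, j] = rng.permutation(Xp[:, j])      # break the feature/target link
    print(f"feature {j}: importance = {mse(Xp) - baseline:.3f}")
```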



determine the control variables? In short, can inefficient and costly training; there is a tradeoff
AI be used to identify causal variables or between exploitation and exploration, which is
distinguish between cause and effect? similar to the tradeoff between depth-first and
Typically this cannot be done with a single breadth-first search; and there is often a
training dataset. Instead, the AI model needs to narrow applicability regime and a lack of
be trained to construct a hypothesis, typically a robustness in training control systems. The
counterfactual one, and to design an human-computer interface and explainability
experiment—including the collection of data must also be considered in the context of RL-
(and the suitability of that data)—to test based control systems.
that hypothesis.
Real-time learning and control. AI impacts
Robustness/stability. Robustness generally are typically attributed to the availability of both
refers to an algorithm’s ability to deal with data and computing. However, in some science
errors in the input data or errors during applications one can see the dual problems of
execution of a program. This also includes the too much data and too little computing and
ability of AI to withstand an adversarial attack, storage. In such cases, one will not be able to
as well as the ability to deal with corner cases store even a small fraction of the generated
and rare events that may not appear in the data, nor will one have the ability to (re)train
training data. Similarly, stability refers to the models from scratch. Furthermore, the data
ability to deal with rounding and other errors may have low information content and may end
that are an intrinsic part of any numerical up corrupting models if used incorrectly. Even
algorithm. In many cases, classical numerical today, the cost of training a single, albeit large,
analysis approaches can quantify and control deep neural network has been estimated in the
these errors; however, foundational research tens or hundreds of millions of dollars of power
adapting these results to ML algorithms and and computing capacity. Improving the ability
developing AI-specific approaches to improving to train an AI model continuously (e.g., with
the robustness and stability is required. streaming data that is discarded immediately
Identifying the limits of AI methods and after use) and to deploy the model in real-time
models—for example, in terms of input requires advances in areas such as adaptive
or training data ranges beyond which models, event and anomaly detection, transfer
errors can grow undesirably—would learning, plasticity, and validation.
advance understanding.
Reinforcement learning (RL) and beyond. RL forms the foundation of most AI-based control and policy systems. RL is the process of teaching an AI model to take actions based on a current state, an environment, and a reward function; it has been studied historically in the context of dynamic programming, Markov decision processes, and control theory. Within the context of AI, RL has been used successfully in many applications, most visibly by DeepMind for player policies in increasingly complex games. Despite RL’s recent success, many challenges must be addressed for scientific and engineering control applications. For example, action/reward shaping for control decisions may lead to computationally inefficient and costly training; there is a tradeoff between exploitation and exploration, which is similar to the tradeoff between depth-first and breadth-first search; and there is often a narrow applicability regime and a lack of robustness in training control systems. The human-computer interface and explainability must also be considered in the context of RL-based control systems.

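The state/action/reward loop described above can be illustrated with a minimal tabular Q-learning sketch; the toy one-dimensional environment, reward shaping, and hyperparameters below are hypothetical choices for illustration only.

    import numpy as np

    N_STATES, ACTIONS = 11, (-1, 0, +1)   # discretized state; move left, stay, move right
    TARGET = 8                            # desired setpoint for the toy controller

    def step(state, action):
        nxt = int(np.clip(state + action, 0, N_STATES - 1))
        reward = 1.0 if nxt == TARGET else -0.1       # hand-tuned reward shaping
        return nxt, reward

    rng = np.random.default_rng(0)
    Q = np.zeros((N_STATES, len(ACTIONS)))
    eps, alpha, gamma = 0.1, 0.5, 0.95               # exploration rate, step size, discount

    for episode in range(500):
        state = int(rng.integers(N_STATES))
        for _ in range(50):
            a = int(rng.integers(len(ACTIONS))) if rng.random() < eps else int(Q[state].argmax())
            nxt, r = step(state, ACTIONS[a])
            # Q-learning update: move the estimate toward reward plus discounted future value
            Q[state, a] += alpha * (r + gamma * Q[nxt].max() - Q[state, a])
            state = nxt

    print("greedy action per state:", [ACTIONS[int(a)] for a in Q.argmax(axis=1)])

Even in this toy setting, the eps parameter exposes the exploration-versus-exploitation tradeoff noted above, and the hand-tuned reward illustrates why reward shaping can dominate training cost.
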
Real-time learning and control. AI impacts are typically attributed to the availability of both data and computing. However, in some science applications one can see the dual problems of too much data and too little computing and storage. In such cases, one will not be able to store even a small fraction of the generated data, nor will one have the ability to (re)train models from scratch. Furthermore, the data may have low information content and may end up corrupting models if used incorrectly. Even today, the cost of training a single, albeit large, deep neural network has been estimated at tens or hundreds of millions of dollars of power and computing capacity. Improving the ability to train an AI model continuously (e.g., with streaming data that is discarded immediately after use) and to deploy the model in real time requires advances in areas such as adaptive models, event and anomaly detection, transfer learning, plasticity, and validation.

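As a hedged illustration of continuous training on transient data, the sketch below updates a scikit-learn model incrementally on simulated streaming batches that are discarded immediately after use; the data source, model, and learning rate are stand-ins, not a recommended design.

    import numpy as np
    from sklearn.linear_model import SGDRegressor

    model = SGDRegressor(learning_rate="constant", eta0=0.01)
    rng = np.random.default_rng(0)

    for batch in range(200):                      # stand-in for an instrument data stream
        X = rng.normal(size=(64, 8))              # one transient batch of readings
        y = X @ np.arange(1.0, 9.0) + 0.05 * rng.normal(size=64)
        model.partial_fit(X, y)                   # update the model, then discard the batch
        if batch % 50 == 0:
            print(f"batch {batch:3d}  estimated first coefficient = {model.coef_[0]:.3f}")
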
Unsupervised learning and dimension reduction. Much of the data used in scientific and engineering ML is unlabeled. For example, the scientific objective may be to identify patterns in datasets, find clusters, estimate distributions, compress data, identify latent variables, or reduce the dimension of a large dataset. First-principles simulations can be used to offset partially the lack of labeled data (e.g., through the use of simulated data for training, generative adversarial networks [GANs], or direct incorporation of physical laws). Advances will depend on continuing research in areas such as matrix factorizations, kernel methods, GANs, and autoencoders, with a particular focus on incorporating physical knowledge and explainability (e.g., in the determination of latent variables and other lower-dimensional representations).

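The following minimal PyTorch autoencoder sketches one of the dimension-reduction approaches named above; the random stand-in data and the two-dimensional latent space are illustrative assumptions.

    import torch
    from torch import nn

    X = torch.randn(1024, 20)                    # stand-in for unlabeled science data

    model = nn.Sequential(                       # encoder: 20 -> 2, decoder: 2 -> 20
        nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 2),
        nn.Linear(2, 64), nn.ReLU(), nn.Linear(64, 20),
    )
    encoder = model[:3]

    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    for epoch in range(200):
        opt.zero_grad()
        loss = nn.functional.mse_loss(model(X), X)   # reconstruct the input
        loss.backward()
        opt.step()

    latent = encoder(X)                          # 2-D representation for clustering, etc.
    print(latent.shape, float(loss))

Physics knowledge could enter through additional loss terms or constraints on the latent space, which is where much of the research effort described in this section would focus.
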


4. Accelerating Development

Data and models are growing at an unprecedented scale. A business-as-usual approach for funding research on the foundations of AI for science is insufficient for staying ahead of this deluge, let alone for transforming such data and models into scientific understanding. The use of AI in scientific and engineering applications is often constrained by the lack of good and labeled data; the inefficient, brittle, and unpredictable training of AI models; and the lack of assurance, including UQ, validation, and interpretability. Key investments to accelerate development along the above advances include the following:

The use of scientific principles, modeling and simulation, and domain-specific knowledge to inform and advance AI. Focusing here would spur the ability to learn effectively with orders of magnitude less data and/or to use the same data for otherwise unthinkable predictive power and generalizability.

Addressing robustness, uncertainty quantification, and interpretability of AI systems. Increased understanding of the sensitivities and limitations of AI models and improving scientists’ ability to interpret AI outcomes would significantly accelerate the adoption of AI as a scientific capability.

Learning for inverse problems and design of experiments. Inverting traditional cause-to-effect models to learn what causes could have produced an effect, and then efficiently generating experimental campaigns to test these hypotheses, would broaden the scientific method.

Reinforcement and active learning to develop AI for control and data acquisition systems. Advances that directly address dynamic operations and real-time feedback scenarios would narrow the distance from the AI to the instrument, detector, lab, and facility.

5. Expected Outcomes

A research agenda supporting algorithmic and theoretical advances in AI and ML will have a profound impact on science, society, and industry. Successfully addressing the challenges identified above will reap huge rewards and enable rapid progress in areas such as advanced manufacturing, energy distribution and generation, mobility and transportation infrastructure, bioenergy, health science, and advanced materials design and synthesis.

Primary outcomes of advancing the foundations of AI will be to maximize the understanding realized from science-informed AI, to increase trust in ML and AI as scientific techniques, and to provide efficient computational algorithms—for implementation on diverse and heterogeneous computing and instrument hardware—for generating these models.

With these advances, we expect that AI and ML will become accepted, well-characterized tools in the modern scientific computing toolbox, and that the abstract models they generate will be understood well enough to be used in a variety of tasks. Minimizing the risks associated with AI use is especially important in high-consequence applications. Increased trust will also further the adoption of AI and embedded intelligence in everything from edge devices to networks to HPC facilities. Significant improvement in the efficiency of ML will enable more accurate surrogate models of complex physical systems (e.g., reacting flows or failure mechanisms in materials), optimization algorithms for inverse problems in materials characterization and design, and the more accurate computational uncertainties necessary in all science and engineering disciplines.

6. References

1. Thomas, N., et al. Tensor Field Networks: Rotation- and Translation-Equivariant Neural Networks for 3D Point Clouds. arXiv preprint arXiv:1802.08219, 2018.
2. Jordan, M. I. Artificial Intelligence: The Revolution Hasn’t Happened Yet. Harvard Data Science Review (2019). doi:10.1162/99608f92.f06c6e61.
3. Baker, N., et al. Workshop Report on Basic Research Needs for Scientific Machine Learning: Core Technologies for Artificial Intelligence, 2019. doi:10.2172/1478744.
4. Arridge, S., Maass, P., Öktem, O., and Schönlieb, C. Solving Inverse Problems Using Data-Driven Models. Acta Numerica, 28, 1–174. doi:10.1017/S0962492919000059.
5. Kondor, R., Trivedi, S. On the Generalization of Equivariance and Convolution in Neural Networks to the Action of Compact Groups. Proceedings of the 35th International Conference on Machine Learning, PMLR 80:2747–2755, 2018.
6. Goodfellow, I., et al. Generative Adversarial Nets. Advances in Neural Information Processing Systems, 2014.
7. Chen, X., et al. InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets. Advances in Neural Information Processing Systems, 2172–2180 (2016).
8. Arjovsky, M., Chintala, S. and Bottou, L. Wasserstein Generative Adversarial Networks. International Conference on Machine Learning, 214–223, 2017.
9. Paganini, M., de Oliveira, L. and Nachman, B. CaloGAN: Simulating 3D High Energy Particle Showers in Multilayer Electromagnetic Calorimeters with Generative Adversarial Networks. Physical Review D 97: 014021 (2018).
10. Zhang, K., Zuo, W., Chen, Y., Meng, D. and Zhang, L. Beyond a Gaussian Denoiser: Residual Learning of Deep CNN for Image Denoising. IEEE Transactions on Image Processing, 26: 3142–3155 (2017).
11. He, K., Zhang, X., Ren, S. and Sun, J. Deep Residual Learning for Image Recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 770–778, 2016.
12. Bottou, L., Curtis, F.E. and Nocedal, J. Optimization Methods for Large-Scale Machine Learning. SIAM Review, 60: 223–311 (2018).
13. LeCun, Y.A., Bottou, L., Orr, G.B. and Müller, K.R. Efficient BackProp. Neural Networks: Tricks of the Trade, Springer, Berlin, Heidelberg, 2012.
14. Sutskever, I., Martens, J., Dahl, G. and Hinton, G. On the Importance of Initialization and Momentum in Deep Learning. International Conference on Machine Learning, 1139–1147 (2013).
15. Duchi, J., Hazan, E., and Singer, Y. Adaptive Subgradient Methods for Online Learning and Stochastic Optimization. Journal of Machine Learning Research, 2121–2159 (2011).
16. National Research Council. Frontiers in Massive Data Analysis. Washington, DC: The National Academies Press, 2013. https://doi.org/10.17226/18374.
17. Mustafa, M., et al. CosmoGAN: Creating High-Fidelity Weak Lensing Convergence Maps Using Generative Adversarial Networks. Computational Astrophysics and Cosmology 6 (2019).
18. Wu, J. L., et al. Enforcing Statistical Constraints in Generative Adversarial Networks for Modeling Chaotic Dynamical Systems. Cornell University, 2019. https://arxiv.org/abs/1905.06841.
19. Raissi, M., Perdikaris, P., Karniadakis, G. E. Physics Informed Deep Learning (Part I): Data-Driven Solutions of Nonlinear Partial Differential Equations. https://arxiv.org/abs/1711.10561.
20. Yang, L., et al. Highly Scalable, Physics-Informed GANs for Learning Solutions of Stochastic PDEs. SC’19 Deep Learning on Supercomputers Workshop, 2019.
21. Weiler, M., Geiger, M., Welling, M., Boomsma, W. and Cohen, T. 3D Steerable CNNs: Learning Rotationally Equivariant Features in Volumetric Data. Advances in Neural Information Processing Systems, 10381–10392, 2018.
22. Bui, T. D., Ravi, S. and Ramavijjala, V. Neural Graph Learning: Training Neural Networks Using Graphs. Proceedings of the 11th ACM International Conference on Web Search and Data Mining, 2018.
23. Murphy, R. L., Srinivasan, B., Rao, V. and Ribeiro, B. Relational Pooling for Graph Representations. arXiv:1903.02541, 2019.
24. Xu, K., Hu, W., Leskovec, J. and Jegelka, S. How Powerful are Graph Neural Networks? arXiv:1810.00826v3, 2019.
25. Tschannen, M., Bachem, O. and Lucic, M. Recent Advances in Autoencoder-Based Representation Learning. arXiv preprint arXiv:1812.05069, 2018.
26. Ben-David, S., Hrubeš, P., Moran, S. et al. Learnability Can Be Undecidable. Nat Mach Intell 1: 44–48 (2019). doi:10.1038/s42256-018-0002-3.
27. Tshitoyan, V., et al. Unsupervised Word Embeddings Capture Latent Knowledge from Materials Science Literature. Nature 571, 95–98 (2019). doi:10.1038/s41586-019-1335-8.
28. Swain, M. C. and Cole, J. M. ChemDataExtractor: A Toolkit for Automated Extraction of Chemical Information from the Scientific Literature. Journal of Chemical Information and Modeling, 56: 1894–1904 (2016).
29. Clark, P., et al. From ‘F’ to ‘A’ on the NY Regents Science Exams: An Overview of the Aristo Project. arXiv preprint arXiv:1909.01958, 2019.

11. Software Environments and Software Research
The DOE Office of Science has an opportunity and need to research and develop software to address the office’s research mission. Such an effort would complement large investments by industry to develop AI software environments. The DOE has deep expertise in simulation, modeling, and large-scale data analysis, and it also operates the largest and broadest set of user facilities for experimental and observational science, including light sources, telescopes, and genomics facilities that have growing computing and data-analysis requirements (see also Chapter 16, Facilities Integration and AI Ecosystem). There is an urgent need to develop software and computing environments that enable AI capabilities to be seamlessly integrated with large-scale HPC models and the growing data-analysis requirements of experimental facilities.

1. State of the Art

There is currently a proliferation of software and frameworks for data analysis and machine learning. Top deep learning and ML frameworks today include scikit-learn, TensorFlow, PyTorch, and Keras, but new software and frameworks are being released regularly. These new frameworks are primarily developed and led by industry, with some notable contributions from academia for software such as Spark and Jupyter. The software is open source, though not open governance, and is often controlled and sponsored by industry leaders, such as Google and Facebook.

There are a few notable gaps between the state of the art and DOE scientific requirements when it comes to software for AI. First, DOE researchers produce massive amounts of data from simulations and models that can benefit from the integration of AI capabilities. These are often challenging datasets with multidimensional data and can also include nonimage-based data. Second, DOE runs unique user facilities that produce petabytes of data, have no counterpart in industry, and require new AI software and capabilities. Finally, many of the DOE scientific datasets need the scale of HPC systems for analysis, and those systems can have unique architectural features that require software attention and investment, such as large-scale I/O subsystems and heterogeneous compute elements. With DOE’s challenging datasets and deep expertise in data analytics, simulation, and modeling, DOE researchers are well positioned to contribute unique enhancements to the AI software stack.

2. Major (Grand) Challenges

When considering the impact of AI on software environments and software research, three significant opportunities are apparent. First, the integration of AI into the “inner loop” can lead to more effective simulations (see also Chapter 10, AI Foundations and Open Problems). For example, leveraging AI within a simulation could lead to more efficient modeling by virtue of the development of digital twins during runtime. Second, integration of AI into the analysis approach could lead to faster generation of analytical results, automate the identification of anomalous behavior, and ultimately lead to automatic hypothesis generation. Finally, the integration of AI into the management and control of research labs, facilities, experiments, and workflows (i.e., the “outer loop”) can help achieve a variety of goals. Examples include adapting workflows in response to new hypotheses generated during the workflow, scheduling resources for more efficient use of facility hardware, and dramatically reducing the total cost of operating facilities. These three grand challenges are not orthogonal and would provide the greatest impact when examined together (see also Chapter 16, Facilities Integration and AI Ecosystem).



Figure 11.1 Three opportunities for the integration of AI into software environments have the potential for dramatic impact
on DOE science: (1) Within the “inner loop” of simulations and experiments, (2) to accelerate and enhance traditional
analysis approaches, and (3) in the “outer loop” to assist in the management and control of workflows, laboratories, and
facilities.

Develop software for seamless integration of simulations and AI. DOE is the premier agency for large-scale simulation and modeling of physical phenomena because it has deep institutional knowledge and expertise in numerical methods, solvers, and parallel implementations. There is an opportunity to improve the performance, efficiency, and fidelity of traditional simulations by integrating AI capabilities. Such a system would allow the integration of data from different sources, in different formats, and over different time domains into existing mathematical models, and would adapt in real time to changing model conditions. In addition, AI model-generated data can be validated against in-memory simulation data; by comparing results from in situ analyses on simulation-generated and model-generated data, one can also determine thresholds at which the model-generated data are sufficiently accurate and, therefore, determine when the trained model can replace the simulation kernel. Similarly, AI approaches could be employed to aid in mapping simulation workflows onto upcoming complex and heterogeneous platforms, revising the use of resources over the course of workflow execution through increasingly refined and accurate performance models. These approaches have the potential to significantly impact traditional simulation and modeling by improving the performance of simulations [1].

This would lead to a new hybrid computation model that combines traditional simulation with AI, resulting in a model that runs more efficiently or produces higher fidelity results. For example, a traditional mathematics-based climate model (i.e., a multiscale, multiphysics simulation) could be enhanced by replacing a computationally intensive kernel with stochastic properties with a physics-informed ML approach. Alternatively, an AI system could learn the representations produced by a simulation kernel, and then the kernel could be replaced with a better-performing, lower-complexity generative model. Furthermore, the output of this model could be combined with ML hydrology models, flooding maps, and evacuation routes, allowing new and more accurate predictions (see also Chapter 2, Earth and Environmental Sciences) [2].

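As a hedged sketch of the kernel-replacement idea, the Python code below trains a learned surrogate for a stand-in "expensive" kernel and uses it only when a spot-check against the reference stays within a tolerance; the kernel, tolerance, and model choice are hypothetical and not drawn from this report.

    import numpy as np
    from sklearn.neural_network import MLPRegressor

    def expensive_kernel(x):                 # stand-in for a costly physics kernel
        return np.sin(4 * x[:, 0]) * np.exp(-x[:, 1] ** 2)

    rng = np.random.default_rng(0)
    X_train = rng.uniform(-1, 1, size=(5000, 2))
    surrogate = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000, random_state=0)
    surrogate.fit(X_train, expensive_kernel(X_train))

    def kernel(x, tol=1e-2):
        """Use the surrogate only when an in-situ spot-check says it is accurate enough."""
        y_ref = expensive_kernel(x[:32])                       # small reference sample
        err = np.max(np.abs(surrogate.predict(x[:32]) - y_ref))
        return surrogate.predict(x) if err < tol else expensive_kernel(x)

    X_new = rng.uniform(-1, 1, size=(1000, 2))
    print(kernel(X_new)[:3])

In a real coupling, the spot-check would draw on the in situ validation data discussed above, and the surrogate would be retrained as the simulation explores new regimes.
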
Significant investments need to be made in software and programming environments to realize this vision. Today, the coupling of traditional modeling and simulation codes with AI capabilities is largely a one-off capability, replicated with each experiment. The frameworks, software, and data structures are distinct, and APIs do not exist that would enable even simple coupling of simulation and modeling codes with AI libraries and frameworks. In situ data analysis requiring ML capabilities suffers from the same limitations. Significant software engineering investments are needed to enable reusability and composability that would reduce the integration overhead between simulations, data analysis, and AI, along with the integration of new foundational research advances into AI software. This includes addressing the need for composable data structures and modular elements that enable seamless movement between simulation, data analysis, and AI algorithms, as well as improvements in performance modeling and programmatic control of task placement in workflow systems to enable autonomous mapping of tasks to heterogeneous resources at runtime.

In addition, at present, the parameters of models and the choice of solvers are largely determined by human expertise and are fixed at compile time or runtime [1]. An integrated AI and simulation software environment would enable a model to use one method in a given time step and a different one in the next for a more optimal or faster-converging system. To transition to a mode where simulations can adapt in real time, however, investments are needed in areas such as enabling real-time annotations and descriptions of schemas to allow the real-time adaptation of models and analysis. Computer scientists and software developers cannot do this on their own; it is essential that software capabilities be co-designed in concert with algorithm developers and domain science experts. One sometimes-overlooked facet of this challenge is the corresponding need for enhancements in data storage, access, and management that would facilitate rapid identification of relevant data, transformations between different data representations, and capture of relevant provenance to assist in reproducibility of results (see also Chapter 12, Data Life Cycle and Infrastructure).

Develop software for knowledge extraction and hypothesis generation. The volume of data, and the knowledge that can be derived from data, is expanding exponentially in nearly every area of science. Creating next-generation AI software that identifies gaps in existing knowledge and relevant data can enable the generation of new scientific hypotheses relevant to each scientific question and provide recommendations for knowledge discovery. Such intelligent software and frameworks will be able to investigate various possibilities, parameters, and models in a scalable manner to gain fundamentally new insights in specific science domains. In addition, as interdisciplinary research gains momentum, DOE is well positioned with its breadth of both scientific and foundational interests to serve as the nexus for this work. AI-enabled software can use meta-learning techniques to identify potential overlaps across different science domains and generate hypotheses that can lead to new discoveries. Such AI software can keep track of many disparate but relevant data points within and between different sciences as well as suggest next steps.

For example, experiments have traditionally been designed and conducted by humans, with the help of computational simulations and analyses to identify and constrain the design. This cycle of experimentation benefits from the body of data amassed in previous experiments. AI can provide further insights when trained on experimental data, combined with simulation data and analysis results, culminating in a more precise representation of the phenomena being studied that incorporates physical constraints, domain knowledge, and human expertise. On the basis of a continuously growing collection of validation data from experiment and simulation, the predictive power of AI models will improve over time; by identifying areas where the AI models fall short, one could call for more data and experiments to improve performance in the low-performing context. As these experiments are conducted and the model learns from the resulting data, its predictive power will improve in the weak areas, and it will become better able to generate hypotheses for subsequent experiments (see Chapter 4, High Energy Physics and Chapter 14, AI for Imaging).

One of the major challenges in enabling knowledge discovery and hypothesis generation is the reuse of existing and future data. Both the scale and the potentially disparate modality of scientific data, be it simulation output or experimental observations, are unique when compared to traditional, nonscientific AI training datasets. The amount of data, and the existence of different data types and models, requires AI software that can enable interoperability and knowledge extraction by reusing data from different domains. The use of natural language processing in AI training will be of growing importance to integrate across these disparate data modalities, which may include scientific literature in addition to experimental and simulation data. Another critical challenge is to generate the right hypothesis, as research across all science domains has become incredibly complex and it is extremely hard to connect the relevant data points. Existing ML techniques do not provide substantial understandability of the models and the outcomes. Finally, scalability and performance will be a major challenge in knowledge discovery. The sheer volume of data and associated variables across different science domains requires a scalable framework that can run on HPC systems.

As researchers identify gaps in data and generate new hypotheses, significant investments are needed for developing intelligent, scalable AI software frameworks that can leverage existing data, models, and the associated provenance about the training and analysis methods. Such a framework would provide real-time recommendations for understanding the data gaps and exploring the hypotheses. Hence, next-generation AI software needs a self-improving metadata layer that can continuously learn from the data and models to enable algorithmic discovery from data. Such AI software would use the provenance and metadata to describe the model architecture, parameters, and data. Investments are also needed to identify unique architectures that can not only help determine the right network architecture for a particular science domain, but also cross multiple domains. This will require additional investments in infrastructure for sharing the knowledge and associated data.

Enable self-driving experiments with AI integration and controls. Computing has become increasingly pervasive as a tool at experimental science facilities, from simulating phenomena to controlling systems, analyzing experiment data, and driving hypotheses to explore in subsequent experiments (see Chapter 4, High Energy Physics and Chapter 14, AI for Imaging). AI can be regarded as an embodiment of this process, touching all of these aspects and driving the hypothesis-simulation-analysis cycle as a whole.

There is a wide range of experimental science projects, and their integration with compute and data capabilities is varied; however, the direction a number of experiments are moving toward is more frequent online analysis and adaptation of experiments. For example, scientists at certain light sources use analysis of imaging for decision-making in near real time. These analyses are typically run at the experimental facilities within the constraints of time-to-solution and compute availability. In addition to employing AI in analysis and hypothesis generation as described above, AI could be used to act on these results, adapting to data as they emerge by adjusting the parameters of the experiment toward specific goals, such as protecting resources, maximizing the data gathered related to a specific phenomenon, or following up on surprising or anomalous results. By automating high-level decision-making, experiments could proceed without scientists onsite, and scientists would be better able to focus on high-level goals of the discovery process rather than directly monitoring individual experiments. Ultimately, this AI capability could also identify experiments that cannot be executed with current devices but are likely to uncover promising results, pointing toward valuable new experimental capabilities.

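A minimal sketch of this kind of online steering is shown below: after each measurement, a model is refit and the next instrument setting is chosen where the predicted uncertainty is largest. The "instrument" function, the settings grid, and the Gaussian-process model are all hypothetical stand-ins.

    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor

    rng = np.random.default_rng(0)

    def measure(setting):                        # stand-in for an instrument readout
        return np.sin(5 * setting) + 0.05 * rng.normal()

    candidates = np.linspace(0.0, 1.0, 201).reshape(-1, 1)
    X, y = [[0.0], [1.0]], [measure(0.0), measure(1.0)]      # two seed measurements

    gp = GaussianProcessRegressor()
    for step in range(10):
        gp.fit(np.array(X), np.array(y))
        _, std = gp.predict(candidates, return_std=True)
        nxt = float(candidates[np.argmax(std)])              # most uncertain setting next
        X.append([nxt])
        y.append(measure(nxt))
        print(f"step {step}: measured at setting {nxt:.3f}")

The same loop structure applies whether the goal is reducing uncertainty, maximizing data about a specific phenomenon, or following up on anomalies; only the acquisition rule changes.
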
More generally, complex workflows are an integral part of scientific discovery, and increasingly these workflows are defined programmatically so that they may be executed as an integrated system. Just as in traditional programming, handling the wide variety of possible outcomes from specific tasks is tedious and error-prone, leading to workflows that often terminate in the face of unexpected results. By allowing scientists to describe workflows in terms of high-level goals, building-block tasks (i.e., experiments, simulations), and rough models of the costs of those tasks, an AI system could instead generate a specific workflow, incorporating expert knowledge, to accomplish those tasks, adapting as results are uncovered or new data become available and refining the models of costs (e.g., in time).

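The sketch below illustrates, under stated assumptions, what such a goal-and-constraint-oriented workflow description might look like in Python, with a trivial dependency sort standing in for the AI planner; all task names, cost estimates, and constraints are hypothetical.

    from graphlib import TopologicalSorter

    workflow = {
        "goal": "characterize candidate sample within the beam-time budget",
        "constraints": {"max_beam_hours": 6},
        "tasks": {
            "prepare_sample": {"depends": [],                 "est_hours": 0.5},
            "coarse_scan":    {"depends": ["prepare_sample"], "est_hours": 1.0},
            "simulate_model": {"depends": ["coarse_scan"],    "est_hours": 2.0},
            "fine_scan":      {"depends": ["simulate_model"], "est_hours": 2.0},
        },
    }

    # A placeholder "planner": order tasks by their dependencies and check the budget.
    graph = {name: spec["depends"] for name, spec in workflow["tasks"].items()}
    order = list(TopologicalSorter(graph).static_order())
    total = sum(workflow["tasks"][t]["est_hours"] for t in order)
    assert total <= workflow["constraints"]["max_beam_hours"], "plan violates budget"
    print("planned order:", order, "estimated hours:", total)

An AI planner would replace the fixed task graph and static cost estimates with choices that adapt as results arrive and as cost models are refined.
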
Similarly, AI can be integrated into the management of experimental controls and computational resources. An AI system could be allowed to observe the incoming stream of application workflows, the state of the experiment, computational and storage resources, and the behavior of applications that have been executed, adjusting the allocation of resources toward goals such as maximizing utilization of the platform; enabling rapid execution of priority jobs (e.g., in response to a national or regional emergency); or throttling power utilization in times of high demand (e.g., in response to a regional heat wave). By integrating AI into the management of these systems, the need for immediate responses from facility staff would be reduced, freeing resources to better assist application teams in making the best use of these resources. (See Chapter 15, AI at the Edge for further descriptions of edge computing use cases.)

Numerous software advancements are needed to achieve this vision of AI-driven science. Workflow description capabilities need to be enhanced to allow for expert knowledge, goals, building-block tasks, and constraints (e.g., safeguards) to be described in a manner that can be used to automatically construct workflows. Telescoping language approaches, for example, might be beneficial in this context. Additionally, new tools for programmatically describing relationships are needed to enable AI systems to reason about cause and effect in these systems, or at least to provide a starting point on which improved models can be built. Finally, significant investments are needed in systems that enable programmatic control of instruments.

3. Advances in the Next Decade

The current rapid pace of development in ML methods is a direct consequence of the availability of open-source software frameworks, such as PyTorch and TensorFlow, that tightly integrate algorithmic and programming techniques (e.g., optimization and automatic differentiation) with modern hardware. They lower the barrier to entry and enable rapid iteration on new domain-specific ML architectures.

Extrapolating this development trend, we can expect a software stack that facilitates efficient use of a broader range of algorithmic and mathematical techniques, making it as easy to use methods from geometry, topology, functional analysis, optimal transport, and constraint satisfaction as it is today to use differentiable programming. Simultaneously, given the current trends, data sizes, and the role of hardware in ML, it is reasonable to expect the evolved software stack to take even greater advantage of new hardware accelerators, as well as to target distributed computing architectures (see also Chapter 13, Hardware Architectures and Chapter 16, Facilities Integration and AI Ecosystem). This presents a potential danger of further fragmentation into proprietary silos, as well as of targeting architectures like the industrial “cloud” rather than DOE supercomputers.

There is also a clear recognition in industry of the challenges of data and workflow management associated with the vast volumes of training data and complicated processing pipelines. There are numerous efforts to automate and standardize approaches to these challenges, and they are likely to bear fruit in the coming decade. It is important to note that these industry efforts are driven by data types that are often very different in terms of modality, dimensionality, resolution, and scale from those produced by DOE experimental, observational, and computational facilities.

4. Accelerating Development

Three early activities would help put efforts in software environments on track for success. First, a strong gap-analysis effort that identifies internal requirements and assesses existing tool capabilities is critical to understanding where investments are most needed. Initial work in this direction has already been performed in specific science domains, and reports are being written.

Second, it is important for computer scientists, applied mathematicians, and domain scientists to work together to co-design solutions that integrate AI into these complex scientific endeavors. These types of partnerships have been successfully established for other applications of computer science and applied mathematical techniques, and in many cases these partnerships can be launchpads for new AI-focused efforts. Early experiences in these partnerships will further inform research investments as well.

Finally, looking outward, it is critical to understand how AI and associated technologies being developed outside the DOE are managed and governed to assess whether these tools will be viable for DOE use over the long term. If our strategy involves heavily leveraging AI software technologies from other sectors, then DOE must engage with these communities and establish itself as a contributor to help ensure the relevance and effectiveness of these packages on HPC systems.

5. Expected Outcomes

Advanced and capable software is indispensable for scientific discovery through simulation, modeling, and data analysis. With AI radically transforming numerous fields, it is crucial that investments be made in software that provides new AI capabilities in support of the DOE mission. The integration of AI into traditional simulation and modeling will improve the performance, efficiency, and fidelity of models of complex phenomena, and the ability to integrate models with historical and real-time data will improve the predictive accuracy of systems. Next-generation AI software frameworks will automatically identify gaps in existing knowledge and relevant data and explore new and unexpected scientific hypotheses, leading to potentially ground-breaking discoveries already within reach of DOE experiments. Experiments and workflows with AI augmentation will optimize the use of premier DOE computing and experimental facilities and identify key features of these facilities that lead to even more effective future platforms. Investments in software for AI will result in software artifacts, new or enhanced frameworks, models, and libraries that will broadly benefit the DOE user community.

6. References

1. Baker, N., et al. Workshop Report on Basic Research Needs for Scientific Machine Learning: Core Technologies for Artificial Intelligence. DOE Office of Science Technical Report, 2019.
2. Gil, Y. & Selman, B. A 20-Year Community Roadmap for Artificial Intelligence Research in the US, 2019.

12. Data Life Cycle and Infrastructure
Much recent progress in AI has been fueled by the availability of massive data. For example, dramatic progress in deep neural networks for image understanding owes much to the ImageNet database of more than 14 million annotated and labeled images. Science, too, is about data, and the AI-driven transformation of science will require major changes in data generation, organization, processing, and sharing. This section reviews these changes and the research and development necessary to support this vision.

Consider the following scenario: It is 2030. DOE scientists are working to develop a low-cost, high-performance solid-state battery for use in vehicles. Intuiting that disordered materials hold promise, they task an AI system with identifying candidate formulations. Informed by 400 years of physics knowledge, 100 years of scientific literature, and 40 years of experimental data from DOE labs, universities, and industrial collaborators, the AI system is able to evaluate options faster than any human expert. It suggests new families of disordered materials that may have acceptable stabilities, power densities, and manufacturing costs. However, it also shows high uncertainties in its predictions.

To collect more data, the scientists task the AI system with defining and running a series of experimental and simulation studies in new autonomous laboratories and on postexascale conventional and quantum computing systems. New data integrated into the AI model motivate further experiments. Within weeks, the human expert/AI team has refined understanding to the point where large-scale manufacturing can be considered. Provenance information collected throughout allows for reuse and meta-analysis of discovery processes.

Central to this scenario is the existence of a large, well-curated, and integrated collection of data of many types—from point measurements to massive video—and from many sources, including the scientific literature (e.g., Chapter 1, Chemistry, Materials, and Nanoscience), experiments, simulations, and vehicle fleets, and encompassing both public and proprietary elements. Each item within this data collection is documented with details as to where, when, and how it was generated. Furthermore, the data collection accommodates dynamic additions as new knowledge is created.

Such data collections do not exist today, outside of a few narrow domains. Laboratories are not, in general, set up to preserve data. Many data are recorded in archaic formats and media without annotation. Descriptive metadata are inadequate and inconsistent. Data are rarely findable, accessible, interoperable, or reusable (FAIR) [1], whether by scientists or by AI systems. Data collections are often biased by a tendency not to publish negative results (i.e., the “file drawer problem”). Autonomous laboratories that can generate data at scale and under AI direction exist only in prototype forms.

AI-driven discovery across the broad range of domains important to DOE science will require transformations in both the methods and infrastructure used to acquire, organize, and process data and the policies that govern data access. These advancements must proceed via a process of co-design, with progress in methods informing infrastructure and policy changes and vice versa. The ultimate goal is a system of methods and infrastructure that enables the coordinated creation, application, and update of large quantities of data and knowledge as well as associated models, workflows, computations, and experiments (Figure 12.1).

This chapter makes the case for three priority research directions, or grand challenges, to produce the methodological advances required to create AI-ready data infrastructure.


Figure 12.1 AI-driven science requires simultaneous advancements in the methods, infrastructure, and policies used to acquire large-scale scientific data, integrate data and symbolic knowledge, and structure data infrastructure.

Automate the large-scale creation of FAIR scientific data. Given data’s central role in AI-driven science, new technologies, methods, and best practices are needed to scale the generation, capture, annotation, and organization of data from experiments, observations, and simulations to produce large collections of FAIR data for AI-enabled discovery.

Integrate data and theory to create converged knowledge repositories. Realizing the full potential of scientific AI requires a convergence of data and symbolic representations. To this end, new methods are required to synthesize AI models from data and to integrate symbolic representations of scientific knowledge, to create knowledge collections that are similarly FAIR.

Architect new infrastructure to support ubiquitous scientific AI. As AI methods are deployed ever more widely, new infrastructural concepts and methods are required to ensure that both data and the computation required to ingest, enhance, integrate, and interpret data can be accessed efficiently and reliably—whenever, wherever, and at whatever scale required.

1. State of the Art

Despite much progress in scientific data acquisition and management, the datasets, processing methods, and infrastructure needed to engage fully in the development and application of AI methods for science are still lacking. Few communities have large collections of high-quality, curated, and labeled data suitable for use by AI systems. Even in domains where much data has been generated, silos and the lack of coordination among data collection efforts hinder broader access.

For example, in the field of materials science (Figure 12.2), data collections number in the hundreds and are distributed worldwide. The Materials Data Facility [2] indexes more than 100 data sources and operates automated data ingestion and metadata extraction pipelines to facilitate automated analyses. Nevertheless, most materials data remain unfindable, inaccessible, and noninteroperable and are rarely reused.

As a second example, the velocity at which microbiome data are generated has far outpaced current capabilities for collecting, processing, and distributing these data in an effective, uniform, and reproducible manner, even at the largest data centers. The National Microbiome Data Collaborative (NMDC) was established by the Office of Science in 2019 to build the infrastructure needed to apply consistent ontologies, annotations, and processing to create a FAIR microbiome data resource. The NMDC aims to remove roadblocks in the development of AI methods for microbiome analysis by making large quantities of labeled, curated, interoperable data available to the public. Broad success in these areas depends on overcoming the challenges outlined in this chapter.

The Systems Biology Knowledgebase (KBase) [4], Earth System Grid Federation (ESGF) [5], and Atmospheric Radiation Measurement (ARM) facility [6] are further examples of DOE-supported data infrastructures that assemble large volumes of important scientific data and offer opportunities for the application of AI methods.



Figure 12.2 Timeline and geographic distribution of selected materials data infrastructures and companies [3].

Overall, the infrastructure and methods needed to enable AI methods to access, learn from, and add to a broader body of knowledge are in their infancy.

Annotation with useful metadata is an important prerequisite for widespread use of scientific data. Some communities have well-established procedures for encoding metadata in datasets, such as the climate and forecast (CF) metadata conventions used in the earth and atmospheric sciences. Yet even when such conventions exist, they often fail to capture detailed annotations to support searches for specific characteristics or features within large datasets. Some recent work investigates the use of ML to generalize metadata from a subset of labeled data, for example by automatically classifying electron microscopy images as having been generated by either transmission electron or scanning transmission electron microscopy [7]. Much more work is required to streamline and simplify the process of creating metadata for scientific datasets.

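A minimal sketch of automated metadata capture is shown below: a machine-readable record is generated alongside each data file at acquisition time, with a placeholder where an ML labeler (such as the microscopy classifier cited above) would assign a modality. The schema fields are illustrative assumptions, not an established standard.

    import datetime
    import hashlib
    import json
    import pathlib

    def classify(path: pathlib.Path) -> str:
        # Placeholder for an ML model that labels the modality (e.g., TEM vs. STEM).
        return "unlabeled"

    def make_record(path: pathlib.Path, instrument: str) -> dict:
        data = path.read_bytes()
        return {
            "identifier": hashlib.sha256(data).hexdigest(),          # stable, citable ID
            "filename": path.name,
            "size_bytes": len(data),
            "instrument": instrument,                                # where/how it was made
            "created_utc": datetime.datetime.now(datetime.timezone.utc).isoformat(),
            "modality": classify(path),                              # ML-assigned label
            "license": "CC-BY-4.0",
        }

    path = pathlib.Path("scan_0001.dat")
    path.write_bytes(b"\x00" * 1024)                                 # stand-in data file
    print(json.dumps(make_record(path, instrument="TEM-01"), indent=2))
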
2. Major (Grand) Challenges

Successful realization of AI-driven science at the scales envisioned in this report requires the creation of large collections of FAIR, AI-ready science data, and the development of methods and technologies for manipulating those data. By addressing the following grand challenges, this vision has a stronger probability of being realized.

Automate the large-scale creation of FAIR scientific data. Much scientific data today is still created laboriously through individual experiments and then organized via time-consuming and error-prone manual data acquisition, movement, and annotation steps. Many data are discarded to alleviate transfer and storage costs, and descriptive metadata are often inadequate to enable subsequent reuse. Scientists need new approaches if they are to accumulate the volume, variety, and quality of science data required for AI-driven methods. In particular, steps must be taken to automate major elements of data creation. Automation is discussed here from the perspective of data and workflows (see Chapter 11, Software Environments and Software Research). See also a recent ASCR report [8].

While harnessing existing data flows within scientific laboratories is an important first step toward creating the rich data collections needed for AI-driven science, progress will remain limited if it is dependent on experiments defined and executed manually by human operators. High-throughput experimentation alone is not a sufficient solution, either. In many fields of science, the possible alternatives are far too numerous for exhaustive searches. Instead, autonomous laboratories are required that combine, in varying ways, high-throughput experimentation, large knowledge bases, AI methods, and human guidance to both generate data and answer questions [9,10]. The development of such autonomous laboratories will require advances in many areas, not least data management and analysis, so that AI agents can define new experiments quickly and effectively in light of extant knowledge.

Regardless of how data are generated, the automation of what are currently manual data capture and curation tasks is key to increasing the quantity of data collected and the quality and usability of those data and associated metadata, as well as supporting ontologies. Automation must support and simplify all aspects of the process, from creation to use. AI itself should be harnessed to improve this process, with autonomous data curation capabilities working to capture provenance and context information required for future reuse, and to encode associated uncertainty ranges. These new methods must be adaptable to different applications and disciplines and be able to support multimodal data collection from multiple science domains. They need to be able to organize datasets for both immediate use and subsequent reuse, without requiring scientific campaigns to plan for curation and storage independent of data gathering.

With increased automation, it can be anticipated that multimodality and diversity of data will change from being a barrier to an asset for AI purposes. To align a multitude of datasets that are collected from different sources, in different locations, at different times, and for different purposes, systems must be powerfully interoperable and able to produce data maps to guide subsequent human or AI consumers. Data will have value in unexpected ways for AI agents; when, where, and how a dataset may be used after generation are all unpredictable. Data curation decisions concerning, for example, what data to collect and what to discard, will become vital as larger amounts of science data become available. Data collections must flexibly accommodate notions of value and importance, age, and ownership to support their optimal use for AI. Furthermore, because data collection and curation decisions frequently incorporate ethics and bias concerns, scientists must have systems that expose such considerations early and often to facilitate transparency into data uses.

The broader ecosystem of data, including human and autonomous agents, must consider model creation (by humans and AI agents), deployment of software and algorithms, and human oversight of these processes. Data repositories will hold raw and processed data, software, agents, models, and audit and oversight trails. The manner in which humans retrieve and interact with this ecosystem will be enmeshed with the data, human–AI interfaces, and the management and control plane that cuts across this ecosystem.

Accelerating proliferation of AI methods and applications will require that the data infrastructure adapt to accommodate these transformative technologies. New software pipelines will come together end-to-end, pulling models from diverse sources (Figure 12.3). Advancements in the coming decades will require an increasingly flexible, ever-evolving data and software ecosystem that is changeable, self-tuning and explainable so that human overseers can provide appropriate oversight.

Figure 12.3 Bibliographic analysis shows rapid growth in the number of papers describing AI/ML methods in different physical science disciplines [11].

Integrate data and theory to create converged knowledge repositories. Petabytes or even exabytes of data may be essential to progress in AI-driven science, but scientists cannot realistically manipulate all relevant data whenever they ask a question. Instead, as discussed also in Chapter 10, AI Foundations and Open Problems, methods are needed that can summarize large quantities of data in forms suitable for use in subsequent research, and then manipulate both such summaries and explicit symbolic (e.g., mathematical, but also qualitative and natural language–derived) representations of knowledge.

Consider, for example, how Kepler extracted, from decades of observations of planetary motion, compact quantitative relationships among such quantities as orbital period and axis, which coalesced as his Laws of Planetary Motion. Consider also how Newton’s Laws of Motion allow for direct calculation of many relationships. Similarly, AI-driven science requires the ability to synthesize new models from data, so that massive data can be consumed effectively by scientists and AI methods, and to assimilate symbolic representations of known relationships and physical laws. Technologies, methods, and best practices are needed for synthesizing learned models from scientific data via the use of AI methods, and for working with datasets, learned models, and symbolic knowledge in natural and efficient ways.

The FAIR principles discussed earlier can be repurposed from data to models, yielding the following requirements and research challenges for learned models (a minimal, illustrative model record follows the list).

• Findable: Innovations are needed to enable discovery of models that meet specific research needs, relating, for example, to domain of applicability, uncertainties, and the nature of the source data used to generate them.

• Accessible: New approaches to structuring models are needed to allow them to respond to a wide variety of both human- and machine-driven queries. It is important for scientific reproducibility that model outputs can be related to the data elements used to create the model, except when working with sensitive or proprietary data, when models must encode relationships without revealing the specifics of, for example, a single patient’s medical record (see Chapter 3, Biology and Life Sciences) or a single manufacturer’s drug assay.

• Interoperable: As many models are created, it will become important to be able to combine them—to chain together, for example, models of materials properties and manufacturing processes to explore materials that are both nontoxic and manufacturable. Innovations are needed to create models that can be linked in such ways.

• Reusable: A model that summarizes a certain physical phenomenon needs to be callable in different contexts, including from within simulations and other computations, and be deliverable to different locations (e.g., supercomputers, edge devices) for different purposes.

tions of known relationships and physical laws.
Technologies, methods, and best practices are Architect new infrastructure to support
needed for synthesizing learned models from ubiquitous scientific AI. The infrastructure
scientific data via the use of AI methods, and required to help accumulate and support FAIR
for working with datasets, learned models, principles must be ubiquitously available to
and symbolic knowledge in natural and support science campaigns from the start, help
efficient ways. accelerate discoveries through domain science

12. DATA LIFE CYCLE AND INFRASTRUCTURE 121


advances, and promote the use of AI on the dissemination and exchange of scientific data.
data. Today’s examples of autonomous Such tools and technologies would be central
vehicles, Internet and media data, and to the needs of scientific reproducibility,
personal health data demonstrate the needs as providing the capability to validate and trust
well as the constraints on data acquisition, experiments, improve science campaigns in
movement, staging, storage, and access. the future, and dramatically enable the reuse of
Science domains grapple with different types of AI techniques for science. The goal is to
data and impose greater constraints than are develop systems that can collect data for AI,
typically encountered in other settings. For enhanced by AI—and make those data
example, scientific data can be several orders accessible anywhere in reusable forms. As the
of magnitude larger than enterprise data. In preparation, organization, and use of data for
addition, data movement needs stretch the AI becomes streamlined and better
limits of current connectivity, despite powerful understood, the value of the appropriate state
tools [12]. of data (raw or reduced) will drive when and
how data are retained within its life cycle.
The traditional infrastructure that supports large
volumes of data must evolve to support AI- 3. Advances in Next Decade
friendly access. Data volumes and retrieval
rates need to scale significantly. AI will require Opportunities and challenges in AI for science
data to train predictive models, observations to are expected to evolve rapidly over the next
infer steps in the process, and control data to decade due to three major factors:
modify and optimize the feedback loop of
theory to experiment and back to Dramatic increases in the volume of
improvements in theory (and our understanding available data. The amount of data available is
of the world). expected to result from improvements in
scientific instrumentation, sensors, and
Given such a need for the acceleration of AI computation. For example, the upgraded
with data, data access pathways must be Advanced Photon Source (APS) at Argonne
scalable in both breadth (distributed) and depth National Laboratory will produce, from 2023
(low latency and high-bandwidth). New search onwards, up to three orders of magnitude more
and retrieval techniques that extend beyond data than in 2019.
the capabilities of our current centralized
approaches must be developed to support the Emergence of autonomous laboratories.
ubiquitous reach of AI techniques. Under- Laboratories capable of collecting data about
pinning these new data flows will need to be scientific phenomena without human
greatly enhanced computational capabilities, intervention will drive developments that have
not only centralized but also co-located with the potential to transform scientific AI by
data producers and consumers, and configured generating data with greatly increased speed
to support specialized AI workflows. DOE user and consistency, but they will also introduce
facilities would serve as ideal test beds for new challenges relating to access and potential
prototype deployment, user testing, and algorithmic bias in terms of data collected.
algorithm development.
Rapid improvements in AI software and
The underlying data, both feeding the AI and methods. Many industries are working to
growing as a result of it, must have a address massive data and learning problems,
provenance and use trail. Data collections and such as for autonomous vehicle fleets with tens
the state and history of associated science of millions of vehicles and remote sensing in
campaigns should be easily shareable. thousands of small satellites. Work in these
Capabilities must explicitly support the and other areas will produce continued

12. DATA LIFE CYCLE AND INFRASTRUCTURE 122


Rapid improvements in AI software and methods. Many industries are working to address massive data and learning problems, such as those posed by autonomous vehicle fleets with tens of millions of vehicles and by remote sensing from thousands of small satellites. Work in these and other areas will produce continued improvements in how data are acquired, organized, and processed, and in the overall data life cycle itself. Developments relating to the integration of symbolic knowledge and data, and progress toward artificial general intelligence, will produce new datasets, conceptions of how to use data in AI, and methods for manipulating data that may have relevance to scientific AI. DOE science must engage effectively with these developments.

4. Accelerating Development

An early priority for a scientific AI initiative must be to ensure sustained access to the large quantities of high-quality data needed to advance AI-driven science. This means prioritizing work to harness important data flows; to establish the machinery needed to collect, organize, and refine the resulting data; and to gain experience with the use of those data for AI purposes.

Harnessing important data flows means developing and deploying machinery to collect AI-critical data generated by scientific instruments, including the descriptive metadata required for those data to be useful within AI applications. An envisioned program will implement such comprehensive data collection for a dozen different data sources within a year, with this experience guiding expansion to progressively more sources in subsequent years.

Simultaneously, efforts should be launched to develop the technologies and infrastructure required to ingest, organize, annotate, curate, index, and otherwise prepare these data for use in AI applications. Collaborations between Office of Science user facilities and data projects (e.g., NMDC), on the one hand, and ASCR researchers, on the other, can help to accelerate method development. Efforts should also be started to develop AI applications based on the increasingly large quantities of data that will be collected as automated data collection machinery is deployed. Establishing governance of dataset quality, incorporating best practices from diverse communities, and providing intuitive policy and efficient mechanisms for data access and ownership will be vital for data to be available to AI.

A final area of effort should focus on expanding the national workforce and growing the expertise of scientists and other professionals who will prepare, manage, control, deploy, and monitor the data backplane integral to AI. Science campaigns that rely on domain scientists must be structured to include data scientists early in the process. These data scientists and domain experts must work together in an interdisciplinary manner, with explicit cross-training to ensure that the data life cycle contributes to accelerating science goals.

5. Expected Outcomes

This chapter makes the case for the automated creation and use of rich, curated collections of AI-ready scientific data; the integration of large data collections with symbolic representations of scientific knowledge; and new, ubiquitous infrastructure to enable effective use of those data within AI-driven workflows. These developments will transform scientific discovery by enabling better science, faster, and at lower costs; drive virtuous cycles whereby better data produces better AI, and better AI produces better data; and contribute to expanded scientific leadership for DOE and the nation.

6. References

1. Wilkinson, M. D. et al. The FAIR Guiding Principles for Scientific Data Management and Stewardship (Sci. Dat. 3, 2016).
2. Blaiszik, B. et al. A Data Ecosystem to Support Machine Learning in Materials Science (MRS Commun., 2019).
3. Himanen, L., Geurts, A., Foster, A. S., Rinke, P. Data-Driven Materials Science: Status, Challenges, and Perspectives (Adv. Sci., 2019).

4. Arkin, A. P., et al. KBase: The United States Department of Energy Systems Biology Knowledgebase (Nat. Biotechnol. 36, 7, 2018).

5. Williams, D. N., et al. The Earth System Grid: Enabling Access to Multimodel Climate Simulation Data (Bull. Am. Meteorol. Soc. 90, 2, 195–206, 2009).

6. Stokes, G. M., & Schwartz, S. The Atmospheric Radiation Measurement Program (Bull. Am. Meteorol. Soc. 75, 7, 1201–1222, 1994).

7. Weber, G. H., Ophus, C., & Ramakrishnan, L. Automated Labeling of Electron Microscopy Images Using Deep Learning (Proc. IEEE/ACM Mach. Learn. in HPC Environ., 26–36, 2018).

8. Biven, L. Office of Science Data for AI Roundtable: Presentation to ASCAC (http://bit.ly/2QWyTbr, 2019).

9. Aspuru-Guzik, A., & Persson, K. Materials Acceleration Platform: Accelerating Advanced Energy Materials Discovery by Integrating High-Throughput Methods and Artificial Intelligence (http://nrs.harvard.edu/urn-3:HUL.InstRepos:35164974, 2018).

10. Carbonell, P., Radivojevic, T., & García Martín, H. Opportunities at the Intersection of Synthetic Biology, Machine Learning, and Automation (ACS Synth. Biol. 8, 1474–1477, 2019).

11. Blaiszik, B. Charting ML Publications in Science (https://github.com/blaiszik/ml_publication_charts, 2019).

12. Chard, K., et al. The Modern Research Data Portal: A Design Pattern for Networked, Data-Intensive Science (PeerJ Comput. Sci. 4, e144, 2018).
13. Hardware Architectures

AI is a powerful force driving the design of computer architectures [1]. Impacts include both an explosion of new start-ups and hardware designs, and rapid evolutionary change in all platforms: CPUs, GPUs, and even mobile phones. The recent 2019 AI Hardware Summit in Mountain View, California, included nearly 40 companies; enterprises, semiconductor manufacturers, and AI hardware start-ups from around the globe presented their AI architectures and systems.

Although these studies and investments are impressive, most of these activities focus on consumer or enterprise areas such as autonomous driving, social networks, finance, and virtual reality [2]. The key problems for these areas are image and video analysis, language translation, and autonomous driving. All of these have data characteristics, real-time requirements, and deployable resource targets that are vastly different from the DOE mission. Specifically, these commercial AI areas have massive numbers of small, labeled data items (e.g., pictures) from which to generate their models. DOE mission areas include areas of computational science with HPC and experimental data, where the dataset can be drastically different: hundreds of simulations or experiments with dozens of dimensions, rather than millions of photos.

For this reason, it is recommended that DOE create a focused strategy to shape AI hardware to serve its science mission. Key to success is a strategy that leverages community and industry investments in technology and in scalable (see Chapter 16, Facilities Integration and AI Ecosystem), intermediate, and edge systems (e.g., field instruments; see Chapter 15, AI at the Edge) for AI.

1. State of the Art

DOE user facilities will continue to see increasing data volumes and rates from large experimental facilities such as light sources, nanoscience centers, and advanced computing facilities. As detailed by science domain teams (elsewhere in this document), effective collection and analysis of these data will be enhanced by adopting AI techniques, which will often be deployed using specialized AI accelerators to increase performance and energy efficiency (Figure 13.1). Applying AI techniques to process these data streams requires data management capabilities that can reach from the instrument at the edge to the data center. Without carefully integrated, orchestrated, and managed data infrastructure, these AI systems will not be productive for science. Moreover, these scenarios will introduce additional complexities of heterogeneous hardware (e.g., x86 multicore, GPUs, and specialized hardware like TPUs) [4] and associated programming systems (e.g., MPI and TensorFlow).

Motivated by early results, algorithms and computer architectures for AI are quickly evolving and growing more diverse. These architectures include specialized devices at different scales for five use cases [3]: (1) AI research and development (requiring maximum flexibility for experimentation); (2) offline training of AI models in production; (3) inference on servers; (4) inference at the edge; and (5) online learning on servers and the edge. As other successful AI technologies emerge, such as graph-based ML, the computational challenges and deployment techniques will evolve naturally.
Figure 13.1 Examples of AI accelerators from (a) Groq, (b) SambaNova, (c) Habana, and (d) Cerebras Systems. (Permission to use each of the pictures was granted by each of the respective companies. The Cerebras Wafer Scale Engine is 46,222 mm²; by comparison, the largest GPU is 815 mm².)

At one extreme, systems with thousands of specialized architectures (e.g., NVIDIA Volta and AMD MI60 GPUs, FPGAs from Intel and Xilinx, Google TPUs [4], SambaNova, Groq, Cerebras) are required to train AI models from immense datasets. For example, Google's TPU pod has 2048 TPUs and 32 terabytes of memory and is used for AI model training; its specialized tensor processors provide 100,000 tera-ops for AI training and inference. In addition, they are coupled directly to Google's cloud, a massive data infrastructure (>100 petabytes). The progress of the Google TPU in its use for the AlphaGo series of matches demonstrates that codesign (the refinement of hardware, software, and datasets for solving a specific goal) provides major benefits to performance, power, and quality [8].

At the other end of the spectrum, edge devices must often be capable of low-latency inference at very low power. Industry has invested heavily in a variety of edge computing devices for AI, including tensor calculation accelerators (e.g., ARM Pelion, NVIDIA T4, Google's Edge TPU, and Intel's Movidius) and neuromorphic devices (e.g., IBM's TrueNorth and Intel's Loihi). Experts expect dramatic improvements in the compute capability and energy efficiency of these devices over the next decade as they are further refined. For example, NVIDIA recently released its Jetson AGX Xavier platform, which operates at less than 30 W and is meant for deploying advanced AI and computer vision algorithms at the edge using many specialized devices such as hardware accelerators (i.e., DLAs) for fixed-function convolutional neural network (CNN) inference. Another example is Tesla's FSD Chip, which can deliver 72 tera-ops (72 × 10^12 operations per second) at 72 watts and support capabilities that can respond in 10 milliseconds (driving speed response) with high reliability.

In contrast, DOE's applications can require responses 100,000x faster: 100 nanoseconds for real-time experiment optimization in electron microscopy or APS experiments where the samples degrade rapidly under high-energy illumination (see Chapter 14, AI for Imaging).

In terms of software, many consumer applications of AI currently use software frameworks such as TensorFlow, PyTorch, MXNet, Torch, or Caffe2 that hide much of the complexity of the underlying hardware. As mentioned earlier, these frameworks have been developed for video, image, and speech recognition as well as for language translation and natural language processing, but they remain in their infancy for processing scientific data. Furthermore, software integration of this AI ecosystem (e.g., PyTorch) with the HPC ecosystem (e.g., MPI and OpenACC) will be nontrivial; significant challenges remain in coupling and potentially unifying these software ecosystems for productivity and efficiency (see Chapter 11, Software Environments and Software Research).
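To make the coupling challenge concrete, the following is a minimal sketch, under the assumption that mpi4py and PyTorch are both available on the same system, of one common way the two ecosystems are bridged today: an MPI allreduce averages gradients across ranks inside an otherwise ordinary PyTorch training loop. It is illustrative only; it is not a statement of how any particular DOE facility integrates these stacks, and the model, data, and hyperparameters are placeholders.

# Minimal sketch: data-parallel training that couples MPI (via mpi4py) with PyTorch
# by averaging gradients across ranks after each backward pass.
# Run with, e.g., `mpiexec -n 4 python train_sketch.py` (assumes mpi4py and torch installed).
import numpy as np
import torch
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

torch.manual_seed(0)  # identical initial weights on every rank
model = torch.nn.Sequential(torch.nn.Linear(16, 64), torch.nn.ReLU(), torch.nn.Linear(64, 1))
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
loss_fn = torch.nn.MSELoss()
gen = torch.Generator().manual_seed(1234 + rank)  # different synthetic data shard per rank

def average_gradients(model):
    """Allreduce each parameter's gradient and divide by the number of ranks."""
    for p in model.parameters():
        if p.grad is None:
            continue
        g = p.grad.detach().cpu().numpy()
        avg = np.empty_like(g)
        comm.Allreduce(g, avg, op=MPI.SUM)
        p.grad.copy_(torch.from_numpy(avg / size))

for step in range(100):
    x = torch.randn(32, 16, generator=gen)   # stand-in for a local shard of simulation data
    y = x.sum(dim=1, keepdim=True)           # synthetic target
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    average_gradients(model)                 # HPC-style collective inside the AI training loop
    optimizer.step()
    if rank == 0 and step % 20 == 0:
        print(f"step {step:3d}  loss {loss.item():.4f}")

The same pattern underlies production tools that wrap MPI or similar collectives behind AI-framework APIs; the sketch simply exposes where the two software stacks meet.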
2. Major (Grand) Challenges

Given this spectrum of architectures and their fast pace of change, DOE will need to be actively engaged with the communities of applications, data management, software, and broader architectures to have timely impact. More specifically, it is recommended that DOE plan to co-design architectures and develop software for a range of heterogeneous systems that span from the edge to the HPC data center. DOE should advocate solutions for priority requirements for AI in science. If appropriate solutions do not emerge from industry, DOE should pursue them internally, leveraging the new communities in open source hardware and through partnerships with other government agencies.

Along these lines, the workshop participants identified several challenges.

Create predictive architecture design tools to enable rapid evolution of AI accelerators for science. Both AI and the use of AI to augment traditional DOE scientific computing are in their infancy. As such, AI workloads are rapidly changing in nearly every dimension: network structure, model paradigm, numerical precision, training approach, training dataset sizes, data types, and batch size (i.e., working set). This transformation is further intensified by the rapid expansion of application scenarios and AI software systems. PyTorch is now the most popular framework used by researchers at the NIPS conference, but it did not exist 5 years ago!

Not only are the AI software artifacts themselves changing, but the process of developing AI software is different from the methods used for traditional software. Most noticeably, AI software development focuses much more on the curation, labeling, and preprocessing of training datasets than on writing code. Deployment may also be different; AI will use new end-to-end workflows and specialized hardware as they become available. In addition, as AI models are integrated into existing systems, we will need interfaces for embedding those AI components in traditional scientific applications seamlessly while hiding specialized hardware intelligently.

In this regard, holistic design of AI technologies from the server to the edge will be paramount. DOE will need predictive methods and tools to frequently characterize, model, and simulate the AI workflows and algorithms in order to co-design and procure the appropriate architectures for these mixed HPC simulation and AI workloads. Additionally, these combined HPC and AI systems should be instrumented to provide rich telemetry data that will be critical for this purpose. Hardware should be designed with this requirement in mind.

In fact, there are significant opportunities to co-design heterogeneous compute nodes that leverage the commodity system-on-chip (SoC) ecosystem. Likewise, another key aspect is the development of the memory subsystem to support this heterogeneous compute node and the co-design of the required memory interface controller. Once this heterogeneous SoC processor is designed, system interconnection network fabrics that build on the momentum from prior DOE investments will be needed to create a postexascale, leadership-class, large-scale heterogeneous system architecture.

Create integrated AI workflows and use them to evaluate emerging AI architectures from edge SoCs to HPC data centers. Given the broad spectrum of efficient specialized AI architectures expected, DOE scientists will need to run workflows using a spectrum of accelerators (including AI). This "extreme heterogeneity" [5] is challenging, requiring scientists to cobble together disparate programming systems (e.g., MPI, CUDA, OpenMP), storage systems, and data formats to run simulations on HPC architectures. With the emergence of AI as a primary technique, contemporary AI frameworks (e.g., TensorFlow, PyTorch) will also need to be integrated. Unified frameworks are needed. This challenge will only get more complex in the coming decade as architectures and relevant programming models become specialized. These integrated workflows are a realistic context in which to evaluate the
real impact of AI architectures. The fact that specialization is successful in the AI market is an indication that hardware specialization as a general strategy is logical and could be employed for other high-value scientific applications.

It is recommended that SoC hardware ecosystems be leveraged by the DOE to co-design flexible, heterogeneous computing systems that better integrate AI elements with both scientific hardware for HPC and edge computing for DOE experimental facilities. DOE science domain teams can complement natural trajectories of vendor product plans that fail to meet DOE mission needs. These teams can co-design node and system architecture concepts that specifically address combined HPC and AI workloads to meet DOE mission needs. In addition, DOE discoveries in materials research may directly lend themselves to advanced AI computation (e.g., [9]), and they should be pursued directly.

Furthermore, future AI architectures may provide new opportunities for algorithms in computational science applications. For example, AI support for sparse neural networks can be repurposed for high-performance sparse matrix computation in a conjugate gradient solver. DOE will need to actively explore these opportunities as the new hardware emerges.
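To illustrate the repurposing opportunity just noted, the sketch below implements a textbook conjugate gradient solver whose inner loop is dominated by a single sparse matrix-vector product, the kernel that hardware support for sparse neural networks could in principle accelerate. It runs on the CPU with SciPy and a synthetic 1D Laplacian; it makes no claims about any specific accelerator.

# Minimal conjugate gradient (CG) sketch built around a sparse matrix-vector
# product (SpMV), the kernel that sparse-AI hardware support could accelerate.
# Uses SciPy on the CPU purely for illustration.
import numpy as np
import scipy.sparse as sp

def conjugate_gradient(A, b, tol=1e-8, max_iter=1000):
    """Solve A x = b for a symmetric positive definite sparse matrix A."""
    x = np.zeros_like(b)
    r = b - A @ x          # residual
    p = r.copy()           # search direction
    rs_old = r @ r
    for _ in range(max_iter):
        Ap = A @ p         # the SpMV: dominant cost, the piece that maps onto sparse accelerators
        alpha = rs_old / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs_old) * p
        rs_old = rs_new
    return x

# Example: a 1D Laplacian (tridiagonal, SPD) as a stand-in for a PDE operator.
n = 1000
A = sp.diags([-1.0, 2.0, -1.0], offsets=[-1, 0, 1], shape=(n, n), format="csr")
b = np.ones(n)
x = conjugate_gradient(A, b)
print("residual norm:", np.linalg.norm(b - A @ x))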
Meet the rapidly growing demand for memory, storage, and I/O capabilities required by AI-enabled science. Current HPC memory and storage systems are architected for traditional HPC simulation-only workloads with relatively small inputs and large outputs, where the access patterns are predictable, contiguous, block-based operations.

AI training workloads, in contrast, must read large datasets (i.e., petabytes) repeatedly and perhaps noncontiguously for training. AI models will need to be stored and dispatched to inference engines, which may appear as small, frequent, random operations.

On the server side, storage systems, such as those that support Lustre and burst buffers, are not designed for and often perform poorly on these read-heavy, random-access workloads. New designs need to include intelligent workflow management systems that can stage data appropriately using additional levels of storage that can deliver high IOPS. Likewise, node-local memory hierarchies are relatively small when compared to the scientific datasets that will be necessary for training.

At the edge, energy efficiency and performance of the memory system will be critical. Edge devices will need to perform inference concurrently with other tasks; memory capacities will need to increase to support these tasks. This realization has led to the pursuit of alternative memory technologies, including NAND flash and 3D XPoint (e.g., Intel Optane), because they offer superior energy efficiency and density. The precise architecture and software system for using these new memories in AI systems remains an open question, given the change in AI architectures and applications.

Enable the incorporation of explicit science domain knowledge into AI systems and hardware to improve robustness and capabilities. AI training typically requires huge quantities of input data, and system behavior may be fragile if it is subjected to stimuli outside of the original training coverage. Many industry applications have massive datasets (e.g., composed of millions of hours of 4K video or billions of photos) that can be used for training, whereas simulation output and scientific data from experiments are much more expensive to produce and often have many more dimensions, rendering them untrainable due to the "sparsity" of the training data. Current AI systems may become fragile when encountering novel situations that lie outside of their initial training dataset, and the AI systems cannot guarantee that answers satisfy any explicit constraints (e.g., in physics, AI-inferred results must adhere to the law of conservation of energy). For applications that have consequences for human life, such as autonomous vehicles, adding these "instincts" about physics and causality is an urgent priority for industry AI-hardware investments that will enable systems to manage novel situations and to be trained with less data (e.g., overcoming the challenge of "sparsity") [6].

Industry and academia are in the early stages of developing approaches that instill such "instinct" or "physics knowledge" for future AI-hardware offerings, which will also require deep changes in the underlying architectures. But these efforts are far from sufficient to meet DOE mission needs for real-time edge/sensor application performance. DOE has an opportunity to partner with industry early to drive the generalization and increased capability (low latency) of solutions to suit DOE science applications.
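One software-level way to instill such "physics knowledge," pending the deeper architectural changes discussed above, is to penalize constraint violations directly in the training loss. The sketch below, using PyTorch and synthetic data, adds a conservation-style penalty (the predicted components must preserve the input total) to an ordinary regression loss; the network, data, and penalty weight are illustrative assumptions, not a hardware mechanism.

# Minimal sketch of a physics-constrained loss: a network predicting the state of a
# closed system is penalized when its outputs violate a conservation law
# (here, that the predicted quantities sum to the same total as the inputs).
# Purely illustrative; hardware-level support for such constraints remains an open question.
import torch

torch.manual_seed(0)
model = torch.nn.Sequential(torch.nn.Linear(8, 32), torch.nn.Tanh(), torch.nn.Linear(32, 8))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
lam = 10.0  # weight on the constraint penalty (illustrative choice)

def total(u):
    return u.sum(dim=1)  # the conserved quantity: the sum over components

for step in range(500):
    x = torch.rand(64, 8)                       # synthetic "initial state"
    perm = torch.randperm(8)
    y = x[:, perm]                              # synthetic "final state" with the same total
    pred = model(x)
    data_loss = torch.mean((pred - y) ** 2)
    physics_loss = torch.mean((total(pred) - total(x)) ** 2)  # conservation violation
    loss = data_loss + lam * physics_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

print(f"data loss {data_loss.item():.4f}, conservation violation {physics_loss.item():.6f}")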
3. Advances in the Next Decade

Industrial investment by large-scale cloud companies as well as AI hardware start-ups will continue to drive performance and energy efficiency at scale and at the edge for commercial applications such as image/face recognition, natural language, logistics, voice assistants, and autonomous vehicles. These commercial drivers will infuse AI capabilities broadly, in the scale of data, complexity of function, and robustness that can be achieved. Within the next 10 years, we expect to see the following:

• Introduction of novel AI algorithms, as they are changing quickly and it is difficult to predict popular algorithms for the next decade. Five years ago, LSTMs were new, ResNets were not in use, and transformer networks had not yet been invented.

• Steady increase in the size of the largest trainable AI models, as well as improvements in training algorithms that reduce the order of growth in training cost per weight. If the largest model training costs continue their current growth rate of 10x/year, economic and environmental consequences will ultimately be the practical limits.

• Steady reduction in, and eventual plateau of, inference latency and cost at commercially important thresholds (i.e., ~5 milliseconds for human and automobile response times).

• Integration of AI acceleration hardware into all mobile, IoT, and server devices.

These advances will be enhanced by the numerous electronics technology initiatives underway, such as IEEE Rebooting Computing, the DARPA Electronics Resurgence Initiative, and SRC activities such as JUMP.

4. Accelerating Development

The AI hardware industry is growing by leaps and bounds. It is led by the hyperscale cloud providers (e.g., Google and Microsoft), but there is opportunity to shape the emerging hardware to broader utility for science. The key is to identify leverage points where DOE science applications will benefit and where industry can benefit from features with the generality to address broader markets. Understanding and tracking DOE's growing AI workloads will enable DOE to provide incisive and actionable input to shape future AI architectures. Identification and use of leverage points will drive the creation of new architectures and systems (both software and hardware) that build on broader industry developments to meet DOE's unique needs for sparse learning and the support of scientific discovery. The areas of highest leverage are as follows.

Create new co-design capabilities in DOE to inform strategic action on integrated AI and HPC systems. Computing hardware architectures are evolving in a disruptive fashion, with important innovation coming from small start-ups, large vendors, and cloud service providers. This business ecosystem transformation means that DOE cannot engage in the traditional fashion of long-term projects with a few large, known players, as in existing PathForward programs. Rather, DOE must invest in and create a much larger internal architectural research and development capability. These capabilities must then be used to continually assess the landscape, identify important new breakthroughs and partnerships, and accelerate them rapidly into new large-scale system capabilities and DOE facilities to support rapid, leading-edge exploitation of AI across DOE. In some cases, DOE can exploit new hardware to its advantage (e.g., using mixed-precision algorithms with low-precision hardware), while in other cases DOE can work with industry to provide specific new capabilities.

Support AI for HPC and scientific experiments on the edge. Contemporary HPC architectures are designed to support a traditional simulation-only paradigm, where the amount of input data is relatively small when compared to the output, and where the output is not read frequently. Storage systems and memory hierarchies must be redesigned to accommodate this workload change. Moreover, with the addition of AI at the edge to the DOE portfolio, the model for computing within DOE may need to evolve to one where specialized AI hardware cooperates with traditional HPC systems to train models before distributing them to low-power inference engines at the edge (see Chapter 15, AI at the Edge).

Drive development of AI systems and hardware that combine explicit knowledge with learned function. A distinctive requirement for AI in DOE's science mission is the need to fuse explicit knowledge with learned function, which is often the goal of ML. While useful in some commercial applications, the purest and strongest form of this fused capability is essential for scientific exploration and, more importantly, for the creation of scientifically sound modeling and exploration computations that are the likely foundation for future computational science. DOE should establish a series of specific science-based challenges to drive and shape new AI technologies that fuse explicit knowledge and learned function (see Chapter 10, AI Foundations and Open Problems).

Lead on ultra-low-latency and low-power inference for scientific experiment control in experimental facilities. DOE facilities and experiments are multimillion- (and sometimes billion-) dollar investments that literally push forward the frontiers of knowledge in materials, physics, biology, and other areas of fundamental science. AI-based real-time intelligent control of these facilities can not only enable more complex, intelligent experiments and more efficient operation, but also directly accelerate the advance of scientific discovery. DOE should charter cross-disciplinary centers and focus on low-latency inference challenges for scientific experiment control, a critical capability within national laboratories.

Translate DOE fundamental materials-device discoveries into new post-CMOS AI devices. There is an opportunity to bring more of the fundamental materials science advances from DOE to augment industry roadmaps through fundamentally new approaches to neuro-inspired AI architecture. This can lead to a new path for exponential growth in AI computational performance by overcoming the overheads of conventional digital hardware and software design, the predominant approach for today's AI systems, and by addressing the societal challenge of the rapidly growing negative environmental impacts of DNN-based AI. This has a strong connection with the Basic Research Needs for Microelectronics [10] activity and collaboration opportunities that span the entire Office of Science (see Chapter 1, Chemistry, Materials, and Nanoscience).

Create and adopt new operational and life cycle models in large-scale DOE computing facilities that support sustainable AI computing. At 10x annual model size increases, the training of large AI models already matches the lifetime carbon emissions
of five gas-powered automobiles [7]. The rise of renewable energy generation creates an opportunity to sustain AI and HPC's growing computing and energy appetite. The DOE can lead in convening its own, cloud, and academic centers with technology leaders to study and prove new operational models. These models can enable high levels of renewable energy in power grids while sustaining high-capability computing. Further benefits can arise from new life cycle models that shift compute resources from high-cost to low-cost (and low-carbon power) locations, and that increase the lifetime of hardware, reducing cost and e-waste. To sustain the exponential growth in computing capability at the heart of its scientific missions, the DOE should lead the community in creating and deploying such practices to contain or even reduce its environmental impact from AI and HPC computing (see Chapter 16, Facilities Integration and AI Ecosystem).

5. Expected Outcomes

Industry will continue its dramatic pace of advancement over the next decade, but those advances are focused on goals that will not lead to meeting the requirements of DOE computational science and experimental data applications. In particular, the AI use cases for scientific applications will differ significantly, requiring extreme data rates, low-latency response, and extensive exploitation of explicit knowledge. In addition, the rapid growth of AI training costs will create sustainability challenges for the growing AI computing burdens, forcing new approaches. Scientific applications and experiments are likely to have fewer samples available and require more data integration for training. By working together with industry to augment their hardware platform offerings, we will be able to meet these critical needs for the future of AI for HPC, for automating control systems at DOE user facilities, and for creating intelligent sensors for the future of experimental science.

6. References

1. Chien, A. Computer Architecture: Disruption from Above, Commun. ACM 61, 9, 2018.

2. Wu, C., et al. Machine Learning at Facebook: Understanding Inference at the Edge, Proceedings of the 2019 IEEE International Symposium on High Performance Computer Architecture (HPCA), 331–44 (https://doi.org/10.1109/HPCA.2019.00048, 2019).

3. LeCun, Y. Deep Learning Hardware: Past, Present, and Future, Proceedings of the 2019 IEEE International Solid-State Circuits Conference (ISSCC), 12–19 (https://doi.org/10.1109/ISSCC.2019.8662396, 2019).

4. Jouppi, N. P., et al. In-Datacenter Performance Analysis of a Tensor Processing Unit, SIGARCH Comput. Archit. News 45, 2, 1–12 (https://doi.org/10.1145/3140659.3080246, 2017).

5. Vetter, J. S., et al. Extreme Heterogeneity 2018 – Productive Computational Science in the Era of Extreme Heterogeneity: Report for DOE ASCR Workshop on Extreme Heterogeneity, USDOE Office of Science (https://www.osti.gov/servlets/purl/1473756, https://doi.org/10.2172/1473756, 2018).

6. AAAS Science Magazine, How Researchers are Teaching AI to Learn Like a Child, May 24, 2018.

7. Strubell, E., Ganesh, A., and McCallum, A. Energy and Policy Considerations for Deep Learning in NLP (https://arxiv.org/abs/1906.02243).

8. Silver, D., et al. Mastering the game of Go without human knowledge, Nature 550, 354 (https://doi.org/10.1038/nature24270, 2017).
9. Torrejon, J., et al. Neuromorphic computing with nanoscale spintronic oscillators, Nature 547, 428 (https://doi.org/10.1038/nature23011, 2017).

10. Basic Research Needs for Microelectronics (Brochure), USDOE Office of Science (https://www.osti.gov/servlets/purl/1545772, 2018).
14. AI for Imaging

The DOE-supported x-ray, neutron, electron beam, and nanoscale science research centers are major experimental facilities providing access to world-class imaging and physical characterization capabilities to more than 14,000 visiting scientists and engineers annually. The advanced tools and instruments at these facilities probe complex materials and processes across the physical, materials, environmental, and life sciences, including those underpinning energy technologies and advanced manufacturing. Research done at these user facilities provides unique insights that help shape the future and ensure the economic competitiveness of the United States. Planned developments at these facilities over the next decade promise to produce vastly larger and more complex datasets much more quickly than today, making the automation of facility and instrument operations and of data collection and reduction imperative.

AI will be essential for ensuring continued technological progress and maintaining America's leadership position in all branches of science. The application of AI at DOE user facilities will ultimately allow end-to-end control of the scientific endeavor at scale, improving stability in experimental equipment and processes and yielding superior results. Using AI technologies to augment and expand existing data analysis techniques will allow scientists to process data more efficiently and effectively than ever before. AI technologies could one day make fully autonomous decisions on measurement strategies, reducing experimentalists' time while simultaneously enabling efficient exploration of complex experimental and sample configurations.

1. State of the Art

Modern research laboratories present challenges for control systems, which must be capable of meeting evolving requirements. For example, consider the particle accelerators driving large-scale research facilities, which consist of many interconnected subsystems of magnets; mechanical, vacuum, and cooling equipment; power supplies; and other components. These accelerators have many thousands of control points, making their operation a complex optimization problem. This is particularly true for the electron accelerators at DOE's synchrotron light sources, which require a very high level of stability. The operation of these accelerators has benefited from AI/ML-based solutions but remains extremely difficult due to the lack of a priori models for reliable and safe control. In the absence of such models, learning models based on raw data and other AI/ML-based solutions have been explored, with promising results, beginning with the demonstration of artificial neural network-assisted control of ion sources in the early 1990s. AI/ML optimization methods such as genetic algorithms and particle swarm optimization have been successfully applied for several years to improve various aspects of facility operation, including electron beam lifetime, transverse coupling, and injection efficiency. Current efforts are focused on simulation of the data generated by accelerator physics models to optimize the performance of next-generation machine systems under development [1]. However, none of these advancements have become an integral part of today's accelerator control systems. This is due to limitations in the available data as well as in software and hardware infrastructure, and to the reluctance of communities to use AI as a general-purpose tool.

The high data generation rates of modern detectors provide additional challenges for data processing and management at these facilities. As an example, advances in neutron detector technology have enabled high-resolution, 3D tomographic reconstruction of complex, multicomponent materials, as illustrated in
Figure 14.1. One major challenge is that the increasing data volumes will require autonomous methods for data processing. Several supervised and semisupervised data processing workflows based on different neural network architectures have been proposed for different parts of the data life cycle. For example, the transmission x-ray microscopy instrument at the Advanced Photon Source [2] uses DL for fully automated correlative segmentation of metallic alloys by classifying features in large nano-resolution 3D reconstructed volumes. Similar types of network architecture have also been useful for recognizing known features in datasets and completing tasks such as classifying peaks, working with low-resolution data, or assigning theoretical models to reduced datasets [3–5]. Using such an approach can be of great help in interpreting the data, especially when the measured data is complex or when it contains features that are not directly related to the material under study. Integration of AI with these versatile imaging techniques can enable analysis of extremely large data volumes in relatively short time frames while exponentially accelerating tomographic data analysis, possibly opening up novel avenues for performing 4D characterization experiments with finer time steps. More progress is needed and will rely on the generation of curated datasets, robust data processing techniques, and sophisticated data management solutions.

Figure 14.1 Neutron computed tomography of ultra-high performance concrete (UHPC) is capable of identifying three different phases in the samples: cementitious paste, voids, and H-rich components (reinforced with fibers when present).
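As a concrete, if simplified, picture of the slice-wise segmentation workflows described above, the following sketch passes each slice of a reconstructed volume through a small, untrained convolutional network that emits per-voxel phase labels. It illustrates only the data flow (volume in, labeled volume out); it is not the published APS segmentation tool, and the network and phase count are assumptions.

# Minimal sketch of slice-wise semantic segmentation of a reconstructed 3D volume,
# in the spirit of the automated correlative segmentation workflows cited above.
# The tiny CNN is untrained and purely illustrative of the data flow.
import torch
import torch.nn as nn

N_PHASES = 3  # e.g., matrix, precipitate, void (illustrative)

class TinySegmenter(nn.Module):
    def __init__(self, n_classes=N_PHASES):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, n_classes, 1),   # per-pixel class scores
        )
    def forward(self, x):
        return self.net(x)

model = TinySegmenter().eval()
volume = torch.rand(64, 128, 128)          # stand-in for a reconstructed tomogram (z, y, x)

labels = torch.empty(volume.shape, dtype=torch.long)
with torch.no_grad():
    for z in range(volume.shape[0]):       # process one slice at a time to bound memory use
        logits = model(volume[z][None, None])      # shape (1, n_classes, y, x)
        labels[z] = logits.argmax(dim=1)[0]        # per-voxel phase label

print("labeled volume:", labels.shape, "phases present:", labels.unique().tolist())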
The management of large-scale experimental data will also require smart data reduction techniques. For example, current coherent diffraction imaging experiments can generate data at 3 GB/s, resulting in datasets containing tens of terabytes for a single experiment. When data generation rates are this high, the data acquisition process is typically stalled so that experimental data can be flushed to nonvolatile memory. This high-volume data acquisition not only extends the end-to-end experimentation time but also limits experiments with time-sensitive phenomena. Advancements in synchrotron light sources, such as the APS and Advanced Light Source (ALS) upgrades, will further complicate these problems, as the data generation rates of the detectors will increase by several orders of magnitude for similar imaging modalities. Smart data reduction techniques (e.g., filtering relevant data or point-of-interest data acquisition) will be necessities rather than features with the upcoming light source advancements. Although existing practices work well currently, there is an acute awareness that this model is unsustainable in the long run without the development of additional software and hardware infrastructure and the continuous support of existing activities such as domain-specific AI-based data compression schemes, searchable databases containing both experimental data and metadata, on-the-chip data reduction, and novel algorithms and workflows to improve performance. Some of these tasks related to scientific data management are currently being addressed in the Data Center Pilot, a large collaboration of data and experimental scientists at all U.S. light sources that aims to provide a sustainable road map to the future of data issues at experimental facilities. Similar initiatives are required to develop and maintain other parts of the large-scale AI ecosystem for research facilities.

2. Major (Grand) Challenges

A new era is dawning in science and engineering, one that promises a revolutionary understanding of complex materials and chemical processes across the entire hierarchy of relevant length and time scales. This understanding demands moving beyond exploration of equilibrium phenomena and beyond models based on idealized materials and systems to create new states and achieve extraordinary new functions [6]. The overarching grand challenge facing scientists at DOE experimental user facilities is to understand, predict, and ultimately design emergent behavior in complex materials and systems. This will require progress in the following areas.

Characterize biological function across length scales. X-ray, neutron, and electron methods generate structural, organizational, and dynamic data across a range of length scales from atomic to mesoscale. Example applications include high-resolution imaging of complex neuronal networks in brains to provide a clearer understanding of how even the smallest changes to the brain play a role in the onset and evolution of neurological diseases, such as Alzheimer's and autism, and perhaps lead to improved treatments or even a cure. Figure 14.2 illustrates the power of a machine learning approach that maintains signal-to-noise ratios with shorter x-ray exposures, minimizing damage to the radiation-sensitive mouse brain. A second example is the characterization of heterogeneous systems such as the rhizosphere, where bacterial communities synergistically interact with the soil and roots of plants, and the observation of the dynamic assembly of functional complexes interacting within living cells [7]. The neutron radiograph in Figure 14.3 demonstrates the role of water dynamics in this region. In both these examples, molecular interactions at the nanometer scale cause emergent behavior at the millimeter scale and ultimately govern the dynamics of complex biological systems and tissues of interest. AI/ML methods will play a critical role in linking multimodal observations across this large, dynamic range of scale and are needed to build a predictive understanding of biological function across time and space.

Figure 14.2 An image showing individual myelin sheaths, highlighted with different colors, surrounding mouse brain axons, revealed by the analysis of experimental nano-CT scans taken at the 32-ID beamline of the APS [11].

Figure 14.3 Neutron radiograph of a droughted black cottonwood (Populus trichocarpa) root system that has been rehydrated to measure the water dynamics in the roots, rhizosphere, and bulk soil regions. Work performed at the HFIR CG-1D imaging beamline.
Observe and control nanoscale chemical transformations in macroscopic systems. Devices currently in use or being developed for selective and efficient heterogeneous catalysis, photocatalysis, energy conversion, and energy storage rely heavily on diverse multiscale phenomena, ranging from interfacial electron transfer and ion transport occurring on nanometer and picosecond scales to macroscale batteries that charge in hours and catalytic reactors with turnover times of seconds. Existing and planned instrumentation at DOE user facilities can probe these environments with atomic, chemical, and isotopic contrast spanning a large spatiotemporal range, thereby providing unique fundamental information about these functioning mesoscale chemical devices. These "nanokinetic" operando measurements are essential to optimize complex multiscale chemical and electrochemical devices. The use of AI/ML tools will be essential in interpreting the data by integrating experimental observations with computer modeling to provide multiscale models of complex chemical processes of importance to DOE's mission.

Understand and characterize physical and chemical processes in extreme environments. Understanding materials and processes in extreme environments such as ultrahigh pressure or temperature is of vital importance to the development of fusion and fission materials. Furthermore, such understanding provides unique insights into planetary physics and geosciences. The study of these materials using spectroscopic-, diffraction-, and imaging-based methods is performed throughout the DOE experimental user facility complex and, with the advent of diffraction-limited synchrotron light sources and next-generation free electron lasers, will provide structural characterization modalities of matter in extreme states that are far beyond what is achievable today. The use of AI/ML will be essential in the data analysis of these systems for the reduction of noise, the solution of inverse problems, and the linking of observations to molecular simulations.

To solve these pressing scientific problems, new tools need to be developed. While AI/ML will play an important role in achieving these objectives, a number of synergistic, basic scientific and engineering developments will also be needed, including high-performance x-ray and neutron optics; sample chamber environments; improved electron, x-ray, and neutron detectors; sample handling robotics; and upgrades to computational infrastructure for the efficient movement and storage of data across HPC facilities.

3. Advances in the Next Decade

A number of advances planned for the next decade will create the need for AI/ML tools. There is a need to understand and model the behavior of complex systems across length scales and modalities to transform basic scientific discovery into a set of engineering principles that allow scientists to provide solutions to problems such as the need for plentiful, safe drinking water; safe, efficient alternative energy sources; and therapies to address degenerative diseases. While AI/ML will provide an indispensable set of tools to model these systems, parallel developments in improving neutron and (light) sources for x-ray and electron microscopy and hyperspectral imaging are fueling a revolution in the physical characterization of samples, with an attendant need for AI/ML support. A selected set of developments is summarized below.

The development of diffraction-limited synchrotron light sources. The planned upgrades to the ALS and APS will yield hard, tender, and soft x-ray sources [8,9] with significantly increased brightness that will allow scientists to explore more complex and disordered samples under controlled and operating conditions with a precision unknown before now, literally approaching theoretical limits.

The development of new hardware for electron microscopy. Next-generation detectors for electron microscopy will greatly
increase data rates, resulting in better prediction of electron strike locations. They will also improve the electron beam–induced sample motion correction that has historically been a resolution-limiting factor. In addition, the development of "phase plates" will increase the contrast between sample and background, revealing particles that traditionally have been too small for electron microscope imaging. These and additional developments in sample delivery will significantly increase the applicability of electron microscopy, as well as the need for data management tools.

The development of ultrafast light sources. The development of ultrafast light sources such as LCLS-II will be transformative for energy science, as it will qualitatively change the way in which x-ray scattering, spectroscopy, and imaging can be used. High-repetition-rate machines will enable imaging of natural and artificial systems, spanning multiple decades of time scales and multiple spatial scales. High-repetition-rate sources will enable powerful new ways to capture rare chemical events, characterize fluctuating heterogeneous complexes, and reveal underlying quantum phenomena in matter using nonlinear, multidimensional, and coherent x-ray techniques that are only possible with a true x-ray laser.

The development of higher-brightness neutron sources. The planned upgrade of ORNL's SNS accelerator to higher power and the addition of a Second Target Station will enable more rapid, time-resolved measurements of transient and out-of-equilibrium phenomena; exploration of matter at extreme conditions, such as magnetic field, temperature, and pressure; and simultaneous measurement across broad ranges of length, energy, and time.

4. Accelerating Development

To accomplish the grand challenges listed above and optimally leverage the hardware advances planned for the next decade, the use of AI-based tools is an absolute necessity. The ability to improve the stability of the instrumentation, perform experiments in an autonomous fashion, and interpret scientific data in a fully automated workflow, together with the ability to discover patterns and behaviors across multiple experiments, will greatly accelerate scientific discovery (see Chapter 10, AI Foundations and Open Problems, and Chapter 12, Data Life Cycle and Infrastructure). To facilitate this vision, investments in scientific data warehousing and real-time experimental steering infrastructure need to be made (see Chapter 12, Data Life Cycle and Infrastructure), facilitated by state-of-the-art data streaming and edge computing strategies (see Chapter 15, AI at the Edge). At present, there is a lack of tagged or labeled (both raw and processed) scientific data accessible across the DOE landscape, limiting the development and training of AI-based tools that can be deployed in the control and analyses of experiments at user facilities. While facilities are developing AI/ML-based approaches aimed at real-time in-experiment decision making, they are still far from being used routinely. The needed developments are discussed below.

A database that consists of raw detector readings, processed data, related user proposals, and associated scientific interpretations in the form of standardized data formats and domain-specific metadata languages. Creating enough model or simulated data to provide useful ML training sets will require access to HPC resources. These databases can be built in coordination with user communities; in turn, they could be used to train efficient data reduction algorithms, to perform data mining operations for the discovery of hidden statistical relations only visible in large datasets, and to build a fully automated "raw data to final model" analysis pipeline. Ideally, facility staff, users, and research communities in the broad sense would aid in a "data-tagging campaign" as part of the execution of their experiment.
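As one hedged illustration of what a single record in such a database might look like, the sketch below uses h5py to package raw detector frames, a reduced data product, and descriptive metadata into one HDF5 file. The group and attribute names are illustrative placeholders, not an established facility schema or metadata language.

# Minimal sketch of packaging raw detector frames with descriptive metadata into an
# HDF5 file, the kind of AI-ready record such a database would accumulate.
# The field names below are illustrative only, not an established facility schema.
import h5py
import numpy as np

frames = np.random.randint(0, 4096, size=(100, 512, 512), dtype=np.uint16)  # stand-in detector frames

with h5py.File("scan_0001.h5", "w") as f:
    dset = f.create_dataset("raw/frames", data=frames, compression="gzip", chunks=(1, 512, 512))
    dset.attrs["units"] = "counts"
    meta = f.create_group("metadata")
    meta.attrs["instrument"] = "example-beamline"      # illustrative
    meta.attrs["sample_id"] = "sample-042"             # illustrative
    meta.attrs["exposure_time_s"] = 0.001
    meta.attrs["photon_energy_keV"] = 12.0
    # A reduced, labeled product stored alongside the raw data, ready for ML training.
    f.create_dataset("processed/frame_mean", data=frames.mean(axis=(1, 2)))

with h5py.File("scan_0001.h5", "r") as f:
    print(dict(f["metadata"].attrs.items()))
    print(f["raw/frames"].shape, f["processed/frame_mean"].shape)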
A database that consists of metadata, such as scientific instrument responses (e.g., flux and focus), in combination with a record of instrument configurations (e.g., motor positions, neutron chopper phases, and monochromator bending parameters) and measurable instrument and environmental parameters (e.g., ring current, cooling water flow, and temperature readings). This information could be used to build advanced predictive models of accelerators, end stations, and sample delivery systems and to aid in automated alignment and calibration of instruments, stabilizing user operations, predicting and preventing catastrophic failures, and/or reducing the total downtime of the instrument. While it is unclear whether data from different facilities can directly be used to inform models, the ability to find common patterns could provide cross-cutting improvements.

AI-guided real-time experimental steering infrastructure based on curated data and metadata of the sample and instrument state during the experimentation. Gaining transformative insight into dynamic materials processes requires identification, tracking, and quantification of the most relevant volumes within a sample under various conditions of applied stimuli. AI tools have been shown to be capable of providing this type of guidance [10] and should ultimately suggest alternative imaging modalities with which to query the volume of choice. This situation presents a vast measurement parameter space that cannot be exhaustively surveyed and that is very difficult to navigate when seeking concrete connections between sparse local phenomena, such as dislocation motion and grain boundary stress concentration, and bulk properties as a function of environment, especially in the context of irreversible processes.
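The Kriging-based steering idea of reference [10] can be sketched in a few lines: fit a Gaussian-process surrogate to the measurements collected so far and direct the next measurement to the point of greatest predictive uncertainty. The example below uses scikit-learn and a synthetic instrument response purely for illustration; it is not the software used at any facility, and the kernel and noise settings are assumptions.

# Minimal sketch of Gaussian-process (Kriging) steering for autonomous experimentation:
# fit a surrogate to the data so far and choose the next scan position where the
# predictive uncertainty is largest. Synthetic "instrument"; illustrative only.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)

def measure(x):
    """Stand-in for an instrument measurement at scan position x."""
    return np.sin(6 * x) + 0.05 * rng.standard_normal(x.shape)

candidates = np.linspace(0.0, 1.0, 201).reshape(-1, 1)   # possible scan positions
X = rng.uniform(0, 1, size=(3, 1))                        # a few initial measurements
y = measure(X).ravel()

for step in range(10):
    gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.1), alpha=1e-3)
    gp.fit(X, y)
    mean, std = gp.predict(candidates, return_std=True)
    x_next = candidates[np.argmax(std)].reshape(1, 1)     # most uncertain position
    X = np.vstack([X, x_next])
    y = np.append(y, measure(x_next).ravel())
    print(f"step {step}: next scan position {x_next.item():.3f}, max std {std.max():.3f}")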
5. Expected Outcomes

Given the anticipated pace of development in imaging, scattering, spectroscopy, and associated hardware, there is a dire need to develop data analytics technologies that can aid in making the best choices in experimental design, data reduction, and model generation while reducing the overall cost of data transmission, annotation, and storage.

Early successes in the use of AI/ML tools in experimental facilities indicate that AI/ML will increase the throughput of experiments via autonomous experimentation, resulting in the ability to explore larger sample configurational setups in a shorter amount of time and yielding more complete and informative scientific hypotheses. This in turn will result in a reduced cost of discovery, bringing scientific solutions to industry and society at a faster pace and at a reduced cost. AI/ML for the control of equipment will significantly reduce the need for human intervention in tasks that are currently performed by hand, resulting in more stable experimental facilities that produce superior data with a lower rate of human intervention. Similar arguments can be made for the data analysis component, resulting in more scientific opportunities and better use of the in-demand resources available at major DOE user facilities.

6. References

1. Liu, S., Leemann, S. C., Hexemer, A., Marcus, M. A., Melton, C. N., Nishimura, H., Sun, C. Demonstration of machine learning-based model-independent stabilization of source properties in synchrotron light sources. PRL, in press (2019).

2. Kaira, C. S., et al. Automated correlative segmentation of large transmission x-ray microscopy (TXM) tomograms using deep learning. Mater. Charact. 142, 203–210 (2018).

3. Pelt, D. M., & Sethian, J. A. A mixed-scale dense convolutional neural network for image analysis. PNAS 115 (2), 254–259 (2018).

4. Chang, M. C., et al. Accelerating neutron scattering data collection and experiments using AI deep super-resolution learning (arXiv:1904.08450, 2019).

5. Samarakoon, A. N., et al. Machine learning assisted insight into spin ice Dy2Ti2O7 (arXiv:1906.11275, 2019).

6. U.S. Department of Energy. Report from the Basic Energy Sciences Advisory Committee. Challenges at the frontiers of matter and energy: transformative opportunities for discovery science (2015).

7. U.S. Department of Energy. Report from the Biological and Environmental Research Advisory Committee. Grand Challenges for Biological and Environmental Research: Progress and Future Vision; A Report from the Biological and Environmental Research Advisory Committee, DOE/SC-0190, BERAC Subcommittee on Grand Research Challenges for Biological and Environmental Research (science.osti.gov/~/media/ber/berac/pdf/Reports/BERAC-2017-Grand-Challenges-Report.pdf, 2017).

8. The Advanced Photon Source Strategic Plan: Enabling frontier science in the national interest (2018).

9. ALS-U: Solving Scientific Challenges with Coherent Soft X-Rays (2017).

10. Noack, M. M., et al. A Kriging-Based Approach to Autonomous Experimentation with Applications to X-Ray Scattering, Sci. Rep. 9, 11809 (2019).

11. Yang, X., et al. Low-Dose X-Ray Tomography through a Deep Convolutional Neural Network. Sci. Rep. 8, 2575 (2018).
15. AI at the Edge

Many of the use cases outlined in previous chapters (Chapter 4, High Energy Physics; Chapter 14, AI for Imaging; and Chapter 16, Facilities Integration and AI Ecosystem) describe scientific discoveries using large instruments such as the Large Hadron Collider, the Very Large Array, and the IceCube South Pole Neutrino Detector. Likewise, DOE operates many distributed facilities, such as the ARM Climate Research Facility, that operate sensors and instruments across the planet (see also Chapter 2, Earth and Environmental Sciences). For both centralized and distributed facilities, instruments such as these produce vast quantities of data that often cannot be efficiently moved to or stored in a central repository, or they include latency-sensitive control systems that must act promptly on the incoming data. Moving a portion of the data analysis pipeline "to the edge," where the data is generated, allows the required computation to identify the highest-value data to be saved and to autonomously respond to and control the experiment. The potential benefits of edge computing are widely recognized, and a considerable amount of work to realize and expand upon these benefits in business and science is under way [2].

Advances in AI and ML, both in hardware and software, are among the enablers of edge computing. For example, edge computing enables a self-driving vehicle to make decisions within the vehicle, using AI techniques to interpret data from the vehicle's many cameras and sensors. This is necessary both because of the volume of data (i.e., too large to transmit to central servers) and because of the real-time requirement for vehicle controls (i.e., answers from remote servers may arrive far too late). Edge computing is possible, even with relatively low-powered computing hardware in the vehicle, because a large body of training data has been processed on high-performance servers (i.e., in the center) into ML models that can be deployed to run in the vehicle (i.e., at the edge).

In the DOE community, a large and growing number of science and engineering projects require edge computing to imbue sensors with real-time adaptive or autonomous capabilities. In addition to the examples mentioned in Chapters 4, 14, and 16, consider the following. There are thousands of environmental monitoring sensors that typically produce longitudinal data with latencies of minutes to weeks between measurement and data availability due to their remote locations and low (or intermittent) capacity network connectivity (see also Chapter 3, Biology and Life Sciences). Edge computing capabilities would enable such instruments to analyze data locally in real time and feed a lower volume of processed information to central computing services for further processing. A radar deployed by the DOE ARM facility in Oklahoma could use ML at the edge to identify important weather phenomena and dynamically steer the instrument for more precise follow-up observations. Such an approach would increase the accuracy and timeliness of tornado warnings, ultimately saving lives. As mentioned in Chapter 8, Smart Energy Infrastructure, monitoring electrical power distribution infrastructure could prevent power failures or predict conditions conducive to wildfires; monitoring subsurface vibrations from oil wells could improve oil production; autonomous soil sampling and analysis devices could improve crop yield; and more timely data analysis options would enable large-scale accelerators and light sources to optimize their operations and predict (and prevent) failures.

DOE is in a unique position to address these challenges because it supports many of the research facilities requiring edge computing, either in the near term to better operate existing instruments or in the longer term to
develop more intelligent instruments. Further- to be adaptively steered. For example,
more, DOE supports an extensive community Figure 15.2 shows a series of images
of scientists working on technologies including illustrating an automation system under
high-performance sensors, detectors, ML development where an electron microscope is
techniques, new computing architectures, and first used in a low-resolution mode, while some
other critical facets of AI technologies required AI algorithms (running locally with the
to tackle the grand challenges of today and instrument) identify regions on the sample with
tomorrow. As illustrated in Figure 15.1, DOE features of interest. The electron microscope is
has the computing resources necessary to then directed to scan the selected regions in a
develop increasingly sophisticated models to higher resolution mode.
run at the edge, sensor capabilities to support
its many facilities and instruments, and edge
computing research platforms to demonstrate
the potential for enabling new science.
Additional details on algorithm and software
environment research are given in Chapter 11,
Software Environments and Software
Research, and Chapter 10, AI Foundations and
Open Problems.
Figure 15.2 Edge computing can enable instrument steering
using AI at the edge to identify features of interest [3].

In addition to the need for edge computing with centralized instruments, such as electron microscopes and light sources mentioned above, there are applications that require input from networks of sensors. For example, DOE funds a number of environmental monitoring projects where sensors are distributed, often in remote locations with limited networking connectivity. In these cases, edge computing is required both for data compression and for adaptive sensing. Similar to the previous example with electron microscopy, high-bandwidth measurements may only be necessary during events of interest, and edge computing could change the sensor's sampling rates in the same fashion.

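A hedged sketch of this adaptive-sensing idea: the function below picks a sampling rate from a window of recent measurements, using a simple z-score as a stand-in for an edge AI event detector. The rates, the threshold, and the sensor.set_rate call in the usage comment are hypothetical, not part of any existing DOE platform.

    import statistics

    def choose_sampling_rate(window, base_hz=1.0, burst_hz=100.0, threshold=3.0):
        # 'window' is a non-empty list of recent measurements kept on the device.
        mean = statistics.fmean(window)
        stdev = statistics.pstdev(window) or 1e-9
        # A z-score on the newest sample stands in for an edge AI event detector.
        event = abs(window[-1] - mean) / stdev > threshold
        return burst_hz if event else base_hz

    # Hypothetical usage on the device: sensor.set_rate(choose_sampling_rate(recent))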


Rapid improvement of low-cost sensors is creating opportunities for unprecedented measurement capabilities, all of which require edge computation. But in the absence of general-purpose edge computing platforms, most teams are creating ad hoc solutions. For example, in a large experiment involving monitoring of black carbon in the urban environment [4], scientists at LBNL had to limit the sampling rate of the sensors so that the amount of data could be easily handled by the available network. In other cases, such as Argonne's Waggle project [5], multiple science groups pooled expertise and resources to build a shared, general-purpose edge computing platform. While the resulting devices were more expensive than traditional sensor devices without edge computing, the cost was shared by multiple experiment teams in large-scale projects such as Chicago's Array of Things [6], an experimental instrument with dozens of sensors. The platform also supports industry collaboration on the use of edge computing to create new measurement and fault prediction capabilities for the national electric power grid [7], where more precise monitoring and analysis of electricity generation and loads could enable AI-based models to forecast catastrophic power failures.

With additional advances in sensing technology and AI capabilities such as Google Edge TPU [8], Intel Movidius [9], and IBM TrueNorth [10], the use of AI at the edge will continue to grow. Integrating these advances into DOE mission-critical applications could dramatically improve scientific productivity.

2. Major (Grand) Challenges

While industry is certainly interested in AI at the edge, its focus is largely on delivering AI products to end users (e.g., cell phones and autonomous vehicles). DOE will leverage the technologies developed for industrial applications; however, it will need to address some unique challenges for scientific applications. The following sections present the grand challenges in the scientific arena that will motivate the computer science and applied mathematics work necessary for supporting AI at the edge.

Improve scientific productivity with high-speed data through AI at the edge. One unique challenge with scientific applications is that they often involve very-high-speed sensors. For example, devices with data rates of 100 TB per second are currently being tested in cryogenic electron microscopy. A number of other light sources and electron microscopes are expected to return similarly high data rates in the future. AI at the edge could be an effective way to process such high-velocity data streams.

Another example is distributed acoustic sensing (DAS), which uses fiber-optic cable to monitor seismic motion. It is much cheaper to deploy than traditional seismometers and captures the motion along the full length of the cable. It has the potential to revolutionize many subsurface applications but has to quickly analyze a large volume of data (i.e., terabytes per day). Fortunately, the common data analysis procedure illustrated in Figure 15.3 only requires the raw sensor data in the first step; therefore, DAS is amenable to edge computing to generate interferometry. Because interferometry is much smaller in data volume than the raw data, it can be much more easily shipped to a central location for further processing.

Figure 15.3 Steps in data analysis for distributed acoustic sensing.

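The reduction step that makes DAS attractive for edge computing can be sketched with a generic cross-correlation kernel. The example below is illustrative only and is not the specific workflow of Figure 15.3; it assumes raw data arrive as a NumPy array of channels by samples and keeps only a short window of correlation lags, which is the small product that would be shipped to a central facility.

    import numpy as np

    def cross_correlate(ch_a, ch_b, max_lag):
        # Frequency-domain cross-correlation of two channels, truncated to
        # +/- max_lag samples; the result is tiny compared with the raw traces.
        n = len(ch_a) + len(ch_b) - 1
        nfft = 1 << (n - 1).bit_length()
        spec = np.fft.rfft(ch_a, nfft) * np.conj(np.fft.rfft(ch_b, nfft))
        cc = np.fft.irfft(spec, nfft)
        return np.concatenate([cc[-max_lag:], cc[:max_lag + 1]])

    def reduce_block(block, max_lag=200):
        # Reduce one raw block (channels x samples) to correlations against a
        # reference channel, stacked over channels; only this leaves the edge node.
        ref = block[0]
        return np.mean([cross_correlate(ref, ch, max_lag) for ch in block[1:]], axis=0)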


There are many high-speed data streams that could be similarly processed, which makes processing high-speed data streams the first challenge for AI at the edge.

In addition to data velocity, AI at the edge will also have to deal with data quality issues arising from malfunctioning sensors while working under constraints of memory, storage, and electric power.

Enhance scientific discovery through integration of multiple data sources. Scientific applications not only analyze data from each high-speed sensor separately; they also need effective integration of up-to-date information from multiple, often heterogeneous, sensors. AI at the edge that can leverage heterogeneous data sources will enable real-time optimization of these scientific applications and new scientific discoveries. The following example from agricultural land use, which has sweeping consequences for carbon sequestration, water resources, chemical pollution, and crop yield, illustrates the complexities involved [11].

The AR1K smart farming consortium [12] envisions integrating data from satellites, sensor-equipped drones, ground stations, and embedded sensors to understand dynamics such as the role of soil conditions and microbiomes in sequestering carbon. Analysis from DNA sampling and sensor data (e.g., soil nutrients and composition) and drone-mounted multispectral cameras aims to improve crop yield while reducing the use of chemicals and fertilizers. Today, many measurements rely on manual sampling and operation, severely limiting the scope and scale of measurement and analysis. AI at the edge would perform data analysis and reduction in situ and support a transition to autonomous, robotic soil and plant sampling and sensing devices. Furthermore, it would do so at low cost and small physical scale to enable deployments of thousands of units. This level of scale and automation will be essential for understanding and optimizing the nation's agricultural resources and addressing the growing impact of agricultural land use, ranging from chemical runoff to inefficient use of water.

Many other DOE-supported environmental applications such as the Next-Generation Ecosystem Experiments (Tropics and Arctic) have similar needs to integrate information from many different sensors. Other application areas such as cosmology are also starting to integrate multiple sensors, including optical telescopes, microwave telescopes, and gravitational sensors, to capture and analyze important events.

In the next decade, such large-scale data integration will likely be used primarily in science and will require funding from agencies such as DOE, making it the second challenge for AI at the edge.

Enable smart scientific infrastructures through AI at the edge. The challenges discussed so far involve discrete analysis operations at the edge. To manage large distributed systems, it will be necessary not only to analyze data from multiple sources, but also to make coordinated decisions to direct distributed actions. The DOE ESnet [13] exemplifies aspects of this challenge.

DOE has invested in large, national-scale facilities and cyberinfrastructures. The ESnet that connects these DOE facilities to one another and to the world is central to efficient operations of these facilities. Over its 40-year history, the volume of data flowing through ESnet has consistently doubled every 17 months. Over the next decade, the projected exponential growth of "wearables" and IoT devices will bring new challenges with respect to scale, emergent properties, and cybersecurity. Consequently, cyberinfrastructure operation—from local instrument to facility to laboratory scale to ESnet—will require embedded edge AI to make intelligent decisions and coordinate actions across the globe. To facilitate this, DOE is planning to implement advanced telemetry on ESnet networking equipment and to use AI at the edge to digest and process telemetry measurements to initially recommend and ultimately autonomously execute the majority of the networking operations. These tasks include routing of traffic, fault detection, isolation, and remedy, as well as identifying and addressing cybersecurity threats. This autonomous self-managing, self-tuning, and self-healing scientific cyberinfrastructure, like the mobility ecosystem, will rely on edge AI for such optimizations. It will also rely on edge AI to forecast, detect, and resolve system-level interactions leading to unforeseen global system behavior, while aiming to optimize for system-wide performance goals.

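As a minimal illustration of AI-assisted network operations, the sketch below flags anomalous readings in a telemetry stream using an exponentially weighted mean and variance. It is a simple stand-in for the learned models an embedded agent might apply to ESnet telemetry; the per-link counter mentioned in the closing comment is hypothetical.

    class LinkMonitor:
        # Track an exponentially weighted mean and variance of one telemetry
        # counter and flag readings that deviate strongly from recent behavior.
        def __init__(self, alpha=0.05, threshold=4.0):
            self.alpha, self.threshold = alpha, threshold
            self.mean, self.var = None, 1.0

        def update(self, value):
            if self.mean is None:        # first observation initializes the state
                self.mean = value
                return False
            deviation = value - self.mean
            self.mean += self.alpha * deviation
            self.var = (1 - self.alpha) * (self.var + self.alpha * deviation ** 2)
            return abs(deviation) > self.threshold * self.var ** 0.5

    # One monitor per counter (e.g., a hypothetical per-link error rate); a True
    # return would trigger fault isolation or a cybersecurity check.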


Other large DOE resources may require similar smart technology to integrate different types of computing resources, sensors, and actuators. In particular, we anticipate a computer center would include a collection of computing resources of different types and sizes (e.g., traditional HPC systems augmented with specialized AI processors and quantum processors). Such a computing facility might be closely linked to nearby experimental facilities, such as light sources and particle accelerators.

Integrate systems of systems using AI at the edge. The next level of challenge for AI at the edge involves near-real-time interactions of multiple large distributed systems. For example, industry is working on autonomous vehicles, while the research community is thinking about a future with smart vehicles fully integrated with smart transportation infrastructure and smart cities [14]. Additional mobility players are on the horizon, from autonomous aerial and ground devices to interactions with wearables associated with pedestrians and cyclists. Indeed, AI at the edge will be ubiquitous in cities and mobility systems, making it extremely difficult to centralize the necessary information from tens of thousands of independent AI-controlled devices to understand their emergent behaviors. A distributed AI-at-the-edge approach is the only feasible solution to address this need to integrate multiple distributed systems. Each of the participating systems needs to be open and interoperable, and sophisticated edge AI capabilities will perform tasks such as negotiation and optimization across many interacting AI devices and services, as well as detection and prevention of failures due to system interactions, natural events, or intentional attacks—all the while balancing necessary data exchange with personal privacy. Such distributed AI capability would also be critical for improving the reliability of the nation's electric power grid, oil production, and other energy-related systems.

3. Advances in the Next Decade

As AI further permeates everyday life over the next decade, industry will turn to low-power edge devices for AI computation. The current paradigm of sending data back to a data center for analysis will no longer be tractable as the volume of data being collected becomes too large and the speed at which it must be acted upon increases with the need for real-time control. Industry has invested heavily in a variety of edge computing devices for AI, including tensor calculation accelerators (e.g., Google's Edge TPU and Intel's Movidius) and neuromorphic devices (e.g., IBM's TrueNorth and Intel's Loihi). There will be dramatic improvements in the power consumption and compute capability of these devices over the next decade.

Industry is also at the forefront of developing streaming data analysis systems and data standards. For example, companies such as Waymo [15], Tesla [16], and Uber [17] are developing their own versions of self-driving software platforms to go with their vehicles. Many of the general-purpose ML systems are also creating streaming versions for mobile and embedded applications (e.g., Google TensorFlow has TensorFlow Lite, and PyTorch has a number of distributed backend systems that could support embedded applications).

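As a concrete example of the embedded tooling mentioned above, the sketch below converts a trained TensorFlow model to TensorFlow Lite for deployment on a low-power edge device and loads it with the lightweight interpreter. The saved_model_dir path and the choice of default post-training optimization are placeholders, not recommendations for any particular DOE workflow.

    import tensorflow as tf

    # Convert a trained model (exported in SavedModel format) into a compact
    # TensorFlow Lite flatbuffer suitable for low-power edge hardware.
    converter = tf.lite.TFLiteConverter.from_saved_model("saved_model_dir")
    converter.optimizations = [tf.lite.Optimize.DEFAULT]  # post-training quantization
    with open("model.tflite", "wb") as f:
        f.write(converter.convert())

    # On the edge device, the model runs under the lightweight interpreter.
    interpreter = tf.lite.Interpreter(model_path="model.tflite")
    interpreter.allocate_tensors()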


It is anticipated that the commercial technologies will progress quickly in the next decade; however, these advancements are unlikely to meet the needs of the grand challenges mentioned in the previous sections. Various scientific applications will create data much faster than commercial applications. For example, monitoring the environment and the electric power grid will require data integration on a much larger scale than any commercial enterprise. The connected mobility systems may have data rate, data volume, and data variety challenges not seen in commercial uses. Even for hydraulic fracturing applications, where there is clear commercial interest, there might still be a need for DOE or other agencies to fund the initial development of the data analysis technology, as in the case of the drill head used for hydraulic fracturing [18,19]. Funding the underlying data analysis research will benefit many applications, directly or indirectly, with far-reaching impacts.

4. Accelerating Development

Supporting AI at the edge will be very important to many future DOE efforts. Much as AI is permeating everyday life, it is also permeating nearly every field of science. This often takes place as an analysis of static data sets collected in advance. However, the ability to analyze data as it is collected or to exert real-time control over experiments presents an incredible opportunity to achieve discoveries that otherwise would not be possible.

The grand challenges mentioned previously may be able to leverage industrial development to a certain extent. For example, the challenge of handling data from high-speed sensors may be resolved by leveraging the new AI hardware and more computing capability per watt. However, the need to integrate systems of systems is unlikely to be addressed by industry. Therefore, to accelerate development, the following key algorithmic and mathematical challenges, derived from distinct application requirements, will have to be addressed. (Note that some of these topics overlap with those described in Chapters 7–9.)

Learning under limited resources. Edge computing is expected to operate under a number of resource limitations. For example, the computing resources at the edge are likely to be much smaller than could be available at a cloud data center or HPC center. Because of this, AI at the edge is going to work with limited data, presumably only the most recent data records. Under such limitations, the AI model is likely to be relatively small in size and will have to be updated periodically to accommodate new trends. New algorithms will be needed to cope with such resource constraints.

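One way to operate under such constraints is to keep a small model and refresh it incrementally from a bounded window of recent records. The sketch below uses scikit-learn's SGDClassifier and its partial_fit method as a generic example of that pattern; it is not a proposed algorithm, and the window size, label set, and feature layout are arbitrary illustrations.

    from collections import deque
    import numpy as np
    from sklearn.linear_model import SGDClassifier

    model = SGDClassifier()                 # small linear model, bounded memory
    window = deque(maxlen=10_000)           # keep only the most recent records
    classes = np.array([0, 1])              # label set must be declared up front

    def on_new_record(features, label):
        window.append((features, label))

    def periodic_update():
        # Re-fit incrementally on the current window; bounded compute per update.
        if not window:
            return
        X = np.array([f for f, _ in window])
        y = np.array([l for _, l in window])
        model.partial_fit(X, y, classes=classes)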


Understanding errors, failures, and correctness. Devices at the edge are often unreliable, and the data collected could be noisy or otherwise imperfect. Correctly understanding the impact of the noise, errors, and failures on data analysis operations and control actions will be another challenging issue. More broadly, improving reliability, robustness, and interpretability is a key fundamental research topic for AI.

Dealing with all aspects of the computing continuum. To bring the promise of AI at the edge to DOE science domains, there is a need to smoothly connect the edge and the core. Currently, there is no unified programming framework for the "computing continuum"—storage, networking, and computing resources from edge to fog to cloud. A better way to describe, model, and program the computing continuum from components and behaviors to systems, objectives, and intents is needed.

Modeling interactions. For edge devices to properly interact with the core and other edge devices, information and AI models have to be exchanged and understood by all parties involved. Modeling the interactions is critical to allow larger systems to be composed from individual components and smaller systems. Limited work is currently available on this topic.

Managing dynamic resources and interacting systems. AI at the edge requires new edge-focused resource management and support for multitenancy. Current edge computing systems, such as Waggle, concentrate on a single device; future edge nodes must be able to support multiple AI workloads scheduled to match sampling rates or operational needs. Additionally, the diverse nature of edge computing requires heterogeneity of edge computing hardware. Research on how to design and optimize these heterogeneous computing nodes in a systematic and scientific manner is needed.

5. Expected Outcomes

DOE's mission demands high-performance AI at the edge to harness the power of large experiments and supercomputers. The anticipated work will allow data collection and analyses at scales not possible in the absence of edge computing. By investing in highly capable, robust, and versatile edge devices, DOE will enable scientists to perform large-scale experiments in harsh environments. AI at the edge will empower scientists to modify their experiments in real time based on the data being collected and thus drive them toward discoveries that would not be achievable otherwise. High-performance AI at the edge will fundamentally change the way DOE scientists work.

6. References

1. "Edge Computing: Vision and Challenges," June 9, 2016, https://ieeexplore.ieee.org/document/7488250, accessed October 11, 2019.
2. "Edge-centric Computing–DOIs," September 30, 2015, http://doi.org/10.1145/2831347.2831354, accessed October 11, 2019.
3. "Patterned Probes for High Precision 4D-STEM Bragg Measurements," July 11, 2019, https://arxiv.org/abs/1907.05504, accessed October 11, 2019.
4. "Making the Invisible Visible: New Sensor Network Reveals Telltale Patterns in Neighborhood Air Quality," July 22, 2019, https://newscenter.lbl.gov/2019/07/22/new-sensor-network-neighborhood-air-quality/, accessed October 11, 2019.
5. "Waggle: An open sensor platform for edge computing," https://ieeexplore.ieee.org/abstract/document/7808975/, accessed October 11, 2019.
6. "Array of things: a scientific research instrument in the public way," April 18, 2017, https://dl.acm.org/citation.cfm?id=3063771, accessed October 11, 2019.
7. "Argonne supports grid advances through pioneering energy storage and sensor research," March 18, 2019, https://www.anl.gov/es/article/argonne-supports-grid-advances-through-pioneering-energy-storage-and-sensor-research, accessed October 11, 2019.
8. "Edge TPU–Google Cloud," https://cloud.google.com/edge-tpu/, accessed October 11, 2019.
9. "Intel Movidius, an Intel Company," https://www.movidius.com/, accessed October 11, 2019.
10. "Brain-inspired Chip–IBM Research," http://www.research.ibm.com/articles/brain-chip.shtml, accessed October 11, 2019.
11. "AR1K–The Smart Farm Research Consortium," https://ar1k.org/, accessed October 11, 2019.
12. "AR1K: Sustainable, Profitable Agriculture through Research," https://eesa.lbl.gov/projects/ar1k-sustainable-profitable-agriculture-research/, accessed October 11, 2019.
13. "ESnet," http://es.net/, accessed October 14, 2019.
14. "Smart Cities: The Future of Urban Development," Forbes, May 19, 2019, https://www.forbes.com/sites/jamesellsmoor/2019/05/19/smart-cities-the-future-of-urban-development/, accessed October 14, 2019.
15. "Waymo," https://waymo.com/, accessed October 14, 2019.
16. "Tesla," https://www.tesla.com/, accessed October 14, 2019.
17. "Earn Money by Driving or Get a Ride Now," https://www.uber.com/, accessed October 14, 2019.
18. "Hydraulic fracturing: Sandia's role in shale gas production technologies," http://energy.sandia.gov/wp-content/gallery/uploads/FINAL-HydraulicFracturing-Final-wSAND1.pdf, accessed October 14, 2019.
19. "Hydraulic Fracturing: A Public-Private R&D Success Story," https://clearpath.org/energy-101/hydraulic-fracturing-a-public-private-rd-success-story/, accessed October 14, 2019.



16. Facilities Integration and AI Ecosystem
Recent advances in AI have been driven by the ability to collect, store, and process large labeled datasets using large HPC and HPN facilities. DOE HPC facilities represent some of the world's largest computational and data ecosystems for generating, moving, and analyzing experimental and simulation data. These facilities are uniquely positioned to be centers for advances in AI research and applications and must therefore be prepared to fully support these capabilities in the next decade. Improving integration among DOE user facilities will ensure scientists have what is needed to apply AI methods in their research.

1. State of the Art

Data Management and Movement: Access to Data. AI derives its effectiveness from statistical generalizations gleaned from large volumes of often high-dimensional data. Within the scientific community, such data can be found at experimental, observational, and computational facilities, and the path forward requires making it readily available for use with AI applications. Most data management challenges have been described by what are known as the FAIR data principles. However, the infrastructure for managing, curating, publishing, and cataloging datasets that adhere to these principles has yet to reach the same level of maturity as the storage, compute, and network infrastructures.

Resource Orchestration: Co-scheduling and Co-designing. Practically all scientific domains are undergoing paradigm shifts due to explosions in the volume, variety, and velocity of datasets. Effective exploration of data via AI methods necessitates the tight coupling of experimental, observational, computational, and data facilities within and beyond the DOE complex. The tight coupling includes seamless access (i.e., authorization, authentication) and consistent interfaces to facilities regardless of location, HPNs to connect facilities, and co-scheduling of resources. Much of the work to date has been at the prototype level, and additional functionality, including resource modeling, resource scheduling, and trust models, is still needed.

Smart Facilities: AI to Enable AI. HPC and HPN facilities are capable of generating a comprehensive range of operational statistics, with the potential to leverage AI capabilities for facility control, monitoring, and management. For example, the scientific community is exploring the use of AI models on operational and application data generated by facilities to identify and proactively predict hardware failures before they occur. Standards for data collection, data and metadata representation, and data curation have not yet been established, presenting opportunities to exploit AI capabilities and increase the operational efficiency of the large HPC ecosystems deployed by DOE.

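A minimal sketch of the failure-prediction idea, assuming a hypothetical telemetry table with illustrative column names (no community standard for such data exists yet); the synthetic data and the random-forest choice are for demonstration only and do not represent any facility's actual approach.

    import numpy as np
    import pandas as pd
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split

    # Synthetic stand-in for facility telemetry: one row per node per hour, with
    # a label marking whether the node failed within the following 24 hours.
    rng = np.random.default_rng(0)
    n = 5000
    telemetry = pd.DataFrame({
        "temperature": rng.normal(55, 8, n),
        "fan_speed": rng.normal(3000, 400, n),
        "corrected_ecc_errors": rng.poisson(0.2, n),
        "power_draw": rng.normal(350, 40, n),
    })
    # Toy label: nodes running hot with ECC activity are more likely to fail.
    telemetry["failed_within_24h"] = ((telemetry.temperature > 65) & (telemetry.corrected_ecc_errors > 0)).astype(int)

    X = telemetry.drop(columns="failed_within_24h")
    y = telemetry["failed_within_24h"]
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

    model = RandomForestClassifier(n_estimators=200, random_state=0)
    model.fit(X_train, y_train)
    print("held-out accuracy:", model.score(X_test, y_test))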


2. Major (Grand) Challenges

The overarching challenge for realizing the full potential of data-driven science is the development of the infrastructure required to facilitate AI applications. At least three major challenges have emerged in the quest for comprehensive facilitation of AI workloads.

Enable greater access to data. For years, scientists have decried the rate of growth of scientific information, estimated in one recent study to double roughly every 9 years (Richard Van Noorden, "Global scientific output doubles every nine years," Nature News Blog, May 7, 2014, http://blogs.nature.com/news/2014/05/global-scientific-output-doubles-every-nine-years.html). Thus, data management will present a major challenge to the application of AI for science research (see Chapter 12, Data Life Cycle and Infrastructure). This includes developing a broad range of capabilities, such as software and services for accessing, sharing, managing, searching, discovering, publishing, cataloging, and curating data, in addition to tracking provenance and community-driven best practices for representing, storing, and exchanging data.

Facilities will need to collaborate to develop a unified scientific data management system that simplifies routine operations like searching, organizing, sharing, moving, and annotating data via an uncomplicated user interface. Simplified access to data from a heterogeneous collection of facilities, through technologies such as federated identities, will be key. Such a system will need to account for—and, if necessary, abstract—different security and authentication protocols used at various facilities and be able to offer necessary security for handling sensitive data associated with national security or health applications. Users should not be concerned with which file system or repository will be used to host their newly published data. The system will need to interface with software environments, computational workflows, and scientific instruments to extract relevant parameters, calibration information, and source datasets relevant to downstream consumers of the data products (see Chapter 11, Software Environments and Software Research).

A central data management system needs to be closely linked with ancillary services that would handle specific operations such as publishing, cataloging, and curating datasets. While many such stand-alone capabilities exist today, these systems do not communicate well with each other to form an integrated, strongly knit family of services. These data systems need to be designed with long-term storage and routine movement of petabyte-sized datasets in mind. The ability to index millions to billions of data records across a facility or combination of facilities will also be required. The division of larger scientific data repositories into smaller data silos may be inevitable due to differences in how they are generated, including programmatic expectations. These silos, however, will need to interoperate with the central data management service in a way that provides a consistent interface to users.

In addition to developing the necessary data infrastructure, facilities, publishers, and sponsors will need to collaboratively develop policies to encourage best practices. These include publishing and referencing datasets used for journal publications, the use of community-driven and open source file formats to exchange data, and the use of community-standard schema and vocabulary to express rich metadata that describes and provides context to published data. Scientific domains will need to agree on data standards that include documentation on community access to facility data archives. Facilities may be able to play a role in organizing scientific domain–oriented workshops where experts develop domain-specific metadata and data standards that are compliant with the underlying data services.

In 10 to 15 years, an ecosystem of connected facilities and networks will be needed to host, curate, and share domain training datasets and current state-of-the-art trained models for the scientific community. At the core of this ecosystem will be facility-specific, metadata-rich data catalogs with programmatic interfaces that enable the automation of data discovery, data movement, and AI training.

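To make the idea of a programmatic, metadata-rich catalog concrete, the sketch below defines a toy record type and the kind of search call an automated training pipeline might issue before scheduling data movement. The fields and endpoint string are hypothetical; no unified DOE catalog API of this form exists today.

    from dataclasses import dataclass, field

    @dataclass
    class DatasetRecord:
        identifier: str                   # persistent identifier (e.g., a DOI)
        facility: str                     # originating facility
        instrument: str
        keywords: set = field(default_factory=set)
        size_tb: float = 0.0
        endpoint: str = ""                # where an automated transfer would fetch it

    class Catalog:
        def __init__(self):
            self.records = []

        def publish(self, record):
            self.records.append(record)

        def search(self, keyword, max_size_tb=None):
            # The programmatic discovery query an automated AI training
            # pipeline would issue before scheduling data movement.
            return [r for r in self.records
                    if keyword in r.keywords
                    and (max_size_tb is None or r.size_tb <= max_size_tb)]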


Develop AI-focused HPC hardware. Facilitating AI application science will be a substantial change for all facilities as they will need to broaden support beyond traditional modeling and simulation work to include observational and experimental components. This will include complex data interactions that lead to new scientific opportunities. Specifically, future HPC facility architectures will need to be better optimized to handle more complex data traffic, both within the facility and with external facilities of all types (see Chapter 13, Hardware Architectures). HPC centers will need to pay special attention to high-performance networking for routinely moving the petabyte-scale datasets required for AI training tasks, as well as to file system I/O performance. The architectural details touch all parts of the computing ecosystem and will need to be systematically optimized to deliver the highest scientific impact.

In contrast to traditional simulation-only workloads, whose data footprints typically have small inputs but large structured outputs, AI workloads will need to access large volumes of unstructured data, sometimes repeatedly, with a need to also write relatively modest checkpoint and output files. File systems can better support these workloads with faster random read speeds, while compute nodes would benefit from fast burst buffers to maintain a local cache of frequently used input data. High-speed interconnects will need to facilitate fast periodic all-to-all random exchange of training data and gradient synchronization. AI workloads may also have large memory footprints. Because AI models can be trained with single and even half-precision arithmetic (unlike simulations, which often need double precision), CPU architectures capable of supporting operations with varying precisions will help maximize throughput in AI workloads. Similarly, future high-performance computers could use next-generation accelerators such as TPUs or neuromorphic computing units that are better suited for AI workloads as extensions of more conventional CPU and GPU architectures.

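The varying-precision point can be illustrated with PyTorch's automatic mixed precision, in which matrix operations run in half precision while a gradient scaler guards against underflow. The sketch below assumes a CUDA-capable device and uses a toy model and random data purely for illustration; it is not a training recipe endorsed by this report.

    import torch

    device = torch.device("cuda")                 # assumes a CUDA-capable node
    model = torch.nn.Linear(1024, 10).to(device)  # toy model for illustration
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
    scaler = torch.cuda.amp.GradScaler()          # rescales gradients to avoid FP16 underflow

    for step in range(100):
        x = torch.randn(64, 1024, device=device)
        y = torch.randint(0, 10, (64,), device=device)
        optimizer.zero_grad()
        with torch.cuda.amp.autocast():           # matrix math runs in half precision
            loss = torch.nn.functional.cross_entropy(model(x), y)
        scaler.scale(loss).backward()
        scaler.step(optimizer)
        scaler.update()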


On a broader scale, data may need to be transferred across the ESnet HPN and may need to leverage computing and specialized AI hardware close to the instrument or sensor distributed across the network. This edge computing need could apply to data streamed directly through multiple stages of a network where it is used and then discarded. Every link in this chain—data portals, networks, edge computation, HPCs, and I/O systems—needs to be architected with AI applications in mind to efficiently exploit the huge potential gains in distributed computational performance.

Facilitate resource orchestration. Ensuring that data can be brought to the heterogeneous, distributed compute infrastructure needed for AI-based science requires new levels of cross-facility coordination and orchestration. AI workflows can include a variety of components, such as experimental data, multiple data repositories, local and nonlocal computing platforms, LANs and WANs, storage, and people in the loop. Policies at facilities will need to be restructured to allow seamless co-scheduling of these heterogeneous resources for scientific productivity. The challenges of federating disparate or geographically distant facilities will need to be overcome through the development of standardized protocols and cross-facility identity management to allow movement of computational workloads and data. For example, these advancements will be critical for enabling "self-driving experimental facilities" (see Chapter 14, AI for Imaging).

Additionally, the orchestration of resources varies at different time scales and may be neither aligned nor readily predictable. Experiments may need quasi-real-time, AI-powered data analyses that depend on experimental operating and downtime schedules. In contrast, multiple runs of an AI training algorithm to find optimal network hyperparameters may tolerate a multiday turnaround period. Resource orchestration across multiple facilities will need to account for the urgency of the request, which may require high-priority, on-demand computing in some cases.

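A toy sketch of urgency-aware orchestration: requests are ordered first by urgency class and then by time to deadline, so a quasi-real-time analysis preempts a multiday hyperparameter sweep. The urgency classes and job names are illustrative, not a proposed DOE policy or scheduler design.

    import heapq
    import itertools

    class UrgencyAwareScheduler:
        URGENCY = {"real_time": 0, "on_demand": 1, "batch": 2}

        def __init__(self):
            self._queue = []
            self._counter = itertools.count()   # tie-breaker keeps ordering stable

        def submit(self, job_name, urgency, hours_to_deadline):
            key = (self.URGENCY[urgency], hours_to_deadline, next(self._counter))
            heapq.heappush(self._queue, (key, job_name))

        def next_job(self):
            return heapq.heappop(self._queue)[1] if self._queue else None

    sched = UrgencyAwareScheduler()
    sched.submit("hyperparameter sweep", "batch", hours_to_deadline=72)
    sched.submit("beamline feedback analysis", "real_time", hours_to_deadline=0.1)
    print(sched.next_job())                     # -> beamline feedback analysis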


A global, AI-driven resource orchestrator will need to account for the heterogeneous computing landscape, as each computing site will have unique capabilities. The AI resource orchestrator could direct data and computational resources based on the optimal path and location for hardware, availability, energy costs, and the specific scientific application. Not only would this result in a more efficient workflow for the scientist, but there is potential to make more efficient use of network and computational facilities, avoiding bottlenecks and maintaining high use of all resources. Such an orchestrator could provide continuous feedback on how to improve efficiency and performance.

Leverage AI to enable AI with smart facilities. The increasing complexity of HPC and HPN workloads will require innovations in facility operations, and AI will play a critical role in driving this evolution. AI workloads present unique challenges because of their data movement patterns and uncommon mix of compute intensity and I/O (e.g., training vs. inference). In the short term, it will be important to develop representative AI benchmarks to characterize AI workflows and understand the optimizations required to efficiently support the workflows associated with these use cases. This will involve developing AI benchmarks that expose operational data from the facilities through exemplar training and inference workloads. This information can then be used to build the tools and infrastructure to support AI at scale across the DOE complex, where these AI benchmarks should become an integral part of those currently used by computational and networking facilities.

A long-term goal for facility operations would be to drive operational decision-making using AI methods. A truly automated, optimized facility will be able to predict faults, detect anomalies or performance degradation, and balance the computational workload accordingly. However, for this grand challenge to be met, the right operational data need to be identified and collected. Identifying the telemetry datasets that researchers can use to design autonomous behavior is not a trivial task—facilities currently produce numerous terabytes of telemetry data per day on everything from network statistics to power consumption. Identifying, curating, cleaning, and sharing these data are vital to designing a truly automated facility, as well as to developing a smart resource orchestrator. AI could also be used to simplify or automate access to software modules and input datasets, validate computational or experimental runs against previous runs, recommend new parameters for runs to avoid duplication, and study unexplored phenomena.

Meeting this challenge will enable AI to tune the ecosystem to create a more effective environment for AI applications. This will be of general benefit to users of such facilities due to the more flexible and performance-based environment. In addition, it will be of huge value to the facilities themselves, empowering them to predict usage patterns, identify trends in resource use, and make more informed decisions about future architectures (see Chapter 9, AI for Computer Science).

3. Advances in the Next Decade

The next three years will see the deployment of ESnet6 and NERSC-9 (Perlmutter), and the first generation of exascale machines (Aurora at ALCF and Frontier at OLCF) across the DOE complex (Figure 16.1).

Figure 16.1 Over the next three years, DOE will stand up its first generation of exascale machines. These systems, along with the upcoming ESnet high-performance network, present a unique opportunity to leverage HPC in the development of AI for science.

All of these facilities already support the most popular AI frameworks, and it is expected that DOE will support the development of additional HPC-focused AI frameworks in the next decade, along with platforms that facilitate sharing and publishing AI networks, hyperparameters, and weights in a framework-agnostic and architecture-agnostic manner. The DOE ASCR facilities are all developing programs to increase support of AI. These programs will foster burgeoning AI applications in HPC ecosystems and need to be folded into generalized allocation programs. The deployment of scalable scientific data management systems that will form the foundation for curating high-quality datasets will also be necessary. This work will continue with the deployment of data gateways that facilitate the transfer of data from a variety of sources to computational facilities. It is also expected that AI will be extended to support rapid data processing at HPC facilities to enable quasi-real-time feedback on experiments and observations. The data gateway and the scientific data management system will be critical components expected to substantially reduce the accumulation of "dark" (i.e., unpublished) data and accelerate the accumulation of well-annotated and standardized data for AI in the upcoming decade.

Looking further ahead, the ASCR facilities will continue to design complex, technically advanced networking and computing facilities for future science generations where the needs of the AI ecosystem will be an integral part of any initial design. Given the pace of change in AI technology and techniques, these future facilities will also need to be designed with flexibility in mind to take advantage of the advances that will inevitably come from application work over the next decade.

4. Accelerating Development

An AI agent is only as capable as the quality of the data used to train it. Currently, we lack the infrastructure and policies to facilitate curation of the high-quality datasets critical to fully realizing the potential of AI. The FAIR data principles provide ample guidelines for reaching this goal. Facilitating AI necessitates additional manpower for the further development of data management, movement, curation, publication, standardization, and streaming software/services (see Chapter 12, Data Life Cycle and Infrastructure). Some work has already begun along these lines at every DOE facility. However, a highly coordinated effort across the DOE complex will be necessary for rapid progress in this area. Software and services can facilitate good data practices that will feed AI agents, but the actual accumulation of high-quality datasets is contingent on researchers using the aforementioned data software stack to populate data repositories. Policies must be developed to minimize generation of dark data and maximize generation of well-annotated data. AI efforts will be necessary to draw insights from the collected data, but facilities need to first train their researchers on ML, including DL, techniques. Furthermore, facilities will need to foster AI development through dedicated research programs. Given the data explosion in practically all scientific domains, facilities will need to train researchers on using high-performance computers for developing, scaling, and deploying AI agents that can leverage the ballooning body of data.

5. Expected Outcomes

Without the support of DOE facilities, the scientific community will struggle to take advantage of the promise of AI. The processing power of DOE supercomputers, including the forthcoming exascale systems, is vital to train AI algorithms using the huge amounts of data being produced and curated by the scientific community. However, simply building these computing facilities does not guarantee that they will be accessible and useful for AI research. The infrastructure described in this chapter will be essential to allow scientists to take full advantage of the compute resources DOE offers. AI will itself be essential to creating such an infrastructure.

With appropriate direction, funding, and the cross-facility cooperation described in this chapter, the goal of a seamlessly interconnected DOE complex can be achieved in 10 years. Such a reality will allow scientists to build AI-driven experimentation and discovery workflows, optimized and controlled by embedded AI in a transparent facility infrastructure spanning the DOE complex, allowing data and compute resources to be directed according to the needs of the scientists and the availability of resources, without a human in the loop.



AA. Report Writing Team
The following individuals made significant contributions to the final content of this report.

First Name Last Name Institution


Corey Adams Argonne National Laboratory
Srikanth Allu Oak Ridge National Laboratory
Jim Ang Pacific Northwest National Laboratory
Mihai Anitescu Argonne National Laboratory
Katerina Antypas Lawrence Berkeley National Laboratory
Melina L. Avila Coronado Argonne National Laboratory
Prasanna Balaprakash Argonne National Laboratory
Deborah Bard Lawrence Berkeley National Laboratory
Pete Beckman Argonne National Laboratory
Wes Bethel Lawrence Berkeley National Laboratory
Philip Bingham Oak Ridge National Laboratory
Kristofer Bouchard Lawrence Berkeley National Laboratory
Thomas S. Brettin Argonne National Laboratory
Ben Brown Lawrence Berkeley National Laboratory
Paolo Calafiura Lawrence Berkeley National Laboratory
Charles E. Catlett Argonne National Laboratory
Andrew Chien Argonne National Laboratory
Taylor Childers Argonne National Laboratory
Santanu Chaudhuri Argonne National Laboratory
Ian C. Cloet Argonne National Laboratory
Ren Cooper Lawrence Berkeley National Laboratory
David Dean Oak Ridge National Laboratory
Bert deJong Lawrence Berkeley National Laboratory
Marcel Demarteau Oak Ridge National Laboratory
Sudip Dosanjh Lawrence Berkeley National Laboratory
Dipankar Dwivedi Lawrence Berkeley National Laboratory
Nicola Ferrier Argonne National Laboratory
Ian Foster Argonne National Laboratory
Hector Garcia Martin Lawrence Berkeley National Laboratory
Devarshi Ghoshal Lawrence Berkeley National Laboratory
Chin Guok Lawrence Berkeley National Laboratory
Dogan Gursoy Argonne National Laboratory
Salman Habib Argonne National Laboratory
James Hack Oak Ridge National Laboratory
Kawtar Hafidi Argonne National Laboratory
Kenneth Herwig Oak Ridge National Laboratory
Judith Hill Oak Ridge National Laboratory
Forrest M. Hoffman Oak Ridge National Laboratory
Tianzhen Hong Lawrence Berkeley National Laboratory
David Humphreys General Atomics
Barbara Jacak Lawrence Berkeley National Laboratory



Cynthia Jenks Argonne National Laboratory
Mariam Kiran Lawrence Berkeley National Laboratory
Rao Kotamarthi Argonne National Laboratory
Ana Kupresanin Lawrence Livermore National Laboratory
Teja Kuruganti Oak Ridge National Laboratory
Frank Liu Oak Ridge National Laboratory
Bronson Messer Oak Ridge National Laboratory
Zein-Eddine Meziani Argonne National Laboratory
Georgios Michelogiannakis Lawrence Berkeley National Laboratory
Inder Monga Lawrence Berkeley National Laboratory
Dmitriy Morozov Lawrence Berkeley National Laboratory
Peter Nugent Lawrence Berkeley National Laboratory
Michael E. Papka Argonne National Laboratory
Mary Ann Piette Lawrence Berkeley National Laboratory
Alan Poon Lawrence Berkeley National Laboratory
Prabhat Lawrence Berkeley National Laboratory
Brian Quiter Lawrence Berkeley National Laboratory
Lavanya Ramakrishnan Lawrence Berkeley National Laboratory
Nageswara Rao Oak Ridge National Laboratory
Rob Ross Argonne National Laboratory
Rajesh Sankaran Argonne National Laboratory
Jibo Sanyal Oak Ridge National Laboratory
Martin Schoenball Lawrence Berkeley National Laboratory
Koushik Sen University of California, Berkeley
John Shalf Lawrence Berkeley National Laboratory
Arjun Shankar Oak Ridge National Laboratory
Michael Smith Oak Ridge National Laboratory
Suhas Somnath Oak Ridge National Laboratory
Bobby G. Sumpter Oak Ridge National Laboratory
Georgia Tourassi Oak Ridge National Laboratory
John Turner Oak Ridge National Laboratory
Tom Uram Argonne National Laboratory
James Vary Iowa State University
Velimir (Monty) V. Vesselinov Los Alamos National Laboratory
Jeffrey Vetter Oak Ridge National Laboratory
Venkatram Vishwanath Argonne National Laboratory
Haruko Wainwright Lawrence Berkeley National Laboratory
Stefan Wild Argonne National Laboratory
David Womble Oak Ridge National Laboratory
John Wu Lawrence Berkeley National Laboratory
Junqi Yin Oak Ridge National Laboratory
Steven Young Oak Ridge National Laboratory
Piotr Zarzycki Lawrence Berkeley National Laboratory
Petrus Zwart Lawrence Berkeley National Laboratory



AB. Agendas
AI for Science Town Hall
Argonne National Laboratory
Advanced Photon Source (APS), Building 402
July 22–23, 2019

Monday, July 22, 2019


8:30 a.m. Registration…………………………………………………………..APS Main Lobby

9:00 a.m. Welcome………………………………………………………………APS Auditorium


Kim Sawyer

9:10 a.m. Introductory Remarks………………………………………………APS Auditorium


Congressman Bill Foster

9:20 a.m. Opening Statement………………………………………………….APS Auditorium


Barbara Helland

9:30 a.m. AI for Science Opportunities………………………………………APS Auditorium


Rick Stevens

10:30 a.m. Morning Break

10:45 a.m. AI at Scale 1: Cosmology…………………………………………..APS Auditorium


Salman Habib

11:05 a.m. AI at Scale 2: Materials……………………………………………..APS Auditorium


Ian Foster

11:25 a.m. AI at Scale 3: Climate………………………………………………APS Auditorium


Rao Kotamarthi

11:45 a.m. Breakout Session Charge Questions…………………………...APS Auditorium


Rick Stevens

12:00 p.m. Collect Lunch and Proceed to Application Breakout Sessions

Materials, Chemistry and Nanoscience…………………………..TCS 1404/1405


Co-leads: Cynthia Jenks, Tim Germann
Session scribe: Chris Knight

Materials, Chemistry and Nanoscience………………………..…TCS 1406/1407


Co-leads: Steve Plimpton, Pieter Swart
Session scribe: Huihuo Zheng

Imaging and Scientific User Facilities……………………………….APS Gallery


Co-leads: Nicola Ferrier, Shinjae Yoo
Session scribe: Nicholas Schwarz



Imaging and Scientific User Facilities……………………..….APS E1100/E1200
Co-leads: Barry Chen, Christine Sweeney
Session scribe: Justin Wozniak

Environment, Climate and Earth Science…………………………….APS A1100


Co-leads: Rao Kotamarthi, Haruko Wainwright
Session scribe: Scott Collis

Biology and Life Science………………………………………….APS Auditorium


Co-leads: Thomas S. Brettin, Ben Brown
Session scribe: Gyorgy Babnigg

Fundamental Physics………………………………………………….…TCS 1416a


Co-leads: Katrin Heitmann, Paolo Calafiura
Session scribe: Corey Adams

Engineering and Technology……………………………………..Bldg. 241 D172


Co-leads: Santanu Chaudhuri, Stuart Slattery
Session scribe: Shashikant Aithal

Energy (wind, solar, fossil, etc.)……………………………………….TCS 1416b


Co-leads: Mihai Anitescu, Bill Tang
Session scribe: Julie Bessac

2:40 p.m. Breakout Sessions End

3:00 p.m. Breakouts Report Out (10 minutes each)…………………..…APS Auditorium

4:30 p.m. Day One Close-out Summary…………………………………...APS Auditorium


Rick Stevens

5:00 p.m. Adjourn

Tuesday, July 23, 2019


8:30 a.m. Registration………………………………………………………..APS Main Lobby

9:00 a.m. Summary of Day 1 and Day 2 Cross-cut Charge…………....APS Auditorium


Rick Stevens

9:30 a.m. Technological and Cross-cut Breakout Sessions

Optimization / UQ / Statistics……………..……………………..TCS 1404/1405


Co-leads: Stefan Wild, Clayton Webster
Session scribe: Bethany Lusch

Optimization / UQ / Statistics……………..……………………..TCS 1406/1407


Co-leads: Ana Kupresanin, Earl Lawrence
Session scribe: Vishwas Rao

Convergence of Simulation and Data Methods…………………...TCS 1416a


Co-leads: Emil Constantinescu, Frank Alexander
Session scribe: Taylor Childers



Convergence of Simulation and Data Methods…………………...TCS 1416b
Co-leads: Justin Newcomer, Cory Hauck
Session scribe: Hong Zhang

Data Infrastructure and Life Cycle……………………………APS E1100/E1200


Co-leads: Ian Foster, Kerstin Kleese van Dam
Session scribe: Youssef Nashed

Hardware and Architecture…………………………………………APS Gallery


Co-leads: Andrew Chien, Jeffrey Vetter
Session scribe: Murali Emani

Software Environments and Software Research………….APS Auditorium


Co-leads: Prasanna Balaprakash, Devarshi Ghoshal
Session scribe: Tom Uram

Facilities Integration………………………………………………….APS A1100


Co-leads: Michael E. Papka, Arjun Shankar
Session scribe: Ryan Milner

11:40 a.m. Collect Lunch and Proceed to Report Out Session

12:00 p.m. Breakouts Report Out (10 minutes each)…………………..APS Auditorium

1:30 p.m. Town Hall Close-out with Next Steps……………………….APS Auditorium


Rick Stevens

3:00 p.m. Town Hall Concludes



AI for Science Town Hall
Oak Ridge National Laboratory
ORNL Conference Center
August 20–21, 2019

Tuesday, August 20, 2019


8:00 a.m. Registration and Working Continental Breakfast……...….ORNL Conference Center

8:30 a.m. Welcome and Introduction………………………………….…ORNL Conference Center


Jeffrey Nichols

8:35 a.m. ORNL Opening Remarks ………………….…………………..ORNL Conference Center


Jeff Smith

8:45 a.m. DOE HQ Opening Remarks………………………….……...…ORNL Conference Center


Steve Binkley

9:00 a.m. Keynote: AI for Science Opportunities…………………......ORNL Conference Center


David Womble

9:40 a.m. Plenary Session…………………………………………………ORNL Conference Center


Session Chair: Doug Kothe

AI at Scale 1: Microscopy
Sergei Kalinin

AI at Scale 2: Advanced Manufacturing


Tom Kurfess

AI at Scale 3: Health
Georgia Tourassi

10:40 a.m. Breakout Session Charge Questions……………………...ORNL Conference Center


Jeffrey Nichols

11:00 a.m. Collect Lunch and Proceed to Application Breakout Sessions

Materials, Chemistry and Nanoscience………………………………..…Tennessee B


Co-Leads: Bobby G. Sumpter, Markus Eisenbach, Wibe de Jong

Data Collection, Reduction, Analysis, and Imaging for


Scientific User Facilities……………………………….……………….…..Tennessee C
Co-Leads: Hans Christian, Sean Hearne, Christine Sweeney, Jack Wells,
Thomas Proffen

Environment, Climate and Earth Science…………………………….…Tennessee A


Co-Leads: Forrest M. Hoffman, Alison Boyer, Velimir (Monty) V. Vesselinov

Biology and Life Science…………………………………………………….….…Emory


Co-Leads: Julie Mitchell, Jacob Hinkle, Ben Brown

Fundamental Physics………………………………………………….……Cumberland
Co-Leads: Marcel Demarteau, Bronson Messer, Torre Wenaus



Fusion Energy……………………………………..……….Building 5700, Room F234
Co-Leads: Phil Ferguson, Mike Churchill, John Canik

Transportation and Mobility……………………………….5700, CASL Room B302a


Co-Leads: Robert Wagner, Jibo Sanyal, Stanley Young

Advanced Manufacturing…………………………..Building 5600, EVEREST (B228)


Co-Leads: Stuart Slattery, Vincent Paquit, Jim Belak

Energy Generation & Distribution……………….………Building 5700, Room L204


Co-Leads: Teja Kuruganti, Tara Pandya, Mike Sprague

3:00 p.m. Breakout Reports Out (10 minutes each) ......................ORNL Conference Center

4:30 p.m. Day One Close-out Summary..........................................ORNL Conference Center


Jeffrey Nichols

5:00 p.m. Reception .............................................................ORNL Conference Center Lobby

Wednesday, August 21, 2019


8:00 a.m. Registration and Working Continental Breakfast……..ORNL Conference Center

8:30 a.m. Day 2 Welcome ………………………………………….…..ORNL Conference Center


Thomas Zacharia

8:45 a.m. Summary of Day 1 and Day 2 Cross-cut Charge………ORNL Conference Center
Jeffrey Nichols

9:00 a.m. Technological and Cross-cut Breakout Sessions

Numerical Aspects of Learning………………….……...Building 5700, Room F234


Co-Leads: Clayton Webster, Stefan Wild, Sandeep Madireddy

Model Applicability and Characterization………………………………Tennessee B


Co-Leads: Blair Christian, Dan Lu, Justin Newcomer

Decision Support …………………………………………….5700, CASL Room B302a


Co-Leads: Rick Archibald, Tom Potok, Cynthia Phillip

Science Informed Learning ……………………..……………………...…Tennessee C


Co-Leads: Scott Klasky, Cory Hauck, Jeff Hittinger

Software Environments and Software Research….…………………………...Emory


Co-Leads: Robert Patton, Judith Hill, Eric Cyr

Data Infrastructure & Life Cycle…………………………………………..Tennessee A


Co-Leads: Arjun Shankar, Katie Knight, Brad Settlemyer

Hardware and Architecture……..…………………..Building 5600, EVEREST (B228)


Co-Leads: Katie Schuman, Travis Humble, Kenneth Alvin

Facilities Integration and AI Ecosystem …………….............................Cumberland


Co-Leads: James Hack, Michael E. Papka, Inder Monga



12:00 p.m. Collect Lunch and Head Back to Breakout Session

1:00 p.m. Final Report Out from Breakout Session (10 minutes each)

2:30 p.m. Town Hall Close-out with Next Steps…………….…..….ORNL Conference Center
Jeffrey Nichols

3:00 p.m. Town Hall Concludes



AI for Science Town Hall
Lawrence Berkeley National Laboratory
Building 50 Auditorium
September 11–12, 2019

Wednesday, September 11, 2019


7:15 a.m. Registration…………………………………………….Building 50 Auditorium Lobby

7:30 a.m. Networking Breakfast

8:30 a.m. Welcome and Introduction……………………….……….….Building 50 Auditorium


Mike Witherell

8:40 a.m. Opening Remarks……………………………………………...Building 50 Auditorium


Barbara Helland

8:50 a.m. AI for Science Opportunities and Meeting Objectives


Katherine Yelick

9:40 a.m. Break

9:55 a.m. Examples of AI at Scale……………………………………....Building 50 Auditorium


Session Chair: David Brown

AI, Machine Learning, and Experimental Facilities


James Sethian

AI at Scale: Astrophysics
Josh Bloom

AI at Scale in Biology
Ben Brown

11:25 a.m. Breakout Logistics…………………………………………….Building 50 Auditorium


Katherine Yelick

11:30 a.m. Collect Lunch and Proceed to Application Breakout Sessions

11:45 a.m. Application Breakout Sessions

Physical Sciences
Coordinator: Paolo Calafiura

Cosmology and Astrophysics ................................................................... 59-4016


Co-leads: Uros Seljak, Salman Habib

Particle Physics .......................................................................................... 59-4022


Co-leads: Steve Farrell, Ariel Schwartzman

Accelerator Science ................................................................................... 50-4058


Co-leads: Remi Lehe, Daniel Ratner



Fusion.......................................................................................................... 59-4102
Co-leads: CS Chang, Mike Zarnstorff

Energy Sciences
Coordinator: Jonathan Carter

Materials and Chemistry Modeling .............................................................. 66-316


Co-leads: Anubhav Jain, Jeff Hammond, Shyam Dwaraknath

Materials Synthesis and Chemistry ............................................................. 62-203


Co-leads: Carolin Sutter-Fella, Emory Chan, Ethan Crumlin

Light Sources .............................................................................................. 66-Aud


Co-leads: Alex Hexemer, Petrus Zwart, Chris Jacobsen

Electron Microscopy Imaging .................................................................... 67-3111


Co-leads: Mary Scott, Eva Nogales, Marcus Hanwell

Earth and Environmental Sciences


Coordinators: Trevor Keenan, Dipankar Dwivedi

Climate and Carbon ...................................................................................... 84-318


Co-leads: Trevor Keenan, Nathan Urban, Esmond Ng

Subsurface and Geoscience ........................................................................ 74-104


Co-leads: Martin Schoenball, Piotr Zarzycki, Andrew Stack

Water ............................................................................................................. 74-324


Co-leads: Dipankar Dwivedi, Hoshin Gupta, Grey Nearing

Biological and Life Sciences


Coordinator: Ben Brown

Microbiome and Environmental Biology ................................................... 59-3049


Co-leads: Paramvir Dehal, Jennifer Clarke

Synthetic Biology ....................................................................................... 59-3042


Co-leads: Hector Garcia Martin, Peter St. John

Health .......................................................................................................... 59-3025


Co-leads: Kris Bouchard, Tina Hernandez-Boussard

Engineering and Infrastructure


Coordinator: Peter Nugent

Engineering and Manufacturing ..............................................................70A-3377


Co-leads: Stuart Slattery, Tarek Zohdi

Transportation / Mobility ............................................................................ 59-3104


Co-leads: Cy Chan, Timothy Berg

Urban ........................................................................................................... 59-3104


Co-leads: Mary Ann Piette, Peter Graf



SmartGrid .................................................................................................... 59-3104
Co-leads: William Hart, Russell Bent

Computer Science
Coordinator: Katherine Yelick

AI Networking and Computing Facilities................................................... 59-3101


AI for anomaly detection, cybersecurity, networking management, etc.
Co-leads: Mariam Kiran, Nageswara Rao, Lavanya Ramakrishnan

AI for Computer Hardware and Software .................................................. 59-3101


AI for architecture design, programming, etc.
Co-leads: Georgios Michelogiannakis, Koushik Sen

2:30 p.m. Breakout Sessions End .................................................... Building 50 Auditorium

3:00 p.m. Lightning Breakouts Report Out (5 minutes each) ......... Building 50 Auditorium

5:00 p.m. Networking Reception ................................................................ Building 59 Plaza

6:00 p.m. Adjourn

Thursday, September 12, 2019


7:15 a.m. Registration.......................................................... Building 50 Auditorium Lobby

7:30 a.m. Networking Breakfast

8:30 a.m. Summary of Day 1 and Day 2 Cross-cut Charge ........... Building 50 Auditorium
Katherine Yelick

8:45 a.m. Travel to breakout locations

9:00 a.m. Technological and Cross-cut Breakout Sessions

Math Foundations for AI


Coordinator: Tamara Kolda (SNL)

Performance Optimization of Deep Learning .......................................... 59-3054


Numerical and stochastic optimization, network design, hyperparameter
search, network compression, parallelization
Co-leads: Aydin Buluc, Sherry Li

Foundations and Challenges of Deep Learning ...................................... 59-3049


Numerical properties of DL, problems with generalization, understanding
how it works and failure modes, theoretical considerations
Co-leads: Tamara Kolda, Tess Smidt

Opportunities and Foundations of Traditional ML .................................. 59-3025


Regression, random forests, support vector machines, principal
component analysis, clustering, optimization methods
Co-leads: Justin Newcomer, Ali Pinar

Reinforcement/Streaming learning for Decision Support / Control ....... 59-3070
Real-time control and decision-making, incorporating feedback
Co-leads: Mike Mahoney, Prabhat

ML for science problems with limited data .............................................. 59-4101


Bayesian methods, matrix completion, statistical sampling design
Co-leads: Jeremy Templeton, Janine Bennett

Science-informed learning ........................................................................ 59-4102


Physics/chemistry/biology-constrained, data integration
Co-leads: Juliane Mueller, Stefan Wild

Uncertainty Quantification ........................................................................ 59-3104


Co-leads: Habib Najm, David Barajas-Solano

Use of AI with Simulation .............................................................. 50 Auditorium


Co-leads: Marcus Day, Katherine Lewis

Software Environments and Research.............................. 54-Perseverance Hall


How will we write AI software? TensorFlow, PyTorch, etc., and DOE-developed
alternatives or improvements for science? What OS services, workflows, etc. are
needed?
Co-leads: Dmitriy Morozov, Charles Tripp

Data Lifecycle ............................................................................................ 59-3101


Data preparation, data sets, traditional analytics, de-noising, provenance, etc.
Co-leads: Wes Bethel, John Wu

Hardware Technology ............................................................................ 50B-4205


Centralized HPC, Edge Devices…
Co-leads: John Shalf, James Ang

Facilities Infrastructure and Integration; the AI Ecosystem ................ 70A-3377


I/O balance, on-demand computing, science gateways, networking
Co-leads: Inder Monga, Deborah Bard, Michael E. Papka

Cybersecurity and Privacy ........................................................................ 59-4016


Security of Cyber-physical systems, data privacy
Lead: Sean Peisert

11:30 a.m. Collect Lunch and Proceed into Report Out Session ... Building 50 Auditorium

11:45 a.m. Breakouts Report Out (5 minutes each) ......................... Building 50 Auditorium

1:45 p.m. Town Hall Close-out with Next Steps ............................. Building 50 Auditorium
Katherine Yelick

2:00 p.m. Town Hall Concludes

AI for Science Town Hall
Washington, DC
Renaissance DC - Downtown Hotel
October 22–23, 2019

Tuesday, October 22, 2019


7:30 a.m. Registration and Working Continental Breakfast………...Ballroom Level Lobby

8:30 a.m. Welcome and Introduction……………………………………Grand Ballroom North


Barbara Helland

8:45 a.m. DOE HQ Opening Remarks…………………………………...Grand Ballroom North


Chris Fall

9:00 a.m. Summary from 3 Town Halls.…………….…………………..Grand Ballroom North


Katherine Yelick, Rick Stevens, Jeffrey Nichols

10:00 a.m. Break

10:30 a.m. How Significant will AI be for the Energy Sector?.……….Grand Ballroom North
Quantifying progress and outlining signposts
Claire Curry, Bloomberg New Energy Finance

11:15 a.m. AI Research Update…………………………………………..Grand Ballroom North
What's Going On Around the World and Our Research Plans for Studying AI for Science
Earl Joseph, Hyperion Research

11:45 a.m. Break for Working Lunch


Networking and Preparation for Breakout Sessions

1:30 p.m. Breakout Sessions

Machine Learning Foundations and Open Problems……Grand Ballroom North


Co-Leads: David Womble, Stefan Wild, Prabhat

Facilities Integration and AI Ecosystem…………………………...Meeting Room 3


Co-Leads: James Hack, Michael E. Papka, Sudip Dosanjh, Inder Monga

Earth and Environmental Sciences………………………….……..Meeting Room 6


Co-Leads: Forrest M. Hoffman, Rao Kotamarthi, Haruko Wainwright

Chemistry, Materials, and Nanoscience……………………………….Meeting Room 7


Co-Leads: Cynthia Jenks, Bert de Jong

Engineering and Manufacturing…………………………………….Meeting Room 8


Co-Leads: John Turner, Santanu Chaudhuri, Peter Nugent

Nuclear Physics……………………………………………………......Meeting Room 9


Co-Leads: David Dean, Zein-Eddine Meziani, Brian Quiter

Data Life Cycle and Infrastructure…………………………………Meeting Room 10


Co-Leads: Arjun Shankar, Nicola Ferrier, Wes Bethel

Support for AI for Experimental Facilities……………………….Meeting Room 16
Co-Leads: Kenneth Herwig, Dogan Gursoy, Petrus Zwart

3:15 p.m. Break

3:30 p.m. Startup Innovations in AI Hardware……………………..….Grand Ballroom North


Moderator: Rick Stevens
Andy Hock
Kunle Olukotun
Dale Southard

4:30 p.m. Breakout Summary………………………………..…………..Grand Ballroom North


Valerie Taylor, Arthur Barney Maccabe, David Brown

5:00 p.m. Close-out for the Day………………………………….………Grand Ballroom North


Barbara Helland

Wednesday, October 23, 2019


7:30 a.m. Registration and Working Continental Breakfast

8:30 a.m. Day 2 Welcome……………………………………………….Grand Ballroom North


Barbara Helland

8:45 a.m. Breakouts


AI for Computer Science……………………………………Grand Ballroom North
Co-Leads: Nageswara Rao, Prasanna Balaprakash, Lavanya Ramakrishnan

Biology and Life Sciences…………………………………..……..Meeting Room 3


Co-Leads: Georgia Tourassi, Thomas S. Brettin, Ben Brown

High Energy Physics…………………………………………..…...Meeting Room 6


Co-Leads: Salman Habib, Paolo Calafiura

Smart Energy Infrastructure………………………………………Meeting Room 8


Co-Leads: Teja Kuruganti, Mihai Anitescu, Tianzhen Hong

Software Environments and Software Research………………Meeting Room 9


Co-Leads: Judith Hill, Rob Ross, Katerina Antypas

Support for AI at the Edge………………………………………..Meeting Room 10


Co-Leads: Steven Young, Pete Beckman, John Wu

Hardware Architectures………………………………………......Meeting Room 16


Co-Leads: Jeffrey Vetter, Andrew Chien, John Shalf

10:15 a.m. Break

10:30 a.m. DOE Headquarters Remarks…………………………..……Grand Ballroom North


Paul Dabbar

10:45 a.m. Cross Agency AI Strategies…………………………..…….Grand Ballroom North
Moderator: Lynne Parker (OSTP)
DOE – Steve Binkley
DOD NSA Research – Adam Cardinal-Stakenas
NSF - Erwin Gianchandani
NIH - Susan Gregurick

11:45 a.m. Breakout Summary………………………………….………..Grand Ballroom North


Valerie Taylor, Arthur Barney Maccabe, David Brown

12:15 p.m. AI Killer Applications…………………………………….…..Grand Ballroom North


Rick Stevens, Katherine Yelick, Jeffrey Nichols

1:00 p.m. Wrap Up…………………………………………………….…..Grand Ballroom North


Barbara Helland

1:15 p.m. Working Lunch………………………………………….…….Ballroom Level Lobby


Networking and Coordination of Town Hall Report

AC. Combined Town Hall Registrants
First Name Last Name Institution
Brook Abegaz Loyola University of Chicago
Gina Adam George Washington University
Corey Adams Argonne National Laboratory
Marc Adams NVIDIA Corporation
Ryan Adamson Oak Ridge National Laboratory
Adetokunbo Adedoyin Los Alamos National Laboratory
Vivek Agarwal Idaho National Laboratory
Greeshma Agasthya Oak Ridge National Laboratory
Jeffery Aguiar Idaho National Laboratory
Lars Ahlfors Microsoft Corporation
James Ahrens Los Alamos National Laboratory
Sachin Ahuja CNH Industrial
James Aimone Sandia National Laboratories
Shashi Aithal Argonne National Laboratory
Adeel Akram Uppsala University
Maksudul Alam Oak Ridge National Laboratory
Frank Alexander Brookhaven National Laboratory
Boian Alexandrov Los Alamos National Laboratory
Yuri Alexeev Argonne National Laboratory
Stephanie Allport Bloomberg
Srikanth Allu Oak Ridge National Laboratory
Jeff Alstott Intelligence Advanced Research Projects Activity
Ilkay Altintas University of California, San Diego
Kenneth Alvin Sandia National Laboratories
James Amundson Fermi National Accelerator Laboratory
Valentine Anantharaj Oak Ridge National Laboratory
James Ang Pacific Northwest National Laboratory
Mihai Anitescu Argonne National Laboratory
Dionysios Antonopoulos Argonne National Laboratory
Katerina Antypas Lawrence Berkeley National Laboratory
Chid Apte IBM Research
Rick Archibald Oak Ridge National Laboratory
Whitney Armstrong Argonne National Laboratory
Richard Arthur General Electric Research
Srinivasan Arunajatesan Sandia National Laboratories
Paul Atzberger University of California, Santa Barbara
Brian Austin Lawrence Berkeley National Laboratory
Ariful Azad Indiana University
Gyorgy Babnigg Argonne National Laboratory
Tyler Backman Lawrence Berkeley National Laboratory
Drew Baden Department of Energy, High Energy Physics

David Bader New Jersey Institute of Technology
Jermon Bafaty Department of Energy, Artificial Intelligence and Technology Office
Zhe Bai Lawrence Berkeley National Laboratory
Ray Bair Argonne National Laboratory
Vamshi Balanaga Sandia National Laboratory/UC Berkeley
Prasanna Balaprakash Argonne National Laboratory
Jan Balewski Lawrence Berkeley National Laboratory
Mark Bandstra Lawrence Berkeley National Laboratory
Feng Bao Florida State University
David Barajas-Solano Pacific Northwest National Laboratory
Giuseppe Barbalinardo University of California, Davis
Deborah Bard Lawrence Berkeley National Laboratory
Jaydeep Bardhan GlaxoSmithKline
Ashley Barker Oak Ridge National Laboratory
Richard Barnes Lawrence Berkeley National Laboratory
Kipton Barros Los Alamos National Laboratory
Edward Barry Argonne National Laboratory
Robert Bartolo Transformational Liaisons (TRL), LLC
Bipul Barua Argonne National Laboratory
Jennifer Bauer National Energy Technology Laboratory
Alex Bayen University of California, Berkeley
Matthew Becker Argonne National Laboratory
Pete Beckman Argonne National Laboratory
Bo Begole AMD Research
James Belak Lawrence Livermore National Laboratory
Matt Bement Los Alamos National Laboratory
Douglas Benjamin Argonne National Laboratory
Janine Bennett Sandia National Laboratories
Russell Bent Los Alamos National Laboratory
Timothy Berg Sandia National Laboratories
Joshua Bergerson Argonne National Laboratory
Anne Berres Oak Ridge National Laboratory
Michael Berube Department of Energy
Julie Bessac Argonne National Laboratory
Wes Bethel Lawrence Berkeley National Laboratory
Budhu Bhaduri Oak Ridge National Laboratory
Wahid Bhimji Lawrence Berkeley National Laboratory
Debsindhu Bhowmik Oak Ridge National Laboratory
Sirui Bi Oak Ridge Institute for Science and Education
Tekin Bicer Argonne National Laboratory
Sandra Biedron Element Aero
Hassina Bilheux Oak Ridge National Laboratory
Jean Bilheux Oak Ridge National Laboratory
Jay Jay Billings Oak Ridge National Laboratory

Adam Bingston Oak Ridge National Laboratory
Steve Binkley Department of Energy
Jens Birkholzer Lawrence Berkeley National Laboratory
Larry Birnbaum Northwestern University
Ayan Biswas Los Alamos National Laboratory
Laura Biven Department of Energy, Advanced Scientific Computing Research
Rocco Blais National Intelligence University
Arthur Bland Oak Ridge National Laboratory
Willem Blokland Oak Ridge National Laboratory
Josh Bloom Lawrence Berkeley National Laboratory
Swen Boehm Oak Ridge National Laboratory
Amber Boehnlein Jefferson Laboratory
John Boger Department of Energy
Dorian Bohler SLAC National Accelerator Laboratory
Trudy Bolin University of Wisconsin, Milwaukee
Lynn Borkon Frederick National Laboratory
Nikolay Borodinov Oak Ridge National Laboratory
Kristofer Bouchard Lawrence Berkeley National Laboratory
Charles Bouman Purdue University
Alison Boyer Oak Ridge National Laboratory
Mark Boyer Princeton Plasma Physics Laboratory
Selen Bozkurt Stanford University
Tom Brady Dell Technologies
Jim Brandt Sandia National Laboratories
Justin H. S. Breaux Argonne National Laboratory
Peer-Timo Bremer Lawrence Livermore National Laboratory
Thomas S. Brettin Argonne National Laboratory
Ron Brightwell Sandia National Laboratories
Michael Brim Oak Ridge National Laboratory
Grant Bromhal National Energy Technology Laboratory
David Bross Argonne National Laboratory
Ben Brown Department of Energy, Advanced Scientific Computing Research
David Brown Lawrence Berkeley National Laboratory
J. Ben Brown Lawrence Berkeley National Laboratory
Acacia Brunett Argonne National Laboratory
Mark Buckner Oak Ridge National Laboratory
Aydin Buluc Lawrence Berkeley National Laboratory
Keith Burghardt University of Southern California
Shawn Burns Sandia National Laboratories
Ralph Butler Argonne National Laboratory/Middle Tennessee State University
Suren Byna Lawrence Berkeley National Laboratory
John Byrd Argonne National Laboratory
Viveck Cadambe Pennsylvania State University

Helen Cademartori Lawrence Berkeley National Laboratory
Hao Cai Argonne National Laboratory
Zhonghou Cai Argonne National Laboratory
Paolo Calafiura Lawrence Berkeley National Laboratory
Kelly Callison Information International Associates, Inc
John Canik Oak Ridge National Laboratory
Shane Canon Lawrence Berkeley National Laboratory
Yue Cao Argonne National Laboratory
Jian Cao Northwestern University
Adam Cardinal-Stakenas National Security Agency, Research
Suma Cardwell Sandia National Laboratories
Altaf Carim Department of Energy, High Energy Physics
Richard Carlson Department of Energy
Jonathan Carter Lawrence Berkeley National Laboratory
Emily Casleton Los Alamos National Laboratory
Vic Castillo Lawrence Livermore National Laboratory
Charlie Catlett Argonne National Laboratory
Christine Chalk Department of Energy
Maria Chan Argonne National Laboratory
Emory Chan Lawrence Berkeley National Laboratory
Cy Chan Lawrence Berkeley National Laboratory
Hau Chan University of Nebraska, Lincoln
Jin Chang California Institute of Technology
Shing Chang Kansas State University
Hang Chang Lawrence Berkeley National Laboratory
CS (Choongseok) Chang Princeton Plasma Physics Laboratory
Lali Chatterjee Department of Energy, High Energy Physics
Arghya Chatterjee Oak Ridge National Laboratory
Santanu Chaudhuri Argonne National Laboratory
Julio Jonas Chaves Montero Argonne National Laboratory
Saurabh Chawdhary Argonne National Laboratory
Weiyang Chen Argonne National Laboratory
Jinsong Chen Lawrence Berkeley National Laboratory
Barry Chen Lawrence Livermore National Laboratory
Wei Chen Northwestern University
Jieyang Chen Oak Ridge National Laboratory
Jacqueline Chen Sandia National Laboratories
Jian Chen The Ohio State University/Interactive Visual Computing Lab
Shunda Chen University of California, Davis
Alvin Cheung University of California, Berkeley
Andrew Chien Argonne National Laboratory
Taylor Childers Argonne National Laboratory

Eric Chisolm Department of Energy, National Nuclear Security Administration
Jong Youl Choi Oak Ridge National Laboratory
Swati Choudhary Calysta
Alok Choudhary Northwestern University
Souma Chowdhury University at Buffalo
Marshall Choy SambaNova Systems
Hans Christen Oak Ridge National Laboratory
Blair Christian Oak Ridge National Laboratory
Giri Chukkapalli NVIDIA Corporation
Sudheer Chunduri Argonne National Laboratory
Michael Churchill Princeton Plasma Physics Laboratory
Jennifer Clarke University of Nebraska
Ian Cloet Argonne National Laboratory
Daniel Clouse Department of Defense
Ryan Coffee SLAC National Accelerator Laboratory
Susan Coghlan Argonne National Laboratory
Mark Coletti Oak Ridge National Laboratory
Jim Collins Argonne National Laboratory
William Collins Lawrence Berkeley National Laboratory
Scott Collis Argonne National Laboratory
Samuel Collis Sandia National Laboratories
Guojing Cong IBM Research
Emil Constantinescu Argonne National Laboratory
Simon Corrodi Argonne National Laboratory
Andrea Cortis Belmont Technology
Chip Cotton General Electric Research
Sarah Cousineau Oak Ridge National Laboratory
Stephen Crago University of Southern California, ISI
Claire Cramer Department of Energy
Valentino Crespi University of Southern California, ISI
Jody Crisp Oak Ridge Institute for Science and Education
Ethan Crumlin Lawrence Berkeley National Laboratory
Claire Curry Bloomberg
Matthew Curry Sandia National Laboratories
Larry Curtiss Argonne National Laboratory
Christine Custis NewPearl, Inc.
Christine Cutillo National Institutes of Health, NCATS
Eric Cyr Sandia National Laboratories
Ed D’Azevedo Oak Ridge National Laboratory
Paul Dabbar Department of Energy
Jamison Daniel Oak Ridge National Laboratory
Payel Das IBM Research
Debolina Dasgupta Argonne National Laboratory
Ganesh Dasika AMD Research

Warren Davis Sandia National Laboratories
Marcus Day Lawrence Berkeley National Laboratory
Maarten de Hoop Rice University
Wibe de Jong Lawrence Berkeley National Laboratory
Cees de Laat Lawrence Berkeley National Laboratory
Sebastian De Pascuale Oak Ridge National Laboratory
David Dean Oak Ridge National Laboratory
Victor Decaria Oak Ridge National Laboratory
Gemechis Degaga Oak Ridge National Laboratory
Anthony DeGennaro Brookhaven National Laboratory
Paramvir Dehal Lawrence Berkeley National Laboratory
Payman Dehghanian George Washington University
Diego del Castillo Negrete Oak Ridge National Laboratory
Phillip DeLeon New Mexico State University
Marcel Demarteau Oak Ridge National Laboratory
James Demmel University of California, Berkeley
Patric Den Hartog Argonne National Laboratory
Anton Dereventsov Oak Ridge National Laboratory
Riccardo Dettori University of California, Davis
Sheng Di Argonne National Laboratory
Zichao Wendy Di Argonne National Laboratory
Alexa Di Paolo Bloomberg
Lori Diachin Lawrence Livermore National Laboratory
Jorge Diaz Cruz University of New Mexico/SLAC
Emily Dietrich Argonne National Laboratory
Spiros Dimolitsas Georgetown University
Chao Ding Lawrence Berkeley National Laboratory
Nan Ding Lawrence Berkeley National Laboratory
Remi Dingreville Sandia National Laboratories
Stanley Dodds University of Hawaii/Institute for Astronomy
Emily Donahue Sandia National Laboratories
Sijia Dong Argonne National Laboratory
Ge Dong Princeton Plasma Physics Laboratory
Jack Dongarra University of Tennessee
Jana Doppa Washington State University
Max Dornfest Lawrence Berkeley National Laboratory
Sudip Dosanjh Lawrence Berkeley National Laboratory
Mathieu Doucet Oak Ridge National Laboratory
Ye Duan University of Missouri
Javier Duarte Fermi National Accelerator Laboratory
Nicolas Dube Hewlett Packard Enterprise
Vincent Dumont Lawrence Berkeley National Laboratory
Daniel Dunlavy Sandia National Laboratories
Soumya Dutta Los Alamos National Laboratory
Shyam Dwaraknath Lawrence Berkeley National Laboratory
Dipankar Dwivedi Lawrence Berkeley National Laboratory

Carol Eddy-Dilek Savannah River National Laboratory
Romain Egele Argonne National Laboratory
Markus Eisenbach Oak Ridge National Laboratory
Muammar El Khatib Lawrence Berkeley National Laboratory
V. Daniel Elvira Fermi National Accelerator Laboratory
Wael Elwasif Oak Ridge National Laboratory
Murali Emani Argonne National Laboratory
Sujata Emani Department of Energy, BER
Eirik Endeve Oak Ridge National Laboratory
Christian Engelmann Oak Ridge National Laboratory
Sarah Eno University of Maryland
Peter Ercius Lawrence Berkeley National Laboratory
Ali Erdemir Argonne National Laboratory
Stephane Ethier Princeton Plasma Physics Laboratory
David Etim Department of Energy, National Nuclear Security Administration
Kate Evans Oak Ridge National Laboratory
Tom Evans Oak Ridge National Laboratory
VJ Ewing Oak Ridge National Laboratory
Farah Fahim Fermi National Accelerator Laboratory
Fariba Fahroo Air Force Office of Scientific Research
Chris Fall Department of Energy
George Fann Oak Ridge National Laboratory
Paolo Faraboschi Hewlett Packard Enterprise
Amro Farid Dartmouth College
Steven Farrell Lawrence Berkeley National Laboratory
Pooyan Fazli San Francisco State University
Tingzhou Fei Argonne National Laboratory
Frank Felder Rutgers University
Yan Feng Argonne National Laboratory
Wu Feng Virginia Tech
Phil Ferguson Oak Ridge National Laboratory
Nicola Ferrier Argonne National Laboratory
Emily Fetter Boston University
Hal Finkel Argonne National Laboratory
Nicole Fisk Cray, Inc.
Mary Fitzpatrick Argonne National Laboratory
Aaron Fluitt Argonne National Laboratory
Thomas Flynn Brookhaven National Laboratory
David Fobes Los Alamos National Laboratory
Fernanda Foertter NVIDIA Corporation
Ian Foster Argonne National Laboratory
Guillaume Fouche Bloomberg
Geoffrey Fox Indiana University
Kelly Gaither The University of Texas at Austin
Alexey Galda Argonne National Laboratory

Alfredo Galindo-Uribarri Oak Ridge National Laboratory
Jack Gallant University of California, Berkeley
Yu Gan University of Alabama
Baskar Ganapathysubramanian Iowa State University
Rishi Ganeriwala Lawrence Livermore National Laboratory
Hector Garcia Martin Lawrence Berkeley National Laboratory
Marta Garcia Martinez Argonne National Laboratory
Arti Garg Cray, Inc.
Krishna Garikipati University of Michigan
Christopher Garland Argonne National Laboratory
Andrew Gaspar Los Alamos National Laboratory
Gerald Geernaert Department of Energy
R. Stuart Geiger University of California, Berkeley
Al Geist Oak Ridge National Laboratory
Ann Gentile Sandia National Laboratories
Cole Gentry Oak Ridge National Laboratory
Martina Gerbino Argonne National Laboratory
Tim Germann Los Alamos National Laboratory
Berk Geveci Kitware, Inc.
Mehran Ghafari University of Tennessee at Chattanooga
Devarshi Ghoshal Lawrence Berkeley National Laboratory
Erwin Gianchandani National Science Foundation
Tom Gibbs NVIDIA Corporation
Scott Gibson Oak Ridge National Laboratory
Michael Giering United Technologies/Pratt & Whitney
Roscoe Giles Boston University
Roberto Gioiosa Pacific Northwest National Laboratory
Shawn Gleason Oak Ridge National Laboratory
David Gleich Purdue University
Sergei Gleyzer University of Alabama/Fermilab
Jennifer Glore SambaNova Systems
Eric Goodman Sandia National Laboratories
Daniel Gopman National Institute of Standards and Technology
Ben Gould Dell EMC
Marco Govoni Argonne National Laboratory
Peter Graf National Renewable Energy Laboratory
Carlo Graziani Argonne National Laboratory
Emily Greenspan National Cancer Institute
Susan Gregurick National Institutes of Health
Annette Greiner Lawrence Berkeley National Laboratory
Michael Grosskopf Los Alamos National Laboratory
Allan Grosvenor Microsurgeonbot Inc.
Ray Grout National Renewable Energy Laboratory
Taylor Groves Lawrence Berkeley National Laboratory
Amy Gryshuk Lawrence Livermore National Laboratory

Qiang Guan Kent State University/Los Alamos National Laboratory
Mamikon Guillan Sandia National Laboratories
Donna Guillen Idaho National Laboratory
Max Gunzburger Oak Ridge National Laboratory
Hanqi Guo Argonne National Laboratory
Haobo Guo University of Tennessee at Chattanooga
Chin Guok Lawrence Berkeley National Laboratory
Geetika Gupta NVIDIA Corporation
Hoshin Gupta University of Arizona
Dogan Gursoy Argonne National Laboratory
Tejas Guruswamy Argonne National Laboratory
Benjamin Gutierrez-Garcia Argonne National Laboratory
Salman Habib Argonne National Laboratory
James Hack Oak Ridge National Laboratory
Kawtar Hafidi Argonne National Laboratory
Aric Hagberg Los Alamos National Laboratory
Shima Hajimirza Texas A&M University
Mahantesh Halappanavar Pacific Northwest National Laboratory
Jason Hales Idaho National Laboratory
Charlotte Haley Argonne National Laboratory
Scot Halverson Los Alamos National Laboratory
Kathleen Hamilton Oak Ridge National Laboratory
Jeff Hammond Intel Corporation
Steve Hammond National Renewable Energy Laboratory
T. Yong Han Lawrence Livermore National Laboratory
Briana Hanafin Accenture
Marcus Hanwell Kitware, Inc.
Zhao Hao Lawrence Berkeley National Laboratory
Bruce Hardy Savannah River National Laboratory
Rachel Harken Oak Ridge National Laboratory
Kevin Harms Argonne National Laboratory
Peter Harrington Lawrence Berkeley National Laboratory
William Hart Sandia National Laboratories
Cory Hauck Oak Ridge National Laboratory
Nancy Hayden Sandia National Laboratories
Andrew Hearin Argonne National Laboratory
Sean Hearne Oak Ridge National Laboratory
Matt Heavner Los Alamos National Laboratory
Alexander Heifetz Argonne National Laboratory
Nils Heinonen Argonne National Laboratory
Olle Heinonen Argonne National Laboratory
Alan Heirich SLAC National Accelerator Laboratory
Katrin Heitmann Argonne National Laboratory
Barbara Helland Department of Energy, Office of Science
Bruce Hendrickson Lawrence Livermore National Laboratory

Nicolas Hengartner Los Alamos National Laboratory
Marc Henry de Frahan National Renewable Energy Laboratory
Tina Hernandez-Boussard Stanford University
Michael Heroux Sandia National Laboratories
Kenneth Herwig Oak Ridge National Laboratory
Joel Hestness Cerebras Systems
Alexander Hexemer Lawrence Berkeley National Laboratory
Tony Hey SciML Group, Rutherford Appleton Lab, UK
Judith Hill Oak Ridge National Laboratory
Lindsey Hillesheim Cray, Inc.
Jacob Hinkle Oak Ridge National Laboratory
Jeffrey Hittinger Lawrence Livermore National Laboratory
Justin Hnilo Department of Energy
Phay Ho Argonne National Laboratory
Thuc Hoang Department of Energy, National Nuclear Security Administration
Andy Hock Cerebras Systems
Torsten Hoefler ETH Zurich
Forrest M. Hoffman Oak Ridge National Laboratory
Sabine Hollatz Stanford University
Brian Homerding Argonne National Laboratory
Vasant Honavar Pennsylvania State University
Tianzhen Hong Lawrence Berkeley National Laboratory
Walter Hopkins Argonne National Laboratory
Chet Hopp Lawrence Berkeley National Laboratory
Paul Hovland Argonne National Laboratory
Stephan Hoyer Google Research
Elizabeth Hsu National Cancer Institute
Lucy Hsu National Institutes of Health, NHLBI
Michael Hu Argonne National Laboratory
Rui Hu Argonne National Laboratory
Bin Hu Los Alamos National Laboratory
Xiang Huang Argonne National Laboratory
Yu Huang Argonne National Laboratory
Xiaobiao Huang SLAC National Accelerator Laboratory
Eliu Huerta University of Illinois at Urbana-Champaign
Ashley Huff Oak Ridge National Laboratory
David Hughes Oak Ridge National Laboratory
Travis Humble Oak Ridge National Laboratory
Sean Hurley California Polytechnic State University
Lorraine Hwang University of California, Davis
Hoon Hwangbo University of Tennessee
Costin Iancu Lawrence Berkeley National Laboratory
Khaled Ibrahim Lawrence Berkeley National Laboratory
Matthew Igel University of California, Davis
Gabriel Ilevbare Idaho National Laboratory

Nwike Iloeje Argonne National Laboratory
Ilse C.F. Ipsen North Carolina State University
Ehsan Sabri Islam Argonne National Laboratory
Robert Jackson Argonne National Laboratory
Robert Jacob Argonne National Laboratory
Chris Jacobsen Argonne National Laboratory/Northwestern University
Dan Jacobson Oak Ridge National Laboratory
Anubhav Jain Lawrence Berkeley National Laboratory
Ralph James Savannah River National Laboratory
Kathy Jang University of California, Berkeley
Michael Jarrett Argonne National Laboratory
Cynthia Jenks Argonne National Laboratory
Elise Jennings Argonne National Laboratory
Vince Jesaitis Arm Inc
Shantenu Jha Brookhaven National Laboratory
Yi Jiang Argonne National Laboratory
Zhenhua Jiang University of Dayton Research Institute
Meng Jiang University of Notre Dame
Xiao-Yong Jin Argonne National Laboratory
Mingzhou Jin University of Tennessee
Marcin Joachimiak Lawrence Berkeley National Laboratory
Eugene John University of Texas at San Antonio
Fred Johnson Department of Energy, Retired
Travis Johnston Oak Ridge National Laboratory
Eric Jonas University of Chicago
Gregory Jones Oak Ridge National Laboratory
Katie Jones Oak Ridge National Laboratory
Scott Jones Oak Ridge National Laboratory
Terry Jones Oak Ridge National Laboratory
Doug Joseph Arm Inc
Renu Joseph Department of Energy
Earl Joseph Hyperion Research
Wayne Joubert Oak Ridge National Laboratory
Gary Jung Lawrence Berkeley National Laboratory
Andrew Kail Savannah River National Laboratory
Rajiv Kalia University of Southern California
Sergei Kalinin Oak Ridge National Laboratory
Mingon Kang University of Nevada, Las Vegas
Ramakrishnan Kannan Oak Ridge National Laboratory
Mahmut Karakaya University of Central Arkansas
Ulas Karaoz Lawrence Berkeley National Laboratory
Ian Karlin Lawrence Livermore National Laboratory
Alisha Kasam-Griffith Argonne National Laboratory
Karthik Kashinath Lawrence Berkeley National Laboratory
Aggelos Katsaggelos Northwestern University

Kimberly Kaufeld Los Alamos National Laboratory
Brian Kaul Oak Ridge National Laboratory
Aditya Kaushal Bank of Montreal
Henry Kautz National Science Foundation, CISE
Trevor Keenan Lawrence Berkeley National Laboratory
Ken Kemner Argonne National Laboratory
Kelly Kessler Bloomberg
Rajkumar Kettimuthu Argonne National Laboratory
Foaad Khosmood California Polytechnic State University
Kathy Kincade Lawrence Berkeley National Laboratory
Ryan King National Renewable Energy Laboratory
Jeffery Kinnison Argonne National Laboratory/University of Notre Dame
Mariam Kiran Lawrence Berkeley National Laboratory
Uma Klaassen Oak Ridge National Laboratory
Hilda Klasky Oak Ridge National Laboratory
Scott Klasky Oak Ridge National Laboratory
Kerstin Kleese van Dam Brookhaven National Laboratory
Stanley Klein University of California, Berkeley
Tim Kneafsey Lawrence Berkeley National Laboratory
Christopher Knight Argonne National Laboratory
Katie Knight Oak Ridge National Laboratory
Ryan Knox Lawrence Berkeley National Laboratory
Tamara Kolda Sandia National Laboratories
Egemen Kolemen Princeton University
Kadidia Konate Lawrence Berkeley National Laboratory
Urs Koster Cerebras Systems
Rao Kotamarthi Argonne National Laboratory
Olivera Kotevska Oak Ridge National Laboratory
Doug Kothe Oak Ridge National Laboratory
John Koudelka Idaho National Laboratory
William Kramer University of Illinois/NCSA
James Kress Oak Ridge National Laboratory
Harinarayan Krishnan Lawrence Berkeley National Laboratory
Ralph Kube Princeton Plasma Physics Laboratory
Paul Kuberry Sandia National Laboratories
Michelle Kuchera Davidson College
Suhas Kumar Hewlett Packard Laboratory
Dinesh Kumar Lawrence Berkeley National Laboratory
Jitu Kumar Oak Ridge National Laboratory
Praveen Kumar University of Illinois
Vinod Kumar University of Texas at El Paso/Calysta Inc., Menlo Park
Ana Kupresanin Lawrence Livermore National Laboratory
Tom Kurfess Oak Ridge National Laboratory
Teja Kuruganti Oak Ridge National Laboratory

Joshua Ladau Lawrence Berkeley National Laboratory
Yue Shi Lai Lawrence Berkeley National Laboratory
M. Paul Laiu Oak Ridge National Laboratory
Matthew Lanctot Department of Energy, Office of Science
TJ Lane SLAC National Accelerator Laboratory
Michael Lang Los Alamos National Laboratory
James Laros Sandia National Laboratories
Jeffrey Larson Argonne National Laboratory
Randall Laviolette Department of Energy, Advanced Scientific Computing Research
Earl Lawrence Los Alamos National Laboratory
Craig Lawrence University of Maryland
Nam Le Johns Hopkins University Applied Physics Lab
Jacqueline Le Moigne NASA Earth Science Technology Office
Damien Lebrun-Grandie Oak Ridge National Laboratory
Timothy Ledlow Missile Defense Agency
Eungje Lee Argonne National Laboratory
Steven Lee Department of Energy, Advanced Scientific Computing Research
Victor Lee Intel Corporation
Seyong Lee Oak Ridge National Laboratory
Ti Leggett Argonne National Laboratory
Remi Lehe Lawrence Berkeley National Laboratory
Margaret Lentz Department of Energy, Artificial Intelligence and Technology Office
Vitus Leung Sandia National Laboratories
Dawn Levy Oak Ridge National Laboratory
Katherine Lewis Lawrence Livermore National Laboratory
Katie Lewis Lawrence Livermore National Laboratory
Sven Leyffer Argonne National Laboratory
Meimei Li Argonne National Laboratory
Ying Li Argonne National Laboratory
Sherry Li Lawrence Berkeley National Laboratory
Ying Wai Li Los Alamos National Laboratory
Zhaojian Li Michigan State University
Ang Li Pacific Northwest National Laboratory
Dong Li University of California, Merced
Bo Li University of Illinois at Urbana-Champaign
Dmitry Liakh Oak Ridge National Laboratory
Dong Liang University of Maryland Center for Environmental Science
Chen Liao Argonne National Laboratory
Sean Liddick Michigan State University, NSCL
Meifeng Lin Brookhaven National Laboratory
Yuewei Lin Brookhaven National Laboratory

Youzuo Lin Los Alamos National Laboratory
Eric Lin National Institute of Standards and Technology
Guang Lin Purdue University
Zhihong Lin University of California, Irvine
Travis Linderman Innovation DuPage - NIU/COD
Robert Link Pacific Northwest National Laboratory
Yung Liu Argonne National Laboratory
Cong Liu Argonne National Laboratory
Zhengchun Liu Argonne National Laboratory
Miaoyuan Liu Fermi National Accelerator Laboratory
Yan Liu Oak Ridge National Laboratory
Frank Liu Oak Ridge National Laboratory/CSMD
Jing Liu Stanford University
Bill Livezey Microsoft Corporation
Li-Ta Lo Los Alamos National Laboratory
Jeremy Logan Oak Ridge National Laboratory
Wolfgang Losert University of Maryland, College Park
Pavel Lougovski Oak Ridge National Laboratory
Dan Lu Oak Ridge National Laboratory
Xiaobin Lu Oak Ridge National Laboratory
Zarija Lukic Lawrence Berkeley National Laboratory
Dalton Lunga Oak Ridge National Laboratory
Feng Luo Clemson University
Lixiang Luo IBM Research
Xuan Luo Lawrence Berkeley National Laboratory
Bethany Lusch Argonne National Laboratory
Piotr Luszczek University of Tennessee
Joseph Lykken Fermi National Accelerator Laboratory
Steven Lyness Cray, Inc.
Adam Lyon Fermi National Accelerator Laboratory
Charles Macal Argonne National Laboratory
Arthur Barney Maccabe Oak Ridge National Laboratory
Michael MacNeil Lawrence Berkeley National Laboratory
Siddharth Maddali Argonne National Laboratory
Ravi Madduri Argonne National Laboratory
Sandeep Madireddy Argonne National Laboratory
Ramana Madupu Department of Energy
Gina Magnotti Argonne National Laboratory
Ketan Maheshwari Oak Ridge National Laboratory
Michael Mahoney University of California, Berkeley
Michael Majurski National Institute of Standards and Technology
Nicholas Malaya Advanced Micro Devices Company
Carlos Maltzahn University of California, Santa Cruz
Andrea Manning Argonne National Laboratory

Arun Kumar Mannodi Kanakkithodi Argonne National Laboratory
Jiafu Mao Oak Ridge National Laboratory
Don March Oak Ridge National Laboratory
Phil Markham Southern Company
David Martin Argonne National Laboratory
Victoria Martin Argonne National Laboratory
Daniel Martin Lawrence Berkeley National Laboratory
Mark Martin Oak Ridge National Laboratory
Carianne Martinez Sandia National Laboratories
Ghoncheh Mashayekhi University of Wisconsin, Milwaukee
Zachary Matheson Department of Energy, National Nuclear Security Administration
Michael Matheson Oak Ridge National Laboratory
Romit Maulik Argonne National Laboratory
Yury Maximov Los Alamos National Laboratory
Ed May Argonne National Laboratory
Jessica Mazerik National Institutes of Health
Matt McConnell Dell EMC
Dana McCoskey Water Power Tech Office
Dylan McDowell Idaho National Laboratory
Cynthia McMurray Lawrence Berkeley National Laboratory
Hugh Medal University of Tennessee
Shafigh Mehraeen University of Illinois at Chicago
Apurva Mehta SLAC National Accelerator Laboratory
Kshitij Mehta Oak Ridge National Laboratory
Veronica Melesse Vergara Oak Ridge National Laboratory
Matt Menickelly Argonne National Laboratory
Bronson Messer Oak Ridge National Laboratory
Zein-Eddine Meziani Argonne National Laboratory
Georgios Michelogiannakis Lawrence Berkeley National Laboratory
Mark Miller Lawrence Livermore National Laboratory
David Miller National Energy Technology Laboratory
Richard Mills Argonne National Laboratory
Ryan Milner Argonne National Laboratory
Michael Minion Lawrence Berkeley National Laboratory
Sandeep Miryala Fermi National Accelerator Laboratory
Konstantin Mischaikow Rutgers University
Umakant Mishra Argonne National Laboratory
Utkarsh Mital Lawrence Berkeley National Laboratory
John Mitchell Argonne National Laboratory
Julie Mitchell Oak Ridge National Laboratory
John Mitchell Sandia National Laboratories
Susan Mniszewski Los Alamos National Laboratory
Daniel Moberg Argonne National Laboratory
Bashir Mohammed Lawrence Berkeley National Laboratory

Subhasish Mohanty Argonne National Laboratory
Linda Mohanty Dell EMC
William Monday Oak Ridge National Laboratory
Inder Monga Lawrence Berkeley National Laboratory
Laura Monroe Los Alamos National Laboratory
Luis Montero Argonne National Laboratory
Allison Montroy Department of Defense, Air Force Research Laboratory
Elisabeth (Lissa) Moore Los Alamos National Laboratory
Juston Moore Los Alamos National Laboratory
Shirley Moore Oak Ridge National Laboratory
Mark Moraes D. E. Shaw Research
Kenneth Moreland Sandia National Laboratories
Hannah Morgan Argonne National Laboratory
Dmitriy Morozov Lawrence Berkeley National Laboratory
James Morris Ames Laboratory
Juliane Mueller Lawrence Berkeley National Laboratory
Terrell Mundhenk Lawrence Livermore National Laboratory
Todd Munson Argonne National Laboratory
Robert Murray Microsoft Corporation
Mustafa Mustafa Lawrence Berkeley National Laboratory
Srideep Musuvathy Sandia National Laboratories
Balu Nadiga Los Alamos National Laboratory
Ambarish Nag National Renewable Energy Laboratory
Habib Najm Sandia National Laboratories
Aiichiro Nakano University of Southern California
Hai Ah Nam Los Alamos National Laboratory
Brad Nance Oak Ridge National Laboratory
Youssef Nashed Argonne National Laboratory
Thomas Naughton Oak Ridge National Laboratory
Gary Navrotski Argonne National Laboratory
Thomas Ndousse-Fetter Department of Energy
Kyle Neal Sandia National Laboratories
Grey Nearing University of Alabama
Benjamin Nebgen Los Alamos National Laboratory
Tommy Nelson Oak Ridge National Laboratory
Denise Neudecker Los Alamos National Laboratory
Michelle Newcomer Lawrence Berkeley National Laboratory
Justin Newcomer Sandia National Laboratories
Harvey Newman California Institute of Technology
Ben Newton Sandia National Laboratories
Esmond Ng Lawrence Berkeley National Laboratory
Brenda Ng Lawrence Livermore National Laboratory
Marcus Nguyen Argonne National Laboratory/University of Chicago
Jeffrey Nichols Oak Ridge National Laboratory

Bogdan Nicolae Argonne National Laboratory
Marcus Noack Lawrence Berkeley National Laboratory
Jorge Nocedal Northwestern University
Eva Nogales Lawrence Berkeley National Laboratory
Brian Nord Fermi National Accelerator Laboratory
Peter Nugent Lawrence Berkeley National Laboratory
Hoot O’Connor My Math Cloud
Patrick O’Leary Kitware, Inc.
Daniel O’Malley Los Alamos National Laboratory
Aleksandr Obabko Argonne National Laboratory
Ceferino Obcemea National Cancer Institute
Ron Oldfield Sandia National Laboratories
Lenny Oliker Lawrence Berkeley National Laboratory
Kunle Olukotun SambaNova Systems
Olufemi Omitaomu Oak Ridge National Laboratory
Raymond Osborn Argonne National Laboratory
Jim Ostrowski University of Tennessee
Sarah Owens Argonne National Laboratory
John Owens University of California, Davis
Opeoluwa Owoyele Argonne National Laboratory
Diane Oyen Los Alamos National Laboratory
Ozgur Ozmen Oak Ridge National Laboratory
Aaron Packman Northwestern University/Argonne National Laboratory
David Page Oak Ridge National Laboratory
Pinaki Pal Argonne National Laboratory
Dhabaleswar K (DK) Panda The Ohio State University
Achalesh Kumar Pandey General Electric Research
Tara Pandya Oak Ridge National Laboratory
Theo Papamarkou Oak Ridge National Laboratory
Michael E. Papka Argonne National Laboratory
Vincent Paquit Oak Ridge National Laboratory
Gilchan Park Brookhaven National Laboratory
Ji Hwan Park Brookhaven National Laboratory
Yoonho Park IBM Research
Eun Jung Park Los Alamos National Laboratory
Byung Hoon Park Oak Ridge National Laboratory
Lynne Parker Office of Science and Technology Policy
Valerio Pascucci University of Utah
Gilberto Pastorello Lawrence Berkeley National Laboratory
Deep Patel Oak Ridge National Laboratory
Abani Patra Tufts University
Christina Patricola Lawrence Berkeley National Laboratory
Robert Patton Oak Ridge National Laboratory
Robert Pavel Los Alamos National Laboratory
Chuck Pavloski Pennsylvania State University

Roger Pawlowski Sandia National Laboratories
Kevin Pedro Fermi National Accelerator Laboratory
Sean Peisert Lawrence Berkeley National Laboratory
Amra Peles Pacific Northwest National Laboratory
Swann Perarnau Argonne National Laboratory
Talita Perciano Lawrence Berkeley National Laboratory
Gabriel Perdue Fermi National Accelerator Laboratory
Mauro Perego Sandia National Laboratories
Kalyan Perumalla Oak Ridge National Laboratory
Nick Peters Oak Ridge National Laboratory
Norm Peterson Argonne National Laboratory
Matt Peterson Sandia National Laboratories
Armenak Petrosyan Oak Ridge National Laboratory
Charudatta Phatak Argonne National Laboratory
Bobby Philip Los Alamos National Laboratory
Caleb Phillips National Renewable Energy Laboratory
Cynthia Phillips Sandia National Laboratories
Mary Ann Piette Lawrence Berkeley National Laboratory
Ali Pinar Sandia National Laboratories
Robinson Pino Department of Energy
Steve Plimpton Sandia National Laboratories
Matthew Plumlee Northwestern University
Norbert Podhorszki Oak Ridge National Laboratory
Raphael Pooser Oak Ridge National Laboratory
Alex Pothen Purdue University
Thomas Potok Oak Ridge National Laboratory
Carol Pott Lawrence Berkeley National Laboratory
Line Pouchard Brookhaven National Laboratory
Sarah Powers Oak Ridge National Laboratory
Prabhat Lawrence Berkeley National Laboratory
Thomas Proffen Oak Ridge National Laboratory
Andrey Prokopenko Oak Ridge National Laboratory
James Proudfoot Argonne National Laboratory
Fernanda Psihas Fermi National Accelerator Laboratory/The University of Texas at Austin
Dave Pugmire Oak Ridge National Laboratory
Laura Pullum Oak Ridge National Laboratory
Ji Qiang Lawrence Berkeley National Laboratory
Hong Qin University of Tennessee at Chattanooga
Judy Qiu Indiana University
Alejandro Queiruga Lawrence Berkeley National Laboratory
John Quigley Dell EMC
Jofrey Quintanar Argonne National Laboratory
Mihaela Quirk Department of Energy, National Nuclear Security Administration
Brian Quiter Lawrence Berkeley National Laboratory

Sudarsan Rachuri Department of Energy
Maryam Rahnemoonfar University of Maryland, Baltimore County
Gulshan Rai Department of Energy, Office of Nuclear Physics
Pankaj Rajak Argonne National Laboratory
Siva Rajamanickam Sandia National Laboratories
Hridesh Rajan Ames Laboratory/Iowa State University
Vinay Ramakrishnaiah Los Alamos National Laboratory
Lavanya Ramakrishnan Lawrence Berkeley National Laboratory
Arvind Ramanathan Argonne National Laboratory
Jini Ramprakash Argonne National Laboratory
Pradeep Ramuhalli Oak Ridge National Laboratory
Huzefa Rangwala George Mason University
Vishwas Rao Argonne National Laboratory
Nageswara Rao Oak Ridge National Laboratory
William Ratcliff National Institute of Standards and Technology
Daniel Ratner SLAC National Accelerator Laboratory
Jaideep Ray Sandia National Laboratories
Justin Reese Lawrence Berkeley National Laboratory
Ernst Rehm Argonne National Laboratory
Yihui Ren Brookhaven National Laboratory
Viktor Reshniak Oak Ridge National Laboratory
Randal Rheinheimer Los Alamos National Laboratory
James Ricci Department of Energy, Advanced Scientific Computing Research
Daniel Ricciuto Oak Ridge National Laboratory
Jasmin Richard University of Rochester
Elias Rigas CCDC Army Research Laboratory
Hugo Riggs Florida International University
Todd Ringler Rep. Ben Ray Luján
Benjamin Robbins Cray, Inc.
Mike Robinson Department of Energy, Wind Energy Technology Office
Verónica Rodríguez Tribaldos Lawrence Berkeley National Laboratory
Dmitry Romanov Jefferson Laboratory
Elisa Romero Romero University of Tennessee
Mohammad Roni Idaho National Laboratory
Kelly Rose National Energy Technology Laboratory
Derek Rose Oak Ridge National Laboratory
Michael Rosenfield IBM Research
Elizabeth Rosenthal Oak Ridge National Laboratory
Robert Ross Argonne National Laboratory
Fred Rothganger Sandia National Laboratories
Lindsay Roy Savannah River National Laboratory
Ahmad Rushdi Sandia National Laboratories

Thomas Russell Department of Energy, Basic Energy Sciences
Florin Rusu Lawrence Berkeley National Laboratory
Gary Saavedra Sandia National Laboratories
Ella Saccon National Cancer Institute
Sonia Sachs Department of Energy, Office of Science
Cosmin Safta Sandia National Laboratories
Alec Sandy Argonne National Laboratory
Ramanan Sankaran Oak Ridge National Laboratory
Daniel Santiago Argonne National Laboratory
Fadil Santosa University of Minnesota
Jibo Sanyal Oak Ridge National Laboratory
Vivek Sarkar Georgia Institute of Technology
Mina Sartipi University of Tennessee at Chattanooga
Arif Sarwat Florida International University
Bhima Sastri Office of Fossil Energy
Paul Saxe Virginia Tech, MolSSI
Michael Schatz Georgia Institute of Technology
Ben Schiltz Argonne National Laboratory
John Schlueter National Science Foundation
Martin Schoenball Lawrence Berkeley National Laboratory
Malachi Schram Pacific Northwest National Laboratory
Robert Schreiber Cerebras Systems
Katie Schuman Oak Ridge National Laboratory
Michelle Schwalbe National Academies of Sciences, Engineering, and Medicine
Ann Schwartz Drobnis Computing Community Consortium
Ariel Schwartzman SLAC National Accelerator Laboratory
Nicholas Schwarz Argonne National Laboratory
Mary Scott Lawrence Berkeley National Laboratory
Sudip Seal Oak Ridge National Laboratory
Pablo Seleson Oak Ridge National Laboratory
Uros Seljak Lawrence Berkeley National Laboratory
Daisy Flora Selvaraj University of North Dakota
Satyabrata Sen Oak Ridge National Laboratory
Koushik Sen University of California, Berkeley
Sergio Servantez Argonne National Laboratory/Northwestern University
Robert Service Science Magazine
Jamie Sethian Lawrence Berkeley National Laboratory
Bradley Settlemyer Los Alamos National Laboratory
Gökhan Sever Argonne National Laboratory
William Severa Sandia National Laboratories
Volkan Sevim Lawrence Berkeley National Laboratory
James Sexton IBM Research
Elizabeth Sexton-Kennedy Fermi National Accelerator Laboratory

John Shalf Lawrence Berkeley National Laboratory
Hairong Shang Argonne National Laboratory
Arjun Shankar Oak Ridge National Laboratory
Susmit Shannigrahi Tennessee Technological University
Himanshu Sharma Argonne National Laboratory
Akshay Sharma Lawrence Berkeley National Laboratory
Karlie Sharma National Institutes of Health, NCATS
Emily Shemon Argonne National Laboratory
Chaopeng Shen Pennsylvania State University
Huanjie Sheng University of California, Berkeley
Wei Shi National Energy Technology Laboratory/LRST/Battelle
Xiaoying Shi Oak Ridge National Laboratory
Xinghua Shi Temple University
Galen Shipman Los Alamos National Laboratory
Cyna Shirazinejad University of California, Berkeley
Shalki Shrivastava University of North Carolina at Chapel Hill, RENCI
Forrest Shriver Oak Ridge National Laboratory
Maulik Shukla Argonne National Laboratory
Christopher Siefert Sandia National Laboratories
Andrew Siegel Argonne National Laboratory
Horst Simon Lawrence Berkeley National Laboratory
Sean Simoneau Oak Ridge National Laboratory
Rajneesh Singh US Army Research Lab
Ganesh Sivaraman Argonne National Laboratory
Adam Slagell Lawrence Berkeley National Laboratory
Stuart Slattery Oak Ridge National Laboratory
Tess Smidt Lawrence Berkeley National Laboratory
Barry Smith Argonne National Laboratory
Jeff Smith Oak Ridge National Laboratory
Michael Smith Oak Ridge National Laboratory
David Smith University of Wisconsin, Madison
Oleg Sobolev Lawrence Berkeley National Laboratory
Lynda Soderholm Argonne National Laboratory
Sibendu Som Argonne National Laboratory
Suhas Somnath Oak Ridge National Laboratory
Siamak Sorooshyari University of California, Berkeley
Salvador Sosa Guitron University of New Mexico
Carlos Soto Brookhaven National Laboratory
Dale Southard Groq Inc.
Brian Spears Lawrence Livermore National Laboratory
Maria Spiropulu California Institute of Technology
William Spotz Department of Energy
Michael Sprague National Renewable Energy Laboratory
Sarat Sreepathi Oak Ridge National Laboratory

Niranjan Sridhar Verily Life Sciences
Srilok Srinivasan Argonne National Laboratory
Jay Srinivasan Lawrence Berkeley National Laboratory
Gowri Srinivasan Los Alamos National Laboratory
Peter St. John National Renewable Energy Laboratory
Andrew Stack Oak Ridge National Laboratory
Marius Stan Argonne National Laboratory
Vitalii Starchenko Oak Ridge National Laboratory
Janice Steckel National Energy Technology Laboratory
Chad Steed Oak Ridge National Laboratory
Carl Steefel Lawrence Berkeley National Laboratory
Carolyn Steele Argonne National Laboratory
Rick Stevens Argonne National Laboratory
Jim Stewart Sandia National Laboratories
Panos Stinis Pacific Northwest National Laboratory
Miroslav Stoyanov Oak Ridge National Laboratory
Tjerk Straatsma Oak Ridge National Laboratory
David Stracuzzi Sandia National Laboratories
Stephen Streiffer Argonne National Laboratory
Frederick Streitz Department of Energy, HQ
Erich Strohmaier Lawrence Berkeley National Laboratory
Jan Strube Pacific Northwest National Laboratory
Abby Stylianou Saint Louis University
Eric Suchyta Oak Ridge National Laboratory
Sreenivas Sukumar Cray, Inc.
Bobby G. Sumpter Oak Ridge National Laboratory
Yipeng Sun Argonne National Laboratory
Chengjun Sun Argonne National Laboratory
Zhao Sun Hampton University
Yu Sun Stony Brook University
Shivshankar Sundaram Lawrence Livermore National Laboratory
Ceren Susut Department of Energy, Office of Science
Kamlesh Suthar Argonne National Laboratory
Carolin Sutter-Fella Lawrence Berkeley National Laboratory
Amy Swain Department of Energy
Pieter Swart Los Alamos National Laboratory
Christine Sweeney Los Alamos National Laboratory
Laura Swiler Sandia National Laboratories
Madhava Syamlal Department of Energy
Adam Szymanski Argonne National Laboratory
Michael Tamillow NICO
Jifu Tan Northern Illinois University
Yu-Hang Tang Lawrence Berkeley National Laboratory
Deepti Tanjore Lawrence Berkeley National Laboratory
Alexandre Tartakovsky Pacific Northwest National Laboratory

Marc Taubenblatt IBM Research
Michela Taufer University of Tennessee
Valerie Taylor Argonne National Laboratory
Aniket Tekawade Argonne National Laboratory
Jeremy Templeton Sandia National Laboratories
Chris Tennant Jefferson Laboratory
Alan Tennant Oak Ridge National Laboratory
Kazuhiro Terao SLAC National Accelerator Laboratory
Guilhem Tesseyre Google Research
Rajeev Thakur Argonne National Laboratory
Jayakar Thangaraj Fermi National Accelerator Laboratory
Nicholas Thompson Oak Ridge National Laboratory
Aidan Thompson Sandia National Laboratories
Suzy Tichenor Oak Ridge National Laboratory
Ken Tobin Oak Ridge National Laboratory
Peter Tonner National Institute of Standards and Technology
Roberto Torelli Argonne National Laboratory
Georgia Tourassi Oak Ridge National Laboratory
Nhan Tran Fermi National Accelerator Laboratory
Hoang Tran Oak Ridge National Laboratory
Nathaniel Trask Sandia National Laboratories
Charles Tripp National Renewable Energy Laboratory
Andrew Tritt Lawrence Berkeley National Laboratory
Aristeidis Tsaris Oak Ridge National Laboratory
Bill Turenne Department of Energy, Artificial Intelligence and Technology Office
John Turner Oak Ridge National Laboratory
Sean Turner Pacific Northwest National Laboratory
Victor Udeowa General Services Administration
Thomas Uram Argonne National Laboratory
Nathan Urban Los Alamos National Laboratory
Meltem Urgun-Demirtas Argonne National Laboratory
Ahmet Uysal Argonne National Laboratory
Brian Van Essen Lawrence Livermore National Laboratory
Peter van Gemmeren Argonne National Laboratory
William Vanderlinde Department of Energy, Advanced Scientific Computing Research
Dirk VanEssendelft National Energy Technology Laboratory
Charuleka Varadharajan Lawrence Berkeley National Laboratory
Laurie Varma Oak Ridge National Laboratory
Robert Varner Oak Ridge National Laboratory
Natalia Vasileva Cerebras Systems
Dilip Vasudevan Lawrence Berkeley National Laboratory
Ashish Vaswani Google Research
Sudharshan Vazhkudai Oak Ridge National Laboratory

Carolyn Vea Lauzon Department of Energy, HQ
Singanallur Venkatakrishnan Oak Ridge National Laboratory
Becky Verastegui Oak Ridge National Laboratory
Matthew Verber University of North Carolina at Chapel Hill
Rafael Vescovi Argonne National Laboratory
Velimir Vesselinov Los Alamos National Laboratory
Jeffrey Vetter Oak Ridge National Laboratory
Michael Vildibill Hewlett Packard Enterprise
Venkatram Vishwanath Argonne National Laboratory
Lukas Vlcek University of Tennessee
Charlie Vollmer Sandia National Laboratories
James von Oehsen Rutgers University
Dave Vorhaus Schmidt Futures
Greg Wagner Northwestern University
Robert Wagner Oak Ridge National Laboratory
Haruko Wainwright Lawrence Berkeley National Laboratory
Jay Walsh Northwestern University
Matthew Walter Toyota Technological Institute at Chicago
Cheng Wang Argonne National Laboratory
Haoyu Wang Argonne National Laboratory
Jiali Wang Argonne National Laboratory
Jin Wang Argonne National Laboratory
Bin Wang Lawrence Berkeley National Laboratory
Zhe Wang Lawrence Berkeley National Laboratory
Dali Wang Oak Ridge National Laboratory
Lipeng Wang Oak Ridge National Laboratory
Felix Wang Sandia National Laboratories
Zhang Wanni Lawrence Berkeley National Laboratory
Karl Warburton Iowa State University
Logan Ward Argonne National Laboratory
Sharlene Weatherwax Department of Energy, Biological and Environmental Research
Rosina Weber Drexel University
Gunther Weber Lawrence Berkeley National Laboratory
Clayton Webster Oak Ridge National Laboratory
Michael Wehner Lawrence Berkeley National Laboratory
Xishuo Wei University of California, Irvine
Patricia Weikersheimer Argonne National Laboratory
Jack Wells Oak Ridge National Laboratory
Haiden Wen Argonne National Laboratory
Torre Wenaus Brookhaven National Laboratory
Gerry White Federal Emergency Management Agency
Julia White Oak Ridge National Laboratory
Stephen Whitelam Lawrence Berkeley National Laboratory
Eric Whiting Idaho National Laboratory
Justin Whitt Oak Ridge National Laboratory

Patrick Widener Sandia National Laboratories
Stefan Wild Argonne National Laboratory
George Wilkie Princeton Plasma Physics Laboratory
Sean Wilkinson Oak Ridge National Laboratory
Timothy Williams Argonne National Laboratory
Samuel Williams Lawrence Berkeley National Laboratory
Dan Wilmot Department of Energy, Artificial Intelligence and Technology Office
Peter Winter Argonne National Laboratory
Robert Wisniewski Intel Corporation
Laura Wolf Argonne National Laboratory
Matthew Wolf Oak Ridge National Laboratory
Michael Wolf Sandia National Laboratories
Phillip Wolfram Los Alamos National Laboratory
Gayle Woloschak Northwestern University
David Womble Oak Ridge National Laboratory
Geoff Womeldorff Los Alamos National Laboratory
Justin Worrilow Microsoft Corporation
Justin Wozniak Argonne National Laboratory
Nicholas Wright Lawrence Berkeley National Laboratory
Xuli Wu Argonne National Laboratory
Xingfu Wu Argonne National Laboratory/University of Chicago
Kesheng (John) Wu Lawrence Berkeley National Laboratory
Wei Wu Los Alamos National Laboratory
Sau Lan Wu University of Wisconsin, Madison
Margie Wylie Lawrence Berkeley National Laboratory
Max Wyman Argonne National Laboratory
Hai Xiao Clemson University
Lianghua Xiong Argonne National Laboratory
Yilun Xu Lawrence Berkeley National Laboratory
Zexuan Xu Lawrence Berkeley National Laboratory
Xueqiao Xu Lawrence Livermore National Laboratory
Min Xu Oak Ridge National Laboratory
Wenwei Xu Pacific Northwest National Laboratory
Yexiang Xue Purdue University
Chunhua Yan National Cancer Institute
Da Yan University of Alabama at Birmingham
Chao Yang Lawrence Berkeley National Laboratory
Da Yang Lawrence Berkeley National Laboratory
Zechun Yang Missile Defense Agency
Lexie Yang Oak Ridge National Laboratory
Qian Yang University of Connecticut
Ke-Thia Yao University of Southern California
Katherine Yelick Lawrence Berkeley National Laboratory
Orcun Yildiz Argonne National Laboratory
Junqi Yin Oak Ridge National Laboratory
Shinjae Yoo Brookhaven National Laboratory
Kazutomo Yoshii Argonne National Laboratory
Linda Young Argonne National Laboratory/University of Chicago
Stanley Young National Renewable Energy Laboratory
Steven Young Oak Ridge National Laboratory
Andrew Younge Sandia National Laboratories
Shiqi Yu Argonne National Laboratory
Dantong Yu New Jersey Institute of Technology
Rose Yu Northeastern University
Thomas Zacharia Oak Ridge National Laboratory
Federico Zahariev Ames Laboratory
Nestor Zaluzec Argonne National Laboratory
Michael Zarnstorff Princeton Plasma Physics Laboratory
Piotr Zarzycki Lawrence Berkeley National Laboratory
Liat Zavodivker Lawrence Berkeley National Laboratory
Zuotao Zeng Argonne National Laboratory
Ruijie Zeng Utah State University
Hong Zhang Argonne National Laboratory
Xiaoyi Zhang Argonne National Laboratory
Yuepeng Zhang Argonne National Laboratory
Xiangyu Zhang National Renewable Energy Laboratory
Guannan Zhang Oak Ridge National Laboratory
Jiaxin Zhang Oak Ridge National Laboratory
Ying Zhang University of Rhode Island
Zhao Zhang University of Texas, TACC
Emma Zhao Argonne National Laboratory
Liang Zhao George Mason University
Huihuo Zheng Argonne National Laboratory
Zhi Zheng University of Wisconsin, Milwaukee
Mingxia Zhou Argonne National Laboratory
Maxim Ziatdinov Oak Ridge National Laboratory
Sue Zillman Argonne National Laboratory
Tarek Zohdi Lawrence Berkeley National Laboratory
Xiaobing Zuo Argonne National Laboratory
Petrus Zwart Lawrence Berkeley National Laboratory
Matthias Zwicker University of Maryland, College Park

AD. Abbreviations and Terminology
Abbreviations Terminology
3D three-dimensional
AGN active galactic nucleus
AI artificial intelligence
ALCF Argonne Leadership Computing Facility
ALS Advanced Light Source
AMIGA All Modular Industry Growth Assessment
AMR adaptive mesh refinement
ANNs artificial neural networks
AOGCM Atmosphere-ocean general circulation model
API application programming interface
APS appearance potential spectroscopy, Advanced Photon Source
Argonne Argonne National Laboratory
ARM atmospheric radiation monitoring
ARM Atmospheric Radiation Measurement Climate Research Facility
ASCR Advanced Scientific Computing Research
ASDEX-UG Axially Symmetric Divertor Experiment Upgrade
BBH binary black hole
Berkeley Lab Lawrence Berkeley National Laboratory
BES Basic Energy Sciences
BESAC Basic Energy Sciences Advisory Committee
BG Blue Gene
BHNS black hole and neutron star
BNS binary neutron star
CAF Co-Array Fortran
CF climate and forecast
CGE computable general equilibrium
CGRO Compton Gamma-Ray Observatory
CMOS complementary metal-oxide-semiconductor
CMS Compact Muon Solenoid
CNNs convolutional neural networks
CPU central processing unit
CRISPR clustered regularly interspaced short palindromic repeats
DAE differential algebraic equation
DARPA Defense Advanced Research Projects Agency
DAS distributed acoustic sensing
DBA design basis accident
DETF Dark Energy Task Force
DFT density functional theory
DL deep learning
DLA deep learning accelerator
DNN deep neural network
DOE United States Department of Energy
DVM dynamic vegetation model
E3 Simulation and Modeling at the Exascale for Energy and the Environment
EAST Experimental Advanced Superconducting Tokamak
ECoG electrocorticography
EIC Electron-Ion Collider
ELM edge-localized mode
EMF Energy Modeling Forum
EMSL Environmental Molecular Sciences Laboratory
EOS equation of state
ESGF Earth System Grid Federation
ESM Earth System Model
ESnet Energy Sciences Network
ESS-DIVE Environmental System Science Data Infrastructure for a Virtual Ecosystem
EVLA Enhanced Very Large Array
EXIST Energetic X-ray Imaging Survey Telescope
FAIR findable, accessible, interoperable, reusable
FES Fusion Energy Sciences
FFT fast Fourier transform
flops floating point operations per second
fMRI functional magnetic resonance imaging
FPGA field programmable gate array
FRIB Facility for Rare Isotope Beams
FUSE Far Ultraviolet Spectroscopic Explorer
GAN generative adversarial network
Gbps gigabits per second
GIS geographic information system
GK gyrokinetic
GRETA Gamma-Ray Energy Tracking Array
GLAST Gamma-ray Large Area Space Telescope
GMT Giant Magellan Telescope
GNEP Global Nuclear Energy Partnership
GPU graphics processing unit
GRB gamma-ray burst
GTC Gyrokinetic Toroidal Code
HCCI homogeneous charge compression ignition
HEP high energy physics
HPC high-performance computing
HPN high-performance network
IEEE Institute of Electrical and Electronics Engineers
I/O Input/output
IOP input/output processor
IoT Internet of Things
Jefferson Lab Thomas Jefferson National Accelerator Facility
JET Joint European Torus
JUMP Joint University Microelectronics Program
KBase Systems Biology Knowledge Base
LAN local area network
LBNL Lawrence Berkeley National Laboratory
LCF Leadership Computing Facility
LCLS-II second-generation Linac Coherent Light Source
LSTM long short-term memory
MD molecular dynamics (simulations)
ML machine learning
MPI message passing interface
NERSC National Energy Research Scientific Computing Center
NGEEs Next-Generation Ecosystem Experiments
NIPS Conference on Neural Information Processing Systems
NMDC National Microbiome Data Collaborative
OLCF Oak Ridge Leadership Computing Facility
ORNL Oak Ridge National Laboratory
PLD pulsed laser deposition
QCD quantum chromodynamics
QIS quantum information sciences
RF radio frequency
RHIC Relativistic Heavy Ion Collider
RL reinforcement learning
ROSM reduced order surrogate model
SNS Spallation Neutron Source
SoC system-on-chip
SRF superconducting radiofrequency
TB terabyte
TPU tensor processing unit
UHPC ultra-high performance concrete
UQ uncertainty quantification
WAN wide area network