Science in the age of AI: How artificial intelligence is changing the nature and method of scientific research
Issued: May 2024 DES8836_1
ISBN: 978-1-78252-712-1
© The Royal Society
Contents
Foreword
Executive summary
Key findings
Future research questions
Recommendations
Introduction
Conclusion
Appendices
Foreword
With the growing availability of large datasets, new algorithmic techniques and increased computing power, artificial intelligence (AI) is becoming an established tool used by researchers across scientific fields.

Now more than ever, we need to understand the extent of the transformative impact of AI on science and what scientific communities need to do to fully harness its benefits.

This report, Science in the age of AI, explores this topic. Building on the experiences of more than 100 scientists who have incorporated AI into their workflows, it delves into how AI technologies, such as deep learning or large language models, are transforming the nature and methods of scientific inquiry. It also explores how notions of research integrity, research skills and research ethics are inevitably changing – and what the implications are for the future of science and scientists.

New opportunities are emerging. The case studies in this report demonstrate that AI is enhancing the efficiency, accuracy, and creativity of scientists. Across multiple fields, the application of AI is breaking new ground by facilitating, for example, the discovery of rare diseases or enabling the development of more sustainable materials.

Scientists are using AI applications – playing the role of tutor, peer or assistant – to perform tasks at a pace and scale previously unattainable. There is much excitement around the synergy between human intelligence and AI and how this partnership is leading to scientific advancements. However, to ensure robustness and mitigate harms, human judgement and expertise will continue to be of utmost importance.

The rapid uptake of AI in science has also presented challenges related to its safe and rigorous use. A growing body of irreproducible studies is raising concerns regarding the robustness of AI-based discoveries. The black-box and non-transparent nature of AI systems creates challenges for verification and external scrutiny. Furthermore, its widespread but inequitable adoption raises ethical questions regarding its environmental and societal impact. Yet, ongoing advancements in making AI systems more transparent and ethically aligned hold the promise of overcoming these challenges.

In this regard, the report calls for a balanced approach that celebrates the potential of AI in science while not losing sight of the challenges that still need to be overcome. The recommendations offer a pathway that leverages open science principles to enable reliable AI-driven scientific contributions, while creating opportunities for resource sharing and collaboration. They also call for policies and practices that recognise the links between science and society, emphasising the need for ethical AI, equitable access to its benefits, and the importance of maintaining public trust in scientific research.

While it is clear that AI can significantly aid scientific advancement, the goal remains to ensure these breakthroughs benefit humanity and the planet. We hope this report inspires actors across the scientific ecosystem to engage with the recommendations and work towards a future where we can realise the potential of AI to transform science and benefit our collective wellbeing.

Professor Alison Noble CBE FREng FRS, Foreign Secretary of the Royal Society and Chair of the Royal Society Science in the Age of AI Working Group.

Image: Professor Alison Noble FRS.
Executive summary
The unprecedented speed and scale of progress with artificial intelligence (AI) in recent years suggests society may be living through an inflection point. The virality of platforms such as ChatGPT and Midjourney, which can generate human-like text and image content, has accelerated public interest in the field and raised flags for policymakers who have concerns about how AI-based technologies may be integrated into wider society. Beyond this, comments made by prominent computer scientists and public figures regarding the risks AI poses to humanity have transformed the subject into a mainstream political issue. For scientific researchers, AI is not a novel topic and has been adopted in some form for decades. However, the increased investment, interest, and adoption within academic and industry-led research has led to a ‘deep learning revolution’1 that is transforming the landscape of scientific discovery.

Enabled by the advent of big data (for instance, large and heterogeneous forms of data gathered from telescopes, satellites, and other advanced sensors), AI-based techniques are helping to identify new patterns and relationships in large datasets which would otherwise be too difficult to recognise. This offers substantial potential for scientific research and is encouraging scientists to adopt more complex techniques that outperform existing methods in their fields. The capability of AI tools to identify patterns from existing content and generate predictions of new content also allows scientists to run more accurate simulations and create synthetic data. These simulations, which draw data from many different sources (potentially in real time), can help decision-makers assess more accurately the efficacy of potential interventions and address pressing societal or environmental challenges.

The opportunities of AI for scientific research are highlighted throughout this report and explored in depth through three case studies on its application for climate science, material science, and rare disease diagnosis.

Alongside these opportunities, there are various challenges arising from the increased adoption of AI. These include reproducibility (in which other researchers cannot replicate experiments conducted using AI tools); interdisciplinarity (where limited collaboration between AI and non-AI disciplines can lead to a less rigorous uptake of AI across domains); and environmental costs (due to the high energy consumption required to operate large compute infrastructure). There are also growing barriers to the effective adoption of open science principles due to the black-box nature of AI systems and the limited transparency of commercial models that power AI-based research. Furthermore, changing incentives across the scientific ecosystem may be increasing pressure on researchers to incorporate advanced AI techniques at the neglect of more conventional methodologies, or to be ‘good at AI’ rather than ‘good at science’2.

These challenges, and potential solutions, are detailed throughout this report in the chapters on research integrity; skills and interdisciplinarity; innovation and the private sector; and research ethics.
As an organisation that exists to promote the use of science for the benefit of humanity, this subject is of great importance to the Royal Society. This report, Science in the Age of AI, provides an overview of key issues to address for AI to positively transform the scientific endeavour. Its recommendations, when taken together, should ensure that the application of AI in scientific research is able to reach its full potential and help maintain public trust in science and the integrity of the scientific method.

This report has been guided by a working group of leading experts in AI and applied science and informed by a series of activities undertaken by the Royal Society. These include interviews with Fellows of the Royal Society; a global patent landscape analysis; a historical literature review; a commissioned taxonomy of AI for scientific applications; and several workshops on topics ranging from large language models to immersive technologies. These activities are listed in full in the appendix. In total, more than 100 leading scientific researchers from diverse disciplines contributed to this report.

While the report covers some of the critical areas related to the role of AI in scientific research, it is not comprehensive and does not cover, for example, the provision of high-performance computing infrastructure, the potential of artificial general intelligence, or a detailed breakdown of the new skills required across industries and academia.

Further research questions are outlined below. The Society’s two programmes of work on Mathematical Futures3 and Science 20404 will explore, in more depth, relevant challenges related to skills and universities.

Key findings
• Beyond landmark cases like AlphaFold, AI applications can be found across all STEM fields, with a concentration in fields such as medicine, materials science, robotics, agriculture, genetics, and computer science. The most prominent AI techniques across STEM fields include artificial neural networks, deep learning, natural language processing and image recognition5.

• High quality data is foundational for AI applications, but researchers face barriers related to the volume, heterogeneity, sensitivity, and bias of available data. The large volume of some scientific data (eg collected from telescopes and satellites) can total petabytes, making objectives such as data sharing and interoperability difficult to achieve. The heterogeneity of data collected from sensors also presents difficulties for human annotation and standardisation, while training AI models on biased inputs is likely to lead to biased outputs. Given these challenges, data curators and information managers are essential to maintain quality and address risks linked to artificial data generation, such as data fabrication, poisoning, or contamination.
Recommendations
AREA FOR ACTION: ENHANCE ACCESS TO ESSENTIAL AI INFRASTRUCTURES AND TOOLS
RECOMMENDATION 1
RECOMMENDATION 2
Access to AI does not guarantee its meaningful and responsible use. Complex and high-performance AI tools and methods can be challenging for researchers from non-AI backgrounds to adopt and utilise effectively17. Similarly, new skills are needed across the AI lifecycle, such as data scientists who understand the importance of metadata and data curation, or engineers who are familiar with GPU programming for image-based processing.

Taking steps to improve the usability of AI-based tools (eg software applications, libraries, APIs, or general AI systems) should therefore involve a combination of mechanisms that make AI understandable for non-AI experts and build their capacity to use AI responsibly. For example, training should ensure that every scientist is able to recognise when they require specialised data or programming expertise in their teams, or when the use of complex and opaque AI techniques could undermine the integrity and quality of results.
17 Cartwright H. 2023 Interpretability: Should – and can – we understand the reasoning of machine-learning systems?
In: OECD (ed.) Artificial Intelligence in Science. OECD. (https://doi.org/10.1787/a8d820bd-en)
18 UKRI. Trustworthy Autonomous Systems Hub. Developing machine learning models with codesign: how everyone can
shape the future of AI. See: https://tas.ac.uk/developing-machine-learning-models-with-codesign-how-everyone-can-
shape-the-future-of-ai/ (accessed 7 March 2023)
19 Global Indigenous Data Alliance. Care Principles for Indigenous Data Governance. See https://www.gida-global.org/
care (accessed 21 December 2023)
20 Szymanski M, Verbert K, Vanden Abeele V. 2022. Designing and evaluating explainable AI for non-AI experts:
challenges and opportunities. In Proceedings of the 16th ACM Conference on Recommender Systems
(https://doi.org/10.1145/3523227.3547427)
21 Korot E et al. 2021 Code-free deep learning for multi-modality medical image classification. Nat Mach Intell. 3,
288–298. (https://doi.org/10.1038/s42256-021-00305-2)
22 UKRI. Get Support For Your Project: If your research spans different disciplines. See: https://www.ukri.org/apply-for-
funding/how-to-apply/preparing-to-make-a-funding-application/if-your-research-spans-different-disciplines/
(accessed 13 December 2023)
AREA FOR ACTION: BUILD TRUST IN THE INTEGRITY AND QUALITY OF AI-BASED SCIENTIFIC OUTPUTS
RECOMMENDATION 3
Further work is needed to understand the interactions between open science and AI for science, as well as how to minimise safety and security risks stemming from the open release of models and data.

Actions to promote the adoption of open science in AI-based science may include:

1. Research funders and research institutions incentivising the adoption of open science principles and practices to improve reproducibility of AI-based research. For example, by allocating funds to open science and AI training, requesting the use of reproducibility checklists31 and data sharing protocols as part of grant applications, or by supporting the development of community and field-specific reproducibility standards (eg TRIPOD-AI32).

2. Research institutions and journals rewarding and recognising open science practices in career progression opportunities. For example, by promoting the dissemination of failed results, accepting pre-registration and registered reports as outputs, or recognising the release of datasets and documentation as relevant publications for career progression.

3. Research funders, research institutions and industry actors incentivising international collaboration by investing in open science infrastructures, tools, and practices. For example, by investing in open repositories that enable the sharing of datasets, software versions, and workflows, or by supporting the development of context-aware documentation that enables the local adaptation of AI models across research environments. The latter may also contribute towards the inclusion of underrepresented research communities and scientists working in low-resource contexts.

4. Relevant policy makers considering ways of deterring the development of closed ecosystems for AI in science by, for example, mandating the responsible release of benchmarks, training data, and methodologies used in research led by industry.
31 McGill School of Computer Science. The Machine Learning Reproducibility Checklist v2.0.
See: https://www.cs.mcgill.ca/~jpineau/ReproducibilityChecklist.pdf (accessed 21 December 2023).
32 Collins G et al. 2021 Protocol for development of a reporting guideline (TRIPOD-AI) and risk of bias tool (PROBAST-AI)
for diagnostic and prognostic prediction model studies based on artificial intelligence. BMJ open, 11(7), e048008.
(https://doi.org/10.1136/bmjopen-2020-048008)
AREA FOR ACTION: ENSURE SAFE AND ETHICAL USE OF AI IN SCIENTIFIC RESEARCH
RECOMMENDATION 4
The application of AI across scientific domains requires careful consideration of potential risks and misuse cases. These can include the impact of data bias33, data poisoning34, the spread of scientific misinformation35,36, and the malicious repurposing of AI models37. In addition to this, the resource-intensive nature of AI (eg in terms of energy, data, and human labour) raises ethical questions regarding the extent to which AI used by scientists can inadvertently contribute to environmental and societal harms.

Ethical concerns are compounded by the uncertainty surrounding AI risks. As of late 2023, public debates regarding AI safety had not conclusively defined the role of scientists in monitoring and mitigating risks within their respective fields. Furthermore, varying levels of technical AI expertise among domain experts, and the lack of standardised methods for conducting ethics impact assessments, limit the ability of scientists to provide effective oversight38. Other factors include the limited transparency of commercial models, the opaque nature of ML-systems, and how the misuse of open science practices could heighten safety and security risks39,40.

As AI is further integrated into science, AI assurance mechanisms41 are needed to maintain public trust in AI and ensure responsible scientific advancement that benefits humanity. Collaboration between AI experts, domain experts and researchers from humanities and science, technology, engineering, the arts, and mathematics (STEAM) disciplines can improve scientists’ ability to oversee AI systems and anticipate harms42.
33 Arora, A, Barrett, M, Lee, E, Oborn, E and Prince, K 2023 Risk and the future of AI: Algorithmic bias, data colonialism,
and marginalization. Information and Organization, 33. (https://doi.org/10.1016/j.infoandorg.2023.100478)
34 Verde, L., Marulli, F. and Marrone, S., 2021. Exploring the impact of data poisoning attacks on machine learning model
reliability. Procedia Computer Science, 192. 2624-2632. (https://doi.org/10.1016/j.procs.2021.09.032)
35 Truhn D, Reis-Filho J.S. & Kather J.N. 2023 Large language models should be used as scientific reasoning engines,
not knowledge databases. Nat Med 29, 2983–2984. (https://doi.org/10.1038/s41591-023-02594-z)
36 The Royal Society. 2024 Insights from the Royal Society & Humane Intelligence red-teaming exercise on AI-generated
scientific disinformation. See: https://royalsociety.org/news-resources/projects/online-information-environment/
(accessed 7 May 2024)
37 Kazim, E and Koshiyama, A.S 2021 A high-level overview of AI ethics. Patterns, 2. (https://doi.org/10.1016/j.patter.2021.100314)
38 Wang H et al. 2023 Scientific discovery in the age of artificial intelligence. Nature, 620. 47-60. (https://doi.org/10.1038/
s41586-023-06221-2)
39 Solaiman, I. 2023 The gradient of generative AI release: Methods and considerations. In Proceedings of the 2023 ACM
Conference on Fairness, Accountability, and Transparency (pp. 111-122). (https://doi.org/10.48550/arXiv.2302.04844)
40 Vincent J. 2023 OpenAI co-founder on company’s past approach to openly sharing research: ‘We were wrong’. The Verge. See https://www.theverge.com/2023/3/15/23640180/openai-gpt-4-launch-closed-research-ilya-sutskever-interview (accessed 21 December 2023).
41 Brennan, J. 2023. AI assurance? Assessing and mitigating risks across the AI lifecycle. Ada Lovelace Institute.
See https://www.adalovelaceinstitute.org/report/risks-ai-systems/ (accessed 30 September 2023)
42 The Royal Society. 2023 Science in the metaverse: policy implications of immersive technologies. See https://royalsociety.org/news-resources/publications/2023/science-in-the-metaverse/
43 Weidinger L, et al. 2022 Taxonomy of risks posed by language models. In Proceedings of the 2022 ACM Conference
on Fairness, Accountability, and Transparency. 214-229. (https://doi.org/10.1145/3531146.3533088)
44 UNESCO. 2022. Recommendation on the ethics of artificial intelligence. See: https://www.unesco.org/en/artificial-
intelligence/recommendation-ethics (accessed 5 March 2024)
45 OECD. Ethical guidelines for artificial intelligence. See: https://oecd.ai/en/catalogue/tools/ethical-guidelines-for-
artificial-intelligence (accessed 5 March 2024)
Introduction
Scope of the report
Science in the age of AI explores how AI is transforming the nature and methods of scientific research. It focuses on the impact of deep learning methods and generative AI applications and explores cross-cutting considerations around research integrity, skills, and ethics. While AI is transforming a wide range of fields – including the social sciences and humanities – this report provides examples focused on physical and biological sciences.

• Chapter 1 provides a descriptive review of how recent developments in AI (in Machine Learning (ML), deep neural networks, and natural language processing in particular) are changing methods, processes, and practices in scientific research.

• Chapter 2 details key challenges for research integrity in AI-based research. It tackles issues around transparency of AI models and datasets, explainability and interpretability, and barriers to verifying the reproducibility of results.
Generative AI: AI systems generating new text, images, audio, or video in response to user input using machine learning techniques. These systems, often employing generative adversarial networks (GANs), create outputs that closely resemble – and are often indistinguishable from – human-created media. See ‘Generative adversarial networks’.

Generative adversarial networks (GANs): A machine learning technique that produces realistic synthetic data, like deepfake images, indistinguishable from its training data. It consists of a generator and a discriminator. The generator creates fake data, while the discriminator evaluates it against real data, helping the generator improve until the discriminator can’t differentiate between real and fake. (A minimal code sketch follows this glossary.)

Human-in-the-loop (HITL): A hybrid system comprising human and artificial intelligence that allows for human intervention, such as training or fine-tuning the algorithm, to enhance the system’s output. Combining the strengths of both human judgment and machine capabilities can make up for the limitations of both.

Machine learning (ML): A field of artificial intelligence involving algorithms that learn patterns from data and apply these findings to make predictions or offer useful outputs. It enables tasks like language translation, medical diagnosis, and robotics navigation by analysing sample data to improve performance over time.

Privacy-enhancing technologies (PETs): An umbrella term covering a broad range of technologies and approaches that can help mitigate data security and privacy risks47.

Synthetic data: Data that is modelled to represent the statistical properties of original data; new data values are created which, taken as a whole, preserve relevant statistical properties of the ‘real’ dataset48. This allows for training models without accessing real-world data.
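The generator–discriminator dynamic defined above can be sketched in a few lines of code. The following is an illustration only, not material from the report: it assumes PyTorch, small toy networks, and a one-dimensional ‘real’ distribution, but it shows both the adversarial training loop and how a trained generator yields synthetic data of the kind defined above.

```python
# Minimal GAN sketch (illustrative only): a generator learns to mimic a
# 1-D Gaussian "real" distribution. Assumes PyTorch; all names are toy.
import torch
import torch.nn as nn

torch.manual_seed(0)

def real_batch(n=64):
    # Stand-in for real data: samples from N(mean=2.0, std=0.5).
    return torch.randn(n, 1) * 0.5 + 2.0

generator = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))
discriminator = nn.Sequential(nn.Linear(1, 16), nn.ReLU(),
                              nn.Linear(16, 1), nn.Sigmoid())

g_opt = torch.optim.Adam(generator.parameters(), lr=1e-3)
d_opt = torch.optim.Adam(discriminator.parameters(), lr=1e-3)
bce = nn.BCELoss()

for step in range(2000):
    # 1. Discriminator step: label real samples 1, generated samples 0.
    real, noise = real_batch(), torch.randn(64, 8)
    fake = generator(noise).detach()  # detach: don't update G here
    d_loss = bce(discriminator(real), torch.ones(64, 1)) + \
             bce(discriminator(fake), torch.zeros(64, 1))
    d_opt.zero_grad(); d_loss.backward(); d_opt.step()

    # 2. Generator step: try to make the discriminator output 1 on fakes.
    fake = generator(torch.randn(64, 8))
    g_loss = bce(discriminator(fake), torch.ones(64, 1))
    g_opt.zero_grad(); g_loss.backward(); g_opt.step()

# Generated (synthetic) samples should now approximate mean 2.0, std 0.5.
samples = generator(torch.randn(1000, 8))
print(samples.mean().item(), samples.std().item())
```

GANs used for images or scientific data involve far larger networks and careful training stabilisation, but the two-player structure is the same.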
Chapter one
How artificial intelligence is transforming scientific research
Image (left): MRI image. © iStock / MachineHeadz.
“We have the capacity to record much more [data] than before. We live in a data deluge. So, the hope is that machine learning methods will help us make sense of that, and then drive genuine, scientific hypotheses.”
Royal Society roundtable participant

1. Growing use of deep learning across fields
The application of deep learning (DL) is transforming data analysis and knowledge generation. Its use to automatically extract and learn features from raw data, process extensive datasets and recognise patterns efficiently outperforms linear ML-based models48 (illustrated in the sketch at the end of this section). DL has found applications in diverse fields including healthcare, aiding in disease detection and drug discovery, or climate science, assisting in modelling climate patterns and weather detection. A landmark example is the application of DL by Google DeepMind to develop AlphaFold, a protein-folding prediction system that solved a 50-year-old challenge in biology decades earlier than anticipated49.

Developing accurate and useful DL-based models is challenging due to their black-box nature and variations in real-world problems and data. This limits their explanatory power and reliability as scientific tools50 (See Chapter 2).

2. Obtaining insights from unstructured data
A major challenge for researchers is utilising unstructured data (data that does not follow a specific format or structure, making it more challenging to process, manage and use to find patterns). The ability to handle unstructured data makes DL effective for tasks that involve image recognition and natural language processing (NLP).

In healthcare, for example, data can be detailed, multi-modal and fragmented51. It can include images, text assessments, or numerical values from assessments and readings. Data collectors across the healthcare system may record this data in different formats or with different software. Bringing this data together, and making sense of it, can help researchers make predictions and model potential health interventions. Similarly, generative AI models can contribute towards generating and converting data into different modes and standards that are not limited to the type of data fed into the algorithm52.
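As a concrete, hedged illustration of the claim above that deep learning can outperform linear models by learning its own intermediate features: the toy comparison below, which assumes PyTorch and synthetic data (neither prescribed by this report), fits a linear model and a small neural network to the same nonlinear signal.

```python
# Illustrative sketch: a linear model vs a small neural network fitting
# a nonlinear signal (y = sin x). Assumes PyTorch; data are synthetic.
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.linspace(-3, 3, 200).unsqueeze(1)
y = torch.sin(x)

linear = nn.Linear(1, 1)  # can only fit a straight line
mlp = nn.Sequential(nn.Linear(1, 32), nn.Tanh(), nn.Linear(32, 1))

for model in (linear, mlp):
    opt = torch.optim.Adam(model.parameters(), lr=1e-2)
    for _ in range(2000):
        loss = nn.functional.mse_loss(model(x), y)
        opt.zero_grad(); loss.backward(); opt.step()
    print(type(model).__name__, float(nn.functional.mse_loss(model(x), y)))

# The linear fit plateaus at a high error; the network, which learns its
# own intermediate features from the raw input, fits the curve closely.
```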
48 Choudhary, A, Fox, G, Hey, T. 2023. Artificial intelligence for science: A deep learning revolution. World Scientific
Publishing Co. Pte Ltd. (https://doi.org/10.1142/13123)
49 Google DeepMind. AlphaFold. See: https://deepmind.google/technologies/alphafold/ (accessed 5 March 2024)
50 Sarker, I. H. 2021. Deep learning: a comprehensive overview on techniques, taxonomy, applications and research
directions. SN Computer Science, 2. 420. (https://doi.org/10.1007/s42979-021-00815-1)
51 Healy, M. J. R. 1973. What Computers Can and Cannot Do. Proceedings of the Royal Society of London. Series B, Biological Sciences, 184(1077), 375–378. (https://doi.org/10.1098/rspb.1973.0056)
52 World Health Organization. 2024. Ethics and governance of artificial intelligence for health: guidance on large
multi-modal models. See: https://www.who.int/publications/i/item/9789240084759 (accessed 5 March 2024)
53 Kaddour J, Lynch A, Liu Q, Kusner M J, Silva R. 2022. Causal machine learning: A survey and open problems. arXiv preprint (https://doi.org/10.48550/arXiv.2206.15475)
54 Sanchez P, Voisey J, Xia T, Watson H, O’Neil A, and Tsaftaris, S. 2022 Causal machine learning for healthcare
and precision medicine. R. Soc open sci. 9: 220638 (https://doi.org/10.1098/rsos.220638)
55 Royal Society roundtable on large language models, July 2023.
56 Benevolent AI. 2019 Extracting existing facts without requiring any training data or hand-crafted rules.
See https://www.benevolent.com/news-and-media/blog-and-videos/extracting-existing-facts-without-requiring-any-
training-data-or-hand-crafted-rules/ (accessed 21 December 2023).
57 Rajput S, Winn J, Moneypenny N, Zaykov Y, and Tan C. 2021 Alexandria in Microsoft Viva Topics: from big data to big
knowledge. 26 April 2021. See https://www.microsoft.com/en-us/research/blog/alexandria-in-microsoft-viva-topics-
from-big-data-to-big-knowledge/ (accessed 21 December 2023).
58 Jordon et al. 2023 Synthetic Data – what, why and how? See https://royalsociety.org/news-resources/projects/
privacy-enhancing-technologies/ (accessed 21 December 2023)
59 The Royal Society. 2020 Digital technology and the planet: Harnessing computing to achieve net zero.
See https://royalsociety.org/topics-policy/projects/digital-technology-and-the-planet/ (accessed 21 December 2023).
60 Jordon et al. 2023 Synthetic Data – what, why and how? See https://royalsociety.org/news-resources/projects/
privacy-enhancing-technologies/ (accessed 21 December 2023).
61 Zhang L, Han J, Wang H, Car R, Weinan E. 2018 Deep Potential Molecular Dynamics: A Scalable Model
with the Accuracy of Quantum Mechanics. Phys Rev Lett. 2018 Apr 6;120(14):143001. (https://doi.org/10.1103/
PhysRevLett.120.143001. PMID: 29694129)
62 The Royal Society. 2023 From privacy to partnership. See https://royalsociety.org/topics-policy/projects/privacy-
enhancing-technologies/ (accessed 21 December 2023).
63 Lin Z. 2023 Why and how to embrace AI such as ChatGPT in your academic life. R. Soc. Open Sci.10:
230658 230658 (https://doi.org/10.1098/rsos.230658)
64 Alkaissi H, McFarlane SI. Artificial Hallucinations in ChatGPT: Implications in Scientific Writing. Cureus. 2023 Feb 19;15(2):e35179. (https://doi.org/10.7759/cureus.35179)
Examples of automated literature review tools include Semantic Scholar65, Elicit66, and Consensus67. Similar functionality is also available on prominent platforms such as GPT-4 and Gemini. Beneficial use cases include using LLMs to improve the quality of academic writing, assist with translation, or emulate specific writing styles (eg producing lay summaries). Beyond academic texts, they can also be used to streamline administrative tasks and assist in drafting grant applications. These tools could also improve accessibility for researchers from diverse backgrounds (eg non-English speakers and neurodivergent individuals) who consume and produce academic content in multiple languages and formats68.

These tools also have limitations, including the potential to exacerbate biases from the training data (eg bias towards positive results69, language biases70 or geographic bias71), inaccuracies and unreliable scientific inputs72. As a writing tool they also have a limited ability to grasp nuanced value judgments, assist in scientific meaning-making73, or articulate the complexities of scientific research74. There are also concerns that the use of LLMs for academic writing risks diminishing creative and interdisciplinary aspects of scientific discovery75. Additionally, there are questions around the impact of LLMs on intellectual property (IP).

5. Addressing complex coding challenges
Developing computational analysis software code has become an important aspect of the modern scientific endeavour. For example, LLMs – which are designed to analyse text inputs and generate responses that they determine are likely to be accurate – can be used for generating software code in various coding languages. This presents an opportunity for scientific researchers to convert code from one computer language to another, or from one application to another76.
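A hedged sketch of the code-conversion use case follows. It assumes the OpenAI Python client and the model name ‘gpt-4o’, neither of which is prescribed by this report; any chat-style LLM could be substituted, and generated code should always be reviewed and tested before use.

```python
# Hedged sketch: asking a hosted LLM to translate code between languages.
# Assumes the OpenAI Python client (pip install openai) with an API key in
# the OPENAI_API_KEY environment variable; the model name is an assumption.
from openai import OpenAI

client = OpenAI()

matlab_snippet = """
x = linspace(0, 2*pi, 100);
y = sin(x) .* exp(-x/5);
plot(x, y);
"""

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": "Translate this MATLAB code to Python using NumPy and "
                   "Matplotlib. Return only the code:\n" + matlab_snippet,
    }],
)

# The generated code is a draft: review and test it before relying on it.
print(response.choices[0].message.content)
```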
65 Semantic Scholar: AI-powered research tool. See https://www.semanticscholar.org/ (accessed 21 December 2023).
66 Elicit: The AI research assistant. See https://elicit.com/ (accessed 21 December 2023).
67 Consensus: AI search engine for research. See https://consensus.app/ (accessed 21 December 2023).
68 Royal Society and Department for Science, Innovation, and Technology workshop on horizon scanning AI safety risks across scientific disciplines, 2023.
69 Royal Society and Department for Science, Innovation and Technology workshop on horizon scanning AI safety risks
across scientific disciplines, October 2023. See https://royalsociety.org/current-topics/ai-data/. (accessed 7 May 2024).
70 Barrot JS, 2023. Using ChatGPT for second language writing: Pitfalls and potentials. Assessing Writing, 57.100745.
71 Skopec M, Issa H, Reed J, Harris M. 2020. The role of geographic bias in knowledge diffusion: a systematic review
and narrative synthesis. Research integrity and peer review, 5. 1-14. (https://doi.org/10.1186/s41073-019-0088-0.)
72 Sanderson K. 2023. GPT-4 is here: what scientists think. Nature, 615.773. 30 March 2023. See https://www.nature.
com/articles/d41586-023-00816-5.pdf (accessed 21 December 2023)
73 Birhane A, Kasirzadeh A, Leslie D, Wachter S. 2023. Science in the age of large language models. Nature Reviews
Physics, 1-4 (https://doi.org/10.1038/s42254-023-00581-4)
74 Bender E, Koller A. 2020 Climbing towards NLU: on meaning, form, and understanding in the age of data.
In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics 5185–5198
75 Royal Society and Department for Science, Innovation, and Technology workshop on horizon scanning AI safety risks across scientific disciplines, 2023.
76 Royal Society roundtable on large language models, July 2023.
Even if the output is not accurate on a first attempt, these models can be used as coding assistants to help identify coding mistakes, make suggestions, and save time. Prominent examples include Microsoft’s Copilot77; OpenAI’s GPT-478; Meta’s Code Llama79; and Google DeepMind’s Gemini80.

6. Task automation
AI tools can automate a range of time and labour-intensive tasks within the scientific workflow81. Automation can lead to productivity gains for scientists82 and unlock the potential to test diverse hypotheses beyond human capability. For example, in 2023, Google DeepMind claimed two such examples: FunSearch83 and GNoME84.

The use of robotic research assistants is also contributing to the automation of laboratory workflows (See Case Study 2). In 2009, a robot developed by Aberystwyth University became the first machine to independently discover new scientific knowledge85. The robot was programmed to independently design experiments, record and evaluate results, and develop new questions – automating the entire research workflow86. Building on this breakthrough, ‘robot scientists’ continue to be developed to speed up the discovery process, while reducing costs, uncertainty, and human error in labs87.

As research becomes more automated, there are concerns that future generations of scientists may become de-skilled in core skills such as hypothesis generation, experimental design, and contextual interpretation88. Methodological transparency and understanding of cause-effect relationships could also decline, and an overemphasis on computational techniques risks disengaging scientists who seek creative outlets in their work89.
77 GitHub. Copilot – Your AI pair programmer. See https://github.com/features/copilot (accessed 21 December 2023).
78 Open AI. GPT4. See https://openai.com/gpt-4 (accessed 21 December 2023).
79 Meta. 2023 Introducing Code Llama, a state-of-the-art large language model for coding. Meta. 24 August 2023.
See https://ai.meta.com/blog/code-llama-large-language-model-coding/ (accessed 21 December 2023).
80 Google DeepMind. Gemini. See https://deepmind.google/technologies/gemini/#introduction (accessed 21 December 2023).
81 Xie, Y, Sattari, K, Zhang, C, & Lin, J. 2023 Toward autonomous laboratories: Convergence of artificial intelligence and
experimental automation. Progress in Materials Science, 132. 101043. (https://doi.org/10.1016/j.pmatsci.2022.101043)
82 OECD. 2023. Artificial Intelligence in Science: Challenges, Opportunities and the Future of Research, OECD
Publishing, Paris (https://doi.org/10.1787/a8d820bd-en).
83 Fawzi A and Paredes B. 2023. FunSearch: Making new discoveries in mathematical sciences using Large Language
Models. Google DeepMind. See https://deepmind.google/discover/blog/funsearch-making-new-discoveries-in-
mathematical-sciences-using-large-language-models/ (accessed 21 December 2023).
84 Merchant A and Cubuk E. 2023 Millions of new materials discovered with deep learning. Google DeepMind. See https://
deepmind.google/discover/blog/millions-of-new-materials-discovered-with-deep-learning/ (accessed 21 December 2023).
85 University of Cambridge. Robot scientist becomes first machine to discover new scientific knowledge.
See: https://www.cam.ac.uk/research/news/robot-scientist-becomes-first-machine-to-discover-new-scientific-knowledge
(accessed 3 March 2024)
86 Sparkes A et al. 2010. Towards Robot Scientists for autonomous scientific discovery. Autom Exp 2, 1
(https://doi.org/10.1186/1759-4499-2-1)
87 University of Cambridge. Artificially-intelligent Robot Scientist ‘Eve’ could boost search for new drugs. See: https://www.cam.
ac.uk/research/news/artificially-intelligent-robot-scientist-eve-could-boost-search-for-new-drugs (accessed 7 March 2024)
88 Lin Z. 2023 Why and how to embrace AI such as ChatGPT in your academic life. R Soc Open Sci. 2023 Aug
23;10(8):230658 (https://doi.org/ 10.1098/rsos.230658.)
89 Royal Society and Department for Science, Innovation and Technology workshop on horizon scanning AI safety risks across scientific disciplines, October 2023. See https://royalsociety.org/current-topics/ai-data/ (accessed 7 May 2024)
AI and the nature of scientific research
Beyond the impact of AI on the methods of scientific research, there is a potentially transformative impact on the nature of the scientific endeavour itself. These impacts primarily relate to the prevalence of big data-led research, reliance on computing power and new ways of organising skills and labour in the scientific process.

Drawing on the activities undertaken for this report, the following six themes emerged as key impacts of AI on the nature of scientific research.

1. Computers and labour as foundational AI infrastructures
An assemblage of digital infrastructure and human labour underpins major AI applications90. The digital infrastructure refers to devices which collect data, personal computers on which data are analysed, and supercomputers which power large-scale data analysis. The human labour refers to the act of data collection, cleansing, and labelling, as well as the act of design, testing, and implementation. The types of digital infrastructure required include supercomputers (eg those included in HPC-UK91 and the EuroHPC JU92); privacy enhancing technologies93; and data storage facilities (eg data centres). Cloud-based solutions, which do not require users to own physical infrastructure (eg to store data), include Amazon Web Services94 and Oracle Cloud Infrastructure95.

2. Domination of big data centric research
The ability to collect big data (large and heterogeneous forms of data that have been collected without strict experimental design96) and combine these with other datasets has presented clear and significant opportunities for the scientific endeavour. The value being gained from applying AI to these datasets has already provided countless examples of positive applications, from mitigating the impact of COVID-19 to combating climate change (See Case Study 3)97. This is likely to continue to reshape the research endeavour to be more AI and big data-centric98. The ability to engage in data-centric research, however, remains dependent on access to computing infrastructure that enables processing of large heterogeneous datasets.

The domination of big data centric research also has implications for research in which only incomplete or small data is available. Without careful governance, it risks reducing research investment and support in priority areas (eg subjects or regions) where primary data collection at that scale is limited, difficult
90 Penn J. 2024. Historical review on the role of disruptive technologies in transforming science and society.
The Royal Society. See https://royalsociety.org/news-resources/projects/science-in-the-age-of-ai/ (accessed 7 May 2024)
91 HPC-UK. UK HPC Facilities. See https://www.hpc-uk.ac.uk/facilities/ (accessed 21 December 2023).
92 The European High Performance Computing Joint Undertaking. See https://eurohpc-ju.europa.eu/index_en
(accessed 21 December 2023).
93 The Royal Society. Privacy Enhancing Technologies. See https://royalsociety.org/topics-policy/projects/privacy-
enhancing-technologies/ (accessed 21 December 2023).
94 Amazon Web Services. Cloud Computing Services. See https://aws.amazon.com/ (accessed 21 December 2023).
95 Oracle. Cloud Infrastructure. See https://www.oracle.com/cloud/ (accessed 21 December 2023).
96 The Royal Society. 2017 Machine Learning: The power and promise of computers that learn by example.
See https://royalsociety.org/topics-policy/projects/machine-learning/ (accessed 21 December 2023).
97 The Royal Society. 2020 Digital technology and the planet: Harnessing computing to achieve net zero.
See https://royalsociety.org/topics-policy/projects/digital-technology-and-the-planet/ (accessed 21 December 2023).
98 Royal Society and Department for Science, Innovation and Technology workshop on horizon scanning AI safety risks
across scientific disciplines, October 2023. See https://royalsociety.org/current-topics/ai-data/. (accessed 7 May 2024)
or not desirable. It is also likely to increase attention on techniques such as data augmentation and the use of synthetic data. The case of rare disease research (See Case Study 1) illustrates applications of AI in small data research.

3. Open vs closed science
Open science, which seeks to open the entire research and publication process (including but not limited to open data; open protocols; open code; and transparent peer review), is a principle and practice advocated for by the Royal Society, and others99. It is also promoted by major technology companies including Meta and OpenAI, although this has been challenged as ‘aspirational’ or, even, ‘marketing’ rather than a technical descriptor100. As well as providing transparency, open science approaches can enable replication of experiments, wider public scrutiny of research products101 and further the right of everyone to share in scientific advancement102.

However, the increasing use of proprietary AI presents challenges for open science. Researchers are increasingly relying on tools developed and maintained by private companies (see Chapter 4), even though the inner workings may remain opaque103. This is exacerbated by the opacity of the training data which underpins prominent AI tools. Poor transparency risks limiting the utility of AI tools for solving real world problems, as policymakers and scientists may not consider AI-generated results reliable enough for important decisions104. It also undermines efforts to detect and scrutinise negative impacts or discriminatory effects105.

A fully open approach that prompts the release of datasets and models without guardrails or guidance may not be desirable either, as datasets or models can be manipulated by bad actors106. Context-specific and AI-compatible open science approaches are needed to boost oversight and transparency107,108.
The UK’s eScience initiative (2001 – 2008)116,117 stands out as an effort to cultivate interdisciplinary collaboration by fostering a culture where scientists and computer science experts work together. Ongoing initiatives like the Alan Turing Institute118 and Arizona State University’s School of Sustainability119 also continue to champion interdisciplinary approaches.

However, interdisciplinarity is stifled by siloed institutions and insufficient career progression opportunities. Interdisciplinarity need not be limited to the natural sciences, with value to be gained from scientists working with researchers in the arts, humanities, and social sciences. An example of this includes the importance of artists in the user experience design of immersive environments120,121 (See Chapter 3 for further details on interdisciplinarity in AI-based research).

6. Blending human expertise with AI automation
The turn to automation offers opportunities to combine human expertise with efficiencies enabled by AI. AI can be used either to complement the human scientist by assisting or augmenting human capability, or to develop autonomous mechanisms for discovery (See Figure 1)122. Across this spectrum, the human scientist remains essential for contextual scientific understanding. The growing use of AI tools also risks making scientists vulnerable to ’illusions of understanding’ in which only a limited set of viewpoints and methods are represented in outputs123. There is a need to further understand “human-in-the-loop” approaches that recognise AI as complementary to human judgment and the role of human intervention to ensure the quality of outputs.
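One widely used ‘human-in-the-loop’ pattern is simple enough to sketch: the system acts autonomously only where its confidence is high, and defers the remaining cases to a human expert. The example below is illustrative only; it assumes scikit-learn, synthetic data, and an arbitrary confidence threshold, none of which come from this report.

```python
# Illustrative human-in-the-loop sketch: defer low-confidence predictions
# to a human expert. Assumes scikit-learn; data and threshold are toy.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
confidence = model.predict_proba(X_te).max(axis=1)

THRESHOLD = 0.9  # arbitrary cut-off: below it, a human decides
auto_mask = confidence >= THRESHOLD   # handled automatically
human_queue = X_te[~auto_mask]        # deferred to expert review

print(f"automated: {auto_mask.sum()}, deferred to humans: {len(human_queue)}")
```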
116 Hey T, Trefethen A. 2002 The UK e-Science Core Programme and the Grid. International Conference on Computational Science (pp. 3-21). Berlin, Heidelberg: Springer. (https://doi.org/10.1016/S0167-739X(02)00082-1)
117 Hey T. 2005. e-Science and open access. See https://www.researchgate.net/publication/28803295_E-Science_
and_Open_Access (accessed 7 May 2024)
118 The Alan Turing Institute. Research. See https://www.turing.ac.uk/research (accessed 21 December 2023).
119 Arizona State University - School of Sustainability. See https://schoolofsustainability.asu.edu/
(accessed 21 December 2023).
120 The Royal Society. 2023 Science in the metaverse: policy implications of immersive technologies.
See https://royalsociety.org/news-resources/publications/2023/science-in-the-metaverse/
(accessed 21 December 2023).
121 Ibid.
122 Krenn M. et al. 2022. On scientific understanding with artificial intelligence. Nature Reviews Physics 4.
(https://doi.org/10.1038/s42254-022-00518-3)
123 Messeri L, Crockett MJ. 2024 Artificial intelligence and illusions of understanding in scientific research. Nature.
Mar;627(8002):49-58. (https://doi.org/10.1038/s41586-024-07146-0.)
FIGURE 1
Reproduction of a visualisation of the three general roles of AI for scientific research as either a computational microscope, resource of human inspiration, or an agent of understanding124.
[Diagram: three roles of AI – ‘Computational microscope’; ‘Resource of inspiration’ (identifying surprises in data and models); ‘Agent of understanding’ (acquiring new scientific understanding).]
124 The diagram describes three possible ways in which AI can contribute to scientific understanding. The ‘computational microscope’ refers to
the role of AI in providing information through advanced simulation and data representation that cannot be obtained through experimentation.
‘Resource of inspiration’ refers to scenarios in which AI provides information that expands the scope of human imagination or creativity. The ‘agent
of understanding’ illustrates a scenario in which autonomous AI systems can share insights with human experts by translating observations into new
knowledge. As of yet, there is no evidence to suggest that computers can act as true agents of scientific understanding. See: Krenn M. et al. 2022.
On Scientific Understanding with Artificial Intelligence.
The use of trusted research environments and privacy enhancing technologies (including AI-based approaches such as federated machine learning) is enabling researchers to model problems without requiring data access, offering a potential technical solution to addressing concerns surrounding sensitive data (a sketch of the federated averaging idea appears below). These are explained in detail in the Royal Society’s 2019 report Protecting privacy in practice139 and the 2023 report From privacy to partnership (which contains various use cases)140.

Public trust and acceptability around the use of sensitive datasets relating to people (eg health information, demographics, location, etc.) is also essential. As set out in the Royal Society’s 2023 report, Creating resilient and trusted data systems, trust in data sharing requires clarity of purpose and transparency in data flows, as well as robust systems for security and privacy141. Private sector actors such as IBM, Microsoft and Siemens are addressing public concerns by establishing communities of trust142. Other approaches include data governance frameworks that encourage the public to get involved in data-driven scientific projects while retaining control of their data (eg data donation drives143).
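To make the federated machine learning idea concrete, the sketch below implements federated averaging on toy data: each site trains on its own records and shares only model weights with a coordinator, so raw data never leaves the site. This is an illustration under stated assumptions (NumPy, a linear model, three simulated sites), not a description of any production system.

```python
# Minimal federated averaging (FedAvg) sketch: sites share model weights,
# never raw records. Pure NumPy; the linear model and data are toy.
import numpy as np

rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])

# Three "hospitals", each holding sensitive data that stays on site.
local_data = []
for _ in range(3):
    X = rng.normal(size=(100, 2))
    y = X @ true_w + rng.normal(scale=0.1, size=100)
    local_data.append((X, y))

def local_update(w, X, y, lr=0.1, epochs=5):
    """One site's training: a few gradient steps on its private data."""
    for _ in range(epochs):
        grad = 2 * X.T @ (X @ w - y) / len(y)
        w = w - lr * grad
    return w

w_global = np.zeros(2)
for round_ in range(10):
    # Each site refines the current global model locally...
    local_ws = [local_update(w_global, X, y) for X, y in local_data]
    # ...and the coordinator averages only the returned weights.
    w_global = np.mean(local_ws, axis=0)

print(w_global)  # approaches [2.0, -1.0] without pooling any raw data
```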
139 The Royal Society. 2019 Protecting privacy in practice. See https://royalsociety.org/topics-policy/projects/privacy-
enhancing-technologies/ (accessed 21 December 2023).
140 Ibid.
141 The Royal Society. 2023 Creating resilient and trusted data systems. See https://royalsociety.org/topics-policy/
projects/data-for-emergencies/ (accessed 21 December 2023).
142 Charter of Trust. See: www.charteroftrust.com (accessed 21 December 2023)
143 The Tidepool Big Data Donation Project. See: https://www.tidepool.org/bigdata (accessed 21 December 2023)
CASE STUDY 1
A rare disease is a condition that affects fewer than 1 in 2,000 people and is often characterised by diverse, complex, and overlapping genetic manifestations144. Of the more than 7,000 rare diseases described worldwide, only 5% have a treatment145. A lack of understanding of underlying causes, fragmented patient data, and inadequate policies have contributed to making the diagnosis and treatment of rare diseases a public health challenge146.

The application of ML and generative AI techniques offers an opportunity to overcome some of these limitations. Rare disease researchers are using ML techniques to analyse high-dimensional datasets, such as high-dimensional molecular data, to identify relevant biomarkers for known diseases or to identify new diseases147 (a minimal sketch of this biomarker-ranking pattern appears at the end of this case study). The shift towards digitising health records is also creating opportunities to identify patients with rare diseases more promptly. Promising applications show potential to improve low diagnostic rates, treatments, and drug development processes148.

AI applications in the field of rare diseases

• Leveraging medical imaging for early diagnosis: Clinicians are using AI to find patterns in large datasets of patient information, including genetic data and clinical records, that may indicate the presence of a rare disease. ML is particularly useful to analyse multimodal data from different sources, including imaging data (eg MRI, X-rays), which is becoming standard practice to understand disease manifestation149. For example, researchers at the Institute for Genomic Statistics and Bioinformatics at the University of Bonn are using deep neural networks (DNNs) and computational facial analysis to accelerate the diagnosis of ultra-rare and novel disorders150.

• Improving capabilities for automated diagnosis: ML techniques can also be used to improve automated diagnostic support for clinicians. Applying ML to very large multi-modal health datasets, such as UK Biobank151, for example, is creating new possibilities to discover unknown and novel variants that can contribute to a molecular diagnosis of rare diseases152.
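The biomarker-identification pattern described in this case study can be illustrated with a minimal sketch. The example below uses synthetic data and a sparse, L1-penalised classifier from the scikit-learn library; the ‘genes’, patient labels and parameter values are simulated placeholders, so this is an illustration of the workflow rather than a clinical method.

# Minimal sketch: ranking candidate biomarkers in high-dimensional
# molecular data with a sparse (L1-penalised) classifier.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(42)
n_patients, n_genes = 200, 500
X = rng.normal(size=(n_patients, n_genes))           # synthetic expression matrix
informative = [10, 42, 99]                           # features that truly matter
y = (X[:, informative].sum(axis=1) > 0).astype(int)  # simulated disease label

# The L1 penalty drives most coefficients to zero, surfacing a short
# candidate list for expert follow-up rather than a definitive diagnosis.
model = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
model.fit(X, y)

ranked = np.argsort(-np.abs(model.coef_[0]))[:5]
print("Top candidate biomarkers (feature indices):", ranked)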
144 Department of Health and Social Care. 2021. The UK Rare Diseases Framework. See: https://www.gov.uk/government/
publications/uk-rare-diseases-framework/the-uk-rare-diseases-framework (accessed 30 September 2023).
145 Brasil S, Pascoal C, Francisco R, Dos Reis Ferreira V, Videira PA, Valadão AG. 2019 Artificial Intelligence (AI) in Rare Diseases: Is the Future Brighter? Genes. 10, 978. (https://doi.org/10.3390/genes10120978)
146 Decherchi S, Pedrini E, Mordenti M, Cavalli A, Sangiorgi L. 2021 Opportunities and challenges for machine learning
in rare diseases. Frontiers in Medicine, 8, 747612. (https://doi.org/10.3389/fmed.2021.747612)
147 Banerjee J et al. 2023 Machine learning in rare disease. Nat Methods 20, 803–814. (https://doi.org/10.1038/s41592-
023-01886-z)
148 Schaefer J, Lehne M, Schepers J, Prasser F, Thun S. 2020 The use of machine learning in rare diseases: a scoping
review. Orphanet J Rare Dis.(https://doi.org/10.1186/s13023-020-01424-6)
149 Ibid.
150 Hsieh TC, Krawitz PM. 2023 Computational facial analysis for rare Mendelian disorders. American Journal of Medical
Genetics Part C: Seminars in Medical Genetics. (https://doi.org/10.1002/ajmg.c.32061)
151 UK Biobank. See: https://www.ukbiobank.ac.uk/ (accessed 21 December 2023)
152 Turro E et al. 2020 Whole-genome sequencing of patients with rare diseases in a national health system. Nature
583, 96–102. (https://doi.org/10.1038/s41586-020-2434-2)
153 Cohen AM et al. Detecting rare diseases in electronic health records using machine learning and knowledge
engineering: Case study of acute hepatic porphyria. PLoS ONE. (https://doi.org/10.1371/journal.pone.0238277)
154 Hersh WR, Cohen AM, Nguyen MM, Bensching KL, Deloughery TG. 2022 Clinical study applying machine learning
to detect a rare disease: results and lessons learned. JAMIA Open, 5. (https://doi.org/10.1093/jamiaopen/ooac053)
155 Nag S, et al. 2022 Deep learning tools for advancing drug discovery and development. 3 Biotech. 12: 110.
156 Steve Nouri. Generative AI Drugs Are Coming. See: https://www.forbes.com/sites/forbestechcouncil/2023/09/05/
generative-ai-drugs-are-coming/ (accessed September 30 2023)
157 Banerjee J et al. 2023 Machine learning in rare disease. Nat Methods 20, 803–814. (https://doi.org/10.1038/s41592-
023-01886-z)
158 Ibid.
159 The Royal Society interviews with scientists and researchers. 2022 – 2023
160 The Royal Society interviews with scientists and researchers. 2022 – 2023
161 Boycott KM et al. 2017 International cooperation to enable the diagnosis of all rare genetic diseases. Am. J. Hum.
Genet. 100, 695–705. (https://doi.org/10.1016/j.ajhg.2017.04.003)
162 Bellgard MI, Snelling T, McGree JM. 2019 RD-RAP: beyond rare disease patient registries, devising a comprehensive
data and analytic framework. Orphanet J Rare Dis 14, 176. (https://doi.org/10.1186/s13023-019-1139-9)
163 Decherchi S, Pedrini E, Mordenti M, Cavalli A, Sangiorgi,L. 2021 Opportunities and challenges for machine learning
in rare diseases. Frontiers in Medicine, 8, 747612 (https://doi.org/10.3389/fmed.2021.747612)
164 Kokosi T, Harron K. 2022. Synthetic data in medical research. BMJ medicine, 1.
(https://doi.org/10.1136/bmjmed-2022-000167)
165 Global Rare Disease Policy Network. See: https://www.rarediseasepolicy.org/ (accessed 21 March 2024)
Chapter two
Research integrity
and trustworthiness
Left
Rhinosporidium seeberi
parasite, the causative
agent of rhinosporidiosis.
© iStock / Dr_Microbe.
Research integrity
and trustworthiness
Trust in AI is essential for its responsible use in scientific research, particularly as scientists become increasingly reliant on these technologies164. This reliance hinges on an assumption that AI-based systems – as well as their analysis and outputs – can produce reliable, low-error, and trustworthy findings.

However, the adoption of AI in scientific research has been coupled with challenges to rigour and scientific integrity. Core issues include a lack of understanding about how AI models work, insufficient documentation of experiments, and scientists lacking the required technical expertise for building, testing and finding errors in a model. A growing body of irreproducible studies using ML techniques is also raising concerns regarding the challenges of reproducing AI-based experiments and the reliability of AI-based results and discoveries165. Together, these issues pose risks not just to science, but also to society if the deployment of unreliable or untrustworthy AI technologies leads to harmful outcomes166.

Based on interviews and a roundtable on reproducibility conducted for this report, the following observations capture unique challenges AI poses for research integrity and trustworthiness.

“It is hardly possible to imagine higher stakes than these for the world of science. The future existence and social role [of science] seem to hinge on the ability of researchers and scientific institutions to respond to the crisis, thus averting a complete loss of trust in scientific expertise by civil society.”
Royal Society roundtable participant

Reproducibility challenges in AI-based research
Reproducibility refers to the ability of independent researchers to scrutinise the results of a research study, replicate them, and reproduce an experiment in future studies167.

If researchers develop an overreliance on AI for data analysis, while remaining unable to explain how conclusions were reached and how to reproduce a study168, their work will not meet thresholds for scrutiny and verification. Similarly, if results cannot be verified, they can contribute to inflated expectations, exaggerated claims of accuracy, or research outputs based on spurious correlations169. In the case of AI-based research, reproducing a study involves not only replicating the method, but also reproducing the code, data, and environmental conditions under which the experiment was conducted (eg computing, hardware, software)170,171.
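In practice, capturing those conditions can be as simple as publishing a machine-readable record of random seeds, software versions and data fingerprints alongside the code. The sketch below, in Python, illustrates one possible form of such a record; the data file name is a hypothetical placeholder, and a full record would also cover hardware and all dependencies.

# Minimal sketch of capturing what is needed to rerun an AI experiment:
# random seeds, package versions, and a fingerprint of the training data.
import hashlib
import json
import platform
import random

import numpy as np

SEED = 2024
random.seed(SEED)
np.random.seed(SEED)  # fix sources of randomness up front

def file_sha256(path):
    # Hash the dataset so later runs can verify they use identical data.
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()

manifest = {
    "seed": SEED,
    "python": platform.python_version(),
    "numpy": np.__version__,
    "data_sha256": file_sha256("training_data.csv"),  # hypothetical file
}

# Publishing this manifest alongside the code lets others check their
# environment against the one the results were produced under.
with open("run_manifest.json", "w") as f:
    json.dump(manifest, f, indent=2)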
164 Echterhölter A, Schröter J, Sudmann A. 2021 How is Artificial Intelligence Changing Science? Research in the Era
of Learning Algorithms. OSF Preprint (https://doi.org/10.33767/osf.io/28pnx)
165 Sohn E. 2023 The reproducibility issues that haunt health-care AI. Nature. 9 January 2023.
See https://www.nature.com/articles/d41586-023-00023-2 (accessed 21 December 2023)
166 Sambasivan N et al. 2021 “Everyone wants to do the model work, not the data work”: Data Cascades in
High-Stakes AI. Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems.
(https://doi.org/10.1145/3411764.3445518)
167 Haibe-Kains B et al. 2020 Transparency and reproducibility in artificial intelligence. Nature. 586, E14–E16.
(https://doi.org/10.1038/s41586-020-2766-y)
168 Royal Society and Department for Science, Innovation and Technology workshop on horizon scanning
AI safety risks across scientific disciplines, October 2023. See https://royalsociety.org/current-topics/ai-data/
(accessed 7 May 2024)
169 Echterhölter A, Schröter J, Sudmann A. 2021 How is Artificial Intelligence Changing Science? Research in the Era
of Learning Algorithms. OSF Preprint (https://doi.org/10.33767/osf.io/28pnx)
170 Gundersen O, Gil Y and Aha D. 2018 On Reproducible AI: Towards Reproducible Research, Open Science, and
Digital Scholarship in AI Publications. AI Magazine. 39: 56-68. (doi.org/10.1609/aimag.v39i3.2816)
171 Gunderson O, Coakley K, Kirkpatrick C, and Gil Y. 2022 Sources of irreproducibility in machine learning: A review.
arXiv preprint. (doi.org/10.48550/arXiv.2204.07610)
Reproducibility failures do not only risk the validity of the individual study172, but can also affect research conducted for other studies, including those in other disciplines. For example, a study led by the Center for Statistics and Machine Learning at Princeton University showed how ‘data leakage’ in one study (a leading cause of errors in ML applications due to errors in training data or model features) may affect 294 papers across 17 scientific fields, including high-stakes fields like medicine173 (a minimal illustration of this failure mode appears at the end of this section). Furthermore, these types of issues are likely to be underreported due to factors such as unpublished data; insufficient documentation; absence of mechanisms to report failed experiments; and high variability across experimentation or research contexts174.

Opacity and the black-box nature of machine learning
At the core of the reproducibility challenge are opaque ML-based models that not every scientist can explain, interpret, or understand. ML models are commonly referred to as ‘black-box models’: models that can produce useful information and outputs, even when researchers do not understand exactly how the system works. The opaque nature of models limits explainability and the ability of scientists to interpret how ML models arrive at specific results or conclusions175.

Explainable AI (See Box 1) can help researchers identify errors in data, models, or assumptions – mitigating challenges such as data bias – and ensure these systems produce high quality results which can be used for real-world implementation176. This can become a significant challenge for scientists who integrate highly variable and complex models into their work, such as deep learning models, that are known to outperform less complex and more linear and transparent models.

Opacity increases when models are developed in a commercial setting. For instance, most leading LLMs are developed by large technology companies like Google, Microsoft, Meta, and OpenAI. These models are proprietary systems, and as such, reveal limited information about their model architecture, training data, and the decision-making processes that would enhance understanding177.

“There may be a disproportionate problem with machine learning. We’ve come very far with the ability to handle huge amounts of data, using software that is very competent and well developed. But I think perhaps a lot of people using it don’t actually understand what they’re doing in a way that may not be so true for other areas. It’s a compounded problem, where there are many, many things you can get wrong. I wonder how many people really understand the software that they’re using.”
Royal Society roundtable participant
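To make the data leakage failure mode concrete, the sketch below (in Python, using scikit-learn) selects ‘informative’ features before splitting the data into training and test sets. Because the data are pure noise, an honest analysis should score near chance, yet the leaky pipeline reports inflated accuracy. The dataset is synthetic, and this is a textbook illustration rather than a reconstruction of any cited study.

# Minimal sketch of 'data leakage': fitting a preprocessing step on the
# full dataset lets information from the test set leak into training.
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2000))   # pure noise features
y = rng.integers(0, 2, size=200)   # random labels: nothing to learn

# LEAKY: select 'predictive' features using ALL rows, then split.
X_sel = SelectKBest(f_classif, k=20).fit_transform(X, y)
Xtr, Xte, ytr, yte = train_test_split(X_sel, y, random_state=0)
leaky = LogisticRegression().fit(Xtr, ytr).score(Xte, yte)

# CORRECT: split first, select features using the training rows only.
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)
selector = SelectKBest(f_classif, k=20).fit(Xtr, ytr)
clean = LogisticRegression().fit(selector.transform(Xtr), ytr).score(
    selector.transform(Xte), yte)

print(f"leaky accuracy:  {leaky:.2f}")   # optimistically high
print(f"honest accuracy: {clean:.2f}")   # near 0.5, as it should be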
172 McDermott M, Wang S, Marinsek N, Ranganath R, Foschini L, and Ghassemi M. 2021 Reproducibility in machine
learning for health research: Still a way to go. Sci. Transl. Med. 13, eabb1655. (doi.org/10.1126/scitranslmed.abb1655)
173 Kapoor S and Narayanan A. 2023 Leakage and the reproducibility crisis in machine-learning-based
science. Patterns. 4(9) (doi.org/10.1016/j.patter.2023.100804)
174 Gundersen O, Gil Y and Aha D. 2018 On Reproducible AI: Towards Reproducible Research, Open Science, and
Digital Scholarship in AI Publications. AI Magazine. 39: 56-68. (doi.org/10.1609/aimag.v39i3.2816)
175 Royal Society. Royal Society response on Reproducibility and Research Integrity. See: https://royalsociety.org/news-
resources/publications/2021/research-reproducibility/ (accessed 7 March 2024)
176 The Royal Society. 2019 Explainable AI: the basics. See https://royalsociety.org/topics-policy/projects/explainable-ai/
(accessed 21 December 2023).
177 Bommasani et al. 2021. On the opportunities and risks of foundation models. See: https://crfm.stanford.edu/assets/
report.pdf (accessed March 21 2024)
BOX 1
178 Marcinkevičs, R., Vogt, J. E. 2023. Interpretable and explainable machine learning: A methods-centric overview with
concrete examples (https://doi.org/10.1002/widm.1493)
179 Lipton,. 2018. The mythos of model interpretability. Queue, 16(3), 31–57. (https://doi.org/10.1145/3236386.3241340)
180 The Royal Society. 2019 Explainable AI: the basics. See https://royalsociety.org/topics-policy/projects/explainable-ai/
(accessed 21 December 2023).
181 Li Z, Ji J, and Zhang Y. 2023 From Kepler to Newton: Explainable AI for science. arXiv preprint.
(doi.org/10.48550/arXiv.2111.12210)
182 McDermid J, Jia Y, Porter Z and Habli I. 2021 Artificial intelligence explainability: the technical and ethical
dimensions. Philosophical Transactions of the Royal Society A. 379(2207), 20200363. (doi.org/10.1098/rsta.2020.0363)
183 McGough M. 2018 How bad is Sacramento’s air, exactly? Google results appear at odds with reality, some say.
Sacramento Bee. 7 August 2018. See https://www.sacbee.com/news/california/fires/article216227775.html
(accessed 21 December 2023).
184 Lundberg S and Lee S. 2017 A unified approach to interpreting model predictions. In Proceedings
of the 31st International Conference on Neural Information Processing Systems. 4768–4777.
(dl.acm.org/doi/10.5555/3295222.3295230)
185 Cartwright H. 2023 Interpretability: Should – and can – we understand the reasoning of machine-learning systems?
In: OECD (ed.) Artificial Intelligence in Science. OECD. (doi.org/10.1787/a8d820bd-en)
186 Royal Society roundtable on reproducibility, April 2023.
187 Birhane A et al. 2023 Science in the age of large language models. Nat Rev Phys 5, 277–280.
(doi.org/10.1038/s42254-023-00581-4)
188 Bell A, Solano-Kamaiko I, Nov O, and Stoyanovich J. 2022 It’s Just Not That Simple: An Empirical Study of the
Accuracy-Explainability Trade-off in Machine Learning for Public Policy. In Proceedings of the 2022 ACM Conference
on Fairness, Accountability, and Transparency. Association for Computing Machinery. 248–266.
(doi.org/10.1145/3531146.3533090)
189 Miller K. 2021 Should AI models be explainable? That depends. Stanford University Human-Centered Artificial
Intelligence. See: https://hai.stanford.edu/news/should-ai-models-be-explainable-depends (accessed 21 December 2023).
190 Zhong X et al. 2022 Explainable machine learning in materials science. NPJ Comput Mater 8, 204.
(doi.org/10.1038/s41524-022-00884-7)
191 Combi C et al. 2022 A manifesto on explainability for artificial intelligence in medicine. Artificial intelligence in
medicine. 133, 102423. (doi.org/10.1016/j.artmed.2022.102423)
192 Hanson, B. Garbage in, garbage out: mitigating risks and maximizing benefits of AI in research.
See https://www.nature.com/articles/d41586-023-03316-8 (accessed 5 March 2024)
and environmental research193 for multiple purposes. These include enhancing scientific understanding derived from AI (eg better understanding of physical principles and generation of new hypotheses194); improving oversight and enforcement of environmental protection regulations; and minimising the environmental footprint of AI systems195.

• Glass-box architectures: Glass-box model architectures aim to make LLMs’ internal data representations more transparent by incorporating attention mechanisms, modular structures, and visualisation tools that can help surface how information flows through layers of the neural network. In addition, augmented training techniques like adversarial learning and contrastive examples can probe the model’s decision boundaries. Analysing when the LLM succeeds or fails on these special training samples provides insights into its reasoning process196,197.

• Knowledge graphs: Knowledge graphs are an advanced data structure that represents information in a network of interlinked entities. They reduce reliance on opaque statistical patterns in training data for LLMs. Medical LLMs, for example, can leverage ontological biomedical data in knowledge graphs for transparent, structured reasoning about diseases and treatments. During inference, LLMs consult knowledge graphs for relevant facts, providing a grounded framework alongside their intrinsic pattern recognition. Joint training with knowledge graphs improves LLMs’ factual reasoning and aids in identifying gaps or misconceptions through audits198 (a minimal sketch of this retrieval pattern appears at the end of this section).

Barriers limiting reproducibility
Beyond technical challenges, there are a series of institutional and social constraints that prevent researchers from adopting more rigorous and transparent processes. Table 1 lists key barriers to reproducibility in AI-based research199.

“One of the things that is true with modelling is you can get almost any result you want based on the assumptions you use to drive them. I think this is a dangerous area that our field is moving in. It’s too much reliance on model results and the pretty pictures that come out of it as a reproduction of truth.”
Royal Society roundtable participant
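The retrieval pattern described in the knowledge graphs bullet above can be sketched in a few lines of Python. In the illustration below, facts are stored as subject-relation-object triples and retrieved to ground a prompt; the medical facts are simplified for illustration, and ask_llm is a hypothetical stand-in for a call to whichever model API is in use.

# Minimal sketch of grounding a language model with a knowledge graph:
# facts are retrieved from an explicit, auditable structure and placed
# in the prompt, rather than relied on from opaque model weights.

# A tiny knowledge graph as (subject, relation, object) triples.
TRIPLES = [
    ("cystic fibrosis", "caused_by", "CFTR gene mutations"),
    ("cystic fibrosis", "treated_with", "CFTR modulator therapy"),
    ("CFTR gene", "located_on", "chromosome 7"),
]

def retrieve(entity):
    # Return every stored fact mentioning the entity (a transparent step).
    return [t for t in TRIPLES if entity in (t[0], t[2])]

def grounded_prompt(question, entity):
    facts = "\n".join(f"- {s} {r.replace('_', ' ')} {o}"
                      for s, r, o in retrieve(entity))
    return (f"Answer using only these verified facts:\n{facts}\n\n"
            f"Question: {question}")

prompt = grounded_prompt("What causes cystic fibrosis?", "cystic fibrosis")
print(prompt)               # the retrieval step can be logged and audited
# answer = ask_llm(prompt)  # hypothetical call to the model in use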
193 Arashpour M. 2023 AI explainability framework for environmental management research. Journal of environmental
management. 342, 118149. (doi.org/10.1016/j.jenvman.2023.118149)
194 Zhong X et al. 2022 Explainable machine learning in materials science. npj Comput Mater 8, 204. (doi.org/10.1038/
s41524-022-00884-7)
195 Arashpour M. 2023 AI explainability framework for environmental management research. Journal of environmental
management. 342, 118149. (doi.org/10.1016/j.jenvman.2023.118149)
196 Lengerich, B J et al. 2023. LLMs Understand Glass-Box Models, Discover Surprises, and Suggest Repairs. arXiv
preprint (https://doi.org/10.48550/arXiv.2308.01157)
197 Garrett BL, Rudin C 2023. The Right to a Glass Box: Rethinking the Use of Artificial Intelligence in Criminal Justice.
Cornell Law Review, Forthcoming, Duke Law School Public Law & Legal Theory Series.
198 Gaur, M, Faldu, K, & Sheth, A 2021. Semantics of the black-box: Can knowledge graphs help make deep learning
systems more interpretable and explainable? IEEE Internet Computing, 25, 51-59.
199 Royal Society roundtable on reproducibility, April 2023.
TABLE 1
200 Benjamin D et al. 2018 Redefine statistical significance. Nat Hum Behav 2, 6–10. (doi.org/10.1038/s41562-017-0189-z)
201 Bommasani et al. 2021. On the opportunities and risks of foundation models. See: https://crfm.stanford.edu/assets/report.pdf (accessed March 21 2024)
202 Ibid.
203 Rudin C. 2019 Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nat Mach Intell 1,
206–215). (doi.org/10.1038/s42256-019-0048-x)
204 Leonelli S. 2018 Rethinking reproducibility as a criterion for research quality. In Including a symposium on Mary
Morgan: curiosity, imagination, and surprise. 36, 129-146. Emerald Publishing Limited.
205 The Royal Society. 2018 Research culture: embedding inclusive excellence. See https://royalsociety.org/topicspolicy/
publications/2018/research-culture-embedding-inclusive-excellence/ (accessed 21 December 2023)
206 Leonelli S. 2018 Rethinking reproducibility as a criterion for research quality. In Including a symposium on Mary
Morgan: curiosity, imagination, and surprise. 36, 129-146. Emerald Publishing Limited.
207 Miller K. 2022 Healthcare algorithms don’t always need to be generalizable. Stanford University Human-Centered
Artificial Intelligence. See https://hai.stanford.edu/news/healthcare-algorithms-dont-always-need-be-generalizable
(accessed 21 December 2023).
208 Leonelli S. 2018 Rethinking reproducibility as a criterion for research quality. In Including a symposium on Mary
Morgan: curiosity, imagination, and surprise. 36, 129-146. Emerald Publishing Limited.
209 The Royal Society. 2017 Machine Learning: The power and promise of computers that learn by example.
See https://royalsociety.org/topics-policy/projects/machine-learning/ (accessed 21 December 2023).
210 Birhane A et al. 2023 Science in the age of large language models. Nat Rev Phys 5, 277–280.
(doi.org/10.1038/s42254-023-00581-4)
211 Center for Open Science. What is preregistration? See https://www.cos.io/initiatives/prereg
(accessed 21 December 2023).
212 Papers With Code. ML Reproducibility Challenge 2022. See https://paperswithcode.com/rc2022
(accessed 21 December 2023).
Guidance to produce documentation and follow open science practices
• Reproducibility checklists and protocols. Examples include the Machine Learning Reproducibility Checklist213, the Checklist for AI in Medical Imaging (CLAIM)214, or the field-agnostic REFORMS checklist215, developed by experts in computer science, mathematics, social science, and health research. These facilitate compliance and documentation of the multiple dimensions of reproducibility.
• Community standards for documentation. Domain-specific community standards such as TRIPOD-AI216 provide guidance on how to document, report and reproduce machine-learning based prediction model studies in health research. The synthetic biology and genomics communities have also defined experimental protocol standards and documentation of the genomic workflow to improve reproducibility217,218.
• The release of data sheets and model cards. Industry can play an important role in releasing information that provides insight into what a model does; its intended audience; intended uses; potential limitations; confidence metrics; and information about the model architecture and the training data. Meta219, Google220, and Hugging Face221 have released different iterations of model cards (a minimal illustrative model card appears after this list).
• Context-aware documentation. Involving diverse actors in defining what reproducibility means; promoting reporting mechanisms that explicitly address contextual inputs and sources of variation; and documenting how local or team culture influences implementation222.
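As an illustration of the kind of information a model card records, the sketch below lays out the categories listed above as a machine-readable file. The model name, metrics and values are hypothetical placeholders, and published model card templates (such as those from Meta, Google and Hugging Face) are considerably more extensive.

# Minimal sketch of the fields a model card might record, following the
# categories listed above. All values are placeholders for a hypothetical model.
import json

model_card = {
    "model_name": "example-rare-disease-classifier",  # hypothetical
    "what_it_does": "Flags patient records for rare-disease review.",
    "intended_audience": "Clinical researchers, not frontline diagnosis.",
    "intended_uses": ["research triage", "cohort discovery"],
    "limitations": [
        "Trained on data from a single health system",
        "Not validated for paediatric populations",
    ],
    "confidence_metrics": {"AUROC": 0.87, "calibration_error": 0.04},
    "architecture": "gradient-boosted trees, 300 estimators",
    "training_data": "De-identified EHR extract, 2015-2022",
}

# Shipping the card as a machine-readable file alongside the model weights
# makes the documentation easy to check programmatically.
with open("model_card.json", "w") as f:
    json.dump(model_card, f, indent=2)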
213 McGill School of Computer Science. The Machine Learning Reproducibility Checklist v2.0.
See: https://www.cs.mcgill.ca/~jpineau/ReproducibilityChecklist.pdf (accessed 21 December 2023).
214 Mongan J, Moy L, and Kahn C. 2020 Checklist for Artificial Intelligence in Medical Imaging (CLAIM): A Guide for
Authors and Reviewers. Radiology. Artificial intelligence, 2(2), e200029. (doi.org/10.1148/ryai.2020200029)
215 Reporting standards for ML-based science. See: https://reforms.cs.princeton.edu/ (accessed 21 December 2023).
216 Collins G et al. 2021 Protocol for development of a reporting guideline (TRIPOD-AI) and risk of bias tool
(PROBAST-AI) for diagnostic and prognostic prediction model studies based on artificial intelligence.
BMJ open, 11(7), e048008. (doi.org/10.1136/bmjopen-2020-048008)
217 Lin X. 2020 Learning Lessons on Reproducibility and Replicability in Large Scale Genome-Wide Association Studies.
Harvard Data Science Review. 2. (doi.org/10.1162/99608f92.33703976)
218 Kanwal S et al. 2017 Investigating reproducibility and tracking provenance – A genomic workflow case study.
BMC Bioinformatics 18, 337. (doi.org/10.1186/s12859-017-1747-0)
219 Meta. 2022 System Cards, a new resource for understanding how AI systems work. Meta. 23 February 2022.
See https://ai.meta.com/blog/system-cards-a-new-resource-for-understanding-how-ai-systems-work/
(accessed 21 December 2023).
220 Google. Model Cards. See https://modelcards.withgoogle.com/about (accessed 21 December 2023).
221 Hugging Face. Model Cards. See https://huggingface.co/docs/hub/model-cards (accessed 21 December 2023).
222 Leonelli S. 2018 Rethinking reproducibility as a criterion for research quality. In Including a symposium on Mary
Morgan: curiosity, imagination, and surprise (Vol. 36, pp. 129-146). Emerald Publishing Limited.
Chapter three
Research skills and
interdisciplinarity
Left
Preparation of nanomaterials
for Scanning Electron
Microscope (SEM) machine.
© iStock / AnuchaCheechang.
237 National Academy of Sciences. 2024 Toward a New Era of Data Sharing: Summary of the US-UK Scientific Forum
on Researcher Access to Data. Washington, DC: The National Academies Press. https://doi.org/10.17226/27520.
238 UKRI NERC Environmental Data Service. See: https://eds.ukri.org/
239 National Academy of Sciences. 2024 Toward a New Era of Data Sharing: Summary of the US-UK Scientific Forum
on Researcher Access to Data. Washington, DC: The National Academies Press. https://doi.org/10.17226/27520.
240 The Royal Society roundtable on the role of interdisciplinarity in AI for scientific research, June 2023.
241 Bengio Y. 2020 Time to rethink the publication process in machine learning. See https://yoshuabengio.
org/2020/02/26/time-to-rethink-the-publication-process-in-machine-learning/ (accessed 10 January 2024)
242 Ibid.
243 Slow Science. See http://slow-science.org/ (accessed 10 January 2024)
244 The Royal Society interviews with scientists and researchers. 2022 - 2023
245 The Royal Society. 2023 Science in the metaverse: policy implications of immersive technologies.
See https://royalsociety.org/news-resources/publications/2023/science-in-the-metaverse/ (accessed 21 December 2023).
246 University of Birmingham. The Institute for Interdisciplinarity Data Science and AI.
See https://www.birmingham.ac.uk/research/data-science/index.aspx (accessed 11 January 2024)
247 Arizona State University. See: https://news.asu.edu/20230407-university-news-asu-college-integrative-sciences-
arts-reorganizes-3-new-schools. (accessed 13 December 2023)
248 The Royal Society. 2023 Science in the metaverse: policy implications of immersive technologies.
See https://royalsociety.org/news-resources/publications/2023/science-in-the-metaverse/ (accessed 21 December 2023).
249 The Royal Society roundtable on the role of interdisciplinarity in AI for scientific research, June 2023.
250 Wu T, Zhang, SH. 2024 Applications and Implication of Generative AI in Non-STEM Disciplines in Higher Education.
In: Zhao, F., Miao, D. (eds) AI-generated Content. AIGC 2023. Communications in Computer and Information
Science, vol 1946. Springer, Singapore. (doi.org/10.1007/978-981-99-7587-7_29)
251 The Royal Society. 2023 Science in the metaverse: policy implications of immersive technologies.
See https://royalsociety.org/news-resources/publications/2023/science-in-the-metaverse/ (accessed 21 December 2023).
252 Zeller F, Dwyer L. 2022 Systems of collaboration: challenges and solutions for interdisciplinary research in AI
and social robotics. Discover Artificial Intelligence, 2. 12. (https://doi.org/10.1007/s44163-022-00027-3)
253 UNESCO Recommendation on Open Science. 2021. See: https://www.unesco.org/en/legal-affairs/recommendation-
open-science (accessed 6 February 2024)
254 The Royal Society. 2019. Dynamics of data science skills: How can all sectors benefit from data science talent.
See https://royalsociety.org/-/media/policy/projects/dynamics-of-data-science/dynamics-of-data-science-skills-report.pdf
(accessed 6 January 2024)
255 World Economic Forum. 2023 The Future of Jobs Report 2023. See https://www3.weforum.org/docs/WEF_Future_
of_Jobs_2023.pdf (accessed 30 January 2024)
256 The Royal Society roundtable on reproducibility, April 2023
257 Cambridge Spark. See https://www.cambridgespark.com/ (accessed 1 August 2023)
258 Petkova, D, Roman, L. 2023 AI in science: Harnessing the power of AI to accelerate discovery and foster innovation
– Policy brief, Publications Office of the European Commission, Directorate-General for Research and Innovation.
(doi/10.2777/401605)
259 Solaiman, I. 2023 The gradient of generative AI release: Methods and considerations. In Proceedings of the 2023 ACM
Conference on Fairness, Accountability, and Transparency (pp. 111-122). (https://doi.org/10.48550/arXiv.2302.04844)
260 Montreal AI Ethics Institute. See https://montrealethics.ai/. (accessed 26 February 2024.)
261 The Royal Society. 2024 Insights from the Royal Society & Humane Intelligence red-teaming exercise on AI-generated
scientific disinformation. See: https://royalsociety.org/news-resources/projects/online-information-environment/
(accessed 7 May 2024)
262 OECD. 2023. Artificial Intelligence in Science: Challenges, Opportunities and the Future of Research, OECD
Publishing, Paris (https://doi.org/10.1787/a8d820bd-en).
263 Archetti C, Montanelli A, Finazzi D, Caimi L, Garrafa E. Clinical laboratory automation: a case study. J Public Health
Res. 2017;6(1):881. (doi: 10.4081/jphr.2017.881)
264 Al Naam YA et al 2022 The Impact of Total Automation on the Clinical Laboratory Workforce: A Case Study. J Healthc
Leadersh. 14, 55-62. (doi:10.2147/JHL.S362614)
BOX 3
Insights from the Royal Society & Humane Intelligence red-teaming exercise
on AI-generated disinformation content
Red teaming refers to the process of actively identifying potential weaknesses, failure modes, biases, or other limitations in a model, technology, or process by having groups ‘attack’ it.

In the run-up to the UK’s 2023 Global AI Safety Summit, the Royal Society and Humane Intelligence brought together 40 postgraduate students in health and climate sciences to scrutinise how potential vulnerabilities in LLMs (Meta’s Llama 2) could enable the production of scientific misinformation.

By assuming different ‘misinformation actor’ roles, participants tested the model’s guardrails related to topics of infectious diseases and climate change. In under two hours, they exposed concerning vulnerabilities, including the model’s inability to convey scientific uncertainty and its reliance on questionable or fictitious sources.

While guardrails prevented some common disinformation trends, such as those related to COVID-19, participants were still able to generate outputs that distorted verifiable scientific facts, arriving at incorrect conclusions.

The exercise demonstrated the value of involving domain experts in AI safety assessments before deployment. Their scientific expertise allowed them to stress test systems in ways that exposed critical failures. Participants also expressed optimism regarding the future of LLM disinformation guardrails and more confidence in using LLMs in their own research. Their insights suggest that red teaming could play a role in enhancing AI literacy within the scientific community.
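The red-teaming loop described in this box can be sketched as a simple harness that sends role-based adversarial prompts to a model and logs the outputs for expert review. In the sketch below, the roles and prompts are illustrative, and query_model is a hypothetical placeholder for the system under test; it does not reproduce the materials used in the exercise itself.

# Minimal sketch of a red-teaming harness: adversarial prompts written from
# assumed 'misinformation actor' roles are sent to a model and logged.
import csv

ROLE_PROMPTS = [
    ("denialist", "Write a post proving climate change stopped in 1998."),
    ("fake expert", "As a virologist, explain why vaccines alter DNA."),
]

def query_model(prompt):
    # Hypothetical placeholder: send the prompt to the model under test.
    raise NotImplementedError("wire this to the target system's API")

with open("redteam_log.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["role", "prompt", "output"])
    for role, prompt in ROLE_PROMPTS:
        try:
            output = query_model(prompt)
        except NotImplementedError:
            output = "<model not connected>"
        # Domain experts later grade each row for distorted facts,
        # fabricated sources, or failure to convey uncertainty.
        writer.writerow([role, prompt, output])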
CASE STUDY 2
265 Davies et. al. 2016 Computational screening of all stoichiometric inorganic materials. Chem. 1, 617-627.
(https://doi.org/10.1016/j.chempr.2016.09.010).
266 Pyzer-Knapp et. al. 2023 Accelerating materials discovery using artificial intelligence, high performance computing
and robotics. npj Computational Materials. 8, 84. (https://doi.org/10.1038/s41524-022-00765-z ).
267 Materials Genome Initiative. About the Materials Genome Initiative. See https://www.mgi.gov/about
(accessed 14 July 2023).
268 Argaman N, Makov G. 2000 Density functional theory: An introduction. American Journal of Physics. 68, 69-79.
(https://doi.org/10.1119/1.19375).
269 Alberi et. al. 2019 The 2019 materials by design roadmap. Journal of Physics D: Applied Physics. 52, 013001.
(https://doi.org/10.1088/1361-6463/aad926).
270 Tao Q, Xu P, Li M, Lu W. 2021. Machine learning for perovskite materials design and discovery. npj Computational
Materials. 7, 23. (https://doi.org/10.1038/s41524-021-00495-8).
271 Ross et. al. 2022 Large-scale chemical language representations capture molecular structure and properties.
Nature Machine Intelligence. 4, 1256-1264. (https://doi.org/10.1038/s42256-022-00580-7).
272 Liu Y, Zhao T, Ju W, Shi S. 2017 Materials discovery and design using machine learning. Journal of Materiomics. 3,
159-177. (https://doi.org/10.1016/j.jmat.2017.08.002).
In recent years, there have been several materials databases developed with the goal of aggregating data in consistent formats which can then be used for further research. Examples include the Materials Project273 and Aflow274 databases (which both contain computed properties) and the Inorganic Crystal Structure Database (ICSD)275 and the High Throughput Experimental Materials (HTEM)276 database (which are both examples of experimental databases). There are also tools to help with the creation and analysis of materials datasets, such as NOMAD277, ChemML278, and atomate279. These datasets, which can be significant in size (eg the Materials Project database currently contains data for more than 150,000 materials), have been facilitating the use of ML for materials discovery.

There have been several success stories in recent years of ML being used for materials discovery, some examples of which are listed in Table 2. A variety of ML and AI techniques, including generative AI, have been used to identify materials with desired properties for a wide range of applications. These have been integrated with established techniques such as DFT, stability calculations and experiments to narrow down the predicted materials280. Sustainability of proposed materials could also be used as an objective for predictive models281, to prevent new, more complex materials being harder to recycle or dispose of safely.
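The screening workflow described above can be illustrated with a minimal sketch: a model is trained on known materials and then used to rank a large pool of candidates, so that only the most promising are passed on to expensive calculations or experiments. The descriptors, property values and shortlist size below are synthetic placeholders, not data drawn from the databases cited.

# Minimal sketch of ML-guided materials screening: fit a model on known
# materials' properties, predict for unseen candidates, and keep the most
# promising for costly follow-up (eg DFT or experiments).
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(7)

# Toy composition descriptors for 500 'known' materials and a target
# property (eg formation energy per atom) to learn.
X_known = rng.uniform(size=(500, 8))
y_known = X_known @ rng.normal(size=8) + 0.1 * rng.normal(size=500)

model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(X_known, y_known)

# Screen 10,000 hypothetical candidates in seconds, then shortlist the
# lowest predicted energies (most stable) for detailed calculation.
X_candidates = rng.uniform(size=(10_000, 8))
predictions = model.predict(X_candidates)
shortlist = np.argsort(predictions)[:10]
print("Candidate indices for follow-up:", shortlist)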
273 Jain et. al. 2013 The Materials Project: A materials genome approach to accelerating materials innovation. APL
Materials. 1, 011002. (https://doi.org/10.1063/1.4812323).
274 Curtarolo et. al. 2012 AFLOWLIB.ORG: A distributed materials properties repository from high-throughput ab-initio
calculations. Computational Materials Science. 58, 227-235. (https://doi.org/10.1016/j.commatsci.2012.02.002).
275 Physical Sciences Data-Science Service. ICSD. See https://www.psds.ac.uk/icsd (accessed 14 July 2023).
276 Zakutayev et. al. 2018 An open experimental database for exploring inorganic materials. Scientific Data. 5, 180053.
(https://doi.org/10.1038/sdata.2018.53).
277 Draxl C, Scheffler M. 2019 The NOMAD laboratory: from data sharing to artificial intelligence. Journal of Physics:
Materials. 2, 036001. (https://doi.org/10.1088/2515-7639/ab13bb).
278 ChemML. See https://hachmannlab.github.io/chemml/ (accessed 14 July 2023).
279 Atomate. See https://atomate.org/ (accessed 14 July 2023).
280 DeCost et. al. 2020 Scientific AI in materials science: a path to a sustainable and scalable paradigm.
Machine Learning: Science and Technology. 1, 033001. (https://doi.org/10.1088/2632-2153/ab9a20).
281 Raabe D, Mianroodi J, Neugebauer J. 2023 Accelerating the design of compositionally complex materials
via physics-informed artificial intelligence. Nature Computational Science. 3, 198-209.
(https://doi.org/10.1038/s43588-023-00412-7)
TABLE 2
Researchers            Result
Lyngby et. al.282      Predicted 11,630 new, stable 2D materials.
Rao et. al.283         Found 2 new ‘invar alloys’ which have a low thermal expansion and can be useful for several applications.
Vasylenko et. al.284   Identified 4 new materials, including materials that have desirable properties for use in solid state batteries.
Sun et. al.285         An approach for pre-screening new organic photovoltaic materials.
Stanev et. al.286      Identified >30 potential high-temperature superconducting materials.
282 Lyngby P, Sommer Thygesen K. 2022. Data-driven discovery of 2D materials by deep generative models. npj
Computational Materials. 8, 232. (https://doi.org/10.1038/s41524-022-00923-3).
283 Rao et. al. 2022 Machine learning-enabled high-entropy alloy discovery. Science. 378, 78-85.
(https://doi.org/10.1126/science.abo4940).
284 Vasylenko et. al. 2021 Element selection for crystalline inorganic solid discovery guided by unsupervised machine
learning of experimentally explored chemistry. Nature Communications. 12, 5561. (https://doi.org/10.1038/s41467-021-
25343-7).
285 Sun et. al. 2019. Machine learning-assisted molecular design and efficiency prediction for high-performance organic
photovoltaic materials. Science Advances. 5, 11. (https://doi.org/10.1126/sciadv.aay4275).
286 Stanev et. al. 2018. Machine learning modelling of superconducting critical temperature. npj Computational
Materials. 4, 29. (https://doi.org/10.1038/s41524-018-0085-8).
287 Stach et. al. 2021 Autonomous experimentation systems for materials development: A community perspective.
Matter. 4, 2702-2726. (https://doi.org/10.1016/j.matt.2021.06.036).
288 Stein H, Gregoire J. 2019 Progress and prospects for accelerating materials science with automated and
autonomous workflows. Chemical Science. 10, 9640. (https://doi.org/10.1039%2Fc9sc03766g)
289 Maruyama et. al. 2023 Artificial intelligence for materials research at extremes. MRS Bulletin. 47, 1154-1164.
(https://doi.org/10.1557/s43577-022-00466-4).
290 Nikolaev et. al. 2016 Autonomy in materials research: a case study in carbon nanotube growth. npj Computational
Materials. 2, 16031. (https://doi.org/10.1038/npjcompumats.2016.31).
291 De Volder M, Tawfick S, Baughman R, Hart J. 2013 Carbon Nanotubes: Present and Future Commercial Applications.
Science. 339, 535-539. (https://doi.org/10.1126/science.1222453).
Chapter four
Research, innovation
and the private sector
Left
Electronic circuits.
© iStock / onuma Inthapong.
Research, innovation
and the private sector
The large investment in AI by the private sector and its significance in scientific research present various implications. These include the centralisation of critical digital infrastructure292; the attraction of talent away from academia to the private sector293; and challenges to open science294.

The influence of the private sector in the development of AI for science is not unprecedented. Historically, the automation of tasks has been driven by industry actors in the pursuit of reduced labour costs and greater scalability295. Today, the private sector continues to play a prominent role in advancing scientific research, with many companies having AI-driven scientific programmes such as Alphabet’s Google DeepMind and Microsoft’s AI for Science296.

The role of the private sector in science is also expanding as many companies contribute to provisioning essential resources like computational power, data access and novel AI technologies to the wider research community297.

This chapter examines the growing role of the private sector in science, drawing on a commissioned review of the global AI patent landscape, which describes the distribution of ownership, development and impact of AI technologies. It also gathers perspectives from a horizon-scanning workshop on AI security risks and a commissioned historical review.
292 Penn J. 2024. Historical review on the role of disruptive technologies in transforming science and society.
The Royal Society. See https://royalsociety.org/news-resources/projects/science-in-the-age-of-ai/
293 Gofman M, Jin Z. 2022 Artificial Intelligence, Education, and Entrepreneurship. Journal of Finance, Forthcoming.
(https://doi.org/10.1111/jofi.13302)
294 Ibid.
295 Penn J. 2024. Historical review on the role of disruptive technologies in transforming science and society.
The Royal Society. See https://royalsociety.org/news-resources/projects/science-in-the-age-of-ai/
296 Microsoft. Microsoft Research. AI4Science. See https://www.microsoft.com/en-us/research/lab/microsoft-research-
ai4science/ (accessed 21 December 2023)
297 Kak A, Myers West S, Whittaker M. 2023 Opinion: Make no mistake – AI is owned by Big Tech. MIT Technology
Review. See https://www.technologyreview.com/2023/12/05/1084393/make-no-mistake-ai-is-owned-by-big-tech/.
(accessed 21 December 2023)
298 IP Pragmatics, 2024 Artificial intelligence related inventions. The Royal Society. See https://royalsociety.org/news-
resources/projects/science-in-the-age-of-ai/
299 £54 million boost to develop secure and trustworthy AI research. Gov.UK. See https://www.gov.uk/government/
news/54-million-boost-to-develop-secure-and-trustworthy-ai-research (accessed 21 December 2023)
300 Bass D. 2023 Microsoft invests $10 billion in ChatGPT maker OpenAI. Bloomberg. 23 January 2023.
See https://www.bloomberg.com/news/articles/2023-01-23/microsoft-makes-multibillion-dollar-investment-in-openai
(accessed 21 December 2023).
301 Targett E. 2023 Meta to spend up to $33 billion on AI, as Zuckerberg pledges open approach to LLMs. The Stack.
27 April 2023. See https://www.thestack.technology/meta-ai-investment/ (accessed 21 December 2023).
302 Ibid.
303 Intellectual property and your work. Gov.UK. See: https://www.gov.uk/intellectual-property-an-overview
(accessed March 22 2024)
BOX 4
304 IP Pragmatics, 2024 Artificial intelligence related inventions. The Royal Society.
See https://royalsociety.org/news-resources/projects/science-in-the-age-of-ai/
305 Ibid.
FIGURE 2
Number of AI-related patent families (INPADOC family count) by earliest priority year, 2013 – 2023.
(Data for 2021 – 2023 is not complete given the 18-month delay from the priority filing date and the date of publication.)
306 Ibid.
307 Grand View Research. See https://www.grandviewresearch.com/industry-analysis/artificial-intelligence-ai-market#
(Accessed 21 December 2023)
308 IP Pragmatics, 2024 Artificial intelligence related inventions. The Royal Society.
See https://royalsociety.org/news-resources/projects/science-in-the-age-of-ai/
309 Nair M, Sethumadhaven A. 2022 AI in healthcare: India’s trillion-dollar opportunity. World Economic Forum.
See https://www.weforum.org/agenda/2022/10/ai-in-healthcare-india-trillion-dollar (accessed 21 December 2023)
310 Olcott E. 2022, China sets the pace in adoption of AI in healthcare technology. Financial Times. 31 January 2022.
See: https://www.ft.com/content/c1fe6fbf-8a87-4328-9e75-816009a07a59 (accessed 21 December 2023)
2. Global market shares in AI for science
The use of AI in the science and engineering market is being driven primarily by the demand for AI technology to drive innovation and economic growth. As such, there is a correlation between patent filing trends and global market shares311.

North America, with its rich concentration of technology firms and skilled professionals, dominates this market312. In Europe, Germany leads, but the United Kingdom stands out with a significant 14.7% share in the AI for life sciences market and has the region’s highest forecasted CAGR of 47.9%313.
FIGURE 3
Global distribution of the number of AI-related patent families by 1st priority country
311 BCC Research. 2022 Global Markets for Machine Learning in the Life Sciences. October 2022.
See https://www.bccresearch.com/market-research/healthcare/global-markets-for-machine-learning-in-life-sciences.
html (accessed 21 December 2023)
312 Ibid.
313 Ibid.
The UK, ranking 10th globally and 2nd in Europe for patent filings, demonstrates strong growth potential314. The UK Intellectual Property Office (UKIPO) adopts a more patentee-friendly approach to examining computer-implemented and AI inventions compared to the European Patent Office (EPO), as underscored by recent decisions like Emotional Perception AI vs Comptroller General of Patents315. This case led to an adjustment in UKIPO’s examination practices, removing specific guidance on ANNs316. While this decision has been appealed and currently awaits review, recent rulings have reinforced the UK’s position as a preferred region for AI-related IP protection, bolstering its role as a key player in AI innovation.
FIGURE 4
Global market shares by region: North America 43.8%; Europe 26.8%; Asia-Pacific 24.1%; Rest of world 5.2%.
314 IP Pragmatics. 2024 Artificial intelligence related inventions. The Royal Society.
See https://royalsociety.org/news-resources/projects/science-in-the-age-of-ai/
315 Emotional perception ai ltd v comptroller-general of patents, designs and trademarks. 2023. Find case law –
The National Archives. See https://caselaw.nationalarchives.gov.uk/ewhc/ch/2023/29482948
(accessed 4 March 2024).
316 Examination of patent applications involving Artificial Neural Networks (ANN). Gov.UK.
See https://www.gov.uk/government/publications/examination-of-patent-applications-involving-artificial-neural-
networks/examination-of-patent-applications-involving-artificial-neural-networks-ann (accessed 4 March 2024).
However, the global landscape is marred by disparities. The costly and intricate patent application processes, particularly in regions like Africa, pose considerable barriers. For example, patenting in Africa through the African Regional Intellectual Property Organisation costs over £29,000, significantly higher than in the UK, priced at around £1,900. Despite a surge in African tech hubs, high IP registration expenses and lack of a unified system hamper patenting317. Initiatives like the Pan-African Intellectual Property Organisation aim to address these challenges, although they currently face operational delays318.
FIGURE 5
Market shares within Europe: Rest of Europe 38.2%; Germany 16%; UK 14.7%; France 12.2%; Italy 11.5%; Spain 7.3%.
317 Lewis J, Schneegans S, Straza T. 2021 UNESCO Science Report: The race against time for smarter development.
UNESCO Publishing. See: https://unesdoc.unesco.org/ark:/48223/pf0000377250 (accessed 22 March 2024)
318 Ibid.
3. Key players in the AI for Science patent landscape
In terms of technological impact (indicated by the number of times that a patent is cited by a later patent, or forward citations), the US stands out for having valuable patents. Comparatively, despite India’s significant growth in AI patent filings, it has not yet achieved large technological impact. The UK, though representing a smaller portion of the patent landscape, demonstrates research and innovation influence, ranking among the highest globally319.

The analysis of the top 20 assignees in AI-related patents underscores the active involvement of both industry and academic entities within the broader scientific and engineering research sphere. Notably, companies such as Canon, Alphabet, Siemens, IBM, and Samsung have emerged as key contributors, with substantial patent portfolios that wield considerable influence across scientific and engineering domains. Despite the dominance of commercial entities in most regions, academic institutions including the University of Oxford, Imperial College London, and the University of Cambridge feature prominently among the top patent filers in the UK320, suggesting a blend of academic-industry collaboration and independent contributions321.

Challenges related to the role of the private sector in AI-based science
In addition to looking at patenting trends, the Royal Society explored the challenges of private sector involvement in AI-based scientific research. Ahead of the Global AI Safety Summit hosted by the United Kingdom in 2023, the Royal Society and the UK’s Department for Science, Innovation and Technology (DSIT) convened a horizon scanning workshop on the safety risks of AI in scientific research322. Challenges identified include:

1. Private sector dominance and centralisation of AI-based science development
Centralisation of AI development under large technology firms (eg Google, Microsoft, Amazon, Meta and Alibaba) could lead to corporate dominance over infrastructure critical for scientific progress. This includes ownership over massive datasets for training AI models, vast computing infrastructures, and top AI talent323.

Centralisation can limit wider participation in steering the AI research agenda and can restrict the shaping of what research is conducted and published to a small number of decision-makers in influential industrial labs. For instance, the high-profile and controversial dismissal of AI researcher Dr Timnit Gebru from Google highlighted the opaque internal decision-making in private sector research units324.
319 IP Pragmatics. 2024 Artificial intelligence related inventions. The Royal Society.
See https://royalsociety.org/news-resources/projects/science-in-the-age-of-ai/
320 Ibid.
321 Legislation.Gov.UK. Copyright, Designs and Patents Act 1988. See: https://www.legislation.gov.uk/ukpga/1988/48/contents
322 Royal Society and Department for Science, Innovation and Technology workshop on horizon scanning AI safety
risks across scientific disciplines, October 2023. See https://royalsociety.org/current-topics/ai-data/
(accessed 7 May 2024)
323 Kak A, Myers West S, Whittaker M. 2023 Opinion: Make no mistake – AI is owned by Big Tech. MIT Technology
Review. 5 December 2023 See https://www.technologyreview.com/2023/12/05/1084393/make-no-mistake-ai-is-
owned-by-big-tech/. (accessed 21 December 2023)
324 Hao K. 2020 We read the paper that forced Timnit Gebru out of Google. Here’s what it says. MIT Technology
Review. 4 December 2020. See https://www.technologyreview.com/2020/12/04/1013294/google-ai-ethics-research-
paper-forced-out-timnit-gebru/ (accessed 12 July 2023)
325 Hodak, M., Ellison, D., & Dholakia, A. (2020, August). Benchmarking AI inference: where we are in 2020. In
Technology Conference on Performance Evaluation and Benchmarking (pp. 93-102). Cham: Springer International
Publishing.
326 Ibid.
327 Ahmed N, Wahed M, Thompson NC. 2023 The growing influence of industry in AI research. Science, 379(6635),
884-886. (https://doi.org/10.1126/science.ade2420)
328 The Royal Society interviews with scientists and researchers. 2022 - 2023
329 Penn J. 2024. Historical review on the role of disruptive technologies in transforming science and society.
The Royal Society. See https://royalsociety.org/news-resources/projects/science-in-the-age-of-ai/
330 Westgarth, T., Chen, W., Hay, G., & Heath, R. 2022 Understanding UK Artificial Intelligence R&D commercialisation
and the role of standards. See https://oxfordinsights.com/wp-content/uploads/2023/10/DCMS_and_OAI_-_
Understanding_UK_Artificial_Intelligence_R_D_commercialisation__accessible-1.pdf (accessed 21 December 2023)
BOX 5
331 IP Pragmatics. 2024 Artificial intelligence related inventions. The Royal Society. See https://royalsociety.org/news-
resources/projects/science-in-the-age-of-ai/
332 Google DeepMind. Technology: AlphaFold. See https://deepmind.google/technologies/alphafold/
(accessed 21 December 2023)
333 Borkakoti N, Thornton J.M., 2023. AlphaFold2 protein structure prediction: Implications for drug discovery.
Current opinion in structural biology, 78, p.102526 (https://doi.org/10.1016/j.sbi.2022.102526)
334 Jumper, J., et al. 2021 Highly accurate protein structure prediction with AlphaFold. Nature, 596(7873), pp.583-589.
(doi: 10.1038/s41586-021-03819-2)
335 IP Pragmatics. 2024 Artificial intelligence related inventions. The Royal Society. See https://royalsociety.org/news-
resources/projects/science-in-the-age-of-ai/
336 Ibid.
3. The private sector and open science
The commercial incentives driving private ownership of data for AI-based research could restrict open science practices. This limits non-industry scientists’ ability to equitably contribute to and scrutinise data for AI systems alongside industry counterparts.

Privately held data is often commercially sensitive and could necessitate non-disclosure agreements, potentially affecting research integrity. Data considered low risk initially may later gain commercial value and get withdrawn, as seen with some social media companies tightening data access following the surge of LLMs training on public data337,338.

Alternative monetisation approaches like encouraging the licensing of data lakes and utilising database provisions can provide a more open and pragmatic approach to data sharing339.

Further approaches include changes to legislation such as the requirements for social media companies to share data in the European Digital Services Act340 and the principles for intervention to unlock the value of data across the economy in the UK’s National Data Strategy341. Additionally, technical approaches include privacy enhancing technologies342 and cyber-security legislation to provide legal measures and ensure safer hardware and software343.

Open-source code and platforms do also offer some advantages to private sector organisations, including speed and cost-effectiveness, but also have significant limitations including lack of support, security risks, and compatibility. For example, industrial partnerships for mutual benefits, such as the partnership between Siemens and Microsoft, can drive cross-industry AI adoption by sharing software, hardware and talent344. During the COVID-19 pandemic, some private organisations relinquished patent rights for the common good, with leading technology companies donating their patents to open-source initiatives345.
337 Isaac M. 2023 Reddit wants to get paid for helping to teach big AI systems. The New York Times. 18 April 2023.
See https://www.nytimes.com/2023/04/18/technology/reddit-ai-openai-google.html (accessed 21 December 2023).
338 Murphy H. 2023 Elon Musk rolls out paywall for Twitter’s data. The Financial Times. 29 April 2023.
See https://www.ft.com/content/574a9f82-580c-4690-be35-37130fba2711 (accessed 21 December 2023).
339 Grossman R L. 2019 Data lakes, clouds, and commons: A review of platforms for analyzing and sharing genomic
data. Trends in Genetics, 35(3), pp.223-234. (https://doi.org/10.1016/j.tig.2018.12.006)
340 European Commission. The Digital Services Act. See https://commission.europa.eu/strategy-and-policy/
priorities-2019-2024/europe-fit-digital-age/digital-services-act_en (accessed 5 February 2024)
341 Department for Science, Innovation and Technology. National Data Strategy. 5 December 2022.
See https://www.gov.uk/guidance/national-data-strategy (accessed 5 February 2024)
342 The Royal Society. 2023 From privacy to partnership. See https://royalsociety.org/topics-policy/projects/privacy-
enhancing-technologies/ (accessed 21 December 2023).
343 European Commission. Directive on measures for a high common level of cybersecurity across the Union (NIS2
Directive). See https://digital-strategy.ec.europa.eu/en/policies/nis2-directive (accessed 22 February 2024)
344 Siemens. 2024 Siemens and Microsoft partner to drive cross-industry AI adoption. See https://press.siemens.com/
global/en/pressrelease/siemens-and-microsoft-partner-drive-cross-industry-ai-adoption (accessed 26 February 2024)
345 UNESCO Recommendation on Open Science. 2021. See: https://www.unesco.org/en/legal-affairs/recommendation-
open-science (accessed 6 February 2024)
4. The private sector’s role in AI safety
Private sector dominance in AI for science also poses challenges to AI safety. Organisations and institutions leading AI development often determine their own ability to assess harm, establish safeguards, and safely release their models. As described by OpenAI in the paper behind the release of GPT-4, commercial incentives and safety considerations can come into tension with scientific values such as transparency and open science practices346.

Hugging Face, an open-source organisation, suggests evaluating the trade-offs for safe and responsible release as illustrated in the Gradient of System Access347 (see Figure 6). Similar frameworks can be considered and developed by scientific communities to assess the conditions under which releasing training data is safe, allowing them to contribute to scientific progress while reducing potential for harm and misuse.

Universities can also play a crucial role in advancing AI safety, by promoting ethical research standards or incentivising academic research on AI harms. However, they do not have the same capabilities as large technology companies to institute robust safeguards and best practices across all aspects of complex AI development. Recently, national governments have been placing greater significance on AI safety discussions. Since the Global AI Safety Summit in November 2023, the UK has launched the AI Safety Institute348 while the US announced the US AI Safety Institute under the National Institute of Standards and Technology (NIST)349.
346 OpenAI et al. 2023 GPT-4 Technical Report. arXiv preprint. (https://doi.org/10.48550/arXiv.2303.08774)
347 Solaiman, I. 2023 The gradient of generative AI release: Methods and considerations. In Proceedings of the
2023 ACM Conference on Fairness, Accountability, and Transparency (pp. 111-122). (https://doi.org/10.48550/
arXiv.2302.04844)
348 Gov.UK. 2024 Introducing the AI Safety Institute. See https://www.gov.uk/government/publications/ai-safety-institute-
overview/introducing-the-ai-safety-institute (accessed 26 February 2024)
349 NIST. 2024 U.S. Artificial Intelligence Safety Institute. See https://www.nist.gov/artificial-intelligence/artificial-
intelligence-safety-institute (accessed 26 February 2024)
FIGURE 6
The gradient of generative AI system release, from fully closed systems, through gradual or staged release, hosted access, cloud-based/API access, and downloadable models, to fully open release. Moving from gated to public levels of access broadens the perspectives that can inform release considerations, from limited internal perspectives to broader community perspectives; Meta’s Make-A-Video is shown in the figure as an example350.
350 Solaiman, I. 2023 The gradient of generative AI release: Methods and considerations. In Proceedings of the 2023 ACM Conference on Fairness,
Accountability, and Transparency (pp. 111-122). (https://doi.org/10.48550/arXiv.2302.04844)
Opportunities for cross-sector collaboration
Cross-sector collaboration offers significant opportunities, leveraging the innovative and educational strengths of academia with the resources and practical focus of industry351. Despite concerns about the patent system centralising AI development, it can also foster collaboration. Published patent applications enhance technological transparency and provide a revenue stream that can support joint ventures between universities and industry.

However, the increasing presence of the private sector in AI-based science funding raises concerns that industry’s influence might shift the focus from fundamental research to applied science352. This shift could exacerbate the ‘brain drain’353, where a significant flow of AI talent leaves academia for the private sector354, driven by higher salaries, advanced resources and the opportunity to work on practical applications355.

To counter this trend, initiatives like the UK’s Life Sciences Innovative Manufacturing Fund356 (which includes £17 million in government funding and a private investment of £260 million) demonstrate how government and private investments can synergistically support projects that drive innovation and economic growth357. This collaborative model not only fuels technological advancements but also offers a platform for academia to engage in cutting-edge research while benefitting from industry resources.

Other partnerships could extend beyond financial aspects, encompassing joint research projects358, shared publications, and intellectual exchanges at conferences or through informal networks359. They also offer practical engagement opportunities like internships and sabbaticals, allowing academics to gain industry experience without departing from their academic roles360.

“The freedom, innovation and creativity of academia with the resource and structure and management of the private sector… it’s been completely liberating.”
Royal Society interview participant, referring to joint academic-industry roles
351 Wright B et al. 2014 Technology transfer: Industry-funded academic inventions boost innovation. Nature 507,
297–299. https://doi.org/10.1038/507297a
352 Ibid.
353 Kunze L. 2019. Can we stop the academic AI brain drain? KI-Künstliche Intelligenz, 33(1), 1-3. (https://doi.org/10.1007/
s13218-019-00577-2)
354 Gofman M, Jin Z. 2022 Artificial Intelligence, Education, and Entrepreneurship. Journal of Finance, Forthcoming.
(https://doi.org/10.1111/jofi.13302)
355 UK universities alarmed by poaching of top computer science brains. Financial Times. 9 May 2018.
See https://www.ft.com/content/895caede-4fad-11e8-a7a9-37318e776bab (accessed 10 June 2023)
356 Life sciences companies supercharged with £277 million in government and private investment. Gov.UK
See https://www.gov.uk/government/news/life-sciences-companies-supercharged-with-277-million-in-government-
and-private-investment (accessed 26 February 2024)
357 Initial £100 million for expert taskforce to help UK build and adopt next generation of safe AI. Gov.UK. See https://www.gov.uk/government/news/initial-100-million-for-expert-taskforce-to-help-uk-build-and-adopt-next-generation-of-safe-ai (accessed 26 February 2023)
358 Evans JA (2010) Industry induces academic science to know less about more. Am J Sociol 116(2):389–452
359 Perkmann M, Walsh K (2007) University–industry relationships and open innovation: towards a research agenda.
Int J Manag Rev 9(4):259–280 (https://doi.org/10.1111/j.1468-2370.2007.00225.x)
360 Cohen WM, Nelson RR, Walsh JP. 2002 Links and impacts: the influence of public research on industrial R&D. Manag Sci 48(1):1–23. (https://doi.org/10.1287/mnsc.48.1.1.14273)
Chapter five
Research ethics and AI safety

Left
Carbon dioxide emissions. © iStock / janiecbros.
For example, Meta's LLM for science, Galactica, was trained on 48 million scientific articles, websites, textbooks, and other inputs to help researchers summarise the literature, generate academic papers, write scientific code and annotate data (eg, molecules and proteins). However, the demo was paused after three days of use. One of the largest risks posed by Galactica was how confidently it produced false information, and the lack of guidelines to identify it384.

As with other forms of misinformation, hallucinations can erode public trust in science385. Methods for AI validation and disclosure, such as watermarking or content provenance technologies386, are being explored to enable the detection of AI-generated content and mitigate potential harms caused by hallucinations387, as well as to ensure public trust in emerging AI systems388 (a minimal sketch of one such detection method follows at the end of this section).

3. Dual use of AI technologies developed for science

The dual use of AI systems refers to situations in which a system developed for a specific use is then appropriated or modified for a different use. Malicious use refers to applications in which the intent is to cause harm389. Among the most prominent and documented examples of malicious use of AI is the development of chemical and biological weapons using AI systems that have beneficial applications for scientific research.

In 2020, Collaborations Pharmaceuticals, a biopharma company that builds ML models to assist drug discovery and the treatment of rare diseases, published results on what it has called a 'teachable moment' regarding the use of AI-powered drug discovery methods. Following an invitation from the Swiss Federal Institute for NBC (nuclear, biological, and chemical) protection, the company trained an AI-powered molecule generator used for drug discovery to generate toxic molecules within a specified threshold of toxicity390. Drawing from a public database, and in less than six hours, the model had generated 40,000 molecules. Many of these molecules were similar to, or more toxic than, the nerve agent VX, a banned and highly toxic lethal chemical weapon.

While the theoretical generation of toxic molecules does not imply that their production is viable or feasible, the experiment shows how AI can speed up the process of creating hazardous substances, including lethal bioweapons391. The company has
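As flagged above, watermark detection can be illustrated with a toy version of one published scheme, 'green-list' token watermarking (Kirchenbauer et al., 2023): generation is biased towards a pseudo-randomly selected subset of the vocabulary at each step, and detection tests whether that subset is statistically over-represented. Everything below (the hash-based green-list assignment, the 25% green fraction and the function names) is an illustrative assumption for intuition, not a system cited in this report.

```python
import hashlib
import math

GREEN_FRACTION = 0.25  # illustrative fraction of the vocabulary marked 'green' at each step

def is_green(prev_token: str, token: str) -> bool:
    """Pseudo-randomly assign `token` to the green list, seeded by the previous token."""
    digest = hashlib.sha256(f"{prev_token}|{token}".encode()).digest()
    return digest[0] < GREEN_FRACTION * 256  # first byte gives a uniform draw in [0, 256)

def watermark_z_score(tokens: list[str]) -> float:
    """z-score for the count of green tokens: large positive values suggest the
    text was generated with the watermark; values near zero suggest it was not."""
    n = len(tokens) - 1  # number of (previous token, token) pairs tested
    hits = sum(is_green(prev, tok) for prev, tok in zip(tokens, tokens[1:]))
    expected = GREEN_FRACTION * n
    return (hits - expected) / math.sqrt(n * GREEN_FRACTION * (1 - GREEN_FRACTION))

# Usage: tokenise a passage (here, naive whitespace splitting) and test it.
print(round(watermark_z_score("the model was trained on scientific text".split()), 2))
```

Because detection relies only on token statistics, a third party holding the watermark key can verify provenance without access to the model itself, which is one reason such schemes are attractive for external scrutiny.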
384 MIT Technology Review. 2022 Why Meta's latest large language model survived only three days online. See: https://www.technologyreview.com/2022/11/18/1063487/meta-large-language-model-ai-only-survived-three-days-gpt-3-science/ (accessed 30 September 2023)
385 Bontridder, N. and Poullet, Y., 2021. The role of artificial intelligence in disinformation. Data & Policy, 3, p.e32.
(doi:10.1017/dap.2021.20)
386 The Royal Society. Generative AI, content provenance and a public service internet. See: https://royalsociety.org/
news-resources/publications/2023/digital-content-provenance-bbc/
387 Watermarking refers to techniques that can embed identification information into the original data, model, or content
to indicate provenance or ownership.
388 Partnership on AI. PAI’s Responsible Practices for Synthetic Media. See: https://syntheticmedia.partnershiponai.
org/#read_the_framework (accessed 21 December 2023)
389 Ueno, H. 2023. Artificial Intelligence as Dual-Use Technology. In Fusion of Machine Learning Paradigms: Theory
and Applications (pp. 7-32). Cham: Springer International Publishing. (https://doi.org/10.1007/978-3-031-22371-6_2)
390 Urbina F, Lentzos F, Invernizzi C, Ekins S. 2022. Dual use of artificial-intelligence-powered drug discovery. Nature
Machine Intelligence, 4(3), 189-191. (https://doi.org/10.1038/s42256-022-00465-9)
391 Sohn R. 2022 AI Drug Discovery Systems Might Be Repurposed to Make Chemical Weapons, Researchers Warn. Scientific American. 21 April 2022. See https://www.scientificamerican.com/article/ai-drug-discovery-systems-might-be-repurposed-to-make-chemical-weapons-researchers-warn/ (accessed 21 December 2023)
392 Urbina F, Lentzos F, Invernizzi C, Ekins S. 2022. Dual use of artificial-intelligence-powered drug discovery.
Nature Machine Intelligence, 4(3), 189-191. (https://doi.org/10.1038/s42256-022-00465-9)
CASE STUDY 3
As AI and ML capabilities become further integrated into climate science research and applications, they are expanding the capacity of scientists and policy makers to mitigate the climate crisis423.

If done successfully, this fusion of datasets can improve the accuracy of models and estimates, contributing to long-term weather forecasting, which supports disaster preparedness and resource management for extreme events427.
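The 'fusion of datasets' referred to here is data assimilation, the blending of model forecasts with observations described in the work cited at footnote 427. As a minimal illustration, the one-dimensional sketch below applies the scalar form of the standard update rule, weighting forecast and observation by their error variances; the variable names and numbers are illustrative assumptions, not values from any operational system.

```python
def assimilate(forecast: float, observation: float,
               forecast_var: float, obs_var: float) -> float:
    """Toy one-dimensional data assimilation (a scalar Kalman-style update).

    The analysis blends the model forecast with an observation, weighting
    each by the inverse of its error variance.
    """
    gain = forecast_var / (forecast_var + obs_var)  # weight given to the observation
    return forecast + gain * (observation - forecast)

# Illustrative numbers only: the model forecasts 21.0 C where a satellite
# retrieval (assumed more certain here) reports 19.0 C.
analysis = assimilate(forecast=21.0, observation=19.0, forecast_var=1.0, obs_var=0.5)
print(f"analysis temperature: {analysis:.2f} C")  # pulled towards the observation
```

Operational systems apply this idea across millions of grid points and observation types, and ML methods are increasingly used to learn the error statistics that set the weighting.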
423 Huntingford C, Jeffers E S, Bonsall M B, Christensen H M, Lees T, Yang H. 2019 Machine learning and artificial
intelligence to aid climate change research and preparedness. Environmental Research Letters, 14(12), 124007.
(DOI 10.1088/1748-9326/ab4e55)
424 National Academy of Sciences. 2024 Toward a New Era of Data Sharing: Summary of the US-UK Scientific Forum
on Researcher Access to Data. Washington, DC: The National Academies Press. https://doi.org/10.17226/27520.
425 Kadow, C, Hall, DM, Ulbrich, U, 2020. Artificial intelligence reconstructs missing climate information. Nature
Geoscience, 13, pp.408-413. (https://doi.org/10.1038/s41561-020-0582-5)
426 NASA Centre for Climate Simulation. See https://www.nccs.nasa.gov/news-events/nccs-highlights/
acceleratingScience. (Accessed 21 December 2023)
427 Buizza, C et al. 2022. Data learning: Integrating data assimilation and machine learning. Journal of Computational Science, 58, p.101525. (https://doi.org/10.1016/j.jocs.2021.101525)
428 Ise, T, Oba, Y. 2019. Forecasting climatic trends using neural networks: an experimental study using global historical
data. Frontiers in Robotics and AI, 32. (https://doi.org/10.3389/frobt.2019.00032)
429 Ham, YG, Kim, JH, Luo, JJ. 2019. Deep learning for multi-year ENSO forecasts. Nature, 573, 568-572.
(https://doi.org/10.1038/s41586-019-1559-7)
430 Rasp, S, Pritchard, MS, Gentine, P. 2018. Deep learning to represent sub-grid processes in climate models. Proceedings of the National Academy of Sciences, 115, 9684-9689. (https://doi.org/10.1073/pnas.1810286115)
431 Zheng, G, Li, X, Zhang, RH, Liu, B 2020. Purely satellite data–driven deep learning forecast of complicated tropical
instability waves. Science advances, 6, eaba1482. (DOI: 10.1126/sciadv.aba1482)
432 Bi, K, Xie, L, Zhang, H, Chen, X, Gu, X, Tian, Q. 2023. Accurate medium-range global weather forecasting with 3D
neural networks. Nature, 619, 533-538. (https://doi.org/10.1038/s41586-023-06185-3)
433 Wong, C. 2023. DeepMind AI accurately forecasts weather – on a desktop computer. Nature. 14 November 2023. (https://doi.org/10.1038/d41586-023-03552-y)
434 The Royal Society. 2020 Digital technology and the planet: Harnessing computing to achieve net zero.
See https://royalsociety.org/topics-policy/projects/digital-technology-and-the-planet/ (accessed 21 December 2023).
435 The Royal Academy of Engineering 2020 Net Zero: A systems perspective on the climate challenge.
See raeng.org.uk/publications/reports/net-zero-a-systems-perspective-on-the-climate-chal
(accessed 14 October 2020)
436 The Royal Society. 2021 Computing for net zero: how digital technology can create a ’control loop for the protection
of the planet’. See https://royalsociety.org/-/media/policy/projects/climate-change-science-solutions/climate-science-
solutions-computing.pdf (accessed 21 December 2023)
437 Abrell J, Kosch M, Rausch S (2019) How effective was the UK carbon tax?—A machine learning approach to policy
evaluation. SSRN Scholarly Paper ID 3372388. Social Science Research Network, Rochester. 10.2139/ssrn.3372388
438 The Royal Society. 2020 Digital technology and the planet: Harnessing computing to achieve net zero.
See https://royalsociety.org/topics-policy/projects/digital-technology-and-the-planet/ (accessed 21 December 2023)
439 The Royal Society. 2020 Digital technology and the planet: Harnessing computing to achieve net zero.
See https://royalsociety.org/topics-policy/projects/digital-technology-and-the-planet/ (accessed 21 December 2023).
440 The European Space Agency. Destination Earth. See https://www.esa.int/Applications/Observing_the_Earth/
Destination_Earth. (accessed 21 December 2023)
441 Accenture. Case Study: Tuvalu. See https://www.accenture.com/us-en/case-studies/technology/tuvalu.
(accessed 21 December 2023)
442 Henderson P, Hu J, Romoff J, Brunskill E, Jurafsky D, Pineau J 2020. Towards the systematic reporting of the energy and carbon footprints of machine learning. J Mach Learn Res 21:1–43
443 Royal Society and Department for Science, Innovation and Technology workshop on horizon scanning AI safety risks
across scientific disciplines, October 2023. See https://royalsociety.org/current-topics/ai-data/ (accessed 7 May 2024)
444 AbdulRafiu A, Sovacool B K, Daniels C. 2022 The dynamics of global public research funding on climate change,
energy, transport, and industrial decarbonisation. Renewable and Sustainable Energy Reviews, 162, 112420.
(https://doi.org/10.1016/j.rser.2022.112420)
445 Grantham Research Institute on Climate Change and the Environment. What opportunities and risks does AI present
for climate action? See: https://www.lse.ac.uk/granthaminstitute/explainers/what-opportunities-and-risks-does-ai-
present-for-climate-action/
446 Zipper S C et al. 2019 Balancing open science and data privacy in the water sciences. Water Resources Research, 55, 5202-5211. (https://doi.org/10.1029/2019WR025080)
447 Donovan K P. 2012 Seeing like a slum: Towards open, deliberative development. Georgetown Journal of
International Affairs.
Sharing environmentally sensitive data can also adversely impact the environment448. For example, sharing biodiversity data, such as the nesting locations of rare birds, can enable bad actors to harm those environments449.

Strategies for ethical AI-based research practices in climate science

1. Pursuing energy proportionality
Develop strategies to ensure that technologies developed in pursuit of net zero deliver environmental benefits that outweigh their emissions450. Interdisciplinary research on carbon accounting and impact assessment tools like the Green Algorithms Project451 can contribute towards evaluating and mitigating the environmental impact of computational processes used in climate science (a minimal sketch of this kind of estimate follows this list).

2. Improving global researcher access to data
The disparity in researcher access to data raises concerns about the equitable development and application of AI452. This could hinder the development of effective climate solutions tailored to the unique challenges of specific communities. Networks such as the Pacific Community's Statistics for Development Division can promote equitable access to data across diverse contexts, fostering collaboration and knowledge sharing453. Similarly, the establishment of trusted data institutions can contribute towards enhancing data sharing and usage to address emergencies and crises454, 455.

3. Contextualising data governance
Universal approaches to open data do not always engage with minority groups' rights and interests. Existing data sharing principles like FAIR (findable, accessible, interoperable, reusable)456 can be complemented by people- and purpose-oriented governance principles like the CARE Principles for Indigenous Data Governance (collective benefit, authority to control, responsibility, ethics), which take a broader approach to sensitive data457.
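As a worked illustration of the carbon accounting mentioned in strategy 1, the sketch below applies the first-order estimate popularised by calculators such as the Green Algorithms Project: energy equals runtime multiplied by hardware power draw and a data-centre overhead factor (PUE), and emissions equal energy multiplied by grid carbon intensity. The function name and all default values are illustrative assumptions rather than figures from the Green Algorithms calculator.

```python
def carbon_footprint_kg(runtime_hours: float,
                        power_draw_watts: float,
                        pue: float = 1.5,
                        grid_intensity_g_per_kwh: float = 250.0) -> float:
    """Estimate CO2-equivalent emissions (kg) for a computational job.

    energy (kWh) = runtime x hardware power draw x PUE (data-centre overhead)
    carbon (kg)  = energy x carbon intensity of the local electricity grid
    """
    energy_kwh = runtime_hours * (power_draw_watts / 1000.0) * pue
    return energy_kwh * (grid_intensity_g_per_kwh / 1000.0)

# Example: a 72-hour model training run on a node drawing roughly 1.2 kW.
print(f"{carbon_footprint_kg(72, 1200):.1f} kg CO2e")  # 32.4 kg CO2e
```

Even this simple estimate makes the levers of energy proportionality visible: the same job emits far less when run on efficient hardware, in a low-PUE facility, or on a low-carbon grid.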
Conclusion
Left
Graphcore Stereo Image Matching Benchmark, October 2015.
As explored throughout the report, the applications of AI in scientific research are bringing a new age of possibilities and challenges. The transformative potential of AI, fuelled by big data and advanced techniques, offers substantial opportunities across domains. From mapping deforestation to aiding drug discovery and predicting rare diseases, the applications are vast and promising. Through the case studies on climate science, material science, and rare disease diagnosis, this report envisions a future in which AI can be a powerful tool for scientific researchers.

However, these opportunities bring with them a series of challenges related to reproducibility, interdisciplinary collaboration, and ethics. Finding a balance in which scientists can harness the benefits of automation and the accelerated pace of discovery while ensuring research integrity and the responsible use of AI will be essential. Following the Royal Society's commitment to ensuring science – and in this case AI – is applied for the benefit of humanity, the report calls for collective efforts in addressing these challenges.

Moving forward, and according to the findings of this report, three areas of action require attention from scientific communities and relevant policy makers.

The first is to address issues of access and capability to use AI in science. Access to computing resources, high-quality datasets, AI tools and relevant expertise is critical to achieving scientific breakthroughs. At the time of publication, access to essential infrastructures remained unequally distributed. This, coupled with the growing influence of the private sector highlighted in Chapter 4, can have implications for the future of university-based AI research. Another challenge in this area is knowledge silos between AI experts and scientific domain experts (Chapter 3). To ensure the equitable distribution of AI across research communities, actions need to go beyond facilitating access and focus on enhancing capabilities to collaborate, co-design and use AI across different scientific fields and research environments.

Second, open science principles and practices offer a clear pathway to improving transparency, reproducibility, and public scrutiny – all of which have proven challenging in AI-based scientific projects. As stressed in Chapter 2, the stakes of not addressing these issues are high, posing risks not just to science but also to society if the deployment of unreliable or erroneous AI-based outputs leads to harm. Further work is needed to understand the interactions between open science and AI for science, and how best to minimise the safety and security risks stemming from the open release of models and data.
Appendices
Left
Microsoft Research ResNet-18 Training, April 2017.
APPENDIX 1
Box 3: Insights from the Royal Society & Humane Intelligence red-teaming exercise on AI-generated disinformation content
APPENDIX 2
Name Organisation
Dorothy Bishop FRS University of Oxford
Odd Erik Gundersen Norwegian University of Science and Technology; Aneo
Sayash Kapoor Princeton University
Mark Kelson University of Exeter
Rebecca Kirk PLOS
Sabina Leonelli University of Exeter
Ralitsa Madsen University of Dundee; UK Committee on Research Integrity
Victoria Moody JISC
Joelle Pineau McGill University; Meta AI
Susanna-Assunta Sansone University of Oxford
Malvika Sharan Alan Turing Institute; Open Life Science
Joaquin Vanschoren Eindhoven University of Technology; OpenML
Name Organisation
Anna-Louise Ellis Met Office
Jane Francis FRS British Antarctic Survey
Anna Hogg University of Leeds
Scott Hosking British Antarctic Survey; The Alan Turing Institute
Konstantin Klemmer Microsoft Research
Joycelyn Longdon University of Cambridge; Climate in Colour
Shakir Mohamed Google DeepMind
Alistair Nolan OECD
Tim Palmer University of Oxford
Suman Ravuri Google DeepMind
Emily Shuckburgh University of Cambridge
Philip Stier University of Oxford
Dave Topping University of Manchester
Richard Turner University of Cambridge
Name Organisation
Ankit Agrawal Northwestern University
Seth Baum Global Catastrophic Risk Institute
Michael Castelle University of Warwick
Claude Chelala Queen Mary University of London
Gareth Conduit University of Cambridge
James Dracott UKRI
Victoria Henickx KU Leuven
Georgios Leontidis The University of Aberdeen
Alison Noble FRS University of Oxford
Alistair Nolan OECD
Bradley Love University College London
Cecilia Mascolo University of Cambridge
Raffaella Mulas Vrije Universiteit Amsterdam
Mirco Musolesi University College London
Daniele Quercia King's College London; Nokia Bell Labs Cambridge
Verena Rieser Google DeepMind
Reuben Shipway University of Plymouth
Tommaso Venturini University of Geneva; CNRS
Hujun Yin University of Manchester
Name Organisation
Seth Baum Global Catastrophic Risk Institute
Andrew Blake FRS Scientific advisor and AI consultant, University of Cambridge
Phil Blunsom University of Oxford
Anthony Cohn University of Leeds; The Alan Turing Institute
Jeff Dalton University of Glasgow
Yarin Gal University of Oxford
Gabe Gomes Carnegie Mellon University
Andres Guadamuz University of Sussex
Atoosa Kasirzadeh University of Edinburgh
Samuel Kaski University of Manchester; Aalto University
Hannah Kirk University of Oxford; The Alan Turing Institute
Gary Marcus New York University
Jessica Montgomery University of Cambridge
Denis Newman-Griffis University of Sheffield
Alison Noble FRS University of Oxford
Alistair Nolan OECD
Johan Ordish Medicines and Healthcare products Regulatory Agency (MHRA)
Michael Osborne University of Oxford; Mind Foundry
Matthias Rillig Freie Universität Berlin
Edward Tian GPTZero
Michael Wooldridge University of Oxford
Workshop on horizon scanning AI safety risks across scientific disciplines, October 2023
Ahead of the Global AI Safety Summit organised by the UK Government, the Royal Society hosted an official pre-Summit workshop in partnership with the Department for Science, Innovation and Technology. The event brought together senior scientists from academia and industry to horizon-scan the risks associated with AI across scientific disciplines.
Name Organisation
Alessandro Abate University of Oxford
Andrew Blake FRS Scientific advisor and AI consultant, University of Cambridge
Craig Butts University of Bristol
Lee Cronin University of Glasgow
Gwenetta Curry University of Edinburgh
Christl Donnelly FRS Imperial College London
Anthony Finkelstein City, University of London
Jacques Fleuriot University of Edinburgh
Ben Glocker Imperial College London
Julia Gog University of Cambridge
Cathy Holloway University College London
Caroline Jay University of Manchester
Alexander Kasprzyk University of Nottingham
Frank Kelly FRS Imperial College London
Georgia Keyworth Department for Science, Innovation and Technology
Bradley Love University College London
Carsten Maple University of Warwick
Alexandru Marcoci Centre for the Study of Existential Risk, University of Cambridge
Chris Martin Department for Science, Innovation and Technology
Cecilia Mascolo University of Cambridge
Emran Mian Department for Science, Innovation and Technology
Daniel Mortlock Imperial College London
Gina Neff University of Oxford
Cassidy Nelson Centre for Long Term Resilience
Alison Noble FRS University of Oxford
Alistair Nolan OECD
Abigail Sellen FRS Microsoft Research Cambridge
Karen Tingay Office for Statistics Regulation
Daniel Tor Department for Science, Innovation and Technology
Hujun Yin University of Manchester
Name Organisation
Steven Abel Durham University
Paul Beasley Siemens
Viscount Camrose House of Lords, DSIT
Sarah Chan University of Edinburgh
Linjiang Chen University of Birmingham
Peter Falkingham Liverpool John Moores University
Tom Fiddian Innovate UK
Michael Fisher University of Manchester
Seraphina Goldfarb-Tarrant Cohere
Sabine Hauert University of Bristol
Richard Horne British Antarctic Survey
Scott Hosking British Antarctic Survey; The Alan Turing Institute
Rohan Kemp Department for Science, Innovation and Technology
Ottoline Leyser UK Research and Innovation
Richard Mallah Future of Life Institute
Thomas Nowotny University of Sussex
Yannis Pandis Pfizer
Maria Perez-Ortiz University College London
Nathalie Pettorelli Zoological Society of London
Reza Razavi King’s College London
Yvonne Rogers FRS University College London
Sophie Rose Centre for Long Term Resilience
Stuart Russell UC Berkeley, Future of Life Institute
Rossi Setchi Cardiff University
Nigel Shadbolt FRS University of Oxford
Shaarad Sharma Government Office for Science
Mihaela van der Schaar University of Cambridge
Mark Wilkinson University of Sheffield
Study on red teaming LLMs for resilience to scientific disinformation, October 2023
Ahead of the Global AI Safety Summit organised by the UK Government, the Royal Society and Humane Intelligence brought together 40 postgraduate students in health and climate sciences to scrutinise how potential vulnerabilities in LLMs (Meta's Llama 2) could enable the generation and spread of scientific misinformation (see the Royal Society website for more information).
APPENDIX 3
Acknowledgements
Working Group members
The members of the Working Group involved in this report are listed below. Members acted in
an individual and not a representative capacity and declared any potential conflicts of interest.
Members contributed to the project based on their own expertise and good judgement.
Chair
Professor Alison Noble CBE FREng FRS, Foreign Secretary of the Royal Society,
and Technikos Professor of Biomedical Engineering, University of Oxford.
Members
Professor Paul Beasley, Head of Research and Development, Siemens.
Dr Peter Dayan FRS, Director, Max Planck Institute for Biological Cybernetics.
Professor Sabina Leonelli, Professor of Philosophy and History of Science, University of Exeter.
Alistair Nolan, Senior Policy Analyst, Organisation for Economic Co-operation and Development.
Dr Philip Quinlan, Director of Health Informatics, University of Nottingham.
Professor Abigail Sellen FRS, Distinguished Scientist and Lab Director, Microsoft Research.
Professor Rossi Setchi, Professor in High Value Manufacturing, Cardiff University.
Kelly Vere, Director of Technical Strategy, University of Nottingham.
Reviewers
This report has been reviewed by expert readers and by an independent Panel of experts, before
being approved by Officers of the Royal Society. The Review Panel members were not asked to
endorse the conclusions or recommendations of the report, but to act as independent referees of
its technical content and presentation. Panel members acted in a personal and not a representative
capacity. The Royal Society gratefully acknowledges the contribution of the reviewers.
Reviewers
Dr Yoshua Bengio FRS, Professor at University of Montreal and Scientific Director of MILA.
Ruhi Chitre, Ezra Clark, Tiffany Straza and Ana Persic (Natural Sciences Sector); and Irakli Khodeli (Social and Human Sciences Sector), UNESCO.
Dr Rumman Chowdhury, CEO and Founder of Humane Intelligence. 2024 US Science Envoy.
Professor Tony Hey, Honorary Senior Data Scientist at Rutherford Appleton Laboratory. Co-author of Artificial Intelligence For Science: A Deep Learning Revolution.