Science in the age of AI: How artificial intelligence is changing the nature and method of scientific research
Issued: May 2024 DES8836_1
ISBN: 978-1-78252-712-1
© The Royal Society
Contents
Foreword
Executive summary
Key findings
Future research questions
Recommendations
Introduction
Conclusion
Appendices
Foreword
With the growing availability of large datasets, new algorithmic techniques and increased computing power, artificial intelligence (AI) is becoming an established tool used by researchers across scientific fields.

Now more than ever, we need to understand the extent of the transformative impact of AI on science and what scientific communities need to do to fully harness its benefits.

This report, Science in the age of AI, explores this topic. Building on the experiences of more than 100 scientists who have incorporated AI into their workflows, it delves into how AI technologies, such as deep learning or large language models, are transforming the nature and methods of scientific inquiry. It also explores how notions of research integrity, research skills and research ethics are inevitably changing – and what the implications are for the future of science and scientists.

New opportunities are emerging. The case studies in this report demonstrate that AI is enhancing the efficiency, accuracy, and creativity of scientists. Across multiple fields, the application of AI is breaking new ground by facilitating, for example, the discovery of rare diseases or enabling the development of more sustainable materials.

Scientists are using AI applications – playing the role of tutor, peer or assistant – to perform tasks at a pace and scale previously unattainable. There is much excitement around the synergy between human intelligence and AI and how this partnership is leading to scientific advancements. However, to ensure robustness and mitigate harms, human judgement and expertise will continue to be of utmost importance.

The rapid uptake of AI in science has also presented challenges related to its safe and rigorous use. A growing body of irreproducible studies is raising concerns regarding the robustness of AI-based discoveries. The black-box and non-transparent nature of AI systems creates challenges for verification and external scrutiny. Furthermore, its widespread but inequitable adoption raises ethical questions regarding its environmental and societal impact. Yet, ongoing advancements in making AI systems more transparent and ethically aligned hold the promise of overcoming these challenges.

In this regard, the report calls for a balanced approach that celebrates the potential of AI in science while not losing sight of the challenges that still need to be overcome. The recommendations offer a pathway that leverages open science principles to enable reliable AI-driven scientific contributions, while creating opportunities for resource sharing and collaboration. They also call for policies and practices that recognise the links between science and society, emphasising the need for ethical AI, equitable access to its benefits, and the importance of maintaining public trust in scientific research.

While it is clear that AI can significantly aid scientific advancement, the goal remains to ensure these breakthroughs benefit humanity and the planet. We hope this report inspires actors across the scientific ecosystem to engage with the recommendations and work towards a future where we can realise the potential of AI to transform science and benefit our collective wellbeing.

Professor Alison Noble CBE FREng FRS, Foreign Secretary of the Royal Society and Chair of the Royal Society Science in the Age of AI Working Group.

Image: Professor Alison Noble FRS.
Executive summary
The unprecedented speed and scale of progress with artificial intelligence (AI) in recent years suggests society may be living through an inflection point. The virality of platforms such as ChatGPT and Midjourney, which can generate human-like text and image content, has accelerated public interest in the field and raised flags for policymakers who have concerns about how AI-based technologies may be integrated into wider society. Beyond this, comments made by prominent computer scientists and public figures regarding the risks AI poses to humanity have transformed the subject into a mainstream political issue. For scientific researchers, AI is not a novel topic and has been adopted in some form for decades. However, the increased investment, interest, and adoption within academic and industry-led research has led to a ‘deep learning revolution’1 that is transforming the landscape of scientific discovery.

Enabled by the advent of big data (for instance, large and heterogeneous forms of data gathered from telescopes, satellites, and other advanced sensors), AI-based techniques are helping to identify new patterns and relationships in large datasets which would otherwise be too difficult to recognise. This offers substantial potential for scientific research and is encouraging scientists to adopt more complex techniques that outperform existing methods in their fields. The capability of AI tools to identify patterns from existing content and generate predictions of new content also allows scientists to run more accurate simulations and create synthetic data. These simulations, which draw data from many different sources (potentially in real time), can help decision-makers assess more accurately the efficacy of potential interventions and address pressing societal or environmental challenges.

The opportunities of AI for scientific research are highlighted throughout this report and explored in depth through three case studies on its application for climate science, material science, and rare disease diagnosis.

Alongside these opportunities, there are various challenges arising from the increased adoption of AI. These include reproducibility (in which other researchers cannot replicate experiments conducted using AI tools); interdisciplinarity (where limited collaboration between AI and non-AI disciplines can lead to a less rigorous uptake of AI across domains); and environmental costs (due to the high energy consumption required to operate large compute infrastructure). There are also growing barriers to the effective adoption of open science principles due to the black-box nature of AI systems and the limited transparency of commercial models that power AI-based research. Furthermore, changing incentives across the scientific ecosystem may be increasing pressure on researchers to incorporate advanced AI techniques at the neglect of more conventional methodologies, or to be ‘good at AI’ rather than ‘good at science’2.

These challenges, and potential solutions, are detailed throughout this report in the chapters on research integrity; skills and interdisciplinarity; innovation and the private sector; and research ethics.
As an organisation that exists to promote the use of science for the benefit of humanity, this subject is of great importance to the Royal Society. This report, Science in the Age of AI, provides an overview of key issues to address for AI to positively transform the scientific endeavour. Its recommendations, when taken together, should ensure that the application of AI in scientific research is able to reach its full potential and help maintain public trust in science and the integrity of the scientific method.

This report has been guided by a working group of leading experts in AI and applied science and informed by a series of activities undertaken by the Royal Society. These include interviews with Fellows of the Royal Society; a global patent landscape analysis; a historical literature review; a commissioned taxonomy of AI for scientific applications; and several workshops on topics ranging from large language models to immersive technologies. These activities are listed in full in the appendix. In total, more than 100 leading scientific researchers from diverse disciplines contributed to this report.

While the report covers some of the critical areas related to the role of AI in scientific research, it is not comprehensive and does not cover, for example, the provision of high-performance computing infrastructure, the potential of artificial general intelligence, or a detailed breakdown of the new skills required across industries and academia.

Further research questions are outlined below. The Society’s two programmes of work on Mathematical Futures3 and Science 20404 will explore, in more depth, relevant challenges related to skills and universities.

Key findings
• Beyond landmark cases like AlphaFold, AI applications can be found across all STEM fields, with a concentration in fields such as medicine, materials science, robotics, agriculture, genetics, and computer science. The most prominent AI techniques across STEM fields include artificial neural networks, deep learning, natural language processing and image recognition5.

• High quality data is foundational for AI applications, but researchers face barriers related to the volume, heterogeneity, sensitivity, and bias of available data. The large volume of some scientific data (eg collected from telescopes and satellites) can total petabytes, making objectives such as data sharing and interoperability difficult to achieve. The heterogeneity of data collected from sensors also presents difficulties for human annotation and standardisation, while training AI models on biased inputs is likely to lead to biased outputs. Given these challenges, data curators and information managers are essential to maintain quality and address risks linked to artificial data generation, such as data fabrication, poisoning, or contamination.
Recommendations
AREA FOR ACTION: ENHANCE ACCESS TO ESSENTIAL AI INFRASTRUCTURES AND TOOLS
RECOMMENDATION 1
RECOMMENDATION 2
Access to AI does not guarantee its meaningful and responsible use. Complex and high-performance AI tools and methods can be challenging for researchers from non-AI backgrounds to adopt and utilise effectively17. Similarly, new skills are needed across the AI lifecycle, such as data scientists who understand the importance of metadata and data curation, or engineers who are familiar with GPU programming for image-based processing.

Taking steps to improve the usability of AI-based tools (eg software applications, libraries, APIs, or general AI systems) should therefore involve a combination of mechanisms that make AI understandable for non-AI experts and build their capacity to use AI responsibly. For example, training should ensure that every scientist is able to recognise when they require specialised data or programming expertise in their teams, or when the use of complex and opaque AI techniques could undermine the integrity and quality of results.
17 Cartwright H. 2023 Interpretability: Should – and can – we understand the reasoning of machine-learning systems?
In: OECD (ed.) Artificial Intelligence in Science. OECD. (https://doi.org/10.1787/a8d820bd-en)
18 UKRI. Trustworthy Autonomous Systems Hub. Developing machine learning models with codesign: how everyone can
shape the future of AI. See: https://tas.ac.uk/developing-machine-learning-models-with-codesign-how-everyone-can-
shape-the-future-of-ai/ (accessed 7 March 2023)
19 Global Indigenous Data Alliance. Care Principles for Indigenous Data Governance. See https://www.gida-global.org/
care (accessed 21 December 2023)
20 Szymanski M, Verbert K, Vanden Abeele V. 2022. Designing and evaluating explainable AI for non-AI experts:
challenges and opportunities. In Proceedings of the 16th ACM Conference on Recommender Systems
(https://doi.org/10.1145/3523227.3547427)
21 Korot E et al. 2021 Code-free deep learning for multi-modality medical image classification. Nat Mach Intell. 3,
288–298. (https://doi.org/10.1038/s42256-021-00305-2)
22 UKRI. Get Support For Your Project: If your research spans different disciplines. See: https://www.ukri.org/apply-for-
funding/how-to-apply/preparing-to-make-a-funding-application/if-your-research-spans-different-disciplines/
(accessed 13 December 2023)
AREA FOR ACTION: BUILD TRUST IN THE INTEGRITY AND QUALITY OF AI-BASED SCIENTIFIC OUTPUTS
RECOMMENDATION 3
Further work is needed to understand the interactions between open science and AI for science, as well as how to minimise safety and security risks stemming from the open release of models and data.

Actions to promote the adoption of open science in AI-based science may include:

1. Research funders and research institutions incentivising the adoption of open science principles and practices to improve reproducibility of AI-based research. For example, by allocating funds to open science and AI training, requesting the use of reproducibility checklists31 and data sharing protocols as part of grant applications, or by supporting the development of community and field-specific reproducibility standards (eg TRIPOD-AI32).

2. Research institutions and journals rewarding and recognising open science practices in career progression opportunities. For example, by promoting the dissemination of failed results, accepting pre-registration and registered reports as outputs, or recognising the release of datasets and documentation as relevant publications for career progression.

3. Research funders, research institutions and industry actors incentivising international collaboration by investing in open science infrastructures, tools, and practices. For example, by investing in open repositories that enable the sharing of datasets, software versions, and workflows, or by supporting the development of context-aware documentation that enables the local adaptation of AI models across research environments. The latter may also contribute towards the inclusion of underrepresented research communities and scientists working in low-resource contexts.

4. Relevant policy makers considering ways of deterring the development of closed ecosystems for AI in science by, for example, mandating the responsible release of benchmarks, training data, and methodologies used in research led by industry.
31 McGill School of Computer Science. The Machine Learning Reproducibility Checklist v2.0.
See: https://www.cs.mcgill.ca/~jpineau/ReproducibilityChecklist.pdf (accessed 21 December 2023).
32 Collins G et al. 2021 Protocol for development of a reporting guideline (TRIPOD-AI) and risk of bias tool (PROBAST-AI)
for diagnostic and prognostic prediction model studies based on artificial intelligence. BMJ open, 11(7), e048008.
(https://doi.org/10.1136/bmjopen-2020-048008)
AREA FOR ACTION: ENSURE SAFE AND ETHICAL USE OF AI IN SCIENTIFIC RESEARCH
RECOMMENDATION 4
The application of AI across scientific domains requires careful consideration of potential risks and misuse cases. These can include the impact of data bias33, data poisoning34, the spread of scientific misinformation35,36, and the malicious repurposing of AI models37. In addition to this, the resource-intensive nature of AI (eg in terms of energy, data, and human labour) raises ethical questions regarding the extent to which AI used by scientists can inadvertently contribute to environmental and societal harms.

Ethical concerns are compounded by the uncertainty surrounding AI risks. As of late 2023, public debates regarding AI safety had not conclusively defined the role of scientists in monitoring and mitigating risks within their respective fields. Furthermore, varying levels of technical AI expertise among domain experts, and the lack of standardised methods for conducting ethics impact assessments, limit the ability of scientists to provide effective oversight38. Other factors include the limited transparency of commercial models, the opaque nature of ML-systems, and how the misuse of open science practices could heighten safety and security risks39,40.

As AI is further integrated into science, AI assurance mechanisms41 are needed to maintain public trust in AI and ensure responsible scientific advancement that benefits humanity. Collaboration between AI experts, domain experts and researchers from humanities and science, technology, engineering, the arts, and mathematics (STEAM) disciplines can improve scientists’ ability to oversee AI systems and anticipate harms42.
33 Arora, A, Barrett, M, Lee, E, Oborn, E and Prince, K 2023 Risk and the future of AI: Algorithmic bias, data colonialism,
and marginalization. Information and Organization, 33. (https://doi.org/10.1016/j.infoandorg.2023.100478)
34 Verde, L., Marulli, F. and Marrone, S., 2021. Exploring the impact of data poisoning attacks on machine learning model
reliability. Procedia Computer Science, 192. 2624-2632. (https://doi.org/10.1016/j.procs.2021.09.032)
35 Truhn D, Reis-Filho J.S. & Kather J.N. 2023 Large language models should be used as scientific reasoning engines,
not knowledge databases. Nat Med 29, 2983–2984. (https://doi.org/10.1038/s41591-023-02594-z)
36 The Royal Society. 2024 Insights from the Royal Society & Humane Intelligence red-teaming exercise on AI-generated
scientific disinformation. See: https://royalsociety.org/news-resources/projects/online-information-environment/
(accessed 7 May 2024)
37 Kazim, E and Koshiyama, A.S 2021 A high-level overview of AI ethics. Patterns, 2. (https://doi.org/10.1016/j.patter.2021.100314)
38 Wang H et al. 2023 Scientific discovery in the age of artificial intelligence. Nature, 620. 47-60. (https://doi.org/10.1038/
s41586-023-06221-2)
39 Solaiman, I. 2023 The gradient of generative AI release: Methods and considerations. In Proceedings of the 2023 ACM
Conference on Fairness, Accountability, and Transparency (pp. 111-122). (https://doi.org/10.48550/arXiv.2302.04844)
40 Vincent J. 2023 OpenAI co-founder on company’s past approach to openly sharing research: ‘We were wrong’. The Verge. See https://www.theverge.com/2023/3/15/23640180/openai-gpt-4-launch-closed-research-ilya-sutskever-interview (accessed 21 December 2023).
41 Brennan, J. 2023. AI assurance? Assessing and mitigating risks across the AI lifecycle. Ada Lovelace Institute.
See https://www.adalovelaceinstitute.org/report/risks-ai-systems/ (accessed 30 September 2023)
42 The Royal Society. 2023 Science in the metaverse: policy implications of immersive technologies. See https://royalsociety.org/news-resources/publications/2023/science-in-the-metaverse/
43 Weidinger L, et al. 2022 Taxonomy of risks posed by language models. In Proceedings of the 2022 ACM Conference
on Fairness, Accountability, and Transparency. 214-229. (https://doi.org/10.1145/3531146.3533088)
44 UNESCO. 2022. Recommendation on the ethics of artificial intelligence. See: https://www.unesco.org/en/artificial-
intelligence/recommendation-ethics (accessed 5 March 2024)
45 OECD. Ethical guidelines for artificial intelligence. See: https://oecd.ai/en/catalogue/tools/ethical-guidelines-for-
artificial-intelligence (accessed 5 March 2024)
Introduction
Scope of the report
Science in the age of AI explores how AI is transforming the nature and methods of scientific research. It focuses on the impact of deep learning methods and generative AI applications and explores cross-cutting considerations around research integrity, skills, and ethics. While AI is transforming a wide range of fields – including the social sciences and humanities – this report provides examples focused on physical and biological sciences.

• Chapter 1 provides a descriptive review of how recent developments in AI (in Machine Learning (ML), deep neural networks, and natural language processing in particular) are changing methods, processes, and practices in scientific research.

• Chapter 2 details key challenges for research integrity in AI-based research. It tackles issues around transparency of AI models and datasets, explainability and interpretability, and barriers to verifying the reproducibility of results.
Generative AI: AI systems generating new text, images, audio, or video in response to user input using machine learning techniques. These systems, often employing generative adversarial networks (GANs), create outputs that closely resemble – and are often indistinguishable from – human-created media. See ‘Generative adversarial networks’.

Generative adversarial networks (GANs): A machine learning technique that produces realistic synthetic data, like deepfake images, indistinguishable from its training data. It consists of a generator and a discriminator. The generator creates fake data, while the discriminator evaluates it against real data, helping the generator improve until the discriminator can’t differentiate between real and fake. (A minimal code sketch follows this glossary.)

Human-in-the-loop (HITL): A hybrid system comprising human and artificial intelligence that allows for human intervention, such as training or fine-tuning the algorithm, to enhance the system’s output. Combining the strengths of both human judgment and machine capabilities can make up for the limitations of both.

Machine learning (ML): A field of artificial intelligence involving algorithms that learn patterns from data and apply these findings to make predictions or offer useful outputs. It enables tasks like language translation, medical diagnosis, and robotics navigation by analysing sample data to improve performance over time.

Privacy-enhancing technologies (PETs): An umbrella term covering a broad range of technologies and approaches that can help mitigate data security and privacy risks47.

Synthetic data: Data that is modelled to represent the statistical properties of original data; new data values are created which, taken as a whole, preserve relevant statistical properties of the ‘real’ dataset48. This allows for training models without accessing real-world data.
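The generator–discriminator dynamic defined above can be sketched in a few lines of code. The following is an illustration only, not material from the report: it assumes PyTorch, small toy networks, and a one-dimensional ‘real’ distribution, but it shows both the adversarial training loop and how a trained generator yields synthetic data of the kind defined above.

```python
# Minimal GAN sketch (illustrative only): a generator learns to mimic a
# 1-D Gaussian "real" distribution. Assumes PyTorch; all names are toy.
import torch
import torch.nn as nn

torch.manual_seed(0)

def real_batch(n=64):
    # Stand-in for real data: samples from N(mean=2.0, std=0.5).
    return torch.randn(n, 1) * 0.5 + 2.0

generator = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))
discriminator = nn.Sequential(nn.Linear(1, 16), nn.ReLU(),
                              nn.Linear(16, 1), nn.Sigmoid())

g_opt = torch.optim.Adam(generator.parameters(), lr=1e-3)
d_opt = torch.optim.Adam(discriminator.parameters(), lr=1e-3)
bce = nn.BCELoss()

for step in range(2000):
    # 1. Discriminator step: label real samples 1, generated samples 0.
    real, noise = real_batch(), torch.randn(64, 8)
    fake = generator(noise).detach()  # detach: don't update G here
    d_loss = bce(discriminator(real), torch.ones(64, 1)) + \
             bce(discriminator(fake), torch.zeros(64, 1))
    d_opt.zero_grad(); d_loss.backward(); d_opt.step()

    # 2. Generator step: try to make the discriminator output 1 on fakes.
    fake = generator(torch.randn(64, 8))
    g_loss = bce(discriminator(fake), torch.ones(64, 1))
    g_opt.zero_grad(); g_loss.backward(); g_opt.step()

# Generated (synthetic) samples should now approximate mean 2.0, std 0.5.
samples = generator(torch.randn(1000, 8))
print(samples.mean().item(), samples.std().item())
```

GANs used for images or scientific data involve far larger networks and careful training stabilisation, but the two-player structure is the same.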
Chapter one
How artificial intelligence is transforming scientific research
Image (left): MRI image. © iStock / MachineHeadz.
“We have the capacity to record much more [data] than before. We live in a data deluge. So, the hope is that machine learning methods will help us make sense of that, and then drive genuine, scientific hypotheses.”
Royal Society roundtable participant

1. Growing use of deep learning across fields
The application of deep learning (DL) is transforming data analysis and knowledge generation. Its use to automatically extract and learn features from raw data, process extensive datasets and recognise patterns efficiently outperforms linear ML-based models48 (illustrated in the sketch at the end of this section). DL has found applications in diverse fields including healthcare, aiding in disease detection and drug discovery, or climate science, assisting in modelling climate patterns and weather detection. A landmark example is the application of DL by Google DeepMind to develop AlphaFold, a protein-folding prediction system that solved a 50-year-old challenge in biology decades earlier than anticipated49.

Developing accurate and useful DL-based models is challenging due to their black-box nature and variations in real-world problems and data. This limits their explanatory power and reliability as scientific tools50 (See Chapter 2).

2. Obtaining insights from unstructured data
A major challenge for researchers is utilising unstructured data (data that does not follow a specific format or structure, making it more challenging to process, manage and use to find patterns). The ability to handle unstructured data makes DL effective for tasks that involve image recognition and natural language processing (NLP).

In healthcare, for example, data can be detailed, multi-modal and fragmented51. It can include images, text assessments, or numerical values from assessments and readings. Data collectors across the healthcare system may record this data in different formats or with different software. Bringing this data together, and making sense of it, can help researchers make predictions and model potential health interventions. Similarly, generative AI models can contribute towards generating and converting data into different modes and standards that are not limited to the type of data fed into the algorithm52.
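As a concrete, hedged illustration of the claim above that deep learning can outperform linear models by learning its own intermediate features: the toy comparison below, which assumes PyTorch and synthetic data (neither prescribed by this report), fits a linear model and a small neural network to the same nonlinear signal.

```python
# Illustrative sketch: a linear model vs a small neural network fitting
# a nonlinear signal (y = sin x). Assumes PyTorch; data are synthetic.
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.linspace(-3, 3, 200).unsqueeze(1)
y = torch.sin(x)

linear = nn.Linear(1, 1)  # can only fit a straight line
mlp = nn.Sequential(nn.Linear(1, 32), nn.Tanh(), nn.Linear(32, 1))

for model in (linear, mlp):
    opt = torch.optim.Adam(model.parameters(), lr=1e-2)
    for _ in range(2000):
        loss = nn.functional.mse_loss(model(x), y)
        opt.zero_grad(); loss.backward(); opt.step()
    print(type(model).__name__, float(nn.functional.mse_loss(model(x), y)))

# The linear fit plateaus at a high error; the network, which learns its
# own intermediate features from the raw input, fits the curve closely.
```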
48 Choudhary, A, Fox, G, Hey, T. 2023. Artificial intelligence for science: A deep learning revolution. World Scientific
Publishing Co. Pte Ltd. (https://doi.org/10.1142/13123)
49 Google DeepMind. AlphaFold. See: https://deepmind.google/technologies/alphafold/ (accessed 5 March 2024)
50 Sarker, I. H. 2021. Deep learning: a comprehensive overview on techniques, taxonomy, applications and research
directions. SN Computer Science, 2. 420. (https://doi.org/10.1007/s42979-021-00815-1)
51 Healy, M. J. R. 1973. What Computers Can and Cannot Do. Proceedings of the Royal Society of London. Series B, Biological Sciences, 184(1077), 375–378. (https://doi.org/10.1098/rspb.1973.0056)
52 World Health Organization. 2024. Ethics and governance of artificial intelligence for health: guidance on large
multi-modal models. See: https://www.who.int/publications/i/item/9789240084759 (accessed 5 March 2024)
53 Kaddour J, Lynch A, Liu Q, Kusner M J, Silva R. 2022. Causal machine learning: A survey and open problems. arXiv preprint (https://doi.org/10.48550/arXiv.2206.15475)
54 Sanchez P, Voisey J, Xia T, Watson H, O’Neil A, and Tsaftaris, S. 2022 Causal machine learning for healthcare
and precision medicine. R. Soc open sci. 9: 220638 (https://doi.org/10.1098/rsos.220638)
55 Royal Society roundtable on large language models, July 2023.
56 Benevolent AI. 2019 Extracting existing facts without requiring any training data or hand-crafted rules.
See https://www.benevolent.com/news-and-media/blog-and-videos/extracting-existing-facts-without-requiring-any-
training-data-or-hand-crafted-rules/ (accessed 21 December 2023).
57 Rajput S, Winn J, Moneypenny N, Zaykov Y, and Tan C. 2021 Alexandria in Microsoft Viva Topics: from big data to big
knowledge. 26 April 2021. See https://www.microsoft.com/en-us/research/blog/alexandria-in-microsoft-viva-topics-
from-big-data-to-big-knowledge/ (accessed 21 December 2023).
58 Jordon et al. 2023 Synthetic Data – what, why and how? See https://royalsociety.org/news-resources/projects/
privacy-enhancing-technologies/ (accessed 21 December 2023)
59 The Royal Society. 2020 Digital technology and the planet: Harnessing computing to achieve net zero.
See https://royalsociety.org/topics-policy/projects/digital-technology-and-the-planet/ (accessed 21 December 2023).
60 Jordon et al. 2023 Synthetic Data – what, why and how? See https://royalsociety.org/news-resources/projects/
privacy-enhancing-technologies/ (accessed 21 December 2023).
61 Zhang L, Han J, Wang H, Car R, Weinan E. 2018 Deep Potential Molecular Dynamics: A Scalable Model
with the Accuracy of Quantum Mechanics. Phys Rev Lett. 2018 Apr 6;120(14):143001. (https://doi.org/10.1103/
PhysRevLett.120.143001. PMID: 29694129)
62 The Royal Society. 2023 From privacy to partnership. See https://royalsociety.org/topics-policy/projects/privacy-
enhancing-technologies/ (accessed 21 December 2023).
63 Lin Z. 2023 Why and how to embrace AI such as ChatGPT in your academic life. R. Soc. Open Sci.10:
230658 230658 (https://doi.org/10.1098/rsos.230658)
64 Alkaissi H, McFarlane SI. Artificial Hallucinations in ChatGPT: Implications in Scientific Writing. Cureus. 2023 Feb 19;15(2):e35179. (https://doi.org/10.7759/cureus.35179)
Examples of automated literature review tools include Semantic Scholar65, Elicit66, and Consensus67. Similar functionality is also available on prominent platforms such as GPT-4 and Gemini. Beneficial use cases include using LLMs to improve the quality of academic writing, assist with translation, or emulate specific writing styles (eg producing lay summaries). Beyond academic texts, they can also be used to streamline administrative tasks and assist in drafting grant applications. These tools could also improve accessibility for researchers from diverse backgrounds (eg non-English speakers and neurodivergent individuals) who consume and produce academic content in multiple languages and formats68.

These tools also have limitations, including the potential to exacerbate biases from the training data (eg bias towards positive results69, language biases70 or geographic bias71), inaccuracies and unreliable scientific inputs72. As a writing tool they also have a limited ability to grasp nuanced value judgments, assist in scientific meaning-making73, or articulate the complexities of scientific research74. There are also concerns that the use of LLMs for academic writing risks diminishing creative and interdisciplinary aspects of scientific discovery75. Additionally, there are questions around the impact of LLMs on intellectual property (IP).

5. Addressing complex coding challenges
Developing computational analysis software code has become an important aspect of the modern scientific endeavour. For example, LLMs – which are designed to analyse text inputs and generate responses that they determine are likely to be accurate – can be used for generating software code in various coding languages. This presents an opportunity for scientific researchers to convert code from one computer language to another, or from one application to another76.
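A hedged sketch of the code-conversion use case follows. It assumes the OpenAI Python client and the model name ‘gpt-4o’, neither of which is prescribed by this report; any chat-style LLM could be substituted, and generated code should always be reviewed and tested before use.

```python
# Hedged sketch: asking a hosted LLM to translate code between languages.
# Assumes the OpenAI Python client (pip install openai) with an API key in
# the OPENAI_API_KEY environment variable; the model name is an assumption.
from openai import OpenAI

client = OpenAI()

matlab_snippet = """
x = linspace(0, 2*pi, 100);
y = sin(x) .* exp(-x/5);
plot(x, y);
"""

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": "Translate this MATLAB code to Python using NumPy and "
                   "Matplotlib. Return only the code:\n" + matlab_snippet,
    }],
)

# The generated code is a draft: review and test it before relying on it.
print(response.choices[0].message.content)
```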
65 Semantic Scholar: AI-powered research tool. See https://www.semanticscholar.org/ (accessed 21 December 2023).
66 Elicit: The AI research assistant. See https://elicit.com/ (accessed 21 December 2023).
67 Consensus: AI search engine for research. See https://consensus.app/ (accessed 21 December 2023).
68 Royal Society and Department for Science, Innovation, and Technology workshop on horizon scanning AI safety risks across scientific disciplines, 2023.
69 Royal Society and Department for Science, Innovation and Technology workshop on horizon scanning AI safety risks
across scientific disciplines, October 2023. See https://royalsociety.org/current-topics/ai-data/. (accessed 7 May 2024).
70 Barrot JS, 2023. Using ChatGPT for second language writing: Pitfalls and potentials. Assessing Writing, 57.100745.
71 Skopec M, Issa H, Reed J, Harris M. 2020. The role of geographic bias in knowledge diffusion: a systematic review
and narrative synthesis. Research integrity and peer review, 5. 1-14. (https://doi.org/10.1186/s41073-019-0088-0.)
72 Sanderson K. 2023. GPT-4 is here: what scientists think. Nature, 615.773. 30 March 2023. See https://www.nature.
com/articles/d41586-023-00816-5.pdf (accessed 21 December 2023)
73 Birhane A, Kasirzadeh A, Leslie D, Wachter S. 2023. Science in the age of large language models. Nature Reviews
Physics, 1-4 (https://doi.org/10.1038/s42254-023-00581-4)
74 Bender E, Koller A. 2020 Climbing towards NLU: on meaning, form, and understanding in the age of data.
In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics 5185–5198
75 Royal Society and Department for Science, Innovation, and Technology workshop on horizon scanning AI safety risks across scientific disciplines, 2023.
76 Royal Society roundtable on large language models, July 2023.
Even if the output is not accurate on a first attempt, these models can be used as coding assistants to help identify coding mistakes, make suggestions, and save time. Prominent examples include Microsoft’s Copilot77; OpenAI’s GPT-478; Meta’s Code Llama79; and Google DeepMind’s Gemini80.

6. Task automation
AI tools can automate a range of time and labour-intensive tasks within the scientific workflow81. Automation can lead to productivity gains for scientists82 and unlock the potential to test diverse hypotheses beyond human capability. For example, in 2023, Google DeepMind claimed two such examples: FunSearch83 and GNoME84.

The use of robotic research assistants is also contributing to the automation of laboratory workflows (See Case Study 2). In 2009, a robot developed by Aberystwyth University became the first machine to independently discover new scientific knowledge85. The robot was programmed to independently design experiments, record and evaluate results, and develop new questions – automating the entire research workflow86. Building on this breakthrough, ‘robot scientists’ continue to be developed to speed up the discovery process, while reducing costs, uncertainty, and human error in labs87.

As research becomes more automated, there are concerns that future generations of scientists may become de-skilled in core skills such as hypothesis generation, experimental design, and contextual interpretation88. Methodological transparency and understanding of cause-effect relationships could also decline, and an overemphasis on computational techniques risks disengaging scientists who seek creative outlets in their work89.
77 GitHub. Copilot – Your AI pair programmer. See https://github.com/features/copilot (accessed 21 December 2023).
78 Open AI. GPT4. See https://openai.com/gpt-4 (accessed 21 December 2023).
79 Meta. 2023 Introducing Code Llama, a state-of-the-art large language model for coding. Meta. 24 August 2023.
See https://ai.meta.com/blog/code-llama-large-language-model-coding/ (accessed 21 December 2023).
80 Google DeepMind. Gemini. See https://deepmind.google/technologies/gemini/#introduction (accessed 21 December 2023).
81 Xie, Y, Sattari, K, Zhang, C, & Lin, J. 2023 Toward autonomous laboratories: Convergence of artificial intelligence and
experimental automation. Progress in Materials Science, 132. 101043. (https://doi.org/10.1016/j.pmatsci.2022.101043)
82 OECD. 2023. Artificial Intelligence in Science: Challenges, Opportunities and the Future of Research, OECD
Publishing, Paris (https://doi.org/10.1787/a8d820bd-en).
83 Fawzi A and Paredes B. 2023. FunSearch: Making new discoveries in mathematical sciences using Large Language
Models. Google DeepMind. See https://deepmind.google/discover/blog/funsearch-making-new-discoveries-in-
mathematical-sciences-using-large-language-models/ (accessed 21 December 2023).
84 Merchant A and Cubuk E. 2023 Millions of new materials discovered with deep learning. Google DeepMind. See https://
deepmind.google/discover/blog/millions-of-new-materials-discovered-with-deep-learning/ (accessed 21 December 2023).
85 University of Cambridge. Robot scientist becomes first machine to discover new scientific knowledge.
See: https://www.cam.ac.uk/research/news/robot-scientist-becomes-first-machine-to-discover-new-scientific-knowledge
(accessed 3 March 2024)
86 Sparkes A et al. 2010. Towards Robot Scientists for autonomous scientific discovery. Autom Exp 2, 1
(https://doi.org/10.1186/1759-4499-2-1)
87 University of Cambridge. Artificially-intelligent Robot Scientist ‘Eve’ could boost search for new drugs. See: https://www.cam.
ac.uk/research/news/artificially-intelligent-robot-scientist-eve-could-boost-search-for-new-drugs (accessed 7 March 2024)
88 Lin Z. 2023 Why and how to embrace AI such as ChatGPT in your academic life. R Soc Open Sci. 2023 Aug
23;10(8):230658 (https://doi.org/ 10.1098/rsos.230658.)
89 Royal Society and Department for Science, Innovation and Technology workshop on horizon scanning AI safety risks across scientific disciplines, October 2023. See https://royalsociety.org/current-topics/ai-data/ (accessed 7 May 2024)
AI and the nature of scientific research
Beyond the impact of AI on the methods of scientific research, there is a potentially transformative impact on the nature of the scientific endeavour itself. These impacts primarily relate to the prevalence of big data-led research, reliance on computing power and new ways of organising skills and labour in the scientific process.

Drawing on the activities undertaken for this report, the following six themes emerged as key impacts of AI on the nature of scientific research.

1. Computers and labour as foundational AI infrastructures
An assemblage of digital infrastructure and human labour underpins major AI applications90. The digital infrastructure refers to devices which collect data, personal computers on which data are analysed, and supercomputers which power large-scale data analysis. The human labour refers to the act of data collection, cleansing, and labelling, as well as the act of design, testing, and implementation. The types of digital infrastructure required include supercomputers (eg those included in HPC-UK91 and the EuroHPC JU92); privacy enhancing technologies93; and data storage facilities (eg data centres). Cloud-based solutions, which do not require users to own physical infrastructure (eg to store data), include Amazon Web Services94 and Oracle Cloud Infrastructure95.

2. Domination of big data centric research
The ability to collect big data (large and heterogeneous forms of data that have been collected without strict experimental design96) and combine these with other datasets has presented clear and significant opportunities for the scientific endeavour. The value being gained from applying AI to these datasets has already provided countless examples of positive applications, from mitigating the impact of COVID-19 to combating climate change (See Case Study 3)97. This is likely to continue to reshape the research endeavour to be more AI and big data-centric98. The ability to engage in data-centric research, however, remains dependent on access to computing infrastructure that enables processing of large heterogeneous datasets.

The domination of big data centric research also has implications for research in which only incomplete or small data is available. Without careful governance, it risks reducing research investment and support in priority areas (eg subjects or regions) where primary data collection at that scale is limited, difficult
90 Penn J. 2024. Historical review on the role of disruptive technologies in transforming science and society.
The Royal Society. See https://royalsociety.org/news-resources/projects/science-in-the-age-of-ai/ (accessed 7 May 2024)
91 HPC-UK. UK HPC Facilities. See https://www.hpc-uk.ac.uk/facilities/ (accessed 21 December 2023).
92 The European High Performance Computing Joint Undertaking. See https://eurohpc-ju.europa.eu/index_en
(accessed 21 December 2023).
93 The Royal Society. Privacy Enhancing Technologies. See https://royalsociety.org/topics-policy/projects/privacy-
enhancing-technologies/ (accessed 21 December 2023).
94 Amazon Web Services. Cloud Computing Services. See https://aws.amazon.com/ (accessed 21 December 2023).
95 Oracle. Cloud Infrastructure. See https://www.oracle.com/cloud/ (accessed 21 December 2023).
96 The Royal Society. 2017 Machine Learning: The power and promise of computers that learn by example.
See https://royalsociety.org/topics-policy/projects/machine-learning/ (accessed 21 December 2023).
97 The Royal Society. 2020 Digital technology and the planet: Harnessing computing to achieve net zero.
See https://royalsociety.org/topics-policy/projects/digital-technology-and-the-planet/ (accessed 21 December 2023).
98 Royal Society and Department for Science, Innovation and Technology workshop on horizon scanning AI safety risks
across scientific disciplines, October 2023. See https://royalsociety.org/current-topics/ai-data/. (accessed 7 May 2024)
or not desirable. It is also likely to increase attention on techniques such as data augmentation and the use of synthetic data. The case of rare disease research (See Case Study 1) illustrates applications of AI in small data research.

3. Open vs closed science
Open science, which seeks to open the entire research and publication process (including but not limited to open data; open protocols; open code; and transparent peer review), is a principle and practice advocated for by the Royal Society, and others99. It is also promoted by major technology companies including Meta and OpenAI, although this has been challenged as ‘aspirational’ or, even, ‘marketing’ rather than a technical descriptor100. As well as providing transparency, open science approaches can enable replication of experiments, wider public scrutiny of research products101 and further the right of everyone to share in scientific advancement102.

However, the increasing use of proprietary AI presents challenges for open science. Researchers are increasingly relying on tools developed and maintained by private companies (see Chapter 4), even though the inner workings may remain opaque103. This is exacerbated by the opacity of the training data which underpins prominent AI tools. Poor transparency risks limiting the utility of AI tools for solving real world problems, as policymakers and scientists may not consider AI-generated results reliable enough for important decisions104. It also undermines efforts to detect and scrutinise negative impacts or discriminatory effects105.

A fully open approach that prompts the release of datasets and models without guardrails or guidance may not be desirable either, as datasets or models can be manipulated by bad actors106. Context-specific and AI-compatible open science approaches are needed to boost oversight and transparency107,108.
The UK’s eScience initiative (2001 – 2008)116,117 stands out as an effort to cultivate interdisciplinary collaboration by fostering a culture where scientists and computer science experts work together. Ongoing initiatives like the Alan Turing Institute118 and Arizona State University’s School of Sustainability119 also continue to champion interdisciplinary approaches.

However, interdisciplinarity is stifled by siloed institutions and insufficient career progression opportunities. Interdisciplinarity need not be limited to the natural sciences, with value to be gained from scientists working with researchers in the arts, humanities, and social sciences. An example of this includes the importance of artists in the user experience design of immersive environments120,121 (See Chapter 3 for further details on interdisciplinarity in AI-based research).

6. Blending human expertise with AI automation
The turn to automation offers opportunities to combine human expertise with efficiencies enabled by AI. AI can be used either to complement the human scientist by assisting or augmenting human capability, or to develop autonomous mechanisms for discovery (See Figure 1)122. Across this spectrum, the human scientist remains essential for contextual scientific understanding. The growing use of AI tools also risks making scientists vulnerable to ’illusions of understanding’ in which only a limited set of viewpoints and methods are represented in outputs123. There is a need to further understand “human-in-the-loop” approaches that recognise AI as complementary to human judgment and the role of human intervention to ensure the quality of outputs.
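One widely used ‘human-in-the-loop’ pattern is simple enough to sketch: the system acts autonomously only where its confidence is high, and defers the remaining cases to a human expert. The example below is illustrative only; it assumes scikit-learn, synthetic data, and an arbitrary confidence threshold, none of which come from this report.

```python
# Illustrative human-in-the-loop sketch: defer low-confidence predictions
# to a human expert. Assumes scikit-learn; data and threshold are toy.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
confidence = model.predict_proba(X_te).max(axis=1)

THRESHOLD = 0.9  # arbitrary cut-off: below it, a human decides
auto_mask = confidence >= THRESHOLD   # handled automatically
human_queue = X_te[~auto_mask]        # deferred to expert review

print(f"automated: {auto_mask.sum()}, deferred to humans: {len(human_queue)}")
```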
116 Hey T, Trefethen A. 2002 The UK e-Science Core Programme and the Grid. International Conference on Computational Science (pp. 3-21). Berlin, Heidelberg: Springer. (https://doi.org/10.1016/S0167-739X(02)00082-1)
117 Hey T. 2005. e-Science and open access. See https://www.researchgate.net/publication/28803295_E-Science_
and_Open_Access (accessed 7 May 2024)
118 The Alan Turing Institute. Research. See https://www.turing.ac.uk/research (accessed 21 December 2023).
119 Arizona State University - School of Sustainability. See https://schoolofsustainability.asu.edu/
(accessed 21 December 2023).
120 The Royal Society. 2023 Science in the metaverse: policy implications of immersive technologies.
See https://royalsociety.org/news-resources/publications/2023/science-in-the-metaverse/
(accessed 21 December 2023).
121 Ibid.
122 Krenn M. et al. 2022. On scientific understanding with artificial intelligence. Nature Reviews Physics 4.
(https://doi.org/10.1038/s42254-022-00518-3)
123 Messeri L, Crockett MJ. 2024 Artificial intelligence and illusions of understanding in scientific research. Nature.
Mar;627(8002):49-58. (https://doi.org/10.1038/s41586-024-07146-0.)
FIGURE 1
Reproduction of a visualisation of the three general roles of AI for scientific research as either a computational microscope, resource of human inspiration, or an agent of understanding124.
[Diagram: three roles of AI – ‘Computational microscope’; ‘Resource of inspiration’ (identifying surprises in data and models); ‘Agent of understanding’ (acquiring new scientific understanding).]
124 The diagram describes three possible ways in which AI can contribute to scientific understanding. The ‘computational microscope’ refers to
the role of AI in providing information through advanced simulation and data representation that cannot be obtained through experimentation.
‘Resource of inspiration’ refers to scenarios in which AI provides information that expands the scope of human imagination or creativity. The ‘agent
of understanding’ illustrates a scenario in which autonomous AI systems can share insights with human experts by translating observations into new
knowledge. As of yet, there is no evidence to suggest that computers can act as true agents of scientific understanding. See: Krenn M. et al. 2022.
On Scientific Understanding with Artificial Intelligence.
The use of trusted research environments and privacy enhancing technologies (including AI-based approaches such as federated machine learning) is enabling researchers to model problems without requiring data access, offering a potential technical solution to addressing concerns surrounding sensitive data (a sketch of the federated averaging idea appears below). These are explained in detail in the Royal Society’s 2019 report Protecting privacy in practice139 and the 2023 report From privacy to partnership (which contains various use cases)140.

Public trust and acceptability around the use of sensitive datasets relating to people (eg health information, demographics, location, etc.) is also essential. As set out in the Royal Society’s 2023 report, Creating resilient and trusted data systems, trust in data sharing requires clarity of purpose and transparency in data flows, as well as robust systems for security and privacy141. Private sector actors such as IBM, Microsoft and Siemens are addressing public concerns by establishing communities of trust142. Other approaches include data governance frameworks that encourage the public to get involved in data-driven scientific projects while retaining control of their data (eg data donation drives143).
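To make the federated machine learning idea concrete, the sketch below implements federated averaging on toy data: each site trains on its own records and shares only model weights with a coordinator, so raw data never leaves the site. This is an illustration under stated assumptions (NumPy, a linear model, three simulated sites), not a description of any production system.

```python
# Minimal federated averaging (FedAvg) sketch: sites share model weights,
# never raw records. Pure NumPy; the linear model and data are toy.
import numpy as np

rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])

# Three "hospitals", each holding sensitive data that stays on site.
local_data = []
for _ in range(3):
    X = rng.normal(size=(100, 2))
    y = X @ true_w + rng.normal(scale=0.1, size=100)
    local_data.append((X, y))

def local_update(w, X, y, lr=0.1, epochs=5):
    """One site's training: a few gradient steps on its private data."""
    for _ in range(epochs):
        grad = 2 * X.T @ (X @ w - y) / len(y)
        w = w - lr * grad
    return w

w_global = np.zeros(2)
for round_ in range(10):
    # Each site refines the current global model locally...
    local_ws = [local_update(w_global, X, y) for X, y in local_data]
    # ...and the coordinator averages only the returned weights.
    w_global = np.mean(local_ws, axis=0)

print(w_global)  # approaches [2.0, -1.0] without pooling any raw data
```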
139 The Royal Society. 2019 Protecting privacy in practice. See https://royalsociety.org/topics-policy/projects/privacy-
enhancing-technologies/ (accessed 21 December 2023).
140 Ibid.
141 The Royal Society. 2023 Creating resilient and trusted data systems. See https://royalsociety.org/topics-policy/
projects/data-for-emergencies/ (accessed 21 December 2023).
142 Charter of Trust. See: www.charteroftrust.com (accessed 21 December 2023)
143 The Tidepool Big Data Donation Project. See: https://www.tidepool.org/bigdata (accessed 21 December 2023)
CASE STUDY 1
A rare disease is a condition that affects fewer than 1 in 2,000 people and is often characterised by diverse, complex, and overlapping genetic manifestations144. Of the more than 7,000 rare diseases described worldwide, only 5% have a treatment145. A lack of understanding of underlying causes, fragmented patient data, and inadequate policies have contributed to making the diagnosis and treatment of rare diseases a public health challenge146.

The application of ML and generative AI techniques offers an opportunity to overcome some of these limitations. Rare disease researchers are using ML techniques to analyse high-dimensional datasets, such as high-dimensional molecular data, to identify relevant biomarkers for known diseases or to identify new diseases147 (a minimal sketch of this biomarker-ranking pattern appears at the end of this case study). The shift towards digitising health records is also creating opportunities to identify patients with rare diseases more promptly. Promising applications show potential to improve low diagnostic rates, treatments, and drug development processes148.

AI applications in the field of rare diseases

• Leveraging medical imaging for early diagnosis: Clinicians are using AI to find patterns in large datasets of patient information, including genetic data and clinical records, that may indicate the presence of a rare disease. ML is particularly useful to analyse multimodal data from different sources, including imaging data (eg MRI, X-rays), which is becoming standard practice to understand disease manifestation149. For example, researchers at the Institute for Genomic Statistics and Bioinformatics at the University of Bonn are using deep neural networks (DNNs) and computational facial analysis to accelerate the diagnosis of ultra-rare and novel disorders150.

• Improving capabilities for automated diagnosis: ML techniques can also be used to improve automated diagnostic support for clinicians. Applying ML to very large multi-modal health datasets, such as UK Biobank151, for example, is creating new possibilities to discover unknown and novel variants that can contribute to a molecular diagnosis of rare diseases152.
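The biomarker-identification pattern described in this case study can be illustrated with a minimal sketch. The example below uses synthetic data and a sparse, L1-penalised classifier from the scikit-learn library; the ‘genes’, patient labels and parameter values are simulated placeholders, so this is an illustration of the workflow rather than a clinical method.

# Minimal sketch: ranking candidate biomarkers in high-dimensional
# molecular data with a sparse (L1-penalised) classifier.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(42)
n_patients, n_genes = 200, 500
X = rng.normal(size=(n_patients, n_genes))           # synthetic expression matrix
informative = [10, 42, 99]                           # features that truly matter
y = (X[:, informative].sum(axis=1) > 0).astype(int)  # simulated disease label

# The L1 penalty drives most coefficients to zero, surfacing a short
# candidate list for expert follow-up rather than a definitive diagnosis.
model = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
model.fit(X, y)

ranked = np.argsort(-np.abs(model.coef_[0]))[:5]
print("Top candidate biomarkers (feature indices):", ranked)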
144 Department of Health and Social Care. 2021. The UK Rare Diseases Framework. See: https://www.gov.uk/government/
publications/uk-rare-diseases-framework/the-uk-rare-diseases-framework (accessed 30 September 2023).
145 Brasil S, Pascoal C, Francisco R, Dos Reis Ferreira V, Videira PA, Valadão AG. 2019 Artificial Intelligence (AI) in Rare Diseases: Is the Future Brighter? Genes. 10, 978. (https://doi.org/10.3390/genes10120978)
146 Decherchi S, Pedrini E, Mordenti M, Cavalli A, Sangiorgi L. 2021 Opportunities and challenges for machine learning
in rare diseases. Frontiers in Medicine, 8, 747612. (https://doi.org/10.3389/fmed.2021.747612)
147 Banerjee J et al. 2023 Machine learning in rare disease. Nat Methods 20, 803–814. (https://doi.org/10.1038/s41592-
023-01886-z)
148 Schaefer J, Lehne M, Schepers J, Prasser F, Thun S. 2020 The use of machine learning in rare diseases: a scoping
review. Orphanet J Rare Dis.(https://doi.org/10.1186/s13023-020-01424-6)
149 Ibid.
150 Hsieh TC, Krawitz PM. 2023 Computational facial analysis for rare Mendelian disorders. American Journal of Medical
Genetics Part C: Seminars in Medical Genetics. (https://doi.org/10.1002/ajmg.c.32061)
151 UK Biobank. See: https://www.ukbiobank.ac.uk/ (accessed 21 December 2023)
152 Turro E et al. 2020 Whole-genome sequencing of patients with rare diseases in a national health system. Nature
583, 96–102. (https://doi.org/10.1038/s41586-020-2434-2)
153 Cohen AM et al. Detecting rare diseases in electronic health records using machine learning and knowledge
engineering: Case study of acute hepatic porphyria. PLoS ONE. (https://doi.org/10.1371/journal.pone.0238277)
154 Hersh WR, Cohen AM, Nguyen MM, Bensching KL, Deloughery TG. 2022 Clinical study applying machine learning
to detect a rare disease: results and lessons learned. JAMIA Open, 5. (https://doi.org/10.1093/jamiaopen/ooac053)
155 Nag S, et al. 2022 Deep learning tools for advancing drug discovery and development. 3 Biotech. 12: 110.
156 Steve Nouri. Generative AI Drugs Are Coming. See: https://www.forbes.com/sites/forbestechcouncil/2023/09/05/
generative-ai-drugs-are-coming/ (accessed September 30 2023)
157 Banerjee J et al. 2023 Machine learning in rare disease. Nat Methods 20, 803–814. (https://doi.org/10.1038/s41592-
023-01886-z)
158 Ibid.
159 The Royal Society interviews with scientists and researchers. 2022 – 2023
160 The Royal Society interviews with scientists and researchers. 2022 – 2023
161 Boycott KM et al. 2017 International cooperation to enable the diagnosis of all rare genetic diseases. Am. J. Hum.
Genet. 100, 695–705. (https://doi.org/10.1016/j.ajhg.2017.04.003)
162 Bellgard MI, Snelling T, McGree JM. 2019 RD-RAP: beyond rare disease patient registries, devising a comprehensive
data and analytic framework. Orphanet J Rare Dis 14, 176. (https://doi.org/10.1186/s13023-019-1139-9)
163 Decherchi S, Pedrini E, Mordenti M, Cavalli A, Sangiorgi,L. 2021 Opportunities and challenges for machine learning
in rare diseases. Frontiers in Medicine, 8, 747612 (https://doi.org/10.3389/fmed.2021.747612)
164 Kokosi T, Harron K. 2022. Synthetic data in medical research. BMJ medicine, 1.
(https://doi.org/10.1136/bmjmed-2022-000167)
165 Global Rare Disease Policy Network. See: https://www.rarediseasepolicy.org/ (accessed 21 March 2024)
Chapter two
Research integrity
and trustworthiness
Left
Rhinosporidium seeberi
parasite, the causative
agent of rhinosporidiosis.
© iStock / Dr_Microbe.
Research integrity
and trustworthiness
Trust in AI is essential for its responsible use in scientific research, particularly as scientists become increasingly reliant on these technologies164. This reliance hinges on an assumption that AI-based systems – as well as their analysis and outputs – can produce reliable, low-error, and trustworthy findings.

However, the adoption of AI in scientific research has been coupled with challenges to rigour and scientific integrity. Core issues include a lack of understanding about how AI models work, insufficient documentation of experiments, and scientists lacking the required technical expertise for building, testing and finding errors in a model. A growing body of irreproducible studies using ML techniques is also raising concerns regarding the challenges of reproducing AI-based experiments and the reliability of AI-based results and discoveries165. Together, these issues pose risks not just to science, but also to society if the deployment of unreliable or untrustworthy AI technologies leads to harmful outcomes166.

Based on interviews and a roundtable on reproducibility conducted for this report, the following observations capture unique challenges AI poses for research integrity and trustworthiness.

“It is hardly possible to imagine higher stakes than these for the world of science. The future existence and social role [of science] seem to hinge on the ability of researchers and scientific institutions to respond to the crisis, thus averting a complete loss of trust in scientific expertise by civil society.”
Royal Society roundtable participant

Reproducibility challenges in AI-based research
Reproducibility refers to the ability of independent researchers to scrutinise the results of a research study, replicate them, and reproduce an experiment in future studies167.

If researchers develop an overreliance on AI for data analysis, while remaining unable to explain how conclusions were reached and how to reproduce a study168, their work will not meet thresholds for scrutiny and verification. Similarly, if results cannot be verified, they can contribute to inflated expectations, exaggerated claims of accuracy, or research outputs based on spurious correlations169. In the case of AI-based research, reproducing a study involves not only replicating the method, but also reproducing the code, data, and environmental conditions under which the experiment was conducted (eg computing, hardware, software)170,171.
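In practice, capturing those conditions can be as simple as publishing a machine-readable record of random seeds, software versions and data fingerprints alongside the code. The sketch below, in Python, illustrates one possible form of such a record; the data file name is a hypothetical placeholder, and a full record would also cover hardware and all dependencies.

# Minimal sketch of capturing what is needed to rerun an AI experiment:
# random seeds, package versions, and a fingerprint of the training data.
import hashlib
import json
import platform
import random

import numpy as np

SEED = 2024
random.seed(SEED)
np.random.seed(SEED)  # fix sources of randomness up front

def file_sha256(path):
    # Hash the dataset so later runs can verify they use identical data.
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()

manifest = {
    "seed": SEED,
    "python": platform.python_version(),
    "numpy": np.__version__,
    "data_sha256": file_sha256("training_data.csv"),  # hypothetical file
}

# Publishing this manifest alongside the code lets others check their
# environment against the one the results were produced under.
with open("run_manifest.json", "w") as f:
    json.dump(manifest, f, indent=2)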
164 Echterhölter A, Schröter J, Sudmann A. 2021 How is Artificial Intelligence Changing Science? Research in the Era
of Learning Algorithms. OSF Preprint (https://doi.org/10.33767/osf.io/28pnx)
165 Sohn E. 2023 The reproducibility issues that haunt health-care AI. Nature. 9 January 2023.
See https://www.nature.com/articles/d41586-023-00023-2 (accessed 21 December 2023)
166 Sambasivan N et al. 2021 “Everyone wants to do the model work, not the data work”: Data Cascades in
High-Stakes AI. Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems.
(https://doi.org/10.1145/3411764.3445518)
167 Haibe-Kains B et al. 2020 Transparency and reproducibility in artificial intelligence. Nature. 586, E14–E16.
(https://doi.org/10.1038/s41586-020-2766-y)
168 Royal Society and Department for Science, Innovation and Technology workshop on horizon scanning
AI safety risks across scientific disciplines, October 2023. See https://royalsociety.org/current-topics/ai-data/
(accessed 7 May 2024)
169 Echterhölter A, Schröter J, Sudmann A. 2021 How is Artificial Intelligence Changing Science? Research in the Era
of Learning Algorithms. OSF Preprint (https://doi.org/10.33767/osf.io/28pnx)
170 Gundersen O, Gil Y and Aha D. 2018 On Reproducible AI: Towards Reproducible Research, Open Science, and
Digital Scholarship in AI Publications. AI Magazine. 39: 56-68. (doi.org/10.1609/aimag.v39i3.2816)
171 Gunderson O, Coakley K, Kirkpatrick C, and Gil Y. 2022 Sources of irreproducibility in machine learning: A review.
arXiv preprint. (doi.org/10.48550/arXiv.2204.07610)
Reproducibility failures do not only risk the validity of the individual study172, but can also affect research conducted for other studies, including those in other disciplines. For example, a study led by the Center for Statistics and Machine Learning at Princeton University showed how ‘data leakage’ in one study (a leading cause of errors in ML applications due to errors in training data or model features) may affect 294 papers across 17 scientific fields, including high-stakes fields like medicine173 (a minimal illustration of this failure mode appears at the end of this section). Furthermore, these types of issues are likely to be underreported due to factors such as unpublished data; insufficient documentation; absence of mechanisms to report failed experiments; and high variability across experimentation or research contexts174.

Opacity and the black-box nature of machine learning
At the core of the reproducibility challenge are opaque ML-based models that not every scientist can explain, interpret, or understand. ML models are commonly referred to as ‘black-box models’: models that can produce useful information and outputs, even when researchers do not understand exactly how the system works. The opaque nature of models limits explainability and the ability of scientists to interpret how ML models arrive at specific results or conclusions175.

Explainable AI (See Box 1) can help researchers identify errors in data, models, or assumptions – mitigating challenges such as data bias – and ensure these systems produce high quality results which can be used for real-world implementation176. This can become a significant challenge for scientists who integrate highly variable and complex models into their work, such as deep learning models, that are known to outperform less complex and more linear and transparent models.

Opacity increases when models are developed in a commercial setting. For instance, most leading LLMs are developed by large technology companies like Google, Microsoft, Meta, and OpenAI. These models are proprietary systems, and as such, reveal limited information about their model architecture, training data, and the decision-making processes that would enhance understanding177.

“There may be a disproportionate problem with machine learning. We’ve come very far with the ability to handle huge amounts of data, using software that is very competent and well developed. But I think perhaps a lot of people using it don’t actually understand what they’re doing in a way that may not be so true for other areas. It’s a compounded problem, where there are many, many things you can get wrong. I wonder how many people really understand the software that they’re using.”
Royal Society roundtable participant
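To make the data leakage failure mode concrete, the sketch below (in Python, using scikit-learn) selects ‘informative’ features before splitting the data into training and test sets. Because the data are pure noise, an honest analysis should score near chance, yet the leaky pipeline reports inflated accuracy. The dataset is synthetic, and this is a textbook illustration rather than a reconstruction of any cited study.

# Minimal sketch of 'data leakage': fitting a preprocessing step on the
# full dataset lets information from the test set leak into training.
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2000))   # pure noise features
y = rng.integers(0, 2, size=200)   # random labels: nothing to learn

# LEAKY: select 'predictive' features using ALL rows, then split.
X_sel = SelectKBest(f_classif, k=20).fit_transform(X, y)
Xtr, Xte, ytr, yte = train_test_split(X_sel, y, random_state=0)
leaky = LogisticRegression().fit(Xtr, ytr).score(Xte, yte)

# CORRECT: split first, select features using the training rows only.
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)
selector = SelectKBest(f_classif, k=20).fit(Xtr, ytr)
clean = LogisticRegression().fit(selector.transform(Xtr), ytr).score(
    selector.transform(Xte), yte)

print(f"leaky accuracy:  {leaky:.2f}")   # optimistically high
print(f"honest accuracy: {clean:.2f}")   # near 0.5, as it should be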
172 McDermott M, Wang S, Marinsek N, Ranganath R, Foschini L, and Ghassemi M. 2021 Reproducibility in machine
learning for health research: Still a way to go. Sci. Transl. Med. 13, eabb1655. (doi.org/10.1126/scitranslmed.abb1655)
173 Kapoor S and Narayanan A. 2023 Leakage and the reproducibility crisis in machine-learning-based
science. Patterns. 4(9) (doi.org/10.1016/j.patter.2023.100804)
174 Gundersen O, Gil Y and Aha D. 2018 On Reproducible AI: Towards Reproducible Research, Open Science, and
Digital Scholarship in AI Publications. AI Magazine. 39: 56-68. (doi.org/10.1609/aimag.v39i3.2816)
175 Royal Society. Royal Society response on Reproducibility and Research Integrity. See: https://royalsociety.org/news-
resources/publications/2021/research-reproducibility/ (accessed 7 March 2024)
176 The Royal Society. 2019 Explainable AI: the basics. See https://royalsociety.org/topics-policy/projects/explainable-ai/
(accessed 21 December 2023).
177 Bommasani et al. 2021. On the opportunities and risks of foundation models. See: https://crfm.stanford.edu/assets/
report.pdf (accessed March 21 2024)
BOX 1
178 Marcinkevičs, R., Vogt, J. E. 2023. Interpretable and explainable machine learning: A methods-centric overview with
concrete examples (https://doi.org/10.1002/widm.1493)
179 Lipton,. 2018. The mythos of model interpretability. Queue, 16(3), 31–57. (https://doi.org/10.1145/3236386.3241340)
180 The Royal Society. 2019 Explainable AI: the basics. See https://royalsociety.org/topics-policy/projects/explainable-ai/
(accessed 21 December 2023).
181 Li Z, Ji J, and Zhang Y. 2023 From Kepler to Newton: Explainable AI for science. arXiv preprint.
(doi.org/10.48550/arXiv.2111.12210)
182 McDermid J, Jia Y, Porter Z and Habli I. 2021 Artificial intelligence explainability: the technical and ethical
dimensions. Philosophical Transactions of the Royal Society A. 379(2207), 20200363. (doi.org/10.1098/rsta.2020.0363)
183 McGough M. 2018 How bad is Sacramento’s air, exactly? Google results appear at odds with reality, some say.
Sacramento Bee. 7 August 2018. See https://www.sacbee.com/news/california/fires/article216227775.html
(accessed 21 December 2023).
184 Lundberg S and Lee S. 2017 A unified approach to interpreting model predictions. In Proceedings
of the 31st International Conference on Neural Information Processing Systems. 4768–4777.
(dl.acm.org/doi/10.5555/3295222.3295230)
185 Cartwright H. 2023 Interpretability: Should – and can – we understand the reasoning of machine-learning systems?
In: OECD (ed.) Artificial Intelligence in Science. OECD. (doi.org/10.1787/a8d820bd-en)
186 Royal Society roundtable on reproducibility, April 2023.
187 Birhane A et al. 2023 Science in the age of large language models. Nat Rev Phys 5, 277–280.
(doi.org/10.1038/s42254-023-00581-4)
188 Bell A, Solano-Kamaiko I, Nov O, and Stoyanovich J. 2022 It’s Just Not That Simple: An Empirical Study of the
Accuracy-Explainability Trade-off in Machine Learning for Public Policy. In Proceedings of the 2022 ACM Conference
on Fairness, Accountability, and Transparency. Association for Computing Machinery. 248–266.
(doi.org/10.1145/3531146.3533090)
189 Miller K. 2021 Should AI models be explainable? That depends. Stanford University Human-Centered Artificial
Intelligence. See: https://hai.stanford.edu/news/should-ai-models-be-explainable-depends (accessed 21 December 2023).
190 Zhong X et al. 2022 Explainable machine learning in materials science. NPJ Comput Mater 8, 204.
(doi.org/10.1038/s41524-022-00884-7)
191 Combi C et al. 2022 A manifesto on explainability for artificial intelligence in medicine. Artificial intelligence in
medicine. 133, 102423. (doi.org/10.1016/j.artmed.2022.102423)
192 Hanson, B. Garbage in, garbage out: mitigating risks and maximizing benefits of AI in research.
See https://www.nature.com/articles/d41586-023-03316-8 (accessed 5 March 2024)
and environmental research193 for multiple purposes. These include enhancing scientific understanding derived from AI (eg better understanding of physical principles and generation of new hypotheses194); improving oversight and enforcement of environmental protection regulations; and minimising the environmental footprint of AI systems195.

• Glass-box architectures: Glass-box model architectures aim to make LLMs’ internal data representations more transparent by incorporating attention mechanisms, modular structures, and visualisation tools that can help surface how information flows through layers of the neural network. In addition, augmented training techniques like adversarial learning and contrastive examples can probe the model’s decision boundaries. Analysing when the LLM succeeds or fails on these special training samples provides insights into its reasoning process196,197.

• Knowledge graphs: Knowledge graphs are an advanced data structure that represents information in a network of interlinked entities. They reduce reliance on opaque statistical patterns in training data for LLMs. Medical LLMs, for example, can leverage ontological biomedical data in knowledge graphs for transparent, structured reasoning about diseases and treatments. During inference, LLMs consult knowledge graphs for relevant facts, providing a grounded framework alongside their intrinsic pattern recognition. Joint training with knowledge graphs improves LLMs’ factual reasoning and aids in identifying gaps or misconceptions through audits198 (a minimal sketch of this retrieval pattern appears at the end of this section).

Barriers limiting reproducibility
Beyond technical challenges, there are a series of institutional and social constraints that prevent researchers from adopting more rigorous and transparent processes. Table 1 lists key barriers to reproducibility in AI-based research199.

“One of the things that is true with modelling is you can get almost any result you want based on the assumptions you use to drive them. I think this is a dangerous area that our field is moving in. It’s too much reliance on model results and the pretty pictures that come out of it as a reproduction of truth.”
Royal Society roundtable participant
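The retrieval pattern described in the knowledge graphs bullet above can be sketched in a few lines of Python. In the illustration below, facts are stored as subject-relation-object triples and retrieved to ground a prompt; the medical facts are simplified for illustration, and ask_llm is a hypothetical stand-in for a call to whichever model API is in use.

# Minimal sketch of grounding a language model with a knowledge graph:
# facts are retrieved from an explicit, auditable structure and placed
# in the prompt, rather than relied on from opaque model weights.

# A tiny knowledge graph as (subject, relation, object) triples.
TRIPLES = [
    ("cystic fibrosis", "caused_by", "CFTR gene mutations"),
    ("cystic fibrosis", "treated_with", "CFTR modulator therapy"),
    ("CFTR gene", "located_on", "chromosome 7"),
]

def retrieve(entity):
    # Return every stored fact mentioning the entity (a transparent step).
    return [t for t in TRIPLES if entity in (t[0], t[2])]

def grounded_prompt(question, entity):
    facts = "\n".join(f"- {s} {r.replace('_', ' ')} {o}"
                      for s, r, o in retrieve(entity))
    return (f"Answer using only these verified facts:\n{facts}\n\n"
            f"Question: {question}")

prompt = grounded_prompt("What causes cystic fibrosis?", "cystic fibrosis")
print(prompt)               # the retrieval step can be logged and audited
# answer = ask_llm(prompt)  # hypothetical call to the model in use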
193 Arashpour M. 2023 AI explainability framework for environmental management research. Journal of environmental
management. 342, 118149. (doi.org/10.1016/j.jenvman.2023.118149)
194 Zhong X et al. 2022 Explainable machine learning in materials science. npj Comput Mater 8, 204. (doi.org/10.1038/
s41524-022-00884-7)
195 Arashpour M. 2023 AI explainability framework for environmental management research. Journal of environmental
management. 342, 118149. (doi.org/10.1016/j.jenvman.2023.118149)
196 Lengerich, B J et al. 2023. LLMs Understand Glass-Box Models, Discover Surprises, and Suggest Repairs. arXiv
preprint (https://doi.org/10.48550/arXiv.2308.01157)
197 Garrett BL, Rudin C 2023. The Right to a Glass Box: Rethinking the Use of Artificial Intelligence in Criminal Justice.
Cornell Law Review, Forthcoming, Duke Law School Public Law & Legal Theory Series.
198 Gaur, M, Faldu, K, & Sheth, A 2021. Semantics of the black-box: Can knowledge graphs help make deep learning
systems more interpretable and explainable? IEEE Internet Computing, 25, 51-59.
199 Royal Society roundtable on reproducibility, April 2023.
TABLE 1
200 Benjamin D et al. 2018 Redefine statistical significance. Nat Hum Behav 2, 6–10. (doi.org/10.1038/s41562-017-0189-z)
201 Bommasani et al. 2021. On the opportunities and risks of foundation models. See: https://crfm.stanford.edu/assets/report.pdf (accessed March 21 2024)
202 Ibid.
203 Rudin C. 2019 Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nat Mach Intell 1,
206–215). (doi.org/10.1038/s42256-019-0048-x)
204 Leonelli S. 2018 Rethinking reproducibility as a criterion for research quality. In Including a symposium on Mary
Morgan: curiosity, imagination, and surprise. 36, 129-146. Emerald Publishing Limited.
205 The Royal Society. 2018 Research culture: embedding inclusive excellence. See https://royalsociety.org/topicspolicy/
publications/2018/research-culture-embedding-inclusive-excellence/ (accessed 21 December 2023)
206 Leonelli S. 2018 Rethinking reproducibility as a criterion for research quality. In Including a symposium on Mary
Morgan: curiosity, imagination, and surprise. 36, 129-146. Emerald Publishing Limited.
207 Miller K. 2022 Healthcare algorithms don’t always need to be generalizable. Stanford University Human-Centered
Artificial Intelligence. See https://hai.stanford.edu/news/healthcare-algorithms-dont-always-need-be-generalizable
(accessed 21 December 2023).
208 Leonelli S. 2018 Rethinking reproducibility as a criterion for research quality. In Including a symposium on Mary
Morgan: curiosity, imagination, and surprise. 36, 129-146. Emerald Publishing Limited.
209 The Royal Society. 2017 Machine Learning: The power and promise of computers that learn by example.
See https://royalsociety.org/topics-policy/projects/machine-learning/ (accessed 21 December 2023).
210 Birhane A et al. 2023 Science in the age of large language models. Nat Rev Phys 5, 277–280.
(doi.org/10.1038/s42254-023-00581-4)
211 Center for Open Science. What is preregistration? See https://www.cos.io/initiatives/prereg
(accessed 21 December 2023).
212 Papers With Code. ML Reproducibility Challenge 2022. See https://paperswithcode.com/rc2022
(accessed 21 December 2023).
Guidance to produce documentation and follow open science practices
• Reproducibility checklists and protocols. Examples include the Machine Learning Reproducibility Checklist213, the Checklist for AI in Medical Imaging (CLAIM)214, or the field-agnostic REFORMS checklist215, developed by experts in computer science, mathematics, social science, and health research. These facilitate compliance and documentation of the multiple dimensions of reproducibility.
• Community standards for documentation. Domain-specific community standards such as TRIPOD-AI216 provide guidance on how to document, report and reproduce machine-learning based prediction model studies in health research. The synthetic biology and genomics communities have also defined experimental protocol standards and documentation of the genomic workflow to improve reproducibility217,218.
• The release of data sheets and model cards. Industry can play an important role in releasing information that provides insight into what a model does; its intended audience; intended uses; potential limitations; confidence metrics; and information about the model architecture and the training data. Meta219, Google220, and Hugging Face221 have released different iterations of model cards (a minimal illustrative model card appears after this list).
• Context-aware documentation. Involving diverse actors in defining what reproducibility means; promoting reporting mechanisms that explicitly address contextual inputs and sources of variation; and documenting how local or team culture influences implementation222.
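As an illustration of the kind of information a model card records, the sketch below lays out the categories listed above as a machine-readable file. The model name, metrics and values are hypothetical placeholders, and published model card templates (such as those from Meta, Google and Hugging Face) are considerably more extensive.

# Minimal sketch of the fields a model card might record, following the
# categories listed above. All values are placeholders for a hypothetical model.
import json

model_card = {
    "model_name": "example-rare-disease-classifier",  # hypothetical
    "what_it_does": "Flags patient records for rare-disease review.",
    "intended_audience": "Clinical researchers, not frontline diagnosis.",
    "intended_uses": ["research triage", "cohort discovery"],
    "limitations": [
        "Trained on data from a single health system",
        "Not validated for paediatric populations",
    ],
    "confidence_metrics": {"AUROC": 0.87, "calibration_error": 0.04},
    "architecture": "gradient-boosted trees, 300 estimators",
    "training_data": "De-identified EHR extract, 2015-2022",
}

# Shipping the card as a machine-readable file alongside the model weights
# makes the documentation easy to check programmatically.
with open("model_card.json", "w") as f:
    json.dump(model_card, f, indent=2)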
213 McGill School of Computer Science. The Machine Learning Reproducibility Checklist v2.0.
See: https://www.cs.mcgill.ca/~jpineau/ReproducibilityChecklist.pdf (accessed 21 December 2023).
214 Mongan J, Moy L, and Kahn C. 2020 Checklist for Artificial Intelligence in Medical Imaging (CLAIM): A Guide for
Authors and Reviewers. Radiology. Artificial intelligence, 2(2), e200029. (doi.org/10.1148/ryai.2020200029)
215 Reporting standards for ML-based science. See: https://reforms.cs.princeton.edu/ (accessed 21 December 2023).
216 Collins G et al. 2021 Protocol for development of a reporting guideline (TRIPOD-AI) and risk of bias tool
(PROBAST-AI) for diagnostic and prognostic prediction model studies based on artificial intelligence.
BMJ open, 11(7), e048008. (doi.org/10.1136/bmjopen-2020-048008)
217 Lin X. 2020 Learning Lessons on Reproducibility and Replicability in Large Scale Genome-Wide Association Studies.
Harvard Data Science Review. 2. (doi.org/10.1162/99608f92.33703976)
218 Kanwal S et al. 2017 Investigating reproducibility and tracking provenance – A genomic workflow case study.
BMC Bioinformatics 18, 337. (doi.org/10.1186/s12859-017-1747-0)
219 Meta. 2022 System Cards, a new resource for understanding how AI systems work. Meta. 23 February 2022.
See https://ai.meta.com/blog/system-cards-a-new-resource-for-understanding-how-ai-systems-work/
(accessed 21 December 2023).
220 Google. Model Cards. See https://modelcards.withgoogle.com/about (accessed 21 December 2023).
221 Hugging Face. Model Cards. See https://huggingface.co/docs/hub/model-cards (accessed 21 December 2023).
222 Leonelli S. 2018 Rethinking reproducibility as a criterion for research quality. In Including a symposium on Mary
Morgan: curiosity, imagination, and surprise (Vol. 36, pp. 129-146). Emerald Publishing Limited.
Chapter three
Research skills and
interdisciplinarity
Left
Preparation of nanomaterials
for Scanning Electron
Microscope (SEM) machine.
© iStock / AnuchaCheechang.
237 National Academy of Sciences. 2024 Toward a New Era of Data Sharing: Summary of the US-UK Scientific Forum
on Researcher Access to Data. Washington, DC: The National Academies Press. https://doi.org/10.17226/27520.
238 UKRI NERC Environmental Data Service. See: https://eds.ukri.org/
239 National Academy of Sciences. 2024 Toward a New Era of Data Sharing: Summary of the US-UK Scientific Forum
on Researcher Access to Data. Washington, DC: The National Academies Press. https://doi.org/10.17226/27520.
240 The Royal Society roundtable on the role of interdisciplinarity in AI for scientific research, June 2023.
241 Bengio Y. 2020 Time to rethink the publication process in machine learning. See https://yoshuabengio.
org/2020/02/26/time-to-rethink-the-publication-process-in-machine-learning/ (accessed 10 January 2024)
242 Ibid.
243 Slow Science. See http://slow-science.org/ (accessed 10 January 2024)
244 The Royal Society interviews with scientists and researchers. 2022 - 2023
245 The Royal Society. 2023 Science in the metaverse: policy implications of immersive technologies.
See https://royalsociety.org/news-resources/publications/2023/science-in-the-metaverse/ (accessed 21 December 2023).
246 University of Birmingham. The Institute for Interdisciplinarity Data Science and AI.
See https://www.birmingham.ac.uk/research/data-science/index.aspx (accessed 11 January 2024)
247 Arizona State University. See: https://news.asu.edu/20230407-university-news-asu-college-integrative-sciences-
arts-reorganizes-3-new-schools. (accessed 13 December 2023)
248 The Royal Society. 2023 Science in the metaverse: policy implications of immersive technologies.
See https://royalsociety.org/news-resources/publications/2023/science-in-the-metaverse/ (accessed 21 December 2023).
249 The Royal Society roundtable on the role of interdisciplinarity in AI for scientific research, June 2023.
250 Wu T, Zhang, SH. 2024 Applications and Implication of Generative AI in Non-STEM Disciplines in Higher Education.
In: Zhao, F., Miao, D. (eds) AI-generated Content. AIGC 2023. Communications in Computer and Information
Science, vol 1946. Springer, Singapore. (doi.org/10.1007/978-981-99-7587-7_29)
251 The Royal Society. 2023 Science in the metaverse: policy implications of immersive technologies.
See https://royalsociety.org/news-resources/publications/2023/science-in-the-metaverse/ (accessed 21 December 2023).
252 Zeller F, Dwyer L. 2022 Systems of collaboration: challenges and solutions for interdisciplinary research in AI
and social robotics. Discover Artificial Intelligence, 2. 12. (https://doi.org/10.1007/s44163-022-00027-3)
253 UNESCO Recommendation on Open Science. 2021. See: https://www.unesco.org/en/legal-affairs/recommendation-
open-science (accessed 6 February 2024)
254 The Royal Society. 2019. Dynamics of data science skills: How can all sectors benefit from data science talent.
See https://royalsociety.org/-/media/policy/projects/dynamics-of-data-science/dynamics-of-data-science-skills-report.pdf
(accessed 6 January 2024)
255 World Economic Forum. 2023 The Future of Jobs Report 2023. See https://www3.weforum.org/docs/WEF_Future_
of_Jobs_2023.pdf (accessed 30 January 2024)
256 The Royal Society roundtable on reproducibility, April 2023
257 Cambridge Spark. See https://www.cambridgespark.com/ (accessed 1 August 2023)
258 Petkova, D, Roman, L. 2023 AI in science: Harnessing the power of AI to accelerate discovery and foster innovation
– Policy brief, Publications Office of the European Commission, Directorate-General for Research and Innovation.
(doi/10.2777/401605)
259 Solaiman, I. 2023 The gradient of generative AI release: Methods and considerations. In Proceedings of the 2023 ACM
Conference on Fairness, Accountability, and Transparency (pp. 111-122). (https://doi.org/10.48550/arXiv.2302.04844)
260 Montreal AI Ethics Institute. See https://montrealethics.ai/. (accessed 26 February 2024.)
261 The Royal Society. 2024 Insights from the Royal Society & Humane Intelligence red-teaming exercise on AI-generated
scientific disinformation. See: https://royalsociety.org/news-resources/projects/online-information-environment/
(accessed 7 May 2024)
262 OECD. 2023. Artificial Intelligence in Science: Challenges, Opportunities and the Future of Research, OECD
Publishing, Paris (https://doi.org/10.1787/a8d820bd-en).
263 Archetti C, Montanelli A, Finazzi D, Caimi L, Garrafa E. Clinical laboratory automation: a case study. J Public Health
Res. 2017;6(1):881. (doi: 10.4081/jphr.2017.881)
264 Al Naam YA et al 2022 The Impact of Total Automation on the Clinical Laboratory Workforce: A Case Study. J Healthc
Leadersh. 14, 55-62. (doi:10.2147/JHL.S362614)
BOX 3
Insights from the Royal Society & Humane Intelligence red-teaming exercise
on AI-generated disinformation content
Red teaming refers to the process of actively identifying potential weaknesses, failure modes, biases, or other limitations in a model, technology, or process by having groups ‘attack’ it.

In the run-up to the UK’s 2023 Global AI Safety Summit, the Royal Society and Humane Intelligence brought together 40 postgraduate students in health and climate sciences to scrutinise how potential vulnerabilities in LLMs (Meta’s Llama 2) could enable the production of scientific misinformation.

By assuming different ‘misinformation actor’ roles, participants tested the model’s guardrails related to topics of infectious diseases and climate change. In under two hours, they exposed concerning vulnerabilities, including the model’s inability to convey scientific uncertainty and its reliance on questionable or fictitious sources.

While guardrails prevented some common disinformation trends, such as those related to COVID-19, participants were still able to generate outputs that distorted verifiable scientific facts, arriving at incorrect conclusions.

The exercise demonstrated the value of involving domain experts in AI safety assessments before deployment. Their scientific expertise allowed them to stress test systems in ways that exposed critical failures. Participants also expressed optimism regarding the future of LLM disinformation guardrails and more confidence in using LLMs in their own research. Their insights suggest that red teaming could play a role in enhancing AI literacy within the scientific community.
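The red-teaming loop described in this box can be sketched as a simple harness that sends role-based adversarial prompts to a model and logs the outputs for expert review. In the sketch below, the roles and prompts are illustrative, and query_model is a hypothetical placeholder for the system under test; it does not reproduce the materials used in the exercise itself.

# Minimal sketch of a red-teaming harness: adversarial prompts written from
# assumed 'misinformation actor' roles are sent to a model and logged.
import csv

ROLE_PROMPTS = [
    ("denialist", "Write a post proving climate change stopped in 1998."),
    ("fake expert", "As a virologist, explain why vaccines alter DNA."),
]

def query_model(prompt):
    # Hypothetical placeholder: send the prompt to the model under test.
    raise NotImplementedError("wire this to the target system's API")

with open("redteam_log.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["role", "prompt", "output"])
    for role, prompt in ROLE_PROMPTS:
        try:
            output = query_model(prompt)
        except NotImplementedError:
            output = "<model not connected>"
        # Domain experts later grade each row for distorted facts,
        # fabricated sources, or failure to convey uncertainty.
        writer.writerow([role, prompt, output])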
CASE STUDY 2
265 Davies et. al. 2016 Computational screening of all stoichiometric inorganic materials. Chem. 1, 617-627.
(https://doi.org/10.1016/j.chempr.2016.09.010).
266 Pyzer-Knapp et. al. 2023 Accelerating materials discovery using artificial intelligence, high performance computing
and robotics. npj Computational Materials. 8, 84. (https://doi.org/10.1038/s41524-022-00765-z ).
267 Materials Genome Initiative. About the Materials Genome Initiative. See https://www.mgi.gov/about
(accessed 14 July 2023).
268 Argaman N, Makov G. 2000 Density functional theory: An introduction. American Journal of Physics. 68, 69-79.
(https://doi.org/10.1119/1.19375).
269 Alberi et. al. 2019 The 2019 materials by design roadmap. Journal of Physics D: Applied Physics. 52, 013001.
(https://doi.org/10.1088/1361-6463/aad926).
270 Tao Q, Xu P, Li M, Lu W. 2021. Machine learning for perovskite materials design and discovery. npj Computational
Materials. 7, 23. (https://doi.org/10.1038/s41524-021-00495-8).
271 Ross et. al. 2022 Large-scale chemical language representations capture molecular structure and properties.
Nature Machine Intelligence. 4, 1256-1264. (https://doi.org/10.1038/s42256-022-00580-7).
272 Liu Y, Zhao T, Ju W, Shi S. 2017 Materials discovery and design using machine learning. Journal of Materiomics. 3,
159-177. (https://doi.org/10.1016/j.jmat.2017.08.002).
In recent years, there have been several materials databases developed with the goal of aggregating data in consistent formats which can then be used for further research. Examples include the Materials Project273 and Aflow274 databases (which both contain computed properties) and the Inorganic Crystal Structure Database (ICSD)275 and the High Throughput Experimental Materials (HTEM)276 database (which are both examples of experimental databases). There are also tools to help with the creation and analysis of materials datasets, such as NOMAD277, ChemML278, and atomate279. These datasets, which can be significant in size (eg the Materials Project database currently contains data for more than 150,000 materials), have been facilitating the use of ML for materials discovery.

There have been several success stories in recent years of ML being used for materials discovery, some examples of which are listed in Table 2. A variety of ML and AI techniques, including generative AI, have been used to identify materials with desired properties for a wide range of applications. These have been integrated with established techniques such as DFT, stability calculations and experiments to narrow down the predicted materials280. Sustainability of proposed materials could also be used as an objective for predictive models281, to prevent new, more complex materials being harder to recycle or dispose of safely.
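The screening workflow described above can be illustrated with a minimal sketch: a model is trained on known materials and then used to rank a large pool of candidates, so that only the most promising are passed on to expensive calculations or experiments. The descriptors, property values and shortlist size below are synthetic placeholders, not data drawn from the databases cited.

# Minimal sketch of ML-guided materials screening: fit a model on known
# materials' properties, predict for unseen candidates, and keep the most
# promising for costly follow-up (eg DFT or experiments).
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(7)

# Toy composition descriptors for 500 'known' materials and a target
# property (eg formation energy per atom) to learn.
X_known = rng.uniform(size=(500, 8))
y_known = X_known @ rng.normal(size=8) + 0.1 * rng.normal(size=500)

model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(X_known, y_known)

# Screen 10,000 hypothetical candidates in seconds, then shortlist the
# lowest predicted energies (most stable) for detailed calculation.
X_candidates = rng.uniform(size=(10_000, 8))
predictions = model.predict(X_candidates)
shortlist = np.argsort(predictions)[:10]
print("Candidate indices for follow-up:", shortlist)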
273 Jain et. al. 2013 The Materials Project: A materials genome approach to accelerating materials innovation. APL
Materials. 1, 011002. (https://doi.org/10.1063/1.4812323).
274 Curtarolo et. al. 2012 AFLOWLIB.ORG: A distributed materials properties repository from high-throughput ab-initio
calculations. Computational Materials Science. 58, 227-235. (https://doi.org/10.1016/j.commatsci.2012.02.002).
275 Physical Sciences Data-Science Service. ICSD. See https://www.psds.ac.uk/icsd (accessed 14 July 2023).
276 Zakutayev et. al. 2018 An open experimental database for exploring inorganic materials. Scientific Data. 5, 180053.
(https://doi.org/10.1038/sdata.2018.53).
277 Draxl C, Scheffler M. 2019 The NOMAD laboratory: from data sharing to artificial intelligence. Journal of Physics:
Materials. 2, 036001. (https://doi.org/10.1088/2515-7639/ab13bb).
278 ChemML. See https://hachmannlab.github.io/chemml/ (accessed 14 July 2023).
279 Atomate. See https://atomate.org/ (accessed 14 July 2023).
280 DeCost et. al. 2020 Scientific AI in materials science: a path to a sustainable and scalable paradigm.
Machine Learning: Science and Technology. 1, 033001. (https://doi.org/10.1088/2632-2153/ab9a20).
281 Raabe D, Mianroodi J, Neugebauer J. 2023 Accelerating the design of compositionally complex materials
via physics-informed artificial intelligence. Nature Computational Science. 3, 198-209.
(https://doi.org/10.1038/s43588-023-00412-7)
TABLE 2
Researchers            Result
Lyngby et. al.282      Predicted 11,630 new, stable 2D materials.
Rao et. al.283         Found 2 new ‘invar alloys’ which have a low thermal expansion and can be useful for several applications.
Vasylenko et. al.284   Identified 4 new materials, including materials that have desirable properties for use in solid state batteries.
Sun et. al.285         An approach for pre-screening new organic photovoltaic materials.
Stanev et. al.286      Identified >30 potential high-temperature superconducting materials.
282 Lyngby P, Sommer Thygesen K. 2022. Data-driven discovery of 2D materials by deep generative models. npj
Computational Materials. 8, 232. (https://doi.org/10.1038/s41524-022-00923-3).
283 Rao et. al. 2022 Machine learning-enabled high-entropy alloy discovery. Science. 378, 78-85.
(https://doi.org/10.1126/science.abo4940).
284 Vasylenko et. al. 2021 Element selection for crystalline inorganic solid discovery guided by unsupervised machine
learning of experimentally explored chemistry. Nature Communications. 12, 5561. (https://doi.org/10.1038/s41467-021-
25343-7).
285 Sun et. al. 2019. Machine learning-assisted molecular design and efficiency prediction for high-performance organic
photovoltaic materials. Science Advances. 5, 11. (https://doi.org/10.1126/sciadv.aay4275).
286 Stanev et. al. 2018. Machine learning modelling of superconducting critical temperature. npj Computational
Materials. 4, 29. (https://doi.org/10.1038/s41524-018-0085-8).
287 Stach et. al. 2021 Autonomous experimentation systems for materials development: A community perspective.
Matter. 4, 2702-2726. (https://doi.org/10.1016/j.matt.2021.06.036).
288 Stein H, Gregoire J. 2019 Progress and prospects for accelerating materials science with automated and
autonomous workflows. Chemical Science. 10, 9640. (https://doi.org/10.1039%2Fc9sc03766g)
289 Maruyama et. al. 2023 Artificial intelligence for materials research at extremes. MRS Bulletin. 47, 1154-1164.
(https://doi.org/10.1557/s43577-022-00466-4).
290 Nikolaev et. al. 2016 Autonomy in materials research: a case study in carbon nanotube growth. npj Computational
Materials. 2, 16031. (https://doi.org/10.1038/npjcompumats.2016.31).
291 De Volder M, Tawfick S, Baughman R, Hart J. 2013 Carbon Nanotubes: Present and Future Commercial Applications.
Science. 339, 535-539. (https://doi.org/10.1126/science.1222453).
Chapter four
Research, innovation
and the private sector
Left
Electronic circuits.
© iStock / onuma Inthapong.
Research, innovation
and the private sector
The large investment in AI by the private sector and its significance in scientific research present various implications. These include the centralisation of critical digital infrastructure292; the attraction of talent away from academia to the private sector293; and challenges to open science294.

The influence of the private sector in the development of AI for science is not unprecedented. Historically, the automation of tasks has been driven by industry actors in the pursuit of reduced labour costs and greater scalability295. Today, the private sector continues to play a prominent role in advancing scientific research, with many companies having AI-driven scientific programmes such as Alphabet’s Google DeepMind and Microsoft’s AI for Science296.

The role of the private sector in science is also expanding as many companies contribute to provisioning essential resources like computational power, data access and novel AI technologies to the wider research community297.

This chapter examines the growing role of the private sector in science, drawing on a commissioned review of the global AI patent landscape, which describes the distribution of ownership, development and impact of AI technologies. It also gathers perspectives from a horizon-scanning workshop on AI security risks and a commissioned historical review.
292 Penn J. 2024. Historical review on the role of disruptive technologies in transforming science and society.
The Royal Society. See https://royalsociety.org/news-resources/projects/science-in-the-age-of-ai/
293 Gofman M, Jin Z. 2022 Artificial Intelligence, Education, and Entrepreneurship. Journal of Finance, Forthcoming.
(https://doi.org/10.1111/jofi.13302)
294 Ibid.
295 Penn J. 2024. Historical review on the role of disruptive technologies in transforming science and society.
The Royal Society. See https://royalsociety.org/news-resources/projects/science-in-the-age-of-ai/
296 Microsoft. Microsoft Research. AI4Science. See https://www.microsoft.com/en-us/research/lab/microsoft-research-
ai4science/ (accessed 21 December 2023)
297 Kak A, Myers West S, Whittaker M. 2023 Opinion: Make no mistake – AI is owned by Big Tech. MIT Technology
Review. See https://www.technologyreview.com/2023/12/05/1084393/make-no-mistake-ai-is-owned-by-big-tech/.
(accessed 21 December 2023)
298 IP Pragmatics, 2024 Artificial intelligence related inventions. The Royal Society. See https://royalsociety.org/news-
resources/projects/science-in-the-age-of-ai/
299 £54 million boost to develop secure and trustworthy AI research. Gov.UK. See https://www.gov.uk/government/
news/54-million-boost-to-develop-secure-and-trustworthy-ai-research (accessed 21 December 2023)
300 Bass D. 2023 Microsoft invests $10 billion in ChatGPT maker OpenAI. Bloomberg. 23 January 2023.
See https://www.bloomberg.com/news/articles/2023-01-23/microsoft-makes-multibillion-dollar-investment-in-openai
(accessed 21 December 2023).
301 Targett E. 2023 Meta to spend up to $33 billion on AI, as Zuckerberg pledges open approach to LLMs. The Stack.
27 April 2023. See https://www.thestack.technology/meta-ai-investment/ (accessed 21 December 2023).
302 Ibid.
303 Intellectual property and your work. Gov.UK. See: https://www.gov.uk/intellectual-property-an-overview
(accessed March 22 2024)
BOX 4
304 IP Pragmatics, 2024 Artificial intelligence related inventions. The Royal Society.
See https://royalsociety.org/news-resources/projects/science-in-the-age-of-ai/
305 Ibid.
FIGURE 2
Number of AI-related patent families (INPADOC family count) by earliest priority year, 2013 – 2023.
(Data for 2021 – 2023 is not complete given the 18-month delay from the priority filing date and the date of publication.)
306 Ibid.
307 Grand View Research. See https://www.grandviewresearch.com/industry-analysis/artificial-intelligence-ai-market#
(Accessed 21 December 2023)
308 IP Pragmatics, 2024 Artificial intelligence related inventions. The Royal Society.
See https://royalsociety.org/news-resources/projects/science-in-the-age-of-ai/
309 Nair M, Sethumadhaven A. 2022 AI in healthcare: India’s trillion-dollar opportunity. World Economic Forum.
See https://www.weforum.org/agenda/2022/10/ai-in-healthcare-india-trillion-dollar (accessed 21 December 2023)
310 Olcott E. 2022, China sets the pace in adoption of AI in healthcare technology. Financial Times. 31 January 2022.
See: https://www.ft.com/content/c1fe6fbf-8a87-4328-9e75-816009a07a59 (accessed 21 December 2023)
2. Global market shares in AI for science
The use of AI in the science and engineering market is being driven primarily by the demand for AI technology to drive innovation and economic growth. As such, there is a correlation between patent filing trends and global market shares311.

North America, with its rich concentration of technology firms and skilled professionals, dominates this market312. In Europe, Germany leads, but the United Kingdom stands out with a significant 14.7% share in the AI for life sciences market and has the region’s highest forecasted CAGR of 47.9%313.
FIGURE 3
Global distribution of the number of AI-related patent families by 1st priority country
311 BCC Research. 2022 Global Markets for Machine Learning in the Life Sciences. October 2022.
See https://www.bccresearch.com/market-research/healthcare/global-markets-for-machine-learning-in-life-sciences.
html (accessed 21 December 2023)
312 Ibid.
313 Ibid.
The UK, ranking 10th globally and 2nd in Europe for patent filings, demonstrates strong growth potential314. The UK Intellectual Property Office (UKIPO) adopts a more patentee-friendly approach to examining computer-implemented and AI inventions compared to the European Patent Office (EPO), as underscored by recent decisions like Emotional Perception AI vs Comptroller General of Patents315. This case led to an adjustment in UKIPO’s examination practices, removing specific guidance on ANNs316. While this decision has been appealed and currently awaits review, recent rulings have reinforced the UK’s position as a preferred region for AI-related IP protection, bolstering its role as a key player in AI innovation.
FIGURE 4
Global market shares by region: North America 43.8%; Europe 26.8%; Asia-Pacific 24.1%; Rest of world 5.2%.
314 IP Pragmatics. 2024 Artificial intelligence related inventions. The Royal Society.
See https://royalsociety.org/news-resources/projects/science-in-the-age-of-ai/
315 Emotional perception ai ltd v comptroller-general of patents, designs and trademarks. 2023. Find case law –
The National Archives. See https://caselaw.nationalarchives.gov.uk/ewhc/ch/2023/29482948
(accessed 4 March 2024).
316 Examination of patent applications involving Artificial Neural Networks (ANN). Gov.UK.
See https://www.gov.uk/government/publications/examination-of-patent-applications-involving-artificial-neural-
networks/examination-of-patent-applications-involving-artificial-neural-networks-ann (accessed 4 March 2024).
However, the global landscape is marred by disparities. The costly and intricate patent application processes, particularly in regions like Africa, pose considerable barriers. For example, patenting in Africa through the African Regional Intellectual Property Organisation costs over £29,000, significantly higher than in the UK, priced at around £1,900. Despite a surge in African tech hubs, high IP registration expenses and lack of a unified system hamper patenting317. Initiatives like the Pan-African Intellectual Property Organisation aim to address these challenges, although they currently face operational delays318.
FIGURE 5
Market shares within Europe: Rest of Europe 38.2%; Germany 16%; UK 14.7%; France 12.2%; Italy 11.5%; Spain 7.3%.
317 Lewis J, Schneegans S, Straza T. 2021 UNESCO Science Report: The race against time for smarter development.
UNESCO Publishing. See: https://unesdoc.unesco.org/ark:/48223/pf0000377250 (accessed 22 March 2024)
318 Ibid.
3. Key players in the AI for Science patent landscape
In terms of technological impact (indicated by the number of times that a patent is cited by a later patent, or forward citations), the US stands out for having valuable patents. Comparatively, despite India’s significant growth in AI patent filings, it has not yet achieved large technological impact. The UK, though representing a smaller portion of the patent landscape, demonstrates research and innovation influence, ranking among the highest globally319.

The analysis of the top 20 assignees in AI-related patents underscores the active involvement of both industry and academic entities within the broader scientific and engineering research sphere. Notably, companies such as Canon, Alphabet, Siemens, IBM, and Samsung have emerged as key contributors, with substantial patent portfolios that wield considerable influence across scientific and engineering domains. Despite the dominance of commercial entities in most regions, academic institutions including the University of Oxford, Imperial College London, and the University of Cambridge feature prominently among the top patent filers in the UK320, suggesting a blend of academic-industry collaboration and independent contributions321.

Challenges related to the role of the private sector in AI-based science
In addition to looking at patenting trends, the Royal Society explored the challenges of private sector involvement in AI-based scientific research. Ahead of the Global AI Safety Summit hosted by the United Kingdom in 2023, the Royal Society and the UK’s Department for Science, Innovation and Technology (DSIT) convened a horizon scanning workshop on the safety risks of AI in scientific research322. Challenges identified include:

1. Private sector dominance and centralisation of AI-based science development
Centralisation of AI development under large technology firms (eg Google, Microsoft, Amazon, Meta and Alibaba) could lead to corporate dominance over infrastructure critical for scientific progress. This includes ownership over massive datasets for training AI models, vast computing infrastructures, and top AI talent323.

Centralisation can limit wider participation in steering the AI research agenda and can restrict the shaping of what research is conducted and published to a small number of decision-makers in influential industrial labs. For instance, the high-profile and controversial dismissal of AI researcher Dr Timnit Gebru from Google highlighted the opaque internal decision-making in private sector research units324.
319 IP Pragmatics. 2024 Artificial intelligence related inventions. The Royal Society.
See https://royalsociety.org/news-resources/projects/science-in-the-age-of-ai/
320 Ibid.
321 Legislation.Gov.UK. Copyright, Designs and Patents Act 1988. See: https://www.legislation.gov.uk/ukpga/1988/48/contents
322 Royal Society and Department for Science, Innovation and Technology workshop on horizon scanning AI safety
risks across scientific disciplines, October 2023. See https://royalsociety.org/current-topics/ai-data/
(accessed 7 May 2024)
323 Kak A, Myers West S, Whittaker M. 2023 Opinion: Make no mistake – AI is owned by Big Tech. MIT Technology
Review. 5 December 2023 See https://www.technologyreview.com/2023/12/05/1084393/make-no-mistake-ai-is-
owned-by-big-tech/. (accessed 21 December 2023)
324 Hao K. 2020 We read the paper that forced Timnit Gebru out of Google. Here’s what it says. MIT Technology
Review. 4 December 2020. See https://www.technologyreview.com/2020/12/04/1013294/google-ai-ethics-research-
paper-forced-out-timnit-gebru/ (accessed 12 July 2023)
325 Hodak, M., Ellison, D., & Dholakia, A. (2020, August). Benchmarking AI inference: where we are in 2020. In
Technology Conference on Performance Evaluation and Benchmarking (pp. 93-102). Cham: Springer International
Publishing.
326 Ibid.
327 Ahmed N, Wahed M, Thompson NC. 2023 The growing influence of industry in AI research. Science, 379(6635),
884-886. (https://doi.org/10.1126/science.ade2420)
328 The Royal Society interviews with scientists and researchers. 2022 - 2023
329 Penn J. 2024. Historical review on the role of disruptive technologies in transforming science and society.
The Royal Society. See https://royalsociety.org/news-resources/projects/science-in-the-age-of-ai/
330 Westgarth, T., Chen, W., Hay, G., & Heath, R. 2022 Understanding UK Artificial Intelligence R&D commercialisation
and the role of standards. See https://oxfordinsights.com/wp-content/uploads/2023/10/DCMS_and_OAI_-_
Understanding_UK_Artificial_Intelligence_R_D_commercialisation__accessible-1.pdf (accessed 21 December 2023)
BOX 5
331 IP Pragmatics. 2024 Artificial intelligence related inventions. The Royal Society. See https://royalsociety.org/news-
resources/projects/science-in-the-age-of-ai/
332 Google DeepMind. Technology: AlphaFold. See https://deepmind.google/technologies/alphafold/
(accessed 21 December 2023)
333 Borkakoti N, Thornton J.M., 2023. AlphaFold2 protein structure prediction: Implications for drug discovery.
Current opinion in structural biology, 78, p.102526 (https://doi.org/10.1016/j.sbi.2022.102526)
334 Jumper, J., et al. 2021 Highly accurate protein structure prediction with AlphaFold. Nature, 596(7873), pp.583-589.
(doi: 10.1038/s41586-021-03819-2)
335 IP Pragmatics. 2024 Artificial intelligence related inventions. The Royal Society. See https://royalsociety.org/news-
resources/projects/science-in-the-age-of-ai/
336 Ibid.
3. The private sector and open science
The commercial incentives driving private ownership of data for AI-based research could restrict open science practices. This limits non-industry scientists’ ability to equitably contribute to and scrutinise data for AI systems alongside industry counterparts.

Privately held data is often commercially sensitive and could necessitate non-disclosure agreements, potentially affecting research integrity. Data considered low risk initially may later gain commercial value and get withdrawn, as seen with some social media companies tightening data access following the surge of LLMs training on public data337,338.

Alternative monetisation approaches like encouraging the licensing of data lakes and utilising database provisions can provide a more open and pragmatic approach to data sharing339.

Further approaches include changes to legislation such as the requirements for social media companies to share data in the European Digital Services Act340 and the principles for intervention to unlock the value of data across the economy in the UK’s National Data Strategy341. Additionally, technical approaches include privacy enhancing technologies342 and cyber-security legislation to provide legal measures and ensure safer hardware and software343.

Open-source code and platforms do also offer some advantages to private sector organisations, including speed and cost-effectiveness, but also have significant limitations including lack of support, security risks, and compatibility. For example, industrial partnerships for mutual benefits, such as the partnership between Siemens and Microsoft, can drive cross-industry AI adoption by sharing software, hardware and talent344. During the COVID-19 pandemic, some private organisations relinquished patent rights for the common good, with leading technology companies donating their patents to open-source initiatives345.
337 Isaac M. 2023 Reddit wants to get paid for helping to teach big AI systems. The New York Times. 18 April 2023.
See https://www.nytimes.com/2023/04/18/technology/reddit-ai-openai-google.html (accessed 21 December 2023).
338 Murphy H. 2023 Elon Musk rolls out paywall for Twitter’s data. The Financial Times. 29 April 2023.
See https://www.ft.com/content/574a9f82-580c-4690-be35-37130fba2711 (accessed 21 December 2023).
339 Grossman R L. 2019 Data lakes, clouds, and commons: A review of platforms for analyzing and sharing genomic
data. Trends in Genetics, 35(3), pp.223-234. (https://doi.org/10.1016/j.tig.2018.12.006)
340 European Commission. The Digital Services Act. See https://commission.europa.eu/strategy-and-policy/
priorities-2019-2024/europe-fit-digital-age/digital-services-act_en (accessed 5 February 2024)
341 Department for Science, Innovation and Technology. National Data Strategy. 5 December 2022.
See https://www.gov.uk/guidance/national-data-strategy (accessed 5 February 2024)
342 The Royal Society. 2023 From privacy to partnership. See https://royalsociety.org/topics-policy/projects/privacy-
enhancing-technologies/ (accessed 21 December 2023).
343 European Commission. Directive on measures for a high common level of cybersecurity across the Union (NIS2
Directive). See https://digital-strategy.ec.europa.eu/en/policies/nis2-directive (accessed 22 February 2024)
344 Siemens. 2024 Siemens and Microsoft partner to drive cross-industry AI adoption. See https://press.siemens.com/
global/en/pressrelease/siemens-and-microsoft-partner-drive-cross-industry-ai-adoption (accessed 26 February 2024)
345 UNESCO Recommendation on Open Science. 2021. See: https://www.unesco.org/en/legal-affairs/recommendation-
open-science (accessed 6 February 2024)
4. The private sector’s role in AI safety
Private sector dominance in AI for science also poses challenges to AI safety. Organisations and institutions leading AI development often determine their own ability to assess harm, establish safeguards, and safely release their models. As described by OpenAI in the paper behind the release of GPT-4, commercial incentives and safety considerations can come into tension with scientific values such as transparency and open science practices346.

Hugging Face, an open-source organisation, suggests evaluating the trade-offs for safe and responsible release as illustrated in the Gradient of System Access347 (see Figure 6). Similar frameworks can be considered and developed by scientific communities to assess the conditions under which releasing training data is safe, allowing them to contribute to scientific progress while reducing potential for harm and misuse.

Universities can also play a crucial role in advancing AI safety, by promoting ethical research standards or incentivising academic research on AI harms. However, they do not have the same capabilities as large technology companies to institute robust safeguards and best practices across all aspects of complex AI development. Recently, national governments have been placing greater significance on AI safety discussions. Since the Global AI Safety Summit in November 2023, the UK has launched the AI Safety Institute348 while the US announced the US AI Safety Institute under the National Institute of Standards and Technology (NIST)349.
346 OpenAI et al. 2023 GPT-4 Technical Report. arXiv preprint. (https://doi.org/10.48550/arXiv.2303.08774)
347 Solaiman, I. 2023 The gradient of generative AI release: Methods and considerations. In Proceedings of the
2023 ACM Conference on Fairness, Accountability, and Transparency (pp. 111-122). (https://doi.org/10.48550/
arXiv.2302.04844)
348 Gov.UK. 2024 Introducing the AI Safety Institute. See https://www.gov.uk/government/publications/ai-safety-institute-
overview/introducing-the-ai-safety-institute (accessed 26 February 2024)
349 NIST. 2024 U.S. Artificial Intelligence Safety Institute. See https://www.nist.gov/artificial-intelligence/artificial-
intelligence-safety-institute (accessed 26 February 2024)
FIGURE 6
The gradient of generative AI system release, from fully closed systems, through gradual or staged release, hosted access, cloud-based/API access, and downloadable models, to fully open release. Moving from gated to public levels of access broadens the perspectives that can inform release considerations, from limited internal perspectives to broader community perspectives; Meta’s Make-A-Video is shown in the figure as an example350.
350 Solaiman, I. 2023 The gradient of generative AI release: Methods and considerations. In Proceedings of the 2023 ACM Conference on Fairness,
Accountability, and Transparency (pp. 111-122). (https://doi.org/10.48550/arXiv.2302.04844)
Opportunities for cross-sector collaboration
Cross-sector collaboration offers significant opportunities, leveraging the innovative and educational strengths of academia with the resources and practical focus of industry351. Despite concerns about the patent system centralising AI development, it can also foster collaboration. Published patent applications enhance technological transparency and provide a revenue stream that can support joint ventures between universities and industry.

However, the increasing presence of the private sector in AI-based science funding raises concerns that industry’s influence might shift the focus from fundamental research to applied science352. This shift could exacerbate the ‘brain drain’353, where a significant flow of AI talent leaves academia for the private sector354, driven by higher salaries, advanced resources and the opportunity to work on practical applications355.

To counter this trend, initiatives like the UK’s Life Sciences Innovative Manufacturing Fund356 (which includes £17 million in government funding and a private investment of £260 million) demonstrate how government and private investments can synergistically support projects that drive innovation and economic growth357. This collaborative model not only fuels technological advancements but also offers a platform for academia to engage in cutting-edge research while benefitting from industry resources.

Other partnerships could extend beyond financial aspects, encompassing joint research projects358, shared publications, and intellectual exchanges at conferences or through informal networks359. They also offer practical engagement opportunities like internships and sabbaticals, allowing academics to gain industry experience without departing from their academic roles360.

“The freedom, innovation and creativity of academia with the resource and structure and management of the private sector… it’s been completely liberating.”
Royal Society interview participant, referring to joint academic-industry roles
351 Wright B et al. 2014 Technology transfer: Industry-funded academic inventions boost innovation. Nature 507,
297–299. https://doi.org/10.1038/507297a
352 Ibid.
353 Kunze L. 2019. Can we stop the academic AI brain drain? KI-Künstliche Intelligenz, 33(1), 1-3. (https://doi.org/10.1007/
s13218-019-00577-2)
354 Gofman M, Jin Z. 2022 Artificial Intelligence, Education, and Entrepreneurship. Journal of Finance, Forthcoming.
(https://doi.org/10.1111/jofi.13302)
355 UK universities alarmed by poaching of top computer science brains. Financial Times. 9 May 2018.
See https://www.ft.com/content/895caede-4fad-11e8-a7a9-37318e776bab (accessed 10 June 2023)
356 Life sciences companies supercharged with £277 million in government and private investment. Gov.UK
See https://www.gov.uk/government/news/life-sciences-companies-supercharged-with-277-million-in-government-
and-private-investment (accessed 26 February 2024)
357 Initial £100 million for expert taskforce to help UK build and adopt next generation of safe AI. Gov.UK. See https://www.gov.uk/government/news/initial-100-million-for-expert-taskforce-to-help-uk-build-and-adopt-next-generation-of-safe-ai (accessed 26 February 2023)
358 Evans JA (2010) Industry induces academic science to know less about more. Am J Sociol 116(2):389–452
359 Perkmann M, Walsh K (2007) University–industry relationships and open innovation: towards a research agenda.
Int J Manag Rev 9(4):259–280 (https://doi.org/10.1111/j.1468-2370.2007.00225.x)
360 Cohen WM, Nelson RR, Walsh JP. 2002 Links and impacts: the influence of public research on industrial R&D. Manag Sci 48(1):1–23. (https://doi.org/10.1287/mnsc.48.1.1.14273)
Chapter five
Research ethics and AI safety

Left
Carbon dioxide emissions. © iStock / janiecbros.
For example, Meta's LLM for science, Galactica, was trained on 48 million scientific articles, websites, textbooks, and other inputs to help researchers summarise the literature, generate academic papers, write scientific code and annotate data (eg, molecules and proteins). However, the demo was paused after three days of use. One of the largest risks posed by Galactica was how confidently it produced false information, and the lack of guidelines to identify it384.

As with other forms of misinformation, hallucinations can erode public trust in science385. Methods for AI validation and disclosure, such as watermarking or content provenance technologies386, are being explored to enable the detection of AI-generated content and mitigate potential harms caused by hallucinations387, as well as to ensure public trust in emerging AI systems388 (a minimal sketch of one such detection method follows at the end of this section).

3. Dual use of AI technologies developed for science

The dual use of AI systems refers to situations in which a system developed for a specific use is then appropriated or modified for a different use. Malicious use refers to applications in which the intent is to cause harm389. Among the most prominent and documented examples of malicious use of AI is the development of chemical and biological weapons using AI systems that have beneficial applications for scientific research.

In 2020, Collaborations Pharmaceuticals, a biopharma company that builds ML models to assist drug discovery and the treatment of rare diseases, published results on what it has called a 'teachable moment' regarding the use of AI-powered drug discovery methods. Following an invitation from the Swiss Federal Institute for NBC (nuclear, biological, and chemical) protection, the company trained an AI-powered molecule generator used for drug discovery to generate toxic molecules within a specified threshold of toxicity390. Drawing from a public database, and in less than six hours, the model had generated 40,000 molecules. Many of these molecules were similar to, or more toxic than, the nerve agent VX, a banned and highly toxic lethal chemical weapon.

While the theoretical generation of toxic molecules does not imply that their production is viable or feasible, the experiment shows how AI can speed up the process of creating hazardous substances, including lethal bioweapons391. The company has
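As flagged above, watermark detection can be illustrated with a toy version of one published scheme, 'green-list' token watermarking (Kirchenbauer et al., 2023): generation is biased towards a pseudo-randomly selected subset of the vocabulary at each step, and detection tests whether that subset is statistically over-represented. Everything below (the hash-based green-list assignment, the 25% green fraction and the function names) is an illustrative assumption for intuition, not a system cited in this report.

```python
import hashlib
import math

GREEN_FRACTION = 0.25  # illustrative fraction of the vocabulary marked 'green' at each step

def is_green(prev_token: str, token: str) -> bool:
    """Pseudo-randomly assign `token` to the green list, seeded by the previous token."""
    digest = hashlib.sha256(f"{prev_token}|{token}".encode()).digest()
    return digest[0] < GREEN_FRACTION * 256  # first byte gives a uniform draw in [0, 256)

def watermark_z_score(tokens: list[str]) -> float:
    """z-score for the count of green tokens: large positive values suggest the
    text was generated with the watermark; values near zero suggest it was not."""
    n = len(tokens) - 1  # number of (previous token, token) pairs tested
    hits = sum(is_green(prev, tok) for prev, tok in zip(tokens, tokens[1:]))
    expected = GREEN_FRACTION * n
    return (hits - expected) / math.sqrt(n * GREEN_FRACTION * (1 - GREEN_FRACTION))

# Usage: tokenise a passage (here, naive whitespace splitting) and test it.
print(round(watermark_z_score("the model was trained on scientific text".split()), 2))
```

Because detection relies only on token statistics, a third party holding the watermark key can verify provenance without access to the model itself, which is one reason such schemes are attractive for external scrutiny.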
384 MIT Technology Review. 2022 Why Meta's latest large language model survived only three days online. See: https://www.technologyreview.com/2022/11/18/1063487/meta-large-language-model-ai-only-survived-three-days-gpt-3-science/ (accessed 30 September 2023)
385 Bontridder, N. and Poullet, Y., 2021. The role of artificial intelligence in disinformation. Data & Policy, 3, p.e32.
(doi:10.1017/dap.2021.20)
386 The Royal Society. Generative AI, content provenance and a public service internet. See: https://royalsociety.org/
news-resources/publications/2023/digital-content-provenance-bbc/
387 Watermarking refers to techniques that can embed identification information into the original data, model, or content
to indicate provenance or ownership.
388 Partnership on AI. PAI’s Responsible Practices for Synthetic Media. See: https://syntheticmedia.partnershiponai.
org/#read_the_framework (accessed 21 December 2023)
389 Ueno, H. 2023. Artificial Intelligence as Dual-Use Technology. In Fusion of Machine Learning Paradigms: Theory
and Applications (pp. 7-32). Cham: Springer International Publishing. (https://doi.org/10.1007/978-3-031-22371-6_2)
390 Urbina F, Lentzos F, Invernizzi C, Ekins S. 2022. Dual use of artificial-intelligence-powered drug discovery. Nature
Machine Intelligence, 4(3), 189-191. (https://doi.org/10.1038/s42256-022-00465-9)
391 Sohn R. 2022 AI Drug Discovery Systems Might Be Repurposed to Make Chemical Weapons, Researchers Warn. Scientific American. 21 April 2022. See https://www.scientificamerican.com/article/ai-drug-discovery-systems-might-be-repurposed-to-make-chemical-weapons-researchers-warn/ (accessed 21 December 2023)
392 Urbina F, Lentzos F, Invernizzi C, Ekins S. 2022. Dual use of artificial-intelligence-powered drug discovery.
Nature Machine Intelligence, 4(3), 189-191. (https://doi.org/10.1038/s42256-022-00465-9)
CASE STUDY 3
As AI and ML capabilities become further integrated into climate science research and applications, they are expanding the capacity of scientists and policy makers to mitigate the climate crisis423.

If done successfully, this fusion of datasets can improve the accuracy of models and estimates, contributing to long-term weather forecasting, which supports disaster preparedness and resource management for extreme events427.
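The 'fusion of datasets' referred to here is data assimilation, the blending of model forecasts with observations described in the work cited at footnote 427. As a minimal illustration, the one-dimensional sketch below applies the scalar form of the standard update rule, weighting forecast and observation by their error variances; the variable names and numbers are illustrative assumptions, not values from any operational system.

```python
def assimilate(forecast: float, observation: float,
               forecast_var: float, obs_var: float) -> float:
    """Toy one-dimensional data assimilation (a scalar Kalman-style update).

    The analysis blends the model forecast with an observation, weighting
    each by the inverse of its error variance.
    """
    gain = forecast_var / (forecast_var + obs_var)  # weight given to the observation
    return forecast + gain * (observation - forecast)

# Illustrative numbers only: the model forecasts 21.0 C where a satellite
# retrieval (assumed more certain here) reports 19.0 C.
analysis = assimilate(forecast=21.0, observation=19.0, forecast_var=1.0, obs_var=0.5)
print(f"analysis temperature: {analysis:.2f} C")  # pulled towards the observation
```

Operational systems apply this idea across millions of grid points and observation types, and ML methods are increasingly used to learn the error statistics that set the weighting.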
423 Huntingford C, Jeffers E S, Bonsall M B, Christensen H M, Lees T, Yang H. 2019 Machine learning and artificial
intelligence to aid climate change research and preparedness. Environmental Research Letters, 14(12), 124007.
(DOI 10.1088/1748-9326/ab4e55)
424 National Academy of Sciences. 2024 Toward a New Era of Data Sharing: Summary of the US-UK Scientific Forum
on Researcher Access to Data. Washington, DC: The National Academies Press. https://doi.org/10.17226/27520.
425 Kadow, C, Hall, DM, Ulbrich, U, 2020. Artificial intelligence reconstructs missing climate information. Nature
Geoscience, 13, pp.408-413. (https://doi.org/10.1038/s41561-020-0582-5)
426 NASA Centre for Climate Simulation. See https://www.nccs.nasa.gov/news-events/nccs-highlights/
acceleratingScience. (Accessed 21 December 2023)
427 Buizza, C et al. 2022. Data learning: Integrating data assimilation and machine learning. Journal of Computational Science, 58, p.101525. (https://doi.org/10.1016/j.jocs.2021.101525)
428 Ise, T, Oba, Y. 2019. Forecasting climatic trends using neural networks: an experimental study using global historical
data. Frontiers in Robotics and AI, 32. (https://doi.org/10.3389/frobt.2019.00032)
429 Ham, YG, Kim, JH, Luo, JJ. 2019. Deep learning for multi-year ENSO forecasts. Nature, 573, 568-572.
(https://doi.org/10.1038/s41586-019-1559-7)
430 Rasp, S, Pritchard, MS, Gentine, P. 2018. Deep learning to represent sub-grid processes in climate models. Proceedings of the National Academy of Sciences, 115, 9684-9689. (https://doi.org/10.1073/pnas.1810286115)
431 Zheng, G, Li, X, Zhang, RH, Liu, B 2020. Purely satellite data–driven deep learning forecast of complicated tropical
instability waves. Science advances, 6, eaba1482. (DOI: 10.1126/sciadv.aba1482)
432 Bi, K, Xie, L, Zhang, H, Chen, X, Gu, X, Tian, Q. 2023. Accurate medium-range global weather forecasting with 3D
neural networks. Nature, 619, 533-538. (https://doi.org/10.1038/s41586-023-06185-3)
433 Wong, C. 2023. DeepMind AI accurately forecasts weather – on a desktop computer. Nature. 14 November 2023. (https://doi.org/10.1038/d41586-023-03552-y)
434 The Royal Society. 2020 Digital technology and the planet: Harnessing computing to achieve net zero.
See https://royalsociety.org/topics-policy/projects/digital-technology-and-the-planet/ (accessed 21 December 2023).
435 The Royal Academy of Engineering 2020 Net Zero: A systems perspective on the climate challenge.
See raeng.org.uk/publications/reports/net-zero-a-systems-perspective-on-the-climate-chal
(accessed 14 October 2020)
436 The Royal Society. 2021 Computing for net zero: how digital technology can create a ’control loop for the protection
of the planet’. See https://royalsociety.org/-/media/policy/projects/climate-change-science-solutions/climate-science-
solutions-computing.pdf (accessed 21 December 2023)
437 Abrell J, Kosch M, Rausch S (2019) How effective was the UK carbon tax?—A machine learning approach to policy
evaluation. SSRN Scholarly Paper ID 3372388. Social Science Research Network, Rochester. 10.2139/ssrn.3372388
438 The Royal Society. 2020 Digital technology and the planet: Harnessing computing to achieve net zero.
See https://royalsociety.org/topics-policy/projects/digital-technology-and-the-planet/ (accessed 21 December 2023)
439 The Royal Society. 2020 Digital technology and the planet: Harnessing computing to achieve net zero.
See https://royalsociety.org/topics-policy/projects/digital-technology-and-the-planet/ (accessed 21 December 2023).
440 The European Space Agency. Destination Earth. See https://www.esa.int/Applications/Observing_the_Earth/
Destination_Earth. (accessed 21 December 2023)
441 Accenture. Case Study: Tuvalu. See https://www.accenture.com/us-en/case-studies/technology/tuvalu.
(accessed 21 December 2023)
442 Henderson P, Hu J, Romoff J, Brunskill E, Jurafsky D, Pineau J 2020. Towards the systematic reporting of the energy and carbon footprints of machine learning. J Mach Learn Res 21:1–43
443 Royal Society and Department for Science, Innovation and Technology workshop on horizon scanning AI safety risks
across scientific disciplines, October 2023. See https://royalsociety.org/current-topics/ai-data/ (accessed 7 May 2024)
444 AbdulRafiu A, Sovacool B K, Daniels C. 2022 The dynamics of global public research funding on climate change,
energy, transport, and industrial decarbonisation. Renewable and Sustainable Energy Reviews, 162, 112420.
(https://doi.org/10.1016/j.rser.2022.112420)
445 Grantham Research Institute on Climate Change and the Environment. What opportunities and risks does AI present
for climate action? See: https://www.lse.ac.uk/granthaminstitute/explainers/what-opportunities-and-risks-does-ai-
present-for-climate-action/
446 Zipper S C et al. 2019 Balancing open science and data privacy in the water sciences. Water Resources Research, 55, 5202-5211. (https://doi.org/10.1029/2019WR025080)
447 Donovan K P. 2012 Seeing like a slum: Towards open, deliberative development. Georgetown Journal of
International Affairs.
Sharing environmentally sensitive data can also adversely impact the environment448. For example, sharing biodiversity data, such as the nesting locations of rare birds, can enable bad actors to harm those environments449.

Strategies for ethical AI-based research practices in climate science

1. Pursuing energy proportionality
Develop strategies to ensure that technologies developed in pursuit of net zero deliver environmental benefits that outweigh their emissions450. Interdisciplinary research on carbon accounting and impact assessment tools like the Green Algorithms Project451 can contribute towards evaluating and mitigating the environmental impact of computational processes used in climate science (a minimal sketch of this kind of estimate follows this list).

2. Improving global researcher access to data
The disparity in researcher access to data raises concerns about the equitable development and application of AI452. This could hinder the development of effective climate solutions tailored to the unique challenges of specific communities. Networks such as the Pacific Community's Statistics for Development Division can promote equitable access to data across diverse contexts, fostering collaboration and knowledge sharing453. Similarly, the establishment of trusted data institutions can contribute towards enhancing data sharing and usage to address emergencies and crises454, 455.

3. Contextualising data governance
Universal approaches to open data do not always engage with minority groups' rights and interests. Existing data sharing principles like FAIR (findable, accessible, interoperable, reusable)456 can be complemented by people- and purpose-oriented governance principles like the CARE Principles for Indigenous Data Governance (collective benefit, authority to control, responsibility, ethics), which take a broader approach to sensitive data457.
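As a worked illustration of the carbon accounting mentioned in strategy 1, the sketch below applies the first-order estimate popularised by calculators such as the Green Algorithms Project: energy equals runtime multiplied by hardware power draw and a data-centre overhead factor (PUE), and emissions equal energy multiplied by grid carbon intensity. The function name and all default values are illustrative assumptions rather than figures from the Green Algorithms calculator.

```python
def carbon_footprint_kg(runtime_hours: float,
                        power_draw_watts: float,
                        pue: float = 1.5,
                        grid_intensity_g_per_kwh: float = 250.0) -> float:
    """Estimate CO2-equivalent emissions (kg) for a computational job.

    energy (kWh) = runtime x hardware power draw x PUE (data-centre overhead)
    carbon (kg)  = energy x carbon intensity of the local electricity grid
    """
    energy_kwh = runtime_hours * (power_draw_watts / 1000.0) * pue
    return energy_kwh * (grid_intensity_g_per_kwh / 1000.0)

# Example: a 72-hour model training run on a node drawing roughly 1.2 kW.
print(f"{carbon_footprint_kg(72, 1200):.1f} kg CO2e")  # 32.4 kg CO2e
```

Even this simple estimate makes the levers of energy proportionality visible: the same job emits far less when run on efficient hardware, in a low-PUE facility, or on a low-carbon grid.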
Conclusion
Left
Graphcore Stereo Image Matching Benchmark, October 2015.
As explored throughout the report, the applications of AI in scientific research are bringing a new age of possibilities and challenges. The transformative potential of AI, fuelled by big data and advanced techniques, offers substantial opportunities across domains. From mapping deforestation to aiding drug discovery and predicting rare diseases, the applications are vast and promising. Through the case studies on climate science, material science, and rare disease diagnosis, this report envisions a future in which AI can be a powerful tool for scientific researchers.

However, these opportunities bring with them a series of challenges related to reproducibility, interdisciplinary collaboration, and ethics. Finding a balance in which scientists can harness the benefits of automation and the accelerated pace of discovery while ensuring research integrity and the responsible use of AI will be essential. Following the Royal Society's commitment to ensuring science – and in this case AI – is applied for the benefit of humanity, the report calls for collective efforts in addressing these challenges.

Moving forward, and according to the findings of this report, three areas of action require attention from scientific communities and relevant policy makers.

The first is to address issues of access and capability to use AI in science. Access to computing resources, high-quality datasets, AI tools and relevant expertise is critical to achieving scientific breakthroughs. At the time of publication, access to essential infrastructures remained unequally distributed. This, coupled with the growing influence of the private sector highlighted in Chapter 4, can have implications for the future of university-based AI research. Another challenge in this area is knowledge silos between AI experts and scientific domain experts (Chapter 3). To ensure the equitable distribution of AI across research communities, actions need to go beyond facilitating access and focus on enhancing capabilities to collaborate, co-design and use AI across different scientific fields and research environments.

Second, open science principles and practices offer a clear pathway to improving transparency, reproducibility, and public scrutiny – all of which have proven challenging in AI-based scientific projects. As stressed in Chapter 2, the stakes of not addressing these issues are high, posing risks not just to science but also to society if the deployment of unreliable or erroneous AI-based outputs leads to harm. Further work is needed to understand the interactions between open science and AI for science, and how best to minimise the safety and security risks stemming from the open release of models and data.
Appendices
Left
Microsoft Research ResNet-18 Training, April 2017.
APPENDIX 1
Box 3: Insights from the Royal Society & Humane Intelligence red-teaming exercise on AI-generated disinformation content
APPENDIX 2
Name Organisation
Dorothy Bishop FRS University of Oxford
Odd Erik Gundersen Norwegian University of Science and Technology; Aneo
Sayash Kapoor Princeton University
Mark Kelson University of Exeter
Rebecca Kirk PLOS
Sabina Leonelli University of Exeter
Ralitsa Madsen University of Dundee; UK Committee on Research Integrity
Victoria Moody JISC
Joelle Pineau McGill University; Meta AI
Susanna-Assunta Sansone University of Oxford
Malvika Sharan Alan Turing Institute; Open Life Science
Joaquin Vanschoren Eindhoven University of Technology; OpenML
Name Organisation
Anna-Louise Ellis Met Office
Jane Francis FRS British Antarctic Survey
Anna Hogg University of Leeds
Scott Hosking British Antarctic Survey; The Alan Turing Institute
Konstantin Klemmer Microsoft Research
Joycelyn Longdon University of Cambridge; Climate in Colour
Shakir Mohamed Google DeepMind
Alistair Nolan OECD
Tim Palmer University of Oxford
Suman Ravuri Google DeepMind
Emily Shuckburgh University of Cambridge
Philip Stier University of Oxford
Dave Topping University of Manchester
Richard Turner University of Cambridge
Name Organisation
Ankit Agrawal Northwestern University
Seth Baum Global Catastrophic Risk Institute
Michael Castelle University of Warwick
Claude Chelala Queen Mary University of London
Gareth Conduit University of Cambridge
James Dracott UKRI
Victoria Henickx KU Leuven
Georgios Leontidis The University of Aberdeen
Alison Noble FRS University of Oxford
Alistair Nolan OECD
Bradley Love University College London
Cecilia Mascolo University of Cambridge
Raffaella Mulas Vrije Universiteit Amsterdam
Mirco Musolesi University College London
Daniele Quercia King's College London; Nokia Bell Labs Cambridge
Verena Rieser Google DeepMind
Reuben Shipway University of Plymouth
Tommaso Venturini University of Geneva; CNRS
Hujun Yin University of Manchester
Name Organisation
Seth Baum Global Catastrophic Risk Institute
Andrew Blake FRS Scientific advisor and AI consultant, University of Cambridge
Phil Blunsom University of Oxford
Anthony Cohn University of Leeds; The Alan Turing Institute
Jeff Dalton University of Glasgow
Yarin Gal University of Oxford
Gabe Gomes Carnegie Mellon University
Andres Guadamuz University of Sussex
Atoosa Kasirzadeh University of Edinburgh
Samuel Kaski University of Manchester; Aalto University
Hannah Kirk University of Oxford; The Alan Turing Institute
Gary Marcus New York University
Jessica Montgomery University of Cambridge
Denis Newman-Griffis University of Sheffield
Alison Noble FRS University of Oxford
Alistair Nolan OECD
Johan Ordish Medicines and Healthcare products Regulatory Agency (MHRA)
Michael Osborne University of Oxford; Mind Foundry
Matthias Rillig Freie Universität Berlin
Edward Tian GPTZero
Michael Wooldridge University of Oxford
Workshop on horizon scanning AI safety risks across scientific disciplines, October 2023
Ahead of the Global AI Safety Summit organised by the UK Government, the Royal Society hosted an official pre-Summit workshop in partnership with the Department for Science, Innovation and Technology. The event brought together senior scientists from academia and industry to horizon-scan the risks associated with AI across scientific disciplines.
Name Organisation
Alessandro Abate University of Oxford
Andrew Blake FRS Scientific advisor and AI consultant, University of Cambridge
Craig Butts University of Bristol
Lee Cronin University of Glasgow
Gwenetta Curry University of Edinburgh
Christl Donnelly FRS Imperial College London
Anthony Finkelstein City, University of London
Jacques Fleuriot University of Edinburgh
Ben Glocker Imperial College London
Julia Gog University of Cambridge
Cathy Holloway University College London
Caroline Jay University of Manchester
Alexander Kasprzyk University of Nottingham
Frank Kelly FRS Imperial College London
Georgia Keyworth Department for Science, Innovation and Technology
Bradley Love University College London
Carsten Maple University of Warwick
Alexandru Marcoci Centre for the Study of Existential Risk, University of Cambridge
Chris Martin Department for Science, Innovation and Technology
Cecilia Mascolo University of Cambridge
Emran Mian Department for Science, Innovation and Technology
Daniel Mortlock Imperial College London
Gina Neff University of Oxford
Cassidy Nelson Centre for Long Term Resilience
Alison Noble FRS University of Oxford
Alistair Nolan OECD
Abigail Sellen FRS Microsoft Research Cambridge
Karen Tingay Office for Statistics Regulation
Daniel Tor Department for Science, Innovation and Technology
Hujun Yin University of Manchester
Name Organisation
Steven Abel Durham University
Paul Beasley Siemens
Viscount Camrose House of Lords, DSIT
Sarah Chan University of Edinburgh
Linjiang Chen University of Birmingham
Peter Falkingham Liverpool John Moores University
Tom Fiddian Innovate UK
Michael Fisher University of Manchester
Seraphina Goldfarb-Tarrant Cohere
Sabine Hauert University of Bristol
Richard Horne British Antarctic Survey
Scott Hosking British Antarctic Survey; The Alan Turing Institute
Rohan Kemp Department for Science, Innovation and Technology
Ottoline Leyser UK Research and Innovation
Richard Mallah Future of Life Institute
Thomas Nowotny University of Sussex
Yannis Pandis Pfizer
Maria Perez-Ortiz University College London
Nathalie Pettorelli Zoological Society of London
Reza Razavi King’s College London
Yvonne Rogers FRS University College London
Sophie Rose Centre for Long Term Resilience
Stuart Russell UC Berkeley, Future of Life Institute
Rossi Setchi Cardiff University
Nigel Shadbolt FRS University of Oxford
Shaarad Sharma Government Office for Science
Mihaela van der Schaar University of Cambridge
Mark Wilkinson University of Sheffield
Study on red teaming LLMs for resilience to scientific disinformation, October 2023
Ahead of the Global AI Safety Summit organised by the UK Government, the Royal Society and Humane Intelligence brought together 40 postgraduate students in health and climate sciences to scrutinise how potential vulnerabilities in LLMs (Meta's Llama 2) could enable the generation and spread of scientific misinformation (see the Royal Society website for more information).
APPENDIX 3
Acknowledgements
Working Group members
The members of the Working Group involved in this report are listed below. Members acted in
an individual and not a representative capacity and declared any potential conflicts of interest.
Members contributed to the project based on their own expertise and good judgement.
Chair
Professor Alison Noble CBE FREng FRS, Foreign Secretary of the Royal Society,
and Technikos Professor of Biomedical Engineering, University of Oxford.
Members
Professor Paul Beasley, Head of Research and Development, Siemens.
Dr Peter Dayan FRS, Director, Max Planck Institute for Biological Cybernetics.
Professor Sabina Leonelli, Professor of Philosophy and History of Science, University of Exeter.
Alistair Nolan, Senior Policy Analyst, Organisation for Economic Co-operation and Development.
Dr Philip Quinlan, Director of Health Informatics, University of Nottingham.
Professor Abigail Sellen FRS, Distinguished Scientist and Lab Director, Microsoft Research.
Professor Rossi Setchi, Professor in High Value Manufacturing, Cardiff University.
Kelly Vere, Director of Technical Strategy, University of Nottingham.
Reviewers
This report has been reviewed by expert readers and by an independent Panel of experts, before
being approved by Officers of the Royal Society. The Review Panel members were not asked to
endorse the conclusions or recommendations of the report, but to act as independent referees of
its technical content and presentation. Panel members acted in a personal and not a representative
capacity. The Royal Society gratefully acknowledges the contribution of the reviewers.
Reviewers
Dr Yoshua Bengio FRS, Professor at University of Montreal and Scientific Director of MILA.
Ruhi Chitre, Ezra Clark, Tiffany Straza and Ana Persic (Natural Sciences Sector); and Irakli Khodeli (Social and Human Sciences Sector), UNESCO.
Dr Rumman Chowdhury, CEO and Founder of Humane Intelligence. 2024 US Science Envoy.
Professor Tony Hey, Honorary Senior Data Scientist at Rutherford Appleton Laboratory. Co-author of Artificial Intelligence For Science: A Deep Learning Revolution.