

COMMENT
The Leiden Manifesto for research metrics

Use these ten principles to guide research evaluation, urge Diana Hicks, Paul Wouters and colleagues.

Data are increasingly used to govern science. Research evaluations that were once bespoke and performed by peers are now routine and reliant on metrics¹. The problem is that evaluation is now led by the data rather than by judgement. Metrics have proliferated: usually well intentioned, not always well informed, often ill applied. We risk damaging the system with the very tools designed to improve it, as evaluation is increasingly implemented by organizations without knowledge of, or advice on, good practice and interpretation.

Before 2000, there was the Science Citation Index on CD-ROM from the Institute for Scientific Information (ISI), used by experts for specialist analyses. In 2002, Thomson Reuters launched an integrated web platform, making the Web of Science database widely accessible. Competing citation indices were created: Elsevier's Scopus (released in 2004) and Google Scholar (beta version released in 2004). Web-based tools to easily compare institutional research productivity and impact were introduced, such as InCites (using the Web of Science) and SciVal (using Scopus), as well as software to analyse individual citation profiles using Google Scholar (Publish or Perish, released in 2007).

In 2005, Jorge Hirsch, a physicist at the University of California, San Diego, proposed the h-index, popularizing citation counting for individual researchers. Interest in the journal impact factor grew steadily after 1995 (see 'Impact-factor obsession').
Lately, metrics related to social usage and online comment have gained momentum — F1000Prime was established in 2002, Mendeley in 2008, and Altmetric.com (supported by Macmillan Science and Education, which owns Nature Publishing Group) in 2011.

As scientometricians, social scientists and research administrators, we have watched with increasing alarm the pervasive misapplication of indicators to the evaluation of scientific performance. The following are just a few of numerous examples. Across the world, universities have become obsessed with their position in global rankings (such as the Shanghai Ranking and Times Higher Education's list), even when such lists are based on what are, in our view, inaccurate data and arbitrary indicators.

Some recruiters request h-index values for candidates. Several universities base promotion decisions on threshold h-index values and on the number of articles in 'high-impact' journals. Researchers' CVs have become opportunities to boast about these scores, notably in biomedicine. Everywhere, supervisors ask PhD students to publish in high-impact journals and acquire external funding before they are ready.

In Scandinavia and China, some universities allocate research funding or bonuses on the basis of a number: for example, by calculating individual impact scores to allocate 'performance resources' or by giving researchers a bonus for a publication in a journal with an impact factor higher than 15 (ref. 2).

In many cases, researchers and evaluators still exert balanced judgement. Yet the abuse of research metrics has become too widespread to ignore.

We therefore present the Leiden Manifesto, named after the conference at which it crystallized (see http://sti2014.cwts.nl). Its ten principles are not news to scientometricians, although none of us would be able to recite them in their entirety because codification has been lacking until now. Luminaries in the field, such as Eugene Garfield (founder of the ISI), are on record stating some of these principles³,⁴. But they are not in the room when evaluators report back to university administrators who are not expert in the relevant methodology. Scientists searching for literature with which to contest an evaluation find the material scattered in what are, to them, obscure journals to which they lack access.

We offer this distillation of best practice in metrics-based research assessment so that researchers can hold evaluators to account, and evaluators can hold their indicators to account.

TEN PRINCIPLES

1. Quantitative evaluation should support qualitative, expert assessment. Quantitative metrics can challenge bias tendencies in peer review and facilitate deliberation. This should strengthen peer review, because making judgements about colleagues is difficult without a range of relevant information. However, assessors must not be tempted to cede decision-making to the numbers. Indicators must not substitute for informed judgement. Everyone retains responsibility for their assessments.

2. Measure performance against the research missions of the institution, group or researcher. Programme goals should be stated at the start, and the indicators used to evaluate performance should relate clearly to those goals. The choice of indicators, and the ways in which they are used, should take into account the wider socio-economic and cultural contexts. Scientists have diverse research missions. Research that advances the frontiers of academic knowledge differs from research that is focused on delivering solutions to societal problems. Review may be based on merits relevant to policy, industry or the public rather than on academic ideas of excellence. No single evaluation model applies to all contexts.

3. Protect excellence in locally relevant research. In many parts of the world, research excellence is equated with English-language publication. Spanish law, for example, states the desirability of Spanish scholars publishing in high-impact journals. The impact factor is calculated for journals indexed in the US-based and still mostly English-language Web of Science. These biases are particularly problematic in the social sciences and humanities, in which research is more regionally and nationally engaged. Many other fields have a national or regional dimension — for instance, HIV epidemiology in sub-Saharan Africa.

This pluralism and societal relevance tends to be suppressed to create papers of interest to the gatekeepers of high impact: English-language journals. The Spanish sociologists that are highly cited in the Web of Science have worked on abstract models or study US data. Lost is the specificity of sociologists in high-impact Spanish-language papers: topics such as local labour law, family health care for the elderly or immigrant employment⁵. Metrics built on high-quality non-English literature would serve to identify and reward excellence in locally relevant research.

4. Keep data collection and analytical processes open, transparent and simple. The construction of the databases required for evaluation should follow clearly stated rules, set before the research has been completed. This was common practice among the academic and commercial groups that built bibliometric evaluation methodology over several decades. Those groups referenced protocols published in the peer-reviewed literature. This transparency enabled scrutiny. For example, in 2010, public debate on the technical properties of an important indicator used by one of our groups (the Centre for Science and Technology Studies at Leiden University in the Netherlands) led to a revision in the calculation of this indicator⁶. Recent commercial entrants should be held to the same standards; no one should accept a black-box evaluation machine.

Simplicity is a virtue in an indicator because it enhances transparency. But simplistic metrics can distort the record (see principle 7). Evaluators must strive for balance — simple indicators true to the complexity of the research process.

5. Allow those evaluated to verify data and analysis. To ensure data quality, all researchers included in bibliometric studies should be able to check that their outputs have been correctly identified. Everyone directing and managing evaluation processes should assure data accuracy, through self-verification or third-party audit. Universities could implement this in their research information systems and it should be a guiding principle in the selection of providers of these systems. Accurate, high-quality data take time and money to collate and process. Budget for it.

6. Account for variation by field in publication and citation practices. Best practice is to select a suite of possible indicators and allow fields to choose among them. A few years ago, a European group of historians received a relatively low rating in a national peer-review assessment because they wrote books rather than articles in journals indexed by the Web of Science. The historians had the misfortune to be part of a psychology department. Historians and social scientists require books and national-language literature to be included in their publication counts; computer scientists require conference papers be counted.

Citation rates vary by field: top-ranked journals in mathematics have impact factors of around 3; top-ranked journals in cell biology have impact factors of about 30. Normalized indicators are required, and the most robust normalization method is based on percentiles: each paper is weighted on the basis of the percentile to which it belongs in the citation distribution of its field (the top 1%, 10% or 20%, for example). A single highly cited publication slightly improves the position of a university in a ranking that is based on percentile indicators, but may propel the university from the middle to the top of a ranking built on citation averages⁷.
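To make the percentile idea concrete, the following is a minimal sketch, not part of the manifesto itself: it assigns a paper to a percentile class within the citation distribution of its own field. The field names, citation counts and thresholds are invented for illustration.

```python
# Hypothetical sketch: percentile-based field normalization of citation counts.
# A paper is scored by the percentile class it reaches within its own field's
# citation distribution, so fields with different citation cultures become
# comparable. All data and field names below are invented.

def percentile_class(citations, field_distribution, thresholds=(0.80, 0.90, 0.99)):
    """Return the citation percentile class of a paper within its field.

    field_distribution: citation counts of all papers in the same field (and year).
    thresholds: percentile cut-offs, here top 20%, top 10% and top 1%.
    """
    n = len(field_distribution)
    # Fraction of field papers with fewer citations than this paper.
    below = sum(1 for c in field_distribution if c < citations) / n
    label = "bottom 80%"
    if below >= thresholds[2]:
        label = "top 1%"
    elif below >= thresholds[1]:
        label = "top 10%"
    elif below >= thresholds[0]:
        label = "top 20%"
    return label

# Invented example: the same citation count means different things in a
# low-citation field (mathematics) and a high-citation field (cell biology).
maths_field = [0, 1, 1, 2, 3, 3, 4, 5, 8, 15]
cell_biology_field = [2, 5, 9, 12, 20, 25, 30, 40, 60, 120]

print(percentile_class(10, maths_field))         # top 10% in mathematics
print(percentile_class(10, cell_biology_field))  # below the top 20% in cell biology
```

Under this kind of scoring, one extra highly cited paper shifts a paper's class at most one step, which is why percentile-based rankings move more slowly than rankings built on citation averages.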
[Figure: Impact-factor obsession. Soaring interest in one crude measure — the average citation counts of items published in a journal in the past two years — illustrates the crisis in research evaluation. Panel 1, 'Articles mentioning "impact factor" in title', plots papers mentioning 'impact factor' in the title per 100,000 papers indexed in the Web of Science, 1984–2014, split into research articles and editorial material; annotations mark a special issue of the journal Scientometrics on impact factors and the DORA (San Francisco Declaration on Research Assessment) call for a halt on the equating of journal impact factor with research quality. Panel 2, 'Who is most obsessed?', shows papers published in 2005–14 mentioning 'impact factor' in the title, by discipline (multidisciplinary journals; medical and life sciences; social sciences; physical sciences), per 100,000 papers; bibliometric journals add a large number of research articles to the social sciences. Data source: Thomson Reuters Web of Science; analysis: D.H., L.W.]
7. Base assessment of individual researchers on a qualitative judgement of their portfolio. The older you are, the higher your h-index, even in the absence of new papers. The h-index varies by field: life scientists top out at 200; physicists at 100 and social scientists at 20–30 (ref. 8). It is database dependent: there are researchers in computer science who have an h-index of around 10 in the Web of Science but of 20–30 in Google Scholar⁹. Reading and judging a researcher's work is much more appropriate than relying on one number. Even when comparing large numbers of researchers, an approach that considers more information about an individual's expertise, experience, activities and influence is best.
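As a reminder of how little information the number condenses, here is a minimal sketch of the standard h-index definition proposed by Hirsch (the largest h such that h of a researcher's papers have at least h citations each). The citation counts are invented, and the contrast between the two lists only illustrates the database dependence described above.

```python
# Hypothetical sketch: Hirsch's h-index, the largest h such that the researcher
# has h papers with at least h citations each. Citation counts are invented.

def h_index(citation_counts):
    ranked = sorted(citation_counts, reverse=True)
    h = 0
    for rank, citations in enumerate(ranked, start=1):
        if citations >= rank:
            h = rank
        else:
            break
    return h

# The same person can look quite different depending on database coverage
# (for example, Web of Science versus Google Scholar).
web_of_science_counts = [25, 12, 9, 7, 5, 3, 1]                     # invented
google_scholar_counts = [60, 40, 33, 25, 22, 18, 15, 12, 9, 7, 5]   # invented

print(h_index(web_of_science_counts))   # 5
print(h_index(google_scholar_counts))   # 9
```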
8. Avoid misplaced concreteness and false precision. Science and technology indicators are prone to conceptual ambiguity and uncertainty and require strong assumptions that are not universally accepted. The meaning of citation counts, for example, has long been debated. Thus, best practice uses multiple indicators to provide a more robust and pluralistic picture. If uncertainty and error can be quantified, for instance using error bars, this information should accompany published indicator values. If this is not possible, indicator producers should at least avoid false precision. For example, the journal impact factor is published to three decimal places to avoid ties. However, given the conceptual ambiguity and random variability of citation counts, it makes no sense to distinguish between journals on the basis of very small impact factor differences. Avoid false precision: only one decimal is warranted.
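As one illustration of what quantified uncertainty for an indicator could look like, the sketch below (our illustration, not a method prescribed by the manifesto) computes an impact-factor-like mean citation rate, reports it to one decimal place, and attaches a bootstrap error bar. The citation data are invented.

```python
# Hypothetical sketch: report a two-year mean citation rate with an error bar
# instead of false three-decimal precision. All data are invented.
import random
import statistics

def mean_with_bootstrap_interval(citations, n_boot=2000, seed=42):
    """Mean citations per item plus an approximate 95% bootstrap interval."""
    rng = random.Random(seed)
    point = statistics.mean(citations)
    boot_means = sorted(
        statistics.mean(rng.choices(citations, k=len(citations)))
        for _ in range(n_boot)
    )
    lower = boot_means[int(0.025 * n_boot)]
    upper = boot_means[int(0.975 * n_boot)]
    return point, lower, upper

# Invented citation counts for items a journal published in the past two years.
journal_a = [0, 0, 1, 1, 2, 2, 3, 4, 6, 9, 12, 20]
journal_b = [0, 1, 1, 2, 2, 3, 3, 5, 7, 8, 14, 18]

for name, cites in [("Journal A", journal_a), ("Journal B", journal_b)]:
    mean, lo, hi = mean_with_bootstrap_interval(cites)
    # One decimal place is plenty; the intervals overlap heavily, so the two
    # journals should not be ranked on this difference.
    print(f"{name}: {mean:.1f} citations per item (95% CI {lo:.1f} to {hi:.1f})")
```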
9. Recognize the systemic effects of assessment and indicators. Indicators change the system through the incentives they establish. These effects should be anticipated. This means that a suite of indicators is always preferable — a single one will invite gaming and goal displacement (in which the measurement becomes the goal). For example, in the 1990s, Australia funded university research using a formula based largely on the number of papers published by an institute. Universities could calculate the 'value' of a paper in a refereed journal; in 2000, it was Aus$800 (around US$480 in 2000) in research funding. Predictably, the number of papers published by Australian researchers went up, but they were in less-cited journals, suggesting that article quality fell¹⁰.

10. Scrutinize indicators regularly and update them. Research missions and the goals of assessment shift and the research system itself co-evolves. Once-useful metrics become inadequate; new ones emerge. Indicator systems have to be reviewed and perhaps modified. Realizing the effects of its simplistic formula, Australia in 2010 introduced its more complex Excellence in Research for Australia initiative, which emphasizes quality.

NEXT STEPS

Abiding by these ten principles, research evaluation can play an important part in the development of science and its interactions with society. Research metrics can provide crucial information that would be difficult to gather or understand by means of individual expertise. But this quantitative information must not be allowed to morph from an instrument into the goal.

The best decisions are taken by combining robust statistics with sensitivity to the aim and nature of the research that is evaluated. Both quantitative and qualitative evidence are needed; each is objective in its own way. Decision-making about science must be based on high-quality processes that are informed by the highest quality data. ■

Diana Hicks is professor of public policy at the Georgia Institute of Technology, Atlanta, Georgia, USA. Paul Wouters is professor of scientometrics and director, Ludo Waltman is a researcher, and Sarah de Rijcke is assistant professor, at the Centre for Science and Technology Studies, Leiden University, the Netherlands. Ismael Rafols is a science-policy researcher at the Spanish National Research Council and the Polytechnic University of Valencia, Spain.
e-mail: [email protected]

1. Wouters, P. in Beyond Bibliometrics: Harnessing Multidimensional Indicators of Scholarly Impact (eds Cronin, B. & Sugimoto, C.) 47–66 (MIT Press, 2014).
2. Shao, J. & Shen, H. Learned Publ. 24, 95–97 (2011).
3. Seglen, P. O. Br. Med. J. 314, 498–502 (1997).
4. Garfield, E. J. Am. Med. Assoc. 295, 90–93 (2006).
5. López Piñeiro, C. & Hicks, D. Res. Eval. 24, 78–89 (2015).
6. van Raan, A. F. J., van Leeuwen, T. N., Visser, M. S., van Eck, N. J. & Waltman, L. J. Informetrics 4, 431–435 (2010).
7. Waltman, L. et al. J. Am. Soc. Inf. Sci. Technol. 63, 2419–2432 (2012).
8. Hirsch, J. E. Proc. Natl Acad. Sci. USA 102, 16569–16572 (2005).
9. Bar-Ilan, J. Scientometrics 74, 257–271 (2008).
10. Butler, L. Res. Policy 32, 143–155 (2003).
