AMSTAR-2 Checklist

17414857, 2022, 4, Downloaded from https://onlinelibrary.wiley.com/doi/10.1002/leap.1463 by Nat Prov Indonesia, Wiley Online Library on [11/03/2023].
See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License
ORIGINAL ARTICLE
(wileyonlinelibrary.com) doi: 10.1002/leap.1463 Received: 8 February 2022 | Accepted: 28 April 2022 | Published online in Wiley Online Library: 7 June 2022
The AMSTAR-2 critical appraisal tool and editorial decision-making for systematic reviews: Retrospective, bibliometric study
Stephen J. Chapman,1* Fahima Dossa,2 E. Joline de Groof,3 Celia Keane,4 Gabrielle H. van Ramshorst,5 and Neil J. Smart6

1 Leeds Institute of Medical Research at St. James’s, University of Leeds, Leeds, UK
2 Division of General Surgery, Department of Surgery, University of Toronto, Toronto, Ontario, Canada
3 Department of Surgery, Amsterdam UMC, University of Amsterdam (location AMC), Amsterdam, The Netherlands
4 Department of Surgery, Faculty of Medical and Health Sciences, University of Auckland, Auckland, New Zealand
5 Department of Gastrointestinal Surgery, Ghent University Hospital, Ghent, Belgium
6 Royal Devon & Exeter Hospital, Royal Devon & Exeter NHS Foundation Trust, Exeter, UK

ORCID:
S. J. Chapman: 0000-0003-2413-5690
F. Dossa: 0000-0002-4670-7445
E. J. de Groof: 0000-0002-7191-4964
C. Keane: 0000-0002-9611-2335
G. H. van Ramshorst: 0000-0002-5368-582X
N. J. Smart: 0000-0002-3043-8324

*Corresponding author: Stephen J. Chapman, NIHR Doctoral Research Fellow Room 7.16, Clinical Sciences Building, Leeds Institute of Medical Research at St. James’s, University of Leeds, LS9 7TF, Leeds, UK.
E-mail: [email protected]

Abstract: AMSTAR-2 is a critical appraisal instrument for systematic reviews and may have a role in editorial processes. This study explored whether associations exist between AMSTAR-2 assessments and editorial decisions. A retrospective, cross-sectional study of manuscripts submitted to a single journal between 2015 and 2017 was undertaken. All submissions that reported an eligible systematic review were assessed using AMSTAR-2 by two assessors. Inter-rater agreement (IRR) was calculated for all AMSTAR-2 items. Associations between AMSTAR-2 assessments and the editorial decision, final publication status in any journal, and measures of impact were explored. One hundred and twenty-two manuscripts were included. Across all AMSTAR-2 items, the IRR varied from 0.03 (slight agreement) to 0.82 (substantial agreement). All submissions contained at least two critical methodological weaknesses. There was no difference in the number of weaknesses (median: 4; IQR: 3–5 vs. median: 4; IQR: 3.5–4.5; p = 0.482) between accepted and rejected submissions. Neither was there a difference between rejected submissions published elsewhere and those which remained unpublished (median: 4; IQR: 3.5–4.5 vs. median: 4; IQR: 4.5–5; p = 0.103). The number of weaknesses was not associated with academic impact. There was no association with AMSTAR-2 assessments and editorial outcomes. Further work is required to explore whether the instrument can be prospectively operationalized for use during editorial processes.

Keywords: systematic review, meta-analysis, critical appraisal, peer-review
Learned Publishing 2022; 35: 529–538 www.learned-publishing.org © 2022 The Authors.
Learned Publishing published by John Wiley & Sons Ltd on behalf of ALPSP.
This is an open access article under the terms of the Creative Commons Attribution License,
which permits use, distribution and reproduction in any medium, provided the original work is properly cited.

Key points
• AMSTAR-2 is an instrument to assess methodological quality of systematic reviews and this suggests it would make a suitable tool for editorial decisions.
• All evaluated submissions to Colorectal Disease contained at least two critical methodological weaknesses.
• Using AMSTAR-2 revealed no difference in the number of critical or non-critical weaknesses between accepted or rejected articles.
• Editorial decisions appear to be using different criteria to those used by tools such as AMSTAR-2.
• Methodological weaknesses identified by AMSTAR-2 had no effect on citation or impact, implying that readers do not notice or care about such problems.
• The use of instruments to validate methodological reporting and validity should be encouraged within editorial offices.

INTRODUCTION

Systematic reviews are widely considered to provide the highest quality of evidence to inform clinical decision-making and health policy. When published in prominent journals, they frequently generate substantial impact through forward citation and adoption in clinical practice guidelines (Goldkuhle et al., 2018; Royle et al., 2013). As such, it is essential that systematic reviews are designed robustly according to a pre-defined protocol and reported fully according to accepted reporting guidelines. Failure to do so may lead to research waste, uncertainty, and misguided changes to evidence-based practice (Katsura et al., 2021).

In the last 20 years, much attention has focused on how the quality of systematic reviews can be evaluated and improved. In 2007, the AMSTAR (A MeaSurement Tool to Assess systematic Reviews) checklist was developed to provide an objective framework to assess the quality of systematic reviews of healthcare interventions (Shea et al., 2007). The scope of this checklist was limited to reviews of randomized controlled trials and was later updated to consider both randomized and non-randomized studies using the AMSTAR-2 checklist, published in 2017 (Shea et al., 2017). Previous studies utilizing AMSTAR and AMSTAR-2 checklists have identified a high frequency of methodological weaknesses in published systematic reviews. This has led to calls for greater use of appraisal tools by authors and editors during submission and editorial processes (Pieper et al., 2018).

Assessment tools such as AMSTAR and AMSTAR-2 have been used to describe the quality of published systematic reviews across numerous clinical specialties in the past. In contrast, there is little evidence to describe the feasibility and utility of these instruments within peer review processes to support editorial decision-making. Potential benefits include a standardized approach to selecting manuscripts for publication based on their methodological quality and a streamlined editorial pathway capable of faster dissemination of priority research. The aim of this study was to explore if associations exist between AMSTAR-2 assessments and editorial decision-making for manuscripts which report a systematic review of healthcare interventions. This may form the pre-text for an approach to operationalize AMSTAR-2 as a framework to guide editorial decisions.

METHOD

Ethics and governance
As a bibliometric study, research ethics approval was not required and neither was the study eligible for registration on the PROSPERO database of systematic reviews. No major changes to the research design were made during the course of the study. Minor and necessary amendments to the study methods are described in Data S1. The results are reported according to an adaptation of the PRISMA Statement designed specifically for meta-epidemiologic studies (Murade & Wang, 2017).

Aims and objectives
The aim was to explore whether associations exist between AMSTAR-2 assessments and editorial outcomes, which may form a pre-text to operationalize the instrument in editorial decision-making (5). The specific objectives were:
• For all submissions, to describe AMSTAR-2 assessments for manuscripts submitted to a single peer-reviewed journal and to quantify the inter-rater variability between assessors.
• For all submissions, to explore the association of AMSTAR-2 assessments with the final editorial decision given by the index journal.
• For rejected submissions, to explore the association of AMSTAR-2 assessments with the final publication status in any other peer-reviewed journal.
• For all published submissions (in Colorectal Disease and elsewhere after rejection), to quantify their subsequent impact and to explore the relationship of this to the AMSTAR-2 assessment

Study design
A retrospective, cross-sectional, bibliometric study was undertaken. Manuscript submissions reporting a systematic review of healthcare interventions submitted to Colorectal Disease (Online ISSN: 1463-1318) between January 2015 and December 2017 were appraised using the AMSTAR-2 instrument. This was done retrospectively after the completion of editorial processes by a dedicated review team. For manuscripts rejected from Colorectal Disease, serial systematic searches of MEDLINE were performed to determine their final publication status (last search undertaken on 26th April 2021) using relevant key words. For manuscripts which were ultimately published, measures of impact were gathered from openly available online resources, including Google Scholar and iCite (NIH Office of Portfolio Analysis).
Definitions
Colorectal Disease is a hybrid, exclusively online, monthly publication that publishes original research in any discipline relating to pathology of the colon and rectum. The range of Clarivate Analytics Impact factors between the inclusion period (2015–2017) was 2.620–2.846. A systematic review was defined as a study which collates all evidence meeting predefined eligibility criteria to answer a specific research question (Higgins et al., 2021). The AMSTAR-2 critical appraisal instrument is a 16-item checklist for systematic reviews of either randomized and/or non-randomized studies of healthcare interventions and is accompanied by a guidance document to assist assessors in making informed evaluations (Shea et al., 2017). It generates an assessment of confidence (high, moderate, low, or critically low) based on weaknesses identified across defined ‘critical’ and ‘non-critical’ domains. An outline of the domains and criteria for decision-making is shown in Table 1.

Eligibility criteria
All original manuscripts submitted to Colorectal Disease between January 2015 and December 2017 which reported a systematic review of a healthcare intervention were eligible for inclusion. These were identified from an internal search of editorial records using the ScholarOne Manuscript editorial workflow management system. Simple literature/scoping reviews, diagnostic accuracy reviews, reviews of reporting methods and studies reviewing epidemiological trends were excluded. No language exclusions were applicable since all submissions were received in English. All study populations were eligible, including adults, pregnant adults, and children and no exclusions were applied based on the clinical subject area.

TABLE 1 AMSTAR-2 Items.

| Item | Item type | Description | Responses |
|---|---|---|---|
| 1 | Non-critical | Did the research questions and inclusion criteria for the review include the components of PICO? | Yes / No |
| 2 | Critical | Did the report of the review contain an explicit statement that the review methods were established prior to the conduct of the review and did the report justify any significant deviations from the protocol? | Yes / Partial yes / No |
| 3 | Non-critical | Did the review authors explain their selection of the study designs for inclusion in the review? | Yes / No |
| 4 | Critical | Did the review authors use a comprehensive literature search strategy? | Yes / Partial yes / No |
| 5 | Non-critical | Did the review authors perform study selection in duplicate? | Yes / No |
| 6 | Non-critical | Did the review authors perform data extraction in duplicate? | Yes / No |
| 7 | Critical | Did the review authors provide a list of excluded studies and justify the exclusions? | Yes / Partial yes / No |
| 8 | Non-critical | Did the review authors describe the included studies in adequate detail? | Yes / Partial yes / No |
| 9 | Critical | Did the review authors use a satisfactory technique for assessing the risk of bias (RoB) in individual studies that were included in the review? | Yes / Partial yes / No |
| 10 | Non-critical | Did the review authors report on the sources of funding for the studies included in the review? | Yes / No |
| 11 | Critical | If meta-analysis was performed, did the review authors use appropriate methods for statistical combination of results? | Yes / No / No MA conducted |
| 12 | Non-critical | If meta-analysis was performed, did the review authors assess the potential impact of RoB in individual studies on the results of the meta-analysis or other evidence synthesis? | Yes / No / No MA conducted |
| 13 | Critical | Did the review authors account for RoB in individual studies when interpreting/discussing the results of the review? | Yes / No |
| 14 | Non-critical | Did the review authors provide a satisfactory explanation for, and discussion of, any heterogeneity observed in the results of the review? | Yes / No |
| 15 | Critical | If they performed quantitative synthesis, did the review authors carry out an adequate investigation of publication bias (small study bias) and discuss its likely impact on the results of the review? | Yes / No / No MA conducted |
| 16 | Non-critical | Did the review authors report any potential sources of conflict of interest, including any funding they received for conducting the review? | Yes / No |

AMSTAR-2 criteria for overall confidence

| Overall confidence | Criteria |
|---|---|
| High confidence | ≤1 non-critical weaknesses |
| Moderate confidence | ≥2 non-critical weaknesses, with no critical weaknesses |
| Low confidence | 1 critical weakness, with or without non-critical weaknesses |
| Critically low confidence | ≥2 critical weaknesses, with or without non-critical weaknesses |

Abbreviation: MA, meta-analysis.
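
The overall-confidence rules in Table 1 can be illustrated with a short Python sketch. It is not the authors' code: the function names, the response strings, and the choice to ignore 'No MA conducted' responses when counting weaknesses are assumptions made for illustration. Treating 'Partial yes' as satisfactory and averaging the two assessors' counts follow the conventions described in the Method.

```python
from statistics import mean

# Critical items as listed in Table 1; every other item (1-16) is non-critical.
CRITICAL_ITEMS = {2, 4, 7, 9, 11, 13, 15}
# 'Partial yes' is treated as satisfactory, as in this study.
SATISFACTORY = {"yes", "partial yes"}
# Assumption: items answered 'No MA conducted' are not counted as weaknesses.
NOT_APPLICABLE = {"no ma conducted"}


def count_weaknesses(responses):
    """Return (critical, non_critical) weakness counts for one assessor.

    `responses` maps an item number (1-16) to its response text.
    """
    critical = non_critical = 0
    for item, answer in responses.items():
        answer = answer.strip().lower()
        if answer in SATISFACTORY or answer in NOT_APPLICABLE:
            continue
        if item in CRITICAL_ITEMS:
            critical += 1
        else:
            non_critical += 1
    return critical, non_critical


def overall_confidence(critical, non_critical):
    """Map weakness counts to the overall confidence rating of Table 1."""
    if critical >= 2:
        return "critically low"
    if critical == 1:
        return "low"
    if non_critical >= 2:
        return "moderate"
    return "high"


def averaged_counts(assessor_a, assessor_b):
    """Average two assessors' weakness counts instead of seeking consensus."""
    a = count_weaknesses(assessor_a)
    b = count_weaknesses(assessor_b)
    return mean([a[0], b[0]]), mean([a[1], b[1]])


if __name__ == "__main__":
    # Hypothetical assessment: every item satisfied except items 1, 2 and 7.
    example = {item: "Yes" for item in range(1, 17)}
    example.update({1: "No", 2: "No", 7: "No"})
    print(overall_confidence(*count_weaknesses(example)))
```

Averaged counts can be fractional (for example 3.5 critical weaknesses), so the rating function is best applied to each assessor's integer counts rather than to the averages.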
FIGURE 1 Inclusion and exclusion of submissions.
TABLE 2 Characteristics of included studies.

| Variable | Category | Number | % |
|---|---|---|---|
| Country of authorship | China | 46/122 | 37.7% |
| | United Kingdom | 28/122 | 23.0% |
| | Italy | 9/122 | 7.4% |
| | Ireland | 6/122 | 4.9% |
| | Australia | 5/122 | 4.1% |
| | Canada | 4/122 | 3.3% |
| | France | 4/122 | 3.3% |
| | The Netherlands | 4/122 | 3.3% |
| | Others (a) | 16/122 | 13.1% |
| Review type | Systematic review with meta-analysis | 95/122 | 77.9% |
| | Systematic review without meta-analysis | 27/122 | 22.1% |
| Type of included evidence | Randomized studies only | 41/122 | 33.6% |
| | Non-randomized studies only | 40/122 | 32.8% |
| | Randomized and non-randomized studies | 41/122 | 33.6% |
| Subject topic | Malignancy | 46/122 | 37.7% |
| | Inflammatory bowel disease | 15/122 | 12.3% |
| | Other benign | 30/122 | 24.6% |
| | Perioperative medicine | 13/122 | 10.7% |
| | Abdominal wall | 4/122 | 3.3% |
| | Other | 14/122 | 11.5% |

(a) Others include Brazil (n = 1); Denmark (n = 1); Egypt (n = 2); Greece (n = 1); India (n = 2); Iran (n = 1); New Zealand (n = 2); Poland (n = 1); South Korea (n = 1); Spain (n = 1); Singapore (n = 1); United States (n = 2).

Manuscript assessments
The AMSTAR-2 checklist was used to assess all submissions meeting the eligibility criteria. Assessments were performed by two independent assessors (SC and JDG or FD and CK). All were blinded to peer review comments and the final editorial decision but unblinded to the identity and affiliations of manuscript authors. Between-assessor disagreements were not addressed through consensus, but rather the total number of ‘critical’ and ‘non-critical’ weaknesses identified by each independent reviewer were averaged. This was done to simulate a realistic approach to implementing the tool into an editorial decision-making process. All reviewers completed a period of training delivered by experienced editors (GR, NS). This involved familiarization with the checklist and explanatory documents, followed by an iterative piloting process using manuscripts which did not meet the main study eligibility criteria.

Outcomes
The main outcomes of interest were the number of ‘critical’ weaknesses and ‘non-critical’ weaknesses identified in each assessed manuscript (Table 1). Items of the checklist which were satisfied in full or partially (i.e., ‘Partial Yes’ responses according to the AMSTAR-2 instrument) were considered to be satisfactory. For published manuscripts (in Colorectal Disease or another peer-reviewed journal after the initial rejection), the level of academic impact was quantified. This was measured using several approaches, including the number of forward citations according to the iCite portal (representing impact from MEDLINE-indexed sources), number of citations according to Google Scholar (representing impact from any published and online source) and the relative citation ratio (RCR; a field normalized measure of impact).

Data collection
Characteristic data including year of publication, colorectal subject area (malignancy, inflammatory bowel disease, abdominal wall, other benign, peri-operative care), study design (systematic review; systematic review with meta-analysis), and type of included evidence (randomized, non-randomized, both) were collected by a single reviewer. Variables which were considered to be wholly or partly subjective (such as colorectal subject area) were checked by a second assessor, with disagreement addressed through discussion and consensus.

Statistical analysis
A simple descriptive analysis was performed, including a summary of rates, averages (medians with interquartile ranges, IQR), and proportions. Associations between AMSTAR-2 summary outcomes and editorial decisions were explored using Spearman’s Rho correlation coefficient. Inter-rater agreement (IRA) was explored using Cohen’s kappa, with the following parameters indicating the level of agreement: ≤0 no agreement; 0.01–0.20 slight agreement; 0.21–0.40 fair agreement; 0.41–0.60 moderate agreement; 0.61–0.80 substantial agreement; and 0.81–1.00 almost perfect agreement. Across all statistical tests, p < 0.05 was considered to indicate statistical significance.

FIGURE 2 Scatter plots of critical weaknesses versus measures of research impact. (A–C) Scatter plots for submissions accepted in Colorectal Disease after review, showing the correlation between number of critical weaknesses and (A) average annual citations according to iCite; (B) relative citation ratio (RCR); (C) average annual citations according to Google Scholar. (D–F) Scatter plots for submissions rejected by Colorectal Disease and accepted by other journals, showing the correlation between number of critical weaknesses and (D) average annual citations according to iCite; (E) relative citation ratio (RCR); (F) average annual citations according to Google Scholar.

RESULTS

Study characteristics
A total of 215 submissions were reviewed for eligibility and 122 met all criteria for inclusion. Common reasons for exclusion were submissions reporting a diagnostic accuracy review (n = 38), literature/scoping reviews (n = 27), and reviews of epidemiological trends (n = 20) (Fig. 1). Submissions were most often submitted by authors from China (n = 46; 37.7%) and the UK (n = 28; 23.0%). A total of 95 (77.9%) included a pooled meta-analysis. The most common subject areas were malignancy (n = 46; 37.7%), inflammatory bowel disease (n = 15; 12.3%) and other benign conditions (n = 30; 24.6%). A full outline of characteristics is provided in Table 2.
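
The Statistical analysis names Spearman's Rho as the measure of association between AMSTAR-2 weakness counts and other outcomes. As a minimal sketch of what that coefficient computes (not the authors' analysis code), it can be written as a Pearson correlation on ranks, with tied values sharing their average rank. The function names and the example numbers are hypothetical.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1  # mean of 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation applied to average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    # Hypothetical data: averaged critical weaknesses and citations per year.
    weaknesses = [4, 3.5, 5, 4.5, 3, 4]
    citations = [2.1, 6.0, 1.4, 3.3, 2.8, 9.5]
    print(round(spearman_rho(weaknesses, citations), 3))
```

Because averaged weakness counts take only a few distinct values, ties are common, which is why the rank step uses average ranks rather than the shortcut formula based on squared rank differences.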
TABLE 3 Sources of critical and non-critical weakness.

| Item | First assessor | Second assessor | Inter-rater agreement (a) |
|---|---|---|---|
| Critical items | | | |
| Item 2 | 103/122 (84.4%) | 103/122 (84.4%) | 0.63 (0.43–0.82) |
| Item 4 | 58/122 (47.5%) | 52/122 (42.6%) | 0.54 (0.39–0.69) |
| Item 7 | 120/122 (98.4%) | 117/122 (95.9%) | 0.27 (−0.17–0.71) |
| Item 9 | 35/122 (28.7%) | 39/122 (32.0%) | 0.65 (0.51–0.80) |
| Item 11 | 67/95 (70.5%) (b) | 44/95 (46.3%) (b) | 0.24 (0.07–0.41) |
| Item 13 | 83/122 (68.0%) | 83/122 (68.0%) | 0.25 (0.07–0.43) |
| Item 15 | 53/95 (55.8%) (b) | 51/95 (53.7%) (b) | 0.75 (0.61–0.88) |
| Non-critical items | | | |
| Item 1 | 14/122 (11.5%) | 6/122 (4.9%) | 0.03 (−0.15–0.22) |
| Item 3 | 108/122 (88.5%) | 105/122 (86.1%) | 0.23 (−0.01–0.46) |
| Item 5 | 55/122 (45.1%) | 52/122 (42.6%) | 0.82 (0.71–0.92) |
| Item 6 | 46/122 (37.7%) | 50/122 (41.0%) | 0.52 (0.37–0.67) |
| Item 8 | 33/122 (27.0%) | 29/122 (23.8%) | 0.64 (0.48–0.79) |
| Item 10 | 117/122 (95.9%) | 119/122 (97.5%) | 0.74 (0.40–1.00) |
| Item 12 | 73/95 (76.8%) (b) | 68/95 (71.6%) (b) | 0.43 (0.22–0.63) |
| Item 14 | 64/122 (52.5%) | 61/122 (50.0%) | 0.46 (0.30–0.62) |
| Item 16 | 10/122 (8.2%) | 25/122 (20.5%) | 0.39 (0.18–0.60) |

Note: ‘Partial Yes’ and ‘Yes’ responses for each item were considered to represent a satisfactory response (i.e. absence of the respective weakness), as per the Method.
(a) Cohen’s kappa with 95% confidence intervals.
(b) A total of 27 manuscripts were not considered in these analyses since no formal meta-analysis was performed.
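
For readers unfamiliar with the agreement statistic in Table 3, the following Python sketch shows how Cohen's kappa is computed for two assessors and how a value maps onto the agreement bands given in the Statistical analysis. It is an illustration only: the function names and the example ratings are hypothetical, and values between 0 and 0.01, which the published bands leave unassigned, are treated here as slight agreement.

```python
def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters' categorical ratings of the same manuscripts."""
    n = len(rater_a)
    if n == 0 or n != len(rater_b):
        raise ValueError("need two non-empty rating lists of equal length")
    labels = set(rater_a) | set(rater_b)
    # Observed agreement: proportion of manuscripts rated identically.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal proportions, summed.
    expected = sum(
        (rater_a.count(label) / n) * (rater_b.count(label) / n)
        for label in labels
    )
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


def agreement_band(kappa):
    """Label a kappa value using the bands stated in the Statistical analysis."""
    if kappa <= 0:
        return "no agreement"
    if kappa <= 0.20:
        return "slight agreement"
    if kappa <= 0.40:
        return "fair agreement"
    if kappa <= 0.60:
        return "moderate agreement"
    if kappa <= 0.80:
        return "substantial agreement"
    return "almost perfect agreement"


if __name__ == "__main__":
    # Hypothetical ratings for one item: True means the weakness was present.
    first = [True, True, False, True, False, False, True, True]
    second = [True, False, False, True, False, True, True, True]
    k = cohens_kappa(first, second)
    print(round(k, 2), agreement_band(k))
```

Kappa corrects raw agreement for chance, which helps explain how an item such as item 7, where both assessors recorded a weakness in over 95% of manuscripts, can still show only fair agreement: when almost all ratings fall in one category, chance agreement is already very high.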

Summary of AMSTAR-2 assessments
All 122 submissions demonstrated at least two critical weaknesses, indicating a ‘critically low’ level of confidence according to the AMSTAR-2 instrument (Table 1). Overall, the median number of critical weaknesses was 4 (IQR: 3.5–4.875). When reviews with and without meta-analysis were sub-analysed, the median number of critical weaknesses was similar (median: 3.5, IQR 4–5; and median: 4, IQR: 3–4.5, respectively) (p = 0.1556). The median number of non-critical weaknesses was 4.5 (IQR: 3.5–5), which was also similar when sub-analysed (median: 4.5; IQR: 3.5–5; and median: 4; IQR: 3.25–5, respectively) (p = 0.912). The most common sources of critical weakness were item 2 (statement of prospective methods) and item 7 (description of excluded studies), while common sources of non-critical weaknesses were item 3 (justification of study design) and item 10 (description of study funding) (Table 3).

AMSTAR-2 inter-rater agreement
The degree of IRA was variable across AMSTAR-2 items. Three out of seven critical domains showed substantial agreement between assessors (items 2, 9 and 15), with one showing moderate agreement (item 4) and the rest showing fair agreement. Across non-critical domains, one item showed near perfect agreement (item 5), two showed substantial agreement (items: 8 and 10), three showed moderate agreement (items: 6, 12 and 14), and the rest showed only fair or slight agreement. A full outline of IRA calculations is provided in Table 3. Despite achieving substantial agreement, the assessment of item 8 (description of included studies) was considered to be highly challenging by assessors. A post-hoc exercise was undertaken by assessors to review, qualitatively identify and reflect on the perceived challenges (Method reported fully in Data S1). Key reflections were: uncertainty about the level of detail required to satisfy the AMSTAR-2 criteria; disagreement on the definition of terms used by the guidance documents; and difficulty applying the assessment criteria to manuscripts due to incomplete or inconsistent reporting.

AMSTAR-2 and editorial decisions
Twenty-five out of 122 (20.5%) submissions were accepted for publication in Colorectal Disease. There was no difference in the number of critical weaknesses (median: 4; IQR: 3–5 vs. median: 4; IQR: 3.5–4.5; p = 0.482) identified between accepted and rejected submissions, respectively. Similarly, there was no difference in the number of non-critical weaknesses (median: 5; IQR: 3–5 vs. median: 4; IQR: 3.5–5; p = 0.787). Of the remaining 97 rejected submissions, 69 (71.1%) were identified as being published in other MEDLINE-indexed journals. There were no differences in the number of critical weaknesses (median: 4; IQR: 3.5–4.5 vs. median: 4.5; IQR: 4–5; p = 0.103) or non-critical weaknesses (median: 4; IQR: 3–5 vs. median: 4.5; IQR: 3.875–5.5; p = 0.165) between submissions published in other journals and those where a publication was not identified, respectively.

AMSTAR-2 and research impact
A total of 72 out of 94 (76.6%) submissions were published (including 25 in Colorectal Disease and 69 in other journals following the initial index rejection). The number of critical weaknesses identified in published submissions was not associated with subsequent academic impact. Weak correlations were identified between the number of critical weaknesses and the average number of citations per year via iCite (Colorectal Disease: R = 0.044; Other journals: R = 0.056), the RCR (Colorectal Disease: R = 0.101; Other journals: R = 0.063), and the number of citations per year via Google Scholar (Colorectal Disease: R = 0.112; Other journals: R = 0.104) (Fig. 2; Table 4). Similarly, the number of non-critical weaknesses was not associated with academic impact, with all measures demonstrating weak or absent correlation (Fig. 3; Table 4).

FIGURE 3 Scatter plots of non-critical weaknesses versus measures of research impact. (A–C) Scatter plots for submissions accepted in Colorectal Disease after review, showing the correlation between number of non-critical weaknesses and (A) average annual citations according to iCite; (B) relative citation ratio (RCR); (C) average annual citations according to Google Scholar. (D–F) Scatter plots for submissions rejected by Colorectal Disease and later accepted by other journals, showing the correlation between number of non-critical weaknesses and (D) average annual citations according to iCite; (E) relative citation ratio (RCR); (F) average annual citations according to Google Scholar.

TABLE 4 Relationships of AMSTAR-2 outputs and academic impact.

| Measure of impact | Colorectal Disease: R | Colorectal Disease: p | Other journals: R | Other journals: p |
|---|---|---|---|---|
| Number of critical weaknesses versus impact | | | | |
| Average citations per year (iCite) | 0.044 | 0.833 | 0.056 | 0.651 |
| RCR (iCite) | 0.101 | 0.631 | 0.063 | 0.606 |
| Average citations per year (Google Scholar) | 0.112 | 0.595 | 0.104 | 0.395 |
| Number of non-critical weaknesses versus impact | | | | |
| Average citations per year (iCite) | 0.208 | 0.317 | 0.036 | 0.768 |
| RCR (iCite) | 0.127 | 0.546 | 0.016 | 0.898 |
| Average citations per year (Google Scholar) | 0.225 | 0.280 | 0.003 | 0.980 |

Note: All statistics are calculated using Spearman’s Rho correlation coefficient.
Abbreviation: RCR, relative citation ratio.

DISCUSSION

This study explored whether associations exist between AMSTAR-2 assessments and editorial decision-making for manuscripts that report a systematic review of healthcare interventions. In this retrospective sample of manuscripts (performed after the completion of all editorial processes), all were found to contain multiple critical weaknesses, indicating a ‘critically low’ level of confidence according to AMSTAR-2. Of note, there was no difference in the number of critical weaknesses between manuscripts accepted during the index submission and those which were rejected, as well as no difference between manuscripts which were subsequently published in other journals and those which remained unpublished within the limits of this review. The number of critical weaknesses was not associated with the degree of subsequent academic impact, with some manuscripts of ‘critically low’ confidence achieving large volumes of citations.

The mechanisms and determinants of editorial decision-making continue to be the subject of scrutiny and debate. Some studies have interrogated this by exploring predictors of successful publication. A cross-sectional study of 1107 submitted manuscripts to three major medical journals (BMJ, the Lancet, Annals of Internal Medicine) in 2003 found that acceptance was more likely if studies demonstrated high methodological quality, if they had a randomized design, or if the corresponding author lived in the same country as the publishing journal, amongst other factors (Lee et al., 2006). In another study of 112 consecutive

meta-analyses submitted to JAMA between 1996 and 1997, a single factor (replicability of results) was associated with an increased rate of acceptance, whereas the overall methodological quality was not (Stroup et al., 2001). Some studies have explored the impact of peer review comments on the final decision to publish, with one report identifying comments on study design, originality, and the relationship between design, results, and conclusions to be the most impactful factor on final decision-making (Turcotte et al., 2004). The approach to understanding and facilitating peer review continues to be an evolving and complex domain of scholarly research with little clear empirical evidence to guide its use (Tennant & Ross-Hellauer, 2020).

The AMSTAR-2 critical appraisal tool published in 2017 provides a standardized approach to assess the methodological quality of systematic reviews that include a healthcare intervention. Several reports have explored its usability and validity when applied to broad samples of systematic reviews. In one report involving 60 systematic reviews of treatments for depression, AMSTAR-2 was found to have ‘moderate’ agreement between four assessors (overall kappa: 0.42; 95% CI: 0.25–0.59) (Lorenz et al., 2019). In another report of 30 reviews relevant to anaesthesiology, the agreement between four assessors was ‘fair’ (overall kappa: 0.3; 95% CI: 0.17 to 0.43), with the poorest agreement demonstrated for item 8 (description of included studies) (kappa: 0.09; 95% CI: 0.06 to 0.23) (Pieper et al., 2019). Of note, a recent report exploring the interpretation of AMSTAR-2 found that significant differences existed in the final assessment of confidence according to how the criteria were applied, leading to calls for greater transparency and clearer reporting (Pieper et al., 2021). While most studies have focused on the use of AMSTAR-2 from an author’s perspective, to our knowledge, there is little or no evidence describing its use as a tool within the editorial pathway.

In this study, while the IRA for some items was substantial (critical items: 2, 9 and 15; non-critical items: 8 and 10), others showed only moderate or fair agreement. The methodological quality of all submissions was considered to be ‘critically low’, which is in line with previous reports demonstrating a large majority of systematic review reports with critical methodological weaknesses (Siemens et al., 2021). Of note, the number of critical weaknesses identified in manuscripts that were accepted and rejected during their index submission was not significantly different. This may suggest that factors as well as methodological quality (such as topicality, community expectations and perceived likelihood of citations) may be co-determinants of the final editorial decision. It was not possible to explore these factors in the present study, but they are important targets for future investigation. The completeness of reporting according to accepted frameworks (such as PRISMA) is also important and closely related since critical appraisal relies on the availability of transparent information. The completeness of reporting was not explored in the present study, but this may be an important consideration for

on study assessments. This was essential to eliminate avoidable between-assessor variation. Another strength was the sampling of both accepted and rejected manuscripts submitted to a single journal, which enabled a cross-sectional assessment of submissions which closely reflected a typical editorial pathway. Limitations of the study are also recognized. First, the study setting involved submissions from a single sub-specialty journal, which is a challenge for wider generalisability. Reproduction of the study is encouraged to explore similarities and differences across different journals and editorial processes. Secondly, it is notable that many of the sampled manuscripts (submitted between 2015 and 2017) preceded the widespread publication of AMSTAR-2 in 2017. This was necessary to enable an assessment of forward publication and academic impact in the years which followed publication. In principle, this is not considered to be a major weakness, since the checklist reflects general constructs of methodological rigour. Furthermore, it may be considered a strength since the selection of this timeframe also pre-dates other major confounding changes in the publishing landscape, such as updates to the PRISMA checklist and revision of the Cochrane Handbook.

Future research may consider whether the instrument can be prospectively operationalized within the editorial process. This may involve a prospective study, with dedicated training provided to administrators, reviewers or editors. Alternatively, the feasibility of other approaches to guide editorial decision-making may be of interest. Such approaches may involve alternative instruments, such as the ROBIS tool which assesses the risk of bias in systematic reviews or a different approach altogether such as mandatory pre-registration and appraisal of protocols (Whiting et al., 2016). Either way, due to the role of systematic review research in guiding health policy, it is clear that more attention is required to refine peer review and editorial processes for systematic reviews. This is important to ensure that only the best evidence is selected for publication and subsequent adoption in practice.

AUTHOR CONTRIBUTIONS
Neil J. Smart and Gabrielle H. van Ramshorst conceptualized the study. Stephen J. Chapman, Fahima Dossa, E. Joline de Groof and Celia Keane performed searches, study assessments and data extraction. Stephen J. Chapman prepared the manuscript which was subsequently edited by all authors. Neil J. Smart is the study guarantor.

CONFLICT OF INTEREST
Neil J. Smart and Gabrielle H. van Ramshorst are Editor-in-Chief and Editor of Colorectal Disease, respectively. Stephen J. Chapman, Fahima Dossa, E. Joline de Groof and Celia Keane are early career members of the Colorectal Disease Editorial Advisory Board. The views expressed in this publication are those
future work. of the authors and not necessarily those of the NHS, the National
A major strength of the present study was the process of Institute for Health and Care Research, Health Education England
familiarization and piloting amongst assessors prior to embarking or the Department of Health.

TRANSPARENCY
The corresponding author (Stephen J. Chapman) attests that all authors had access to study data, takes responsibility for the accuracy of the analysis and had authority over manuscript preparation and the decision to submit the manuscript for publication.

DATA AVAILABILITY STATEMENT
Coded data will be available from the authors on reasonable request.

SUPPORTING INFORMATION
Additional supporting information may be found online in the Supporting Information section at the end of the article:
Appendix S1. Supporting information.

REFERENCES
Goldkuhle, M., Narayan, V. M., Weigl, A., Dahm, P., & Skoetz, N. (2018). A systematic assessment of Cochrane reviews and systematic reviews published in high-impact medical journals related to cancer. BMJ Open, 8, e020869. https://doi.org/10.1136/bmjopen-2017-020869
Higgins, J. P. T., Thomas, J., Chandler, J., Cumpston, M., Li, T., Page, M. J., & Welch, V. A. (Eds.). (2021). Cochrane handbook for systematic reviews of interventions version 6.2. Cochrane. Available from www.training.cochrane.org/handbook
Katsura, M., Kuriyama, A., Tada, M., Tsujimoto, Y., Luo, Y., Yamamoto, K., So, R., Aga, M., Matsushima, K., Fukuma, S., & Furukawa, T. A. (2021). High variability in results and methodological quality among overlapping systematic reviews on the same topics in surgery: A meta-epidemiological study. The British Journal of Surgery, 108, 1521–1529. https://doi.org/10.1093/bjs/znab328
Lee, K. P., Boyd, E. A., Holroyd-Leduc, J. M., Bacchetti, P., & Bero, L. A. (2006). Predictors of publication: Characteristics of submitted manuscripts associated with acceptance at major biomedical journals. The Medical Journal of Australia, 184, 621–626. https://doi.org/10.5694/j.1326-5377.2006.tb00418.x
Lorenz, R. C., Matthias, K., Pieper, D., Wegewitz, U., Morche, J., Nocon, M., Rissling, O., Schirm, J., & Jacobs, A. (2019). A psychometric study found AMSTAR 2 to be a valid and moderately reliable appraisal tool. Journal of Clinical Epidemiology, 114, 133–140. https://doi.org/10.1016/j.jclinepi.2019.05.028
Murade, M. H., & Wang, Z. (2017). Guidelines for reporting meta-epidemiological methodology research. Evidence-Based Medicine, 22, 139–142. https://doi.org/10.1136/ebmed-2017-110713
Pieper, D., Koensgen, N., Breuing, J., Ge, L., & Wegewitz, U. (2018). How is AMSTAR applied by authors - a call for better reporting. BMC Medical Research Methodology, 18, 56. https://doi.org/10.1186/s12874-018-0520-z
Pieper, D., Lorenz, R. C., Rombey, T., Jacobs, A., Rissling, O., Freitag, S., & Matthias, K. (2021). Authors should clearly report how they derived the overall rating when applying AMSTAR 2 - a cross-sectional study. Journal of Clinical Epidemiology, 129, 97–103. https://doi.org/10.1016/j.jclinepi.2020.09.046
Pieper, D., Puljak, L., González-Lorenzo, M., & Minozzi, S. (2019). Minor differences were found between AMSTAR 2 and ROBIS in the assessment of systematic reviews including both randomized and nonrandomized studies. Journal of Clinical Epidemiology, 108, 26–33. https://doi.org/10.1016/j.jclinepi.2018.12.004
Royle, P., Kandala, N.-B., Barnard, K., & Waugh, N. (2013). Bibliometrics of systematic reviews: Analysis of citation rates and journal impact factors. Systematic Reviews, 12, 74.
Shea, B. J., Grimshaw, J. M., Wells, G. A., Boers, M., Andersson, N., Hamel, C., Porter, A. C., Tugwell, P., Moher, D., & Bouter, L. M. (2007). Development of AMSTAR: A measurement tool to assess the methodological quality of systematic reviews. BMC Medical Research Methodology, 15, 10.
Shea, B. J., Reeves, B. C., Wells, G., Thuku, M., Hamel, C., Moran, J., Moher, D., Tugwell, P., Welch, V., Kristjansson, E., & Henry, D. A. (2017). AMSTAR 2: A critical appraisal tool for systematic reviews that include randomised or non-randomised studies of healthcare interventions, or both. BMJ, 358, j4008.
Siemens, W., Schwarzer, G., Rohe, M. S., Buroh, S., Meerpohl, J. J., & Becker, G. (2021). Methodological quality was critically low in 9/10 systematic reviews in advanced cancer patients - a methodological study. Journal of Clinical Epidemiology, 136, 84–95. https://doi.org/10.1016/j.jclinepi.2021.03.010
Stroup, D. F., Thacker, S. B., Olsen, C. M., et al. (2001). Characteristics of meta-analyses related to acceptance for publication in a medical journal. Journal of Clinical Epidemiology, 54, 655–660. https://doi.org/10.1016/S0895-4356(00)00362-0
Tennant, J. P., & Ross-Hellauer, T. (2020). The limitations to our understanding of peer review. Research Integrity and Peer Review, 5, 6. https://doi.org/10.1186/s41073-020-00092-1
Turcotte, C., Drolet, P., & Girard, M. (2004). Study design, originality and overall consistency influence acceptance or rejection of manuscripts submitted to the journal. Canadian Journal of Anaesthesia, 51, 549–556. https://doi.org/10.1007/BF03018396
Whiting, P., Savovic, J., Higgins, J. P. T., Caldwell, D. M., Reeves, B. C., Shea, B., Davies, P., Kleijnen, J., Churchill, R., & ROBIS group. (2016). ROBIS: A new tool to assess risk of bias in systematic reviews was developed. Journal of Clinical Epidemiology, 69, 225–234. https://doi.org/10.1016/j.jclinepi.2015.06.005



Keywords: systematic review, meta-analysis, critical appraisal, peer-review


www.learned-publishing.org © 2022 The Authors. Learned Publishing 2022; 35: 529–538


TABLE 1 AMSTAR-2 Items.

AMSTAR-2 criteria for overall confidence
High confidence             ≤1 non-critical weakness
Moderate confidence         ≥2 non-critical weaknesses, with no critical weaknesses
Low confidence              1 critical weakness, with or without non-critical weaknesses
Critically low confidence   ≥2 critical weaknesses, with or without non-critical weaknesses

Abbreviation: MA, meta-analysis.
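
The criteria for overall confidence in Table 1 amount to a small decision rule. The following is a minimal sketch in Python, assuming the numbers of critical and non-critical weaknesses have already been tallied for a review. The function name `overall_confidence` and its parameter names are illustrative only and are not part of AMSTAR-2 or of this study.

```python
def overall_confidence(critical: int, non_critical: int) -> str:
    """Map weakness counts to the overall confidence rating listed in Table 1.

    `critical` and `non_critical` are hypothetical parameter names for the
    number of critical and non-critical weaknesses found in one review.
    """
    if critical < 0 or non_critical < 0:
        raise ValueError("weakness counts must be non-negative")
    if critical >= 2:
        # >=2 critical weaknesses, with or without non-critical weaknesses
        return "critically low"
    if critical == 1:
        # exactly 1 critical weakness, with or without non-critical weaknesses
        return "low"
    if non_critical >= 2:
        # no critical weaknesses, >=2 non-critical weaknesses
        return "moderate"
    # no critical weaknesses and at most 1 non-critical weakness
    return "high"
```

Critical weaknesses are checked first because, under these criteria, a single critical weakness caps the rating at low regardless of how few non-critical weaknesses are present.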


FIGURE 1 Inclusion and exclusion

TABLE 2 Characteristics of included studies.

Country of authorship       China                                      46/122   37.7%
                            United Kingdom                             28/122   23.0%
                            Italy                                       9/122    7.4%
                            Ireland                                     6/122    4.9%
                            Australia                                   5/122    4.1%
                            Canada                                      4/122    3.3%
                            France                                      4/122    3.3%
                            The Netherlands                             4/122    3.3%
Review type                 Systematic review with meta-analysis       95/122   77.9%
                            Systematic review without meta-analysis    27/122   22.1%
Type of included evidence   Randomized studies only                    41/122   33.6%
                            Non-randomized studies only                40/122   32.8%
                            Randomized and non-randomized studies      41/122   33.6%
Subject topic               Malignancy                                 46/122   37.7%
                            Inflammatory bowel disease                 15/122   12.3%
                            Other benign                               30/122   24.6%
                            Perioperative medicine                     13/122   10.7%
                            Abdominal wall                              4/122    3.3%
                            Other                                      14/122   11.5%




TABLE 3 Sources of critical and non-critical weakness.

Item 2    103/122 (84.4%)   103/122 (84.4%)   0.63 (0.43 to 0.82)
Item 4     58/122 (47.5%)    52/122 (42.6%)   0.54 (0.39 to 0.69)
Item 7    120/122 (98.4%)   117/122 (95.9%)   0.27 (−0.17 to 0.71)
Item 9     35/122 (28.7%)    39/122 (32.0%)   0.65 (0.51 to 0.80)
Item 13    83/122 (68.0%)    83/122 (68.0%)   0.25 (0.07 to 0.43)
Item 1     14/122 (11.5%)     6/122 (4.9%)    0.03 (−0.15 to 0.22)
Item 3    108/122 (88.5%)   105/122 (86.1%)   0.23 (−0.01 to 0.46)
Item 5     55/122 (45.1%)    52/122 (42.6%)   0.82 (0.71 to 0.92)
Item 6     46/122 (37.7%)    50/122 (41.0%)   0.52 (0.37 to 0.67)
Item 8     33/122 (27.0%)    29/122 (23.8%)   0.64 (0.48 to 0.79)
Item 10   117/122 (95.9%)   119/122 (97.5%)   0.74 (0.40 to 1.00)
Item 14    64/122 (52.5%)    61/122 (50.0%)   0.46 (0.30 to 0.62)
Item 16    10/122 (8.2%)     25/122 (20.5%)   0.39 (0.18 to 0.60)
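
For orientation, an agreement statistic of the kind shown in the final column of Table 3 can be computed for two assessors as sketched below. This is a sketch under stated assumptions, not the authors' analysis: it assumes Cohen's kappa for two raters and the Landis and Koch descriptive bands, which are consistent with the ‘fair’ and ‘moderate’ labels quoted in the Discussion. This section does not state which statistic or software was used. The function names `cohen_kappa` and `landis_koch_label` are hypothetical.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who each rated the same items once."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("both raters must rate the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: proportion of items given the same rating.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: sum over categories of the product of marginal proportions.
    freq_a = Counter(rater_a)
    freq_b = Counter(rater_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1 - expected)


def landis_koch_label(kappa):
    """Descriptive band for a kappa value (Landis and Koch convention, assumed)."""
    if kappa < 0:
        return "poor"
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"
```

As a worked example with hypothetical ratings `[1, 1, 0, 0]` and `[1, 0, 0, 0]`, observed agreement is 0.75 and chance agreement is 0.5, so kappa is 0.5, which falls in the ‘moderate’ band. Confidence intervals such as those in Table 3 would need an additional standard-error or bootstrap step that this sketch omits.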


TABLE 4 Relationships of AMSTAR-2 outputs and academic impact.

Number of critical weaknesses versus impact

Average citations per year (iCite) R = 0.044 p = 0.833 R= 0.056 p = 0.651

RCR (iCite) R = 0.101 p = 0.631 R= 0.063 p = 0.606

Number of non-critical weaknesses versus impact