Quality Assessment Tool - Review Articles: Instructions For Completion
CRITERIA YES NO
Q1. Did the authors have a clearly focused question [population, intervention (strategy), and outcome(s)]?
Q3. Did the authors describe a search strategy that was comprehensive?
Circle all strategies used:
health databases            handsearching
psychological databases     key informants
social science databases    reference lists
educational databases       unpublished
other
Q5. Did the authors describe the level of evidence in the primary studies included in the review?
Q6. Did the review assess the methodological quality of the primary studies, including:
(Minimum requirement: 4/7 of the following)
Research design
Study sample
Participation rates
Sources of bias (confounders, respondent bias)
Data collection (measurement of independent/dependent variables)
Follow-up/attrition rates
Data analysis
Q9. Were appropriate methods used for combining or comparing results across studies?
TOTAL SCORE:
Quality Assessment Rating:
Strong (total score 8 – 10)
Moderate (total score 5 – 7)
Weak (total score 4 or less)
Quality Assessment Tool Dictionary
Any part of PICO that is not addressed in a review’s main research question should be clearly stated
in the inclusion criteria to receive a Yes for criterion #1. Outcomes can be general in the research
question (e.g. to allow for a broader search strategy, especially if the topic at-hand has a limited body
of literature available), and then be addressed more specifically in the evidence tables and/or
highlighted through the process of data extraction. For example, a general question may read: “The
aim of this study, therefore, was to systematically review evidence from controlled trials on the
efficacy of motor development interventions in young children.”
If authors mention in their exclusion criteria that they rejected reviews, letters, editorials and case
reports, but do not specifically address what they chose to include, mark a No for this criterion.
For reviews measuring specifically health-related outcomes (e.g. vaccine effectiveness), searching
only ONE type of database is acceptable, provided that at least 2 health databases are searched.
(NOTE: The two do not have to include MEDLINE.)
NOTE: If the author(s) describe the manual searching of reference lists, score as ‘Reference Lists’,
NOT as both ‘Handsearching’ and ‘Reference Lists’.
For reviews of reviews, select the level of evidence based on the types of included studies that
appeared in the systematic reviews/meta-analyses included in the review of reviews.
A minimum of four of the following areas should be assessed and the results described (in narrative
or table form for each included study) for quantitative studies:
For Cochrane Reviews authors are required to conduct a standardized ‘Risk of Bias’ assessment
(see http://www.cochrane-handbook.org/ Figure 8.6a). Their results are typically included in the
Characteristics of Included Studies table. These characteristics translate to the Health Evidence QA
tool as follows:
The JADAD and EPOC tools are well-reputed and typically code Yes; however, review authors must
still report the results of each criterion for each study. Systematic reviews from the Cochrane
Library often employ criteria from the Cochrane Reviewers’ Handbook; however, it is important to
check which areas were assessed, as 4 out of the 7 are not always considered.
When review authors assess whether or not a primary study used a “validated measure(s)”, this
counts toward a point for Data Collection.
A funnel plot can count as a point for Sources of Bias, as long as it appears in the
body of the paper and is part of a larger QA.
In some instances, different quality assessment criteria may be used for different study designs
included in the same review. For example, the EPOC tool has different criteria for interrupted time
series studies, compared to randomized controlled trials. In this case, as long as the majority of
reviews are assessed with 4+ criteria then Yes is appropriate.
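The 4-out-of-7 minimum for the methodological quality criterion can be sketched as a simple count. This is only an illustration: the function name and the lower-cased area labels are assumptions made for the example, not part of the tool.

```python
# Illustrative sketch only; names are hypothetical, areas are the seven listed in the tool.
QA_AREAS = (
    "research design",
    "study sample",
    "participation rates",
    "sources of bias",
    "data collection",
    "follow-up/attrition rates",
    "data analysis",
)


def meets_quality_minimum(assessed_areas, minimum=4):
    """Return True if at least `minimum` of the seven areas were assessed."""
    # Normalise labels and ignore anything that is not one of the seven areas.
    recognised = {area.strip().lower() for area in assessed_areas} & set(QA_AREAS)
    return len(recognised) >= minimum
```

For instance, a review that reports on research design, study sample, sources of bias and data analysis for each included study would pass this check, while one covering only three areas would not.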
NOTE: Reviews synthesizing qualitative primary studies address questions on aspects other than
effectiveness, and as such do not meet our relevance criteria. Reviews synthesizing both quantitative
and qualitative studies may be relevant to Health Evidence™ if they include outcome data and
evaluate the effectiveness of an intervention / program / service / policy.
Q8 | Did review authors assess appropriateness of combining study results (i.e., test of
homogeneity, or assess similarity of results in some other way)?
It is important that primary study results be assessed for similarity prior to combining them
(statistically and/or non-statistically).
On occasion, an author may indicate the presence of significant heterogeneity and still combine data
using a Fixed Effects Model. This IS appropriate if analyses have been conducted with both the
inclusion and exclusion of data sets that may notably skew results. The results of these separate
analyses, however, MUST be reviewed for the reader’s consideration. This process, often called
‘sensitivity analysis’, assesses the moderators that may have contributed to the heterogeneity.
If a systematic review or a narrative review is conducted for which statistical analysis is not
appropriate, the results of each study should be depicted in graph/table format in order to assess
similarity across the primary studies. Often the results will be in the form of a table, but in the case
of a narrative review the results of each study will be described at length within the body of the
review.
In some cases confidence intervals/effect sizes are NOT required. For a review of reviews, a
narrative presentation is appropriate (e.g. “the intervention had a positive effect on 20% of
participants”), ideally with a table listing main features of each of the systematic reviews under
review, or thorough, CONSISTENT discussion of the main features in the body of the review. If the
review of reviews doesn't consistently present the actual numerical results (e.g. effect sizes from the
original reviews) in the text, then it should score a No.
In general, trust the review author(s)’ judgment of what is significant heterogeneity. A declaration of
the specific number that was calculated (e.g. Chi-square score) is not mandatory.
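As background for what a test of homogeneity reports, the following is a minimal sketch of Cochran's Q (a chi-square statistic) and the derived I² percentage. It assumes each primary study supplies an effect estimate and its variance; the function and parameter names are illustrative, and review authors may well use other methods.

```python
def cochran_q(effects, variances):
    """Return (Q, degrees of freedom, I-squared in percent) for a set of studies.

    effects   -- per-study effect estimates (e.g. mean differences)
    variances -- per-study variances of those estimates (all > 0)
    """
    # Inverse-variance weights: more precise studies count for more.
    weights = [1.0 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    # Q is the weighted sum of squared deviations from the pooled estimate.
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    # I-squared: share of total variation attributed to heterogeneity.
    i_squared = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return q, df, i_squared
```

Q is compared against a chi-square distribution with `df` degrees of freedom; a Q much larger than `df` suggests that the studies differ by more than chance alone would explain.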
Q9 | Weighting
Whether a meta-analysis or a systematic/narrative review, the overall measure of effect should be
determined by assigning those studies of highest methodological quality greater weight.
In a meta-analysis, weighting is typically based on a variety of factors, including sample size and
variation in the outcome data. This is usually demonstrated by the size of the boxes in the forest plot.
If review authors have named a specific statistical software package (e.g. RevMan) they have used
to combine data, this is sufficient for weighting, as the vast majority of this software incorporates the
weighting of studies by the number of participants. Review authors may describe using the
DerSimonian and Laird approach to random-effects meta-analysis which also incorporates
weighting. Higgins and Green (2009) explain that:
"The random-effects method (DerSimonian 1986) incorporates an assumption that the different studies
are estimating different, yet related, intervention effects [...] The method is based on the inverse-variance
approach, making an adjustment to the study weights according to the extent of variation, or
heterogeneity, among the varying intervention effects."
Cochrane Handbook for Systematic Reviews of Interventions Version 5.1.0.,
The Cochrane Collaboration, 2011. Available from http://www.cochrane-handbook.org
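To make the adjustment described in the quotation concrete, here is a minimal sketch of the commonly described DerSimonian and Laird calculation. It is not taken from RevMan or any other package; the function name and inputs (per-study effect estimates and variances) are assumptions made for illustration.

```python
def dersimonian_laird(effects, variances):
    """Return (pooled effect, tau-squared, adjusted weights) for a random-effects model."""
    k = len(effects)
    # Step 1: ordinary inverse-variance (fixed-effect) weights and pooled estimate.
    w = [1.0 / v for v in variances]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
    # Step 2: Cochran's Q measures variation around the fixed-effect estimate.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    # Step 3: method-of-moments estimate of between-study variance (tau-squared).
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    # Step 4: adjust each weight by adding tau-squared to the study's variance.
    w_star = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    return pooled, tau2, w_star
```

When the studies agree closely, tau-squared is zero and the weights reduce to the plain inverse-variance weights; as heterogeneity grows, the weights become more equal across studies.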
In a narrative synthesis, quality of EACH of the included studies must be discussed consistently
throughout the conclusions/discussion section to receive a Yes for this criterion. If the authors show
a GRADE assessment table, this qualifies as weighting for narrative syntheses.
If the authors set a threshold for the quality of reviews to be included in their synthesis (e.g., only
synthesizing strong & moderate quality studies), this is considered weighting and we rate Yes for
this criterion.
In a mixed-methods review which contains both a meta-analysis and a narrative synthesis, both
should incorporate a discussion of quality into the analysis.
In some cases review authors disclose the QA scores of primary studies - in table format, for example
- and discuss those scores, but do not actually ‘weigh’ them; essentially, allowing the readers to
determine which ones have the most weight. This is NOT sufficient to score a Yes for this criterion,
as the review authors should be doing all summative work.
Reviews with a score of 8 or higher in the Yes column will be rated as Strong.
Reviews with a score of 5 to 7 in the Yes column will be rated as Moderate.
Reviews with a score of 4 or less in the Yes column will be rated as Weak.
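The three cut-offs above amount to a simple lookup, sketched below. The function name is hypothetical, and the 0 to 10 range is assumed from the "8 – 10" band on the form.

```python
def quality_rating(total_score):
    """Map the number of Yes answers (assumed range 0 to 10) to a quality rating."""
    if not 0 <= total_score <= 10:
        raise ValueError("total score must be between 0 and 10")
    if total_score >= 8:
        return "Strong"
    if total_score >= 5:
        return "Moderate"
    return "Weak"
```

For example, a review with 7 Yes answers would be rated Moderate and one with 4 would be rated Weak.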
In the case that a score does not necessarily reflect your impression of the actual quality of a review
(i.e., Strong/Moderate/Weak), consider revisiting some of the criteria and Yes and/or No scores, or
discuss with a second reviewer, so that the corresponding quality category is a reflection of the
review’s overall methods and the score will be an accurate reflection for use by public health
decision-makers.