nrn3475 p089
may simply report that only nine patients were studied. A manipulation affecting only three observations could change the odds ratio from 1.00 to 1.50 in a small study but might only change it from 1.00 to 1.01 in a very large study. When investigators select the most favourable, interesting, significant or promising results among a wide spectrum of estimates of effect magnitudes, this is inevitably a biased choice.
Publication bias and selective reporting of outcomes and analyses are also more likely to affect smaller, underpowered studies17. Indeed, investigations into publication bias often examine whether small studies yield different results than larger ones18. Smaller studies more readily disappear into a file drawer than very large studies that are widely known and visible, and the results of which are eagerly anticipated (although this correlation is far from perfect). A ‘negative’ result in a high-powered study cannot be explained away as being due to low power19,20, and thus reviewers and editors may be more willing to publish it, whereas they more easily reject a small ‘negative’ study as being inconclusive or uninformative21. The protocols of large studies are also more likely to have been registered or otherwise made publicly available, so that deviations in the analysis plans and choice of outcomes may become obvious more easily. Small studies, conversely, are often subject to a higher level of exploration of their results and selective reporting thereof.
Third, smaller studies may have a worse design quality than larger studies. Several small studies may be opportunistic experiments, or the data collection and analysis may have been conducted with little planning. Conversely, large studies often require more funding and personnel resources. As a consequence, designs are examined more carefully before data collection, and analysis and reporting may be more structured. This relationship is not absolute: small studies are not always of low quality. Indeed, a bias in favour of small studies may occur if the small studies are meticulously designed and collect high-quality data (and therefore are forced to be small) and if large studies ignore or drop quality checks in an effort to include as large a sample as possible.
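To make the earlier point about three observations concrete, the following minimal sketch recomputes an odds ratio after three non-events in one group are recoded as events. The cell counts (15 per cell for the small study, 600 per cell for the large one) and the function names are hypothetical choices for illustration; they are not taken from any study discussed here.

```python
def odds_ratio(events_a, non_events_a, events_b, non_events_b):
    """Odds ratio for group A versus group B from a 2x2 table."""
    return (events_a * non_events_b) / (non_events_a * events_b)


def shift_three(cell):
    """Start from a balanced table (odds ratio 1.00) with `cell` observations
    in every cell, then recode three non-events in group A as events."""
    before = odds_ratio(cell, cell, cell, cell)
    after = odds_ratio(cell + 3, cell - 3, cell, cell)
    return before, after


# Hypothetical small study: 30 participants per group, 15 events each.
small_before, small_after = shift_three(15)

# Hypothetical large study: 1,200 participants per group, 600 events each.
large_before, large_after = shift_three(600)

print(f"small study: {small_before:.2f} -> {small_after:.2f}")
print(f"large study: {large_before:.2f} -> {large_after:.2f}")
```

Under these assumed counts, the same three recoded observations move the small-study estimate substantially while leaving the large-study estimate almost unchanged.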
[Figure 2: flow diagram]
Records identified through database search (n = 246); additional records identified through other sources (n = 0)
Records after duplicates removed (n = 246)
Abstracts screened (n = 246); excluded (n = 73)
Full-text articles screened (n = 173); excluded (n = 82)
Full-text articles assessed for eligibility (n = 91); excluded (n = 43)
Articles included in analysis (n = 48)
Figure 2 | Flow diagram of articles selected for inclusion. Computerized databases were searched on 2 February 2012 via Web of Science for papers published in 2011, using the key words ‘neuroscience’ and ‘meta-analysis’. Two authors (K.S.B. and M.R.M.) independently screened all of the papers that were identified for suitability (n = 246). Articles were excluded if no abstract was electronically available (for example, conference proceedings and commentaries) or if both authors agreed, on the basis of the abstract, that a meta-analysis had not been conducted. Full texts were obtained for the remaining articles (n = 173) and again independently assessed for eligibility by K.S.B. and M.R.M. Articles were excluded (n = 82) if both authors agreed, on the basis of the full text, that a meta-analysis had not been conducted. The remaining articles (n = 91) were assessed in detail by K.S.B. and M.R.M. or C.M. Articles were excluded at this stage if they could not provide the following data for extraction for at least one meta-analysis: first author and summary effect size estimate of the meta-analysis; and first author, publication year, sample size (by groups) and number of events in the control group (for odds/risk ratios) of the contributing studies. Data extraction was performed independently by K.S.B. and M.R.M. or C.M. and verified collaboratively. In total, n = 48 articles were included in the analysis.
Empirical evidence from neuroscience
Any attempt to establish the average statistical power in neuroscience is hampered by the problem that the true effect sizes are not known. One solution to this problem is to use data from meta-analyses. Meta-analysis provides the best estimate of the true effect size, albeit with limitations, including the limitation that the individual studies that contribute to a meta-analysis are themselves subject to the problems described above. If anything, summary effects from meta-analyses, including power estimates calculated from meta-analysis results, may also be modestly inflated22.
Acknowledging this caveat, in order to estimate statistical power in neuroscience, we examined neuroscience meta-analyses published in 2011 that were retrieved using ‘neuroscience’ and ‘meta-analysis’ as search terms. Using the reported summary effects of the meta-analyses as the estimate of the true effects, we calculated the power of each individual study to detect the effect indicated by the corresponding meta-analysis.
Methods. Included in our analysis were articles published in 2011 that described at least one meta-analysis of previously published studies in neuroscience with a summary effect estimate (mean difference or odds/risk ratio) as well as study level data on group sample size and, for odds/risk ratios, the number of events in the control group.
We searched computerized databases on 2 February 2012 via Web of Science for articles published in 2011, using the key words ‘neuroscience’ and ‘meta-analysis’. All of the articles that were identified via this electronic search were screened independently for suitability by two authors (K.S.B. and M.R.M.). Articles were excluded if no abstract was electronically available (for example, conference proceedings and commentaries) or if both authors agreed, on the basis of the abstract, that a meta-analysis had not been conducted. Full texts were obtained for the remaining articles and again independently assessed for eligibility by two authors (K.S.B. and M.R.M.) (FIG. 2).
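The power calculation outlined in this section, in which each study's power is computed against the summary effect of its meta-analysis, can be sketched as follows. This is a minimal illustration under stated assumptions and not the authors' procedure: it uses a normal approximation for a two-group comparison of means rather than dedicated power software, it assumes a two-sided α of 0.05, and the summary effect and group sizes are hypothetical values. Studies summarized by odds or risk ratios would need a different formula that also uses the control-group event rate.

```python
from math import sqrt
from statistics import NormalDist, median


def approx_power(d, n1, n2, alpha=0.05):
    """Approximate two-sided power of a two-group comparison to detect a
    standardized mean difference d (normal approximation, not an exact
    noncentral-t calculation)."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    # Expected test statistic if the true effect equals d.
    ncp = abs(d) * sqrt(n1 * n2 / (n1 + n2))
    return z.cdf(ncp - z_crit) + z.cdf(-ncp - z_crit)


# Hypothetical meta-analysis: the summary effect is taken as the true effect.
summary_d = 0.5

# Hypothetical contributing studies, given as (group 1 size, group 2 size).
studies = [(10, 10), (20, 20), (40, 40)]

powers = [approx_power(summary_d, n1, n2) for n1, n2 in studies]
median_power = median(powers)

for (n1, n2), p in zip(studies, powers):
    print(f"n = {n1} + {n2}: approximate power {p:.2f}")
print(f"median approximate power: {median_power:.2f}")
```

Because the summary effect stands in for the true effect, any inflation in the meta-analytic estimate carries through to every per-study power value computed this way.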
Data were extracted from forest plots, tables and text. Some articles reported several meta-analyses. In those cases, we included multiple meta-analyses only if they contained distinct study samples. If several meta-analyses had overlapping study samples, we selected the most comprehensive (that is, the one containing the most studies) or, if the number of studies was equal, the first analysis presented in the article. Data extraction was independently performed by K.S.B. and either M.R.M. or C.M. and verified collaboratively.
The following data were extracted for each meta-analysis: first author and summary effect size estimate of the meta-analysis; and first author, publication year, sample size (by groups), number of events in the control group (for odds/risk ratios) and nominal significance (p < 0.05, ‘yes/no’) of the contributing studies. For five articles, nominal study significance was unavailable and was therefore obtained from the original studies if they were electronically available. Studies with missing data (for example, due to unclear reporting) were excluded from the analysis.
The main outcome measure of our analysis was the achieved power of each individual study to detect the estimated summary effect reported in the corresponding meta-analysis to which it contributed, assuming an α level of 5%. Power was calculated using G*Power software23. We then calculated the mean and median statistical power across all studies.
Results. Our search strategy identified 246 articles published in 2011, out of which 155 were excluded after an initial screening of either the abstract or the full text. Of the remaining 91 articles, 48 were eligible for inclusion in our analysis24–71, comprising data from 49 meta-analyses and 730 individual primary studies. A flow chart of the article selection process is shown in FIG. 2, and the characteristics of included meta-analyses are described in TABLE 1.
Our results indicate that the median statistical power in neuroscience is 21%. We also applied a test for an excess of statistical significance72. This test has recently been used to show that there is an excess significance bias in the literature of various fields, including in studies of brain volume abnormalities73, Alzheimer’s disease genetics70,74 and cancer biomarkers75. The test revealed that the actual number (349) of nominally significant studies in our analysis was significantly higher than the number expected (254; p < 0.0001). Importantly, these calculations assume that the summary effect size reported in each study is close to the true effect size, but these summary effects are likely to be inflated owing to publication and other biases described above.
Interestingly, across the 49 meta-analyses included in our analysis, the average power demonstrated a clear bimodal distribution (FIG. 3). Most meta-analyses comprised studies with very low average power: almost 50% of studies had an average power lower than 20%. However, seven meta-analyses comprised studies with high (>90%) average power24,26,31,57,63,68,71. These seven meta-analyses were all broadly neurological in focus and were based on relatively small contributing studies: four out of the seven meta-analyses did not include any study with over 80 participants. If we exclude these ‘outlying’ meta-analyses, the median statistical power falls to 18%.
Small sample sizes are appropriate if the true effects being estimated are genuinely large enough to be reliably observed in such samples. However, as small studies are particularly susceptible to inflated effect size estimates and publication bias, it is difficult to be confident in the evidence for a large effect if small studies are the sole source of that evidence. Moreover, many meta-analyses show small-study effects on asymmetry tests (that is, smaller studies have larger effect sizes than larger ones) but nevertheless use random-effect calculations, and this is known to inflate the estimate of summary effects (and thus also the power estimates). Therefore, our power calculations are likely to be extremely optimistic76.
Empirical evidence from specific fields
One limitation of our analysis is the under-representation of meta-analyses in particular subfields of neuroscience, such as research using neuroimaging and animal models. We therefore sought additional representative meta-analyses from these fields outside our 2011 sampling frame to determine whether a similar pattern of low statistical power would be observed.
Neuroimaging studies. Most structural and volumetric MRI studies are very small and have minimal power to detect differences between compared groups (for example, healthy people versus those with mental health diseases). A clear excess significance bias has been demonstrated in studies of brain volume abnormalities73, and similar problems appear to exist in fMRI studies of the blood-oxygen-level-dependent response77. In order to establish the average statistical power of studies of brain volume abnormalities, we applied the same analysis as described above to data that had been previously extracted to assess the presence of an excess of significance bias73. Our results indicated that the median statistical power of these studies was 8% across 461 individual studies contributing to 41 separate meta-analyses, which were drawn from eight articles that were published between 2006 and 2009. Full methodological details describing how studies were identified and selected are available elsewhere73.
Animal model studies. Previous analyses of studies using animal models have shown that small studies consistently give more favourable (that is, ‘positive’) results than larger studies78 and that study quality is inversely related to effect size79–82. In order to examine the average power in neuroscience studies using animal models, we chose a representative meta-analysis that combined data from studies investigating sex differences in water maze performance (number of studies (k) = 19, summary effect size Cohen’s d = 0.49) and radial maze performance (k = 21, summary effect size d = 0.69)80. The summary effect sizes in the two meta-analyses provide evidence for medium to large effects, with the male and female performance differing by 0.49 to 0.69 standard deviations