Journal of The Association of Physicians of India ■ Vol. 65 ■ November 2017 81

Statistics for Researchers

Common Statistical Errors and How to Avoid Them
NJ Gogtay, UM Thatte

Department of Clinical Pharmacology, Seth GS Medical College and KEM Hospital, Mumbai, Maharashtra
Received: 10.10.2017; Accepted: 12.10.2017

“To consult the statistician after an experiment is finished is often merely to ask him to conduct a post mortem examination. He can perhaps say what the experiment died of.”
R. A. Fisher

Introduction

Researchers should ensure publication of work done, as papers that do not see the light of day have wasted the precious time and resources of all stakeholders and will have failed to advance Evidence Based Practice. Getting the statistics right in the publication is crucial. There is evidence both from India and elsewhere about inappropriate reporting of statistics in papers. For example, Jaykaran and Yadav [2011] analyzed n = 196 articles in two Pharmacology journals published from India and found that 78.1% of articles had inappropriate descriptive statistics, 31.1% used the wrong statistical test for between group comparisons and only 1% reported statistical assumptions [1].

Researchers can commit errors at two levels, both of which can affect statistics in the manuscript. The first is errors that occur during the process of research [such as errors in planning and/or implementation of the study] and the second is those that occur during data analysis, interpretation and presentation of results [either in the form of a manuscript or podium/poster presentations]. This paper will describe errors of the latter kind. It is important to point out here that more often than not, errors during the planning and conduct of research cannot be easily identified by a journal editor [who only sees the final manuscript] and hence any study should be done with utmost care and attention to detail. This is because errors of omission or commission that occur during the process of research are detrimental to the most important stakeholder in research: the patient, whose quality of life we hope to improve through the research process.

Common Statistical Errors

Before we proceed to common statistical errors, let us understand why it is important that we carry out data analysis meticulously. Data analysis is a process that seeks to identify relationships, associations, differences, variance or trends that may exist within the data. The purpose is to see if the results can be generalized to the population or, in other words, “how true” or real these findings are. It is useful to remember the following before data analysis: 1) Organize all collection forms and material used to record the data in one place; 2) Check the data for completeness and accuracy; 3) Note missing data, if any, decide whether or not to remove it from the analysis, and document this; 4) Assign unique identifiers to the data; 5) Feed the data into an appropriate software [Microsoft Excel and SPSS are two examples] and have a thorough quality check of the fed data done by someone independent of the study.

The subsequent sections describe common statistical errors. The list given is not intended to be all encompassing but rather to cover some of the more common errors made by authors and often missed by editors.

Errors in Data analysis

The choice of Parametric versus non-parametric methods and the importance of assumptions

Statistics as a discipline uses models and assumptions. Prior to applying any parametric test, the researcher needs to check if the assumption of normality is met, as these tests are to be used only when the data are normally distributed. Also, it is a common myth that parametric tests are more powerful than non-parametric tests. In fact, they are powerful only to the extent that the assumptions are met [see later for assumptions for the unpaired t-test]. Otherwise, they miss differences or relationships which would have been picked up by non-parametric tests [2].

Using the wrong statistical test

Statistical tests are not only numerous, but also have similar sounding names. Each test is to be used only if certain assumptions are satisfied. For example, the Student's t-test is a widely used parametric test that is of two types, the unpaired [also called the two-sample t-test] and the paired t-test, and data need to be normally distributed for its use. For the unpaired t-test, in
addition, observations need to be independent and the variances in the two groups equal [3]. This test cannot be used for multiple group comparisons. Williams [4] found that among articles published in the American Journal of Physiology that used either the unpaired or the paired t-test, 17% used the test inappropriately for multiple comparisons.

The need to address confounding and effect modification

Confounding: This occurs when the effect or association between an exposure and outcome is distorted by the presence of a third variable. This variable is one that is linked to both the exposure and the outcome, but does not lie in the causal pathway. Confounders are viewed as the “nuisance” factor/s that distort the association one way or the other [positive or negative]. Let us understand this with an example. One study that compared pet owners versus non-pet owners found that the former had significantly lower systolic blood pressure relative to the latter despite coming from similar socioeconomic backgrounds and having similar body mass index. One possible reason is that pet owners [dog owners in particular] tend to get more exercise relative to non-pet owners and thus exercise here becomes the confounder or confounding variable.

[Diagram: Owning a pet [Exposure] and Blood pressure [outcome], each linked to Exercise [confounder]]
Fig. 1: Exercise [the confounder variable] is associated with both the Exposure [Owning a pet] and the outcome [Blood pressure] [5]

Effect modification: This occurs when the exposure has a different effect on different groups of patients, leading to differential outcomes in sub groups. Let us understand this with an example of the use of perioperative beta blockers and the association of their use with mortality after non-cardiac surgery in a retrospective cohort study of 663,635 patients [6]. The authors stratified patients based on the Revised Cardiac Risk Index (RCRI), which is a tool used to estimate a patient's risk of perioperative cardiac complications. The odds ratios for different levels of risk are presented in Table 1.

Table 1: Association of perioperative mortality with the Revised Cardiac Risk Index [RCRI] [6]

RCRI risk stratification    Odds ratio for perioperative mortality    95% confidence interval of the odds ratio
0 - low                     1.36                                      [1.27, 1.45]
1 - mild                    1.09                                      [1.01, 1.19]
2 - moderate                0.88                                      [0.8, 0.98]
3 - high                    0.71                                      [0.63, 0.8]
≥ 4 - very high             0.58                                      [0.5, 0.67]

The table shows that when beta blockers are given to patients with high RCRI scores [2, 3 or 4 and more], there is a reduction in mortality. However, at lower scores of 1 or 0, this effect is attenuated or even lost [the odds ratios exceed 1] and thus RCRI here acts as the “effect modifier” [6].

Sub group analyses

When two group comparisons are made, the result is an average effect of the two interventions [and the difference between them] in a heterogeneous group of patients. The practicing clinician would however like to know if the better treatment is likely to work in the individual patient that he is treating. This is because each individual patient has certain characteristics that define him/her, for example gender, severity of the disease, age [young/middle aged/elderly], alcohol [presence/absence], smoking [presence/absence], diabetes and so on. The clinician may want to know, given a certain set of characteristics, what the probability of response is. This is what is essentially addressed by sub group analyses, which are defined as “analysis of treatment effects within subgroups of patients enrolled in a study/trial”. The fundamental idea with these analyses is to uncover interesting relationships that could be explored further. Their disadvantage lies in the fact that 1) the differences or associations uncovered could be spuriously positive [Type 1 error]; 2) they may not be able to pick up a difference due to smaller numbers of patients in each group [beta error/false negative error]; and 3) they can be difficult to interpret. Let us understand both their utility and difficulties with two examples.

The IRESSA Survival Evaluation in Lung Cancer (ISEL) was a phase III study that compared the efficacy of Gefitinib versus placebo in patients with refractory advanced non-small cell lung cancer (NSCLC) [7]. The study did not yield a significant difference in survival between the two groups of patients. However, when a planned sub group analysis of n = 342 patients of Asian origin was done [n = 235 received Gefitinib and n = 107 received placebo], it was seen that Gefitinib significantly improved survival among the Asian patients {HR 0.66, 95% CI [0.48, 0.91], median survival 9.5 months versus 5.5 months, p < 0.01}. At the other end of the spectrum, a sub group analysis of men versus women [a meta-analysis that included n = 6 studies] concluded that the use of aspirin for primary prevention reduced myocardial infarction significantly in men but not in women [8]. These findings were however not confirmed by two other meta-analyses [where trials of both primary and secondary prevention were included] and it was concluded that gender did not really affect the efficacy of aspirin [9,10]. Sub group analysis, at best, should be hypothesis generating and the best ones are
those that are specified a priori. Post-hoc sub group analysis, when done, should be transparently reported.

Understanding and addressing bias

Bias can very simply be defined as any deviation from the truth [11]. Since the purpose of research is to arrive at the truth, researchers must have a good grasp of bias and of minimizing it. Bias can be broadly classified as: 1) Selection bias [for example prevalence-incidence bias, admission rate bias and non-responder bias] and 2) Information bias [for example misclassification bias and Hawthorne effect] [12].

Let us understand how bias can affect study results (and interpretation) with an example of a paper by Redelmeier and Singh, who reported that Academy Award-winning actors and actresses lived almost 4 years longer than those who did not receive an Oscar [13]. It was later shown that the statistical method used for the analysis actually conferred an unfair advantage on the Oscar winners, a type of bias called “immortal time bias”, and that the difference in survival was just one year and not statistically significant [14]. Another oft quoted example is that of a study by Coren and Halpern [1991] which reported that left handers died much earlier than right handed people. This questionnaire based study did not take into account the fact that in the early part of the 20th century, many parents forced children who were naturally inclined to be left handed to use the right hand, resulting in groups that were not clearly only left handed or right handed [15,16].

Presenting Data Appropriately

Quantitative data: use summary measures appropriately

When quantitative data [height, weight, blood pressure for example] are presented, both means and medians can be reported as summary measures. It is important that the mean is always accompanied by the standard deviation [SD], which gives the extent of variability seen in the data, and written as mean [SD]. The standard error of the mean [SEM, which is smaller than the SD and is given by SD/√n] should not be given in its place, as it reflects the precision of the sample mean as an estimate of the population mean rather than the variability in the sample, and it will always be smaller than the SD. Jaykaran and Yadav [1], for instance, showed that 78.1% of papers had inappropriate descriptive statistics and that use of mean ± SEM rather than mean ± SD was the most common presentation error seen. The median is used to present skewed data and when used, it must always be accompanied by the range. Alongside the median, graphical depictions of skewed data such as the box and whisker plot giving the inter-quartile range are useful visual aids that help understand variability in the data better.

Presentation of categorical data

Categorical variables may be dichotomous or binary [for example male and female] or non-binary [mild, moderate and severe pain]. These are described as proportions of the total number of participants [along with 95% Confidence Intervals]. They can also be expressed in the form of a bar or pie chart. Often, binary categorical data are best presented as a 2 x 2 table. This is particularly done when there are two group comparisons and helps in the calculation of several metrics such as the relative risk, odds ratio, hazard ratio. It is also useful when diagnostic tests are being evaluated, for the calculation of sensitivity, specificity, positive and negative predictive values and the diagnostic odds ratio. In addition, tests such as the Chi-square test, Fisher's exact probability test and McNemar's test are best understood and interpreted with the 2 x 2 table.

Outliers and their reporting

An outlier is essentially an abnormal value that lies far away from the rest of the values in the sample. Outliers are important as they can have a significant impact and alter results of the analysis dramatically. They could be true outliers, a typographical error that resulted during data entry [which needs to be corrected], or a wrong measurement. All outliers need to be carefully considered. Given that there is little consensus on how outliers are to be analyzed, it is important that outliers are reported with honesty and that, where appropriate, an analysis with and without the outlier be performed and reported [17].

Reporting only P values, not reporting the exact p value and confusing it with the effect size and not reporting Confidence Intervals

The p value, for which the threshold of significance is usually set at 5%, essentially tells you whether the results are consistent with being due to chance. It does not by itself provide a good measure of evidence. It must always be accompanied by the effect size [the magnitude and direction of the difference when two group comparisons are made] and the 95% CI of the difference [the confidence interval gives the range in which we expect the true population value to lie]. The p value, the effect size and the confidence intervals of the effect size must be viewed in tandem for drawing meaningful conclusions.

Interpreting Data Correctly

Correlation and Causation

A common mistake is to assume that just because we find a correlation between two variables, one causes the other. This is often described in statistical parlance as “Correlation does not imply causation”. An often-quoted example in this regard is the “Correlation” of Sun Signs in Astrology with outcomes by the researchers of the Second International Study of Infarct Survival Trials Collaborative Group [ISIS-2] [18]. Overall, the study showed a significant benefit of aspirin over
placebo [p < 0.00001]. However, based on the date of birth entered in the case record forms, when the researchers classified all patients as per the Sun Sign they were born under, two Sun Signs, Gemini and Libra, showed no apparent benefit with aspirin while another Sun Sign, Capricorn, showed a nearly 50% reduction in mortality! It would be inappropriate to say that one Sun Sign appears to benefit more with aspirin than the other. Thus, these relationships should be viewed merely as associations and not as cause [Sun Signs] and effect [mortality].

Extending results Inappropriately

Both authors and readers are often tempted to draw conclusions beyond what the data actually show and this should not be done. For example, if a study shows that 5% of lawyers in a certain court in the country have alcohol dependence syndrome, it would be inappropriate to extrapolate this finding to all lawyers who practice in the country or all lawyers practicing worldwide [19]. This finding would only be applicable to populations similar to the one from which the original sample was drawn. Prior to extrapolation/generalization, it is thus important to critically appraise the representativeness of the sample.

The challenge of small sample sizes

All sample size calculations should be done before starting the study regardless of the number of groups being studied. A small sample size does not necessarily make the study a weak or a poor one. Rather, the ability to generalize and draw inference about the population of interest simply becomes more difficult. Also, with small sample sizes, one must be careful not to overstate the strength of evidence or go beyond what has been observed to draw overarching conclusions.

Clinical versus Statistical significance

Statistical significance can be mistaken by both authors and readers for clinical significance. Let us understand this with an example of blood pressure reduction after the use of two anti-hypertensive drugs. Say that Drug A produces a greater reduction in blood pressure than Drug B and the difference is 2 mm Hg, which is reported as p < 0.05 with a 95% CI of [1, 6]. This means that when Drug A is used in the population, the additional reduction may be as low as 1 mm Hg or as high as 6 mm Hg, with 95% confidence. Thus, whether this 2 mm Hg [which is the average difference in reduction] is large enough to warrant a change in prescription from Drug B to Drug A must be well thought through, as this difference may not really be clinically meaningful.

Conclusions

In summary, it is useful to remember that the process of research and its subsequent publication is fraught with the potential for making errors and all efforts must be made to minimize, if not eliminate, them.

References

1. Jaykaran, Yadav P. Quality of reporting statistics in two Indian pharmacology journals. Journal of Pharmacology & Pharmacotherapeutics 2011; 2:85-89.
2. Thiese MS, Arnold ZC, Walker SD. The misuse and abuse of statistics in biomedical research. Biochemia Medica 2015; 25:5-11.
3. Gore MS, Jones GI, Rytter CE. Misuse of statistical methods: critical assessment of articles in BMJ from January to March 1976. British Medical Journal 1977; 1:85-87.
4. Williams JL, Hathaway CA, Kloster KL, Layne BH. Low power, type II errors, and other statistical problems in recent cardiovascular research. Am J Physiol 1997; 273:H487-93.
5. Anderson WP, Reid CM, Jennings GL. Pet ownership and risk factors for cardiovascular disease. Med J Aust 1992; 157:298-301.
6. Lindenauer KP, Pekow P, Wang K, Mamidi KD, Gutierrez B, Benjamin ME. Perioperative beta-blocker therapy and mortality after major non-cardiac surgery. N Engl J Med 2005; 353:349-361.
7. Chang A, Parikh P, Thongprasert S, Tan EH, Perng RP, Ganzon D, Yang CH, Tsao CJ, Watkins C, Botwood N, Thatcher N. Gefitinib (IRESSA) in patients of Asian origin with refractory advanced non-small cell lung cancer: subset analysis from the ISEL study. J Thorac Oncol 2006; 1:847-55.
8. Berger JS, Roncaglioni MC, Avanzini F, Pangrazzi I, Tognoni G, et al. Aspirin for the primary prevention of cardiovascular events in women and men: a sex-specific meta-analysis of randomized controlled trials. JAMA 2006; 295:306-313.
9. Seshasai SR, Wijesuriya S, Sivakumaran R, Nethercott S, Erqou S, et al. Effect of aspirin on vascular and nonvascular outcomes: meta-analysis of randomized controlled trials. Arch Intern Med 2012; 172:209-216.
10. Baigent C, Blackwell L, Collins R, Emberson J, Godwin J, et al. Aspirin in the primary and secondary prevention of vascular disease: collaborative meta-analysis of individual participant data from randomized trials. Lancet 2009; 373:1849-1860.
11. Šimundić A-M. Bias in research. Biochemia Medica 2013; 23:12-15.
12. Hammer GP, du Prel J-B, Blettner M. Avoiding Bias in Observational Studies: Part 8 in a Series of Articles on Evaluation of Scientific Publications. Deutsches Ärzteblatt International 2009; 106:664-668.
13. Redelmeier AD, Singh MS. Survival in Academy Award-winning Actors and Actresses. Ann Intern Med 2001; 134:955-962.
14. Sylvestre PM, Huszh E, Hanley AJ. Do Oscar winners Live Longer than Less Successful Peers? A Reanalysis of the Evidence. Ann Intern Med 2006; 145:361-363.
15. Coren S, Halpern DF. Left-handedness: a marker for decreased survival fitness. Psychol Bull 1991; 109:90-106.
16. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1694599/pdf/amjph00526-0107.pdf, accessed on 9th October 2017.
17. Thiese MS, Arnold ZC, Walker SD. The misuse and abuse of statistics in biomedical research. Biochemia Medica 2015; 25:5-11.
18. ISIS-2 (Second International Study of Infarct Survival) Collaborative Group. Randomised trial of intravenous streptokinase, oral aspirin, both, or neither among 17 187 cases of suspected acute myocardial infarction: ISIS-2. Lancet 1988; 332:349-360.
19. Banerjee A, Chaudhury S. Statistics without tears: Populations and samples. Industrial Psychiatry Journal 2010; 19:60-65.
