Application of Statistical Test in Clinical Research
Abstract
Clinical research is increasingly based on empirical studies, and their results are usually presented and analyzed with statistical methods. We therefore discuss frequently used statistical tests for different types of data sets under the assumption of normality or non-normality. Parametric statistical tests are applied when the normality (and homogeneity of variance) assumptions are satisfied; otherwise the equivalent non-parametric statistical tests are used. Advice is presented for selecting statistical tests on the basis of very simple cases. It is therefore an advantage for any physician or researcher to be familiar with the frequently used statistical tests, as this is the only way he or she can evaluate the statistical methods in scientific publications and thus correctly interpret their findings.
36
International Journal of Medicine Research
used to determine whether two population means are different when the variances are known and the test statistic is assumed to follow a normal distribution. For example, in clinical research, suppose there are two flu drugs A and B: Drug A works on 41 people out of a sample of 195, Drug B works on 351 people in a sample of 605, and we want to test whether the effects of the two drugs are equal.

Student t-test
The Student t-test is a method of testing hypotheses about the mean of a small sample drawn from a normally distributed population when the population standard deviation is unknown. We can use this test under the assumptions that the sample size is less than 30, the observations are independent of each other (one observation is not related to and does not affect another), the data follow a normal distribution, and the data are randomly selected from a population in which each item has an equal chance of being selected. There are two types of Student t-test: one-sample and two-sample. The one-sample Student t-test is a statistical procedure used to examine the difference between the sample mean and a known value of the population mean; it is used to determine whether a mean response changes under different experimental conditions. The two-sample t-test, on the other hand, is used to compare the means of two independent populations, denoted µ1 and µ2, under the assumption that the standard deviations of the populations are equal. This test has ubiquitous application in the analysis of controlled clinical research. Examples include comparing the mean decrease in diastolic blood pressure between two groups of patients receiving different antihypertensive agents, or estimating pain relief from a new treatment relative to that of a placebo based on subjective assessment of percent improvement in two parallel groups [3, 4].

Student paired 't' test
This is a statistical technique applied to paired data of observations from one sample only, when each individual gives a pair of observations, or used to compare two population means in the case of two samples that are correlated. The paired sample t-test is used in 'before-after' studies, when the samples are matched pairs, or in case-control studies. We can use this test under the assumptions that the number of observations in each data set is the same and the observations are organized in pairs with a definite relationship between each pair, the data are random samples from a normal distribution, the variances of the two samples are equal, and the cases are independent of each other. In clinical research this test is used to compare the effect of two drugs given to the same individuals in the sample on two different occasions, e.g., adrenaline and noradrenaline on pulse rate, or the number of hours of sleep induced by two hypnotics, and so on [5].

Hotelling's T2 test
Hotelling's T2 test is the multivariate generalization of the Student's t-test [6], applicable when there are multiple response variables. A one-sample Hotelling's T2 test can be used to test whether a set of objects (which should be a sample from a single statistical population) has a mean equal to a hypothetical mean. A two-sample Hotelling's T2 test may be used to test for significant differences between the mean vectors (multivariate means) of two multivariate data sets. This test can be used under the assumptions that (1) the variables of each data set follow a multivariate normal distribution (each variable may be tested for univariate normality), (2) the objects have been independently sampled, (3) in a two-sample test, the two data sets being tested have (near) equivalent variance-covariance matrices, and Bartlett's test may be used to evaluate whether this assumption holds, and (4) each data set describes one population with one multivariate mean, i.e., no subpopulations exist within each data set. As an example in clinical research, suppose a certain type of tropical disease is characterized by fever, low blood pressure and body aches, and a research team working on a new drug to treat this disease wants to determine whether the drug is effective. They take a random sample of 20 people with the disease and 18 given a placebo and, based on the data, want to determine whether the drug is effective at reducing these three symptoms.

ANOVA
Analysis of variance (ANOVA) is a statistical method that splits the total variability found in a data set into two parts: systematic factors and random factors. The systematic factors have a statistical influence on the given data set, while the random factors do not. Analysts use the ANOVA test to determine the effect that independent variables have on the dependent variable in a regression study. It is an extension of the two-sample t-test and Z-test. In 1918, Ronald Fisher developed the test called the analysis of variance, also called the Fisher analysis of variance, which is used to analyze the variance between and within groups whenever there are more than two groups [7]. If we set the Type I error to 0.05 and have several groups, then each time we test one mean against another there is a 0.05 probability of a Type I error; with six t-tests this would give a 0.30 (0.05×6) probability of a Type I error, much higher than the desired 0.05. ANOVA provides a way to test several null hypotheses at the same time at the overall Type I error level of 0.05. We can use this test under the assumptions that each group sample is drawn from a normally distributed population, all populations have a common variance, all samples are drawn independently of each other, within each sample the observations are sampled randomly and independently of each other, and factor effects are additive in nature. As an example in clinical research, the ANOVA method might be appropriate for comparing mean responses among a number of parallel-dose groups or among various strata based on patients' background information, such as race, age group, or disease severity [4].

ANCOVA
In clinical research, patients who meet the inclusion and exclusion criteria are randomly assigned to each treatment group. Under the assumption that the targeted patient population is homogeneous, we can expect patient characteristics such as age, gender, and weight to be comparable between treatment groups. If the patient population is known to be heterogeneous in terms of some demographic variables, then a stratified randomization according to these variables should be applied. At the beginning of the study, clinical data are usually collected at randomization to establish baseline values. After the administration of the study drug, clinical data are often collected at each visit over the entire duration of the study. These clinical data are analyzed to assess the efficacy and safety of the treatments. As pointed out earlier, before the analysis of endpoint values, patient characteristics are usually compared between treatments by an analysis of variance (ANOVA) if the variable is continuous. For the analysis of endpoint values, although the technique of analysis of variance (ANOVA) can be directly applied, it is believed that the endpoint values are usually linearly related to the baseline values. Therefore an adjusted analysis of variance should be considered to account for the baseline values; this adjusted analysis of variance is called analysis of covariance (ANCOVA) [8]. In addition, ANCOVA provides a method for comparing response means among two or more groups adjusted for a quantitative concomitant variable, or 'covariate', thought to influence the response. Attention here is confined to cases in which the response, y, might be linearly related to the covariate, x. ANCOVA combines regression and ANOVA methods by fitting simple linear regression models within each group and comparing the regressions among groups. The assumptions for ANCOVA are that, for each independent variable, the relationship between the response (y) and the covariate (x) is linear; that the lines expressing these linear relationships are all parallel (homogeneity of regression slopes); and that the covariate is independent of the treatment effects (i.e. the covariate and independent variables are independent). ANCOVA might be applied to 1) comparing cholesterol levels (y) between a treated group and a reference group adjusted for age (x, in years), 2) comparing scar healing (y) between conventional and laser surgery adjusted for excision size (x, in mm), or 3) comparing exercise tolerance (y) at 3 dose levels of a treatment used for angina patients adjusted for smoking habits (x, in cigarettes/day).

Bartlett Test
The Bartlett test can be used to test for homogeneity of variance [9]. When the variances across groups are not equal, the usual analysis of variance assumptions are not satisfied and the ANOVA F test is not valid; the Bartlett test assumes equal sample sizes from several normal populations. For example, this test is used to check the equality of variances among the treatment groups. Levene's, Cochran's, and Hartley's statistical tests are also used to test for homogeneity of variance.

Bonferroni Test
The Bonferroni test is a multiple comparison test of significance based on the individual p-values [10]. It can be used to correct any set of p-values for multiple comparisons and is not restricted to use as a follow-up test to ANOVA. It works as follows: (1) compute a p-value for each comparison, making no corrections for multiple comparisons in this calculation; (2) define the familywise significance threshold, often kept at the traditional value of 0.05; (3) divide the value chosen in step 2 by the number of comparisons you are making in this family of comparisons (with the traditional 0.05 definition of significance and 20 comparisons, the new threshold is 0.05/20, or 0.0025); (4) call each comparison "statistically significant" if the p-value from step 1 is less than or equal to the value computed in step 3; otherwise, declare that comparison not to be statistically significant.

Holm's Test
The Holm test is a powerful and versatile multiple comparison test. It can be used in clinical research to compare all pairs of means, compare each group mean to a control mean, or compare preselected pairs of means. It is not restricted to use as a follow-up to ANOVA; instead it can be used in any multiple comparisons context [11].

Newman-Keuls Test
The Newman-Keuls test, also referred to as the "Student-Newman-Keuls test", is described variously as a stepwise or multiple-stage test. The range statistic varies for each pairwise comparison as a function of the number of group means lying between the two being compared, so a different shortest significant range is computed for each pairwise comparison of means. Means are first ordered by rank, and the largest and smallest means are tested. If there is no significant difference, testing stops there and it is concluded that none of the means is significantly different. Otherwise, the means with the next greatest difference are tested using a different shortest significant range, and testing continues until no further significant differences are found.
This test is used when the group sample sizes are equal. For example, in a test with 5 treatment means, suppose X5 > X1 with p-value < 0.05 but X4 = X1 (p-value not significant). One cannot then test the differences between X1 and X3, X1 and X2, or X2 and X3, but one can test the difference between X2 and X5 if it exceeds the difference between the means of X1 and X5. The Student-Newman-Keuls (SNK) test is more powerful than Tukey's method, so it will detect real differences more frequently [12]. However, the Newman-Keuls test offers poor protection against Type I error. This is especially the case when treatment means fall into groups which are themselves widely spaced apart: differences between means within groups will then be significant more often than they should be at the specified level of α.

Tukey Multiple Comparison Test
In clinical research, the researcher may still need to understand subgroup differences among the different experimental and control groups. These subgroup differences are called "pairwise" differences. ANOVA does not provide tests of pairwise differences when the researcher needs them; Tukey's multiple comparison method tests each experimental group against each control group [13]. The Tukey method is preferred if there are equal group sizes among the experimental and control groups; a modified Tukey-Kramer method can be applied for comparisons of unequal-sized groups. We can use this test under the assumptions that the observations being tested are independent within and among the groups, the groups associated with each mean in the test are normally distributed, and there is equal within-group variance across the groups associated with each mean in the test (homogeneity of variance). As an example in clinical research, consider data on the effect of maternal smoking on child birth weight, where only the effect of duration of smoking is statistically significant. To find which duration or durations are making a significant impact, compare the mean birth weight for the different durations.
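The Bonferroni steps (1)-(4) above, and Holm's step-down refinement of them, amount to a few lines of arithmetic. Below is a minimal Python sketch; the p-values are invented purely for illustration.

```python
def bonferroni(pvals, alpha=0.05):
    """Steps 2-4 above: one shared threshold alpha/m for all m comparisons."""
    threshold = alpha / len(pvals)  # e.g. 0.05/20 = 0.0025 for 20 comparisons
    return [p <= threshold for p in pvals]

def holm(pvals, alpha=0.05):
    """Holm's step-down variant: the k-th smallest p is compared to alpha/(m-k+1)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        if pvals[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break  # once one comparison fails, all larger p-values fail too
    return reject

# hypothetical p-values from three pairwise treatment comparisons
pvals = [0.001, 0.010, 0.040]
flags_bonferroni = bonferroni(pvals)  # [True, True, False]
flags_holm = holm(pvals)              # [True, True, True]
```

On these numbers Holm rejects one more null hypothesis than Bonferroni at the same familywise level, which is the sense in which it is the more powerful procedure.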
between two independent categorical variables. The idea behind this test is to compare the observed frequencies with the frequencies that would be expected if the null hypothesis of no association/statistical independence were true. By assuming the variables are independent, we can predict an expected frequency for each cell in the contingency table. If the value of the test statistic for the chi-squared test of association is too large, it indicates poor agreement between the observed and expected frequencies, and the null hypothesis of independence/no association is rejected. For example, in clinical research it can be used to test the association between adverse events and the treatment used. The assumptions of the chi-square test are independent random sampling, no more than 20% of the cells with an expected frequency less than five, and no empty cells. If the chi-square test shows a significant result, we may then be interested in the degree or strength of association among the variables; the test also fails in the situation where 20% or more of the cells have an expected frequency less than five. In that case, the usual chi-square test is not valid, and the Fisher exact test is used instead to test the association among the variables. This method also fails to give the strength of association among the variables.

Chi-square test (of Homogeneity)
The chi-square test of homogeneity is applied to a single categorical variable from two different populations. It is used to determine whether frequency counts are distributed identically across the different populations. We can use this test under the assumptions that, for each population, the sampling method is simple random sampling, the sample data are displayed in a contingency table (populations × category levels), and the expected frequency count for each cell of the table is at least 5. For example, in multicenter clinical trials it is used to test for differences among the centres in the response to the particular drug(s).

Fisher Exact Test
Fisher's exact test is used in place of the chi-squared and normal approximation tests for a 2 x 2 contingency table when cells have an expected frequency of five or less [19]. The chi-square test assumes that each cell has an expected frequency of five or more, but Fisher's exact test has no such assumption and can be used regardless of how small the expected frequency is. An example in clinical research is a study comparing two treatment regimes for controlling bleeding in haemophiliacs undergoing surgery, when a cell frequency of the 2 x 2 contingency table is five or less [20].

G–test of independence
The G–test of independence is used when the researcher has two nominal variables, each with two or more possible values, and wants to see whether the proportions of one variable are different for different values of the other variable. For example, suppose a researcher wanted to know whether it is better to give the diphtheria, tetanus and pertussis (DTaP) vaccine in the thigh or the arm, and so collected data on severe reactions to this vaccine in children aged 3 to 6 years old. One nominal variable is severe reaction vs. no severe reaction; the other nominal variable is thigh vs. arm [21]. If there is a higher proportion of severe reactions in children vaccinated in the arm, a G–test of independence will tell whether a difference this big is likely to have occurred by chance. Fisher's exact test is more accurate than the G–test of independence when the expected numbers are small.

Binomial Test
The binomial test is used for testing whether a proportion from a single dichotomous variable is equal to a presumed population value, and serves as an alternative to the z-test for population proportions. The assumptions for the test are that a) the data are dichotomous, b) the observations are independent of each other, and c) the proportion of observations in category A multiplied by the total number of observations (i.e. A + B) is greater than 10, and likewise for category B, so that the normal approximation to the binomial can be used to calculate a z-score. In clinical research, a common use of the binomial test is estimating a response rate, p, using the number of patients (X) who respond to an investigative treatment out of a total of n studied.

McNemar test
In clinical research, the McNemar test is used when the researcher is interested in testing for an improvement in response rate after a particular treatment, or in finding a change in proportion for paired data (e.g., studies in which patients serve as their own control, or studies with a before-and-after design). The three main assumptions for this test are that the variable is nominal with two categories (i.e. dichotomous) and there is one independent variable with two connected groups, that the two groups of the dependent variable are mutually exclusive, and that the sample is a random sample with no expected frequencies less than five. The data should be placed into a 2×2 contingency table, with the cell frequencies equalling the number of pairs. For example, a researcher testing a new medication records whether the drug worked ("yes") or did not ("no").

Generalized McNemar/Stuart-Maxwell Test
The generalization of McNemar's test that extends 2x2 square tables to KxK tables is often referred to as the generalized McNemar or Stuart-Maxwell test [22, 23]. In clinical research, this test is used to analyze matched-pair pre–post (treatment) data with multiple discrete levels (e.g. severity of pain) of the exposure (outcome) variable.

Bhapkar's test
This is a test of marginal homogeneity that exploits the asymptotic normality of the marginal proportions [24]. The idea of constructing the test statistic is similar to that of the generalized McNemar's test statistic; the main difference lies in the calculation of the elements of the variance-covariance matrix. Although the Bhapkar and Stuart-Maxwell tests are asymptotically equivalent [25], the Bhapkar test is a more powerful alternative to the Stuart-Maxwell test; in large samples both will produce the same chi-squared value [24].

Cochran's Q test
This test is used to determine whether there are differences on a dichotomous dependent variable between three or more related groups. In addition, when a binary response is measured several times or under different conditions, Cochran's Q tests whether the marginal probability of a positive response is unchanged across the times or conditions. The Cochran Q test is an extension of the McNemar test for related samples that provides a method for testing the differences between three or more matched sets of frequencies or proportions. We can use this test under the assumptions of one dependent variable with two mutually exclusive groups (i.e., the variable is dichotomous; an example is perceived safety with the two groups "safe" and "unsafe"), one independent variable with three or more related groups, and cases (e.g., participants) that are a random sample from the population of interest. For example, a data set contains data from a study of three drugs to treat a chronic disease, in which forty-six subjects receive drugs A, B, and C [26]. The response to each drug is either favorable or unfavorable, and we test for differences in favorable response among the three drugs.

Cohen's kappa statistic
Cohen's kappa statistic is a measure of agreement between categorical variables. For example, kappa can be used to compare the ability of different raters to classify subjects into one of several groups. Kappa can also be used to assess the agreement between alternative methods of categorical assessment when new techniques are under study. In the clinical setting, comparison of a new measurement technique with an established one is often needed to check whether they agree sufficiently for the new to replace the old, and correlation is often misleading for this purpose [27]. Cohen's kappa is used when the level of agreement between raters is assessed in terms of a simple categorical diagnosis (i.e., the presence or absence of a disorder).
The kappa coefficient (𝜅) is used to assess inter-rater agreement. One of its most important features is that it is a measure of agreement which naturally controls for chance. Kappa is always less than or equal to 1: a value of 1 implies perfect agreement, and values less than 1 imply less than perfect agreement. In rare situations kappa can be negative, a sign that the two observers agreed less than would be expected just by chance. A possible interpretation of the kappa coefficient (𝜅) is as follows:
Poor agreement = Less than 0.20
Fair agreement = 0.20 to 0.40
Moderate agreement = 0.40 to 0.60
Good agreement = 0.60 to 0.80
Very good agreement = 0.80 to 1.00

Cronbach's α (alpha) Statistic
Cronbach's alpha is a statistic for investigating the internal consistency of a questionnaire [28, 29]. Generally, many quantities of interest in medicine, such as anxiety or degree of handicap, are impossible to measure explicitly; in such cases, we ask a series of questions and combine the answers into a single numerical value. For example, a Quality of Life (QoL) scale used in clinical research should have demonstrated reliability and validity and be responsive to change in health status; reliability is assessed through examination of the internal consistency at a single administration of the instrument using Cronbach's α (alpha).

Wilcoxon signed-rank test
The Wilcoxon signed-rank test is a non-parametric or distribution-free test for the case of two related samples or repeated measurements on a single sample, used when the population cannot be assumed to be normally distributed. It can be used (a) in place of a one-sample t-test, (b) in place of a paired t-test, or (c) for ordered categorical data where a numerical scale is inappropriate but where it is possible to rank the observations. An example is the hours of relief provided by two analgesic drugs in patients suffering from arthritis, where we test whether one drug provides longer relief than the other.

Mann–Whitney U test
The Mann–Whitney U test is a non-parametric or distribution-free test to compare differences between two independent groups when the dependent variable is either ordinal or continuous but not normally distributed. The Mann-Whitney (or Wilcoxon-Mann-Whitney) test is sometimes used for comparing the efficacy of two treatments in clinical research, and is often presented as an alternative to a t-test when the data are not normally distributed. Whereas the t-test is a test of population means, the Mann-Whitney test is commonly regarded as a test of population medians.

Kruskal-Wallis H test
The Kruskal-Wallis H test is a rank-based nonparametric test that can be used to determine whether there are statistically significant differences between two or more groups of an independent variable on a continuous or ordinal dependent variable. This test is sometimes described as an ANOVA with the data replaced by their ranks, and it is an extension of the Mann-Whitney U test to three or more groups. For example, in clinical research it can be used to assess differences in albumin levels in adults on diets with different amounts of protein.

Friedman Post Hoc test
The Friedman test is a non-parametric (distribution-free) test used to compare observations repeated on the same subjects, and is an alternative to the repeated measures ANOVA when the assumption of normality or equality of variance is not met. If Friedman's test yields a significant p-value, some of the groups in the data have distributions that differ from one another, but the test does not tell us which. We therefore need to find out which pairs of groups are significantly different from each other; but with n groups, checking all of the pairs means performing n(n−1)/2 comparisons, so the need to correct for multiple comparisons arises. In that situation we use the Friedman post hoc test. In clinical research, this test can identify the improvement due to the drug(s) across the patients' follow-ups for a particular disease.
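Several of the agreement measures above reduce to simple arithmetic on a contingency table. As one illustration, Cohen's kappa can be computed from a square table of rating counts; the counts below are invented for the sketch.

```python
def cohens_kappa(table):
    """Kappa from a square count table; rows = rater 1, columns = rater 2."""
    n = sum(sum(row) for row in table)
    # observed agreement: proportion of subjects on the diagonal
    p_o = sum(table[i][i] for i in range(len(table))) / n
    row_p = [sum(row) / n for row in table]
    col_p = [sum(col) / n for col in zip(*table)]
    # agreement expected by chance, from the marginal proportions
    p_e = sum(r * c for r, c in zip(row_p, col_p))
    return (p_o - p_e) / (1 - p_e)

# hypothetical: two clinicians diagnose 100 patients (disorder present/absent)
ratings = [[45, 15],
           [25, 15]]
kappa = cohens_kappa(ratings)  # about 0.13: "poor agreement" on the scale above
```

Note that the observed agreement here is 60%, yet kappa is only about 0.13, because much of that agreement is expected by chance from the marginal proportions; this is the chance correction described above.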
[26]
Kolmogorov-Smirnov test centres as strata . The CMH can be generalized to IxJxK
Kolmogorov-Smirnov test is a nonparametric statistical test tables.
that compares the cumulative distributions of two data sets. It
does not assume that data are sampled from Gaussian Log-rank test
distributions (or any other defined distributions). This test (K- The Log-rank test is a nonparametric test to comparing
S test) is used to decide if a sample comes from a population distributions of time until the occurrence of an event of
with a completely specified continuous distribution and also interest among independent groups. The event is often death
assumed that the population distribution is fully specified (i.e. due to disease, but event might be any binomial outcome,
it assumes that you know the mean and Standard deviation such as cure, response, relapse, or failure. Examples where
(SD) of the overall population perhaps from prior work) [30, use of the log-rank test might be appropriate include
31]
. For example in clinical research, to compare the serum comparing survival times in cancer patients who are given a
Antioxidant levels in 30 patients with pemphigus vulgaris, an new treatment with patients who receive standard
auto-immune blistering disorder [32]. chemotherapy, or comparing times-to-cure among several
doses of a topical antifungal preparation where the patient is
Spearman Correlation Test treated for 10 weeks or until cured, whichever comes first.
Spearman correlation to test the association between two
ranked variables, or one ranked variable and one Peto log-rank or Peto's generalized wilcoxon test
measurement variable. It is appropriate when one or both This test give more weight to the initial interval of the study
variables are skewed or ordinal [33] and is robust when where there are the largest number of patient’s risk. If the rate
extreme values are present. It is used instead of linear of death is similar over time, the Peto log-rank test and log-
regression/correlation for two measurement variables if rank test will produce the similar results. Log-Rank test is
you're worried about non-normality, but this is not usually more appropriate than the Peto generalized Wilcoxon test
necessary. Spearman correlation coefficient solely tests for when the alternative hypothesis is that the risk of death for an
monotonous relationships for at least ordinally scaled individual in one group is proportional to the risk at that time
parameters. The advantages of the latter are its robustness to for a similar individual in the other group. In additions, the
outliers and skew distributions. Correlation coefficients validity of this proportional risk assumption can be elucidated
measure the strength of association and can have values by the survivor functions of both groups. If it is clear they do
between –1 and +1. The closer they are to –1 or +1, the stronger the association. A test variable and a statistical test can be constructed from the correlation coefficient. The null hypothesis to be tested is then that there is no linear (or monotonic) correlation.

Cochran Armitage trend test
In clinical research, it is often of interest to investigate the relationship between increasing dosage and the effect of the drug under study. Usually the dose levels tested are ordinal, and the effect of the drug is measured as a binary outcome. In this case, the Cochran-Armitage trend test is used to test for a trend among binomial proportions across the levels of a single factor or covariate [34, 35]. This test is appropriate for a two-way table where one variable has two levels and the other variable is ordinal. The two-level variable represents the response, and the other variable represents an explanatory variable with ordered levels.

Mantel Haenszel (MH) test
The Mantel-Haenszel (MH) statistic is used to analyze two dichotomous variables while adjusting for a third variable, that is, to determine whether there is a relationship between the two variables while controlling for the levels of the third variable. For example, it can compare the frequency of smoking versus non-smoking in teenage boys versus girls across several different cities, treated as replicated 2x2 tables.

Cochran Mantel Haenszel (CMH) test
The Cochran-Mantel-Haenszel test is a non-model-based test used to identify confounders and to control for confounding in the statistical analysis. It is used to test conditional independence in 2x2xK tables. The test is often used to compare response rates between two treatment groups in a multi-center study using the study centers as strata.

If the survival curves do not cross each other, then the proportional-hazards assumption is quite probably satisfied and the log-rank test should be used; in the other case, the Peto log-rank test is used instead.

Odds Ratio (OR)
The odds ratio is the ratio of the odds of disease in the exposed group to the odds of disease in the non-exposed group. It is used to measure the association between a particular outcome (or disease) and a certain factor (or exposure). In addition, the odds ratio is a relative measure of risk, telling us how much more likely it is that someone who is exposed to the factor under study will develop the outcome compared to someone who is not exposed.
For a 2x2 contingency table:
OR=1 suggests there is an equal chance of getting the disease in the exposed group compared to the unexposed group.
OR>1 suggests there is a greater chance of getting the disease in the exposed group compared to the unexposed group.
OR<1 suggests there is a lesser chance of getting the disease in the exposed group compared to the unexposed group.
The odds ratio can be used in both retrospective and prospective studies. It is useful for analysing associations between groups from case-control and prevalent (or cross-sectional) data; for rare diseases (or diseases with long latency periods), the OR can serve as an approximate measure of the RR (relative risk) and can be used to estimate the strength of an association between exposures and outcomes.

Relative Risk (RR)
The risk of disease is the probability of an individual becoming newly diseased given that the individual has the particular attribute. The relative risk is the ratio of the risk of disease for those with the risk factor to the risk of disease for those without the risk factor.
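Both measures can be computed directly from a 2x2 table. A minimal sketch in Python, using hypothetical counts (the Wald interval shown for the OR is one common large-sample approximation, not the only choice):

```python
import math

# Hypothetical 2x2 table of exposure vs. disease:
#                 Disease   No disease
# Exposed             20          80
# Unexposed           10          90
a, b, c, d = 20, 80, 10, 90

# Odds ratio: odds of disease in the exposed vs. the unexposed group
odds_ratio = (a * d) / (b * c)

# Relative risk: risk of disease in the exposed vs. the unexposed group
relative_risk = (a / (a + b)) / (c / (c + d))

# Approximate 95% CI for the OR from the standard error of log(OR)
se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
ci_low = math.exp(math.log(odds_ratio) - 1.96 * se_log_or)
ci_high = math.exp(math.log(odds_ratio) + 1.96 * se_log_or)

print(f"OR = {odds_ratio:.2f}, 95% CI ({ci_low:.2f}, {ci_high:.2f}), "
      f"RR = {relative_risk:.2f}")
```

With these counts the OR (2.25) exceeds the RR (2.00), illustrating that the OR only approximates the RR when the outcome is rare.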
In clinical research, the relative risk is used to compare the risk of developing a disease in people not receiving the treatment (or receiving a placebo) versus people who are receiving the treatment. Alternatively, it is used to compare the risk of developing a side effect in people receiving a drug with that in people who are not receiving the treatment.
For a 2x2 contingency table:
RR=1 implies that the two groups (exposed and unexposed) have the same risk.
RR>1 implies a higher risk of getting the disease in the exposed group compared to the unexposed group.
RR<1 implies a lower risk of getting the disease in the exposed group compared to the unexposed group.

Sensitivity, specificity, Predictive Value Positive Test (PPT) and Predictive Value Negative Test (NPT)
Sensitivity: the sensitivity of a test is its ability to identify correctly those who have the disease; it is the proportion of patients with the disease in whom the test is positive.
Specificity: the specificity of a test is its ability to identify correctly those who do not have the disease; it is the proportion of patients without the disease in whom the test is negative.
Predictive Value Positive Test (PPT): the predictive value of a positive test is the likelihood that an individual with a positive test has the disease.
Predictive Value Negative Test (NPT): the predictive value of a negative test is the likelihood that an individual with a negative test does not have the disease.

Simpson's Paradox
Simpson's paradox, also known as the Yule-Simpson effect, was first described by Yule [36] and is named after Simpson [37]. In clinical research, Simpson's paradox arises when the association between an exposure and an outcome is investigated but both the exposure and the outcome are strongly associated with a third variable. A real-life example comes from a medical study comparing the success rates of two treatments for kidney stones [38].

Tests for Linear Trend
In a clinical study the researcher may be interested in a dose-response effect, that is, a situation in which an increased value of the risk factor means a greater likelihood of disease. A test for linear trend is used whenever the risk factor has ordered levels (i.e. the risk factor is ordinal or at least treated as such); Armitage described the details of the theory [34]. For example, it can be used to test whether the prevalence of cough increases with greater amounts of smoking.

Tests for Nonlinearity
Sometimes the relationship between the risk factor and the disease is nonlinear. For example, it could be that low and high doses of the risk factor are harmful compared with average doses. Such a U-shaped relationship has been found by several authors who have investigated the relationship between alcohol consumption and death from any cause and tested for a nonlinear relationship [39].

Permutation test
The permutation test is a nonparametric test used to find differences between treatment groups in the assessment of new medical interventions. In addition, it is used to study efficacy in a randomized clinical trial which compares, in a heterogeneous patient population, two or more treatments, each of which may be most effective in some patients, when the primary analysis does not adjust for covariates. A general discussion and application of the permutation test is given by Zucker DM [40].

3. Conclusion
Statistical tests are used to analyze different types of data in different situations, according to the nature of the data set. Each statistical test has its limitations, and another method may be used to overcome them. Before using a statistical test in clinical research, we need to check its assumptions and the type of study. These statistical tests play a very important role in obtaining appropriate results in clinical research and in making decisions on the study objectives. Statistical tests help researchers and physicians to interpret results from experiments, from clinical research in medicine, and from the symptoms of diseases. The use of statistical tests in medicine provides generalizations that help the public better understand their risks for certain diseases, the links between certain behaviors and diseases, the effectiveness of drugs, and the significance of experimental findings.

4. References
1. Wang D, Bakhai A. Clinical Trials: A Practical Guide to Design, Analysis, and Reporting. Remedica Publishing, USA, 2006.
2. Campbell MJ. Statistics at Square Two (2nd Ed.). Blackwell, USA, 2006.
3. Box JF. Guinness, Gosset, Fisher, and Small Samples. Statistical Science. 1987; 2(1):45-52.
4. Walker GA, Shostak J. Common Statistical Methods for Clinical Research with SAS® Examples (3rd Ed.). SAS Publishing, USA, 2010.
5. Mahajan BK. Methods in Biostatistics for Medical Students and Research Workers (7th Ed.). Jaypee, India, 2010.
6. Hotelling H. The generalization of Student's ratio. Ann Math Stat. 1931; 2(3):360-378.
7. Scheffé H. The Analysis of Variance (Classics Ed.). John Wiley & Sons, USA, 1999.
8. Chow SC, Liu JP. Design and Analysis of Clinical Trials: Concepts and Methodologies (2nd Ed.). John Wiley & Sons, New Jersey, 2004.
9. Bartlett MS. Properties of sufficiency and statistical tests. Proceedings of the Royal Society of London Series A. 1937; 160:268-282.
10. Hochberg Y. A sharper Bonferroni procedure for multiple tests of significance. Biometrika. 1988; 75(4):800-802.
11. Motulsky H. Intuitive Biostatistics: A Nonmathematical Guide to Statistical Thinking (2nd Ed.). Oxford University Press, New York, NY, 2010.
12. Herve Abdi, Lynne JW. Newman-Keuls test and Tukey test. In: Neil Salkind (Ed.), Encyclopedia of Research Design. Thousand Oaks, CA: Sage, 2010.
13. McHugh ML. Multiple comparison analysis testing in ANOVA. Biochemia Medica. 2011; 21(3):203-9.
14. Scheffé H. The Analysis of Variance. New York: Wiley, 1959.
15. Dunnett CW. A multiple comparison procedure for comparing several treatments with a control. Journal of the American Statistical Association. 1955; 50:1096-1121.
16. Dunnett CW. New tables for multiple comparisons with a control. Biometrics. 1964; 20:482-491.
17. Singh V, Rana RK, Singhal R. Analysis of repeated measurement data in the clinical trials. Journal of Ayurveda and Integrative Medicine. 2013; 4(2):77-81.
18. Liang KY, Zeger S. Longitudinal data analysis of continuous and discrete responses for pre-post designs. The Indian Journal of Statistics. 2000; 62:134-148.
19. Fisher RA. Statistical Methods for Research Workers. Genesis Publishing Pvt Ltd, 1925.
20. Sarmukaddan SB. Clinical Biostatistics (1st Ed.). New Age International, India, 2014.
21. Jackson LA, Peterson D, Nelson JC, et al. Vaccination site and risk of local reactions in children one through six years of age. Pediatrics. 2013; 131:283-289.
22. Stuart A. A test for homogeneity of the marginal distributions in a two-way classification. Biometrika. 1955; 42:412-416.
23. Maxwell AE. Comparing the classification of subjects by two independent judges. British Journal of Psychiatry. 1970; 116:651-655.
24. Bhapkar VP. A note on the equivalence of two test criteria for hypotheses in categorical data. Journal of the American Statistical Association. 1966; 61:228-235.
25. Keefe TJ. On the relationship between two tests for homogeneity of the marginal distributions in a two-way classification. Biometrics. 1982; 69:683-684.
26. Agresti A. Categorical Data Analysis (2nd Ed.). John Wiley & Sons, New Jersey, 2002.
27. Bland JM, Altman DG. Statistical methods for assessing agreement between two methods of clinical measurement. Lancet. 1986; i:307-10.
28. Cronbach LJ. Coefficient alpha and the internal structure of tests. Psychometrika. 1951; 16:297-334.
29. Bland JM, Altman DG. Statistics notes: Cronbach's alpha. British Medical Journal. 1997; 314:572.
30. Lilliefors H. On the Kolmogorov-Smirnov test for normality with mean and variance unknown. JASA. 1967; 62:399-402.
31. Sprent P, Smeeton NC. Applied Nonparametric Statistical Methods (4th Ed.). Chapman and Hall/CRC, Florida, 2001.
32. Alireza AB, Shima Y, Sara J, Maryam Y, Farid Z, Farid AJ. How to test normality distribution for a variable: a real example and a simulation study. Journal of Paramedical Sciences (JPS). 2013; 4(1):73-77.
33. Altman DG. Practical Statistics for Medical Research. Chapman & Hall/CRC, 1990.
34. Armitage P. Tests for linear trends in proportions and frequencies. Biometrics. 1955; 11:375-386.
35. Cochran WG. Some methods for strengthening the common chi-square tests. Biometrics. 1954; 10:417-451.
36. Yule G. Notes on the theory of association of attributes in statistics. Biometrika. 1903; 2:121-134.
37. Simpson EH. The interpretation of interaction in contingency tables. Journal of the Royal Statistical Society, Series B. 1951; 13:238-241.
38. Julious SA, Mullee MA. Confounding and Simpson's paradox. British Medical Journal. 1994; 309:1480-1481.
39. Duffy JC. Alcohol consumption and all-cause mortality. International Journal of Epidemiology. 1995; 24(1):100-5.
40. Zucker DM. Permutation tests in clinical trials. Wiley Encyclopedia of Clinical Trials, 2007. (http://pluto.mscc.huji.ac.il/~mszucker/DESIGN/perm.pdf)