A Handbook of Applied Statistics in Pharmacology - 2013
Katsumi Kobayashi
Safety Assessment Division, Chemical Management Center
National Institute of Technology and Evaluation (NITE)
Tokyo, Japan
K. Sadasivan Pillai
Frontier Life Science Services
(A Unit of Frontier Lifeline Hospitals)
Thiruvallur District
Chennai, India
A SCIENCE PUBLISHERS BOOK
CRC Press
Taylor & Francis Group
6000 Broken Sound Parkway NW, Suite 300
Boca Raton, FL 33487-2742
© 2013 by Taylor & Francis Group, LLC
CRC Press is an imprint of Taylor & Francis Group, an Informa business
This book contains information obtained from authentic and highly regarded sources. Reasonable
efforts have been made to publish reliable data and information, but the author and publisher cannot
assume responsibility for the validity of all materials or the consequences of their use. The authors and
publishers have attempted to trace the copyright holders of all material reproduced in this publication
and apologize to copyright holders if permission to publish in this form has not been obtained. If any
copyright material has not been acknowledged please write and let us know so we may rectify in any
future reprint.
Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced,
transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or
hereafter invented, including photocopying, microfilming, and recording, or in any information
storage or retrieval system, without written permission from the publishers.
For permission to photocopy or use material electronically from this work, please access
www.copyright.com (http://www.copyright.com/) or contact the Copyright Clearance Center, Inc. (CCC), 222
Rosewood Drive, Danvers, MA 01923, 978-750-8400. CCC is a not-for-profit organization that
provides licenses and registration for a variety of users. For organizations that have been granted a
photocopy license by the CCC, a separate system of payment has been arranged.
Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are
used only for identification and explanation without intent to infringe.
Visit the Taylor & Francis Web site at
http://www.taylorandfrancis.com
and the CRC Press Web site at
http://www.crcpress.com
Foreword
to assimilate. The examples used in the book are similar to those that the
scientists encounter regularly in their research. The authors have provided
cognitive clues for selection of an appropriate statistical tool to analyse the
data obtained from the studies and also how to interpret the result of the
statistical analysis.
Contents
Foreword
Preface
1. Probability
Probability and Possibility
Probability - Examples
Probability Distribution
Cumulative Probability
Probability and Randomization
2. Distribution
History
Variable
Stem-and-Leaf Plot
Box-and-Whisker Plot
3. Mean, Mode, Median
Average and Mean
Mean
Geometric Mean
Harmonic Mean
Weighted Mean
Mode
Median
4. Variance, Standard Deviation, Standard Error, Coefficient of Variation
Variance
Standard Deviation (SD)
Standard Error (SE)
Coefficient of Variation (CV)
When to Use a Standard Deviation (SD)/Standard Error (SE)?
9. Correlation Analysis
Correlation and Association
Pearson's Product Moment Correlation Coefficient
Significance of r
Confidence Interval of Correlation Coefficient
Coefficient of Determination
Rank Correlation
Spearman's Rank Correlation
Canonical Correlation
Misuse of Correlation Analysis
10. Regression Analysis
History
Linear Regression Analysis
Confidence Limits for Slope
Comparison of Two Regression Coefficients
R²
Multiple Linear Regression Analysis
Polynomial Regression
Misuse of Regression Analysis
11. Multivariate Analysis
Analysis of More than Two Groups
One-way ANOVA
post hoc Comparison
Dunnett's multiple comparison test
Tukey's multiple range test
Williams's test
Duncan's multiple range test
Scheffé's multiple comparison test
Two-way ANOVA
Dunnett's Multiple Comparison Test and Student's t Test - A Comparison
12. Non-Parametric Tests
Non-parametric and Parametric Tests - Assumptions
Sign Tests
Calculation procedure of sign test for small sample size
Calculation procedure of sign test for large sample size
Probability - Examples
Let us define probability using the frequency approach. The
probability of an event labeled A is defined as the ratio of
the number of events where event A occurs to the total number of possible
events that could occur (Selvin, 2004).
Let us understand some basic notations of probability:
P denotes probability.
If you toss a coin, only two events can occur: either a head up or a tail
up.
P(H) denotes the probability that the head is up. You can calculate the
probability of the head coming up using the formula:
\[ P(H) = \frac{\text{Number of times head is up}}{\text{Number of times head is up} + \text{Number of times tail is up}} \]
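The frequency definition of P(H) can be tried out numerically. The sketch below is only an illustration: the ten recorded tosses are hypothetical data, and the simulated coin is assumed to be fair.

```python
import random


def probability_of_heads(tosses):
    """Return P(H) = heads / (heads + tails) for a sequence of 'H'/'T' outcomes."""
    heads = tosses.count("H")
    tails = tosses.count("T")
    return heads / (heads + tails)


# Hypothetical record of ten tosses (illustrative data, not from the book).
observed = ["H", "T", "H", "H", "T", "T", "H", "T", "H", "T"]
p_observed = probability_of_heads(observed)

# Simulated tosses of a fair coin; the seed is fixed only for repeatability.
rng = random.Random(1)
simulated = [rng.choice("HT") for _ in range(10_000)]
p_simulated = probability_of_heads(simulated)

print(f"P(H) from 10 recorded tosses:      {p_observed:.3f}")
print(f"P(H) from 10,000 simulated tosses: {p_simulated:.3f}")
```

With a large number of tosses the observed ratio is expected to settle close to 0.5 for a fair coin.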
Probability Distribution
Let us try to understand probability distribution with the help of an
example. You flip a coin twice. In this example the variable H is the number
of heads that results from flipping the coin. There are only 3 possibilities:
H=0
H=1
H=2
Let us calculate the probabilities of the above occurrences of head up.
The probability of a head not coming up on either flip (H=0) = 0.25
The probability of a head coming up on exactly one flip (H=1) = 0.5
The probability of a head coming up on both flips (H=2) = 0.25
The values 0.25, 0.5 and 0.25 form the probability distribution of H.
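The distribution of H can be checked by listing every equally likely outcome of two flips and counting the heads in each. This is a minimal sketch that assumes a fair coin; the variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All equally likely outcomes of two flips: HH, HT, TH, TT.
outcomes = list(product("HT", repeat=2))

# Count how many outcomes give 0, 1 or 2 heads.
counts = Counter(outcome.count("H") for outcome in outcomes)

# Probability of each value of H = favourable outcomes / all outcomes.
distribution = {h: Fraction(c, len(outcomes)) for h, c in sorted(counts.items())}

for h, p in distribution.items():
    print(f"P(H={h}) = {p} = {float(p)}")
```

The probabilities sum to 1, as they must for a complete probability distribution.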
Cumulative Probability
A cumulative probability is a sum of probabilities. It refers to the probability
that the value of a random variable falls within a specified range.
You toss a die. What is the probability that the die will land on a
number that is smaller than 4? The 6 possible outcomes when a die is
tossed are 1, 2, 3, 4, 5 and 6.
The probability that the die will land on a number smaller than 4:
P(X < 4) = P(X = 1) + P(X = 2) + P(X = 3) = 1/6 + 1/6 + 1/6 = 1/2
The probability that the die will land on 4 or a number smaller than 4:
P(X ≤ 4) = P(X = 1) + P(X = 2) + P(X = 3) + P(X = 4) = 1/6 + 1/6 + 1/6 +
1/6 = 2/3
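The two sums above can be reproduced with exact fractions. The sketch assumes a fair six-sided die; the helper name `cumulative` is illustrative.

```python
from fractions import Fraction

# Probability of each face of a fair six-sided die.
pmf = {face: Fraction(1, 6) for face in range(1, 7)}


def cumulative(pmf, condition):
    """Sum the probabilities of all outcomes that satisfy the condition."""
    return sum((p for x, p in pmf.items() if condition(x)), Fraction(0))


p_less_than_4 = cumulative(pmf, lambda x: x < 4)   # P(X < 4)
p_at_most_4 = cumulative(pmf, lambda x: x <= 4)    # P(X <= 4)

print("P(X < 4)  =", p_less_than_4)
print("P(X <= 4) =", p_at_most_4)
```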
Cumulative probability is commonly used in the analysis of data obtained
from pharmacological (Kuo et al., 2009; Rajasekaran et al., 2009) and
toxicological experiments.
References
Carnap, R. (1995): Introduction to the Philosophy of Science. Dover Publications, Inc.,
New York, USA.
Keynes, J.M. (1921): A Treatise on Probability. Macmillan, London, UK.
Kuo, S.P., Bradley, L.A. and Trussell, L.O. (2009): Heterogeneous kinetics and
pharmacology of synaptic inhibition in the chick auditory brainstem. J. Neurosci., 29
(30), 9625-9634.
Rajasekaran, K., Sun, C. and Bertram, E.H. (2009): Altered pharmacology and GABA-A
receptor subunit expression in dorsal midline thalamic neurons in limbic epilepsy.
Neurobiol. Res., 33(1), 119-132.
Murphy, E.A. (1985): A Companion to Medical Statistics. Johns Hopkins University Press,
Baltimore, USA.
Selvin, S. (2004): Biostatistics - How It Works. Pearson Education (Singapore) Pte. Ltd.,
India Branch, Delhi, India.
Spiegel, M.R., Schiller, J.J., Srinivsan, R.A. and LeVan, M. (2002): Probability and
Statistics. The McGraw Hill Companies, Inc., USA.
2
Distribution
History
The most commonly used probability distribution is the normal
distribution. The history of the normal distribution goes back to the 1700s.
Abraham de Moivre, a French-born mathematician, introduced the normal
distribution in 1733. A French astronomer and mathematician,
Pierre-Simon Laplace, dealt with the normal distribution in 1778, when he
derived the central limit theorem. In 1809 Johann Carl Friedrich Gauss
(1777-1855), a German physicist and mathematician, studied the normal
distribution extensively and used it for analysing astronomical data. The normal
distribution curve is also called the Gaussian distribution after Johann Carl
Friedrich Gauss, who recognized that the errors of repeated measurements
of an object are normally distributed (Black, 2009).
Variable
We need to understand a term very commonly used in statistics,
i.e., variable. The variable is the fundamental element of statistical analysis.
Variables are broadly classified into categorical (attribute) and quantitative
variables. Categorical and quantitative variables are each further classified into
two subgroups: categorical variables into nominal and ordinal, and
quantitative variables into discrete and continuous.
Nominal variable: The key feature of nominal variables is that the
observation is not a number but a word (for example, male or female, blood
types). Nominal variables cannot be ordered. It makes no difference if you
write the blood types in the order A, B, O, AB or AB, O, B, A.
Ordinal variable: Here the variable can be ordered (ranked); the data can
be arranged in a logical manner. For example, intensity of pain can be
ordered as mild, moderate and severe.
Discrete variable: A discrete variable results from counting. It can be 0 or
a positive integer value. For example, the number of leucocytes in a µl of
blood.
Continuous variable: A continuous variable results from measuring. For
example, alkaline phosphatase activity in a dl of serum.
Variables can be independent or dependent. In a 90-day repeated
dose administration study you measure the body weight of rats at weekly
intervals. In this situation week is the independent variable and the body
weight of the rats is the dependent variable.
Stem-and-Leaf Plot
The stem-and-leaf plot (Tukey, 1977) is an elegant way of describing the
data (Belle et al., 2004). Let us construct a stem-and-leaf plot of the body
weight of rats given in Table 2.1.
Table 2.1. Body weight of rats
Body weight (g)
132, 139, 134, 141, 145, 141, 140, 166, 154, 165, 145, 158, 162, 148, 154, 146, 154,
148, 140, 153, 154
Each value is split into a leaf (the last digit) and a stem (the first two
digits). For example, 132 is split into 13, which forms the stem, and 2,
which forms the leaf. The stem values are listed down (in this example
13, 14, 15 and 16) and the leaf values are listed on the right side of the
stem values.
The stem-and-leaf plot provides valuable information on the
distribution of the data. For example, the plot indicates that most
of the animals have a body weight in the 140 g range, followed by the
150 g range.
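The construction described above can be sketched in a few lines, using the body weights of Table 2.1. The function name `stem_and_leaf` is illustrative, and the text layout of the printed plot is only one possible rendering.

```python
from collections import defaultdict

# Body weight (g) of rats from Table 2.1.
body_weights = [132, 139, 134, 141, 145, 141, 140, 166, 154, 165, 145,
                158, 162, 148, 154, 146, 154, 148, 140, 153, 154]


def stem_and_leaf(values):
    """Split each value into a stem (all but the last digit) and a leaf (last digit)."""
    plot = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)
        plot[stem].append(leaf)
    return dict(plot)


plot = stem_and_leaf(body_weights)
for stem, leaves in plot.items():
    print(stem, "|", " ".join(str(leaf) for leaf in leaves))
```

The row for stem 14 is the longest, followed by the row for stem 15, which matches the description in the text.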
Box-and-Whisker Plot
Another way of describing the data is by constructing a box-and-whisker
plot. The usefulness of the box-and-whisker plot is better understood by
learning how to construct it. For this purpose we shall use the same body
weight data given in Table 2.1. As we have done for plotting the
stem-and-leaf plot, arrange the data in ascending order (Table 2.2). The first
step in constructing a box-and-whisker plot is to find the median. You will
learn more about the median in Chapter 3.
The median of the data given in Table 2.2 is the 11th value, i.e., 148
(see Table 2.3).
Table 2.3. Median value of the body weight data
132, 134, 139, 140, 140, 141, 141, 145, 145, 146, [148], 148, 153, 154, 154, 154, 154, 158,
162, 165, 166
Median: 148 (the 11th value, shown in brackets)
The median divides the data into 2 halves (a lower and an upper half).
The lower half consists of a range of values from 132 to 146 and the upper
half consists of a range of values from 148 to 166 (see Table 2.4).
Table 2.4. Median value of the lower and upper quartiles
Lower half: 132, 134, 139, 140, 140, 141, 141, 145, 145, 146
Median: 148
Upper half: 148, 153, 154, 154, 154, 154, 158, 162, 165, 166
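The splitting step can be sketched as follows. The median is excluded from both halves, as in Table 2.4. Taking the median of each half as the lower and upper hinge, and their difference as the hinge spread, is an assumption about the next step of the construction, which is not shown here.

```python
from statistics import median

# Body weight (g) of rats from Table 2.1.
body_weights = [132, 139, 134, 141, 145, 141, 140, 166, 154, 165, 145,
                158, 162, 148, 154, 146, 154, 148, 140, 153, 154]

ordered = sorted(body_weights)
n = len(ordered)
middle = n // 2                      # index of the 11th value when n = 21

overall_median = median(ordered)

# With an odd number of values the median itself belongs to neither half.
lower_half = ordered[:middle]
upper_half = ordered[middle + 1:] if n % 2 else ordered[middle:]

# Assumed next step: hinges are the medians of the two halves.
lower_hinge = median(lower_half)
upper_hinge = median(upper_half)
hinge_spread = upper_hinge - lower_hinge

print("Median:", overall_median)
print("Lower hinge:", lower_hinge, "Upper hinge:", upper_hinge)
print("Hinge spread:", hinge_spread)
```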
References
Belle, G.V., Fisher, L.D., Heagerty, P.J. and Lumley, T. (2004): Biostatistics - A Method for
the Health Sciences. 2nd Edition, Wiley Interscience, New Jersey, USA.
Black, K. (2009): Business Statistics: Contemporary Decision Making. 6th Edition. John
Wiley and Sons, Inc., USA.
Crow, J.F. (1993): Francis Galton: Count and measure, measure and count. Genetics, 135,
1-4.
Tukey, J.W. (1977): Exploratory Data Analysis. Addison-Wesley, Reading, Massachusetts,
USA.
3
Mean, Mode, Median
Mean
The procedure for calculating the mean is very simple: the sum of all individual
observations is divided by the number of observations. There are
several types of means, such as arithmetic mean, geometric mean and
harmonic mean. Let us work out the example given in Table 3.1 to
familiarise the reader with the calculation procedure of these means.
Table 3.1. Calculation of arithmetic mean of body weight of rats
Body weight (g): 132, 139, 134, 141, 145, 141, 140, 166, 186, 183
Sum (ΣX): 1507
\[ \bar{X} = \frac{\sum X}{N} = \frac{1507}{10} = 150.7 \text{ g} \]
The mean in the above example is called the arithmetic mean. The arithmetic
mean is sensitive to extreme values in the data set. There is a condition for
calculating the arithmetic mean: the data should fit a normal distribution.
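The calculation in Table 3.1 and the sensitivity to extreme values can both be sketched briefly. The second data set, in which one value is replaced by 283 g, is a hypothetical example added only to show the effect of an outlier.

```python
# Body weight (g) of rats from Table 3.1.
body_weights = [132, 139, 134, 141, 145, 141, 140, 166, 186, 183]

total = sum(body_weights)            # sum of all observations
n = len(body_weights)                # number of observations
arithmetic_mean = total / n

# Hypothetical data: the last rat is replaced by an extreme value of 283 g.
with_outlier = body_weights[:-1] + [283]
mean_with_outlier = sum(with_outlier) / len(with_outlier)

print("Arithmetic mean:", arithmetic_mean, "g")
print("Mean with one extreme value:", mean_with_outlier, "g")
```

A single extreme value among ten observations shifts the mean by 10 g in this sketch.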
Geometric Mean
Mathematically geometric mean is defined as the nth root of the product
of n numbers. An easy way to calculate the geometric mean is to find the
mean of logarithmic values of the data and then to find the antilog of
the mean. Steps involved in the calculation of the geometric mean of
the body weight data of the rats (Table 3.1) are given in Table 3.2.
Table 3.2. Calculation of geometric mean of body weight of rats
Linear scale, body weight (g): 132, 139, 134, 141, 145, 141, 140, 166, 186, 183; sum = 1507; N = 10; mean = 150.7
Log scale: 2.12, 2.14, 2.13, 2.15, 2.16, 2.15, 2.15, 2.22, 2.27, 2.26; sum = 21.7; N = 10; mean = 2.17
Geometric mean is the antilog of 2.17 = 147.9
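The log-and-antilog route can be sketched as below, with the nth-root definition as a cross-check. Note that the sketch carries the logarithms at full precision, so its result (roughly 149.6 g) differs slightly from the 147.9 obtained in Table 3.2, where the mean log is rounded to 2.17 before taking the antilog.

```python
import math

# Body weight (g) of rats from Table 3.1.
body_weights = [132, 139, 134, 141, 145, 141, 140, 166, 186, 183]
n = len(body_weights)

# Route 1: mean of the base-10 logarithms, then the antilog.
logs = [math.log10(w) for w in body_weights]
mean_log = sum(logs) / n
geometric_mean = 10 ** mean_log

# Route 2: nth root of the product of the n values (the definition).
nth_root = math.prod(body_weights) ** (1 / n)

print(f"Mean of logs: {mean_log:.4f}")
print(f"Geometric mean (antilog of mean log): {geometric_mean:.1f} g")
print(f"Geometric mean (nth root of product): {nth_root:.1f} g")
```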
Harmonic Mean
Harmonic mean is calculated by finding the mean of the reciprocals of the
values and then finding the reciprocal of that mean.
The calculation procedure of the harmonic mean of the data given in Table 3.1
is described in Table 3.3:
Table 3.3. Calculation of harmonic mean of body weight of rats
Linear scale, body weight (g): 132, 139, 134, 141, 145, 141, 140, 166, 186, 183; sum = 1507; N = 10; mean = 150.7
Reciprocal: 0.0076, 0.0072, 0.0075, 0.0071, 0.0069, 0.0071, 0.0071, 0.0060, 0.0054, 0.0055; sum = 0.0673; N = 10; mean = 0.0067
Harmonic mean = 1/0.0067 = 148.5 g
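The same two steps can be written out directly and compared with the standard library's `statistics.harmonic_mean`. This is a minimal sketch using the Table 3.1 data.

```python
from statistics import harmonic_mean

# Body weight (g) of rats from Table 3.1.
body_weights = [132, 139, 134, 141, 145, 141, 140, 166, 186, 183]

# Step 1: mean of the reciprocals.
reciprocals = [1 / w for w in body_weights]
mean_reciprocal = sum(reciprocals) / len(reciprocals)

# Step 2: reciprocal of that mean.
hm = 1 / mean_reciprocal

print(f"Mean of reciprocals: {mean_reciprocal:.4f}")
print(f"Harmonic mean: {hm:.1f} g")
print(f"statistics.harmonic_mean: {harmonic_mean(body_weights):.1f} g")
```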
Weighted Mean
In an experiment designed to administer a drug to rats, 15 rats were
randomly assigned to 3 cages (Cage 1, Cage 2 and Cage 3), each cage
consisting of 5 rats. At the end of 2 weeks of the drug administration, two
rats each survived in Cages 1 and 2, whereas in Cage 3 all five rats
survived. The body weight of the surviving rats is given in Table 3.4.
Table 3.4. Body weight (g) of rats in 3 Cages at the end of 2 weeks following a drug
administration
Cage N Mean (g)
1 2 119
2 2 125
3 5 134
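A weighted mean gives each cage mean a weight equal to the number of rats behind it. The sketch below applies the usual formula, sum of (N × mean) divided by sum of N, to Table 3.4; the simple mean of the three cage means is shown only for comparison.

```python
# (number of surviving rats, mean body weight in g) for each cage, from Table 3.4.
cages = [(2, 119), (2, 125), (5, 134)]

total_rats = sum(n for n, _ in cages)

# Each cage mean is weighted by the number of rats it is based on.
weighted_mean = sum(n * mean for n, mean in cages) / total_rats

# Unweighted mean of the three cage means, for comparison.
unweighted_mean = sum(mean for _, mean in cages) / len(cages)

print(f"Weighted mean:   {weighted_mean:.2f} g")
print(f"Unweighted mean: {unweighted_mean:.2f} g")
```

Because Cage 3 contributes five of the nine rats, the weighted mean lies closer to 134 g than the unweighted mean does.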
Mode
The mode is the value which appears the most in the data. It is usually
calculated for discrete data (Belle et al., 2004). There can be more than
one mode, if there is more than one value which appears the most.
In the following data,
130, 140, 140, 150, 140, 160, 140, 110, 120
The mode is 140 (140 appears 4 times in the data).
In the following data,
130, 140, 140, 150, 140, 160, 140, 110, 120, 130, 130
140 appears 4 times in the data, whereas 130 appears 3 times. By the
definition above the mode is still 140; 130 would be a second mode only
if it appeared as often as 140.
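Counting how often each value occurs makes it easy to see which values tie for the highest count. The sketch uses `statistics.multimode`, which returns every value sharing the highest frequency; the third data set is hypothetical and is added only to show a genuine tie.

```python
from collections import Counter
from statistics import multimode

first = [130, 140, 140, 150, 140, 160, 140, 110, 120]
second = first + [130, 130]

# Hypothetical data: one more 130 so that 130 and 140 both appear 4 times.
tied = second + [130]

modes_first = multimode(first)
modes_second = multimode(second)
modes_tied = sorted(multimode(tied))

print("Counts in second data set:", Counter(second).most_common(2))
print("Modes:", modes_first, modes_second, modes_tied)
```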
Median
As a measure of central tendency, the median is second in popularity to the mean
(Rosner, 2006). The median is also termed the 0.50 quantile or
the 50th percentile.
The first step in calculating the median is to rank the values from the lowest
to the highest. If the number of data values is odd, add 1 to the number of
data values and divide by 2. For example, if there are 9 sample values,
(9+1)/2 = 5, so the median is the 5th ranked value. If the number of
data values is even, again add 1 to the number of data values and divide
by 2. For example, if there are 10 sample values, (10+1)/2 = 5.5, so
the median is the mean of the 5th and 6th ranked values.
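The ranking rule can be sketched as a small function. The two samples are hypothetical values chosen to give 9 and 10 observations; `statistics.median` is used only as a cross-check.

```python
from statistics import median


def median_by_rank(values):
    """Median via the (N + 1) / 2 rank rule described in the text."""
    ordered = sorted(values)
    rank = (len(ordered) + 1) / 2          # e.g. 5 for N = 9, 5.5 for N = 10
    lower = int(rank)
    if rank == lower:                      # odd N: a single middle value
        return ordered[lower - 1]
    # even N: mean of the two values on either side of the fractional rank
    return (ordered[lower - 1] + ordered[lower]) / 2


# Hypothetical samples with 9 and 10 values.
nine_values = [12, 15, 11, 19, 14, 18, 13, 17, 16]
ten_values = nine_values + [20]

print("Median of 9 values: ", median_by_rank(nine_values))
print("Median of 10 values:", median_by_rank(ten_values))
```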
The second situation where the median is useful is when it is impractical
to measure all of the values, such as when you are measuring the time until
something happens. Survival time is a good example of this; in order to
determine the mean survival time, you have to wait until every individual
is dead, whereas to determine the median survival time you do not need
to wait until every individual is dead; you need to wait only until half the
individuals are dead.
Mean, mode and median are theoretically the same for the data collected
from a symmetrical distribution (Lemma, 2008). Median and mode are
not affected by the extreme values (outliers). One disadvantage of mode
is that it does not include all the data for the analysis. Though mean and
median are commonly used in statistical analysis of pharmacological and
toxicological data, the use of mode is not very common.
References
Belle, V.G., Fisher, L.D., Heagerty, P.J. and Lumley, T. (2004): Biostatistics - A
Methodology for the Health Sciences. John Wiley & Sons, Inc., New Jersey, USA.
Iwamoto, M., Wenning, L.A., Petry, A.S., Laethem, M., De Smet, M., Kost, J.T., Merschman,
S.A., Strohmaier, K.M., Ramael, S., Lasseter, K.C., Stone, J.A., Gottesdiener, K.M.
and Wagner, J.A. (2008): Safety, tolerability, and pharmacokinetics of Raltegravir
after single and multiple doses in healthy subjects. Clin. Pharmacol. Therapeutics,
83, 293-299.
Lemma, A. (2008): Introduction to the Practice of Psychoanalytic Psychotherapy. John
Wiley & Sons Ltd., Chichester, UK.
Lum, B.L., Tam, J., Kaubisch, S. and Flechner, S.M. (1992): Arithmetic versus harmonic
mean values for cyclosporin-A pharmacokinetic parameters. J. Clin. Pharmacol., 32
(10), 911014.
Variance
Even inbred animals maintained under well controlled animal house
conditions may show some variation among individuals in responding
to a treatment in a pharmacology or toxicology study. Though the majority of
the individual animals respond to the treatment in a similar manner or
magnitude, a few of them will be too sensitive or too resistant to the treatment.
There are several factors that may affect the outcome of an animal
experiment, for example factors related to the experimenter. In a
nutshell, even a well designed animal experiment is bound to show some
variation in the result, and it is important to understand this variation
when interpreting the experimental data. We shall work out an example to
make this clear.
For a pharmacology experiment, 5 rats are randomly picked up and
placed in a cage. As all the rats are of similar age and maintained in
identical animal house conditions, one would assume that all the animals
will have comparable body weight. The body weight of the rats is given
in Table 4.1.
It is evident from the Table that the assumption of all animals having
comparable body weight is incorrect. In animal experiments, one can
seldom get identical animals. There could be several differences (for
example difference in water and feed consumption, difference in activity,
difference in certain clinical chemistry parameters, etc.) among them.
These differences have an important role in determining the outcome of an
Initially, you thought that you had 5 degrees of freedom
before picking up any box. First, you picked up the red box, and your
degrees of freedom were reduced by 1 (5 - 1). Next you picked up
the yellow box, and your degrees of freedom were reduced by 2 (5 - 2).
When you picked up the black box, you had only 2 degrees of freedom
left. After picking up the green box, you had only 1 degree of freedom left.
But you cannot exercise any freedom in picking up the blue box. The blue box
is the last box left, and you have to pick it up without any choice.
Therefore, the actual degrees of freedom that one can exercise is not equal
to the total number of observations, but 1 less than the total number of
observations.
being equally scattered above and below these limits (Altman and Bland,
1995). Mean ± 3 SD covers 99.7% of the observations.
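The 99.7% figure can be checked against the normal distribution with the standard library. The sketch assumes the observations follow a normal distribution exactly; the coverage for 1 and 2 SD is included for comparison.

```python
from statistics import NormalDist

standard_normal = NormalDist()   # mean 0, SD 1


def coverage(k):
    """Proportion of a normal distribution lying within mean ± k SD."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(f"Mean ± {k} SD covers {coverage(k) * 100:.1f}% of the observations")
```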
Since SE is smaller than the SD (see Figure 4.1), some authors use SE,
perhaps intentionally, in order to make the variability of their samples
appear smaller (Streiner, 1996; Lang, 1997; Fisher, 2000).
Although SD and SE are related, they give two very different types of
information (Carlin and Doyle, 2000). In animal experiments, SD is generally
8-20% of the mean of the measured values; hence, the bar representing
the SD in a graph seems to be well balanced against the mean value. It
is not permitted to use SE intentionally just to show a narrower
bar (Matsumoto, 1990). The next question is how precisely the mean and SD
should be specified. The mean should not be specified with more than one
extra decimal place over the raw data, but for SD greater precision can be
given (Altman and Bland, 1996).
In conclusion, SD gives a fairly good indication of the distribution
of the observed values around the mean. SE gives an indication of the
variability of the mean. In toxicology experiments, especially with rodents,
where the number of animals in a group is usually 10, it would be
preferable to use SD, and in pharmacology experiments, where the number of
animals in a group is usually <5, it would be preferable to use SE, though
there is no hard and fast rule for these.
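The relation between the two statistics can be sketched with the usual definition SE = SD / √N. The data are the ten body weights of Table 3.1, reused here purely as an illustration.

```python
import math
from statistics import mean, stdev

# Body weight (g) of rats from Table 3.1, reused as an illustration.
body_weights = [132, 139, 134, 141, 145, 141, 140, 166, 186, 183]

n = len(body_weights)
sample_mean = mean(body_weights)
sd = stdev(body_weights)        # sample SD, with N - 1 in the denominator
se = sd / math.sqrt(n)          # standard error of the mean

print(f"N = {n}, mean = {sample_mean:.1f} g")
print(f"SD = {sd:.2f} g (spread of the individual values)")
print(f"SE = {se:.2f} g (uncertainty of the mean)")
```

Because SE is the SD divided by √N, an SE bar is always shorter than the SD bar for the same data whenever N is greater than 1.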
References
Altman, D.G. and Bland, J.M. (1995): Statistics notes: The normal distribution. BMJ, 310,
298.
Altman, D.G. and Bland, J.M. (1996): Presentation of numerical data. BMJ, 312, 572.
Altman, D.G. and Bland, J.M. (2005): Standard deviations and standard errors. BMJ, 331,
903.
Aoyama, H. (2005): Applications and limitations of in vivo bioassays for detecting
endocrine disrupting effects of chemicals on mammalian species of animals. J. Natl.
Inst. Public Health, 54(1), 29-34.
Carlin, J.B. and Doyle, L.W. (2000): Basic concepts of statistical reasoning: standard errors
and confidence intervals. J. Paediatr. Child Health, 36, 502-505.
Cumming, G. (2007): Error bars in experimental biology. JCB, 177(1), 7-11.
Daniel, W.W. (2007): Biostatistics - A Foundation of Analysis in the Health Sciences. 7th
Edition, John Wiley & Sons (Asia) Pte. Ltd., Singapore.
Everett, D.C. (2008): Explorations in statistics: standard deviations and standard errors.
Adv. Physiol. Educ., 32, 203-208.
Everett, D.C. and Benos, D.J. (2004): Guidelines for reporting statistics in journals
published by the American Physiological Society. Adv. Physiol. Educ., 28, 85-87.
Fisher, D.M. (2000): Research Design and Statistics in Anesthesia. In: Anesthesia, 5th
Edition, Vol. 1., Edited by Miller, R.D., Churchill Livingston, Philadelphia, USA.
Herxheimer, A. (1988): Misuse of standard error of the mean. Br. J. Clin. Pharmacol., 26,
197.
Kobayashi, K., Sakuratani, Y., Abe, T., Yamazaki, K., Nishikawa, S., Yamada, J., Hirose,
A., Kamata, E. and Hayashi, M. (2011): Influence of coefficient of variation in
determining significant difference of quantitative values obtained from 28-day
repeated-dose toxicity studies in rats. J. Toxicol. Sci., 36(1), 63-71.
Lang, T.A.S.M. (1997): How to report statistics in medicine: annotated guidelines for
authors, editors, and reviewers. American College of Physicians, Philadelphia, USA.
Matsuzawa, T., Nomura, M. and Unno, T. (1993): Clinical pathology reference ranges of
laboratory animals. J. Vet. Med. Sci., 55(3), 351-362.
Matsumoto, K. (1990): Japanese Laboratory Animal Engineer Society, No. 6.
Nagele, P. (2003): Misuse of standard error of the mean (SEM) when reporting variability
of a sample. A critical evaluation of four anaesthesia journals. Br. J. Anaesthesiol.,
90, 514-516.
Streiner, D.L. (1996): Maintaining standards: differences between the standard deviation
and standard error, and when to use each. Can. J. Psychiatry, 41, 498-502.
USP (2008): The United States Pharmacopeia, The National Formulary, USP 31, NF 26,
Asian Edition, Volume 1, Port City Press, Baltimore, USA.
5
Analysis of Normality and
Homogeneity of Variance
Analysis of Normality
The two types of departure from normality that are generally encountered
in statistical analysis are skewness and kurtosis. The mean and median are
different in a skewed distribution. Skewness can be positive or negative.
The data are positively skewed when the tail of the distribution curve is
extended towards more positive values, and negatively skewed
when the tail of the distribution curve is extended towards more negative
values (Čisar and Čisar, 2010).
Peakedness of a distribution is depicted by kurtosis. A distribution
can be platykurtic or leptokurtic. Platykurtic is more flat-topped and
Shapiro-Wilk's W test
Let us understand Shapiro-Wilk's W test in detail by working out an
example given in Table 5.1, body weight of F344 male rats. The data are
arranged in order.
The data in Table 5.1 are analysed using SAS-JMP and the statistics
are given in Tables 5.2 and 5.3. The body weight distribution is given in
Figure 5.2.
Table 5.2. Quantiles
100% Maximum 123.0
99.5% 123.0
97.5% 123.0
90.0% 122.5
75.0% Quartile 110.5
50.0% Median 101.0
25.0% Quartile 90.5
10.0% 72.5
2.5% 71.0
0.5% 71.0
0.0% Minimum 71.0
Note: The term quantile was introduced by Kendall (1940). Quantiles divide the
distribution such that there is a given proportion of observations below the quantile.
Quartiles and percentiles are quantiles. Quartiles divide the distribution into four equal parts
(0-25%, 25-50%, 50-75% and 75-100%). A percentile is the value of a variable below
which a certain percent of observations fall. For example, the 10th percentile is that position
in a data set which has 90% of data points above it, and 10% below it.
Shapiro-Wilk's W test
W Prob <W
0.987278 0.9891
Situation 2 (Thirty four observations, the observations of situation 1 are
used twice): 70, 80, 85, 90, 94, 99, 101, 102, 104, 105, 108, 111, 112, 114,
121, 125, 131, 70, 80, 85, 90, 94, 99, 101, 102, 104, 105, 108, 111, 112,
114, 121, 125, and 131. The distribution of the observations is given in
Figure 5.3b.
Statistics
Mean 103.05882
SD 15.765211
SE 2.7037114
Upper 95% mean 108.55957
Lower 95% mean 97.558081
N 34
Shapiro-Wilk's W test
W Prob <W
0.968746 0.5017
Situation 3 (Fifty one observations, the observations of situation 1 are used
thrice ): 70, 80, 85, 90, 94, 99, 101, 102, 104, 105, 108, 111, 112, 114, 121,
125, 131, 70, 80, 85, 90, 94, 99, 101, 102, 104, 105, 108, 111, 112, 114,
121, 125, 131, 70, 80, 85, 90, 94, 99, 101, 102, 104, 105, 108, 111, 112,
114, 121, 125, and 131. The distribution of the observations is given in
Figure 5.3c.
Statistics
Mean 103.05882
SD 15.686187
SE 2.1965056
Upper 95% mean 107.47063
Lower 95% mean 98.647012
N 51
Statistics
Mean 103.05882
SD 15.647118
SE 1.8974918
Upper 95% mean 106.84623
Lower 95% mean 99.271414
N 68
Shapiro-Wilk's W test
W Prob <W
0.954862 0.0383
The statistics given in Figure 5.3a to Figure 5.3d are consolidated in Table
5.5. Shapiro-Wilk's W test revealed a significant P when the number of
animals was 68, indicating a non-normal distribution.
Table 5.5. Change in power of Shapiro-Wilk's W test with the change in number of
animals
N    Mean   Coefficient of variation (%)   W          P
17   103    15.5                           0.987278   0.9891 (NS)
34   103    15.3                           0.968746   0.5017 (NS)
51   103    15.2                           0.959888   0.1486 (NS)
68   103    15.2                           0.954862   0.0383 (S)
W and P are from Shapiro-Wilk's W test.
NS-Not significant (normal distribution); S-Significant (non-normal distribution)
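The descriptive columns of Table 5.5 can be reproduced by repeating the 17 observations of Situation 1 one to four times. The sketch below uses only the standard library and does not compute the W statistic itself, which needs a statistics package; it simply shows that the mean and coefficient of variation barely change while N grows.

```python
import math
from statistics import mean, stdev

# The 17 observations of Situation 1 (listed again under Situation 2).
base = [70, 80, 85, 90, 94, 99, 101, 102, 104, 105, 108, 111, 112, 114,
        121, 125, 131]

rows = []
for copies in (1, 2, 3, 4):
    data = base * copies                 # observations used once, twice, ...
    n = len(data)
    m = mean(data)
    sd = stdev(data)                     # sample SD (N - 1 denominator)
    se = sd / math.sqrt(n)               # standard error of the mean
    cv = 100 * sd / m                    # coefficient of variation (%)
    rows.append((n, m, sd, se, cv))

for n, m, sd, se, cv in rows:
    print(f"N={n:2d}  mean={m:.5f}  SD={sd:.6f}  SE={se:.7f}  CV={cv:.1f}%")
```

The data themselves are unchanged from one situation to the next, so the shift in P across the rows of Table 5.5 reflects the larger N rather than a change in the shape of the data.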
where,
\[ V = \frac{\sum \left[ \text{Variance of each group} \times (N - 1) \right]}{(\text{Sum of } N) - \text{Number of groups}} \]
The calculated chi square value (χ² cal) is compared with the value given in the chi square
table (degrees of freedom, N = number of groups - 1) at the 5% probability level. If the computed
value is less than the table value, it is interpreted that the variances of the
groups are similar (no heterogeneity). It may be noted that Bartlett's test
is not suitable for detecting heterogeneity when the number of animals
in a group is very small.
et al., 2009). The reason for setting a 1% probability level for detecting
a significant difference could be the following: if a significant difference is
detected by Bartlett's test at the conventional 5% probability level, then
the data should be analysed using the non-parametric Dunnett type rank
sum test (joint type) (Yamazaki et al., 1981) and/or Dunn test (Hollander
and Wolfe, 1973), which have low detection power. When
the probability level is set at 1%, it is less likely that Bartlett's test
will indicate heteroscedasticity, because to
detect a significant difference at the 1% probability level, the chi square value
has to be larger than that required at the 5% probability level.
Table 5.6. Water consumption (g/week) of B6C3F1 female mice during week 13 of a
repeated dose administration study
Group   N    Mean   S.D.
1       10   43.8   9.0
2       10   35.4   3.4
3       10   31.9   1.5
4       10   30.7   2.1
P values of the tests for homogeneity of variance:
O'Brien 0.0459; Brown-Forsythe 0.0340; Levene 0.0014; Bartlett <0.0001
It is clear from the table that the sensitivity of Bartlett's test is the highest,
followed by Levene's test. O'Brien's and Brown-Forsythe's tests have
much lower sensitivity.
Brown-Forsythe's test is a modified Levene's test. Both
Brown-Forsythe's and Levene's tests use transformed values (Maxwell and
Delaney, 2004). It is more appropriate to use Levene's,
Brown-Forsythe's or O'Brien's tests (O'Brien, 1979; 1981) for testing the
homogeneity of variance of data that follow a non-normal distribution
(SAS, 1996). Kobayashi et al. (1999) suggested Levene's test for examining
homogeneity of variance of the data obtained from toxicity studies.
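Bartlett's statistic can be computed from the group sizes and standard deviations alone, so the Bartlett column of Table 5.6 can be checked approximately. The sketch below uses the standard textbook form of the statistic, which is assumed (not confirmed) to match the formula used in this chapter, and works from the rounded S.D. values in the table. The critical values 7.815 (5%) and 11.345 (1%) are the usual chi square table entries for 3 degrees of freedom.

```python
import math

# (N, S.D.) of the four groups in Table 5.6.
groups = [(10, 9.0), (10, 3.4), (10, 1.5), (10, 2.1)]


def bartlett_statistic(groups):
    """Bartlett's chi square statistic from group sizes and standard deviations."""
    k = len(groups)
    total_df = sum(n - 1 for n, _ in groups)            # (sum of N) - number of groups
    pooled_variance = sum((n - 1) * sd ** 2 for n, sd in groups) / total_df
    numerator = total_df * math.log(pooled_variance) - sum(
        (n - 1) * math.log(sd ** 2) for n, sd in groups)
    correction = 1 + (sum(1 / (n - 1) for n, _ in groups)
                      - 1 / total_df) / (3 * (k - 1))
    return numerator / correction


chi_square_cal = bartlett_statistic(groups)
degrees_of_freedom = len(groups) - 1

# Chi square table values for 3 degrees of freedom.
CRITICAL_5_PERCENT = 7.815
CRITICAL_1_PERCENT = 11.345

print(f"Chi square (calculated) = {chi_square_cal:.2f}, df = {degrees_of_freedom}")
print("Heterogeneous at 5%:", chi_square_cal > CRITICAL_5_PERCENT)
print("Heterogeneous at 1%:", chi_square_cal > CRITICAL_1_PERCENT)
```

The calculated value far exceeds both table values, which is consistent with the very small P reported for Bartlett's test in Table 5.6.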
References
Bartlett, M.S. (1937): Properties of sufficiency and statistical tests. Proceedings of the
Royal Statistical Society Series A, 160, 268-282.
Bradlee, J.V. (1968): Distribution-Free Statistical Tests. Prentice-Hall, Englewood Cliffs,
New Jersey, USA.
Brown, M.B. and Forsythe, A.B. (1974): Robust tests for equality of variances. J. Am. Stat.
Assoc., 69, 364-367.
Chakravarti, I.M, Laha, R.G. and Roy, J. (1967): Handbook of Methods of Applied
Statistics, Volume I, John Wiley and Sons, New York, USA.
Chen, E.H. (1971): The power of Shapiro-Wilk W test for normality in samples from
contaminated normal distribution. J. Am. Stat. Assoc., 66(336), 760-762.
Čisar, P. and Čisar, S.M. (2010): Skewness and kurtosis in function of selection of network
traffic distribution. Acta Polytech. Hung., 7(2), 95-106.
Colquhoun, D. (1971): Lecture on Biostatistics. Clarendon Press, Oxford, UK.
EMEA (2006): European Medicines Agency. Biostatistical Methodology in Clinical Trials.
ICH Topic E 9 - Statistical Principles for Clinical Trials, CPMP/ICH/363/96, London,
UK.
Finney, D.J. (1995): Thoughts suggested by a recent paper: Questions on non-parametric
analysis of quantitative data (letter to editor). J. Toxicol. Sci., 20(2), 165-170.
Hayashi, T., Yada, H., Auletta, C.S., Daly, I.W., Knezevich, A.L. and Cockrell, B.Y. (1994):
A six-month intraperitoneal repeated dose toxicity study of tazobactam/piperacillin
and tazobactam in rats. J. Toxicol. Sci., 19, Suppl. II, 155-176.
Hollander, M. and Wolfe, D.A. (1973): Nonparametric Statistical Methods, John Wiley and
Sons, New York, USA.
Ishii, S., Ube, M., Okada, M., Adachi, T., Sugimoto, J., Inoue, Y., Uno, Y. and Mutai,
M. (2009): Collaborative work on evaluation of ovarian toxicity (17). J. Toxicol. Sci.,
34, SP175-SP188.
Kendall, M.G. (1940): Note on the distribution of quantiles for large samples. Supp. J.
Royal Stat. Soc., 7(1), 83-85.
Kobayashi, K. (2005): Analysis of quantitative data obtained from toxicity studies showing
non-normal distribution. J. Toxicol. Sci., 30(2), 127-134.
Kobayashi, K., Kitajima, S., Miura, D., Inoue, H., Ohori, K., Takeuchi, H. and Takasaki,
K. (1999): Characteristics of quantitative data obtained in toxicity rodents - The
necessity of Bartlett's test for homogeneity of variance to introduce a rank test. J.
Environ. Biol., 20, 37-48.
Kobayashi, K., Pillai, K.S., Suzuki, M. and Wang, J. (2008): Do we need to examine the
quantitative data obtained from toxicity studies for both normality and homogeneity
of variance? J. Environ. Biol., 29, 47-52.
Kobayashi, K., Sadasivan Pillai, K., Soma Guhatakurta, Cherian, K.M., and Ohnishi,
M. (2011b): Statistical tools for analysing the data obtained from repeated dose
toxicity studies with rodents: A comparison of the statistical tools used in Japan with
that of used in other countries. J. Environ. Biol., 32, 11-16.
Kobayashi, K., Sakuratani, Y., Abe, T., Yamazaki, K., Nishikawa, S., Yamada, J., Hirose,
A., Kamata, E. and Hayashi, M. (2011a): Influence of coefficient of variation in
determining significant difference of quantitative values obtained from 28-day
repeated-dose toxicity studies in rats. J. Toxicol. Sci., 36(1), 63-71.
Kudo, S., Tanase, H., Yamasaki, M., Nakao, M., Miyata, Y., Tsuru, K. and Imai, S. (2000):
Collaborative work to evaluate toxicity on male reproductive organs by repeated dose
studies in rats (23). J. Toxicol. Sci., 25, SP223-SP232.
Levene, H. (1960): Robust tests for the equality of variances. In: Contributions to Probability
and Statistics, Edited Olkin, I. Stanford University Press, USA.
Liang, J., Tang, M.L. and Chan, P.S. (2009): A generalized Shapiro-Wilk's W statistic for
testing high dimensional normality. Comp. Stat. Data Anal., 53(11), 3883-3891.
Lilliefors, H. (1967): On the Kolmogorov-Smirnov test for normality with mean and
variance unknown. J. Am. Stat. Assoc., 62, 399-402.
Maxwell, S.E. and Delaney, H.D. (2004): Designing Experiments and Analysing Data - A
Model Comparison Perspective. 2nd Ed., Lawrence Erlbaum Associates, Inc., New
Jersey, USA.
Mochizuki, M., Shimizu, S., Urasoko, Y., Umeshita, K., Kamata, T., Kitazawa, T. Nakamura,
D., Nishihata, Y., Ohishi, T. and Edamoto, H. (2009): Carbon tetrachloride-induced
hepatotoxicity in pregnant and lactating rats. J. Toxicol. Sci., 34(2), 175181.
Nichols, D. (1994): Levene test, SPSS Inc., [email protected]
OBrien, R.G. (1979): A general ANOVA method for robust test of additive models for
variance. J. American Stat. Asso., 74, 877880.
OBrien, R.G. (1981): A simple test for variance effects in experimental designs. Psych.
Bull., 89, 570574.
36 A Handbook of Applied Statistics in Pharmacology
Park, H.M. (2008): Univariate analysis and normality test using SAS, Stata and SPSS.
Univ. Inf. Tech. Serv., Centre Stat. Math. Comp., Indiana Univ., Bloomington, USA.
SAS (1996): JMP Start Statistics. SAS Institute, USA.
Sen, P.K., Jureckov, J. and Picek, J. (2003): Goodness-of-Fit Test of Shapiro-Wilk Type
with Nuisance Regression and Scale. Aust. J. Stat., 32(1&2), 163167.
Shapiro, S.S. and Wilk, M.B. (1965): An analysis of variance test for normality (complete
samples). Biometrika, 52(3-4), 591611.
Singh, K. (2009): Quantitative Social Research Methods. Sage Publication Pvt. Ltd., New
Delhi, India.
Snedecor, G.W. and Cochran, W.G. (1989): Statistical Methods, 8th Edition, Iowa State
University Press, Ames, USA.
Spector, R. and Vesell, E.S. (2006): Pharmacology and statistics: Recommendations to
strengthen a productive partnership. Pharmacology, 78, 113122.
Weil, C.S. (1982): Statistical analysis and normality of selected hematologic and clinical
chemistry measurements used in toxicologic studies. Arch. Toxicol. Suppl., 5,
237253.
Yamazaki, M., Noguchi, Y., Tanda, M. and Shintani, S. (1981): Statistical method
appropriate for general toxicological studies in rats. J. Takeda Res. Lab., 40(3/4),
163187.
6
Transformation of Data and Outliers
Transformation of Data
In pharmacological and toxicological experiments, the data sometimes show
heterogeneous variance across the groups of animals. Analysing such data
with parametric tests may inflate the Type I error rate. One way to overcome
this problem is to transform the data (Wallenstein et al., 1980). The
variances of the transformed data are likely to be more homogeneous.
Table 6.1 gives the alanine aminotransferase activity of Wistar rats of the
control group in a 14-day repeated dose administration study, together with
the mean, SD and CV of the values after each transformation.
Table 6.1. Alanine aminotransferase activity (U/L) of Wistar rats of the control group in
a 14-day repeated dose administration study
Individual values: 45.3, 63.8, 82, 42, 40.8, 38.2, 35.9, 37.9, 39.1, 35.5 (N=10)
Transformation    Mean ± SD        CV (%)
None              46 ± 15          32.7
Logarithm         1.6 ± 0.12       7.2
Square root       6.7 ± 1.0        15.0
Reciprocal        0.02 ± 0.005     22.8
For the non-transformed data, the CV was 32.7%; it decreased substantially,
to 7.2%, when the data were transformed to logarithms. The CV also
decreased when the data were transformed to square roots (15.0%) and
reciprocals (22.8%), but to a lesser extent than with the logarithmic
transformation.
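The CV values in Table 6.1 can be recomputed with a few lines of Python. This is a minimal sketch, assuming that the logarithm is taken to base 10 and that the SD uses the n − 1 denominator; neither choice is stated in the text, and the names `cv_percent` and `transforms` are illustrative only.

```python
import math
import statistics

# ALT values (U/L) of the control group, taken from Table 6.1
alt = [45.3, 63.8, 82, 42, 40.8, 38.2, 35.9, 37.9, 39.1, 35.5]

def cv_percent(values):
    """Coefficient of variation (%) using the sample SD (n - 1 denominator)."""
    return 100 * statistics.stdev(values) / statistics.mean(values)

# Assumption: base-10 logarithm; the CV of log-transformed data depends on the base.
transforms = {
    "None": lambda v: v,
    "Logarithm": math.log10,
    "Square root": math.sqrt,
    "Reciprocal": lambda v: 1 / v,
}

cv = {name: cv_percent([f(v) for v in alt]) for name, f in transforms.items()}
for name, value in cv.items():
    print(f"{name:12s} CV = {value:.1f}%")
```

A smaller CV after transformation does not by itself show that the variances of several groups have become homogeneous; that still has to be checked on the transformed data of all groups.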
Concentrations of blood constituents usually show a non-normal
distribution (Flynn et al., 1974). Therefore, statistical analysis is usually
carried out with the transformed values of blood constituents (Niewczas
Outliers
Data obtained from pharmacological and toxicological studies are not
free from outliers. An outlier can be defined as an observation which
deviates so much from other observations as to arouse suspicion that it
was generated by a different mechanism (Hawkins, 1980). Outliers
The blood glucose level of the vehicle treated group was 186.8 ± 14.9
mg/dl (mean ± SD), whereas that of the drug treated group was 177.0 ± 47.4
mg/dl (mean ± SD). Though a decrease in blood glucose level was observed in
the drug treated animals, it was not statistically significant by a one-sided
Aspin-Welch's t-test (we used Aspin-Welch's t-test because the variances
of the groups are different. You will read more about this test in Chapter 8).
The SD of the drug treated group is considerably inflated, indicating a large
variance. Close observation of the individual values of the drug treated
animals shows that all the values in this group are close to each other,
except the value 259. Let us recompute the mean and SD of this group
after removing 259 from the data. The revised mean ± SD is 156.5 ± 13.4
(n=4). We are comfortable with this SD, as it is very close to the SD of
the vehicle treated group, indicating homogeneity of variance between
the vehicle treated and drug treated animals. The blood glucose of the drug
treated animals (after removing the value 259) is significantly different
from that of the vehicle treated animals by Student's t-test (we used
Student's t-test because the variances of the groups are not different. You
will read more about this test in Chapter 8). In this example, the value 259
is an outlier, as it clearly stands out from the other values, but in many
pharmacological and toxicological experiments it is not easy to spot an
outlier. A simple method to identify an outlier, mentioned in several books
on statistics, is given below (Hogan and Evalenko, 2006):
A Cautionary Note
Though human and other errors are major contributing factors for outliers,
a positive outcome from an outlier test should be investigated (Ellison et
al., 2009). Before discarding an outlier, one has to confirm that the value
discarded as an outlier is not a genuine data point. Hubrecht and Kirkwood
(2010) suggested that one way to deal with an outlier is to carry out the
statistical analysis with and without it. If the analytical results provide
Figure 6.1. Hemoglobin concentration (g/dl) of F344 male rats on week 104
References
Anscombe, F.J. (1960): Rejection of outliers. Technometrics, 2, 123–147.
Aoki, S. (2002): http://aoki2.si.gunma-u.ac.jp/lecture/Grubbs/Grubbs-table.html
Aoki, S. (2006): http://aoki2.si.gunma-u.ac.jp/lecture/Grubbs/Grubbs.html
ASTM (2008): American Society for Testing Materials. Standard Practice for Dealing With
Outlying Observations, ASTM E178-08, ASTM International, Philadelphia, USA.
Barnett, V. and Lewis, T. (1994): Outliers in Statistical Data, 3rd Edition. Wiley, New
York, USA.
Dubey, S.K., Patni, A., Khuroo, A., Thudi, N.R., Reyar, S., Arun Kumar, Tomar, M.S., Jain,
R., Nand Kumar and Monif, T. (2009): A quantitative analysis of memantine in human
plasma using ultra performance liquid chromatography/Tandem mass spectrometry.
E- J. Chem., 6(4), 1063–1070.
Ellison, S.L.R., Barwick, V.J. and Farrant, T.J.D. (2009): Practical Statistics for the
Analytical Scientist: A Bench Guide. 2nd Edition, The Royal Society of Chemistry,
Cambridge, U.K.
EMEA (2006): European Medicines Agency. Biostatistical Methodology in Clinical Trials.
ICH Topic E 9: Statistical Principles for Clinical Trials, CPMP/ICH/363/96, London,
UK.
FDA (2003): Food and Drug Administration. Guidance for Industry: Bioavailability
and Bioequivalence Studies for Orally Administered Drug Products: General
Considerations. U.S. Department of Health and Human Services, Food and Drug
Administration, Center for Drug Evaluation and Research (CDER), Rockville, USA.
Null Hypothesis
The main objective of conducting an animal experiment is to know
whether treatment with a test item causes any effect compared to the
control group. The comparison between the treatment group/s and the
control group is made using various statistical tools. The selection of an
appropriate statistical tool is based on certain assumptions. Before we go
further, we need to understand a hypothesis called the null hypothesis.
In the statistical context, a hypothesis is a statement about a distribution
(for example, normal distribution) or its underlying parameter (for example,
mean value, μ), or a statement about the relationship between probability
distributions (for example, there is no statistical difference between the
treated and the control groups) or their parameters (μ1 = μ2) (Le, 2009).
Why is it called the null hypothesis? Let us try to understand the null
hypothesis using the explanation proposed by Yoshida (1980). No
pharmaceutical company will venture into developing a new drug, A1, if it is
not superior to the drug currently in use, A2. In a statistical analysis, we
first hypothesize that drugs A1 and A2 have the same therapeutic value. That
is, we hypothesize A1 = A2, which is contrary to our expectation A1 > A2.
When the experimental data are not consistent with A1 = A2, we judge that A1
differs from A2 and reject the hypothesis. Thus, in a statistical test, we
first hypothesize A1 = A2 in contrast to our expectation A1 > A2, and then
show that it is not true (A1 ≠ A2). The original hypothesis A1 = A2, which
we hope to reject, is called the null hypothesis. In most statistical books
the null hypothesis is notated as:
player may abandon the null hypothesis (the null hypothesis in this case is
that the player and his opponent have equal skill levels) and consider that
his opponent is a better player than him. If the player and his opponent have
equal ability, the probability of the player losing a single game is 1/2,
but the probabilities of the player losing four and five games consecutively
are (1/2)^4 = 6.25% and (1/2)^5 = 3.125%, respectively. The mid-point of
these probabilities is about 5% [(6.25 + 3.125)/2 = 4.7%].
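The arithmetic behind this illustration can be checked directly. The sketch below assumes that the games are independent and that each game is lost with probability 1/2 under the null hypothesis; the function name `prob_consecutive_losses` is illustrative.

```python
def prob_consecutive_losses(k, p_loss=0.5):
    """Probability of k straight losses when each independent game is lost with probability p_loss."""
    return p_loss ** k

p4 = prob_consecutive_losses(4)   # (1/2)^4
p5 = prob_consecutive_losses(5)   # (1/2)^5
midpoint = (p4 + p5) / 2

print(f"4 losses: {p4:.3%}, 5 losses: {p5:.3%}, mid-point: {midpoint:.2%}")
```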
The five percent significance level, which implies 1 mistake in 20
observations (1/20), is normally regarded as unavoidable in biological
experiments and has been used for more than half a century in bioassays
including toxicity tests (Dunnett, 1955; Kornegay et al., 1961). Hence,
according to Bailey (1995), the five percent significance level can be
generally used for flagging a significant difference. Conventionally, a P
value of <0.05 indicates statistical significance (Doll and Carney, 2005).
However, strict adherence to a 5% significance level to delineate a
significant difference has been questioned by a few statisticians. Fisher
(1955) recommended a 5% significance level based on a single hypothesis,
H0. Neyman and Pearson (1928, 1936) proposed a decision process which
seeks to confirm or reject an a priori hypothesis and rejected Fisher's idea
that only the null hypothesis needs to be tested. Statisticians posed
questions against Fisher's 5% probability level; the question was, what
should be the smallest P value that warrants rejection of the null
hypothesis? In later years, Fisher (1971) stated that the Q value can be
significant at a higher standard if P is 1%, and at a lower standard if P is
5%. It again states, though indirectly, that a significant difference can be
obtained only when the P is between 1 and 5%. (Note: Q value is the false
discovery rate analogue of P).
Many statisticians do not favor strictly characterizing the result of a
statistical analysis as a positive or negative finding on the basis of a
P value. They suggest that, when reporting the results of significance tests,
precise P values (for example, P=0.049 or P=0.051) should be reported rather
than referring to specific critical values. Interpretation of the results of
a statistical analysis should not be made solely on the basis of the null
hypothesis. Hypothesis testing has been challenged, and there has been a
suggestion to report confidence intervals rather than P (Krantz, 1999).
According to Gelman and Stern (2006), dichotomization into significant and
non-significant results encourages the dismissal of observed differences in
favor of the usually less interesting null hypothesis of no difference. In
the case of experiments conducted in pharmacology and toxicology, the
biological relevance of the results should also be considered when
interpreting the data. Declaring a result non-significant does not mean that
the effect is not biologically relevant; it only means that there is not
sufficient evidence to reject the null hypothesis. In a nutshell, statistical
analysis should not override the experience of the experimenter in
interpreting the results of the experiments.
How to Express P?
The published articles express the P in two ways: P<0.05 or P≤0.05. The
question is how the P should be expressed: P<0.05 or P≤0.05? Though,
technically, it may be better to express P≤0.05, P<0.05 also conveys similar
information on statistical significance. We conducted a small investigation
on the expression of P in toxicological/pharmacological articles published
in a few journals. In most of the journals investigated, we observed that
P<0.05 and P≤0.05 were used at similar frequencies. In the toxicological/
pharmacological experiments conducted in Japan, P<0.05 tended to be
used slightly more frequently than P≤0.05. In the technical reports of
the National Toxicology Program of NIH, USA, P≤0.05 is more widely
used.
(2.447), hence the null hypothesis was not rejected (Note: Normally we
analyse the data using a statistical formula to obtain a calculated value.
Then, we compare this calculated value with the critical value given in the
appropriate statistical Table. If the calculated value is greater than the
critical value, the null hypothesis is rejected. In this particular example
we have analysed the data using a t-test and obtained a t value. This t value
was compared with the value given in the t Table. You shall learn about
various statistical tools and their applications in later chapters).
Non-rejection of the null hypothesis means that the mean weight of the seven
loaves of bread that the customer purchased does not differ significantly
from the expected weight (450 g). The customer was not convinced by the
result of the two-sided test provided by the grocer. The customer decided to
analyse the data using a one-sided test, with the assumption that the weight
of a loaf of bread is less than 450 g. When the customer analysed the data
using the one-sided test, he found that the calculated t value (2.14) was
greater than the value in the t-distribution Table (1.943). Therefore, the
null hypothesis is rejected, which means that the mean weight of the seven
loaves that he purchased is significantly less than 450 g.
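The decision rule used in this example can be sketched as follows. The calculated t value and the two critical values (6 degrees of freedom, 5% level) are those quoted in the text; the helper `reject_null` is illustrative, and comparing the magnitude of t assumes that, for the one-sided test, the observed difference lies in the direction specified in advance.

```python
def reject_null(t_calculated, critical_value):
    """Reject H0 when the magnitude of t exceeds the tabulated critical value."""
    return abs(t_calculated) > critical_value

t_cal = 2.14             # calculated t for the seven loaves (from the text)
crit_two_sided = 2.447   # 6 degrees of freedom, 5% level, two-sided
crit_one_sided = 1.943   # 6 degrees of freedom, 5% level, one-sided

two_sided_result = reject_null(t_cal, crit_two_sided)
one_sided_result = reject_null(t_cal, crit_one_sided)

print("Two-sided test rejects H0:", two_sided_result)
print("One-sided test rejects H0:", one_sided_result)
```

The same data lead to different conclusions only because the one-sided critical value is smaller, which is why the choice between the two tests should be fixed before the data are examined.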
with the control group. Shirley (1977) used a two-sided test for Student's
t-test and Cochran's t-test, and if a significant difference is observed in
the ANOVA, used the one-sided test in Dunnett's multiple comparison test.
Dunnett (1955) recommended the use of a two-sided test to determine
simultaneously the upper and lower limits to the difference between the
control group and each treated group, and a one-sided test to determine
either the upper or lower limit to the difference between the control group
and each treated group. Gad and Weil (1988) explained the significant
difference between the control and treated groups in body weight by using
the two-sided test. Sakuma (1977) suggested selecting either a one- or a
two-sided test by referring to the reports of similar studies. Nakamura
(1986) stated that the selection of a one- or two-sided test may depend on
the objective of the study, and he suggested that the statistical
significance of the data should not be foreseen. Kobayashi (1997)
recommended a one-sided test for the analysis of data obtained from
toxicological studies.
A significant difference is more apt to be observed in a one-sided
test than in a two-sided test. According to a survey, the detectability of a
significant difference by the two-sided test was 71–95% of that by the
one-sided test in Dunnett's t-test (Table 7.1) (Kobayashi, 1997).
Table 7.1. Difference in number of significant differences (P < 0.05) by one- and two-sided
test by Dunnett's t-test in a chronic toxicity and carcinogenicity study
Items                 No. of statistical   Dunnett's t-test
                      analyses             One-sided    Two-sided
Body weight (b.w.)    528                  223          212 (95)
Feed consumption      832                  235          189 (80)
Hematology            352                  123          105 (85)
Blood chemistry       576                  215          181 (84)
Urinalysis            64                   7            5 (71)
Organ weight          224                  47           42 (89)
Organ weight/b.w.     224                  82           67 (81)
Total                 2800                 932          801 (86)
Note: Values in parentheses show the number of significant differences by the two-sided
test as a percentage of that by the one-sided test.
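The percentages in parentheses in Table 7.1 can be recomputed from the counts. The sketch below only restates the table; the dictionary and function names are illustrative, and a recomputed value may differ from the printed one by a point depending on how the original figures were rounded.

```python
# (number of analyses, significant by one-sided test, significant by two-sided test)
table_7_1 = {
    "Body weight (b.w.)": (528, 223, 212),
    "Feed consumption": (832, 235, 189),
    "Hematology": (352, 123, 105),
    "Blood chemistry": (576, 215, 181),
    "Urinalysis": (64, 7, 5),
    "Organ weight": (224, 47, 42),
    "Organ weight/b.w.": (224, 82, 67),
}

def two_sided_as_percent_of_one_sided(one_sided, two_sided):
    """Two-sided detections expressed as a percentage of one-sided detections."""
    return 100 * two_sided / one_sided

for item, (_, one_sided, two_sided) in table_7_1.items():
    pct = two_sided_as_percent_of_one_sided(one_sided, two_sided)
    print(f"{item:20s} {pct:5.1f}%")

total_one = sum(row[1] for row in table_7_1.values())
total_two = sum(row[2] for row in table_7_1.values())
overall = two_sided_as_percent_of_one_sided(total_one, total_two)
print(f"{'Total':20s} {overall:5.1f}%")
```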
References
Bailey, N.T.J. (1995): Statistical Methods in Biology, Cambridge University Press, New
York, USA.
Bland, J.M. and Bland, D.G. (1994): Statistics notes: One and two sided tests of significance.
BMJ, 309, 248.
CSCL (1986): Chemical Substance Control Law.
http://www.safe.nite.go.jp/kasinn/genkou/kasinhou02.html.
Doll, H. and Carney, S. (2005): Statistical approaches to uncertainty: p values and
confidence intervals unpacked. Evid. Based Med., 10, 133–134.
Drewitt, P.N., Butterworth, C.D., Springall, C.D. and Moorhouse, S.R. (1993): Plasma
levels of aluminum after tea ingestion in healthy volunteers. Food Chem. Toxic., 31,
19–23.
Dunnett, C.W. (1955): A multiple comparison procedure for comparing several treatments
with a control. J. Am. Stat. Assoc., 50, 1096–1121.
Fisher, R.A. (1955): Statistical methods and scientific induction. J. Royal Stat. Soc. B., 17,
69–78.
Fisher, R.A. (1971): The Design of Experiments. 9th Edition. Hafner Press, New York,
USA.
Freedman, L. (2008): An analysis of the controversy over classical one-sided tests. Clin.
Trials, 5(6), 635–640.
Gad, S.C. and Weil, C.S. (1988): Statistics and Experimental Design for Toxicologists.
Telford Press, New Jersey, USA.
Gelman, A. and Stern, H. (2006): The difference between "significant" and "not significant"
is not itself statistically significant. Am. Stat., 60(4), 328–331.
Kobayashi, K. (1997): A comparison of one- and two-sided tests for judging significant
differences in quantitative data obtained in toxicological bioassay of laboratory
animals. J. Occup Health, 39, 29–35.
Kobayashi, K., Pillai, K.S., Sakuratani, Y., Abe, T., Kamata, E. and Hayashi, M. (2008):
Evaluation of statistical tools used in short-term repeated dose administration toxicity
studies with rodents. J. Toxicol. Sci., 33(1), 97–104.
Kornegay, E.T., Clawson, A.J., Smith, F.H. and Barrick, E.R. (1961): Influence of protein
source on toxicity of gossypol in swine ration. J. Anim. Sci., 20, 597–602.
Le, C.T. (2009): Health and Numbers: A Problems-Based Introduction to Biostatistics, 3rd
Edition, John Wiley & Sons Inc., New Jersey, USA.
Krantz, D.H. (1999): The null hypothesis testing controversy in psychology. J. Am. Stat.
Assoc., 94, 1372–1381.
Madsen, B. (2011): Statistics for Non-Statisticians. Springer-Verlag, Berlin, Germany.
Moye, L.A. and Tita, A.T.N. (2002): Defending the rationale for the two-tailed test in
clinical research. Circulation, 105, 3062–3065.
Nakamura, G. (1986): Practice, Statistical Analyses. Kaiumeisha, Tokyo, Japan.
Neyman, J. and Pearson, E.S. (1928): On the use and interpretation of certain test criteria
for purposes of statistical inference. Part II. Biometrika, 20A, 263–294.
Neyman, J. and Pearson, E.S. (1936): Sufficient statistics and uniformly most powerful
tests of statistical hypotheses. Stat. Res. Mem., 1, 113–137.
OECD (2008): Organization for Economic Cooperation and Development. OECD
Guidelines for the Testing of Chemicals. Repeated Dose 28-Day Oral Toxicity Study
in Rodents, No. 407. OECD, Paris, France.
Rosner, B. (2010): Fundamentals of Biostatistics. 7th Edition. Brooks/Cole, Cengage
Learning, Boston, USA.
Sakuma, A. (1977): Statistical Methods in Pharmacometrics I. 56, Tokyodaigaku Shupankai,
Tokyo, Japan.
Shertzer, H.G. and Sainsbury, M. (1991): Chemoprotective and hepatic enzyme induction
properties of indol and indenoindol antioxidants in rats. Food Chem. Toxic., 29,
391–400.
Shirley, E.A. (1977): Non-parametric equivalent of Williams' test for contrasting increasing
dose levels of a treatment. Biometrics, 33, 386–389.
Spina, D. (2007): Statistics in Pharmacology. Br J. Pharmacol., 152(3), 291–293.
Yoshida, M. (1980): Design of Experiments for Animal Husbandry. Yokendo Press, Tokyo,
Japan.
Yoshimura, I. (1987): Statistical Analysis of Toxicological Data. Scientist Press, Tokyo,
Japan.
Yoshimura, I. and Ohashi, S. (1992): Statistical Analysis for Toxicology Data. Chijin-
Shokan, Tokyo, Japan.
8
t-Tests
Student's t-Test: History
The history of statistical significance tests dates back to the early 18th
century. Perhaps the earliest statistical analysis published was by John
Arbuthnot, on London birth rates with regard to gender, in 1710 (Hacking,
1965). One of the most popular significance tests is Student's t-test, which
has wide scientific applications (Papana and Ishwaran, 2006). Student's
t-test is a parametric test for comparing two groups. Readers may be
interested to know why it is called Student's t-test. "Student" was the
pseudonym of W.S. Gosset (1876–1937) (Box, 1987). He worked as a chemist at
the Guinness brewery, Ireland. He chose this pseudonym because his company
did not allow its scientists to publish confidential data (Raju, 2005). His
company regarded the use of statistics in quality control as a trade secret.
In an article published in Biometrika under the pseudonym "Student", Gosset
described a procedure to assess population means by using small samples
(Student, 1908).
\[ t_{cal} = \frac{22.46 - 22.0}{0.0429} = 10.723 \]
The t-distribution Table value (Table 8.2) at 0.05 probability for 6 (7−1)
degrees of freedom is 2.447 (two-sided). Since the calculated value (10.723)
is greater than the Table value (2.447), it is considered that the temperature
measured in the animal room during the seven days differed from the
set temperature (22°C).
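The one-sample calculation above can be written as a short function. This is a sketch under the assumption that the denominator 0.0429 is the standard error of the mean of the seven daily readings (the individual readings are not reproduced here); the function name `one_sample_t` is illustrative.

```python
def one_sample_t(sample_mean, reference_value, standard_error):
    """t statistic for comparing a sample mean with a fixed reference value."""
    return (sample_mean - reference_value) / standard_error

# Values quoted in the text; 0.0429 is assumed to be the standard error of the mean.
t_cal = one_sample_t(22.46, 22.0, 0.0429)

critical = 2.447                     # two-sided 5% value for 6 degrees of freedom (Table 8.2)
significant = abs(t_cal) > critical

print(f"t = {t_cal:.3f}, significant at the 5% level: {significant}")
```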
Table 8.2. t-distribution Table (Yoshimura, 1987)
DF \ 2α*   0.2     0.1     0.05    0.02    0.01    0.002   0.001
DF \ α**   0.1     0.05    0.025   0.01    0.005   0.001   0.0005
5          1.476   2.015   2.571   3.365   4.032   5.893   6.869
6          1.440   1.943   2.447   3.143   3.707   5.208   5.959
7          1.415   1.895   2.365   2.998   3.499   4.785   5.408
DF, Degrees of freedom; *2α, two-sided; **α, one-sided
[Figure 8.1. Flow chart for selecting a t-test (image not reproduced). Labels: F-test
(P=0.05); Not significant; Significant; Number of samples.]
Student's t-test
The heights of male and female students in a class room are given in Table
8.3. We would like to examine whether the male and female students have
similar heights.
Table 8.3. Height (cm) of male and female students
Male (Group 1)    Female (Group 2)
170               160
168               154
170               162
169               160
179               151
162               159
172               148
169               159
169               150
179               162
Statistics
Estimates         Male (Group 1)    Female (Group 2)
N                 10                10
Sum               1707              1565
Mean              170.7             156.5
SD                5.0783            5.2546
Variance          25.79             27.61
Sum of squares    232.10            248.50
Let us compare the variances of the data of males and females by
calculating the F-value:
\[ F^{9}_{9} = \frac{27.6}{25.8} = 1.07 \]
Note: The superscript and subscript to F indicate the degrees of
freedom of the numerator and denominator, respectively.
The calculated F value (1.07) is less than the Table value (3.17).
Hence, F^{9}_{9} is not considered significant, indicating that the variances
of the two groups are similar. Therefore, as given in Figure 8.1, the data
can now be analysed using Student's t-test.
The t value is calculated using the equation,
\[ t_{cal} = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{SS_1 + SS_2}} \times \sqrt{\frac{N_1 \times N_2}{N_1 + N_2}\,(N_1 + N_2 - 2)} \]
Where,
\bar{X}_1 = Mean of Group 1; \bar{X}_2 = Mean of Group 2; SS_1 = Sum of squares of
Group 1; SS_2 = Sum of squares of Group 2; N_1 = Number of observations in
Group 1; N_2 = Number of observations in Group 2.
\[ t_{cal} = \frac{170.7 - 156.5}{\sqrt{232.1 + 248.5}} \times \sqrt{\frac{10 \times 10}{10 + 10}\,(10 + 10 - 2)} = 6.145 \]
Compare the calculated t value with the t-test critical value given
in Table 8.5.
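The F-ratio and Student's t value for the height data can be reproduced from the raw values in Table 8.3. The sketch below follows the sum-of-squares form of the equation used in this section; the function names are illustrative, and placing the larger variance in the numerator of the F-ratio is an assumption consistent with the worked example.

```python
import math
import statistics

# Heights (cm) from Table 8.3
male = [170, 168, 170, 169, 179, 162, 172, 169, 169, 179]
female = [160, 154, 162, 160, 151, 159, 148, 159, 150, 162]

def sum_of_squares(values):
    """Sum of squared deviations from the group mean."""
    m = statistics.mean(values)
    return sum((v - m) ** 2 for v in values)

def f_ratio(a, b):
    """Variance ratio with the larger variance in the numerator."""
    va, vb = statistics.variance(a), statistics.variance(b)
    return max(va, vb) / min(va, vb)

def student_t(a, b):
    """Student's t for two independent groups, written in terms of sums of squares."""
    n1, n2 = len(a), len(b)
    diff = statistics.mean(a) - statistics.mean(b)
    pooled = math.sqrt(sum_of_squares(a) + sum_of_squares(b))
    return diff / pooled * math.sqrt(n1 * n2 * (n1 + n2 - 2) / (n1 + n2))

f_value = f_ratio(male, female)
t_value = student_t(male, female)
print(f"F = {f_value:.2f}, t = {t_value:.3f}, df = {len(male) + len(female) - 2}")
```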
Aspin-Welch's t-test
This test is used to compare the means of two groups having different
variances, when the number of samples (observations) is the same in both
groups.
A study was conducted in volunteers to find the effect of a high fat
diet. A diet containing high fat content was given to 10 individuals
(Group 1). Concurrently, a normal diet was given to another 10 individuals
for comparison (Group 2). At the end of the 7 days of treatment, alanine
aminotransferase (ALT) activity was measured in the individuals of both
Groups. The ALT values determined in the individuals are given in Table 8.6.
Table 8.6. Alanine aminotransferase activity (IU/l) of individuals
Diet containing high fat content (Group1) Normal diet (Group 2)
42 30
60 34
26 35
48 32
56 36
31 41
30 42
80 28
79 71
93 35
Statistics
Estimates         Diet containing high fat content (Group 1)    Normal diet (Group 2)
N                 10                                             10
Sum               545                                            384
Mean              55                                             38
SD                23.4011                                        12.2493
Variance (Sx²)    548                                            150
F-ratio:
\[ F^{9}_{9} = \frac{548}{150} = 3.65 \]
Compare the calculated F-value with the Table value (Table 8.4). The
derived F value (3.65) is greater than the Table value (3.17). Hence,
F^{9}_{9} is considered significant, indicating that the variances of the two
groups are different. According to Figure 8.1, Aspin-Welch's t-test is the
appropriate statistical tool for the analysis of these data. The t is
calculated using the following formula:
\[ t_{cal} = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\frac{Sx_1^2}{N_1} + \frac{Sx_2^2}{N_2}}} \]
The degrees of freedom, N, are obtained as follows:
\[ C = \frac{\frac{Sx_1^2}{N_1}}{\frac{Sx_1^2}{N_1} + \frac{Sx_2^2}{N_2}} = \frac{54.8}{54.8 + 15.0} = 0.79 \]
\[ N = \frac{1}{\frac{0.79^2}{9} + \frac{(1 - 0.79)^2}{9}} = 13.5 \]
Compare the derived t value with the t-test critical value given in Table
8.7 at the 5% probability level for fourteen degrees of freedom (14 degrees
of freedom is obtained by rounding up the calculated N, 13.5). Since the
calculated t-value, 2.03, is greater than the one-sided t-test critical value
given in Table 8.7 (1.761), it can be stated that there is a difference in
ALT between the high fat diet-treated and normal diet-treated individuals.
Table 8.7. t-test critical values (Yoshimura, 1987)
2α       0.20    0.10    0.05    0.02    0.01    0.002   0.001
α        0.10    0.05    0.025   0.01    0.005   0.001   0.0005
DF=14    1.345   1.761   2.145   2.624   2.977   3.787   4.140
α = one-sided, 2α = two-sided.
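The Aspin-Welch calculation can be sketched from the summary statistics. The inputs below are the rounded means and variances printed in the Statistics table for Table 8.6 (assumed to be the values used in the worked example, since they give t close to 2.03); the function name `aspin_welch` is illustrative. Because C is not rounded before use, the degrees of freedom come out slightly above the 13.5 quoted, but still round up to 14.

```python
import math

def aspin_welch(mean1, var1, n1, mean2, var2, n2):
    """Return (t, C, approximate degrees of freedom) for two groups with unequal variances."""
    se1, se2 = var1 / n1, var2 / n2            # squared standard errors of the two means
    t = (mean1 - mean2) / math.sqrt(se1 + se2)
    c = se1 / (se1 + se2)
    df = 1 / (c ** 2 / (n1 - 1) + (1 - c) ** 2 / (n2 - 1))
    return t, c, df

# Rounded summary statistics as printed in the text (assumption: these were the inputs used)
t_cal, c, df = aspin_welch(55, 548, 10, 38, 150, 10)
df_used = math.ceil(df)                        # the text rounds the degrees of freedom up

print(f"t = {t_cal:.2f}, C = {c:.2f}, N = {df:.1f} (use {df_used} degrees of freedom)")
print("Exceeds one-sided 5% critical value (1.761):", t_cal > 1.761)
print("Exceeds two-sided 5% critical value (2.145):", t_cal > 2.145)
```

With these inputs the difference exceeds the one-sided critical value but not the two-sided one, so the conclusion depends on which test was specified in advance.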
Cochran-Cox's t-test
Cochran-Cox's t-test is used to compare the means of two samples having
different variances and different numbers of observations. We shall
modify the data given in Table 8.6 and analyse it using Cochran-Cox's
t-test. The modified values are given in Table 8.8. We have not made
any change in the ALT values of Group 1, but the values of Group 2 are
changed and only nine individuals of this group are used for the analysis.
Statistics
Estimates         Diet containing high fat content (Group 1)    Normal diet (Group 2)
N                 10                                             9
Sum               545                                            381
Mean              55                                             42
SD                23.4011                                        10.0374
Variance (Sx²)    548                                            101
F-ratio:
\[ F^{9}_{8} = \frac{548}{101} = 5.43 \]
Compare the derived F-value with the Table value (Table 8.4). The calculated
F-value (5.43) is greater than the Table value (3.38). Hence, F^{9}_{8} is
considered significant, indicating that the variances of the two groups are
different. According to Figure 8.1, Cochran-Cox's t-test is the appropriate
statistical tool for the analysis of the data given in Table 8.8.
In Cochran-Cox's t-test, we need to calculate two t values
(t calculated and t' calculated).
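\[ t_{cal} = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\frac{Sx_1^2}{N_1} + \frac{Sx_2^2}{N_2}}} \]
\[ t_{cal} = \frac{55 - 42}{\sqrt{\frac{548}{10} + \frac{101}{9}}} = 1.60 \]
\[ t'_{cal} = \frac{t_1 \times \frac{Sx_1^2}{N_1} + t_2 \times \frac{Sx_2^2}{N_2}}{\frac{Sx_1^2}{N_1} + \frac{Sx_2^2}{N_2}} \]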
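The two quantities used in Cochran-Cox's t-test can be evaluated as in the sketch below. It assumes, as is usual for this test, that t1 and t2 are the tabulated critical t values for N1 − 1 and N2 − 1 degrees of freedom; the two-sided 5% values 2.262 (9 degrees of freedom) and 2.306 (8 degrees of freedom) are standard table values supplied here for illustration and are not taken from the tables shown in this chapter. Evaluating the displayed expression for t with the printed means and variances gives roughly 1.60.

```python
import math

def cochran_cox(mean1, var1, n1, mean2, var2, n2, t1, t2):
    """Return (t_cal, t_prime): the test statistic and its weighted critical value."""
    w1, w2 = var1 / n1, var2 / n2              # squared standard errors of the two means
    t_cal = (mean1 - mean2) / math.sqrt(w1 + w2)
    t_prime = (t1 * w1 + t2 * w2) / (w1 + w2)  # critical values weighted by w1 and w2
    return t_cal, t_prime

# Assumed two-sided 5% critical values for 9 and 8 degrees of freedom
T1, T2 = 2.262, 2.306

t_cal, t_prime = cochran_cox(55, 548, 10, 42, 101, 9, T1, T2)
significant = abs(t_cal) > t_prime

print(f"t = {t_cal:.2f}, t' = {t_prime:.3f}, significant: {significant}")
```

Because t' is a weighted average of t1 and t2, it always lies between the two tabulated values, which is a convenient check on the arithmetic.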
Paired t-Test
Let us assume one needs to test an antidiabetic drug in diabetic rats. One
way to do this is to measure the blood sugar before and after treatment with
the drug, calculate the respective mean values, and compare the mean
values using an appropriate t-test (select the appropriate t-test as per
Figure 8.1). Another way is to analyse the data using the paired t-test.
Blood sugar values of individual rats before and after the drug treatment
are given in Table 8.9.
Table 8.9. Blood sugar values (mg/dl) of individual rats
Rat Number                     1     2     3     4     5     Mean   Variance   SD     SE
Before treatment               274   287   277   259   237   -      -          -      -
After treatment                165   142   215   209   198   -      -          -      -
Difference between before
and after treatments           109   145   62    50    39    81     1992       44.6   19.9
\[ t_{cal} = \frac{\text{Mean}}{\text{S.E.}} \]
\[ t_{cal} = \frac{81}{19.9} = 4.07 \]
Compare the calculated t-value with the t-test critical value given in Table
8.10 at the 5% probability level for N−1 degrees of freedom. N is the number
of pairs, hence N−1 = 4. Since the calculated t value, 4.07, is greater than
the one-sided t-test critical value given in Table 8.10 (2.132), it can be
stated that treatment with the drug significantly decreased the blood sugar
in rats.
Table 8.10. t-test critical values (Yoshimura, 1987)
2α      0.20    0.10    0.05    0.02    0.01    0.002   0.001
α       0.10    0.05    0.025   0.01    0.005   0.001   0.0005
DF=4    1.533   2.132   2.776   3.747   4.604   7.173   8.610
α = one-sided, 2α = two-sided.
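The paired t-test for Table 8.9 can be reproduced from the individual values. This is a minimal sketch with an illustrative function name; because the standard error is not rounded before dividing, the result is about 4.06 rather than the 4.07 obtained with the rounded SE of 19.9.

```python
import math
import statistics

# Blood sugar (mg/dl) of the five rats in Table 8.9
before = [274, 287, 277, 259, 237]
after = [165, 142, 215, 209, 198]

def paired_t(x, y):
    """Paired t statistic: mean of the within-pair differences divided by its standard error."""
    d = [a - b for a, b in zip(x, y)]
    mean_d = statistics.mean(d)
    se = statistics.stdev(d) / math.sqrt(len(d))
    return mean_d / se, len(d) - 1

t_cal, df = paired_t(before, after)
print(f"t = {t_cal:.2f} with {df} degrees of freedom")
print("Exceeds one-sided 5% critical value (2.132):", t_cal > 2.132)
```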
A Note of Caution
It is well known that with Student's two-independent-sample t-test, the
actual level of significance can be well above or below the nominal level,
confidence intervals can have inaccurate probability coverage, and power
can be low relative to other methods.
In Student's two-independent-sample t-test, variance heterogeneity
can distort rates of Type I error (Kaselman et al., 2004). Therefore, when
the variances of the two populations are different, Student's t-test is not
suitable (Ruxton, 2006). When the number of groups is more than two,
multiple comparisons with Student's t-test can inflate the Type I error rate.
References
Box, J.F. (1987): Guinness, Gosset, Fisher, and Small Samples. Stat. Sci., 2(1), 45–52.
Hacking, I. (1965): Logic of Statistical Inference. Cambridge University Press, New
York.
Kaselman, H.J., Othman, A.R., Wilcox, R.R. and Fradette, K. (2004): The new and
improved two-sample t-test. Psych. Sci., 15(1), 47–51.
Kennedy, P. (2003): A Guide to Econometrics. 5th Edition. MIT Press, UK.
Papana, A. and Ishwaran, H. (2006): CART variance stabilization and regularization for
high-throughput genomic data. Bioinformatics, 22(18), 2254–2261.
Raju, T.N.K. (2005): William Sealy Gosset and William A. Silverman: Two "Students" of
Science. Pediatrics, 116(3), 732–735.
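Standardised covariance,
\[ r = \frac{1}{n-1} \sum \frac{(x - \bar{x})(y - \bar{y})}{(s_x)(s_y)} \]
where s_x and s_y are the standard deviations of variable x and variable y, respectively.
The above equation can be rewritten as follows:
\[ r = \frac{\frac{1}{n-1} \sum (x - \bar{x})(y - \bar{y})}{\sqrt{\frac{\sum (x - \bar{x})^2}{n-1}} \sqrt{\frac{\sum (y - \bar{y})^2}{n-1}}} \]
\[ r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \sum (y - \bar{y})^2}} \]
\[ \sum (y - \bar{y})^2 = \sum y^2 - \frac{(\sum y)^2}{n} = 36112 - \frac{(540)^2}{10} = 6952 \]
\[ r = \frac{-754}{\sqrt{82.5 \times 6952}} = \frac{-754}{757.32} = -0.996 \]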
Significance of r
When the sample size is not too large, the significance of a correlation
coefficient can be tested using a t-test:
\[ t = \frac{r \sqrt{n-2}}{\sqrt{1 - r^2}} = \frac{-0.996 \sqrt{10 - 2}}{\sqrt{1 - (-0.996)^2}} = \frac{-2.8171}{0.0894} = -31.51 \]
The above is Student's t-test with n−2 degrees of freedom.
Alternatively, the significance of a correlation coefficient can be tested as
given below, which involves no calculation:
Compare the correlation coefficient, r, with the value given in the
correlation coefficient table (Table 9.2) for eight degrees of freedom. The
computed correlation coefficient, r (-0.996), is less than the correlation
coefficient Table value (-0.765) at 1% probability level. Hence the
correlation coefficient is considered to be significant. The negative sign of
the correlation coefficient indicates that the variables x and y are negatively
correlated. Had the r been 0.996 (positively correlated), we would have
compared it with 0.765 (without a minus sign). In that case, for r to be
considered significant, it has to be greater than 0.765.
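The calculation of r and its t-value can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the sum of products is taken as -754 on the assumption that the minus sign was lost in print (the text describes the correlation as negative). Note that carrying r at full precision gives a t of about -30.1, whereas rounding r to -0.996 first, as in the hand calculation, gives about -31.5; the conclusion is the same either way.

```python
import math

def pearson_r_from_sums(sxy, sxx, syy):
    """Pearson r from the corrected sum of products and sums of squares."""
    return sxy / math.sqrt(sxx * syy)

def t_for_r(r, n):
    """t statistic for testing r against zero, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

# Summary values from the worked example (sign of sxy is an assumption).
sxy, sxx, syy, n = -754.0, 82.5, 6952.0, 10

r = pearson_r_from_sums(sxy, sxx, syy)
t_full = t_for_r(r, n)               # r carried at full precision
t_rounded = t_for_r(round(r, 3), n)  # r rounded to 3 decimals first

print(round(r, 3), round(t_full, 2), round(t_rounded, 2))
```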
Coefficient of Determination
The coefficient of determination is the square of r (usually denoted by the
capital letter, R²), which expresses
the strength of the relationship between the x and y variables (McDonald,
2009). This is reviewed in greater detail in Chapter 10.
Rank Correlation
When the variables are not linearly associated, Pearson's product-moment
correlation analysis does not work well. In this situation the association
is made linear by ranking the variables. Rank correlation is
a nonparametric alternative to the linear correlation coefficient (Ruby,
2008). Several rank correlation analyses are available; amongst
them, Spearman's rank correlation is the most commonly employed (Hassard,
1991).
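As a minimal sketch of the idea, Spearman's rank correlation can be obtained by ranking each variable (tied values share the mean rank) and then applying Pearson's formula to the ranks. The dose and response values below are invented purely for illustration, and the function names are hypothetical.

```python
import math

def average_ranks(values):
    """Rank values from 1 upward; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson's product-moment correlation coefficient."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman(x, y):
    """Spearman's rank correlation: Pearson's r computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Illustrative (made-up) data: monotonic but clearly non-linear.
dose = [1, 2, 3, 4, 5]
response = [d ** 3 for d in dose]

r_linear = pearson(dose, response)
r_rank = spearman(dose, response)
print(round(r_linear, 3), round(r_rank, 3))
```

For a perfectly monotonic but curved relationship such as this one, the rank correlation reaches 1 while Pearson's r stays below 1.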
Canonical Correlation
Canonical correlation analysis developed by Hotelling (1936), is the study
of the linear relationships between two sets of variables, and is considered
as a fundamental statistical tool (Bulut et al., 2010). It is the multivariate
extension of correlation analysis and it measures the interrelationships
among sets of multiple dependent variables and multiple independent
variables (Green, 1978). Canonical correlation simultaneously predicts
multiple dependent variables from multiple independent variables. It is a
very useful tool in pharmacology and toxicology (Kelder, 1982; Hu et al.,
2003; Tanaka, 2010), where interrelationships between several dependent
and independent variables need to be assessed.
A detailed discussion of canonical correlation is beyond the scope
of this book. Several books are available that cover the subject in depth
(Green and Carroll, 1978; Das and Sen, 1994).
References
Berkman, E.T. and Reise, S.P. (2011): A Conceptual Guide to Statistics Using SPSS. SAGE
Publications Inc., California, USA.
Bewick, V., Cheek, L. and Ball, J. (2003): Statistics review 7: Correlation and regression.
Critical Care, 7, 451-459.
Bulut, M., Gultepe, N., Mendes, M., Guroy, D. and Palaz, M. (2010): According to
Canonical correlation, the evaluation of bluefish (Pomatomus saltatrix) blood
chemistry. J. Animal Vet. Adv., 9(4), 666-670.
Cohen, J. and Cohen, P. (1983): Multiple Regression/Correlation for the Behavioral
Sciences. 2nd Edition. Erlbaum Associates, Hillsdale, New Jersey, USA.
Das, S. and Sen, P.K. (1994): Restricted canonical correlations. Linear Algebra and its
Applications, 210, 29-47.
Field, A. (2009): Discovering Statistics Using SPSS. 3rd Edition. SAGE Publications Ltd.,
London, UK.
Gauthier, T.D. (2001): Detecting trends using Spearman's rank correlation coefficient. Exp.
Forensics, 2, 359-362.
Glantz, S.A. (2005): Primer of Biostatistics. McGraw-Hill Companies Inc., USA.
Green, P.E. (1978): Analyzing Multivariate Data. Holt, Rinehart & Winston, Illinois, USA.
Green, P.E. and Carroll, J.D. (1978): Mathematical Tools for Applied Multivariate
Analysis. Academic Press, New York, USA.
Gurumani, N. (2005): An Introduction to Biostatistics. 2nd Edition. MJP Publishers,
Chennai, India.
Hassard, T.H. (1991): Understanding Biostatistics. Mosby-Year Book Inc., St. Louis,
Missouri, USA.
Hotelling, H. (1936): Relations between two sets of variates. Biometrika, 28, 321-377.
Hu, Q.N., Liang, Y.Z., Peng, X.L., Hong, Y. and Zhu, L. (2003): Application of orthogonal
block variables and canonical correlation analysis in modeling pharmacological
activity of alkaloids from plant medicines. J. Data Sci., 1, 405-423.
Kelder, J. (1982): Prediction of the Bobon clinical profile of neuroleptics from animal
pharmacological data. Psychopharmacol., 77(2), 140-145.
Loco, J.V., Elskens, M., Croux, C. and Beernaert, H. (2002): Linearity of calibration curves:
use and misuse of the correlation coefficient. Accred. Qual. Assur., 7, 281-285.
McDonald, J.H. (2009): Handbook of Biological Statistics. 2nd Edition. Sparky House
Publishing, Baltimore, Maryland, USA.
Paler-Calmorin, L. and Calmorin-Piedad, M.L.P. (2008): Nursing Biostatistics with
Computer. Rex Printing Co. Inc., Florentino St., Quezon City, Philippines.
Porter, A.M.W. (1999): Misuse of correlation and regression in three medical journals. J.
Royal Soc. Med., 92, 123-128.
Ruby, J. (2008): Elementary Statistics. Thomson Brooks/Cole, Belmont, USA.
Shibata, K. (1970): Biostatistics. Tokyo University of Agriculture, Tokyo, Japan.
Sonnergaard, J.M. (2006): On the misinterpretation of the correlation coefficient in
pharmaceutical sciences. Int. J. Pharm., 321(1-2), 12-17.
Tanaka, T. (2010): Biological factors influencing exploratory behavior in laboratory mice,
Mus musculus. Mammal Study, 35(2), 139-144.
10
Regression Analysis
History
The origin of the term regression in statistics has an interesting history.
Francis Galton (1822-1911) had a deep interest in heredity, biometrics and
eugenics (Crow, 1993). He found that the sons of tall men tended to be shorter than
their fathers. He called this phenomenon regression towards the mean, and
thus the term regression originated (Dupont, 2002).
Unlike correlation, where there is no dependence relationship,
there are dependent and independent variables in regression analysis. In
regression analysis, y is assumed to be a random variable and x is assumed
to be a fixed variable. The underlying assumption of regression analysis is
that the dependent variable follows a normal distribution and scatters about
the regression line.
In animal experiments regression analysis is used to evaluate cause
(variable x) and effect (variable y) relationships; for example, in a repeated
dose administration study, the rate of decrease in body weight (y) as the
exposure period (x) increases can be determined using regression analysis.
\[ \sum (x-\bar{x})^2 = \sum x^2 - \frac{(\sum x)^2}{n} = 513 - \frac{(63)^2}{10} = 116.1 \]
\[ \sum (y-\bar{y})^2 = \sum y^2 - \frac{(\sum y)^2}{n} = 469.55 - \frac{(65.7)^2}{10} = 37.90 \]
\[ b = \frac{\sum (x-\bar{x})(y-\bar{y})}{\sum (x-\bar{x})^2} = \frac{65.69}{116.1} = 0.5658 \]
Once the slope, b, is calculated, it is easy to calculate the intercept, a:
\[ \bar{y} = a + b\,\bar{x} \]
\[ \text{Total SS for } y = \sum y^2 - \frac{(\sum y)^2}{n} = 37.90 \quad (\text{DF} = 9;\ \text{MS} = 4.21) \]
\[ \text{SS due to regression} = \frac{\left(\sum xy - \frac{\sum x \sum y}{N}\right)^2}{\sum (x-\bar{x})^2} \]
SS-Sum of squares
Since the calculated F-value is greater than the F-Table value (Table
10.3), the null hypothesis is rejected and the alternative hypothesis (H1: b
≠ 0) is accepted. This means the slope of the regression line is significantly
different from 0, which implies that there is a significant relationship
between age and body weight of the babies.
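The slope, intercept and F-ratio of this example can be reproduced with a short Python sketch. The age and body weight values are those listed in Table 10.4; the function name is hypothetical, and the F-ratio is computed as (SS due to regression / 1) divided by (error SS / (n - 2)), which is the usual layout for simple linear regression and is assumed here to match the ANOVA table of the book.

```python
def fit_line(x, y):
    """Least-squares fit of y = a + b*x; returns a, b and the F-ratio for the slope."""
    n = len(x)
    sxx = sum(v * v for v in x) - sum(x) ** 2 / n                   # sum (x - xbar)^2
    syy = sum(v * v for v in y) - sum(y) ** 2 / n                   # total SS for y
    sxy = sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n    # sum of products
    b = sxy / sxx
    a = sum(y) / n - b * sum(x) / n
    ss_regression = sxy ** 2 / sxx
    ss_error = syy - ss_regression
    f_ratio = ss_regression / (ss_error / (n - 2))
    return a, b, f_ratio

age = [1, 2, 3, 5, 6, 7, 8, 9, 10, 12]                        # months (x)
weight = [3.8, 4.2, 4.8, 5.7, 6.4, 6.9, 7.1, 7.8, 8.6, 10.4]  # kg (y)

a, b, f_ratio = fit_line(age, weight)
print(round(a, 3), round(b, 4), round(f_ratio, 1))
```

The fitted values agree with the regression equation y' = 3.005 + 0.5658 x used in the text, apart from rounding.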
Solid squares are the actual values. The line passing through the actual
values is the regression line. For each value of the x variable, the predicted y
value is computed using the regression equation, y' = 3.005 + 0.5658 x
(predicted y is denoted as y' in order to differentiate it from the actual y).
Thus, y' is derived for each x, and the predicted y's are joined together to
obtain the regression line. By closely observing the plot, one can find that
not all the actual values fall on the regression line, though they are very
close to it. The linear regression line is called a best fit line,
since it best fits the data points. The best fit line minimizes the squared
vertical distances between the actual values and the line. An estimate
of the squared vertical distances between the actual values and the line
(in other words, variation of the actual values from the predicted values)
can easily be arrived at (vide Table 10.4). Note that this
estimate is the sum of squares for the error component given in the ANOVA
Table (Table 10.2).
Table 10.4. Calculation of variation of the actual y values from the predicted y values
Age (Month)   Body weight   y'                        y - y'     (y - y')²
(x)           (kg) (y)      (y' = 3.005 + 0.5658 x)
1             3.8           3.5708                     0.2292    0.052533
2             4.2           4.1366                     0.0634    0.00402
3             4.8           4.7024                     0.0976    0.009526
5             5.7           5.834                     -0.134     0.017956
6             6.4           6.3998                     0.0002    0.00000004
7             6.9           6.9656                    -0.0656    0.004303
8             7.1           7.5314                    -0.4314    0.186106
9             7.8           8.0972                    -0.2972    0.088328
10            8.6           8.663                     -0.063     0.003969
12            10.4          9.7946                     0.6054    0.366509
-             -             -                          -         Σ(y - y')² = 0.733249
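The residual calculation in Table 10.4 can be checked with a few lines of Python. This sketch simply applies the rounded regression equation from the text (a = 3.005, b = 0.5658) to each x value; the variable names are illustrative.

```python
age = [1, 2, 3, 5, 6, 7, 8, 9, 10, 12]                        # x, months
weight = [3.8, 4.2, 4.8, 5.7, 6.4, 6.9, 7.1, 7.8, 8.6, 10.4]  # y, kg

a, b = 3.005, 0.5658  # rounded intercept and slope used in Table 10.4

predicted = [a + b * x for x in age]                   # y'
residuals = [y - yp for y, yp in zip(weight, predicted)]  # y - y'
error_ss = sum(r * r for r in residuals)               # sum of (y - y')^2

for x, y, yp, r in zip(age, weight, predicted, residuals):
    print(x, y, round(yp, 4), round(r, 4), round(r * r, 6))
print("Error SS:", round(error_ss, 4))
```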
95% confidence limits for the slope (b) can be derived by using the
formula:
\[ b \pm t_{0.05,\,n-2} \times SE(b), \]
where b is the slope (0.5658); t_{0.05, n-2} is the critical value
for t at 5% probability level for n-2 degrees of freedom (2.306);
SE(b) is the standard error of b:
\[ SE(b) = \sqrt{\frac{\text{Error Mean SS}}{\sum (x-\bar{x})^2}} = \sqrt{\frac{0.0913}{116.1}} = 0.0280 \]
95% confidence limits for the slope (b) = 0.5658 ± (2.306 × 0.0280) =
0.5658 ± 0.0646.
The significance of the slope can be tested using the t-test, when the number
of samples is smaller than about 30 (Bailey, 1995):
\[ t = \frac{b - \beta}{s / \sqrt{\sum (x-\bar{x})^2}} \]
The calculated t is compared with t_{0.05, n-2}, the critical value for t at 5%
probability level for n-2 degrees of freedom; b is the slope (b = 0.5658);
β is the hypothetical value (β = 0) (we are testing whether the observed b
value is different from the hypothetical value); s is the square root of the error
mean sum of squares (s = \sqrt{0.0913} = 0.3022); \sum (x-\bar{x})^2 = 116.1.
\[ t = \frac{0.5658 - 0}{0.3022 / \sqrt{116.1}} = \frac{0.5658}{0.0280} = 20.17 \]
The derived t value (20.17) is greater than the Table t-value (2.306) at 5%
probability level and n-2 = 8 degrees of freedom; hence the slope is significant.
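The standard error, confidence limits and t-value of the slope follow directly from three numbers quoted in the text. The sketch below uses the error mean SS (0.0913), the sum of squares of x (116.1) and the tabulated critical t for 8 degrees of freedom (2.306); the variable names are illustrative.

```python
import math

b = 0.5658          # slope
error_ms = 0.0913   # error mean sum of squares from the ANOVA table
sxx = 116.1         # sum of (x - xbar)^2
t_crit = 2.306      # critical t at 5% level, n - 2 = 8 degrees of freedom

se_b = math.sqrt(error_ms / sxx)        # standard error of the slope
half_width = t_crit * se_b              # half-width of the 95% confidence interval
lower, upper = b - half_width, b + half_width
t_value = (b - 0) / se_b                # test of H0: beta = 0

print(round(se_b, 4), round(lower, 4), round(upper, 4), round(t_value, 2))
```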
R²
R² is interpreted as the proportion of total variability of the outcome that
is accounted for by the model (Vittinghoff et al., 2005). In other words, it is
the proportion of the variation in the y variable that is explained by the
variation in the x variable. R² is called the coefficient of determination.
R² can vary from 0 to 1. An R² close to 1 indicates that the actual y values
fall almost right on the regression line. An R² close to 0 indicates that there
is little or no relationship between x and y.
\[ \sum (x_1-\bar{x}_1)^2 = A \]
\[ \sum (x_1-\bar{x}_1)(x_2-\bar{x}_2) = B \]
\[ \sum (x_2-\bar{x}_2)^2 = C \]
\[ \sum (x_1-\bar{x}_1)(y-\bar{y}) = D \]
\[ \sum (x_2-\bar{x}_2)(y-\bar{y}) = E \]
\[ \sum (y-\bar{y})^2 = F \]
\[ b_1 = \frac{CD - BE}{AC - B^2} \]
\[ b_2 = \frac{AE - BD}{AC - B^2} \]
Once the slopes are derived, a can be calculated using the formula:
\[ \bar{y} = a + b_1\bar{x}_1 + b_2\bar{x}_2 \]
The multiple correlation coefficient can be computed using the formula:
\[ R = \frac{S_{yy'}}{\sqrt{S_{yy}\,S_{y'y'}}}, \]
where R = multiple correlation coefficient; y = actual value; y' = predicted
y (calculated using the regression equation, y' = a + b_1 x_1 + b_2 x_2), and
the corrected sums are
\[ S_{yy'} = \sum yy' - \frac{\sum y \times \sum y'}{n}; \quad S_{yy} = \sum y^2 - \frac{(\sum y)^2}{n}; \quad S_{y'y'} = \sum y'^2 - \frac{(\sum y')^2}{n} \]
The significance of the multiple regression equation can be checked by
ANOVA (Table 10.5).
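The two-variable formulas above translate almost line by line into code. The following Python sketch computes A to F, the slopes b1 and b2, the intercept a, and the multiple correlation coefficient R. The data are made up so that y = 1 + 2 x1 + 3 x2 exactly (they are not from the book), and the function name is hypothetical.

```python
import math

def two_variable_regression(x1, x2, y):
    """Fit y = a + b1*x1 + b2*x2 using the sums A..F; returns a, b1, b2, R."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    A = sum((u - m1) ** 2 for u in x1)
    B = sum((u - m1) * (v - m2) for u, v in zip(x1, x2))
    C = sum((v - m2) ** 2 for v in x2)
    D = sum((u - m1) * (w - my) for u, w in zip(x1, y))
    E = sum((v - m2) * (w - my) for v, w in zip(x2, y))
    F = sum((w - my) ** 2 for w in y)
    denom = A * C - B ** 2            # zero if x1 and x2 are perfectly collinear
    b1 = (C * D - B * E) / denom
    b2 = (A * E - B * D) / denom
    a = my - b1 * m1 - b2 * m2
    # Multiple correlation: correlation between actual y and predicted y'.
    pred = [a + b1 * u + b2 * v for u, v in zip(x1, x2)]
    mp = sum(pred) / n
    s_yp = sum((w - my) * (p - mp) for w, p in zip(y, pred))
    s_pp = sum((p - mp) ** 2 for p in pred)
    R = s_yp / math.sqrt(F * s_pp)
    return a, b1, b2, R

# Illustrative (made-up) data generated from y = 1 + 2*x1 + 3*x2.
x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 6]
y = [1 + 2 * u + 3 * v for u, v in zip(x1, x2)]

a, b1, b2, R = two_variable_regression(x1, x2, y)
print(round(a, 6), round(b1, 6), round(b2, 6), round(R, 6))
```

Because the illustrative data lie exactly on a plane, the fit recovers the generating coefficients and R equals 1; real data would give R below 1.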
Table 10.5. ANOVA for the multiple regression equation
Source of variation            SS                                 DF           MS
Total                          Σy² - (Σy)²/n                      n - 1        -
Reduction due to regression    Σy'² - (Σy')²/n                    k            SS/k
Error (Residual SS)            Total SS - SS due to regression    n - k - 1    SS/(n - k - 1)
k is the number of independent variables.
F value is calculated by dividing the mean SS for reduction due to regression by the mean SS for error.
Polynomial Regression
Linear regression does not hold good when the data of your dependent
variable follow a curved line, rather than a straight line. Transforming
y or x or both the variables to their logarithms, reciprocals, square roots,
etc., may straighten certain curves, but not all. Another way to solve this
issue is to use a curvilinear regression equation. The polynomial regression
equation is an example of a curvilinear regression equation, which is used
to predict toxicological variables (Vogt, 1989). Given the complexity of
the calculations in polynomial regression analysis, it is not included
in the coverage of this book. The purpose of touching upon polynomial
regression analysis is to create awareness that before carrying out linear
regression analysis one should ensure that the trend of the association
between the two variables is linear.
References
Ambrosius, W.T. (2007): Topics in Biostatistics. Humana Press Inc., New Jersey, USA.
Bailey, N.T.J. (1995): Statistical Methods in Biology. Cambridge University Press,
Cambridge, UK.
Chan, Y.H. (2004): Biostatistics 201: Linear regression analysis. Singapore Med. J., 45
(2), 55-61.
Cornbleet, P.J. and Gochman, N. (1979): Incorrect least-squares regression coefficients in
method-comparison analysis. Clin. Chem., 25, 432-438.
Crow, J.F. (1993): Francis Galton: Count and measure, measure and count. Genetics, 135,
1-4.
DuPont, W.D. (2002): Statistical Modeling for Biomedical Researchers. Cambridge Univ.
Press, Cambridge, U.K.
Farnsworth, D.L. (1990): The effect of a single point on correlation and slope. Internat. J.
Math. Math. Sci., 13(4), 799-806.
Glaister, P. (2005): Robust linear regression using Theil's method. J. Chem. Educ., 82(10),
1472-1473.
Vittinghoff, E., Glidden, D.N., Shiboski, S.C. and McCulloch, C.E. (2005): Statistics for
Biology and Health. Springer Science+Business Media, Inc., New York, USA.
Vogt, N.B. (1989): Polynomial principal component regression: An approach to analysis
and interpretation of complex mixture relationships in multivariate environmental
data. Chemometrics Intelligent Lab Systems, 7(1-2), 119-130.
Williams, G.P. (1983): Improper use of regression equations in earth sciences. Geology,
11(4), 195-197.
Yoshimura, I. (1987): Statistical Analysis of Toxicological Data. Scientist Inc., Tokyo,
Japan.
11
Multivariate Analysis
One-way ANOVA
One-way ANOVA is used to find if the given factor has a significant effect on
the expected outcome of the experiment. Jaundice index (x) of a newborn
baby measured in weeks 36, 38 and 40 is presented in Table 11.2. We want
to examine if the factor (week) has any significant effect on the jaundice
index.
Table 11.2. Jaundice index (x) of newborn baby
                        Week
36 (Group 1)     38 (Group 2)     40 (Group 3)
x1     13        x11     9        x21     5
x2      6        x12    11        x22     5
x3     11        x13    11        x23     4
x4     12        x14    10        x24     7
x5     14        x15     7        x25     7
x6     10        x16     7        x26     3
x7      9        x17     5        x27     3
x8     11        x18     8        x28     4
x9     11        x19     7        x29     5
x10    10        x20    10        x30     3
Statistics
Estimates       Week
                36 (Group 1)    38 (Group 2)    40 (Group 3)
N               10              10              10
Mean ± SD       10.7 ± 2.2      8.5 ± 2.0       4.6 ± 1.5
Sum             107             85              46
Grand sum       238
Total sum of squares
\[ = (x_1^2 + x_2^2 + \dots + x_{29}^2 + x_{30}^2) - \frac{(\sum x)^2}{N} = (13^2 + 6^2 + \dots + 5^2 + 3^2) - \frac{(238)^2}{30} = 291.9 \]
Sum of squares among the groups
\[ = \frac{107^2}{10} + \frac{85^2}{10} + \frac{46^2}{10} - \frac{(238)^2}{30} = 190.9 \]
Total sum of squares for error = Total sum of squares - Sum of
squares among the groups
= 291.9 - 190.9 = 101
We have all the estimates required for constructing the ANOVA Table. See
Table 11.3 given below:
Table 11.3. ANOVA Table
Source of variation    SS       DF    Variance (MS)    F-value    P
Total                  291.9    29    -                -          -
Groups                 190.9    2     95.5             25.5       P<0.001
Error                  101      27    3.74
SS-Sum of squares; DF-Degrees of freedom; MS-Mean sum of squares.
Note: There are 30 observations, hence the DF for SS total is 30 - 1 = 29; the total number
of groups is three, hence the DF for SS groups is 3 - 1 = 2; DF for error SS = DF for SS
total - DF for groups SS (29 - 2 = 27).
\[ F_{2,\,27}\ (\text{calc}) = \frac{95.5}{3.74} = 25.5 \]
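The calculation steps above can be sketched in Python as follows. The function is a hypothetical helper, not a library routine; it uses the same correction-factor approach as the hand calculation and also works when the groups have unequal numbers of observations. Run on the data of Table 11.2, it reproduces the entries of Table 11.3 to the precision shown there.

```python
def one_way_anova(groups):
    """Return (ss_total, ss_between, ss_error, f_value) for a list of groups."""
    all_values = [v for g in groups for v in g]
    n_total = len(all_values)
    correction = sum(all_values) ** 2 / n_total           # (sum of x)^2 / N
    ss_total = sum(v * v for v in all_values) - correction
    ss_between = sum(sum(g) ** 2 / len(g) for g in groups) - correction
    ss_error = ss_total - ss_between
    df_between = len(groups) - 1
    df_error = n_total - len(groups)
    f_value = (ss_between / df_between) / (ss_error / df_error)
    return ss_total, ss_between, ss_error, f_value

# Jaundice index data from Table 11.2.
week36 = [13, 6, 11, 12, 14, 10, 9, 11, 11, 10]
week38 = [9, 11, 11, 10, 7, 7, 5, 8, 7, 10]
week40 = [5, 5, 4, 7, 7, 3, 3, 4, 5, 3]

ss_total, ss_between, ss_error, f_value = one_way_anova([week36, week38, week40])
print(round(ss_total, 1), round(ss_between, 1), round(ss_error, 1), round(f_value, 1))
```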
Compare the derived F value with the value given in the F distribution
Table (Table 11.4):
In the four-group setting (control, low dose, mid dose and high dose),
the high dose group showed a significant difference from the control group,
whereas in the five-group setting (control, low dose, mid dose, high
dose and top dose), no significant difference was seen in the high dose
group compared to the control group, indicating a decrease in the power
of Dunnett's test to detect a significant difference as the number of groups
increases.
Note: The superscripts of the mean values can be explained as follows: values bearing similar
superscripts are statistically the same. Since the superscripts of the mean values are
different, it can be stated that each mean value is different from the other.
Williams's test
Most of the regulatory guidelines prescribe that repeated-dose
administration studies with rodents should be conducted with a minimum
of three dose levels (low, mid and high doses) and a control group
(OECD, 1995). The high dose is chosen with the aim of inducing toxicity
but not death or severe suffering (OECD, 1998; EPA, 2000), whereas the
low dose is chosen with the assumption that animals exposed to this dose
level will not show any effect of the treatment compared to the control
group (Kobayashi et al., 2010). However, these guidelines do not state
how to determine the mid dose. They only indicate that this dose is required
to examine dose dependency. According to Gupta (2007), the mid dose
selection should consider the threshold in toxic response and the mechanism of
toxicity. Choosing the mid dose is as important as choosing the high and
low doses in repeated dose administration studies, since the mid dose plays a
determining role in establishing the dose dependency. It is not uncommon
to encounter situations where the mid dose alone shows an insignificant
difference compared to the control group, whereas the low and high doses
show a significant difference. In this situation the data are examined for
a dose-related trend. Williams's test is generally carried out to test for a
dose-related trend (Bretz, 2006).
For data that show a dose-related trend and a significant difference
by Dunnett's test (Dunnett, 1955), the interpretation of the data analysis
can be done in a straightforward manner. In a four-group-setting repeated
dose administration study, seven different situations can be expected
(Table 11.9). Interpretation is relatively easier in situations 1-3, whereas
it is difficult in situations 4-7, where further investigation of the dose-related
trend is required.
Table 11.9. Significant difference shown by the treatment groups by Dunnett's test
(Possible situations for the test groups: control, low dose, mid dose and high dose. Each
cell of the original table is marked as either a significant difference or no significant
difference from the control group.)
                             Situation 1    Situation 2    Situation 3    Situation 4   Situation 5   Situation 6   Situation 7
Investigation                Not required   Not required   Not required   Required      Required      Required      Required
Visual dose-related trends   Yes            Yes            Yes            No            No            No            No
\[ \frac{61.6}{5} = 12.32 \]
This largest value is used for the calculation of the t value.
(Note: Numerator-sum of high dose; denominator-number of
observations of high dose).
We have all the estimates for calculating the t value, except the mean SS
of error variance. Let us analyse the data using ANOVA:
Liver weight of rats in a 4-week repeated dose administration study
Statistics
Estimates       Liver weight (g)
                Control          Low dose         Mid dose         High dose
N               5                5                5                5
Mean ± SD       11.36 ± 0.51     12.28 ± 0.41     11.46 ± 0.15     12.32 ± 0.44
Sum             56.8             61.4             57.3             61.6
Grand sum       237.1
Total sum of squares
\[ = (x_1^2 + x_2^2 + \dots + x_{19}^2 + x_{20}^2) - \frac{(\sum x)^2}{N} = (10.7^2 + 11.5^2 + \dots + 11.9^2 + 13^2) - \frac{(237.1)^2}{20} = 6.6095 \]
Sum of squares among the groups
Mean SS for error is 0.16375. Now we have all the required estimates for
calculating t:
\[ t = \frac{11.36 - 12.32}{\sqrt{0.16375\left(\frac{1}{5} + \frac{1}{5}\right)}} = -3.751 \]
The t-value is significant at 5% level (Table 11.13, Number of groups-4;
DF-16).
(2) Control vs Mid dose
\[ \frac{61.4 + 57.3}{5 + 5} = 11.87 \]
This largest value is used for the calculation of the t-value.
(Note: Numerator-sums of low dose + mid dose; denominator-number
of observations of low dose + mid dose).
\[ \frac{57.3}{5} = 11.46 \]
(Note: Numerator-sum of mid dose; denominator-number of observations
of mid dose).
\[ t = \frac{11.36 - 11.87}{\sqrt{0.16375\left(\frac{1}{5} + \frac{1}{5}\right)}} = -1.993 \]
The t-value is significant at 5% level (Table 11.13, Number of groups-3;
DF-16).
(3) Control vs Low dose
\[ \frac{61.4}{5} = 12.28 \]
(Note: Numerator-sum of low dose; denominator-number of observations
of low dose).
\[ t = \frac{11.36 - 12.28}{\sqrt{0.16375\left(\frac{1}{5} + \frac{1}{5}\right)}} = -3.595 \]
The t-value is significant at 5% level (Table 11.13, Number of groups-2;
DF-16).
Table 11.14. Jaundice index of newborn baby. Reproduced from Table 11.2. Number of
observations of Groups 1 and 2 was changed
                Week
36 (Group 1)    38 (Group 2)    40 (Group 3)
13              9               5
6               11              5
11              11              4
12              10              7
14              7               7
10              7               3
9               5               3
                8               4
                                5
                                3
Statistics
Estimates       Week
                36 (Group 1)    38 (Group 2)    40 (Group 3)
N               7               8               10
Mean ± SD       10.7 ± 2.7      8.5 ± 2.1       4.6 ± 1.5
Sum             75              68              46
Grand sum       189
Calculation steps:
Total sum of squares
\[ = (x_1^2 + x_2^2 + \dots + x_{24}^2 + x_{25}^2) - \frac{(\sum x)^2}{N} = (13^2 + 6^2 + \dots + 5^2 + 3^2) - \frac{(189)^2}{25} = 260.2 \]
Sum of squares among the groups
\[ = \frac{75^2}{7} + \frac{68^2}{8} + \frac{46^2}{10} - \frac{(189)^2}{25} = 164.3 \]
Total sum of squares for error = Total sum of squares - Sum of squares
among the groups
= 260.2 - 164.3 = 95.9
Let us construct the ANOVA Table (Table 11.15).
\[ S_m = \sqrt{\frac{4.3}{25/3}} = 0.72 \]
(Note: 4.3 is the variance of error (see Table 11.15); Total number of observations
= 25; Total number of groups = 3).
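Comparisons:
Group 3 vs Group 2
\[ F = \frac{(4.6 - 8.5)^2}{(3-1) \times 4.3 \times \left(\frac{1}{10} + \frac{1}{8}\right)} = 7.86 \quad \therefore\ p < 0.05 \]
Group 3 vs Group 1
\[ F = \frac{(4.6 - 10.7)^2}{(3-1) \times 4.3 \times \left(\frac{1}{10} + \frac{1}{7}\right)} = 17.82 \quad \therefore\ p < 0.05 \]
Group 2 vs Group 1
\[ F = \frac{(8.5 - 10.7)^2}{(3-1) \times 4.3 \times \left(\frac{1}{8} + \frac{1}{7}\right)} = 2.10 \quad \therefore\ p > 0.05\ (\text{NS}) \]
Note: 4.3 is the variance of error (vide Table 11.15).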
These derived F-values are compared with the values given in the F distribution
Table (Table 11.18) given below:
Table 11.18. F-distribution values at 5% probability level (Yoshimura, 1987)
N2\N1   1       2       3       4       5       6       7       8       9       10
22      4.301   3.443   3.049   2.817   2.661   2.549   2.464   2.397   2.342   2.297
N1-DF for the numerator; N2-DF for the denominator.
All the derived F values, except the one computed for the comparison
between Group 2 and Group 1, are significant at 5% probability level.
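The three pairwise F-values can be reproduced with a short Python sketch of the formula used above, F = (difference of means)² / [(k - 1) × error variance × (1/n_i + 1/n_j)]. The function name is hypothetical; the group means, group sizes and error variance (4.3) are those quoted in the text, and the critical value 3.443 is the Table 11.18 entry for 2 and 22 degrees of freedom.

```python
def scheffe_f(mean_i, n_i, mean_j, n_j, error_variance, k):
    """Scheffe-type F statistic for comparing two of k group means."""
    return (mean_i - mean_j) ** 2 / ((k - 1) * error_variance * (1 / n_i + 1 / n_j))

error_variance = 4.3   # variance of error quoted from Table 11.15
k = 3                  # number of groups
f_critical = 3.443     # F(2, 22) at 5% level, from Table 11.18

# (mean, n) for each group, taken from the statistics under Table 11.14.
group1, group2, group3 = (10.7, 7), (8.5, 8), (4.6, 10)

f_3_vs_2 = scheffe_f(*group3, *group2, error_variance, k)
f_3_vs_1 = scheffe_f(*group3, *group1, error_variance, k)
f_2_vs_1 = scheffe_f(*group2, *group1, error_variance, k)

for label, f in [("3 vs 2", f_3_vs_2), ("3 vs 1", f_3_vs_1), ("2 vs 1", f_2_vs_1)]:
    print(label, round(f, 2), "significant" if f > f_critical else "NS")
```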
Scheffé's multiple comparison test is used for all-pair comparisons,
like Duncan's multiple comparison test. However, the power to detect
a significant difference is lower with Scheffé's multiple comparison test
than with Duncan's multiple comparison test (vide Table
11.19).
Duncan's multiple comparison test showed a significant difference
in the mid dose and high dose groups, whereas Scheffé's multiple
comparison test did not show a significant difference in these groups,
indicating its low power to detect a significant difference. Therefore,
Scheffé's multiple comparison test should be used with caution in
safety evaluation studies with animals.
Table 11.19. Comparison of the power to detect a significant difference between Scheffé's
and Duncan's multiple comparison tests. LDH activity (U/l) of F344 female rats at week
78 in a repeated dose administration study is given.
Individual values
  Control:    168, 188, 181, 250, 122, 89, 125, 135, 211, 204
  Low dose:   112, 168, 175, 241, 218, 49, 49, 76, 66, 30
  Mid dose:   69, 86, 145, 244, 135, 46, 105, 40, 53, 73
  High dose:  43, 59, 73, 99, 129, 181, 49, 69
Estimates          Control     Low dose    Mid dose    High dose
N                  10          10          10          8
Mean ± SD          167 ± 49    118 ± 76    100 ± 62    88 ± 47
In % of control    -           71          60          53
ANOVA              P < 0.05
Duncan's test      -           N.S.        S           S
Scheffé's test     -           N.S.        N.S.        N.S.
N.S.-Not significant (P > 0.05); S-Significant (P < 0.05)
Two-way ANOVA
It is an extension of one-way ANOVA. The difference in two-way ANOVA
is that it has two independent factors. The data are arranged in a tabular fashion
in such a way that the column represents one factor and the row, the other
factor (Belle et al., 2004).
An example is provided to illustrate the computations required in
two-way ANOVA (Kibune and Sakuma, 1999). The diameter of the head
of three human embryos was measured by four observers. Each
observer measured the diameter of the three embryos. The data are arranged
in a tabular fashion as given in Table 11.20. We are interested to know:
1. Among the observers, is there any difference in the diameter of embryos
measured? 2. Among the embryos, is there any difference in the diameter of
embryos measured? and 3. Is there any simultaneous influence of observer
and embryo on the diameter measured (interaction)?
Calculation steps:
1) Correction factor (CF)
= (Grand sum)²/N = 558.1²/36 = 8652.1
2) Total sum of squares
= (14.3² + 14.0² + ...... + 12.9² + 13.8²) - CF = 8979.7 - 8652.1 = 327.6
3) Sum of squares among the observers
= 1/9 (141.0² + 137.6² + 138.2² + 141.3²) - CF = 8653.2 - 8652.1 = 1.199
Discussion:
1. Embryo: The F-value is greater than the Table F-value (2051 > 5.614);
hence there is a significant difference among embryos.
2. Observer: The F-value is greater than the Table F-value (5.05 > 4.718);
hence there is a significant difference among observers.
3. The embryo × observer interaction: The F-value is less than the
Table F-value (1.06 < 3.667); hence the embryo × observer interaction is
not significant.
Since the interaction is not significant, the ANOVA Table can be
reconstructed excluding interaction as a source of variation. The SS of
interaction is added to the SS of error and the DF of the interaction is
added to the DF of error. The Table thus reconstructed after excluding
interaction as a source of variation is given below (Table 11.23):
References
Armstrong, R.A., Slave, S.V. and Eperjesi, F. (2000): An introduction to analysis of variance
(ANOVA) with special reference to data from clinical experiments in optometry.
Ophthalmic Physiol. Opt., 20(3), 235-241.
Belle, G.V., Fisher, L.D., Heagerty, P.J. and Lumley, T. (2004): Biostatistics: A Method
for Health Sciences. John Wiley & Sons, Inc., New Jersey, USA.
Bretz, F. (2006): An extension of the Williams trend test to general unbalanced linear
models. Comp. Stat. Data Anal., 50(7), 1735-1748.
Cheung, S.H. and Holland, B. (1991): Extension of Dunnett's multiple comparison
procedure to the case of several groups. Biometrics, 47, 21-32.
Dmitrienko, A., Chuang-Stein, C. and D'Agostino, R. (2007): Pharmaceutical Statistics
Using SAS: A Practical Guide. SAS Institute, NC, USA.
Dunnett, C.W. (1955): A multiple comparison procedure for comparing several treatments
with a control. Am. Stat. Assoc., 50, 1096-1121.
EPA (2000): United States Environmental Protection Agency. Health Effects Test
Guidelines, OPPTS 870.3050, Repeated Dose 28-Day Oral Toxicity Study in
Rodents, EPA 712-C-00-366 2000. EPA, USA.
Gad, S. and Weil, C.S. (1988): Statistics and Experimental Design for Toxicologists.
Telford Press, New Jersey, USA.
Gill, L. (1990): Uses and abuses of statistical methods in research in parasitology. Vet.
Parasitol., 36(3-4), 189-209.
Gupta, R.C. (2007): Veterinary Toxicology: Basic and Clinical Principles. Academic
Press, New York, USA.
Kibune, Y. and Sakuma, A. (1999): Practical Statistics for Medical Research. Scientist
Press, Tokyo, Japan.
Kobayashi, K., Pillai, K.S., Michael, M., Cherian, K.M. and Ohnishi, M. (2010): Determining
NOEL/NOAEL in repeated-dose toxicity studies, when the low dose group shows
significant difference in quantitative data. Lab. Anim. Res., 26(2), 133-137.
Mathews, P.G. (2005): Design of Experiments with MINITAB. American Society for
Quality, Milwaukee, USA.
Moder, K. (2007): How to keep the Type I error rate in ANOVA if variances are
heteroscedastic. Aust. J. Stat., 6(3), 179-188.
Muir, W.M., Romero-Severson, J., Rider Jr., S.D., Simons, A. and Ogas, J. (2006):
Application of one sided t-tests and a generalized experiment-wise error rate to high-
density oligonucleotide microarray experiments: An example using Arabidopsis. J.
Data Sci., 4, 323-341.
Nagata, Y. and Yoshida, M. (1997): Tokeiteki-tajuhikakuho-no-kiso. Scientist Co. Ltd,
Tokyo, Japan.
Norman, G.R. and Streiner, D.L. (2008): Biostatistics: The Bare Essentials. 3rd Edition.
BC Decker Inc., Ontario, Canada.
OECD (1995): Organization for Economic Cooperation and Development. OECD
Guidelines for Testing of Chemicals. Repeated Dose 28-Day Oral Toxicity Study in
Rodents. No. 407, OECD, France.
Sign Tests
The sign test is perhaps the oldest distribution-free test; it can be used
either in the one-sample or in the paired-sample context (Sawilowsky,
2005). The sign test is probably the simplest of all the non-parametric methods
(Whitley and Ball, 2002; Crawley, 2005). The null hypothesis of the sign
test is that, given a pair of measurements (xi, yi), xi and yi are equally
likely to be larger than each other (Surhone et al., 2010). Though the sign
test is rarely used in toxicology, it can be used in certain pharmacological
in vivo experiments to evaluate whether one treatment is superior to the other.
The sign test may be used in clinical trials to know whether either of the
two treatments that are provided to study subjects is favored over the other
(Nietert and Dooley, 2011).
The calculation procedure of the sign test for a small sample size (n ≤ 25) is
different from that for a large sample size (n > 25):
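\[ p = {}_7C_1\left(\frac{1}{2}\right)^{1}\left(\frac{1}{2}\right)^{6} + {}_7C_0\left(\frac{1}{2}\right)^{7} = \left({}_7C_1 + {}_7C_0\right)\left(\frac{1}{2}\right)^{7} = 0.0547 + 0.0078 = 0.0625 \]
Note: \( {}_nC_r = \frac{n!}{r!\,(n-r)!} \); Rat No. 8, which did not show any change in the
blood sugar, is not included in the analysis.
Since P = 0.0625 is > 0.05, it is considered that the decrease in blood sugar
in rats administered with the herbal preparation is not significant.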
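The small-sample probability above is a binomial tail with p = 1/2, which can be checked with Python's standard library. The function below is a hypothetical helper giving the one-sided probability of observing r or fewer of the less frequent sign among n non-tied pairs.

```python
from math import comb

def sign_test_p(n, r):
    """One-sided P(X <= r) for X ~ Binomial(n, 1/2)."""
    return sum(comb(n, k) for k in range(r + 1)) / 2 ** n

# Seven rats with a change in blood sugar (Rat No. 8, with no change, is excluded);
# one rat showed the less frequent sign.
p = sign_test_p(7, 1)
print(p)
```

The exact value is 8/128 = 0.0625, which is greater than 0.05.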
Doctor No.        1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32
Drug A (Xa)       4  5  4  5  5  3  5  4  3  4  5  3  3  4  5  4  3  4  4  5  4  5  4  3  3  5  2  1  4  5  3  5
Drug B (Xb)       2  3  3  4  2  3  2  5  3  5  3  4  2  4  3  5  4  3  5  3  4  3  5  4  4  2  4  3  2  4  2  2
Sign (Xb - Xa)    -  -  -  -  -  0  -  +  0  +  -  +  -  0  -  +  +  -  +  -  0  -  +  +  +  -  +  +  -  -  -  -
study was to know whether the analgesic effect of drugs A and B is similar
or different.
The pairs that showed a difference of 0 are excluded from
the calculation procedure. In this example four pairs showed a difference
of 0. Therefore, the number (n) of data becomes 32 - 4 = 28. The number
of + signs, which indicates that the effect of drug B is better than drug A, is
11. Z is obtained from the equation given below:
\[ z = \frac{(r \pm 0.5) - \mu_r}{\sigma_r} = \frac{11.5 - 14}{2.65} = -0.94 \]
\[ \text{Mean: } \mu_r = \frac{n}{2} = \frac{28}{2} = 14; \qquad \text{SD: } \sigma_r = \frac{\sqrt{n}}{2} = \frac{\sqrt{28}}{2} = 2.65 \]
Calculation Procedure:
The number of samples (classes) in each group = 6
Sum of rank of School B, R2=10+6+7+12+11+3=49
\[ V = \frac{6 \times 6}{12 \times 11} \sum (R_i - \bar{R})^2 = 39 \]
Where,
\[ \bar{R} = \frac{29 + 49}{12} = 6.5 \]
12 = Sum of number of samples (classes) of School A and School B
11 = (Sum of number of samples (classes) of School A and School B) - 1
Let us calculate T:
\[ T = \frac{49 - 6 \times \frac{13}{2}}{\sqrt{39}} = 1.601 \]
Where,
13 = (Sum of number of samples (classes) of School A and School B) + 1
2 = Constant
The calculated T value (T = 1.601) is smaller than U(α) = 1.644854 at
P = 0.05 (see Table 12.7). Hence, it is considered that there is no significant
difference in scores between the schools.
Table 12.7. Standard normal distribution Table (Yoshimura, 1987)
Two tailed P    Upper P     % point
2α              α           U(α)
0.05000         0.025000    1.959964
0.06000         0.030000    1.880791
0.07000         0.035000    1.811911
0.08000         0.040000    1.750686
0.09000         0.045000    1.695398
0.10000         0.050000    1.644854
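The rank-sum calculation for the two schools can be sketched in Python as shown below. Two points are assumptions rather than statements from the text: the ranks of School A are taken to be the six ranks not listed for School B (their sum, 29, matches the value used in the calculation), and the variance V is computed as n1 × n2 / (N (N - 1)) × Σ(rank - mean rank)², which reproduces V = 39.

```python
import math

ranks_b = [10, 6, 7, 12, 11, 3]   # ranks of School B, as listed in the text
ranks_a = [r for r in range(1, 13) if r not in ranks_b]  # assumed: remaining ranks

n_a, n_b = len(ranks_a), len(ranks_b)
n_total = n_a + n_b
all_ranks = ranks_a + ranks_b
mean_rank = sum(all_ranks) / n_total                 # (29 + 49) / 12 = 6.5

# Variance of the rank sum under the null hypothesis.
v = n_a * n_b / (n_total * (n_total - 1)) * sum((r - mean_rank) ** 2 for r in all_ranks)

# Standardised statistic: observed rank sum of School B minus its expectation.
expected_b = n_b * (n_total + 1) / 2                 # 6 * 13 / 2 = 39
t_stat = (sum(ranks_b) - expected_b) / math.sqrt(v)

print(sum(ranks_a), sum(ranks_b), round(v, 2), round(t_stat, 3))
```

The statistic is then compared with the standard normal percentage point, U(α) = 1.644854 for an upper-tail P of 0.05 in Table 12.7.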
Bristol. Eight cups of tea were made. In four cups, milk was added first
and in the other four cups tea was added first. Thus, the column totals were
fixed. Dr. Bristol was asked to identify the four tea-first and the four
milk-first cups. Thus, the row totals were also fixed in advance. Fisher
proceeded to analyse the resulting 2 × 2 table, thus giving birth to Fisher's
exact test (Clarke, 1991; Ludbrook, 2008).
Manual analysis of data using Fisher's exact test is beyond the scope of
this book, hence not covered. The power to detect a significant difference
is greater with Fisher's exact test than with the χ² test, as seen in Table 12.8.
Table 12.8. Power to detect a significant difference: comparison between χ² test and Fisher's exact test
Incidence of pathological lesions   P-value
(Control vs dosed group)            Chi-square test*   Fisher's exact test (one-sided)
0/5 vs 1/5                          1.00000            0.50000
0/5 vs 2/5                          0.42920            0.22222
0/5 vs 3/5                          0.16755            0.08333
0/5 vs 4/5                          0.05281            0.02381
0/5 vs 5/5                          0.01141            0.00397
1/5 vs 2/5                          1.00000            0.50000
1/5 vs 3/5                          0.51861            0.26190
1/5 vs 4/5                          0.20590            0.10317
1/5 vs 5/5                          0.05281            0.02381
2/5 vs 3/5                          1.00000            0.50000
2/5 vs 4/5                          0.51861            0.26190
2/5 vs 5/5                          0.16755            0.08333
*With Yates's correction (Note on Yates's correction: the χ² test slightly overestimates the difference between expected and observed results. This overestimation can be corrected by decreasing the difference between expected and observed by 0.5).
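The P-values in Table 12.8 can be reproduced with the standard library alone. The sketch below is an illustration, not the authors' procedure: the function names are ours, the Fisher value is computed as a one-sided hypergeometric tail, and the chi-square value uses the usual Yates-corrected statistic for a 2 × 2 table with 1 degree of freedom. It assumes neither column total of the table is zero.

```python
import math

def fisher_one_sided(a, n1, b, n2):
    """One-sided P: probability of b or more lesions in the dosed group,
    given a/n1 in control and b/n2 in the dosed group (margins fixed)."""
    k = a + b                                   # total animals with lesions
    total = math.comb(n1 + n2, k)
    tail = sum(math.comb(n2, x) * math.comb(n1, k - x)
               for x in range(b, min(k, n2) + 1))
    return tail / total

def yates_chi_square_p(a, n1, b, n2):
    """Two-sided P of the Yates-corrected chi-square test (1 d.f.)."""
    n = n1 + n2
    c, d = n1 - a, n2 - b                       # animals without lesions
    corrected = max(0.0, abs(a * d - b * c) - n / 2)
    chi2 = n * corrected ** 2 / ((a + b) * (c + d) * n1 * n2)
    return math.erfc(math.sqrt(chi2 / 2))       # upper tail of chi-square, 1 d.f.

for a, b in [(0, 1), (0, 4), (0, 5), (1, 3)]:
    print(f"{a}/5 vs {b}/5  chi-square P = {yates_chi_square_p(a, 5, b, 5):.5f}"
          f"  Fisher P = {fisher_one_sided(a, 5, b, 5):.5f}")
```

For every row the one-sided Fisher P is roughly half the corrected chi-square P, which is the pattern the table illustrates.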
McKinney et al. (1989) reviewed the use of Fisher's exact test in 71 articles published between 1983 and 1987 in six medical journals. Nearly 60% of the articles did not specify whether a one- or two-sided test was used. The authors concluded that the use of Fisher's exact test without specifying the one- or two-sided version may misrepresent the statistical significance of data.
Mann-Whitney's U test
Mann-Whitney's U test, a non-parametric equivalent of Student's t-test for comparing two groups, was independently developed by Mann and Whitney (1947) and Wilcoxon (1945). The calculation procedure of Mann-Whitney's U test is very similar to that of the Wilcoxon rank-sum test. To understand Mann-Whitney's U test in detail, let us analyse the data given in Table 12.9. The objective of the analysis is to find whether there is a significant difference in hemoglobin content between Group A and Group B.
Table 12.9. Hemoglobin content (g/dl) in two experimental groups of rats following the
administration of a drug at 10 mg/kg b.w. (Group A) and at 20 mg/kg b.w. (Group B)
Group A 9.3 6.4 10.8 5.6
Group B 5.9 9.7 9.9 6.7
Let us pool the data and arrange them from the smallest to the largest,
ignoring the Group to which they belong and rank them. Then, tag them
with the identity of the Group to which they belong (Table 12.10).
Table 12.10. Ranking the data
Pooled data                            5.6   5.9   6.4   6.7   9.3   9.7   9.9   10.8
Ranked data                            1     2     3     4     5     6     7     8
Tagged data with respective group      A     B     A     B     A     B     B     A
(Note for tied observations: Assign the mean rank to the tied observations. For example, if the values ranked 2nd and 3rd are both 5.9, give each value a rank of 2.5).
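The ranking step and the U statistic that follows from it can be sketched in Python. The helper names are ours, and the U formula used (U = n1 n2 + n1(n1 + 1)/2 - R1, taking the smaller of the two group values) is the common textbook definition, assumed here because the calculation itself is not shown in this part of the text.

```python
def mean_ranks(values):
    """Map each value to its rank, giving tied values their mean rank."""
    ordered = sorted(values)
    ranks = {}
    i = 0
    while i < len(ordered):
        j = i
        while j < len(ordered) and ordered[j] == ordered[i]:
            j += 1
        ranks[ordered[i]] = (i + 1 + j) / 2    # mean of positions i+1 .. j
        i = j
    return ranks

group_a = [9.3, 6.4, 10.8, 5.6]   # Table 12.9, Group A
group_b = [5.9, 9.7, 9.9, 6.7]    # Table 12.9, Group B

rank_of = mean_ranks(group_a + group_b)
rank_sum_a = sum(rank_of[v] for v in group_a)
n_a, n_b = len(group_a), len(group_b)

u_a = n_a * n_b + n_a * (n_a + 1) / 2 - rank_sum_a
u_b = n_a * n_b - u_a
u = min(u_a, u_b)
print(rank_sum_a, u_a, u_b, u)
```

With the ranks of Table 12.10, Group A has a rank sum of 17, so the two U values are 9 and 7 and the smaller one, 7, is compared with the U Table.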
Since the computed U value is greater than the values given in the Mann-Whitney U Table, the difference is not significant at the 5% level by either the two-sided or the one-sided test (at the 5% significance level, the U Table values are 0 and 1 for two-sided and one-sided tests, respectively).
When the size of either of the groups exceeds 20, the significance of U can be tested using the Z statistic:
$$Z = \frac{U - \dfrac{n_1 n_2}{2}}{\sqrt{\dfrac{n_1 n_2 (n_1 + n_2 + 1)}{12}}}$$
The high dose group is significantly different from the control group as per Mann-Whitney's U test when the data of both groups have three digits after the decimal point and no value from the control group is repeated in the high dose group, and vice versa. When the data were truncated to two decimals, the value 11.44 was repeated in both groups, resulting in an insignificant difference between the control and high dose groups. When the data were restricted to one decimal, the values 11.4 and 13.7 were repeated in both groups, again resulting in an insignificant difference between the control and high dose groups.
There are two methods for calculating Mann-Whitney's U test. When the number of observations in each group is small (N < 27), Mann-Whitney's U test can be calculated by using a ready reckoner (http://aoki2.si.gunma-u.ac.jp/lecture/Average/u-tab.html). When the number of observations in each group is large (N > 27), it is calculated using the Z distribution Table method. Table 12.14 demonstrates the analysis of simulated data with a strong dose-related pattern by Mann-Whitney's U test using the Z distribution Table method. Table 12.15 demonstrates the analysis of the same kind of simulated data by Mann-Whitney's U test using the ready reckoner.
Table 12.14. Power of Mann-Whitney's U test for three and four samples with a strong dose-related pattern (calculated by using Z distribution Table)
Number of samples   Group     Raw data (ranked)   Mean rank   Z value   P value (two-sided)   P value (one-sided)
3                   Control   1, 2, 3             2           1.96      0.04953               0.02500
                    Dose      4, 5, 6             5
4                   Control   1, 2, 3, 4          2.5         2.30      0.0209                0.010
                    Dose      5, 6, 7, 8          6.5
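The Z values in Table 12.14 follow from the Z statistic with U = 0, since the two groups do not overlap. The sketch below is an illustration under that assumption; it takes the absolute deviation of U from its mean so that Z is positive, and the function name `u_to_z` is ours.

```python
import math
from statistics import NormalDist

def u_to_z(u, n1, n2):
    """Normal approximation for Mann-Whitney's U (no tie correction)."""
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return abs(u - mean_u) / sd_u

standard_normal = NormalDist()
for n in (3, 4):
    z = u_to_z(0, n, n)                       # U = 0: complete separation
    one_sided = 1 - standard_normal.cdf(z)
    print(f"n = {n}: Z = {z:.3f}, two-sided P = {2 * one_sided:.5f}, "
          f"one-sided P = {one_sided:.5f}")
```

The printed values agree with the table to the precision shown there, apart from small rounding differences in the last digit.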
Table 12.15. Power of Mann-Whitney's U test for three and four samples with a strong dose-related pattern (calculated by using the ready reckoner, http://aoki2.si.gunma-u.ac.jp/lecture/Average/u-tab.html)
Number of samples   Group     Raw data (ranked)   Mean rank   U value   P value (two-sided)   P value (one-sided)
3                   Control   1, 2, 3             2           0.0       Not significant       P<0.05
                    Dose      4, 5, 6             5
4                   Control   1, 2, 3, 4          2.5         0.0       P=0.05                P<0.05
                    Dose      5, 6, 7, 8          6.5
Tables 12.14 and 12.15 indicate that there is not much difference in P values between the Z distribution Table and ready reckoner methods when the number of samples is as small as 3 to 4. However, we recommend the ready reckoner when the number of observations in each group is small (N < 27) and the Z distribution Table when the number of observations in each group is large (N > 27).
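$$X^2 = \frac{12}{N(N+1)} \left( \frac{r_1^2}{N_1} + \frac{r_2^2}{N_2} + \cdots + \frac{r_a^2}{N_a} \right) - 3(N+1) \qquad \text{(Equation 1)}$$
If the groups have data with the same (tied) ranks, the chi-square value is calculated as given below (Equation 2):
$$X^2 = \frac{(N-1)\,S}{\left(r_{11} - \dfrac{N+1}{2}\right)^2 + \cdots + \left(r_{a n_a} - \dfrac{N+1}{2}\right)^2} \qquad \text{(Equation 2)}$$
$$S = \frac{\left(r_1 - \dfrac{N_1(N+1)}{2}\right)^2}{N_1} + \frac{\left(r_2 - \dfrac{N_2(N+1)}{2}\right)^2}{N_2} + \cdots + \frac{\left(r_a - \dfrac{N_a(N+1)}{2}\right)^2}{N_a}$$
Here a is the number of groups, N the total number of samples, N_i the number of samples in group i, r_i the sum of ranks of group i, and r_ij the rank of the j-th observation in group i.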
If the derived chi-square value is larger than the chi-square distribution Table value, then it indicates a significant difference.
Let us work out an example. Lymphocyte count determined in four
groups in a clinical study is given in Table 12.16.
Table 12.16. Lymphocyte counts (%) determined in a clinical study
Group A Group B Group C Group D
40.6 31.9 32.7 30.6
38.0 36.8 31.3 35.9
41.1 32.4 32.9 29.6
52.7 34.8 31.9 29.2
48.8 43.1 28.5 28.5
41.1 39.0 31.2 30.8
39.9 33.6 33.1 30.5
43.1 34.3 34.1 29.4
32.7 34.0 31.2 30.8
30.1 33.8 31.7 32.0
Mean 40.8 35.4 31.9 30.7
N 10 10 10 10
Number of groups = 4; Total number of samples = 40.
Combine the lymphocyte counts of all four groups and arrange them from the smallest to the largest. Then assign ranks from 1 to 40 to them as given in Table 12.17. (Note: we have done a similar exercise while working out the example of scores for performance of six classes of two schools for explaining the Wilcoxon rank-sum test; vide Tables 12.4 and 12.5).
Table 12.17. Ranks assigned to the lymphocyte counts (%) of four groups
Group A Group B Group C Group D
34 15.5 19.5 8
31 30 13 29
35.5 18 21 5
40 28 15.5 3
39 37.5 1.5 1.5
35.5 32 11.5 9.5
33 23 22 7
37.5 27 26 4
19.5 25 11.5 9.5
6 24 14 17
Mean rank 31.1 26 15.55 9.35
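r_A = 34 + 31 + … + 19.5 + 6 = 311
r_B = 15.5 + 30 + … + 25 + 24 = 260
r_C = 19.5 + 13 + … + 11.5 + 14 = 155.5
r_D = 8 + 29 + … + 9.5 + 17 = 93.5
S is calculated as 2914.35 (see below):
$$S = \frac{\left(311 - \dfrac{10 \times 41}{2}\right)^2}{10} + \frac{\left(260 - \dfrac{10 \times 41}{2}\right)^2}{10} + \frac{\left(155.5 - \dfrac{10 \times 41}{2}\right)^2}{10} + \frac{\left(93.5 - \dfrac{10 \times 41}{2}\right)^2}{10} = 2914.35$$
X² is calculated as 21.3 (see below):
$$X^2 = \frac{(40-1) \times 2914.35}{\left(34 - \dfrac{40+1}{2}\right)^2 + \left(31 - \dfrac{40+1}{2}\right)^2 + \cdots + \left(9.5 - \dfrac{40+1}{2}\right)^2 + \left(17 - \dfrac{40+1}{2}\right)^2} = \frac{113659.7}{5326.5} = 21.3$$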
The computed X² value is compared with the X² Table value (Table 12.18) at 4 - 1 = 3 degrees of freedom. Since the computed X² value (21.3) is greater than the X² Table value at P = 0.001 (16.266), it is considered that there is a significant difference in lymphocyte counts among the groups (P<0.001).
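The whole worked example can be retraced in Python from the raw data of Table 12.16. This is a sketch of the tie-adjusted calculation (Equation 2), with helper and variable names of our own choosing; it ranks the pooled data with mean ranks for ties, forms the rank sums, then S and X².

```python
def mean_ranks(values):
    """Map each value to its rank, giving tied values their mean rank."""
    ordered = sorted(values)
    ranks, i = {}, 0
    while i < len(ordered):
        j = i
        while j < len(ordered) and ordered[j] == ordered[i]:
            j += 1
        ranks[ordered[i]] = (i + 1 + j) / 2
        i = j
    return ranks

groups = {  # lymphocyte counts (%), Table 12.16
    "A": [40.6, 38.0, 41.1, 52.7, 48.8, 41.1, 39.9, 43.1, 32.7, 30.1],
    "B": [31.9, 36.8, 32.4, 34.8, 43.1, 39.0, 33.6, 34.3, 34.0, 33.8],
    "C": [32.7, 31.3, 32.9, 31.9, 28.5, 31.2, 33.1, 34.1, 31.2, 31.7],
    "D": [30.6, 35.9, 29.6, 29.2, 28.5, 30.8, 30.5, 29.4, 30.8, 32.0],
}

pooled = [v for values in groups.values() for v in values]
n_total = len(pooled)
rank_of = mean_ranks(pooled)

rank_sums = {g: sum(rank_of[v] for v in values) for g, values in groups.items()}
s = sum((rank_sums[g] - len(values) * (n_total + 1) / 2) ** 2 / len(values)
        for g, values in groups.items())
denominator = sum((rank_of[v] - (n_total + 1) / 2) ** 2 for v in pooled)
chi_square = (n_total - 1) * s / denominator

print(rank_sums)
print(round(s, 2), round(denominator, 1), round(chi_square, 1))
```

The rank sums, S and the denominator come out as in the text (311, 260, 155.5, 93.5; 2914.35; 5326.5), and X² is about 21.3.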
Table 12.18. Chi square Table (Yoshimura, 1987)
DF \ α   0.1     0.05     0.01     0.001
1        2.706   3.841    6.635    10.828
2        4.605   5.991    9.210    13.816
3        6.251   7.815    11.345   16.266
4        7.779   9.488    13.277   18.467
5        9.236   11.070   15.086   20.515
tests reveal a significant difference, it does not indicate that all the group means are significantly different from each other. One of the robust tests used to find out which group means are significantly different from each other is Dunn's multiple comparison test. Dunn's multiple comparison test can be used to find the differences among 3 or more groups (Israel, 2008).
Dunn's multiple comparison test for more than three groups (Gad and Weil, 1986; Hollander and Wolfe, 1973)
Let us review the example given in Table 12.17. The mean rank values are reproduced in Table 12.19.
Table 12.19. Mean rank of lymphocyte (%)
Group A Group B Group C Group D
Mean rank 31.1 26 15.6 9.4
N 10 10 10 10 Sum=40
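Calculation procedure
Group A vs Group B:
Difference of mean rank: 31.1 - 26 = 5.1
The Probability value:
$$Z_{0.05/[4(3)]} \sqrt{\frac{(40)(41)}{12}\left(\frac{1}{10} + \frac{1}{10}\right)}, \quad Z_{0.00417} = 2.63, \quad \text{Probability (critical) value} = 13.7$$
Group A vs Group C:
Difference of mean rank: 31.1 - 15.6 = 15.5
The Probability value:
$$Z_{0.05/[4(3)]} \sqrt{\frac{(40)(41)}{12}\left(\frac{1}{10} + \frac{1}{10}\right)}, \quad Z_{0.00417} = 2.63, \quad \text{Probability (critical) value} = 13.7$$
Group A vs Group D:
Difference of mean rank: 31.1 - 9.4 = 21.7
The Probability value:
$$Z_{0.05/[4(3)]} \sqrt{\frac{(40)(41)}{12}\left(\frac{1}{10} + \frac{1}{10}\right)}, \quad Z_{0.00417} = 2.63, \quad \text{Probability (critical) value} = 13.7$$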
The difference between the two mean ranks is compared with the Probability (critical) value (13.7). If the difference between the two mean ranks is greater than the Probability (critical) value, then the difference is considered significant (see Table 12.21).
Table 12.21. Significant difference between the groups
Analysis             Difference           Critical value   P
Group A vs Group B   31.1 - 26 = 5.1      13.7             Not significant (P>0.05)
Group A vs Group C   31.1 - 15.6 = 15.5   13.7             Significant (P<0.05)
Group A vs Group D   31.1 - 9.4 = 21.7    13.7             Significant (P<0.05)
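The critical value used in these comparisons can be recomputed in Python. The sketch below assumes the form shown in the calculation procedure, a normal quantile at α/[k(k - 1)] multiplied by the standard error of a mean-rank difference; the function name is ours. With an unrounded quantile (about 2.64 instead of 2.63) the critical value comes out near 13.8 rather than 13.7, which does not change any conclusion.

```python
import math
from statistics import NormalDist

def dunn_critical_difference(alpha, k, n_total, n_i, n_j):
    """Critical difference between two mean ranks for k groups."""
    z = NormalDist().inv_cdf(1 - alpha / (k * (k - 1)))
    standard_error = math.sqrt(n_total * (n_total + 1) / 12 * (1 / n_i + 1 / n_j))
    return z * standard_error

mean_rank = {"A": 31.1, "B": 26.0, "C": 15.55, "D": 9.35}   # Table 12.17
critical = dunn_critical_difference(alpha=0.05, k=4, n_total=40, n_i=10, n_j=10)

for other in ("B", "C", "D"):
    difference = mean_rank["A"] - mean_rank[other]
    verdict = "significant" if difference > critical else "not significant"
    print(f"A vs {other}: difference = {difference:.2f}, "
          f"critical = {critical:.2f}, {verdict}")
```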
Calculation procedure:
Control group vs Low dose group
1) Sum of rank of low dose group, R2=5+6+7+8=26
Table 12.25. Minimum number of animals in four-group and five-group settings necessary to show a significant difference
Test                Four groups   Five groups
Scheffé type        22            40
Hollander-Wolfe*    19            30
Tukey type          18            32
Dunnett type        15            26
Wilcoxon            8             12
Steel type          4             6
Mann-Whitney U**    3
*Dunn's test. **Test for 2 groups alone.
The power also depends on the number of treatment groups, which implies that inclusion of further non-significant treatment groups can result in overlooking significant effects (Hothorn, 1990).
As mentioned earlier, the power to detect a significant difference is high with Steel's test. A comparison of the power to detect a significant difference between the Dunnett type rank test and Steel's test is given in Table 12.26.
Table 12.26. Comparison of the power to detect a significant difference between Dunnett type rank test and Steel's test
Group             Urine volume (ml)         Mean ± SD     Dunnett type rank test   Steel's test
Control (N=5)     2.4, 2.8, 2.4, 2.4, 2.4   2.5 ± 0.18
Low dose (N=5)    43, 45, 40, 41, 46        43 ± 2.55     NS                       S
Mid dose (N=5)    62, 48, 68, 52, 55        57 ± 8.0      S                        S
High dose (N=4)   73, 72, 102, 104          87.8 ± 17.6   S                        S
Top dose (N=4)    52, 97, 99, 103           87.8 ± 24     S                        S
Bartlett's homogeneity test: P = 0.0001
Kruskal-Wallis's test: P = 0.0006
NS-Not significant (P>0.05); S-Significant (P<0.05)
The low dose group was not significantly different from the control when analysed using the Dunnett type rank test, whereas it was significantly different when analysed using Steel's test.
Most pharmacologists and toxicologists express concern about the use of non-parametric tests such as rank sum tests, because of their low sensitivity in detecting a significant difference. However, some biostatisticians are of the opinion that rank sum tests are more useful for analyzing biological data than parametric tests.
References
Clarke, S.C. (1991): Invited commentary on R. A. Fisher. Am. J. Epidemiol., 134(12), 1371-1374.
Crawley, M.J. (2005): Statistics: An Introduction Using R. John Wiley and Sons Ltd., Chichester, UK.
Elston, R.C. and Johnson, W.D. (1994): Essentials of Biostatistics. F.A. Davis & Co., Philadelphia, USA.
Fagerland, M.W. and Sandvik, L. (2009): The Wilcoxon-Mann-Whitney test under scrutiny. Statist. Med., 28, 1487-1497.
Fisher, R.A. (1922): On the interpretation of χ² from contingency tables, and the calculation of P. J. Royal Stat. Soc., 85(1), 87-94.
Fisher, R.A. (1954): Statistical Methods for Research Workers. Oliver and Boyd, London, UK.
Gad, S. and Weil, C.S. (1986): Statistics and Experimental Design for Toxicologists. The Telford Press, New Jersey, USA.
Hollander, M. and Wolfe, D.A. (1973): Non-Parametric Statistical Methods. John Wiley, New York, USA.
Hothorn, L. (1990): Biometrische Analyse spezieller Untersuchungen der regulatorischen Toxikologie. In: Aktuelle Probleme der Toxikologie, Vol. 5 Grundlagen der Statistik fuer Toxikologen (M. Horn and L. Hothorn, Eds.) Verlag Gesundheit GmbH, Berlin, Germany.
Inaba, T. (1994): Problem of multiple comparison method used to evaluate medicine of enzyme inhibitor X1, Japanese Society for Biopharmaceutical Statistic, 40, 33-36.
Israel, D. (2008): Data Analysis in Business Research-A Step by Step Non-Parametric Approach. SAGE Publications India Pvt. Ltd., New Delhi, India.
Kruskal, W.H. and Wallis, A.W. (1952): Use of ranks in one criterion analysis of variance. J. Am. Stat. Assoc., 47(260), 583-621.
Le, C.T. (2003): Introductory Biostatistics. John Wiley & Sons, Inc., Hoboken, New Jersey, USA.
Ludbrook, J. (2008): Analysis of 2 × 2 tables of frequencies: Matching test to experimental design. Int. J. Epidemiol., 37(6), 1430-1435.
Mann, H.B. and Whitney, D.R. (1947): On a test of whether one of 2 random variables is stochastically larger than the other. Ann. Math. Stat., 18, 50-60.
McDonald, J.H. (2009): Handbook of Biological Statistics, 2nd Edition. Sparky House Publishing, Baltimore, USA.
McKight, P.E. and Najab, J. (2010): Kruskal-Wallis Test. In: Corsini Encyclopedia of Psychology. Editors, Weiner, I.B. and Craighead, W.E., Wiley Online Library, DOI: 10.1002/9780470479216.
McKinney, W.P., Young, M.J., Hartz, A. and Lee, M.B. (1989): The inexact use of Fisher's exact test in six major medical journals. JAMA, 16, 261(23), 3430-3433.
Nietert, P.J. and Dooley, M.J. (2011): The power of the sign test given uncertainty in the proportion of tied observations, 32(1), 147-150.
Sawilowsky, S. (2005): Encyclopedia of Statistics in Behavioral Science. Wiley Online Library, DOI: 10.1002/0470013192.bsa615.
Steel, R.G.D. (1961): Some rank sum multiple comparison tests. Biometrics, 17(4), 539-552.
Surhone, L.M., Timpledon, M.T. and Marseken, S.F. (2010): Sign Test. VDM Verlag Dr Mueller AG & Co., KG, Germany.
Whitley, E. and Ball, J. (2002): Statistics review 6: Nonparametric methods, Crit. Care, 6(6), 509-513.
Wilcoxon, F. (1945): Individual comparisons by ranking methods. Biometrics Bull., 1(6), 80-83.
Yoshimura, I. (1987): Statistical Analysis of Toxicological Data. Scientist Press, Tokyo, Japan.
Yoshimura, I. and Ohashi, Y. (1992): Statistical Analysis for Toxicology Data. Chijin-Shokan, Tokyo, Japan.
13
Cluster Analysis
Observation
Ward's method of cluster analysis (Ward, 1963; Ward and Hook, 1963)
This method is more efficient than hierarchical cluster analysis. Ward's method uses the squared distances between clusters and within clusters (Rencher, 2002). Hence, Ward's method is also called the incremental sum of squares method.
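The "incremental sum of squares" idea can be made concrete with a short Python sketch. It uses the standard identity that merging two clusters increases the total within-cluster sum of squares by n_a n_b / (n_a + n_b) times the squared distance between their centroids; the function names and the four two-dimensional points are hypothetical and serve only as an illustration.

```python
def centroid(points):
    """Coordinate-wise mean of a list of points."""
    n = len(points)
    return tuple(sum(p[k] for p in points) / n for k in range(len(points[0])))

def within_ss(points):
    """Sum of squared distances of the points from their centroid."""
    c = centroid(points)
    return sum(sum((p[k] - c[k]) ** 2 for k in range(len(c))) for p in points)

def ward_merge_cost(cluster_a, cluster_b):
    """Increase in within-cluster sum of squares caused by merging a and b."""
    ca, cb = centroid(cluster_a), centroid(cluster_b)
    squared_distance = sum((x - y) ** 2 for x, y in zip(ca, cb))
    n_a, n_b = len(cluster_a), len(cluster_b)
    return n_a * n_b / (n_a + n_b) * squared_distance

# Hypothetical clusters, two observations each
cluster_a = [(0.0, 0.0), (0.0, 2.0)]
cluster_b = [(4.0, 0.0), (4.0, 2.0)]

cost = ward_merge_cost(cluster_a, cluster_b)
increase = within_ss(cluster_a + cluster_b) - within_ss(cluster_a) - within_ss(cluster_b)
print(cost, increase)
```

At each step Ward's method merges the pair of clusters with the smallest such cost, so both the between-cluster distance and the within-cluster spread enter the decision.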
Figure 13.2. Analytical methods by a decision tree (boxes shown: Bartlett's test; Dunnett's multiple comparison test; Steel's test)
We shall analyse the data of the study described above using Ward's method of cluster analysis (Milligan, 1980). The software used for the analysis was JMP (version 5) of SAS (SAS Institute, Japan).
Cluster 1
The items in the dosed groups that showed a significant difference compared to the control group were: body weight gain, food efficiency, hematocrit, hemoglobin, red blood cell count, platelet count, neutrophil (%), lymphocytes (%), blood urea nitrogen, total protein, alanine aminotransferase, alkaline phosphatase, glucose, prothrombin time, albumin, albumin/globulin ratio, inorganic phosphorus in urine, lung weight, relative weights of the lung, liver, kidneys and testes, gross pathology findings, and microscopic findings. These items were grouped in Cluster 1.
Each dosed group was divided into Group 1 and Group 2. Group 1 was further divided into Subgroup 1 and Subgroup 2 (Table 13.1).
Table 13.1. Results of cluster analysis: Cluster 1, items showing a significant difference (P<0.05) compared to control
Dose group   Number of animals
             Group 1                   Group 2
             Subgroup-1   Subgroup-2
Control      10           0            0
Low          10           0            0
Mid          10           0            0
High         2            8            0
Top          0            4            6
The dendrogram obtained from the above data is given in Figure 13.3.
Figure 13.3. Dendrogram of items that are significantly different from control (Ward's method)
Note: Animal identification mark, dose group and animal number are given on the left side of the dendrogram.
Cluster 2
The items which did not show a significant difference compared to control were: food and water consumption, leucocyte count, lymphocyte count, reticulocyte count, activated partial thromboplastin time, total cholesterol, free cholesterol, triglyceride, phospholipid, non-esterified fatty acid, creatinine, total bilirubin, sodium, potassium, chloride, calcium, inorganic phosphorus, alanine aminotransferase, lactate dehydrogenase, alpha-1 (%), gamma (%), urine volume, urine specific gravity, and sodium, potassium, chloride, calcium and inorganic phosphorus in urine, and weights of the brain, heart, liver, kidneys, spleen, adrenals, testes, thyroid and thymus, and relative weights of the brain, heart, spleen, adrenals, thyroid and thymus. These items were grouped in Cluster 2.
Each dosed group was divided into Group 1 and Group 2. Groups 1 and 2 were further divided into two Subgroups each (Table 13.2).
Table 13.2. Results of cluster analysis: Cluster 2, items showing no significant difference (P>0.05) compared to control
Dose group   Number of animals
             Group 1                   Group 2
             Subgroup-1   Subgroup-2   Subgroup-1   Subgroup-2
Control      8            2            0            0
Low          6            4            0            0
Mid          7            3            0            0
High         5            0            5            0
Top          0            0            5            5
Figure 13.4. Dendrogram of items that are not significantly different from control (Ward's method)
Note: Animal identification mark, dose group and animal number are given on the left side of the dendrogram.
References
Agrafiotis, D.K., Bandyopadhyay, D. and Farnum, M. (2007): Radial clustergrams: visualizing the aggregate properties of hierarchical clusters. J. Chem. Inf. Model, 47, 69-75.
Finch, H. (2005): Comparison of distance measures in cluster analysis with dichotomous data. J. Data Sci., 3, 85-100.
Furlan, D., Carnevali, I.W., Bernasconi, B., Sahnane, N., Milani, K., Cerutti, R., Bertolini, V., Chiaravalli, A.M., Bertoni, F., Kwee, I., Pastorino, R. and Carlo, C. (2011): Hierarchical clustering analysis of pathologic and molecular data identifies prognostically and biologically distinct groups of colorectal carcinomas. Modern Path., 24, 126-137.
Gad, S. and Weil, C.S. (1986): Statistics and Experimental Design for Toxicologists, The Telford Press, New Jersey, USA.
Hamadeh, H.K., Bushel, P.R., Jayadev, S., DiSorbo, O., Bennett, L., Li, L., Tennant, R., Stoll, R., Barrett, C., Paules, R.S., Blanchard, K. and Afshari, C.A. (2002): Prediction of compound signature using high density gene expression profiling. Toxicol. Sci., 67, 232-240.
Hartigan, J.A. (1975): Clustering Algorithms. John Wiley & Sons, Inc., New York, USA.
Kirkwood, B. (1989): Medical Statistics, Blackwell Scientific Publications, London, UK.
Kobayashi, K. (2004): Evaluation of toxicity dose levels by cluster analysis. J. Toxicol. Sci., 29(2), 125-129.
Kobayashi, K., Kanamori, M., Ohori, K. and Takeuchi, H. (2000): A new decision tree method for statistical analysis of quantitative data obtained in toxicity studies on rodent. San Ei Shi, 42, 125-129.
Makretsov, N.A., Huntsman, D.G., Nielsen, T.O., Yorida, E., Peacock, M., Cheang, M.C.U., Dunn, S.E., Hayes, M., van de Rijn, M., Bajdik, C. and Gilks, C.B. (2004): Hierarchical clustering analysis of tissue microarray immunostaining data identifies prognostically significant groups of breast carcinoma. Clin. Cancer Res., 10, 6143-6151.
Milligan, G.W. (1980): An examination of the effect of six types of error perturbation on fifteen clustering algorithms. Psychometrika, 45, 325-342.
Moore, C.W., Meyers, D.A., Wenzel, S.E., Teague, G.W., Li, H., Li, X., D'Agostino, Jr., R., Castro, M., Curran-Everett, D., Fitzpatrick, A.M., Gaston, B., Jarjour, N.N., Sorkness, R., Calhoun, W.J., Chung, K.F., Comhair, S.A.A., Dweik, R.A., Israel, E., Peters, S.P., Busse, W.W., Erzurum, S.C. and Bleecker, E.R. (2010): Identification of asthma phenotypes using cluster analysis in the severe asthma research program. Am. J. Resp. Crit. Care Med., 181, 315-323.
Rencher, A.C. (2002): Methods of Multivariate Analysis. 2nd Edition, Wiley-Interscience, New York, USA.
Romesburg, H. (2004): Cluster Analysis for Researchers. Lulu Press, North Carolina, USA.
Schonlau, M. (2002): The Clustergram: A graph for visualizing hierarchical and non-hierarchical cluster analyses. The Stata J., 3, 316-327.
Scoltock, J. (1982): A survey of the literature of cluster analysis. Computer J., 25(1), 130-134.
Shannon, W., Culverhouse, R. and Duncan, J. (2003): Analyzing microarray data using cluster analysis. Pharmacogenomics, 4(1), 41-52.
Tryon, R.C. (1939): Cluster Analysis. Edward Brothers, Ann. Arbor., MI, USA.
Ward, J.H., Jr. (1963): Hierarchical grouping to optimize an objective function. J. Am. Stat. Assoc., 58(301), 235-244.
Ward, J.H., Jr. and Hook, M.E. (1963): Application of an hierarchical grouping procedure to a problem of grouping profiles. Edu. Psych. Measurement, 23, 69-81.
14
Trend Tests
Introduction
In pharmacology and toxicology experiments, three or more treatment groups are usually used. One of the objectives of carrying out an experiment with three or more groups is to assess the dose-dependency of the effect of the test substance. Dose-dependency is an important concept for evaluating toxicological data (Hamada et al., 1997). In order to examine whether the change in a parameter observed in a study is dose-dependent, a trend test is used. A trend test examines whether the results in all dose groups together increase as the dose increases (EPA, 2005). Trend tests have been recommended as a customary method for analyzing data from subchronic and chronic animal studies (Selwyn, 1995). For examining quantitative data, Jonckheere's trend test (Jonckheere, 1954) is generally used. Frequency data are examined by the Cochran-Armitage trend test (Cochran, 1954; Armitage, 1955).
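Formula:
$$J = \frac{\sum_{i<j} T_{ij} - \dfrac{N^2 - \sum_i N_i^2}{4}}{\sqrt{V}}$$
$$V = \frac{N(N-1)(2N+5) - \sum_i N_i(N_i-1)(2N_i+5)}{72} + \frac{\left\{\sum_i N_i(N_i-1)(N_i-2)\right\}\left\{\sum_i \tau_i(\tau_i-1)(\tau_i-2)\right\}}{36\,N(N-1)(N-2)}$$
If the computed J value is greater than the Z value given in the standard normal distribution Table, the trend is considered to be significant.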
Calculation of T values:
We need this information for the calculation of J. Arrange the data in each group in the order of prediction. Let us calculate T12 (Control Group vs Low Dose Group). For each control value, the number of values in the low dose group that are smaller than it is counted, and the counts are totalled:
T12 = 9 + 8 + 9 + 10 + 10 + 9 + 9 + 9 + 2 + 0 = 75
The first value of the control group is 40.6 and there are 9 values in the low dose group that are smaller than 40.6. The second value of the control group is 38.0 and there are 8 values in the low dose group that are smaller than 38.0, and so on.
Similarly, values are counted for the other pairs of groups.
T13 = 10 + 10 + 10 + 10 + 10 + 10 + 10 + 10 + 6 + 1 = 87
T14 = 10 + 10 + 10 + 10 + 10 + 10 + 10 + 10 + 9 + 4 = 93
T23 = 5 + 10 + 6 + 10 + 10 + 10 + 9 + 10 + 9 + 9 = 88
T24 = 8 + 10 + 9 + 9 + 10 + 10 + 9 + 9 + 9 + 9 = 92
T34 = 9 + 8 + 9 + 8 + 0 + 8 + 9 + 9 + 8 + 8 = 76
where T13 is Control Group vs Mid Dose Group, T14 is Control Group vs High Dose Group, T23 is Low Dose Group vs Mid Dose Group, T24 is Low Dose Group vs High Dose Group and T34 is Mid Dose Group vs High Dose Group.
τ values: We also need to know how many times a value is repeated within a group and across the groups. 43.1 is repeated twice, once each in Groups 1 and 2 (τ1); 41.1 is repeated twice within Group 1 (τ2); 32.7 is repeated twice, once each in Groups 1 and 3 (τ3); 31.9 is repeated twice, once each in Groups 2 and 3 (τ4); 30.8 is repeated twice within Group 4 (τ5); 31.2 is repeated twice within Group 3 (τ6); and 28.5 is repeated twice, once each in Groups 3 and 4 (τ7).
40(40 1)(2 u 40 5) 10(10 1)(20 5) u 4
V
72
4 u10(10 1)(10 2) u ^ 2(2 1)(2 2) 2(2 1)(2 2) 2(2 1)(2 2) 2(2 1)(2 2) 2(2 1)(2 2)
2(2 1)(2 2) 2(2 1)(2 2) `
36 u (40 1)(40 2)
$$J = \frac{(75 + 87 + 93 + 88 + 92 + 76) + \dfrac{1 + 1 + 0 + 1 + 0 + 1}{2} - \dfrac{40^2 - 10^2 \times 4}{4} - 0.5}{\sqrt{1717.003}} = \frac{212.5}{41.4} = 5.13$$
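The counting of T values is tedious by hand and easy to script. The sketch below uses the lymphocyte data of Table 12.16, with Groups A to D standing for the control, low, mid and high dose groups as in the worked example. The function and variable names are ours. V is not recomputed: the value 1717.003 is simply taken from the text.

```python
import math
from itertools import combinations

groups = [  # Table 12.16, in the predicted order of the trend
    [40.6, 38.0, 41.1, 52.7, 48.8, 41.1, 39.9, 43.1, 32.7, 30.1],
    [31.9, 36.8, 32.4, 34.8, 43.1, 39.0, 33.6, 34.3, 34.0, 33.8],
    [32.7, 31.3, 32.9, 31.9, 28.5, 31.2, 33.1, 34.1, 31.2, 31.7],
    [30.6, 35.9, 29.6, 29.2, 28.5, 30.8, 30.5, 29.4, 30.8, 32.0],
]

def count_pairs(first, second):
    """Number of (x, y) pairs with y smaller than x, and number of tied pairs."""
    smaller = sum(1 for x in first for y in second if y < x)
    tied = sum(1 for x in first for y in second if y == x)
    return smaller, tied

pairs = [count_pairs(groups[i], groups[j])
         for i, j in combinations(range(len(groups)), 2)]
t_values = [smaller for smaller, _ in pairs]     # T12, T13, T14, T23, T24, T34
ties = [tied for _, tied in pairs]

n_total = sum(len(g) for g in groups)
numerator = (sum(t_values) + sum(ties) / 2
             - (n_total ** 2 - sum(len(g) ** 2 for g in groups)) / 4 - 0.5)

v = 1717.003                                     # variance as quoted in the text
j_statistic = numerator / math.sqrt(v)
print(t_values, ties, numerator, round(j_statistic, 2))
```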
Table 14.3. Individuals of different age groups expressing antibodies to house dust
Age             Conversion value   Independent variable (log transformed)   Number of investigations   Number of antibody positives
One's sixties   2.5                0.398                                    10                         2
One's fifties   5                  0.699                                    10                         4
One's forties   10                 1.000                                    10                         6
One's thirties  20                 1.301                                    10                         8
A value of 10 is assigned to the age forties. Half of the value of the age forties (10/2 = 5) is assigned to the age fifties and half of the value of the age fifties (5/2 = 2.5) is assigned to the age sixties. The value assigned to the age thirties is 20 (10 × 2).
Number of groups = 4, Sum of number of samples = 40, rate of positives in total = (2+4+6+8)/40 = 20/40 = 0.5
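As an illustration of how such frequency data feed into a trend test, the sketch below applies a common textbook form of the Cochran-Armitage statistic to Table 14.3, using the log-transformed conversion values as scores. This form is an assumption on our part and may differ in detail (for example, by a factor of N/(N - 1)) from the formula used in the chapter; the function name is also ours.

```python
import math

scores = [0.398, 0.699, 1.000, 1.301]   # log-transformed conversion values
examined = [10, 10, 10, 10]             # number of investigations per age group
positives = [2, 4, 6, 8]                # number of antibody positives

def cochran_armitage(scores, examined, positives):
    """Chi-square (1 d.f.) for a linear trend in proportions."""
    n_total = sum(examined)
    p_total = sum(positives) / n_total                     # 20/40 = 0.5
    mean_score = sum(n * x for n, x in zip(examined, scores)) / n_total
    trend = sum(r * (x - mean_score) for r, x in zip(positives, scores))
    variance = p_total * (1 - p_total) * sum(
        n * (x - mean_score) ** 2 for n, x in zip(examined, scores))
    chi_square = trend ** 2 / variance
    p_value = math.erfc(math.sqrt(chi_square / 2))         # two-sided, 1 d.f.
    return chi_square, p_value

chi_square, p_value = cochran_armitage(scores, examined, positives)
print(round(chi_square, 2), round(p_value, 4))
```

Under these assumptions the trend chi-square is about 8.0 with a two-sided P of roughly 0.005, in line with the steady rise in positives from the sixties to the thirties.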
References
Antonello, J.M., Clark, R.L. and Heyse, J.F. (1993): Application of the Tukey trend test procedure to assess developmental and reproductive toxicity I. Measurement data. Tox. Sci., 21(1), 52-58.
Armitage, P. (1955): Tests for linear trends in proportions and frequencies. Biometrics, 11(3), 375-386.
Astuti, E.T. and Yanagawa, T. (2002): Testing trend for count data with extra-Poisson variability. Biometrics, 58(2), 398-402.
Buonaccorsi, J.P., Laake, P. and Veierød, M.B. (2011): On the power of the Cochran-Armitage test for trend in the presence of misclassification. Stat. Methods Med. Res., August 2011; doi:10.1177/0962280211406424.
Cochran, W.G. (1954): Some methods for strengthening the common chi-square tests. Biometrics, 10(4), 417-451.
Cohen, L. and Holliday, M. (2001): Practical Statistics for Students. SAGE Publications Inc., California, USA.
EPA (2005): United States Environmental Protection Agency. Guidelines for Carcinogen Risk Assessment. U.S. Environmental Protection Agency, EPA/630/P-03/001F. USEPA, Washington D.C., USA.
Field, A.P. (2004): Discovering Statistics Using SPSS, 2nd Edition, SAGE, London, UK.
Gad, S.C. (2009): Drug Safety Evaluation, 2nd Edition. John Wiley & Sons, Inc., New Jersey.
Hamada, C., Yoshino, K., Matsumoto, K., Ikumi Abe, I., Yoshimura, I. and Nomura, M. (1997): A study on the consistency between statistical evaluation and toxicological judgment. Drug Inf. J., 31, 413-421.
Introduction
Survival analysis is one of the oldest fields of statistics, going back to the 17th century. The first life-table was presented by John Graunt in 1662 (Kreager, 1988). Life-tables are used extensively in analysing the mortality data obtained from toxicology studies, especially carcinogenicity and long-term repeated dose administration studies (Portier, 1988; FDA, 2007) and ecotoxicology studies (Gentile et al., 1982; Van Leeuwen et al., 1985; Bechmann, 1994). A major advancement in survival analysis took place in 1958, when Kaplan and Meier proposed their estimator of the survival curve (Kaplan and Meier, 1958). Since then, the field of survival analysis has progressed significantly with contributions from several statisticians (Mantel and Haenszel, 1959; Cox, 1972; Aalen, 1976; Aalen, 1980; Diggle et al., 2007; Aalen et al., 2008). The term 'survival' is a bit misleading. Originally the analysis was concerned with the time from treatment until death, hence the name survival analysis. Survival analysis is a collection of statistical procedures for data analysis for which the outcome variable of interest is the time until an event occurs (Kleinbaum and Klein, 2005). According to Akritas (2004), survival analysis is a method for the analysis of data on an event observed over time and the study of factors associated with the occurrence rates of this event. The outcome could be the time until a generator's bearing seizes, the time until a patient dies or the time until a person finds employment (Cleves et al., 2008). Survival analysis can be used in many fields, such as medicine, biology, public health and epidemiology (Kul, 2010). In pharmacology and toxicology, survival analysis is used in analyzing events such as time to death, time to the occurrence, disappearance and reoccurrence of signs, and time to recovery of the experimental animals.
Hazard Rate
Hazard rate is an important concept in survival analysis. It provides information on the risk of an event happening as a function of time, conditional on the event not having happened previously (Aalen et al., 2009), whereas the survival curve provides information on how many have survived up to a certain time. The hazard function can be estimated using the equation:
H(t) = (Number of individuals experiencing an event in the interval beginning at t) / [(Number of individuals surviving at time t) × (interval width)]
The hazard function describes the risk of an event in an interval after time t, conditional on the individual not having experienced the event up to time t. The hazard function is useful in determining whether toxicity is constant over time, or whether it increases or decreases as the exposure continues (Wright and Welbourn, 2002).
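A minimal sketch of this estimate in Python is given below. The function name and the numbers in the usage example (3 deaths among 20 surviving animals over a 2-week interval) are hypothetical and are not taken from any study in the text.

```python
def hazard_rate(events_in_interval, survivors_at_start, interval_width):
    """H(t) = events in the interval / (survivors at t x interval width)."""
    if survivors_at_start <= 0 or interval_width <= 0:
        raise ValueError("survivors and interval width must be positive")
    return events_in_interval / (survivors_at_start * interval_width)

# Hypothetical example: 3 deaths among 20 surviving animals in a 2-week interval
example_hazard = hazard_rate(events_in_interval=3, survivors_at_start=20,
                             interval_width=2)
print(example_hazard)   # events per animal per week
```

Computing this quantity for successive intervals shows whether the risk stays roughly constant or changes as exposure continues.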
Kaplan-Meier Method
Survival analysis is normally carried out using the Kaplan-Meier method or the log rank test. The log rank test is ideal for the analysis of two groups. The Kaplan-Meier estimator uses product-limit methods to estimate the survival ratio (Kaplan and Meier, 1958). It is a nonparametric maximum likelihood estimate of the survival function and is used in animal experiments to measure the fraction of animals that live after treatment.
The survival time T from the start of the experiment (first dose administration) to the event of interest (for example, mortality) is considered a random variable. The survival rate, S_t, is defined as the probability that an animal survives longer than t units of time:
S_t = P(T > t). For example, if t is in years, S_2 is the two-year survival rate; S_2 = P(T > 2) = 0.10 indicates a 10% probability that the time from treatment to death is greater than 2 years.
Kaplan-Meier product-limit estimator
$$\hat{S}_t = \prod_{t_i \le t} \frac{r_i - d_i}{r_i},$$
Survival Analysis 145
where r_i is the number of animals alive just before t_i and d_i is the number of animals which died at t_i. Π denotes the product (geometric sum) across all cases less than or equal to t. The Kaplan-Meier product-limit estimator measures the fraction of animals living for a certain amount of time after treatment.
Let us review an example to understand Kaplan-Meier product-limit
estimator. The survival rate of F344 rats in a 110-week chronic toxicity
study is given in Table 15.1. The experimental group of rats (20 rats/group)
was treated with 1000 ppm pesticide in diet. The control group of rats (20
rats/group) was given normal diet without the pesticide.
Table 15.1. Survival rate of F344 rats in a 110-week chronic toxicity study (Funaki and Origasda, 2001)
Control group (Normal diet)
Animal ID-No.   Survival period (week)   Survival rate (st)   Size of effective sample (n)
1001            85                       0.950                20
1002            87                       0.900                19
1003            95                       0.800                18
1004            95
1005            99                       0.650                16
1006            99
1007            99
1008            101                      0.550                13
1009            101
1010            102                      0.500                11
1011            103                      0.350                10
1012            103
1013            103
1014            104                      0.250                7
1015            104
1016            106                      0.150                5
1017            106
1018            110                      0.050                3
1019            112                      0.025                2
1020            120                      -                    1
Treatment group (1000 ppm pesticide in diet)
Animal ID-No.   Survival period (week)   Survival rate (st)   Size of effective sample (n)
1101            66                       0.900                20
1102            66
1103            62                       0.850                18
1104            63                       0.800                17
1105            68                       0.750                16
1106            70                       0.650                15
1107            70
1108            72                       0.550                13
1109            72
1110            75                       0.400                11
1111            75
1112            75
1113            77                       0.300                8
1114            77
1115            78                       0.257                7
1116            79                       0.154                5
1117            79
1118            80                       0.051                3
1119            80
1120            88                       -                    1
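The product-limit calculation can be sketched in Python using the survival periods of the control group. The sketch assumes that every listed week is a death (no censoring), which the table does not state explicitly, and the function name is ours. Under that assumption it reproduces the effective sample sizes and the survival rates of the control column from week 85 (0.950) to week 106 (0.150); the last printed entries may reflect handling that is not described in this part of the text.

```python
from collections import Counter

def kaplan_meier(event_times):
    """Product-limit estimate, assuming every listed time is a death."""
    deaths = Counter(event_times)
    at_risk = len(event_times)              # r_i: animals alive just before t_i
    survival = 1.0
    curve = []
    for t in sorted(deaths):
        d = deaths[t]                       # d_i: animals that died at t_i
        survival *= (at_risk - d) / at_risk
        curve.append((t, at_risk, d, survival))
        at_risk -= d
    return curve

control_weeks = [85, 87, 95, 95, 99, 99, 99, 101, 101, 102,
                 103, 103, 103, 104, 104, 106, 106, 110, 112, 120]

curve = kaplan_meier(control_weeks)
for week, r_i, d_i, s_t in curve:
    print(f"week {week:3d}  at risk {r_i:2d}  deaths {d_i}  S(t) = {s_t:.3f}")
```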
$$\chi^2(\text{log rank}) = \sum_g \frac{(O_g - E_g)^2}{E_g}$$
where O_g is the total number of observed events in group g, and E_g is the total number of expected events in group g. O and E are computed each time an event happens; if a survival time is censored, the subject is considered to be at risk during the interval of censoring, but not for the subsequent intervals. The test statistic is then compared with a χ² distribution with g - 1 degrees of freedom, where g is the number of groups. The limitation of the log rank test and Cox's proportional hazards model is that they are based on the assumption that the hazard ratio is constant over time (Bewick et al., 2004).
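The bookkeeping behind O and E can be sketched in Python with the survival periods of the two groups in Table 15.1. The sketch assumes that every listed week is a death (no censoring) and uses the simple sum of (O - E)²/E given above; the function name and the group labels are ours.

```python
def log_rank(groups):
    """groups: dict of group name -> list of event times (no censoring).
    Returns observed events, expected events and the chi-square statistic."""
    event_times = sorted({t for times in groups.values() for t in times})
    observed = {name: float(len(times)) for name, times in groups.items()}
    expected = dict.fromkeys(groups, 0.0)
    for t in event_times:
        at_risk = {name: sum(1 for x in times if x >= t)
                   for name, times in groups.items()}
        deaths = sum(times.count(t) for times in groups.values())
        total_at_risk = sum(at_risk.values())
        for name in groups:
            # expected deaths are shared out in proportion to animals at risk
            expected[name] += deaths * at_risk[name] / total_at_risk
    chi_square = sum((observed[g] - expected[g]) ** 2 / expected[g] for g in groups)
    return observed, expected, chi_square

survival_weeks = {
    "control": [85, 87, 95, 95, 99, 99, 99, 101, 101, 102,
                103, 103, 103, 104, 104, 106, 106, 110, 112, 120],
    "treated": [66, 66, 62, 63, 68, 70, 70, 72, 72, 75,
                75, 75, 77, 77, 78, 79, 79, 80, 80, 88],
}

observed, expected, chi_square = log_rank(survival_weeks)
print(observed, {g: round(e, 2) for g, e in expected.items()}, round(chi_square, 1))
```

With two groups the statistic has 1 degree of freedom; under these assumptions it is far above the 5% point of 3.841 in Table 12.18, reflecting the much earlier deaths in the treated group.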
Both the life-table and the Kaplan-Meier methods have advantages and disadvantages. The Kaplan-Meier method is best used with small data sets in which the time of occurrence of an event is measured precisely, whereas the life-table method works well with large data sets and when the time of occurrence of an event cannot be measured precisely. The Kaplan-Meier method handles censored data better than the life-table method.
References
Aalen, O.O. (1976): Nonparametric inference in connection with multiple decrement
models. Scand. J. Stat., 3, 15-27.
Aalen, O.O. (1980): A model for non-parametric regression analysis of lifetimes. In:
Mathematical Statistics and Probability Theory. Editors, Klonecki, W., Kozek, A. and
Rosinski, J. Lecture Notes in Statistics, Vol. 2, Springer-Verlag, New York.
Aalen, O.O., Andersen, P.K., Borgan, O., Gill, R.D. and Keiding, N. (2009): History of
applications of martingales in survival analysis. Electronic J. History Probability
Stat., 5(1), 1-28.
Aalen, O.O., Borgan, Ø. and Gjessing, H.K. (2008): Survival and Event History Analysis:
A Process Point of View. Springer-Verlag, New York.
Akritas, M.G. (2004): Nonparametric survival analysis. Stat. Sci., 19(4), 615-623.
Altman, D.G. (1991): Practical Statistics for Medical Research. Chapman & Hall/CRC,
London.
Bechmann, R.K. (1994): Use of life tables and LC50 tests to evaluate chronic and acute
toxicity effects of copper on the marine copepod Tisbe furcata (Baird). Environmental
Toxicology and Chemistry, 13(9), 13811548.
Bewick, V., Cheek, L. and Ball, J. (2004): Statistics review 12: Survival analysis. Crit. Care,
8(5), 389-394.
Cleves, M., Gutierrez, R., Gould, W. and Marchenko, Y. (2008): An Introduction to Survival
Analysis Using Stata. 2nd Edition, Stata Press, Texas, USA.
Cox, D.R. (1972): Regression models and life tables (with discussion). J. Royal Stat. Soc.
Ser. B, 34, 187-220.
Diggle, P., Farewell, D.M. and Henderson, R. (2007): Analysis of longitudinal data with
drop-out: objectives, assumptions and a proposal. J. Royal Stat. Soc. Ser. C (Applied
Statistics), 56, 499-550.
FDA (2007): United States Food and Drug Administration. Redbook 2000: IV.B.4 Statistical
Considerations in Toxicity Studies. USFDA, MD, USA.
Freeman, J.V., Walters, S.J. and Campbell, M.J. (2008): How to display data. Blackwell
BMJ Books, Oxford, UK.
Funaki, K. and Origasda, H. (2001): Statistics with Confidence, Scientist Press, Tokyo,
Japan.
Gentile, J.H., Gentile, S.M., Hairston, N.G. and Sullivan, B.K. (1982): The use of life-tables
for evaluating the chronic toxicity of pollutants to Mysidopsis bahia. Hydrobiologia,
93(1-2), 179-187.
Kaplan, E.L. and Meier, P. (1958): Nonparametric estimation from incomplete observations.
J. Am. Stat. Assn., 53, 457-481.
Kleinbaum, D.G. and Klein, M. (2005): Survival Analysis: A Self-Learning Text. 2nd
Edition, Springer Science+Business Media, Inc., New York, USA.
Kreager, P. (1988): New light on Graunt. Population Studies, 42, 129-140.
Kul, S. (2010): The use of survival analysis for clinical pathways. Intl. J. Care Pathways,
14, 23-26.
Mantel, N. and Haenszel, W. (1959): Statistical aspects of the analysis of data from
retrospective studies of disease. J. National Cancer Inst., 22, 719-748.
Portier, C.J. (1988): Life table analysis of carcinogenicity experiments. Int. J. Tox., 7(5),
575-582.
Tinazzi, A., Scott, M. and Compagnoni, A. (2008): A gentle introduction to survival analysis.
SAS Conference Proceedings: PhUSE 2008, October 12-15, 2008, Manchester, UK.
Van Leeuwen, C.J., Moberts, F. and Niebeek, G. (1985): Aquatic toxicological aspects of
dithiocarbamates and related compounds. II. Effects on survival, reproduction and
growth of Daphnia magna. Aquatic Toxicol., 7(3), 165-175.
Wright, D.A. and Welbourn, P. (2002): Environmental Toxicity. Cambridge Environmental
Chemistry Series 11. Cambridge University Press, Cambridge, UK.
16
Dose Response Relationships
Benchmark Dose
NOAEL is based on a single data point and it does not consider the shape
of the dose-response curve, the number of animals in the group, or the
statistical variation in the response and its measurement (EPA, 1998).
An alternative approach to NOAEL is the Benchmark dose approach
(Kimmel and Gaylor, 1988). The Benchmark dose is defined as the dose
of a chemical that is required to achieve a predetermined response of a
toxicological effect (Sand et al., 2006). The Benchmark dose method uses
the full dose response data for the statistical analysis, hence the result
obtained from the analysis is considered to be more reliable than the single
data point based NOAEL. Unlike the NOAEL approach, the Benchmark
dose method includes the determination of the response at a given dose,
the magnitude of the dose at a given response and their confidence limits.
According to EPA SAB (1998): The [categorical regression] process
makes use of every bit of data available. The underlying premise of the
approach is that the severity of the effect, not the specific measurement
or outcome incidence, is the information needed for assessing
exposure-response relationships for non-cancer endpoints. All the available data is
plotted on a single chart and one can immediately see a rough picture of the
Probit Analysis
Probit analysis was originally published in Science by Bliss (Bliss,
1934). He was an entomologist involved in research to find a
pesticide to control insects that fed on grape leaves (Greenberg, 1980).
Bliss transformed the percentage mortality into probability units (or
Probits) and plotted the Probits against concentrations. However, he did not
have a statistical tool to compare the effects of various pesticides. In
1952, Finney of the University of Edinburgh wrote a book, Probit Analysis
(Finney, 1952). Probit analysis, which remains a preferred method for analyzing
dose-response relationships and is described at length in Finney's book,
is based on the idea developed by Bliss. One of the assumptions of Probit
analysis is that the response vs dose data are normally distributed; if they are
not, Finney suggested using the logit rather than the Probit transformation (Finney,
1952). Both Logit analysis (Muhammad et al., 1990) and Probit analysis
(Finney, 1978) are used in biological assays.
Performing Probit analysis manually is tedious. An example is provided
below to show the steps involved in this statistical analysis. Most of the
commercially available statistical software can perform Probit analysis.
Groups of rats (10 rats/group) were given a drug at different dose
levels. The response shown by the number of animals at each dose level is
given in Table 16.1.
Let us plot a graph with dose on X axis and percent response on Y axis
(Figure 16.1).
The very purpose of carrying out the Probit analysis is to find the
dose that causes the response in 50% of the animals. If the response that
we are looking at is mortality, the dose that causes mortality in 50% of
animals is called the LD50. Since the inception of the LD50 test by Trevan
(1927), the test has gained wide acceptance as a measure of acute toxicity
of all types of substances (DePass, 1989).
We could have determined the dose which causes 50% response (for
example, LD50) straight away from the plot, had the plot been a straight
line. In Finney's Probit analysis the dose response curve is converted to a
straight line by transforming the doses to logarithmic values and percent
mortality to Probit values (Finney, 1971). Let us try to understand what
Probit values mean. Percent response on Y axis can be converted to
normal equivalent deviation (NED). What is an NED? We know that at
one standard deviation below the mean value (-1SD), 16% will show response
and at one standard deviation above the mean value (+1SD) 84% will show
Let us now plot a graph with log dose on X axis and Probit on Y axis
(Figure 16.2).
You would have observed that Probit responses for 4 doses are missing
in Figure 16.2. The reason for this is that there are no Probit values for
0% and 100% responses. From the Figure one can find that the Probit values
fall in a roughly linear fashion. Let us closely observe the Probit values.
The middle region of the line (region of 50% response, i.e., the region of
Probit 5) is linear, hence this region is reasonably reliable for making a
prediction. The two ends of the line, where the data are controlled by few
animals, are less linear, hence these regions are seldom used
for making a prediction. The variation in the middle region of the line is
small, whereas it is larger at the 2 ends. This variation can be
minimised by using weighting coefficients. Once a best-fit line is drawn
using a regression equation, a statistically reliable median response dose
can be estimated:
Y = a + bX, where
Y = 5 (Probit value corresponding to 50% response)
X = Log dose
a = Intercept
b = Slope
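The steps above (percent response to Probit, dose to log dose, straight-line fit, solve for Y = 5) can be sketched in Python. This is a simplified illustration only: it takes the Probit as the normal equivalent deviation plus 5, fits an unweighted least-squares line (the weighting coefficients mentioned above are omitted), and uses hypothetical doses and responses, not the data of Table 16.1. The function names are likewise hypothetical.

```python
import math
from statistics import NormalDist


def probit(p):
    """Probit of a response proportion: normal equivalent deviation + 5."""
    return 5.0 + NormalDist().inv_cdf(p)


def fit_probit_line(doses, proportions):
    """Unweighted least-squares fit of Probit (Y) on log10 dose (X).

    Doses with 0% or 100% response are skipped, since they have no
    finite Probit value. Returns (a, b) for Y = a + bX.
    """
    pts = [(math.log10(d), probit(p))
           for d, p in zip(doses, proportions) if 0.0 < p < 1.0]
    n = len(pts)
    mean_x = sum(x for x, _ in pts) / n
    mean_y = sum(y for _, y in pts) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pts)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def median_dose(a, b):
    """Dose at which Y = 5, i.e. 50% response."""
    return 10 ** ((5.0 - a) / b)


# Hypothetical example: 10 animals per group
doses = [1.0, 2.0, 4.0, 8.0, 16.0]
responders = [0, 2, 5, 8, 10]
a, b = fit_probit_line(doses, [r / 10 for r in responders])
print(f"a = {a:.3f}, b = {b:.3f}, median dose = {median_dose(a, b):.2f}")
```

A full Probit analysis as described by Finney iterates a weighted fit; the unweighted line here only illustrates the transformation and the back-calculation of the median dose.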
The term "statistically reliable median response dose" is used
intentionally, as several reports have stated that the median response dose, for
example, LD50, is notoriously variable. The usefulness of the LD50 test has been
criticized on several grounds: the test only expresses mortality; it requires a
large number of animals; its outcome is influenced by
several factors associated with the animal (for example, species, age, sex,
etc.), animal house conditions (for example, temperature, humidity, light
intensity, etc.) and human error; and many times the findings of the test cannot
be extrapolated to man. On the contrary, supporters of the LD50 test are of
the opinion that a properly conducted LD50 test can yield information on
the cause and time of death, symptomatology and nonlethal acute effects; that the
slope of the mortality curve can provide information on the mode of action and
metabolic detoxication; that the results can be used as the basis for designing
subsequent subchronic studies; and that the test is a first approximation of
hazards to workers (Hodgson, 2010).
This method for calculating LD50 requires a large number of animals and is
therefore not desirable. Interested readers may refer to the Up and Down
Procedure, which requires fewer animals (OECD, 2000b).
Hormesis
So far we have been discussing the threshold dose-response
curve. It is widely believed that some dose is required to initiate a
biological effect. This dose is called the threshold dose. According to this
belief a dose below the threshold dose level cannot initiate the effect. This
concept has been challenged in recent years by a hypothesis
called hormesis. The term hormesis was coined by Southam and Ehrlich
(1943). The hormesis hypothesis states that most chemical agents
may stimulate or inhibit biological effects at doses lower than a threshold,
while they are toxic at doses higher than the threshold. This hypothesis
falls in line with the Arndt-Schulz Law, which states that a weak stimulus
increases physiologic activity, a moderate stimulus inhibits activity and
a very strong stimulus abolishes the activity (Schulz, 1887). However, the
Arndt-Schulz Law is not widely known among toxicologists and
pharmacologists. One of the reasons for this is that it was heavily criticised
by earlier pharmacologists and toxicologists, and hence did not find a place in
most books on toxicology and pharmacology. Alfred Clark, the renowned
pharmacologist, in his book entitled The Mode of Action of Drugs on
Cells published in 1933 stated: "In 1885 Rudolf Arndt put forward the
suggestion that if a weak stimulus excites an organism, then any drug in
sufficiently weak dose ought to do this also. This suggestion was developed
by Schulz, who had leanings to homeopathy" (Clark, 1933). Clark was
well known among statisticians like Fisher and Bliss, who contributed
significantly to the threshold dose-response relationship. Another book by
Clark, Handbook of Experimental Pharmacology (Clark, 1937), which
was very critical of the Arndt-Schulz Law, was published in seven editions,
in the 1970s, more than 30 years after his death. Holmstedt and Lijestrand
in their book, Readings in Pharmacology, published in 1981 stated that
homoeopathic theories like the Arndt-Schulz law and the Weber-Fechner law
were based on loose ideas around surface tension of the cell membranes
but there was little physico-chemical basis to these ideas (Holmstedt and
Lijestrand, 1981).
Brain and Cousens (1989) proposed a modified four-parameter logistic
model for situations where hormesis is present. Several publications indicated
that the hormetic dose-response is far more common and fundamental than
the threshold dose-response models used in toxicology (Calabrese, 2005).
According to Calabrese (2010), the hormetic dose-response model makes
far more accurate predictions of responses in low dose zones than either
the threshold or linear at low dose models.
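To make the idea of a hormetic curve concrete, the sketch below evaluates one parameterisation of the Brain and Cousens type of model that is often used in dose-response software: a log-logistic curve with an extra linear term f × dose in the numerator. The symbol names (b, c, d, e, f), the parameter values and the function name are assumptions made for illustration and are not taken from the text or from the original paper.

```python
def brain_cousens(dose, b, c, d, e, f):
    """Log-logistic response with a hormesis term.

    c : lower limit of the response
    d : response at dose 0 (upper limit when f = 0)
    e : dose scale parameter
    b : steepness of the decline
    f : size of the low-dose stimulation (f = 0 gives the ordinary
        four-parameter log-logistic curve)
    """
    if dose < 0:
        raise ValueError("dose must be non-negative")
    return c + (d - c + f * dose) / (1.0 + (dose / e) ** b)


# Hypothetical parameters: control response 100, stimulation at low dose
params = dict(b=2.0, c=0.0, d=100.0, e=10.0, f=50.0)
for dose in (0.0, 1.0, 10.0, 100.0, 1000.0):
    print(f"dose {dose:7.1f}  response {brain_cousens(dose, **params):8.2f}")
```

With f > 0 the response at low doses rises above the control response before declining at higher doses, which is the pattern the hormesis hypothesis describes.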
References
Bliss, C.I. (1934): The method of Probits. Science, 79(2037), 38-39.
Brain, P. and Cousens, R. (1989): An equation to describe dose responses where there is
stimulation of growth at low dose. Weed Res., 29, 93-96.
Bretz, F. (2006): An extension of the Williams trend test to general unbalanced linear
models. Comp. Stat. Data Anal., 50(7), 1735-1748.
Calabrese, E.J. (2005): Toxicological awakenings: the rebirth of hormesis as a central pillar
of toxicology. Toxicol. Appl. Pharmacol., 206(3), 365-366.
Calabrese, E.J. (2010): Hormesis is central to toxicology, pharmacology and risk assessment.
Hum. Exp. Tox., 29(4), 249-261.
Christensen, F.M., Andersen, O., Duijm, N.J. and Harremoës, P. (2003): Risk
terminology: a platform for common understanding and better communication. J.
Hazardous Materials, A103, 81203.
Clark, A.J. (1933): The Mode of Action of Drugs on Cells. The Williams & Wilkins
Company, Baltimore, USA.
Clark, A.J. (1937): Handbook of Experimental Pharmacology. Springer, Berlin, Germany.
DePass, L.R. (1989): Alternative approaches in median lethality (LD50) and acute toxicity
testing. Toxicol. Lett., 49(2-3), 159-170.
Dorato, M.A. and Engelhardt, J.A. (2005): The no-observed-adverse-effect-level in drug
safety evaluations: Use, issues, and definition(s). Reg. Toxicol. Pharmacol., 42(3),
265-274.
Eaton, D.L. and Klaassen, D. (1996): Principles of Toxicology. In: Casarett and Doull's
Toxicology; The Basic Science of Poisons, 5th Edition, McGraw-Hill, New York,
USA.
EMEA (2006): European Medicines Agency. Note for Guidance on Dose Response
Information to Support Drug Registration (CPMP/ICH/378/95). ICH Topic E4 Dose
Response Information to Support Drug Registration. EMEA, London, UK.
EMEA (2007): European Medicines Agency. Guideline on Requirements for First-in-man
Clinical Trials for Potential High-risk Medicinal Products. EMEA/CHMP/SWP/
28367/2007. Committee for Medicinal Products for Human Use, EMEA, London,
UK.
EPA (1998): United States Environmental Protection Agency. Methods for Exposure-
Response Analysis for Acute Inhalation Exposure to Chemicals: Development of the
Acute Reference Exposure (ARE). EPA/600/R-98/051. External Review Draft. April
1998. USEPA, Washington, DC., USA.
EPA SAB (1998): United States Environmental Protection Agency Science Advisory Board.
A SAB Report: Development of the Acute Reference Exposure: Review of the Draft
Document Methods for Exposure-Response Analysis for Acute Inhalation Exposure
to Chemicals: Development of the Acute Reference Exposure (EPA/600/R-98/051)
by the Environmental Health Committee of the Science Advisory Board (SAB),
EPA-SAB-EHC-99-005. US EPA SAB. November 1998. USEPA, Washington, DC.,
USA.
FDA (2005): Food and Drug Administration. Guidance for Industry: Estimating the
Maximum Safe Starting Dose in Initial Clinical Trials for Therapeutics in Adult
Healthy Volunteers. Centre for Drug Evaluation and Research, Food and Drug
Administration, USFDA, Rockville, USA.
Finney, D.J. (1952): Probit Analysis. Cambridge University Press, Cambridge, UK.
Finney, D.J. (1971): Probit Analysis. 3rd Edition. Cambridge, London, UK.
Finney, D.J. (1978): Statistical Method in Biological Assay. 3rd Edition. Charles Griffin &
Co., London, UK.
Gottschalk, P.G. and Dunn, J.R. (2005): The five-parameter logistic: A characterization and
comparison with the four-parameter logistic. Anal. Biochem., 343, 54-65.
Greenberg, B.G. (1980): Chester I. Bliss, 1899-1979. International Statistical Review/
Revue Internationale de Statistique, 8(1), 135-136.
Gupta, R.C. (2007): Veterinary Toxicology: Basic and Clinical Principles. Academic
Press, New York, USA.
Hayes, W.J. (1991): Dosage and other factors influencing toxicology. In: Hayes, W.J. &
Laws, E.R. (Editors). Handbook of Toxicology, Vol. 1, General Principles. Academic
Press, San Diego, USA.
Healy, M.J.R. (1972): Statistical analysis of radioimmunoassay data. Biochem. J., 130,
107210.
Hodgson, E. (2010): A Text Book of Modern Toxicology. John Wiley & Sons Inc., New
Jersey, USA.
Holmstedt, B. and Lijestrand, G. (1981): Readings in Pharmacology. Raven Press, New
York, USA.
IPCS, (2009): International Programme for Chemical Safety. Principles and Methods for
the Risk Assessment of Chemicals in Food, Chapter 5, Dose-response Assessment
and Derivation of Health-based Guidance Values. Environmental Health Criteria 240,
IPCS, Geneva, Switzerland.
Kobayashi, K., Pillai, K.S., Michael, M., Cherian, K.M. and Ohnishi, M. (2010):
Determining NOEL/NOAEL in repeat-dose toxicity studies, when the low dose group
shows significant difference in quantitative data. Lab. Animal Res., 26(2), 133-137.
Kimmel, C.A. and Gaylor, D.W. (1988): Issues in qualitative and quantitative risk analysis
for development toxicology. Risk Anal., 8, 15-20.
Muhammad, F., Khan, A. and Ahmad, A. (1990): Logistic regression analyses in dose
response studies. J. Islamic Acad. Sci., 3(2), 103-106.
Neus, H. and Boikat, U. (2000): Evaluation of traffic noise-related cardiovascular risk.
Noise & Health, 2(7), 65-77.
OECD (1995): Organization for Economic Cooperation and Development. OECD
Guidelines for Testing of Chemicals. Repeated Dose 28-Day Oral Toxicity Study in
Rodents, No. 407. OECD, Paris, France.
OECD (2000a): Organization for Economic Cooperation and Development. Guidance Notes
for Analysis and Evaluation of Repeat-Dose Toxicity Studies. OECD Environment,
Health and Safety Publications Series on Pesticides, No. 10. OECD, Paris, France.
OECD (2000b): Organization for Economic Cooperation and Development. Acute Oral
Toxicity: Up-and-Down Procedure. OECD Guideline for the Testing of Chemicals,
Revised Draft Guideline, No. 425. OECD, Paris, France.
Rodbard, D., Munson, P.J. and DeLean, A. (1978): Improved curve fitting, parallelism
testing, characterization of sensitivity and specificity, validation, and optimization for
radioimmunoassays 1977. Radioimmunoassay and Related Procedures in Medicine 1,
Vienna, Austria. Int. Atomic Energy Agency (1978), 469-504.
Sand, S., von Rosen, D., Victorin, K. and Filipsson, A.F. (2006): Identification of a
critical dose level for risk assessment: Developments in Benchmark dose analysis of
continuous endpoints. Tox. Sci., 90(1), 244-251.
Schulz, H. (1887): Zur Lehre von der Arzneiwirkung. Virchows Archiv für Pathol. Anatom.
und Physiol. und für Klinische Medizin, 108, 423-445.
Strickland, J.A. (2000): CatReg Software User Manual. US Environmental Protection
Agency. EPA 600/R-98/052.
Southam, C.M. and Ehrlich, J. (1943): Effects of extracts of western red-cedar heartwood
on certain wood-decaying fungi in culture. Phytopath., 33, 517-524.
Trevan, J. (1927): The error of determination of toxicity. Proc. R. Soc., 101B, 483-514.
Turner, R.J. and Charlton, S.J. (2005): Assessing the minimum number of data points
required for accurate IC50 determination. Assay Drug Dev. Technol., 3(5), 525-531.
Vølund, A. (1978): Application of the four-parameter logistic model to bioassay: comparison
with slope ratio and parallel line models. Biometrics, 34(3), 357-365.
WHO (1994): World Health Organisation. Assessing Human Health Risks of Chemicals:
Derivation of Guidance Values for Health Based Exposure Limits. Environmental
Health Criteria 170, WHO, Geneva, Switzerland.
WHO (2007): World Health Organisation. Evaluation of Certain Food Additives and
Contaminants. Sixty-eighth Report of the Joint FAO/WHO Expert Committee on Food
Additives. WHO Technical Report Series No. 947, WHO, Geneva, Switzerland.
17
Analysis of Pathology Data
Pathology in Toxicology
Pathology occupies a pivotal role in animal experiments. The toxicity
of a compound can be assessed by linking compound-related changes in
biochemical, haematological or urinalysis parameters with organ weight,
gross pathology and/or histopathological changes (Tyson and Sawhney,
1985; Krinke et al., 1991). All regulatory guidelines on animal experiments
give special emphasis to pathology. For example, in long-term
repeated dose administration studies, it is a regulatory requirement that
all data relating to moribund or dead animals as well as the results of
postmortem examinations are scrutinized and that the cause of
individual deaths is analysed (OECD, 2000).
Pathologists usually make a biological judgment based on their
experience, which differs from one pathologist to another (Glaister,
1986). In a repeated dose administration study involving a large number
of animals, the observation of tissue section slides may be completed
over a substantial length of time. Thus it is not possible to maintain
consistency in grading the lesions, which causes a diagnostic drift. It
has been stated that even the nomenclature used to describe pathology
findings in toxicology studies suffers from a lack of uniformity. Use
of different nomenclature for describing the lesions causes difficulties
while interpreting the observations (Haseman et al., 1984). Statistically
and logically, blinding the slides is the best way to avoid bias. However,
several veterinary pathologists do not favor this, because they fear that
blinded reading of slides of animal tissues/organs may result in loss
of information critical to interpretation, such as the ability to relate
Peto test
While most pharmaceutical companies use the Peto test (Peto et al., 1980),
some do not categorize neoplasms as fatal or incidental. Generally, this test
is considered to be useful for groups with different survival rates. Before
the analysis, the pathological findings should be examined (whether malignant
or benign) and a conclusion reached on whether the drug caused the death. Some
categorize neoplasms as fatal or incidental based solely on the type of
neoplasm rather than on an animal-by-animal basis. Others categorize
neoplasms as fatal or incidental based on the gross and microscopic
findings for each animal. Some controversies exist when relying on the
Peto test for information on cause of death (STP, 2002).
Decision rules
A distinguishing characteristic of the Peto test is that it involves dosages
in the calculation procedure. The power of the Peto test is very high
when the significance level is set at the 5% probability level. However,
the use of significance levels of 5% and 1% in tests for
positive trend in incidence rates of rare tumours and common tumours,
respectively, will result in an overall false positive rate of around 10% in a
study in which only one 2-year rodent bioassay (plus the shorter rodent
study) is conducted (Lin, 1998; Lin and Rahman, 1998). The power to
detect a significant difference is greater with the trend tests than with the
pair-wise comparisons in an animal experiment with a control group and
more than two treatment groups. There are situations in which pair-wise
comparisons between control and individual treated groups may be more
appropriate than trend tests. However, both trend and pair-wise comparison
tests are likely to cause false positive results. In order to control the overall
false positive rates associated with trend tests and pair-wise comparisons, certain
statistical decision rules were developed (Haseman, 1983). The decision
rules were developed based on historical control data of Crl: CD BR rats
and Crl: CD-1 (ICR) BR mice to achieve an overall false positive rate
of around 10% for the standard in vivo carcinogenicity studies in rodents.
The decision rule tests the significance of the difference in tumour incidences
between the control and the treatment groups at the 5% probability level for
rare tumours (tumours with a background rate of 1% or less) and at the 1%
probability level for common tumours (frequent tumours). However, the
decision rule described by Haseman (1983), if applied to the trend tests,
would lead to an excessive overall false positive error rate about twice
as large as that associated with control-high dose pair-wise comparison
tests. Statistical decision rules for controlling the overall false positive
rates associated with tests for positive trend or with control vs high dose
pair-wise comparison in tumour incidences in carcinogenicity studies
were reported by FDA (2001). These decision rules test positive trend in
tumour incidence at 2.5% probability level for rare tumours and at 0.5%
probability level for common tumours. Although the overall false positive
rate resulting from the use of the decision rule may vary from study to
study, it is estimated that it will be around 10%.
The decision rules for testing positive trend or differences between
control and individual treatment groups in incidence rates of tumours
for standard studies using two species and two sexes as well as studies
following ICH guidance and using only one 2-year rodent bioassay are
summarized in Table 17.1.
Table 17.1. Statistical decision rules for controlling the overall false positive rates
associated with tests for positive trend or with control vs high dose pair-wise comparisons
in tumour incidences to around 10 percent in carcinogenicity studies of pharmaceuticals
(FDA, 2001).

Study                          Tests for positive trend     Control vs high dose
                                                            pair-wise comparison
Standard 2-year studies        Common and rare tumours      Common and rare tumours
with 2 species and 2 sexes     are tested at 0.5% and       are tested at 1% and
                               2.5% probability levels,     5% probability levels,
                               respectively                 respectively
Alternative ICH studies        Common and rare tumours      Under development and not
(one two-year study in one     are tested at 1% and         yet available.
species and one short- or      5% probability levels,
medium-term study, two         respectively
sexes)

Note: The decision rules were developed assuming the use of two-species and two-sex (or
one-species and two-sex) for the standard design of a two-year study with 50 animals in
each of the four treatment/sex/group.
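The rules in Table 17.1 amount to a small lookup of significance levels. The Python sketch below encodes them for illustration; the function and key names are hypothetical, the 1% background-rate cut-off for "rare" is taken from the definition given in the text, and treating a result as positive when P is strictly below the level is an assumption.

```python
# Significance levels from Table 17.1 (FDA, 2001), as proportions
ALPHA = {
    ("standard", "trend"): {"common": 0.005, "rare": 0.025},
    ("standard", "pairwise"): {"common": 0.01, "rare": 0.05},
    ("alternative", "trend"): {"common": 0.01, "rare": 0.05},
    # ("alternative", "pairwise") is listed as under development in the table
}


def tumour_class(background_rate):
    """Rare tumours have a background rate of 1% or less."""
    return "rare" if background_rate <= 0.01 else "common"


def significance_level(design, test, background_rate):
    """Look up the probability level for a study design and test type."""
    try:
        return ALPHA[(design, test)][tumour_class(background_rate)]
    except KeyError:
        raise ValueError(f"no decision rule available for {design!r}/{test!r}")


def is_positive(p_value, design, test, background_rate):
    """True if the P value falls below the level given by the decision rule."""
    return p_value < significance_level(design, test, background_rate)


# A trend test with P = 0.02 in a standard two-species study
print(is_positive(0.02, "standard", "trend", background_rate=0.005))  # rare tumour
print(is_positive(0.02, "standard", "trend", background_rate=0.20))   # common tumour
```

The same P value can therefore be judged positive for a rare tumour and not positive for a common one.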
Table 17.2. Comparison of treatment group with historical control data using Kastenbaum
and Bowman test (Kastenbaum and Bowman, 1966)

Incidence of tumour            Incidence of tumour in 50 animals (Treatment group)
(Historical control data [a])  1 (2%)    2 (4%)    3 (6%)    4 (8%)
 1/ 200 (0.5%)                 NS        NS        NS        NS
 2/ 500 (0.4%)                 NS        NS        NS        *
 3/ 700 (0.4%)                 NS        NS        NS        **
 4/1000 (0.4%)                 NS        NS        *         **
 5/1250 (0.4%)                 NS        NS        *         **
 7/1500 (0.5%)                 NS        NS        *         **
 7/1700 (0.4%)                 NS        NS        *         **
 8/2000 (0.4%)                 NS        NS        *         **
10/2500 (0.4%)                 NS        NS        *         **

[a] Number of animals in the historical controls showing tumour/total number of animals in
the historical controls; NS: not significant, *P<0.05, **P<0.01.
statistically and compared with the incidence in the historical control data
as well as those in the concurrent control group (Kobayashi and Inoue,
1994).
$$\chi^2 = \frac{58^2}{235 \times 0.25} + \frac{50^2}{235 \times 0.25} + \frac{62^2}{235 \times 0.25} + \frac{65^2}{235 \times 0.25} - 235 = 2.157$$
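The arithmetic of this chi-square calculation can be checked with a short Python sketch. It assumes, as the expression suggests, four observed counts (58, 50, 62 and 65, total 235) compared with equal expected proportions of 0.25; what the four counts represent is not recoverable from this excerpt, and the function name is hypothetical.

```python
def chi_square_gof(observed, proportions):
    """Chi-square goodness-of-fit statistic, sum of (O - E)^2 / E."""
    total = sum(observed)
    expected = [total * p for p in proportions]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


observed = [58, 50, 62, 65]
proportions = [0.25, 0.25, 0.25, 0.25]

chi2 = chi_square_gof(observed, proportions)

# Equivalent shortcut used in the expression above: sum(O^2 / E) - N
total = sum(observed)
shortcut = sum(o ** 2 / (total * p) for o, p in zip(observed, proportions)) - total

print(f"chi-square = {chi2:.3f} (shortcut form: {shortcut:.3f})")
```

Both forms give the same value, to be compared with a chi-square distribution with 3 degrees of freedom (number of categories minus one).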
Carmichael et al., 1997; Meyer, 2003; Doe et al., 2006). However, some
current regulatory programmes require carcinogenicity testing in rats and
mice.
Kobayashi et al. (1999) made an interesting comparison of the incidence
of spontaneous malignant tumours in humans, rats, mice and dogs. The
prevalence of each carcinoma in rodents was calculated as the population
ratio P, with a 95% confidence interval, and compared with that in humans.
The primary carcinomas according to sex in Japanese people who died
of cancer were cited from the report of investigations on the population
dynamics and economy in 1992, Malignant neoplasm, published by the
Welfare Statistics Association, Japan (Minister's Secretariat, 1994). Data
on spontaneous incidence of tumours in rats, mice and dogs were obtained
from Biosafety Research Centre, Foods, Drugs and Pesticides, Japan. The
incidence of spontaneous malignant tumours of various organs in humans,
rodents and dogs is shown in Table 17.4.
Table 17.4. Incidence (%) of spontaneous malignant tumours in dead humans, rodents and
dogs

                            Male                      Female                    Male+Female
Organ                       Human    Rat     Mouse    Human    Rat     Mouse    Dog
No. of deaths with cancer   139674   105     120      92243    117     100      5845
Total                       100.0    100.0   100.0    100.0    100.0   100.0    100.0
Esophagus                   4.7      0       0        1.4      0       0        0.3
Stomach                     21.8     1.0     0        19.0     0.9     1.0      0.3
Intestine [a]               4.4      0       0.89     4.3      0       0        1.0
Liver                       14.0     0.      52.5     8.1      1.7     24.0     0.7
Pancreas                    5.6      0       0        6.9      0.9     2.0      0.5
Lung, trachea, bronchi      20.9     2.9     5.0      11.9     0       1.0      0.6
Mammary gland               <0.1     0       0.       7.1      2.6     8.0      9.1
Uterus                      -        -       -        5.1      10.3    11.0     0.3
Leukemia                    2.4      53.3    20.8     2.69     59.8    31.0     4.3
Other                       26.1     42.9    20.8     33.9     23.9    22.0     82.9

[a] Including colon and anus in humans, small intestine, duodenum, large intestine and colon
in rodents, and colon in dogs.
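A 95% confidence interval for a population ratio P can be sketched as follows. The text does not say which interval was used, so the Wilson score interval in this Python example is an assumption, chosen because it behaves reasonably for the small counts seen in the rodent columns; the function name is hypothetical. The example takes 1 tumour among 105 male rats, which presumably corresponds to the 1.0% stomach entry in Table 17.4.

```python
import math


def wilson_interval(successes, n, z=1.959964):
    """Wilson score interval for a proportion (z = 1.959964 for 95%)."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need 0 <= successes <= n and n > 0")
    p = successes / n
    denom = 1.0 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


low, high = wilson_interval(1, 105)
print(f"P = {1 / 105:.3%}, 95% CI {low:.3%} to {high:.3%}")
```

Comparing such an interval for rodents with the corresponding human percentage is one simple way to judge whether the two prevalences are compatible.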
Since there are four groups, the data are analysed using one-way
ANOVA, which shows a non-significant F value, indicating that there is no
significant difference in the absolute weight of the liver among the groups.
Close examination of the mean values of the groups indicates that there is
a dose-dependent increase in the absolute weight of the liver. When the
data are analysed using Dunnett's multiple comparison test, the absolute weight
of the liver of the high dose group is found to be significantly different
from that of the control group. It may be worth mentioning in this context that
Dunnett (1964) did not recommend ANOVA prior to multiple comparison
tests. Several authors are of the opinion that the error of the second kind can
be prevented by carrying out direct multiple comparison tests without
first subjecting the data to ANOVA (Hamada et al., 1998; Sakaki et al., 2000;
Kobayashi et al., 2000).
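For readers who wish to see how the F value of a one-way ANOVA is obtained, a minimal Python sketch follows. The liver weights are hypothetical values invented for illustration (the data discussed in the text are not reproduced in this excerpt), the function name is an assumption, and Dunnett's test itself is not implemented here.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between = k - 1
    df_within = n_total - k
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within


# Hypothetical absolute liver weights (g): control, low, mid and high dose
liver_weights = [
    [10.1, 11.4, 9.8, 12.0, 10.6],
    [10.5, 11.9, 10.2, 12.3, 11.0],
    [10.9, 12.2, 10.4, 12.8, 11.3],
    [11.6, 12.9, 11.0, 13.5, 12.1],
]
f_value, df1, df2 = one_way_anova_f(liver_weights)
print(f"F({df1}, {df2}) = {f_value:.3f}")
```

The F value is compared with the critical value of the F distribution for the stated degrees of freedom; a comparison of each dose group with the control would then use Dunnett's tables (Dunnett, 1964).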
References
Ahn, H., Kodell, R.L. and Moon, H. (2000): Attribution of tumour lethality and estimation
of time to onset of occult tumours in the absence of cause-of-death information. App.
Stat., 49, 157-169.
Bailer, A.J. and Portier, C.J. (1988): Effects of treatment-induced mortality and
tumour-induced mortality on tests for carcinogenicity in small samples. Biometrics, 44,
417-431.
Bailey, S.A., Zidell, R.H. and Perry, R.W. (2004): Relationships between organ weight and
body/brain weight in the rat: what is the best analytic endpoint? Toxicol. Pathol., 32,
448-466.
Billington, R., Lewis, R., Mehta, J. and Dewhurst, I. (2010): The mouse carcinogenicity
study is no longer a scientifically justifiable core data requirement for the safety
assessment of pesticides. Crit. Rev. Toxicol., 40, 35-49.
Carmichael, N.G., Enzmann, H., Pate, I. and Waechter, F. (1997): The significance of mouse
liver tumour formation for carcinogenic risk assessment: Results and conclusions
from a survey of ten years of testing by the agrochemical industry. Environ. Health
Perspect., 105, 1196-1203.
Deschl, U., Kittel, B., Rittinghausen, S., Morawietz, G., Kohler, M., Mohr, U. and Keenan,
C. (2002): The value of historical control data: Scientific advantages for pathologists,
industry and agencies. Toxicol. Pathol., 30, 80-87.
Dinse, G.E. (1994): A comparison of tumour incidence analyses applicable in
single-sacrifice animal experiments. Stat. Med., 13, 689-708.
Doe, J.E., Boobis, A.R., Blacker, A., Dellarco, V., Doerrer, N.G., Franklin, C., Goodman,
J.I., Kronenberg, J.M., Lewis, R., McConnell, E.E., Mercier, T., Moretto, A., Nolan,
C., Padilla, S., Phang, W., Solecki, R., Tilbury, L., van Ravenzwaay, B. and Wolf, D.C.
(2006): A tiered approach to systemic toxicity testing for agricultural chemical safety
assessment. Crit. Rev. Toxicol., 36, 37-68.
Dunnett, C.W. (1964): New tables for multiple comparisons with a control. Biometrics,
20(3), 482-491.
EMEA (2002): European Medicines Agency. CPMP, Note for guidance on carcinogenic
potential, EMEA, CPMP/SWP/2877/00, London, 25 July 2002.
http://www.emea.europa.eu/pdfs/human/swp/287700en.pdf.
Ennever, F.K. and Lave, L.B. (2003): Implications of the lack of accuracy of the lifetime
rodent bioassay for predicting human carcinogenicity. Reg. Tox. Pharm., 38, 52-57.
EPA (2005): United States Environmental Protection Agency. Guidelines for Carcinogen
Risk Assessment. U.S. Environmental Protection Agency (USEPA), Washington DC,
USA.
FDA (2001): Food and Drug Administration. Statistical Aspects of the Design, Analysis,
and Interpretation of Chronic Rodent Carcinogenicity Studies of Pharmaceuticals.
Draft Guidance, US FDA, Rockville, MD, USA.
Gad, S.C. and Rousseaux, C.G. (2002): Use and misuse of statistics in the design and
interpretation of studies. In: Handbook of Toxicologic Pathology, 2nd Edition.
Editors, Haschek, W.M., Rousseaux, C.G. and Wallig, M.A. Academic Press, San
Diego, USA.
Glaister, J.R. (1986): Principles of Toxicological Pathology. Taylor & Francis, Philadelphia,
USA.
Goodman, D.G. (1988): Factors Affecting Histopathologic Interpretation of Toxicity-
Carcinogenicity Studies. Carcinogenicity: The Design, Analysis, and Interpretation of
Long-Term Animal Studies. ILSI Monographs, Springer-Verlag, New York, USA.
Greim, H., Gelbke, H.P., Reuter, U., Thielmann, H.W. and Edler, L. (2003): Evaluation of
historical control data in carcinogenicity studies. Hum. Exp. Toxicol., 22(10),
541–549.
Griffiths, S.A., Parkinson, C., McAuslane, J.A.N. and Lumley, C.E. (1994): The utility
of the second rodent species in the carcinogenicity testing of pharmaceuticals.
Toxicologist, 14(1), 214.
Hamada, C., Yoshino, K., Matsumoto, K., Nomura, M. and Yoshimura, I. (1998): Tree-type
algorithm for statistical analysis in chronic toxicity studies. J. Toxicol. Sci., 23(3),
173–181.
Haseman, J.K. (1983): A reexamination of false-positive rates for carcinogenesis studies.
Fund. Appl. Toxicol., 3, 334–339.
Haseman, J.K., Huff, J. and Boorman, G.A. (1984): Use of historical control data in
carcinogenicity studies in rodents. Toxicol. Pathol., 12, 126–135.
House, D.E., Berman, E., Seely, J.C. and Simmons, J.E. (1992): Comparison of open and
blind histopathologic evaluation of hepatic lesions. Toxicol. Lett., 63, 127–133.
Iatropoulos, M.J. (1984): Editorial. Toxicol. Pathol., 12(4), 305–306.
Iatropoulos, M.J. (1988): Society of Toxicologic Pathologists Position Paper: Blinded
Microscopic Examination of Tissues from Toxicologic or Oncogenic Studies, In:
Carcinogenicity, the Design, Analysis, and Interpretation of Long-Term Animal
Studies, ILSI Monographs, Editors, Grice, H.C. and Ciminera, J.L., Springer-Verlag,
New York, USA.
Kastenbaum, M.A. and Bowman, K.O. (1966): The minimum significant number of
successes in a binomial sample. Oak Ridge National Laboratory (ORNL-3909),
Oak Ridge, Tennessee, USA.
Keenan, C., Elmore, S., Francke-Carroll, S., Kemp, R., Kerlin, R., Peddada, S., Pletcher,
J., Rinke, M., Schmidt, S.P., Taylor, I. and Wolf, D.C. (2009): Best practices for use of
historical control data of proliferative rodent lesions. Toxicol. Pathol., 37, 679–693.
Kobayashi, K., Hagiwara, T., Miura, D., Ohori, K., Takeuchi, H., Kanamori, M. and
Takasaki, K. (1999): A comparison of spontaneous malignant tumours in humans,
rats, mice and dogs. J. Environ. Biol., 20(3), 189–193.
Kobayashi, K. and Inoue, H. (1994): Statistical analytical methods for comparing the
incidence of tumours to the historical control data. J. Toxicol. Sci., 19(1), 16.
Kobayashi, K., Kanamori, M., Ohori, K. and Takeuchi, H. (2000): A new decision tree
method for statistical analysis of quantitative data obtained in toxicity studies on
rodents. San Ei Shi, 42, 125–129.
Kobayashi, K. and Pillai, K.S. (2003): Applied Statistics in Toxicology and Pharmacology,
Science Publishers, Enfield, USA.
Kodell, R.L., Farmer, J.H., Gaylor, D.W. and Cameron, A.M. (1982): Influence of
cause-of-death assignment on time-to-tumour analyses in animal carcinogenesis studies. J.
Natl. Cancer Inst., 69, 659–664.
Krinke, G.J., Perrin, L.P.A. and Hess, R. (1991): Assessment of toxicopathological effects
in ageing laboratory rodents. Arch. Toxicol. Suppl., 14, 43–49.
Lee, P.N., Fry, J.S., Fairweather, W.R., Haseman, J.K., Kodell, R.L., Chen, J.J., Roth, A.J.,
Soper, K. and Morton, D. (2002): Current issues: statistical methods for carcinogenicity
studies. Toxicol. Pathol., 30, 403–414.
Lin, K.K. (1998): CDER/FDA formats for submission of animal carcinogenicity study
data. Drug Information J., 32, 43–52.
Lin, K.K. (2000): Progress report on the guidance for industry for statistical aspects of
the design, analysis, and interpretation of chronic rodent carcinogenicity studies of
pharmaceuticals. J. Biopharm. Stat., 10(4), 481–501.
Lin, K.K. and Rahman, M.A. (1998): False Positive Rates in Tests for Trend and Differences
in Tumour incidence in Animal Carcinogenicity Studies of Pharmaceuticals under
ICH Guidance S1B, Unpublished Report, Division of Biometrics 2, Center for Drug
Evaluation and Research, Food and Drug Administration. USFDA, MD, USA.
Malani, H.M. and Van Ryzin, J. (1988): Comparison of two treatments in animal
carcinogenicity experiments. J. Am. Stat. Assoc., 83, 1171–1177.
McConnell, E.E., Solleveld, H.A., Swenberg, J.A. and Boorman, G.A. (1986): Guidelines
for combining neoplasms for evaluation of rodent carcinogenicity studies. J. Natl.
Cancer Inst., 76, 283–289.
Meyer, O. (2003): Testing and assessment strategies, including alternative and new
approaches. Toxicol. Lett., 140–141, 21–30.
Minister's Secretariat (1994): The Report of Investigation on Population Dynamics and
Social Economy in 1992: Malignant Neoplasm, Welfare Statistics Association,
Tokyo, Japan.
Mohr, U., Dungworth, D.L. and Capen, C.C. (1992): Pathobiology of the Aging Rat. Vol.
1. ILSI Press, Washington, DC, USA.
Mohr, U., Dungworth, D.L. and Capen, C.C. (1994): Pathobiology of the Aging Rat. Vol. 2.
ILSI Press, Washington, DC, USA.
Mohr, U., Dungworth, D.L., Ward, J., Capen, C.C., Carlton, W. and Sundberg, J. (1996):
Pathobiology of the Aging Mouse. Vols. 1 & 2. ILSI Press, Washington, DC, USA.
Morton, D., Elwell, M., Fairweather, W., Fouillet, X., Keenan, K., Lin, K., Long, G.,
Mixson, L., Morton, D., Peters, T., Rousseaux, C. and Tuomari, D. (2002): The
Society of Toxicologic Pathology's recommendations on statistical analysis of rodent
carcinogenicity studies. Toxicol. Pathol., 30(3), 415–418.
Murakami, M., Yamada, M. and Yokouchi, H. (2000): Statistical method appropriate for
general toxicological studies in rats. J. Toxicol. Sci., 25, 71–98.
Newberne, P.M. and de la Iglesia, F.A. (1985): Editorial: Philosophy of blind slide reading.
Toxicol. Pathol., 13(4), 255.
OECD (2000): Organisation for Economic Cooperation and Development. Environment
Directorate Joint Meeting of the Chemicals Committee and the Working Party on
Chemicals, Pesticides and Biotechnology. Guidance Notes for Analysis and Evaluation
of Repeat-Dose Toxicity Studies. OECD Series on Testing and Assessment, Number
32 and OECD Series on Pesticides, Number 10, ENV/JM/MONO(2000)18, Paris,
France.
OECD (2002): Organisation for Economic Cooperation and Development. Environment,
Health and Safety Publications. Series on Testing and Assessment No. 35 and Series
on Pesticides No. 14. Guidance Notes for Analysis and Evaluation of Chronic Toxicity
and Carcinogenicity Studies. ENV/JM/MONO 19, Paris, France.
OECD (2009): Organisation for Economic Cooperation and Development. Guidelines for
Testing of Chemicals. Carcinogenicity Studies. Test Guideline 451. OECD, Paris,
France.
Peto, R., Pike, M., Day, N.E., Gray, R.G., Lee, P.N., Parish, S., Peto, J., Richards, S.
and Wahrendorf, J. (1980): Guidelines for Simple, Sensitive Significance Tests for
Carcinogenic Effects in Long-term Animal Experiments. In: IARC Monographs
on the Evaluation of Carcinogenic Risk of Chemicals to Humans, Supplement of
Long-term and Short-term Screening Assays for Carcinogens: A Critical Appraisal.
International Agency for Research on Cancer, Lyon, France.
Piegorsch, W.W. and Bailer, A.J. (1997): Statistics for Environmental Biology and
Toxicology, Chapman and Hall, London, UK.
Portier, C.J. and Bailer, A.J. (1989): Testing for increased carcinogenicity using a
survival-adjusted quantal response test. Fundam. Appl. Toxicol., 12, 731–737.
Prasse, K., Hildebrandt, P. and Dodd, D. (1986): Letter to the Editor. Vet. Pathol., 23,
540–541.
Sakaki, H., Igarashi, S., Ikeda, T., Imamizo, K., Omichi, T., Kadota, M., Kawaguchi, T.,
Takizawa, T., Tsukamoto, O., Terai, K., Tozuka, K., Hirata, J., Handa, J., Mizuma, H.,
Murakami, M., Yamada, M. and Yokouchi, H. (2000): Statistical method appropriate
for general toxicological studies in rats. J. Toxicol. Sci., 25, 71–98.
Sellers, R.S., Morton, D., Michael, B., Roome, N., Johnson, J.K., Yano, B.L., Perry, R. and
Schafer, K. (2007): Society of Toxicologic Pathology Position Paper: Organ weight
recommendations for toxicology studies. Toxicol. Pathol., 35(5), 751–755.
Sielken, R.L. (1988): A critical evaluation of dose-response assessment of TCDD. Food
Chem. Toxicol., 26(1), 79–83.
Storer, R.D., Sistare, F.D., Reddy, M.V. and DeGeorge, J.J. (2010): An industry perspective
on the utility of short-term carcinogenicity testing in transgenic mice in pharmaceutical
development. Toxicol. Pathol., 38, 51–61.
STP (2002): STP Peto Analysis Working Group (2002). The Society of Toxicological
Pathology's recommendations on rodent carcinogenicity studies. Toxicol. Pathol., 30,
415–418.
Sun, J. (1999): On the use of historical control data for trend test in carcinogenicity studies.
Biometrics, 55, 1273–1276.
Tyson, C.A. and Sawhney, D.S. (1985): Organ Function Tests in Toxicology Evaluation.
Noyes Publications, Park Ridge, New Jersey, USA.
Wade, A. (2005): Fear or favour? Statistics in pathology. J. Clin. Pathol., 53, 16–18.
18
Designing An Animal
Experiment in Pharmacology and
Toxicology: Randomization,
Determining Sample Size
Acclimation
It should be ensured that the animals are not stressed at the start of the
experiment. One way to ensure this is by acclimating the animals to the
laboratory conditions. The acclimation period can be used for health-related
quarantine and monitoring, and for behavioral conditioning. This period
may include habituation to, desensitization to, and training for procedures
that will be involved in experimental use (Bloomsmith et al., 2006). Well-
acclimated animals are able to deal appropriately with the challenges
of the experimental environment. This ability is typically manifested in
a transient divergence from equilibrium in response to a manipulation,
followed by a gradual return to homeostatic balance (Schapiro and Everitt,
2006). Animals appearing to be behaviorally acclimated to a procedure may
not necessarily be physiologically acclimated to that procedure (Capitanio
et al., 2006). For example, acclimated animals may sometimes show
changes in metabolic profiles. Changes in nuclear magnetic resonance
spectroscopic-based urinary metabolite profiles were observed in germ-free
rats acclimated in standard laboratory animal facility conditions
(Nicholls et al., 2003).
Randomization
Appropriate randomization and statistical procedures in the design of animal
experimentation provide confidence that statistically significant results are
One-way ANOVA of the data given in Table 18.3 indicates that there
is no significant difference in the body weight of rats among the groups. This
is still not satisfactory for a few researchers. The difference in mean body
weight between groups 1 and 3 is about 8 g. In order to bring the mean
body weights of these two groups closer, one more adjustment is required.
A rat of 210 g is taken from group 1 and placed in group 3. Then a rat of
185 g is taken from group 3 and placed in group 1. Now the animals are
distributed as given in Table 18.4.
Table 18.4. Body weight (g) of rats after rearranging the animals (second time)
Statistic    Group 1    Group 2    Group 3
             198¹       213²       210³
             205²       189¹       193³
             185¹       180¹       195²
             201³       208³       190³
             203¹       211²       215²
Mean         198.40     200.20     200.60
CV (%)       3.99       7.39       5.56
Note: Superscripts indicate litter number.
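The summary rows of Table 18.4 can be reproduced from the individual body weights. The following is a minimal Python sketch, assuming that the CV is computed from the sample standard deviation (n − 1 in the denominator); the variable and function names are illustrative and are not taken from the original text.

```python
from statistics import mean, stdev

# Body weights (g) read from Table 18.4; litter numbers are omitted here.
groups = {
    "Group 1": [198, 205, 185, 201, 203],
    "Group 2": [213, 189, 180, 208, 211],
    "Group 3": [210, 193, 195, 190, 215],
}


def cv_percent(values):
    """Coefficient of variation (%) based on the sample standard deviation."""
    return 100 * stdev(values) / mean(values)


# Mean and CV (%) per group, rounded to two decimals as in the table.
summary = {
    name: (round(mean(weights), 2), round(cv_percent(weights), 2))
    for name, weights in groups.items()
}

for name, (group_mean, group_cv) in summary.items():
    print(f"{name}: mean = {group_mean:.2f} g, CV = {group_cv:.2f}%")
```

Under this assumption the computed means and CVs agree with the Mean and CV (%) rows of the table.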
The mean values of the three groups are very close to each other and are thus
satisfactory. If you closely observe the individual values of the groups, you
will realize that Group 3 represents animals from litters 2 and 3 only, whereas
Groups 1 and 2 represent animals from all three litters. Rearrangement increases
the variation within the groups; consequently, the animals may respond to a
treatment differently. This is evident from Tables 18.2 and 18.4. The variations
(CV%) of groups 1, 2 and 3 after randomization, but before rearrangement,
were 2.22, 5.07 and 3.24, respectively (Table 18.2). After rearranging
the animals a second time, the variations (CV%) of groups 1, 2 and 3 were
3.99, 7.39 and 5.56, respectively (Table 18.4). Such variations reduce the
power of the experiment (Beynen et al., 2001). In the first randomization
(Table 18.2), each group represented animals from all the litters, and the
variations (CV%) of the groups were smaller and somewhat close to each
other. Therefore, rearrangement of observations after randomization to
obtain desired mean values should be avoided as far as possible.
experiment you are confident that there is a treatment-related effect, but the
statistical analysis does not show it because of random variation. This is a
typical example of the β error that commonly occurs in animal experiments. A
large β error carries the risk of failing to detect a genuine difference. The
power of a study to detect a significant difference is explained by this risk:
Power = 1 − β
In simple language, the power is the probability of obtaining a statistically
significant result using a statistical test (Lenth, 2007). In other words, the
power of a test is the probability of correctly rejecting the null hypothesis
when it is false. A study with high power is unlikely to fail to detect a
genuine difference, whereas a study with low power may well fail to detect
it. The power of a test can be improved by increasing α, increasing the
sample size, or limiting the statistical analysis to the detection of large
differences among samples (Hayes, 1987).
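The dependence of power on sample size, significance level and size of the difference can be illustrated numerically. The sketch below uses a normal approximation for a one-sided comparison of two group means with equal group sizes; the function name, the approximation itself and the input values (a difference of 25 with a standard deviation of 15) are assumptions made for illustration only and do not come from the original text.

```python
from statistics import NormalDist


def approx_power(n, delta, sigma, alpha=0.05):
    """Approximate power (1 - beta) of a one-sided two-group comparison.

    n      animals per group
    delta  true difference between the group means
    sigma  common standard deviation
    alpha  one-sided significance level
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha)
    # Standard error of the difference between two means of n animals each.
    standard_error = sigma * (2 / n) ** 0.5
    return std_normal.cdf(delta / standard_error - z_alpha)


# Power rises with the number of animals per group.
for n in (4, 6, 8, 10):
    print(n, round(approx_power(n, delta=25, sigma=15), 3))

# Power also rises when alpha is relaxed or the difference sought is larger.
print(round(approx_power(6, delta=25, sigma=15, alpha=0.10), 3))
print(round(approx_power(6, delta=50, sigma=15), 3))
```

An exact calculation for small groups would use the non-central t-distribution rather than the normal approximation, so the printed values should be read as rough guides only.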
To design an experiment to investigate the effect of a hypoglycemic
NCE in diabetic rats, the blood sugar of each individual diabetic rat is
measured before and after the treatment with the NCE. Then the difference
in the blood sugar level of each rat is calculated. Another group of
animals, treated similarly but with a placebo, is also maintained. Let us
work out the number of animals required in each group to obtain the desired
result. For that, the specifications of the study need to be defined:
1. The significance level (probability of α error). Usually it is set at the 5%
probability level.
2. The probability of β error is set at 10%. The statistical power (1 − β) is
90%.
3. The desired treatment effect (difference between the NCE-treated group
and the placebo-treated group; this is determined based on clinical,
economic and similar factors).
4. An estimate of the expected variation (variation between individual
measurements with respect to the difference between before and after
treatment; this is estimated from earlier experiments of a similar
nature or from a pilot study).
5. The type of statistical analysis (since there are only two groups, the t-test
would be better).
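\[ n = \frac{2\,(Z_{\alpha/2} + Z_{\beta})^{2}}{\left[(\mu_1 - \mu_2)/\sigma\right]^{2}} \]
Number of animals in each group by one-sided test can be calculated using
the formula,
\[ n = \frac{2\,(Z_{\alpha} + Z_{\beta})^{2}}{\left[(\mu_1 - \mu_2)/\sigma\right]^{2}} \]
where \(\mu_1 - \mu_2\) is the desired treatment effect and \(\sigma\) is the expected variation.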
Let us work out an example: α = 0.05, power (1 − β) = 0.9, desired effect (μ1 − μ2) = 25%;
σ = 15% (CV).
Zα = 1.645 (vide Appendix 3 for Z 0.05)
Zβ = 1.282 (vide Appendix 3 for Z 0.10)
\[ n = \frac{2\,(1.645 + 1.282)^{2}}{(25/15)^{2}} = 6.2 \]
Number of animals required in each group = 7
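The worked example can be checked with a few lines of Python. This is a minimal sketch of the formula n = 2(Z + Zβ)² / [(μ1 − μ2)/σ]², in which Z is Zα for a one-sided test and Zα/2 for a two-sided test; the function name and its parameters are hypothetical, and the normal deviates are taken from the standard library rather than from Appendix 3.

```python
import math
from statistics import NormalDist


def animals_per_group(alpha, beta, effect, sd, two_sided=False):
    """Unrounded number of animals per group for comparing two means."""
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2) if two_sided else z(1 - alpha)
    z_beta = z(1 - beta)
    return 2 * (z_alpha + z_beta) ** 2 / (effect / sd) ** 2


# Example from the text: alpha = 0.05, beta = 0.10, effect = 25, SD = 15.
n_one_sided = animals_per_group(0.05, 0.10, effect=25, sd=15)
n_two_sided = animals_per_group(0.05, 0.10, effect=25, sd=15, two_sided=True)

# One-sided: about 6.2, rounded up to 7 animals per group, as in the text.
print(round(n_one_sided, 1), math.ceil(n_one_sided))
# Two-sided: a larger group size is needed for the same alpha and power.
print(round(n_two_sided, 1), math.ceil(n_two_sided))
```

The result is always rounded up, since a fraction of an animal cannot be allocated and rounding down would leave the study below the intended power.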
References
Aguilar-Nascimento, J.E. (2005): Fundamental steps in experimental design for animal
studies. Acta Cir. Bras., 20(1), 18.
Bebarta, V., Luyte, D. and Heard, K. (2003): Emergency medicine research: Does use of
randomization and blinding affect the results? Acad. Emerg. Med., 10, 684–687.
Beynen, A.C., Festing, M.F.W. and van Montfort, M.A.J. (2001): Design of Animal
Experiments. In: Principles of Laboratory Animal Science, Editors, van Zutphen,
L.F.M., Baumans, V. and Beynen, A.C. Elsevier Science B.V., The Netherlands.
Bloomsmith, M.A., Schapiro, S.J. and Strobert, E.A. (2006): Preparing chimpanzees for
laboratory research. ILAR J., 47, 316–325.
Capitanio, J.P., Kyes, R.C. and Fairbanks, L.A. (2006): Considerations in the selection and
conditioning of old world monkeys for laboratory research: Animals from domestic
sources. ILAR J., 47, 294–306.
Cox, D.R. (1958): Planning Experiments. John Wiley & Sons, New York, USA.
Dirnagl, U. and Macleod, M.R. (2009): Stroke research at a road block: the streets from
adversity should be paved with meta-analysis and good laboratory practice. Br. J.
Pharmacol., 157(7), 1154–1156.
EPA (2005): United States Environmental Protection Agency. Guidelines for Carcinogen
Risk Assessment. EPA/630/P-03/001B. http://www.epa.gov/iris/backgr-d.htm.
Festing, M.F.W. (1997): Teaching statistics can save animals, In: Animal Alternatives,
Welfare and Ethics. Edited by van Zutphen, L.F.M. and Balls, M. Elsevier Science
B.V., Amsterdam, The Netherlands.
Festing, M.F.W. (2003): Principles: the need for better experimental design. Trends
Pharmacol. Sci., 24, 341–345.
Festing, M.F.W. and Altman, D.G. (2002): Guidelines for the design and statistical analysis
for experiments using laboratory animals. ILAR J., 43, 244–258.
Fisher, R.A. (1935): The Design of Experiments. 8th Edition, 1966. Hafner Press, New
York, USA.
Hamada, C. and Ono, H. (2000): The role of biostatistics in pharmacological studies
(randomization and statistical evaluation). Nihon Yakurigaku Zasshi, 116(1), 4–11.
Hauschke, D. (1997): Statistical proof of safety in toxicological studies. Drug Inf. J., 31,
357–361.
Hayes, J.P. (1987): The positive approach to negative results in toxicology studies. Ecotox.
Environ. Safety, 14(1), 73–77.
Hess, K.R. (2011): Statistical design considerations in animal studies published recently in
cancer research. Cancer Res., 71, 625.
Kilkenny, C., Parsons, N., Kadyszewski, E., Festing, M.F.W., Cuthill, I.C., Fry, D.,
Hutton, J. and Altman, D.G. (2009): Survey of the quality of experimental design,
statistical analysis and reporting of research using animals. PLoS One, 4(11), 1–11.
Kozinetz, C.A. (2011): Application of epidemiologic principles for optimizing preclinical
research study design. Int. J. Preclin. Res., 2(1), 63–65.
Lenth, R.V. (2007): Statistical power calculations. J. Anim. Sci., 85 (E. Suppl.),
E24–E29.
Decision Trees
Several attempts have been made to standardize statistical methodologies
for the analysis of data obtained from toxicological and pharmacological
studies. One of the methodologies proposed by several authors is the
tree-type algorithm (Gad and Weil, 1986; Healey, 1997; Hamada et al.,
1998; Gad, 2006). Tree-type algorithms are also called decision trees,
which are graphical representations of the decisions involved in the choice of
How to Select An Appropriate Statistical Tool? 189
[Flow chart not reproduced. Recoverable node labels: group size (same or different); visual recognition of data (scatter diagram or box-plot); check for homogeneity (Bartlett's test), with branches P < 0.01 (heterogeneity) and P > 0.01 (homogeneity); log-transformation of the data, followed by homogeneity and heterogeneity branches at the 0.05 level; End.]
Figure 19.2. Tree-type algorithm for the analysis of quantitative data proposed by Hamada
et al. (1998)
[Flow chart not reproduced. Recoverable node labels: Bartlett's test (P = 0.05); Dunnett's multiple comparison test; Steel's test.]
Figure 19.3. The tree-type algorithm for the analysis of toxicological data proposed by
Kobayashi et al. (2000)
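A two-branch tree of this kind can be written as a short selection function. The Python sketch below is an illustration only: it assumes that the P value of Bartlett's test has already been obtained elsewhere, and it assumes the conventional reading in which homogeneous variances lead to the parametric Dunnett's test and heterogeneous variances lead to the non-parametric Steel's test. The function name and the returned labels are hypothetical.

```python
def choose_multiple_comparison_test(bartlett_p, alpha=0.05):
    """Pick a control-versus-treated comparison test from Bartlett's P value.

    Assumed reading of the tree: variances are treated as homogeneous when
    P >= alpha (parametric branch) and as heterogeneous when P < alpha
    (non-parametric branch).
    """
    if not 0.0 <= bartlett_p <= 1.0:
        raise ValueError("bartlett_p must lie between 0 and 1")
    if bartlett_p < alpha:
        return "Steel's test"
    return "Dunnett's multiple comparison test"


print(choose_multiple_comparison_test(0.30))
print(choose_multiple_comparison_test(0.01))
```

Neither branch of this sketch passes through ANOVA or the Kruskal-Wallis test before the multiple comparison test is chosen.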
[Flow chart not reproduced. Recoverable node labels: Williams' test (α = 0.025, 2-sided); Steel test (α = 0.025, 2-sided); End.]
Figure 19.4. The tree-type algorithm for the analysis of quantitative data obtained from
repeated dose administration studies proposed by Sakaki et al. (2000)
Figure 19.6. Decision tree produced by OECD for the analysis of data in long-term
toxicology studies (OECD, 2010)
Figure 19.7. Flow chart for selecting the statistical tool when the data show a normal or
non-normal distribution (Situation 1, Number of groups = 2)
Figure 19.8. Flow chart for selecting the statistical tool when the data show a normal or
non-normal distribution (Situation 2, Number of groups ≥ 3)
Table 19.1. Analysis of qualitative data of urinalyses and pathological findings (Kobayashi,
2010)
[Table body not reproduced; only the column heading "Incidence" is recoverable.]
Table 19.2. Parametric and non-parametric statistical tools for the analysis of data obtained
from toxicology studies
Group settings        Parametric test                          Non-parametric test
Only two groups       Student, Aspin-Welch, Cochran-Cox        Mann-Whitney U test,
                      t-tests                                  Wilcoxon test
Three or more groups  ANOVA                                    Kruskal-Wallis rank sum test
                      Dunnett's multiple comparison test,      Nonparametric type Dunnett's
                      General, multiple comparison test        rank sum test; Steel's test
                      Tukey's multiple range test              Nonparametric type Tukey's
                      (the size of the group is the same)      rank sum test
                      Tukey-Kramer's multiple range test       Steel-Dwass test
                      (the size of the group is different)
                      Duncan's multiple range test             Nonparametric type Duncan's
                                                               rank sum test
                      Scheffé's multiple comparison test       Nonparametric type Scheffé's
                                                               rank sum test
                      Williams's t-test (analyzes the          Shirley-Williams's test
                      difference of the mean values
                      between each treated group and
                      control, when the mean value of
                      the treated groups changes in one
                      direction)
                                                               Jonckheere's trend test
Tests recommended.
Table 19.3. Statistical tools suggested for the comparison of two and multi-groups
Group setting            Comparison                            Analysis
Only two groups          Only one time                         Aspin-Welch's t-test
Control (x0),            Analysis of the difference between    Dunnett's multiple comparison
Low dose (x1),           control group and each dose group     test;
Mid dose (x2),           (total number of comparisons made     Williams's t-test (assumption:
High dose (x3)           is three)                             data possess a dose-dependency)
Control, Drug A,         Analysis of difference between        Dunnett's multiple comparison
Drug B, Drug C           control group and each drug group     test
or                       (total number of comparisons made
Group A, Group B,        is three)
Group C, Group D         Comparison of all pairs (total        Tukey's multiple range test;
                         number of comparisons made is six)    Duncan's multiple range test
Control (x0),            Analysis of difference between        Dunnett's test or Williams's
Low dose (x1),           control group and reference drug,     t-test. Examine if there is a
Mid dose (x2),           followed by comparison of control     significant difference between
High dose (x3),          group with each dose group            x0 and R1 by t-test; if there is
Reference drug (R1)                                            a significance, then compare
                                                               the control with x1, x2 and x3,
                                                               excluding R1, using the tests of
                                                               Dunnett or Williams.
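The comparison counts quoted in Table 19.3 (three comparisons against the control, six comparisons over all pairs) follow directly from the number of groups. A short Python sketch, with group labels chosen for illustration:

```python
from itertools import combinations

groups = ["Control", "Drug A", "Drug B", "Drug C"]

# Control versus each treated group: k - 1 comparisons for k groups.
versus_control = [(groups[0], treated) for treated in groups[1:]]

# Every pair of groups: k(k - 1)/2 comparisons for k groups.
all_pairs = list(combinations(groups, 2))

print(len(versus_control), versus_control)
print(len(all_pairs), all_pairs)
```

With four groups the all-pairs design doubles the number of comparisons from three to six.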
References
Altman, D. and Bland, M. (1994): Regression towards the mean. British Med. J., 308,
1499.
Bailer, A.J. and Portier, C.J. (1988): Effects of treatment-induced mortality and
tumor-induced mortality on tests for carcinogenicity in small samples. Biometrics,
44, 417–431.
Cox, D.R. (1972): Regression models and life-tables. J.R. Stat. Soc., B34, 187–220.
Dago, K.T., Luthringer, R., Lengell, R., Rinaudo, G. and Macher, J.P. (1994): Statistical
Decision Tree: A Tool for Studying Pharmaco-EEG Effects of CNS-Active Drugs.
Neuropsychobiol., 29, 91–96.
Dunn, O.J. (1964): Multiple comparisons using rank sums. Technometrics, 6, 241–252.
Dunnett, C.W. (1955): A multiple comparison procedure for comparing several treatments
with a control. J. Am. Stat. Assoc., 50, 1096–1121.
Gad, S. (2006): Statistics and Experimental Design for Toxicologists and Pharmacologists.
4th Edition, Taylor and Francis, Boca Raton, FL, USA.
Gad, S. and Weil, C.W. (1986): Statistics and Experimental Design for Toxicologists. The
Telford Press Inc., New Jersey, USA.
Hamada, C., Yoshino, K., Matsumoto, K., Nomura, M. and Yoshimura, I. (1998): Tree-type
algorithm for statistical analysis in chronic toxicity studies. J. Toxicol. Sci., 23(3),
173–181.
Healey, G.F. (1997): How to achieve standardization of statistical methods in toxicology.
Drug Inf. J., 3132, 327–334.
Hess, K.R. (2011): Statistical design considerations in animal studies published recently in
cancer research. Cancer Res., 15, 71(2), 625.
Hollander, M. and Wolfe, D.A. (1973): Nonparametric Statistical Methods, John Wiley and
Sons, New York, USA.
Hothorn, L.A. (2002): Selected biostatistical aspects of the validation of in vitro toxicological
assays. ATLA, 30 (Supp. 2), 93–98.
Howell, D.C. (2008): Fundamental Statistics for the Behavioral Sciences. 6th Edition,
Thomson Wadsworth, Belmont, USA.
Jonckheere, A.R. (1954): A distribution-free k-sample test against ordered alternatives.
Biometrika, 41, 133–145.
Kaplan, E.L. and Meier, P. (1958): Nonparametric estimation from incomplete observations.
J. Am. Stat. Assoc., 53, 457–481.
Kilkenny, C., Parsons, N., Kadyszewski, E., Festing, M.F.W., Cuthill, I.C., Fry, D., Hutton,
J. and Altman, D.G. (2009): Survey of the quality of experimental design, statistical
analysis and reporting of research using animals. PLoS One, 4(11), 1–11.
Kim, B.S., Zhao, B., Kim, H.J. and Cho, M. (2000): The statistical analysis of the in vitro
chromosome aberration assay using Chinese hamster ovary cells. Mutation Res., 469,
243–252.
Kobayashi, K. (2010): Trend of statistics used for toxicity studies. Yakuji-niposha, Tokyo,
Japan.
Kobayashi, K., Kanamori, M., Ohori, K. and Takeuchi, H. (2000): A new decision tree
method for statistical analysis of quantitative data obtained in toxicity studies in
rodents. San Ei Shi, 42(4), 125–129.
Kobayashi, K., Pillai, K.S., Guhatakurta, S., Cherian, K.M. and Ohnishi, M. (2011):
Statistical tools for analysing the data obtained from repeated dose toxicity studies
with rodents: A comparison of the statistical tools used in Japan with that of used in
other countries. J. Environ. Biol., 32(1), 11–16.
Kobayashi, K., Pillai, K.S., Suzuki, M. and Wang, J. (2008): Do we need to examine the
quantitative data obtained from toxicity studies for both normality and homogeneity
of variance? J. Environ. Biol., 29(1), 47–52.
Kobayashi, K., Watanabe, K. and Inoue, H. (1995): Questioning the usefulness of the
non-parametric analysis of quantitative data by transformation into ranked data in toxicity
studies. J. Toxicol. Sci., 20(1), 47–53.
Kroes, R., Renwick, A.G., Cheeseman, M., Kleiner, J., Piersma, A., Schilter, B., Schlatter,
J., van Schothorst, F., Vos, J.G. and Wurtzen, G. (2004): Structure-based thresholds of
toxicological concern (TTC): Guidance for application to substances present at low
levels in the diet. Food Chem. Toxicol., 42(1), 65–83.
Morrison, D.F. (1976): Multivariate Statistical Methods, 2nd Edition, McGraw-Hill Book
Co., New York, USA.
Nomura, M. (1994): International comparison of statistical analysis methods for
toxicological study. Jap. Soc. Biopharm. Statistics, 40, 136.
NTP. National Toxicology Program, USA. http://ntp.niehs.nih.gov/go/10007
OECD (2010): Organisation for Economic Cooperation and Development. OECD Draft
Guidance Document No. 116 on the Design and Conduct of Chronic Toxicity and
Carcinogenicity Studies, Supporting TG 451, 452, 453. OECD, Paris, France.
Piegorsch, W.W. and Bailer, A.J. (1997): Statistics for Environmental Biology and
Toxicology, Section 6.3.2., Chapman and Hall, London.
Portier, C.J. and Bailer, A.J. (1989): Testing for increased carcinogenicity using a
survival-adjusted quantal response test. Fund. Appl. Toxicol., 12, 731–737.
Sakaki, H., Igarashi, T., Ikeda, Y., Mizoguchi, K., Omichi, T., Kadota, M., Kawada, T.,
Takizawa, O., Tsukamoto, Y., Terai, K., Tozuka, A., Hirata, J., Handa, H., Mizuma, Z.,
Murakami, M., Yamada, M. and Yokouchi, H. (2000): Statistical method appropriate
for general toxicological studies in rats. J. Toxicol. Sci., 71–81.
Shirley, E. (1977): A non-parametric equivalent of Williams' test for contrasting increasing
dose levels of a treatment. Biometrics, 33, 386–389.
Tarone, R.E. (1975): Tests for trend in life table analysis. Biometrika, 62, 679–682.
Williams, D.A. (1971): A test for differences between treatment means when several dose
levels are compared with a zero dose control. Biometrics, 27, 103–117.
Williams, D.A. (1972): The comparison of several dose levels with a zero dose control.
Biometrics, 28, 519–531.
Williams, D.A. (1986): A note on Shirley's nonparametric test for comparing several dose
levels with a zero-dose control. Biometrics, 42, 182–186.
Yamazaki, M., Noguchi, Y., Tanda, M. and Shintani, S. (1981): Statistical method appropriate
for general toxicological studies in rats. J. Takeda Res. Lab., 40(3), 163–187.
Appendices
Appendix 1. Coefficients for Shapiro-Wilk W Test (Conover, 1999)
i\n 2 3 4 5 6 7 8 9 10
1 0.7071 0.7071 0.6872 0.6646 0.6431 0.6233 0.6052 0.5888 0.5739
2 - 0.0000 0.1667 0.2413 0.2806 0.3031 0.3164 0.3244 0.3291
3 - - - 0.0000 0.0875 0.1401 0.1743 0.1976 0.2141
4 - - - - - 0.0000 0.0561 0.0947 0.1224
5 - - - - - - - 0.0000 0.0399
i\n 11 12 13 14 15 16 17 18 19 20
1 0.5601 0.5475 0.5359 0.5251 0.5150 0.5056 0.4968 0.4886 0.4808 0.4734
2 0.3315 0.3325 0.3325 0.3318 0.3306 0.3290 0.3273 0.3253 0.3232 0.3211
3 0.2260 0.2347 0.2412 0.2460 0.2495 0.2521 0.2540 0.2553 0.2561 0.2565
4 0.1429 0.1586 0.1707 0.1802 0.1878 0.1939 0.1988 0.2027 0.2059 0.2085
5 0.0695 0.0922 0.1099 0.1240 0.1353 0.1449 0.1524 0.1587 0.1641 0.1686
6 0.0000 0.0303 0.0539 0.0727 0.0880 0.1005 0.1109 0.1197 0.1271 0.1334
7 - - 0.0000 0.0240 0.0433 0.0593 0.0725 0.0837 0.0932 0.1013
8 - - - - 0.0000 0.0196 0.0359 0.0496 0.0612 0.0711
i\n 31 32 33 34 35 36 37 38 39 40
1 0.4220 0.4188 0.4156 0.4127 0.4096 0.4068 0.4040 0.4015 0.3989 0.3964
2 0.2921 0.2898 0.2876 0.2854 0.2834 0.2813 0.2794 0.2774 0.2755 0.2737
3 0.2475 0.2462 0.2451 0.2439 0.2427 0.2415 0.2403 0.2391 0.2380 0.2368
4 0.2145 0.2141 0.2137 0.2132 0.2127 0.2121 0.2116 0.2110 0.2104 0.2098
5 0.1874 0.1878 0.1880 0.1882 0.1883 0.1883 0.1883 0.1881 0.1880 0.1878
6 0.1641 0.1651 0.1660 0.1667 0.1673 0.1678 0.1683 0.1686 0.1689 0.1691
7 0.1433 0.1449 0.1463 0.1475 0.1487 0.1496 0.1505 0.1513 0.1520 0.1526
8 0.1243 0.1265 0.1284 0.1301 0.1317 0.1331 0.1344 0.1356 0.1366 0.1376
9 0.1066 0.1093 0.1118 0.1140 0.1160 0.1179 0.1196 0.1211 0.1225 0.1237
10 0.0899 0.0931 0.0961 0.0988 0.1013 0.1036 0.1056 0.1075 0.1092 0.1108
11 0.0739 0.0777 0.0812 0.0844 0.0873 0.0900 0.0924 0.0947 0.0967 0.0986
12 0.0585 0.0629 0.0669 0.0706 0.0739 0.0770 0.0798 0.0824 0.0848 0.0870
13 0.0435 0.0485 0.0530 0.0572 0.0610 0.0645 0.0677 0.0706 0.0733 0.0759
14 0.0289 0.0344 0.0395 0.0441 0.0484 0.0523 0.0559 0.0592 0.0622 0.0651
15 0.0144 0.0206 0.0262 0.0314 0.0361 0.0404 0.0444 0.0481 0.0515 0.0546
16 0.0000 0.0068 0.0131 0.0187 0.0239 0.0287 0.0331 0.0372 0.0409 0.0444
17 - - 0.0000 0.0062 0.0119 0.0172 0.0220 0.0264 0.0305 0.0343
18 - - - - 0.0000 0.0057 0.0110 0.0158 0.0203 0.0244
19 - - - - - - 0.0000 0.0053 0.0101 0.0146
20 - - - - - - - - 0.0000 0.0049
i\n 41 42 43 44 45 46 47 48 49 50
1 0.3940 0.3917 0.3894 0.3872 0.3850 0.3830 0.3808 0.3789 0.3770 0.3751
2 0.2719 0.2701 0.2684 0.2667 0.2651 0.2635 0.2620 0.2604 0.2589 0.2574
3 0.2357 0.2345 0.2334 0.2323 0.2313 0.2302 0.2291 0.2281 0.2271 0.2260
4 0.2091 0.2085 0.2078 0.2072 0.2065 0.2058 0.2052 0.2045 0.2038 0.2032
5 0.1876 0.1874 0.1871 0.1868 0.1865 0.1862 0.1859 0.1855 0.1851 0.1847
6 0.1693 0.1694 0.1695 0.1695 0.1695 0.1695 0.1695 0.1693 0.1692 0.1691
7 0.1531 0.1535 0.1539 0.1542 0.1545 0.1548 0.1550 0.1551 0.1553 0.1554
8 0.1384 0.1392 0.1398 0.1405 0.1410 0.1415 0.1420 0.1423 0.1427 0.1430
9 0.1249 0.1259 0.1269 0.1278 0.1286 0.1293 0.1300 0.1306 0.1312 0.1317
10 0.1123 0.1136 0.1149 0.1160 0.1170 0.1180 0.1189 0.1197 0.1205 0.1212
11 0.1004 0.1020 0.1035 0.1049 0.1062 0.1073 0.1085 0.1095 0.1105 0.1113
12 0.0891 0.0909 0.0927 0.0943 0.0959 0.0972 0.0986 0.0998 0.1010 0.1020
13 0.0782 0.0804 0.0824 0.0842 0.0860 0.0876 0.0892 0.0906 0.0919 0.0932
14 0.0677 0.0701 0.0724 0.0745 0.0765 0.0783 0.0801 0.0817 0.0832 0.0846
15 0.0575 0.0602 0.0628 0.0651 0.0673 0.0694 0.0713 0.0731 0.0748 0.0764
16 0.0476 0.0506 0.0534 0.0560 0.0584 0.0607 0.0628 0.0648 0.0667 0.0685
17 0.0379 0.0411 0.0442 0.0471 0.0497 0.0522 0.0546 0.0568 0.0588 0.0608
18 0.0283 0.0318 0.0352 0.0383 0.0412 0.0439 0.0465 0.0489 0.0511 0.0532
19 0.0188 0.0227 0.0263 0.0296 0.0328 0.0357 0.0385 0.0411 0.0436 0.0459
20 0.0094 0.0136 0.0175 0.0211 0.0245 0.0277 0.0307 0.0335 0.0361 0.0386
21 0.0000 0.0045 0.0087 0.0126 0.0163 0.0197 0.0229 0.0259 0.0288 0.0314
22 - - 0.0000 0.0042 0.0081 0.0118 0.0153 0.0185 0.0215 0.0244
23 - - - - 0.0000 0.0039 0.0076 0.0111 0.0143 0.0174
24 - - - - - - 0.0000 0.0037 0.0071 0.0104
25 - - - - - - - - 0.0000 0.0035
Conover, W.J. (1999): Practical Nonparametric Statistics, 3rd Edition, John Wiley & Sons, Inc. New York, USA.
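The coefficients tabulated above enter the Shapiro-Wilk statistic as W = b^2 / SS, where b is the sum of a_i times (x_(n+1-i) - x_(i)) over the ordered sample and SS is the sum of squared deviations from the mean. The following is a minimal sketch of that calculation, assuming the n = 11 column of the table. The function name shapiro_wilk_w and the sample values are illustrative assumptions, not taken from the handbook.

```python
from statistics import mean

# Coefficients a_1..a_5 for n = 11, copied from the n = 11 column above.
# a_6 = 0.0000 multiplies the median and is therefore omitted.
A_N11 = [0.5601, 0.3315, 0.2260, 0.1429, 0.0695]


def shapiro_wilk_w(values, coeffs):
    """Return W = b^2 / SS for one sample (hypothetical helper name)."""
    x = sorted(values)
    n = len(x)
    if len(coeffs) != n // 2:
        raise ValueError("expected n // 2 coefficients for this sample size")
    # b pairs the i-th largest with the i-th smallest observation.
    b = sum(a * (x[n - 1 - i] - x[i]) for i, a in enumerate(coeffs))
    m = mean(x)
    ss = sum((v - m) ** 2 for v in x)  # sum of squared deviations
    return b * b / ss


# Illustrative sample of 11 observations (assumed data, for demonstration only).
sample = [148, 154, 158, 160, 161, 162, 166, 170, 182, 195, 236]
w = shapiro_wilk_w(sample, A_N11)
print(f"W = {w:.4f}")
```

The computed W is then compared with the tabulated quantile for the same n; a W below the chosen quantile suggests departure from normality.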
Appendix 2. Quantiles of the Shapiro-Wilk Test Statistic (Tsubaki and Tsubaki, 2001)
Proportional Parts
Z 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09
0.0 0.5000 0.4960 0.4920 0.4880 0.4840 0.4801 0.4761 0.4721 0.4681 0.4641
0.1 0.4602 0.4562 0.4522 0.4483 0.4443 0.4404 0.4364 0.4325 0.4286 0.4247
0.2 0.4207 0.4168 0.4129 0.4090 0.4052 0.4013 0.3974 0.3936 0.3897 0.3859
0.3 0.3821 0.3783 0.3745 0.3707 0.3669 0.3632 0.3594 0.3557 0.3520 0.3483
0.4 0.3446 0.3409 0.3372 0.3336 0.3300 0.3264 0.3228 0.3192 0.3156 0.3121
0.5 0.3085 0.3050 0.3015 0.2981 0.2946 0.2912 0.2877 0.2843 0.2810 0.2776
0.6 0.2743 0.2709 0.2676 0.2643 0.2611 0.2578 0.2546 0.2514 0.2483 0.2451
0.7 0.2420 0.2389 0.2358 0.2327 0.2296 0.2266 0.2236 0.2206 0.2177 0.2148
0.8 0.2119 0.2090 0.2061 0.2033 0.2005 0.1977 0.1949 0.1922 0.1894 0.1867
0.9 0.1841 0.1814 0.1788 0.1762 0.1736 0.1711 0.1685 0.1660 0.1635 0.1611
1.0 0.1587 0.1562 0.1539 0.1515 0.1492 0.1469 0.1446 0.1423 0.1401 0.1379
1.1 0.1357 0.1335 0.1314 0.1292 0.1271 0.1251 0.1230 0.1210 0.1190 0.1170
1.2 0.1151 0.1131 0.1112 0.1093 0.1075 0.1056 0.1038 0.1020 0.1003 0.0985
1.3 0.0968 0.0951 0.0934 0.0918 0.0901 0.0885 0.0869 0.0853 0.0838 0.0823
1.4 0.0808 0.0793 0.0778 0.0764 0.0749 0.0735 0.0721 0.0708 0.0694 0.0681
1.5 0.0668 0.0655 0.0643 0.0630 0.0618 0.0606 0.0594 0.0582 0.0571 0.0559
1.6 0.0548 0.0537 0.0526 0.0516 0.0505 0.0495 0.0485 0.0475 0.0465 0.0455
1.7 0.0446 0.0436 0.0427 0.0418 0.0409 0.0401 0.0392 0.0384 0.0375 0.0367
1.8 0.0359 0.0351 0.0344 0.0336 0.0329 0.0322 0.0314 0.0307 0.0301 0.0294
1.9 0.0287 0.0281 0.0274 0.0268 0.0262 0.0256 0.0250 0.0244 0.0239 0.0233
2.0 0.0228 0.0222 0.0217 0.0212 0.0207 0.0202 0.0197 0.0192 0.0188 0.0183
2.1 0.0179 0.0174 0.0170 0.0166 0.0162 0.0158 0.0154 0.0150 0.0146 0.0143
2.2 0.0139 0.0136 0.0132 0.0129 0.0125 0.0122 0.0119 0.0116 0.0113 0.0110
2.3 0.0107 0.0104 0.0102 0.0099 0.0096 0.0094 0.0091 0.0089 0.0087 0.0084
2.4 0.0082 0.0080 0.0078 0.0075 0.0073 0.0071 0.0069 0.0068 0.0066 0.0064
2.5 0.0062 0.0060 0.0059 0.0057 0.0055 0.0054 0.0052 0.0051 0.0049 0.0048
2.6 0.0047 0.0045 0.0044 0.0043 0.0041 0.0040 0.0039 0.0038 0.0037 0.0036
2.7 0.0035 0.0034 0.0033 0.0032 0.0031 0.0030 0.0029 0.0028 0.0027 0.0026
2.8 0.0026 0.0025 0.0024 0.0023 0.0023 0.0022 0.0021 0.0021 0.0020 0.0019
2.9 0.0019 0.0018 0.0018 0.0017 0.0016 0.0016 0.0015 0.0015 0.0014 0.0014
3.0 0.0013 0.0013 0.0013 0.0012 0.0012 0.0011 0.0011 0.0011 0.0010 0.0010
3.1 0.0010 0.0009 0.0009 0.0009 0.0008 0.0008 0.0008 0.0008 0.0007 0.0007
3.2 0.0007 0.0007 0.0006 0.0006 0.0006 0.0006 0.0006 0.0005 0.0005 0.0005
3.3 0.0005 0.0005 0.0005 0.0004 0.0004 0.0004 0.0004 0.0004 0.0004 0.0004
3.4 0.0003 0.0003 0.0003 0.0003 0.0003 0.0003 0.0003 0.0003 0.0003 0.0002
3.5 0.0002 0.0002 0.0002 0.0002 0.0002 0.0002 0.0002 0.0002 0.0002 0.0002
3.6 0.0002 0.0001 0.0001 0.0001 0.0001 0.0001 0.0001 0.0001 0.0001 0.0001
3.7 0.0001 0.0001 0.0001 0.0001 0.0001 0.0001 0.0001 0.0001 0.0001 0.0001
3.8 0.0001 0.0001 0.0001 0.0001 0.0001 0.0001 0.0001 0.0001 0.0001 0.0001
3.9 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
Gad, S. and Weil, C.S. (1988). Statistics and Experimental Design for Toxicologists. Telford Press, New Jersey, USA.
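The entries in the Z table above are upper-tail areas of the standard normal distribution, P(Z > z), read at the row value plus the column increment. As a cross-check on individual entries, the same areas can be computed from the complementary error function. This is a minimal sketch using only the Python standard library; the function name upper_tail is an illustrative assumption.

```python
import math


def upper_tail(z):
    """Upper-tail area P(Z > z) of the standard normal distribution."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


# Reproduce a few table entries (row value + column increment).
for z in (0.00, 0.60, 1.56, 1.96, 2.03):
    print(f"z = {z:.2f}  area = {upper_tail(z):.4f}")
```

For a two-sided test, the tabulated area is doubled; for example, the entry at z = 1.96 (0.0250) corresponds to a two-sided probability of 0.05.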