Using Transformations: Key Words
Using Transformations
KEY WORDS antilog, arcsin, bacterial counts, Box-Cox transformation, cadmium, confidence interval,
geometric mean, transformations, linearization, logarithm, nonconstant variance, plankton counts,
power function, reciprocal, square root, variance stabilization.
There is usually no scientific reason why we should insist on analyzing data in their original scale of
measurement. Instead of doing our analysis on y it may be more appropriate to look at log(y), $\sqrt{y}$, 1/y,
or some other function of y. These re-expressions of y are called transformations. Properly used
transformations eliminate distortions and give each observation equal power to inform.
Making a transformation is not cheating. It is a common scientific practice for presenting and
interpreting data. A pH meter reads in logarithmic units, $\mathrm{pH} = -\log_{10}[\mathrm{H}^+]$, and not in hydrogen ion
concentration units. The instrument makes a data transformation that we accept as natural. Light absorbance
is measured on a logarithmic scale by a spectrophotometer and converted to a concentration with the
aid of a calibration curve. The calibration curve makes a transformation that is accepted without
hesitation. If we are dealing with bacterial counts, N, we think just as well in terms of log(N) as N itself.
There are three technical reasons for sometimes doing the calculations on a transformed scale: (1) to
make the spread equal in different data sets (to make the variances uniform); (2) to make the distribution
of the residuals normal; and (3) to make the effects of treatments additive¹ (Box et al., 1978). Equal
variance means having equal spread at the different settings of the independent variables or in the different
data sets that are compared. The requirement for a normal distribution applies to the measurement errors
and not to the entire sample of data. Transforming the data makes it possible to satisfy these requirements
when they are not satisfied by the original measurements.
¹ For example, if $y = x^a z^b$, a log transformation gives $\log y = a \log x + b \log z$. Now the effects of
factors x and z are additive. See Box et al. (1978) for an example of how this can be useful.
[Figure 7.1: MPN count versus time, plotted on an arithmetic scale (left) and on a logarithmic scale (right)]
FIGURE 7.1 An example of how a transformation can create constant variance. Constant variance at all levels is important
so each data point will carry equal weight in locating the position of the fitted curve.
[Figure 7.2: concentration data (horizontal axis 0 to 12) plotted on an arithmetic scale (left) and on a logarithmic scale (right)]
value has roughly equal weight in determining the position of the line. The log transformation is used to
achieve this equal weighting and not because it gives a straight line.
A word of warning is in order about using transformations to obtain linearity. A transformation can
turn a good situation into a bad one by distorting the variances and making them unequal (see Chapter 45).
Figure 7.2 shows a case where the constant variance of the original data is destroyed by an inappropriate
logarithmic transformation.
In the examples above it was easy to check the variances at the different levels of the independent variables
because the measurements had been replicated. If there is no replication, this check cannot be made. This
is only one reason why replication is always helpful and why it is recommended in experimental and
monitoring work.
Lacking replication, should one assume that the variances are originally equal or unequal? Sometimes
the nature of the measurement process gives a hint as to what might be the case. If dilutions or concentrations
are part of the measurement process, or if the final result is computed from the raw measurements, or
if the concentration levels are widely different, it is not unusual for the variances to be unequal and to
be larger at high levels of the independent variable. Biological counts frequently have nonconstant
variance. These are not justifications to make transformations indiscriminately. Do not avoid making
transformations, but use them wisely and with care.
Source: Box, G. E. P., W. G. Hunter, and J. S. Hunter (1978). Statistics for Experimenters:
An Introduction to Design, Data Analysis, and Model Building, New York, Wiley Interscience.
TABLE 7.2
Plankton Counts on 20 Replicate Water Samples from Five Stations in a Reservoir
Station 1 0 2 1 0 0 1 1 0 1 1 0 2 1 0 0 2 3 0 1 1
Station 2 3 1 1 1 4 0 1 4 3 3 5 3 2 2 1 1 2 2 2 0
Station 3 6 1 5 7 4 1 6 5 3 3 5 3 4 3 8 4 2 2 4 2
Station 4 7 2 6 9 5 2 7 6 4 3 5 3 6 4 8 5 2 3 4 1
Station 5 12 7 10 15 9 6 13 11 8 7 10 8 11 8 14 9 6 7 9 5
Source: Elliot, J. (1977). Some Methods for the Statistical Analysis of Samples of Benthic Invertebrates, 2nd ed.,
Ambleside, England, Freshwater Biological Association.
TABLE 7.3
Statistics Computed from the Data in Table 7.2
Station                                      1      2      3      4      5
Untransformed data, y          Mean        0.85   2.05   3.90   4.60   9.25
                               Variance    0.77   1.84   3.67   4.78   7.57
Transformed data,              Mean        1.10   1.54   2.05   2.20   3.09
x = √(y + c), c = 0.5          Variance    0.14   0.20   0.22   0.22   0.19
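The entries of Table 7.3 can be recomputed from the counts in Table 7.2. The sketch below is a minimal illustration using only Python's standard `statistics` module. It assumes sample variances with the n − 1 divisor and the offset c = 0.5 used in Example 7.1. It is not the authors' own calculation.

```python
import math
import statistics

# Plankton counts from Table 7.2: 20 replicate water samples at each of five stations.
counts = {
    1: [0, 2, 1, 0, 0, 1, 1, 0, 1, 1, 0, 2, 1, 0, 0, 2, 3, 0, 1, 1],
    2: [3, 1, 1, 1, 4, 0, 1, 4, 3, 3, 5, 3, 2, 2, 1, 1, 2, 2, 2, 0],
    3: [6, 1, 5, 7, 4, 1, 6, 5, 3, 3, 5, 3, 4, 3, 8, 4, 2, 2, 4, 2],
    4: [7, 2, 6, 9, 5, 2, 7, 6, 4, 3, 5, 3, 6, 4, 8, 5, 2, 3, 4, 1],
    5: [12, 7, 10, 15, 9, 6, 13, 11, 8, 7, 10, 8, 11, 8, 14, 9, 6, 7, 9, 5],
}
C = 0.5  # offset used in Example 7.1 because most counts are small

table = {}
for station, y in counts.items():
    x = [math.sqrt(v + C) for v in y]  # square root transformation of each count
    # statistics.variance uses the n - 1 divisor (sample variance)
    table[station] = (statistics.mean(y), statistics.variance(y),
                      statistics.mean(x), statistics.variance(x))

for station, (ybar, s2_y, xbar, s2_x) in table.items():
    print(f"Station {station}: ybar = {ybar:.2f}  s2_y = {s2_y:.2f}  "
          f"xbar = {xbar:.2f}  s2_x = {s2_x:.2f}")
```

On the original scale the variances range over roughly a factor of ten. After the transformation they fall within a narrow band.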
The effect of square root and logarithmic transformations is to make the larger values less important
relative to the small ones. For example, the square root converts the values (0, 1, 4) to (0, 1, 2). The 4,
which tends to dominate on the original scale, is made relatively less important by the transformation.
The log transformation is a stronger transformation than the square root transformation. “Stronger” means
that the range of the transformed variables is relatively smaller for a log transformation than for the square
root. When the sample contains some zero values, the log transformation is x = log(y + c), where c is a
constant. Usually the value of c is arbitrarily chosen to be 1 or 0.5. The larger the value of c, the less severe
the transformation. Similarly, for square root transformations, $\sqrt{y + c}$ is less severe than $\sqrt{y}$.
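The sketch below makes "stronger" and "less severe" concrete by comparing a few of these transformations over the 0 to 16 range of the plankton counts in Table 7.2. The `step_ratio` measure is a hypothetical illustration introduced here and is not a statistic from the text. It divides the transformed size of a one-count step at the bottom of the range by the transformed size of a one-count step at the top. A larger ratio means the large values are compressed more.

```python
import math

def step_ratio(transform, low=0, high=16):
    """Hypothetical severity measure: the transformed size of a one-count step at
    the bottom of the range divided by that of a one-count step at the top.
    A value of 1.0 means large values are not compressed at all."""
    low_step = transform(low + 1) - transform(low)
    high_step = transform(high) - transform(high - 1)
    return low_step / high_step

# The square root converts (0, 1, 4) to (0, 1, 2), so the 4 no longer dominates.
print([math.sqrt(y) for y in (0, 1, 4)])  # [0.0, 1.0, 2.0]

# The log needs an offset c > 0 when zero counts are present; the square root does not.
candidates = {
    "sqrt(y)": lambda y: math.sqrt(y),
    "sqrt(y + 0.5)": lambda y: math.sqrt(y + 0.5),
    "log10(y + 0.5)": lambda y: math.log10(y + 0.5),
    "log10(y + 1)": lambda y: math.log10(y + 1),
}
ratios = {name: step_ratio(f) for name, f in candidates.items()}
for name, ratio in ratios.items():
    print(f"{name:15s} step ratio = {ratio:6.2f}")
```

Both log ratios are larger than both square-root ratios, which reflects that the log is the stronger transformation. Within each family, the larger offset c gives the smaller ratio, which reflects a milder transformation.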
The arcsin transformation, $x = \arcsin(\sqrt{y})$, is used for decimal fractions and is most useful when the
sample includes values near 0.00 and 1.00. One application is in bioassays, where the data are fractions of
organisms showing an effect.
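Here is a minimal sketch of the arcsin transformation. It assumes the common angular form x = arcsin(√y) applied to a fraction y, with results in radians. The function name `arcsin_transform` is illustrative only.

```python
import math

def arcsin_transform(p):
    """Angular (arcsin square root) transformation of a fraction 0 <= p <= 1, in radians."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a decimal fraction between 0 and 1")
    return math.asin(math.sqrt(p))

# Fractions of organisms showing an effect, including values near 0.00 and 1.00.
fractions = [0.0, 0.02, 0.50, 0.98, 1.0]
transformed = [arcsin_transform(p) for p in fractions]
for p, x in zip(fractions, transformed):
    print(f"p = {p:4.2f}  ->  x = {x:.4f}")
```

A step from 0.00 to 0.02 becomes about 0.14 radians. A step of 0.02 near 0.50 becomes only about 0.02. The ends of the 0 to 1 scale are therefore stretched relative to the middle.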
Example 7.1
Twenty replicate samples from five stations were counted for plankton, with the results given in
Table 7.2. The computed averages and variances are in Table 7.3. The means and variances computed
from the original data show that the variance is not uniform; it is ten times larger at station 5 than at
station 1. Also, the variance increases as the average increases, and $s_y^2$ seems to be proportional to
$\bar{y}$. This indicates that a square root transformation may be suitable. Because most of the counts
are small, the transformation used was $x = \sqrt{y + 0.5}$. Figure 7.3 shows the distribution of the original
and the transformed data. The transformed distributions are more symmetrical and normal-like than the
originals. The variances computed from the transformed data are nearly uniform.
[Figure 7.3: frequency histograms of the plankton counts at each station, on the original scale
(Plankton Count, 0 to 16) and after the square root transformation (√(Count + 0.5), 0 to 4)]
TABLE 7.4
Eight Replicate Measurements on Bacteria at Three Sampling Stations
               y = Bacteria/100 mL           x = log10(Bacteria/100 mL)
Station        1        2         3            1        2        3
              27      225      1020        1.431    2.352    3.009
              11       99       136        1.041    1.996    2.134
              48       41       317        1.681    1.613    2.501
              36       60       161        1.556    1.778    2.207
             120      190       130        2.079    2.279    2.114
              85      240       601        1.929    2.380    2.779
              18       90       760        1.255    1.954    2.889
             130      112       240        2.114    2.049    2.380
Mean        59.4      132     420.6        1.636    2.050    2.502
Variance    2156     5771   111,886        0.151    0.076    0.124
Example 7.2
Table 7.4 shows eight replicate measurements of bacterial density that were made at three
locations to study the spatial pattern of contamination in an estuary. The data show that $s_y^2 > \bar{y}$
and that $s_y$ increases in proportion to $\bar{y}$. Table 7.1 suggests a logarithmic transformation. The
improvement due to the log transformation is shown in Table 7.4. Note that the transformation could be
done using either $\log_e$ or $\log_{10}$ because they differ only by a constant factor ($\log_e y = 2.303 \log_{10} y$).
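The sketch below recomputes the summary rows of Table 7.4 from the raw bacterial counts and checks that ln and log10 differ only by a constant factor. It is a minimal illustration using Python's `statistics` module. Because the code takes exact logarithms, results can differ from the table in the third decimal place. For example, log10(760) = 2.881, whereas Table 7.4 lists 2.889.

```python
import math
import statistics

# Bacteria/100 mL from Table 7.4: eight replicate measurements at each of three stations.
bacteria = {
    1: [27, 11, 48, 36, 120, 85, 18, 130],
    2: [225, 99, 41, 60, 190, 240, 90, 112],
    3: [1020, 136, 317, 161, 130, 601, 760, 240],
}

summary = {}
for station, y in bacteria.items():
    x = [math.log10(v) for v in y]  # log transformation, suggested because s grows with ybar
    summary[station] = (statistics.mean(y), statistics.variance(y),
                        statistics.mean(x), statistics.variance(x))
    ybar, s2_y, xbar, s2_x = summary[station]
    print(f"Station {station}: ybar = {ybar:7.1f}  s2_y = {s2_y:9.0f}  "
          f"xbar = {xbar:.3f}  s2_x = {s2_x:.3f}")

# ln(y) = 2.303 * log10(y): changing the base rescales every x by the same constant.
ln_ratio = math.log(760) / math.log10(760)
print(f"ln(y) / log10(y) = {ln_ratio:.4f}")
```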
© 2002 By CRC Press LLC
Confidence Intervals and Transformations
After summary statistics (means, standard deviations, etc.) have been calculated on the transformed
scale, it is often desirable to translate the results back to the original scale of measurement. This can
create some confusion. For example, if the average $\bar{x}$ has been estimated using x = log(y), the simple
back-transformation antilog($\bar{x}$) does not give an unbiased estimate of the mean of y. The antilogarithm
of $\bar{x}$ is the geometric mean of the original data; that is, antilog($\bar{x}$) = $\bar{y}_g$. When natural logarithms
are used (x = ln y) and x is approximately normally distributed, a better estimate of the arithmetic mean on
the original y scale is $\hat{\bar{y}} = \exp(\bar{x} + 0.5\,s_x^2)$ (Gilbert, 1987).
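The identity antilog($\bar{x}$) = $\bar{y}_g$ can be checked numerically. The sketch below uses the five observations from Example 7.3 and Python's `statistics` module. The function `statistics.geometric_mean` requires Python 3.8 or later. Either log base gives the same geometric mean, and that value is smaller than the arithmetic mean.

```python
import math
import statistics

y = [95, 20, 74, 195, 71]  # the sample used in Example 7.3

arithmetic_mean = statistics.mean(y)
# Antilog of the average log; the base cancels, so log10 and ln agree.
geo_from_log10 = 10 ** statistics.mean([math.log10(v) for v in y])
geo_from_ln = math.exp(statistics.mean([math.log(v) for v in y]))

print(f"arithmetic mean           = {arithmetic_mean:.2f}")
print(f"antilog(mean of log10 y)  = {geo_from_log10:.2f}")
print(f"exp(mean of ln y)         = {geo_from_ln:.2f}")
print(f"statistics.geometric_mean = {statistics.geometric_mean(y):.2f}")
```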
If the transformation produced a near-normal distribution, confidence intervals computed from the
standard deviations and standard errors of the transformed data will be symmetric about the mean on the
transformed scale, but they will be asymmetric on the original scale. The options are to:
Two examples illustrate the use of log-transformed data to construct confidence limits for the geometric
mean.
Example 7.3
A sample of n = 5 observations [95, 20, 74, 195, 71] gives $\bar{y} = 91$ and $s_y^2 = 4{,}140$. Clearly,
$s_y^2 > \bar{y}$ and a log transformation should be tried. The x = log10(y) values are 1.97772, 1.30103,
1.86923, 2.29003, and 1.85126. This gives $\bar{x} = 1.85786$ and $s_x^2 = 0.12784$. The value of t for
ν = n − 1 = 4 degrees of freedom and α/2 = 0.025 is 2.776. Therefore, the 95% confidence interval
for the mean on the log-transformed scale, $\eta_x$, is:

$$\bar{x} \pm t\sqrt{\frac{s_x^2}{n}} = 1.85786 \pm 2.776\sqrt{\frac{0.1278}{5}} = 1.85786 \pm 0.44381$$

and

$$1.41405 \le \eta_x \le 2.30167$$

Transforming $\eta_x$ back to the original scale gives an estimate of the geometric mean of the y’s:

$$\bar{y}_g = \mathrm{antilog}(1.85786) = 10^{1.85786} = 72.09$$

The asymmetric 95% confidence limits for the true value of the geometric mean, $\eta_g$, are obtained
by taking antilogarithms of the upper and lower confidence limits of $\eta_x$, which gives:

$$25.94 \le \eta_g \le 200.29$$

Note that the upper and lower confidence limits in the log metric are $\bar{x} + \beta$ and $\bar{x} - \beta$, where
$\beta = t_{\alpha/2}\sqrt{s_x^2/n}$. The upper confidence limit on the original scale is
antilog($\bar{x} + \beta$) = antilog($\bar{x}$) · antilog($\beta$), which becomes $\bar{y}_g \cdot \beta'$, where $\beta' = \mathrm{antilog}(\beta)$.
Likewise, the lower confidence limit is $\bar{y}_g / \beta'$. For this example, β = 0.44381, antilog(0.44381) = 2.7785,
and the 95% confidence limits for the geometric mean on the original scale are 72.09(2.7785) = 200.29
and 72.09/2.7785 = 25.94.
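The calculation in Example 7.3 can be reproduced with the short sketch below. The t value 2.776 is hard-coded from the text. If SciPy is available, `scipy.stats.t.ppf(0.975, 4)` returns the same value. The code carries $s_x^2$ at full precision instead of the rounded 0.1278, so the upper limit can differ from the hand calculation in the first or second decimal place.

```python
import math
import statistics

y = [95, 20, 74, 195, 71]
x = [math.log10(v) for v in y]   # work on the log10 scale
n = len(x)
xbar = statistics.mean(x)
s2_x = statistics.variance(x)    # sample variance, n - 1 divisor
t = 2.776                        # t for n - 1 = 4 degrees of freedom, alpha/2 = 0.025 (from the text)

beta = t * math.sqrt(s2_x / n)   # half-width of the interval for eta_x
geo_mean = 10 ** xbar            # antilog(xbar) estimates the geometric mean
beta_prime = 10 ** beta          # antilog(beta)
lower, upper = geo_mean / beta_prime, geo_mean * beta_prime

print(f"log scale:      {xbar - beta:.5f} <= eta_x <= {xbar + beta:.5f}")
print(f"geometric mean: {geo_mean:.2f}")
print(f"original scale: {lower:.2f} <= eta_g <= {upper:.2f}")
```

Dividing and multiplying by the same factor β′ is what makes the interval asymmetric on the original scale. The product of the two limits always equals $\bar{y}_g^2$.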
Example 7.4
and
The −1 is due to using x = log(y + 1). The similar inverse of the confidence limits gives:
is to use:
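$$\hat{\eta} = \exp\left(\bar{x} + \frac{s_x^2}{2}\right) \quad \text{and} \quad \hat{\sigma}^2 = \hat{\eta}^2\left[\exp\left(s_x^2\right) - 1\right]$$

where $\hat{\eta}$ and $\hat{\sigma}^2$ are the estimated mean and variance. $\bar{x}$ and $s_x^2$ are calculated in the usual way shown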
$$\mathrm{UCL}_{1-\alpha} = \exp\left(\bar{x} + 0.5\,s_x^2 + \frac{s_x H_{1-\alpha}}{\sqrt{n-1}}\right)$$

$$\mathrm{LCL}_{\alpha} = \exp\left(\bar{x} + 0.5\,s_x^2 + \frac{s_x H_{\alpha}}{\sqrt{n-1}}\right)$$

The quantities $H_{1-\alpha}$ and $H_{\alpha}$ depend on $s_x$, n, and the confidence level α. Land (1975) provides the
necessary tables; a subset of these may be found in Gilbert (1987).
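The point estimates $\hat{\eta}$ and $\hat{\sigma}^2$ given in this example can be computed as in the sketch below. It assumes natural logarithms and strictly positive values of y + c. Land's confidence limits are not computed, because they need the tabulated H values. The function name and the optional offset parameter `c` are illustrative assumptions. The offset mirrors the x = log(y + 1) case mentioned above: it is subtracted from the back-transformed mean, and the variance is unaffected by the shift.

```python
import math
import statistics

def lognormal_estimates(y, c=0.0):
    """Estimate the arithmetic mean and variance of y, assuming ln(y + c) is
    approximately normal. c is an optional offset (e.g. c = 1 for x = ln(y + 1))."""
    x = [math.log(v + c) for v in y]     # natural logs are required by these formulas
    xbar = statistics.mean(x)
    s2_x = statistics.variance(x)        # sample variance, n - 1 divisor
    eta_hat = math.exp(xbar + s2_x / 2)               # estimated mean of y + c
    sigma2_hat = eta_hat ** 2 * (math.exp(s2_x) - 1)  # estimated variance of y + c (same as of y)
    return eta_hat - c, sigma2_hat       # remove the offset from the mean only

# Illustration with the five observations of Example 7.3.
mean_hat, var_hat = lognormal_estimates([95, 20, 74, 195, 71])
print(f"eta_hat = {mean_hat:.1f}, sigma2_hat = {var_hat:.0f}")
```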
The Box-Cox power transformation (Box and Cox, 1964) is:

$$Y_i^{(\lambda)} = \frac{y_i^{\lambda} - 1}{\lambda\,\bar{y}_g^{\,\lambda - 1}}$$

where $\bar{y}_g$ is the geometric mean of the original data series, and λ expresses the power of the
transformation. The geometric mean is obtained by averaging ln(y) and taking the exponential (antilog) of the
result. The special case λ = 0 is the log transformation: $Y_i^{(0)} = \bar{y}_g \ln(y_i)$. λ = −1 is a reciprocal
transformation, λ = 1/2 is a square root transformation, and λ = 1 is no transformation. Example
applications of this transformation are given in Box et al. (1978).
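The following is a minimal sketch of the normalized Box-Cox formula, with the λ = 0 case handled separately. The names `geometric_mean` and `box_cox` and the four data values are hypothetical, chosen only for illustration. The function checks that the data are positive, because $y^{\lambda}$ and ln(y) are not defined for all nonpositive values.

```python
import math
import statistics

def geometric_mean(y):
    """Exponential of the average natural log (the y_g of the Box-Cox formula)."""
    return math.exp(statistics.mean([math.log(v) for v in y]))

def box_cox(y, lam):
    """Normalized Box-Cox transformation of strictly positive data y.

    lam != 0: Y = (y**lam - 1) / (lam * yg**(lam - 1))
    lam == 0: Y = yg * ln(y)
    """
    if any(v <= 0 for v in y):
        raise ValueError("the Box-Cox transformation requires positive data")
    yg = geometric_mean(y)
    if lam == 0:
        return [yg * math.log(v) for v in y]
    return [(v ** lam - 1) / (lam * yg ** (lam - 1)) for v in y]

y = [0.5, 1.0, 2.0, 8.0]  # hypothetical positive data for illustration
print(box_cox(y, 1))       # lambda = 1: y - 1, a shift only
print(box_cox(y, 0))       # lambda = 0: yg * ln(y)
print(box_cox(y, 1e-6))    # a tiny lambda is numerically close to the lambda = 0 case
```

The last line shows why λ = 0 is defined as $\bar{y}_g \ln(y)$: the general formula approaches that expression as λ approaches zero.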
Example 7.5
Table 7.5 lists 36 measurements on cadmium (Cd) in soil, and their logarithms. The Cd concentrations
range from 0.005 to 0.094 mg/kg. The limit of detection was 0.01 mg/kg. Values below this
were arbitrarily replaced with 0.005. Comparisons must be made with other sets of similar data
and some transformation is needed before this can be done. Experience with environmental data
suggests that a log transformation may be useful, but something better might be discovered if
we make the Box-Cox transformation for several values of λ and compare the variances of the
transformed data.
The variance of the log-transformed values in Table 7.5 is $\sigma^2_{\ln(y)} = 0.549$. This cannot be
compared directly with the variance from, for instance, a square root transformation unless the
calculations are normalized to keep them on the same relative scale. The denominator of the Box-Cox
transformation, $\lambda\,\bar{y}_g^{\,\lambda - 1}$, is a normalizing factor that makes the variances comparable across
different values of λ. The geometric mean of the untransformed data is $\bar{y}_g = \exp(-4.42723) = 0.01195$.
The denominator for λ = 0.5, for example, is $0.5(0.01195)^{0.5-1} = 4.5744$; the denominator for
λ = −0.5 is $-0.5(0.01195)^{-0.5-1} = -382.87$.
TABLE 7.5
Cadmium Concentrations in Soil
       Cadmium (mg/kg)                 ln[Cadmium]
0.023    0.005    0.005       −3.7723    −5.2983    −5.2983
0.020    0.005    0.032       −3.9120    −5.2983    −3.4420
0.010    0.005    0.031       −4.6052    −5.2983    −3.4738
0.020    0.013    0.005       −3.9120    −4.3428    −4.2687
0.020    0.005    0.014       −3.9120    −5.2983    −3.9120
0.020    0.094    0.020       −3.9120    −2.3645    −5.2983
0.010    0.011    0.005       −4.6052    −4.5099    −3.6119
0.010    0.005    0.027       −4.6052    −5.2983    −4.1997
0.010    0.005    0.015       −4.6052    −5.2983    −3.3814
0.010    0.028    0.034       −4.6052    −3.5756    −5.2983
0.010    0.010    0.005       −4.6052    −4.6052    −5.2983
0.005    0.018    0.013       −5.2983    −4.0174    −4.3428
Average = 0.0161              Average of ln[Cd] = −4.42723
Variance = 0.000255           Variance = 0.549
Geo. mean = 0.01195
Note: Concentrations in mg/kg.
[Figure 7.4: variance (×1000) of the Box-Cox transformed cadmium data versus λ, for λ from −1.0 to 1.0;
the log transformation (λ = 0) and no transformation (λ = 1) are labeled]
Table 7.6 gives some of the power-transformed data for λ = −1, −0.5, 0, 0.5, and 1. λ = 1 is
no transformation ($Y_i^{(1)} = y_i - 1$, a shift that leaves the variance unchanged). λ = 0 is the log
transformation, calculated from $Y_i^{(0)} = \bar{y}_g \ln(y_i)$, which is scaled so that its variance can be
compared directly with the variances of the other power-transformed values.
The two bottom rows of Table 7.6 give the mean and the variance of the power-transformed
values. The suitable transformations give small variances. Rather than pick the smallest value
from the table, make a plot (Figure 7.4) that shows how the variance changes with λ. The smooth
curve is drawn as a reminder that these variances are estimates and that small differences between
them should not be taken seriously. Do not seek an optimal value of λ that minimizes the variance.
Such a value is likely to be awkward, like λ = 0.23. The data do not justify such detail, especially
because the censored values (y < 0.01) were arbitrarily replaced with 0.005. (This inflates the variance
from whatever it would be if the censored values were known.) Values of λ = −0.5, λ = 0, or λ = 0.5
are almost equally effective transformations. Any of these will be better than no transformation
(λ = 1). The log transformation (λ = 0) is very satisfactory and is our choice as a matter of
convenience.
Figure 7.5 shows dot diagrams for the original data, the square roots (λ = 0.5), the logarithms
(λ = 0), and the reciprocal square roots (λ = −0.5). The log-transformed values are the most symmetric,
but they are not normal because of the 11 non-detects that were replaced with 0.005 (i.e., 1/2 the
MDL).
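The comparison of variances across λ in this example can be sketched as follows. The sketch assumes the normalized Box-Cox form given earlier and uses the 36 cadmium values of Table 7.5, read column by column, with non-detects already set to 0.005. It is an illustrative recalculation, not the authors' program. It also prints the normalizing denominators quoted above for λ = 0.5 and λ = −0.5.

```python
import math
import statistics

# Cadmium (mg/kg) from Table 7.5, read column by column; non-detects already set to 0.005.
cadmium = [
    0.023, 0.020, 0.010, 0.020, 0.020, 0.020, 0.010, 0.010, 0.010, 0.010, 0.010, 0.005,
    0.005, 0.005, 0.005, 0.013, 0.005, 0.094, 0.011, 0.005, 0.005, 0.028, 0.010, 0.018,
    0.005, 0.032, 0.031, 0.005, 0.014, 0.020, 0.005, 0.027, 0.015, 0.034, 0.005, 0.013,
]

def box_cox(y, lam):
    """Normalized Box-Cox transformation; lam == 0 gives yg * ln(y)."""
    yg = math.exp(statistics.mean([math.log(v) for v in y]))
    if lam == 0:
        return [yg * math.log(v) for v in y]
    return [(v ** lam - 1) / (lam * yg ** (lam - 1)) for v in y]

yg = math.exp(statistics.mean([math.log(v) for v in cadmium]))  # geometric mean
lambdas = [-1.0, -0.5, 0.0, 0.5, 1.0]
variances = {lam: statistics.variance(box_cox(cadmium, lam)) for lam in lambdas}
denominators = {lam: lam * yg ** (lam - 1) for lam in lambdas if lam != 0}

print(f"geometric mean = {yg:.5f}")
for lam in lambdas:
    print(f"lambda = {lam:5.1f}   variance x 1000 = {1000 * variances[lam]:.3f}")
print(f"denominator, lambda = 0.5:  {denominators[0.5]:.4f}")
print(f"denominator, lambda = -0.5: {denominators[-0.5]:.2f}")
```

As the text advises, small differences among these variances should not be taken seriously. The printout is a way to draw Figure 7.4, not a search for an "optimal" λ.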
[Figure 7.5: dot diagrams of the cadmium data as y (λ = 1, 0 to 0.10 mg/kg), √y (λ = 0.5, 0 to 0.4), ln(y) (λ = 0, −6 to −2), and 1/√y (λ = −0.5)]
FIGURE 7.5 Dot diagrams of the data and the square root, log, and reciprocal square root transformed values.
The eye-catching spike of 11 points consists of “non-detects” that were arbitrarily assigned values of 0.005 mg/kg.
Comments
Transformations are not tricks to reduce variation or to convert a complicated nonlinear model into a
simple linear form. There are often statistical justifications for making transformations and then analyzing
the transformed data. They may be needed to stabilize the variance or to make the distribution of the
errors normal. The most common and important use is to stabilize (make uniform) the variance.
It can be tempting to use a transformation to make a nonlinear function linear so that it can be fitted
using simple linear regression methods. Sometimes the transformation that gives a linear model will
coincidentally produce uniform variance. Beware, however, because linearization can also produce the
opposite effect of making constant variance become nonconstant (see Chapter 45).
When the analysis has been done on transformed data, the analyst must consider carefully whether
to report the final results on the transformed or the original scale of measurement. Confidence intervals
that are symmetrical on the transformed scale will not be symmetric when transformed back to the
original scale. Care must also be taken when converting the mean on the transformed scale back to the
original scale. A simple back-transformation typically does not give an unbiased estimate, as was
demonstrated in the case of the logarithmic transformation.
References
Box, G. E. P. and D. R. Cox (1964). “An Analysis of Transformations,” J. Royal Stat. Soc., Series B, 26, 211.
Box, G. E. P., W. G. Hunter, and J. S. Hunter (1978). Statistics for Experimenters: An Introduction to Design,
Data Analysis, and Model Building, New York, Wiley Interscience.
Elliot, J. (1977). Some Methods for the Statistical Analysis of Samples of Benthic Invertebrates, 2nd ed.,
Ambleside, England, Freshwater Biological Association.
Gilbert, R. O. (1987). Statistical Methods for Environmental Pollution Monitoring, New York, Van Nostrand
Reinhold.
Exercises
7.1 Plankton Counts. Transform the plankton data in Table 7.2 using a square root transformation
x = √y and a logarithmic transformation x = log(y) and compare the results with those
shown in Figure 7.3.
7.2 Lead in Soil. Examine the distribution of the 36 measurements of lead (mg/kg) in soil and
recommend a transformation that will make the data nearly symmetrical and normal.
7.6 32 5 4.2 14 18 2.3 52 10 3.3 38 3.4 4.3 0.10 5.7 0.10 0.10 4.4
0.42 0.10 16 2.0 1.2 0.10 3.2 0.43 1.4 5.9 0.23 0.10 0.10 0.23 0.29 5.3 2.0 1.0
7.3 Box-Cox Transformation. Use the Box-Cox power function to find a suitable value of λ to
transform the 48 lead measurements given below. Note: All < MDL values were replaced by
0.05.
7.6 32 5.0 4.2 14 18 2.3 52 10 3.3 38 3.4 4.3 0.05 0.05 0.10
0.10 0.05 0.05 0.05 0.0 0.05 1.2 0.10 0.10 0.10 0.10 0.10 0.23 4.4 0.42 0.10
16. 2.0 2.0 1.0 3.2 0.43 1.4 0.10 5.9 0.10 0.10 0.23 0.29 5.3 5.7 0.10
7.4 Are Transformations Necessary? Which of the following are correct reasons for transforming
data? (a) Facilitate interpretation in a natural way. (b) Promote symmetry in a data sample.
(c) Promote constant variance in several sets of data. (d) Promote a straight-line relationship
between two variables. (e) Simplify the structure so that a simple additive model can help us
understand the data.
7.5 Power Transformations. Which of the following statements about power transformations are
correct? (a) The order of the data in the sample is preserved. (b) Medians are transformed to
medians, and quartiles are transformed to quartiles. (c) They are continuous functions. (d)
Points very close together in the raw data will be close together in the transformed data, at
least relative to the scale being used. (e) They are smooth functions. (f) They are elementary
functions so the calculations of re-expression are quick and easy.