Standard Practice for Dealing With Outlying Observations
for the
Development of International Standards, Guides and Recommendations issued by the World Trade Organization Technical Barriers to Trade (TBT) Committee.
Copyright © ASTM International, 100 Barr Harbor Drive, PO Box C700, West Conshohocken, PA 19428-2959. United States
small probability is called the “significance level” or “percentage point” and can be thought of as the risk of erroneously rejecting a good observation. If a real shift or change in the value of an observation arises from nonrandom causes (human error, loss of calibration of instrument, change of measuring instrument, or even change of time of measurements, and so forth), then the observed value of the sample criterion used will exceed the “critical value” based on random-sampling theory. Tables of critical values are usually given for several different significance levels. In particular for this practice, significance levels 10, 5, and 1 % are used.

NOTE 1—In this practice, we will usually illustrate the use of the 5 % significance level. Proper choice of level in probability depends on the particular problem and just what may be involved, along with the risk that one is willing to take in rejecting a good observation, that is, if the null-hypothesis stating “all observations in the sample come from the same normal population” may be assumed correct.

$$ s = \sqrt{\frac{\sum_{i=1}^{n}(x_i-\bar{x})^{2}}{n-1}} = \sqrt{\frac{\sum_{i=1}^{n}x_i^{2}-n\cdot\bar{x}^{2}}{n-1}} = \sqrt{\frac{\sum_{i=1}^{n}x_i^{2}-\left(\sum_{i=1}^{n}x_i\right)^{2}/n}{n-1}} \qquad (3) $$

If x1 rather than xn is the doubtful value, the criterion is as follows:

$$ T_1 = (\bar{x}-x_1)/s \qquad (4) $$

The critical values for either case, for the 1, 5, and 10 % levels of significance, are given in Table 1.

7.1.1 The test criterion Tn can be equated to the Student’s t test statistic for equality of means between a population with one observation xn and another with the remaining observations x1, ... , xn – 1, and the critical value of Tn for significance
level α can be approximated using the α/n percentage point of Student’s t with n – 2 degrees of freedom. The approximation is exact for small enough values of α, depending on n, and otherwise a slight overestimate unless both α and n are large:

$$ T_n(\alpha) \le \frac{t_{\alpha/n,\,n-2}}{\sqrt{1+\dfrac{n\,t_{\alpha/n,\,n-2}^{2}-1}{(n-1)^{2}}}} \qquad (5) $$

7.1.2 To test outliers on the high side, use the statistic Tn = (xn – x̄)/s and take as critical value the 0.05 point of Table 1. To test outliers on the low side, use the statistic T1 = (x̄ – x1)/s and again take as a critical value the 0.05 point of Table 1. If we are interested in outliers occurring on either side, use the statistic Tn = (xn – x̄)/s or the statistic T1 = (x̄ – x1)/s, whichever is larger. If in this instance we use the 0.05 point of Table 1 as our critical value, the true significance level would be twice 0.05 or 0.10. Similar considerations apply to the other tests given below.

7.1.3 Example 1—As an illustration of the use of Tn and Table 1, consider the following ten observations on breaking strength (in pounds) of 0.104-in. hard-drawn copper wire: 568, 570, 570, 570, 572, 572, 572, 578, 584, 596. See Fig. 1. The doubtful observation is the high value, x10 = 596. Is the value of 596 significantly high? The mean is x̄ = 575.2 and the estimated standard deviation is s = 8.70. We compute:

$$ T_{10} = (596 - 575.2)/8.70 = 2.39 $$

For the measurements of breaking strength above:

$$ r_{11} = (596 - 584)/(596 - 570) = 0.462 $$

which is a little less than 0.478, the 5 % critical value for n = 10. Under the Dixon criterion, we should therefore not consider this observation as an outlier at the 5 % level of significance. These results illustrate how borderline cases may be accepted under one test but rejected under another.

7.3 Recursive Testing for Multiple Outliers in Univariate Samples—For testing multiple outliers in a sample, recursive application of a test for a single outlier may be used. In recursive testing, a test for an outlier, x1 or xn, is first conducted. If this is found to be significant, then the test is repeated, omitting the outlier found, to test the point on the opposite side of the sample, or an additional point on the same side. The performance of most tests for single outliers is affected by masking, where the probability of detecting an outlier using a test for a single outlier is reduced when there are two or more outliers. Therefore, the recommended procedure is to use a criterion designed to test for multiple outliers, using recursive testing to investigate after the initial criterion is significant.

3 The boldface numbers in parentheses refer to a list of references at the end of this standard.
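As a rough check of the arithmetic in Example 1, the following Python sketch recomputes s (Eq 3), the statistics Tn and T1 of 7.1.2, and the Dixon ratio r11 for the copper wire data. The function names are hypothetical and used only for this illustration. The general form r11 = (xn – xn–1)/(xn – x2) is an assumption inferred from the numbers quoted in the worked example (596, 584, 570). Critical values must still be read from Tables 1 and 2.

```python
import math

def sample_std(x):
    """Sample standard deviation s (Eq 3), with divisor n - 1."""
    n = len(x)
    mean = sum(x) / n
    return math.sqrt(sum((xi - mean) ** 2 for xi in x) / (n - 1))

def t_high(x):
    """T_n = (x_n - mean) / s for the largest observation."""
    mean = sum(x) / len(x)
    return (max(x) - mean) / sample_std(x)

def t_low(x):
    """T_1 = (mean - x_1) / s for the smallest observation (Eq 4)."""
    mean = sum(x) / len(x)
    return (mean - min(x)) / sample_std(x)

def dixon_r11_high(x):
    """r11 for a suspect high value: (x_n - x_{n-1}) / (x_n - x_2).
    Form assumed from the numbers in the worked example."""
    xs = sorted(x)
    return (xs[-1] - xs[-2]) / (xs[-1] - xs[1])

# Breaking strength (lb) of hard-drawn copper wire, Example 1
wire = [568, 570, 570, 570, 572, 572, 572, 578, 584, 596]

print(f"s   = {sample_std(wire):.2f}")      # 8.70
print(f"T10 = {t_high(wire):.2f}")          # 2.39
print(f"T1  = {t_low(wire):.2f}")
print(f"r11 = {dixon_r11_high(wire):.3f}")  # 0.462
```

For a two-sided screen as described in 7.1.2, one would take the larger of `t_high(wire)` and `t_low(wire)` and keep in mind that the true significance level is then doubled.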
10   3.574   3.685   3.875
11   3.684   3.803   4.011
12   3.782   3.909   4.133
13   3.871   4.005   4.244
14   3.952   4.092   4.344
15   4.025   4.171   4.435
16   4.093   4.244   4.519
17   4.156   4.311   4.597
18   4.214   4.374   4.669
19   4.269   4.433   4.736
20   4.320   4.487   4.799
21   4.368   4.539   4.858
22   4.413   4.587   4.913
23   4.456   4.633   4.965
24   4.497   4.676   5.015
25   4.535   4.717   5.061
26   4.572   4.756   5.106
27   4.607   4.793   5.148
28   4.641   4.829   5.188
29   4.673   4.863   5.226
30   4.704   4.895   5.263
35   4.841   5.040   5.426
40   4.957   5.162   5.561
45   5.057   5.265   5.674
50   5.144   5.356   5.773
A Each entry calculated by 50 000 000 simulations.

and:

$$ T_{14} = (1.01 - 0.119)/0.401 = 2.22 $$

From Table 1, for n = 14, we find that a value as large as 2.22 would occur by chance more than 5 % of the time, so we should retain the value 1.01 in further calculations. The Dixon test criterion is:

$$ r_{22} = (x_{14} - x_{12})/(x_{14} - x_{3}) = (1.01 - 0.48)/(1.01 + 0.24) = 0.53/1.25 = 0.424 $$

From Table 2 for n = 14, we see that the 5 % critical value for r22 is 0.546. Since our calculated value (0.424) is less than the critical value, we also retain 1.01 by Dixon’s test, and no further values would be tested in this sample.

7.5 Criteria for Two or More Outliers on Opposite Sides of the Sample—For suspected observations on both the high and low sides in the sample, and to deal with the situation in which some of k ≥ 2 suspected outliers are larger and some smaller than the remaining values in the sample, Tietjen and Moore (7) suggest the following statistic. Let the sample values be x1, x2, x3, ..., xn. Compute the sample mean, x̄, and the n absolute residuals:

$$ r_1 = |x_1 - \bar{x}|,\; r_2 = |x_2 - \bar{x}|,\; \ldots,\; r_n = |x_n - \bar{x}| \qquad (10) $$

Relabel the observations as z1, z2, ..., zn in order of increasing absolute residual, so that zn is the observation farthest from the mean. The statistic is then:

$$ E_k = \left[ \sum_{i=1}^{n-k} (z_i - \bar{z}_k)^{2} \Big/ \sum_{i=1}^{n} (z_i - \bar{z})^{2} \right] \qquad (11) $$

where:

$$ \bar{z}_k = \sum_{i=1}^{n-k} z_i / (n-k) \qquad (12) $$

is the mean of the (n − k) least extreme observations and z̄ is the mean of the full sample. Percentage points of Ek in Table 4 were computed by simulation.

7.5.1 Example 4—Applying this test to the Venus semidiameter residuals data in Example 3, we find that the total sum of squares of deviations for the entire sample is 4.24964. Omitting –1.40 and 1.01, the suspected two outliers, we find that the sum of squares of deviations for the reduced sample of 13 observations is 1.24089. Then E2 = 1.24089/4.24964 = 0.292, and by using Table 4, we find that this observed E2 is slightly smaller than the 5 % critical value of 0.317, so that the E2 test would reject both of the observations, –1.40 and 1.01.

7.6 Criterion for Two Outliers on the Same Side of the Sample—Where the two largest or the two smallest observations are probable outliers, employ a test provided by Grubbs (8, 9) which is based on the ratio of the sample sum of squares when the two doubtful values are omitted to the sample sum of squares when the two doubtful values are included. In illustrating the test procedure, we give the following Examples 5 and 6.

7.6.1 It should be noted that the critical values in Table 5 for the 1 % level of significance are smaller than those for the 5 % level. So for this particular test, the calculated value is significant if it is less than the chosen critical value.

7.6.2 Example 5—In a comparison of strength of various plastic materials, one characteristic studied was the percentage elongation at break. Before comparison of the average elongation of the several materials, it was desirable to isolate for further study any pieces of a given material which gave very small elongation at breakage compared with the rest of the pieces in the sample. Ten measurements of percentage elongation at break made on a material are: 3.73, 3.59, 3.94, 4.13, 3.04, 2.22, 3.23, 4.05, 4.11, and 2.02. See Fig. 3. Arranged in ascending order of magnitude, these measurements are: 2.02, 2.22, 3.04, 3.23, 3.59, 3.73, 3.94, 4.05, 4.11, 4.13.

7.6.2.1 The questionable readings are the two lowest, 2.02 and 2.22. We can test these two low readings simultaneously by using the $S_{1,2}^{2}/S^{2}$ criterion of Table 5. For the above measurements:

$$ S^{2} = \sum_{i=1}^{n} (x_i - \bar{x})^{2} = 5.351 $$

$$ S_{1,2}^{2} = \sum_{i=3}^{n} (x_i - \bar{x}_{1,2})^{2} = 1.196, \quad \text{where } \bar{x}_{1,2} = \sum_{i=3}^{n} x_i / (n-2) $$
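Both the criterion of 7.5 and the criterion of 7.6 are ratios of a reduced sum of squares to the full-sample sum of squares. They differ only in which observations are omitted. The Python sketch below illustrates this under stated assumptions: the function names are hypothetical, `same_side_ratio` drops the k lowest or k highest values (7.6), and `tietjen_moore_ek` drops the k values with the largest absolute residuals from the full-sample mean (Eq 11). It does not supply critical values, which must come from Tables 4 and 5.

```python
def sum_sq_dev(x):
    """Sum of squared deviations of x about its own mean."""
    mean = sum(x) / len(x)
    return sum((xi - mean) ** 2 for xi in x)

def same_side_ratio(x, k=2, side="low"):
    """Reduced SS / full SS after omitting the k lowest (or k highest)
    observations, as in 7.6. Small values point toward outliers."""
    xs = sorted(x)
    kept = xs[k:] if side == "low" else xs[:len(xs) - k]
    return sum_sq_dev(kept) / sum_sq_dev(xs)

def tietjen_moore_ek(x, k):
    """E_k of Eq 11: order the observations by absolute residual from
    the full-sample mean, then omit the k most extreme ones."""
    mean = sum(x) / len(x)
    z = sorted(x, key=lambda xi: abs(xi - mean))
    return sum_sq_dev(z[:len(z) - k]) / sum_sq_dev(z)

# Percentage elongation at break, Example 5
elong = [3.73, 3.59, 3.94, 4.13, 3.04, 2.22, 3.23, 4.05, 4.11, 2.02]

s2 = sum_sq_dev(elong)                 # full-sample sum of squares
s2_12 = sum_sq_dev(sorted(elong)[2:])  # two lowest readings omitted
print(f"S^2     = {s2:.3f}")
print(f"S^2_1,2 = {s2_12:.3f}")
print(f"ratio   = {same_side_ratio(elong, k=2, side='low'):.3f}")
```

For the Example 5 data the ratio comes out near 0.22. Per 7.6.1, it is judged significant when it is smaller than the Table 5 critical value. `tietjen_moore_ek` is called the same way, for instance `tietjen_moore_ek(data, 2)` for the two-outlier case of Example 4.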
2), limited historical information (Table 9), standard deviation known (Table 10). A cautionary note is that a historical variation estimate must still be relevant.

9.4 Much outlier practice is directed towards a more reliable estimate of a measure of the mean. If a goal of study is instead to make inferences about variability or to estimate a relatively low or high quantile of the distribution, then any action that is taken with the disposition of perceived outliers dramatically changes the resulting statistical estimates and interpretation.

9.5 All of the documented test methodologies are univariate. This practice does not address the issue of multivariate outlier testing or testing in time-ordered or structured data.

9.6 The outlier tests provided in this practice are generally most useful with moderate numbers of observations. Outlier tests that only use information about variability internal to the sample can only reject gross outlying values. With much larger numbers of observations, especially in data sets that have not been screened by a knowledgeable reviewer to remove invalid
REFERENCES
(1) Grubbs, F. E., and Beck, G., “Extension of Sample Sizes and Percentage Points for Significance Tests of Outlying Observations,” Technometrics, TCMTA, Vol 14, No. 4, November 1972, pp. 847–854.
(2) Dixon, W. J., “Processing Data for Outliers,” Biometrics, BIOMA, Vol 9, No. 1, March 1953, pp. 74–89.
(3) Bohrer, A., “One-sided and Two-sided Critical Values for Dixon’s Outlier Test for Sample Sizes up to n=30,” Economic Quality Control, Vol 23, No. 1, 2008, pp. 5–13.
(4) Verma, S. P., and Quiroz-Ruiz, A., “Critical Values for Six Dixon Tests for Outliers in Normal Samples up to Sizes 100, and Applications in Science and Engineering,” Revista Mexicana de Ciencias Geologicas, Vol 23, No. 2, 2006, pp. 133–161.
(5) David, H. A., Hartley, H. O., and Pearson, E. S., “The Distribution of the Ratio, in a Single Normal Sample, of Range to Standard Deviation,” Biometrika, BIOKA, Vol 41, 1954, pp. 482–493.
(6) Chauvenet, W., Method of Least Squares, Lippincott, Philadelphia, 1868.
(7) Tietjen, G. L., and Moore, R. H., “Some Grubbs-Type Statistics for the Detection of Several Outliers,” Technometrics, TCMTA, Vol 14, No. 3, August 1972, pp. 583–597. Corrigendum Technometrics, Vol 21, No. 3, August 1979, p. 396.
(8) Grubbs, F. E., “Sample Criteria for Testing Outlying Observations,” Annals of Mathematical Statistics, AASTA, Vol 21, March 1950, pp. 27–58.
(9) Grubbs, F. E., “Procedures for Detecting Outlying Observations in Samples,” Technometrics, TCMTA, Vol 11, No. 4, February 1969, pp. 1–21.
(10) Kudo, A., “On the Testing of Outlying Observations,” Sankhya, The Indian Journal of Statistics, SNKYA, Vol 17, Part 1, June 1956, pp. 67–76.
ASTM International takes no position respecting the validity of any patent rights asserted in connection with any item mentioned
in this standard. Users of this standard are expressly advised that determination of the validity of any such patent rights, and the risk
of infringement of such rights, are entirely their own responsibility.
This standard is subject to revision at any time by the responsible technical committee and must be reviewed every five years and
if not revised, either reapproved or withdrawn. Your comments are invited either for revision of this standard or for additional standards
and should be addressed to ASTM International Headquarters. Your comments will receive careful consideration at a meeting of the
responsible technical committee, which you may attend. If you feel that your comments have not received a fair hearing you should
make your views known to the ASTM Committee on Standards, at the address shown below.
This standard is copyrighted by ASTM International, 100 Barr Harbor Drive, PO Box C700, West Conshohocken, PA 19428-2959,
United States. Individual reprints (single or multiple copies) of this standard may be obtained by contacting ASTM at the above
address or at 610-832-9585 (phone), 610-832-9555 (fax), or [email protected] (e-mail); or through the ASTM website
(www.astm.org). Permission rights to photocopy the standard may also be secured from the Copyright Clearance Center, 222
Rosewood Drive, Danvers, MA 01923, Tel: (978) 646-2600; http://www.copyright.com/