
Using tolerance bounds in scientific investigations

1996

Los Alamos National Laboratory, an affirmative action/equal opportunity employer, is operated by the University of California for the U.S. Department of Energy under contract W-7405-ENG-36. By acceptance of this article, the publisher recognizes that the U.S. Government retains a nonexclusive, royalty-free license to publish or reproduce the published form of this contribution, or to allow others to do so, for U.S. Government purposes. The Los Alamos National Laboratory requests that the publisher identify this article as work performed under the auspices of the U.S. Department of Energy.

Author(s): Joanne R. Wendelberger, Los Alamos National Laboratory, MS-F600, Los Alamos, NM 87545
Submitted to: American Statistical Association Meetings, Chicago, IL, August 4-8, 1996

DRAFT

KEY WORDS: prediction, uncertainty, variation

ABSTRACT

Assessment of the variability in population values plays an important role in the analysis of scientific data. Analysis of scientific data often involves developing a bound on a proportion of a population. Sometimes simple probability bounds are obtained using formulas involving known mean and variance parameters and replacing the parameters by sample estimates. The resulting bounds are only approximate and fail to account for the variability in the estimated parameters. Tolerance bounds provide bounds on population proportions which account for the variation resulting from the estimated mean and variance parameters. A $\beta$-content, $\gamma$-confidence tolerance interval is constructed so that a proportion $\beta$ of the population lies within the interval with confidence $\gamma$. An application involving corrosion measurements is used to illustrate the use of tolerance bounds in different situations.
Extensions of standard tolerance intervals are applied to generate regression tolerance bounds, tolerance bounds for more general models of measurements collected over time, and tolerance intervals for varying precision data. Tolerance bounds also provide useful information for designing the collection of future data.

1. INTRODUCTION

Statistical intervals play an important role in the analysis and interpretation of scientific data. Quantities calculated from experimental data require some type of information about uncertainty to be meaningful. Statistical intervals provide a tool for expressing uncertainty in estimated quantities. Meeker and Hahn (1991) discuss several different types of intervals.

One of the most familiar intervals is the statistical confidence interval. Confidence intervals provide a measure of the variability in an estimated quantity. Typically, a confidence interval is computed from a random sample to bracket a particular population attribute, such as the mean. Another commonly used interval is the prediction interval. Prediction intervals provide bounds within which a future observation, or a quantity estimated from a future sample, may be expected to lie with a specified degree of confidence. One might consider constructing prediction intervals to contain all future observations. Unfortunately, such intervals are usually so wide that they are essentially useless. To provide a useful measure of uncertainty in estimated values, an alternative approach may be called for which attempts to bound only some proportion of the future population.

Statistical tolerance intervals provide a method for determining an interval such that a specified proportion $\beta$ of the population values lie within the interval with confidence $\gamma$. One-sided tolerance intervals are used to produce upper and lower tolerance limits.
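To make the $\beta$-content, $\gamma$-confidence definition concrete, the sketch below computes the one-sided normal tolerance factor $k$ from the noncentral $t$ distribution (one standard way to reproduce tabled factors under a normal model) and then checks by simulation how often an upper limit $\bar{x} + ks$ really lies above a proportion $\beta$ of a standard normal population. The same check is run for the naive plug-in limit $\bar{x} + z_\beta s$ mentioned in the abstract. Python with NumPy and SciPy is our choice, not something used in the original work; the function names `k_factor` and `confidence_achieved` and the settings $n = 10$, $\beta = \gamma = 0.95$ are illustrative assumptions.

```python
import numpy as np
from scipy import stats


def k_factor(n, beta, gamma):
    """One-sided normal tolerance factor k (assumes a normal population).

    xbar + k*s exceeds at least a proportion beta of the population with
    confidence gamma: sqrt(n)*k is the gamma quantile of a noncentral t
    distribution with n - 1 df and noncentrality z_beta*sqrt(n).
    """
    delta = stats.norm.ppf(beta) * np.sqrt(n)
    return stats.nct.ppf(gamma, df=n - 1, nc=delta) / np.sqrt(n)


def confidence_achieved(factor, n, beta, reps=20000, seed=1):
    """Monte Carlo estimate of Pr{content of xbar + factor*s >= beta} for N(0, 1) data."""
    rng = np.random.default_rng(seed)
    samples = rng.standard_normal((reps, n))
    limits = samples.mean(axis=1) + factor * samples.std(axis=1, ddof=1)
    content = stats.norm.cdf(limits)  # true proportion of the population below each limit
    return np.mean(content >= beta)


n, beta, gamma = 10, 0.95, 0.95
k = k_factor(n, beta, gamma)
z = stats.norm.ppf(beta)  # naive plug-in factor that ignores estimation error
print(f"tolerance factor k = {k:.3f}, plug-in factor z = {z:.3f}")
print("confidence achieved with k:", confidence_achieved(k, n, beta))
print("confidence achieved with z:", confidence_achieved(z, n, beta))
```

With these settings $k$ should agree with the commonly tabled value of about 2.911 for $n = 10$, and the simulated confidence for $k$ should be close to 0.95, while the plug-in limit reaches the required content far less often.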
An upper tolerance limit is a bound below which a specified proportion $\beta$ of the population values lie, with confidence $\gamma$.

Tolerance intervals were introduced by Wald and Wolfowitz (1946). Standard tolerance intervals assume that the population from which the sample values have been drawn has constant mean and constant variance. Analysis of variance and regression models have been used to extend standard tolerance interval methodology to situations where the mean may be modeled by a linear function of independent variables. Wallis (1951) extended tolerance intervals to the linear regression situation. Tolerance limits for balanced random-effects ANOVA models have been discussed by Lemon (1977), Mee and Owen (1983), and Beckman and Tietjen (1989). Error structures more complicated than a simple additive error with constant variance have also received attention in recent years. Kim and Myers (1992) considered tolerance limits for response surface models in situations where there is environmental variation in experimental variables. Hwang (1992) discussed tolerance intervals for a special case of measurement error models. Classical and Bayesian tolerance regions are discussed in Guttman (1970). Books by Odeh and Owen (1980) and Meeker and Hahn (1991) provide extensive treatment of tolerance intervals, including tables of factors which may be used to calculate tolerance intervals for different situations.

This paper examines several different types of tolerance limits. The methods described here have been used to examine corrosion measurements and other features of metal components. Measurements were made on metal components to quantify the amount of corrosion present at different times.
Because the actual data from this experiment are not available for public release, data with similar statistical properties have been generated to illustrate the analysis process. [Analysis of generated data will be added to final version.]

2. STANDARD TOLERANCE INTERVALS

As a first step in examining the corrosion, the measurements from different times were considered separately. For each time point, upper tolerance limits were calculated to bound specified proportions of the values at a given level of confidence.

Standard tolerance limits are computed using the sample mean $\bar{x}$, standard deviation $s$, and number of observations $n$. For example, for a one-sided upper tolerance limit, a factor $k$ associated with a specified proportion $\beta$ and confidence $\gamma$ for sample size $n$ is obtained such that

$$\Pr\{\Pr(X \le \bar{X} + kS) \ge \beta\} = \gamma,$$

where the inner probability is taken over a future value $X$ from the population and the outer probability over the random sample mean $\bar{X}$ and standard deviation $S$. The resulting upper tolerance limit is $\bar{x} + ks$. Lower tolerance limits, $\bar{x} - ks$, are obtained in an analogous manner. Values of $k$ for varying $\beta$ and $\gamma$ are tabulated by Odeh and Owen (1980).

[Table of means, standard deviations, and upper tolerance limits for generated data goes here.]

3. REGRESSION TOLERANCE INTERVALS

In another experiment, pressure values were measured over time. [Include figure showing a typical plot of pressure versus time.] [Include table of data.] Over time, pressure increases approximately linearly. In this example, the data points correspond to measurements on a single unit over time. This time-dependent structure provides information about the curve as a whole, which may be used to compute tolerance bounds on the line rather than individual tolerance intervals at specific points. Wallis (1951) describes tolerance intervals for linear regression. [Give formulas for computing tolerance bounds.] [Show tolerance bounds.] [Discuss special considerations for how to compute.]

4.
TOLERANCE INTERVALS FOR MORE GENERAL MODELS

The pressure data in the previous section followed a linear regression model. Data collected on different units over time may be thought of as following growth curve models. Within a family of curves, parameters are estimated for each individual unit. These estimates may then be used to provide predicted values at later times. An error estimate can be computed from a statistical model fit to the data for a given unit. [Discuss example using growth curve fitting technique here.]

5. TOLERANCE INTERVALS FOR VARYING PRECISION DATA

In some cases, additional information about the error structure for the model is available. Wang and Iyer (1994) describe a method for computing tolerance intervals in the presence of measurement error. For the corrosion data, covariate data were available which provided additional information about how the precision of the corrosion data varied over time. This information may be incorporated into the tolerance interval calculations to develop modified predictions which make better use of the available information. The approach used here, suggested by Weisberg (1992), is to obtain mean and variance estimates from a linear model incorporating varying precision. These estimates are then used to calculate tolerance intervals for the varying precision model.

A varying precision model allows the incorporation of information about varying precision of the data values. Let $X_{ij}$ denote the variable of interest for the $j$th unit of the $i$th group, $i = 1, \ldots, k$, $j = 1, \ldots, n_i$. The total number of observations is $n = \sum_{i=1}^{k} n_i$. The $X_{ij}$ are assumed to be normally distributed with mean $\mu_i$ and variance $\sigma^2$. The observed data are

$$y_{ij} = X_{ij} + e_{ij},$$

where $e_{ij}$, independent of $X_{ij}$, is an additive random error term which is normally distributed with mean 0 and variance $\sigma^2 r g_{ij}$. The $g_{ij}$ are known functions of $t_{ij}$, $g_{ij} = g(t_{ij})$, where $t_{ij}$ denotes a variable which affects the precision of the measured values. Then the variance of $y_{ij}$ is $\sigma^2(1 + r g_{ij})$.
The variance parameter $\sigma^2$ is a positive unknown constant common to all of the groups, $i = 1, \ldots, k$. The constant $r$ is a scaling parameter; $r = 0$ corresponds to constant precision.

5.1 TESTING FOR VARYING PRECISION

Before a varying precision model is implemented, a test is performed to check whether changing variation is actually present. The score test for heteroskedasticity developed by Cook and Weisberg (1983) may be used for this purpose. The score statistic is used to test the null hypothesis of homogeneous variance against the alternative hypothesis that the variance is heteroskedastic with a specified form.

Let $\bar{y}_i = \sum_{j=1}^{n_i} y_{ij}/n_i$ denote the mean of the observations in the $i$th group. Using the results of Cook and Weisberg (1983), the regression sum of squares obtained from the regression of the scaled squared residuals $u_{ij} = (y_{ij} - \bar{y}_i)^2/\hat{\sigma}^2$, where $\hat{\sigma}^2 = \sum_{i}\sum_{j}(y_{ij} - \bar{y}_i)^2/n$, on the variance function values $g_{ij}$ is used to test for the presence of changing variance for a specified variance function. Under the null hypothesis, the statistic equal to one-half of this regression sum of squares is approximately chi-squared with one degree of freedom. A chi-squared table is then used to determine the probability that a statistic as large as the observed value would occur if $r$ is actually zero, that is, if changing precision is not present.

5.2 ESTIMATION

One method of estimating the parameters of the varying precision model is maximum likelihood estimation. Let $w_{ij} = (1 + r g_{ij})^{-1}$, so that $\mathrm{Var}(y_{ij}) = \sigma^2/w_{ij}$. Under the normality assumptions above, the log likelihood function for the varying precision model is, up to an additive constant,

$$\log L(\mu_1, \ldots, \mu_k, \sigma^2, r) = -\frac{n}{2}\log\sigma^2 + \frac{1}{2}\sum_{i=1}^{k}\sum_{j=1}^{n_i}\log w_{ij} - \frac{1}{2\sigma^2}\sum_{i=1}^{k}\sum_{j=1}^{n_i} w_{ij}\,(y_{ij} - \mu_i)^2.$$

Parameter estimates of the group means $\mu_i$, $i = 1, \ldots, k$, the error variance $\sigma^2$, and the error ratio $r$ may be obtained by determining the values of these parameters which maximize the likelihood function. For a given value of $r$, this likelihood may be maximized using weighted least squares regression, weighting each observation $y_{ij}$ by $w_{ij}$.
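The score test described in Section 5.1 can be sketched in a few lines. This is a minimal Python sketch, assuming NumPy and SciPy, integer group labels $0, \ldots, k-1$, and residuals taken from the group means $\bar{y}_i$. The function name `score_test`, the simulated ages, the choice $g(t) = \exp(-0.2t)$ and the value $r = 30$ are hypothetical and are not the corrosion data.

```python
import numpy as np
from scipy import stats


def score_test(y, group, g):
    """Cook-Weisberg score test of constant variance against
    Var(y_ij) = sigma^2 * (1 + r * g_ij), i.e. H0: r = 0.

    y, g  : 1-D arrays of observations and variance-function values
    group : 1-D integer array of group labels 0, ..., k-1
    """
    y, g = np.asarray(y, dtype=float), np.asarray(g, dtype=float)
    group = np.asarray(group)
    means = np.bincount(group, weights=y) / np.bincount(group)  # group means ybar_i
    resid = y - means[group]
    sigma2 = np.mean(resid**2)          # ML variance estimate under H0
    u = resid**2 / sigma2               # scaled squared residuals
    X = np.column_stack([np.ones_like(g), g])
    coef, *_ = np.linalg.lstsq(X, u, rcond=None)
    ss_reg = np.sum((X @ coef - u.mean()) ** 2)   # regression sum of squares
    stat = ss_reg / 2.0                 # approximately chi-squared(1) under H0
    return stat, stats.chi2.sf(stat, df=1)


# Hypothetical data: 3 groups of 50, variance decreasing with age t
rng = np.random.default_rng(0)
group = np.repeat(np.arange(3), 50)
t = rng.uniform(1, 20, size=group.size)
g = np.exp(-0.2 * t)
mu = np.array([1.0, 2.0, 3.0])[group]
y_const = mu + rng.normal(size=group.size)                        # r = 0
y_vary = mu + rng.normal(size=group.size) * np.sqrt(1 + 30 * g)   # r = 30

print("constant variance:", score_test(y_const, group, g))
print("varying precision:", score_test(y_vary, group, g))
```

The statistic is unchanged by rescaling or shifting $y$, and by replacing $g$ with any linear function $a + bg$ ($b \ne 0$), which is one reason the exact scale of the variance function does not matter for the test.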
The weighted least squares estimates obtained for a fixed value of $r$ are

$$\hat{\mu}_i(r) = \frac{\sum_{j=1}^{n_i} w_{ij}\, y_{ij}}{\sum_{j=1}^{n_i} w_{ij}}$$

and

$$\hat{\sigma}^2(r) = \frac{1}{n}\sum_{i=1}^{k}\sum_{j=1}^{n_i} w_{ij}\,\bigl(y_{ij} - \hat{\mu}_i(r)\bigr)^2.$$

The maximized value of the log likelihood for fixed $r$ is

$$L_{\max}(r) = -\frac{n}{2}\log\hat{\sigma}^2(r) + \frac{1}{2}\sum_{i=1}^{k}\sum_{j=1}^{n_i}\log w_{ij}(r) - \frac{n}{2}.$$

The global maximum likelihood estimates are found by performing a one-dimensional maximization over $r$. For a specific variance function, the value of $r$ which yields the largest value of $L_{\max}(r)$ is selected, and the corresponding parameter estimates are the maximum likelihood estimates. Asymptotic standard errors of the estimated group means may be obtained from the square roots of the diagonal elements of the inverse of the information matrix.

5.3 TOLERANCE INTERVALS

Modified tolerance limits are computed using the results from the varying precision model described above. A prediction of the future value of corrosion for group $i$ is given by $\hat{\mu}_i$. The estimated standard error of a future observation in group $i$, which is required for the tolerance limit calculations, is given by

$$\sqrt{\hat{\sigma}^2 + \hat{\sigma}^2_{\hat{\mu}_i}},$$

where $\hat{\sigma}^2_{\hat{\mu}_i}$ is the estimated variance of the estimated group mean $\hat{\mu}_i$. Note that for future values, the measurement error is not of interest. The standard error used for the tolerance limit calculations reflects variation in the true values: it depends upon the variance of the estimated group mean and upon $\hat{\sigma}^2$, the estimated variance of the true values about the group mean, but not upon the measurement error variance $\sigma^2 r g_{ij}$.

Tolerance limits are computed using the tabled values provided by Odeh and Owen (1980). The estimated means and estimated standard errors of future observations for each group are used in place of the sample means and standard deviations of the individual groups. When $\hat{\sigma}^2$ is large relative to the variance of the estimated group mean, the squared estimated standard error will be approximately distributed as a scaled chi-squared variable, and tabled values may be used to construct the tolerance limits.
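The estimation steps of Section 5.2 and the tolerance limits of Section 5.3 can be sketched together as follows. This is a minimal sketch under stated assumptions, using NumPy and SciPy: the one-dimensional maximization over $r$ is a simple grid search; $\mathrm{Var}(\hat{\mu}_i)$ is approximated by $\hat{\sigma}^2/\sum_j w_{ij}$, treating $r$ as known rather than inverting the full information matrix; and the tabled factor is replaced by a noncentral-$t$ factor that uses $n_i$ for the group mean and $\nu = n - k - 1$ degrees of freedom for the standard error, following the degrees-of-freedom approximation given in this section. The function names and simulated data are hypothetical.

```python
import numpy as np
from scipy import stats


def fit_fixed_r(y, group, g, r):
    """Weighted least squares fit for fixed r; returns mu_hat, sigma2_hat,
    the sum of weights per group, and the maximized log likelihood L_max(r)."""
    n = y.size
    w = 1.0 / (1.0 + r * g)
    sum_w = np.bincount(group, weights=w)
    mu = np.bincount(group, weights=w * y) / sum_w
    sigma2 = np.sum(w * (y - mu[group]) ** 2) / n
    loglik = -0.5 * n * np.log(sigma2) + 0.5 * np.sum(np.log(w)) - 0.5 * n
    return mu, sigma2, sum_w, loglik


def upper_tolerance_limits(y, group, g, r_grid, beta=0.95, gamma=0.95):
    """Grid-search ML fit over r, then mu_hat_i + k_i * SE of a future true value."""
    fits = [fit_fixed_r(y, group, g, r) for r in r_grid]
    best = int(np.argmax([fit[3] for fit in fits]))
    mu, sigma2, sum_w, _ = fits[best]
    se_future = np.sqrt(sigma2 + sigma2 / sum_w)  # approx. sqrt(sigma^2 + Var(mu_hat_i))
    n_i = np.bincount(group)
    nu = y.size - n_i.size - 1                    # n - k - 1 degrees of freedom
    delta = stats.norm.ppf(beta) * np.sqrt(n_i)
    k = stats.nct.ppf(gamma, df=nu, nc=delta) / np.sqrt(n_i)
    return r_grid[best], mu, se_future, mu + k * se_future


# Hypothetical data: true values with sigma = 1 plus measurement error, r = 30
rng = np.random.default_rng(0)
group = np.repeat(np.arange(3), 50)
t = rng.uniform(1, 20, size=group.size)
g = np.exp(-0.2 * t)
x_true = np.array([1.0, 2.0, 3.0])[group] + rng.normal(size=group.size)
y = x_true + rng.normal(size=group.size) * np.sqrt(30 * g)

r_hat, mu_hat, se_future, upper = upper_tolerance_limits(
    y, group, g, np.linspace(0.0, 100.0, 401))
print("r_hat:", r_hat)
print("group means:", mu_hat)
print("upper tolerance limits:", upper)
```

A finer grid, or a bounded scalar optimizer, could replace the grid search; the grid keeps the one-dimensional maximization over $r$ easy to inspect.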
The selection of tabled values depends on the number of degrees of freedom associated with the estimated group means and the estimated standard errors of future observations. The degrees of freedom for the estimated group means are given by the number of observations in each group. The degrees of freedom for the standard errors of future observations may be approximated by the total number of observations minus the number of groups, minus one for the information used to estimate $r$, that is, $n - k - 1$.

5.4 EXAMPLE

The method of computing tolerance limits for varying precision data will be illustrated using the generated corrosion data. In order to carry out the estimation procedure described in Section 5.2, a functional form must be specified for the variance functions $g_{ij}$. Figure 1 shows a plot of the projected corrosion values $y_{ij}$ versus the time $t_{ij}$ at which the original data values were collected. The plot indicates that the variance of the projected corrosion decreases with age. Several variance functions were considered for the corrosion data, including the following:

1. $f_1(x) = 1/x^p$
2. $f_2(x) = (20 - x)^p$
3. $f_3(x) = \exp(-\lambda x)$
4. $f_4(x) = x^{\lambda} = \exp(\lambda \log x)$

For $f_1$ and $f_2$, $p$ is positive, with $p = 1$ or $2$ reasonable choices. The function $f_1$ is chosen to have the “right” shape (large if age is small, relatively flat as age increases), while $f_2$ is “right” at age 20 (no variance due to age), but is likely to be too flat in the region of interest. The functions $f_3$ and $f_4$ have the advantage of having more or less the same shape as $f_1$ (for some $\lambda$), and of being of the same form used in Cook and Weisberg (1983) and elsewhere, so results in the literature are directly applicable. For the score test, as long as the variance function is in the right direction (i.e., so that larger ages imply less variability), the results of Chen (1983) imply that the exact form of the variance function is not very important.
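The candidate variance functions, and a grid search over $\lambda$ and $r$ with $g = f_3$ of the kind planned for this example, might be sketched as below. This is a minimal Python sketch under stated assumptions: ages are positive and below 20 (so that $f_1$ and $f_2$ are defined and nonnegative), the criterion is the profile log likelihood $L_{\max}(r)$ from Section 5.2 evaluated with $g_{ij} = f_3(t_{ij})$, and the grids, function names, and simulated data are hypothetical. $f_4$ is written through $\exp(\lambda \log x)$, matching its form in the list above.

```python
import numpy as np


# Candidate variance functions from the text (x = age, assumed 0 < x < 20)
def f1(x, p=1.0):
    return 1.0 / x**p

def f2(x, p=1.0):
    return (20.0 - x) ** p

def f3(x, lam):
    return np.exp(-lam * x)

def f4(x, lam):
    return np.exp(np.log(x) * lam)  # = x**lam


def profile_loglik(y, group, g, r):
    """L_max(r): log likelihood maximized over the group means and sigma^2."""
    w = 1.0 / (1.0 + r * g)
    mu = np.bincount(group, weights=w * y) / np.bincount(group, weights=w)
    sigma2 = np.sum(w * (y - mu[group]) ** 2) / y.size
    return -0.5 * y.size * np.log(sigma2) + 0.5 * np.sum(np.log(w)) - 0.5 * y.size


def grid_search(y, group, t, lam_grid, r_grid):
    """Maximize L_max over a (lambda, r) grid with g = f3(t, lambda)."""
    ll = np.array([[profile_loglik(y, group, f3(t, lam), r) for r in r_grid]
                   for lam in lam_grid])
    i, j = np.unravel_index(np.argmax(ll), ll.shape)
    return lam_grid[i], r_grid[j], ll


# Hypothetical data generated with lambda = 0.2 and r = 30
rng = np.random.default_rng(0)
group = np.repeat(np.arange(3), 50)
t = rng.uniform(1, 20, size=group.size)
y = (np.array([1.0, 2.0, 3.0])[group] + rng.normal(size=group.size)
     + rng.normal(size=group.size) * np.sqrt(30 * f3(t, 0.2)))

lam_grid = np.linspace(0.05, 1.0, 20)
r_grid = np.linspace(0.0, 100.0, 101)
lam_hat, r_hat, ll = grid_search(y, group, t, lam_grid, r_grid)
print("lambda_hat:", lam_hat, "r_hat:", r_hat)
```

At $r = 0$ every weight equals one, so the likelihood carries no information about $\lambda$; the grid is only informative when some varying precision is present, which is why the score test is performed first.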
[Provide details: look at plot of data as a function of age; use $\exp(-\lambda \cdot \text{age})$; use simple grid-search method to get ML estimate of $\lambda$ for several values of $r$; result will be ML estimates of $\lambda$ and $r$. Perform score test. Estimation. Compute tolerance intervals. Show figure with different tolerance intervals from different methods.]

6. DESIGN CONSIDERATIONS

Tolerance intervals may also be used in the ongoing process of scientific inquiry to help determine additional experiments to run. By examining the formulas used to calculate tolerance intervals, the impact of obtaining additional data can be examined. [Go through example.]

ACKNOWLEDGEMENTS

The author wishes to acknowledge Don MacMillan of Los Alamos National Laboratory for providing the data which motivated this work, Sandy Weisberg, U. of Minnesota, for assistance in the development of the varying precision method, and Tom Bement of Los Alamos National Laboratory for encouraging this work.

REFERENCES

1. Beckman and Tietjen (1989), “Two-sided Tolerance Limits for Balanced Random-Effects ANOVA Models,” Technometrics, 31, 2, 185-197.
2. Chen, C. F. (1983), “Score Tests for Regression Models,” Journal of the American Statistical Association, 78, 158-161.
3. Cook, R. D., and Weisberg, S. (1983), “Diagnostics for Heteroskedasticity in Regression,” Biometrika, 70, 1-10.
4. Guttman, I. (1970), Statistical Tolerance Regions: Classical and Bayesian, Charles Griffin & Company, Limited, London.
5. Hwang, J. T. Gene (1992), “Prediction and Tolerance Intervals for Linear Measurement Error Models with Applications in Predicting the Compressive Strength of Concrete,” presented at 1992 Joint Statistical Meetings.
6. Kim, Y. G. and Myers, R. H. (1992), “A Response Surface Approach to Data Analysis in Robust Parameter Design,” Technical Report Number 92-12, Dept. of Statistics, Virginia Polytechnic Institute and State University.
7. Lemon, G. H.
(1977), “Factors for One-sided Tolerance Limits for Balanced One-Way-ANOVA Random-Effects Model,” Journal of the American Statistical Association, 72, 676-680.
8. Mee, R. W. and Owen, D. B. (1983), “Improved Factors for One-sided Tolerance Limits for Balanced One-Way ANOVA Random Models,” Journal of the American Statistical Association, 78, 901-905.
9. Meeker and Hahn (1991), Statistical Intervals: A Guide for Practitioners, Wiley, New York.
10. Odeh, R. E. and Owen, D. B. (1980), Tables for Normal Tolerance Limits, Sampling Plans, and Screening, Marcel Dekker, New York.
11. Owen, D. B. (1963), Factors for One-sided Tolerance Limits and for Variables Sampling Plans, Sandia Corporation Monograph SCR-607, Mathematics and Computers, TID-4500 (19th Edition).
12. Wald and Wolfowitz (1946), “Tolerance Limits for a Normal Distribution,” The Annals of Mathematical Statistics, 17, 208-215.
13. Wallis, W. A. (1951), “Tolerance Intervals for Linear Regression,” Proc. Second Berkeley Symp. Math. Stat. Prob., Univ. of California Press, 43-51.
14. Wang, C. M. and Iyer, H. K. (1994), “Tolerance Intervals for the Distribution of True Values in the Presence of Measurement Errors,” Technometrics, 36, 2, 162-170.
15. Weisberg, S. (1992), personal communication.