Statistical Model Validation
In statistics, model validation is the task of evaluating whether a chosen statistical model is appropriate or
not. Often in statistical inference, a model that appears to fit its data may do so only by chance, leading
researchers to misjudge the actual relevance of their model. To guard against this, model validation is used
to test whether a statistical model holds up under variations in the data. This topic is not to be confused with
the closely related task of model selection, the process of discriminating between multiple candidate models:
model validation does not concern the conceptual design of models so much as it tests the consistency
between a chosen model and its stated outputs.
There are many ways to validate a model. Residual plots show the differences between the actual data and the
model's predictions: patterns or correlations in the residuals may indicate a flaw in the model. Cross validation
is a method of model validation that iteratively refits the model, each time leaving out a small sample of the
data and checking whether the left-out samples are well predicted by the model; there are many kinds of cross
validation. Predictive simulation is used to compare simulated data to actual data. External validation
involves fitting the model to new data. The Akaike information criterion estimates the relative quality of a model.
Overview
Model validation comes in many forms, and the specific method a researcher uses is often constrained by their
research design; in other words, there is no one-size-fits-all method for validating a model. For example, a
researcher working with a very limited data set, but one about which they hold strong prior assumptions, may
consider validating the fit of their model within a Bayesian framework and testing that fit under various prior
distributions. A researcher with a large amount of data who is testing multiple nested models may instead be
better served by cross validation, possibly with a leave-one-out test. These are two abstract examples, and any
actual model validation will have to consider far more intricacies than are described here, but they illustrate
that model validation methods are always circumstantial.
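As a concrete illustration of the prior-sensitivity idea, the following sketch (written in Python with NumPy,
neither of which is prescribed by this article; the Beta-Binomial model, the data, and the candidate priors are
all invented for illustration) refits the same small data set under several priors and compares the resulting
posterior summaries:

    import numpy as np

    # Hypothetical small data set: 7 successes in 10 trials.
    successes, trials = 7, 10

    # Candidate Beta priors for the success probability, expressing
    # different prior assumptions that the researcher wants to compare.
    priors = {"flat": (1, 1), "skeptical": (2, 8), "optimistic": (8, 2)}

    rng = np.random.default_rng(0)
    for name, (a, b) in priors.items():
        # Conjugate update: Beta(a, b) prior + Binomial likelihood gives
        # a Beta(a + successes, b + failures) posterior.
        post_a, post_b = a + successes, b + (trials - successes)
        post_mean = post_a / (post_a + post_b)
        # Posterior predictive draws for a future batch of 10 trials.
        p_draws = rng.beta(post_a, post_b, size=5000)
        predictive = rng.binomial(trials, p_draws)
        print(f"{name:10s} posterior mean = {post_mean:.2f}, "
              f"predictive mean = {predictive.mean():.1f}")

    # If the conclusions shift substantially across reasonable priors,
    # the limited data alone do not support firm inferences from the model.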
In general, models can be validated using existing data or with new data; both approaches are discussed in the
following subsections, and a note of caution is provided as well.
Validation based on existing data involves analyzing the goodness of fit of the model or analyzing whether
the residuals seem to be random (i.e. residual diagnostics). This method examines how close the model is to
the data it was fitted to, that is, how well the model predicts its own data. One example is shown in Figure 1,
where a polynomial function has been fitted to data that appear linear. Although the polynomial passes through
every data point, it does not reflect the apparently linear process underlying the data, which might invalidate
the polynomial model.
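A minimal sketch of this goodness-of-fit comparison, in the spirit of Figure 1 (Python with NumPy; the
straight-line data-generating process, noise level, and polynomial degree are assumptions chosen only to
reproduce the effect described above):

    import numpy as np

    rng = np.random.default_rng(42)

    # Ten data points generated from a straight line plus noise,
    # as in Figure 1.
    x = np.linspace(0, 10, 10)
    y = 2.0 * x + 1.0 + rng.normal(scale=2.0, size=x.size)

    # Fit a straight line and a degree-9 polynomial; with ten points,
    # the degree-9 polynomial can pass through every point exactly.
    line = np.polynomial.Polynomial.fit(x, y, deg=1)
    poly = np.polynomial.Polynomial.fit(x, y, deg=9)

    for name, model in [("straight line", line), ("degree-9 poly", poly)]:
        resid = y - model(x)
        print(f"{name:15s} in-sample RMSE = {np.sqrt(np.mean(resid**2)):.3f}")

    # The polynomial's in-sample error is (numerically) zero, yet it does
    # not reflect the linear process that generated the data, so a small
    # in-sample error is not by itself evidence of a valid model.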
Commonly, statistical models on existing data are validated using a validation set, which may also be
referred to as a holdout set. A validation set is a set of data points that the user leaves out when fitting a
statistical model. After the statistical model is fitted, the validation set is used as a measure of the model's
error. If the model fits well on the initial data but has a large error on the validation set, this is a sign of
overfitting, as seen in Figure 1.
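A sketch of such a holdout check, again using made-up straight-line data (the split sizes and the polynomial
degrees are illustrative assumptions): the interpolating polynomial fits the training points exactly but shows a
much larger error on the validation set, the overfitting signature just described.

    import numpy as np

    rng = np.random.default_rng(0)

    # Fifteen points from a straight line plus noise, split into a
    # fitting set of 10 points and a holdout (validation) set of 5.
    x = np.linspace(0, 10, 15)
    y = 2.0 * x + 1.0 + rng.normal(scale=2.0, size=x.size)
    holdout = rng.choice(x.size, size=5, replace=False)
    train = np.setdiff1d(np.arange(x.size), holdout)

    def rmse(model, idx):
        return np.sqrt(np.mean((y[idx] - model(x[idx])) ** 2))

    # Degree 9 interpolates the 10 training points exactly; degree 1 does not.
    for deg in (1, 9):
        model = np.polynomial.Polynomial.fit(x[train], y[train], deg=deg)
        print(f"degree {deg}: train RMSE = {rmse(model, train):.2f}, "
              f"holdout RMSE = {rmse(model, holdout):.2f}")

    # A large gap between the training error and the holdout error is the
    # overfitting signature described above.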
A Note of Caution
Figure 1. Data (black dots), which was generated via the straight line and some added noise, is perfectly
fitted by a curvy polynomial.
A model can be validated only relative to some application area.[1][2] A model that is valid for one
application might be invalid for some other applications. As an example, consider the curve in Figure 1: if
the application only used inputs from the interval [0, 2], then the curve might well be an acceptable model.
Expert judgment can sometimes be used to assess the validity of a prediction without obtaining real data:
e.g. for the curve in Figure 1, an expert might well be able to assess that a substantial extrapolation will be
invalid. Additionally, expert judgment can be used in Turing-type tests, where experts are presented with
both real data and related model outputs and then asked to distinguish between the two.[4]
For some classes of statistical models, specialized methods of performing validation are available. As an
example, if the statistical model was obtained via a regression, then specialized analyses for regression
model validation exist and are generally employed.
Residual diagnostics
Residual diagnostics comprise analyses of the residuals to determine whether the residuals seem to be
effectively random. Such analyses typically require estimates of the probability distributions for the
residuals. Estimates of the residuals' distributions can often be obtained by repeatedly running the model,
i.e. by using repeated stochastic simulations (employing a pseudorandom number generator for random
variables in the model).
If the statistical model was obtained via a regression, then regression-residual diagnostics exist and may be
used; such diagnostics have been well studied.
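For instance, one simple regression-residual diagnostic might look like the following sketch (Python with
NumPy; the curved data and the lag-1 autocorrelation and runs checks are illustrative choices, not the only
diagnostics in use):

    import itertools
    import numpy as np

    rng = np.random.default_rng(1)

    # Curved data deliberately fitted by a straight line, so that the
    # residuals retain systematic (non-random) structure.
    x = np.linspace(0, 10, 50)
    y = 0.3 * x**2 + rng.normal(scale=1.0, size=x.size)

    line = np.polynomial.Polynomial.fit(x, y, deg=1)
    resid = y - line(x)

    # If the straight-line model were adequate, the residuals (ordered by x)
    # would look like independent noise: weak lag-1 autocorrelation and only
    # short runs of same-sign residuals.
    lag1 = np.corrcoef(resid[:-1], resid[1:])[0, 1]
    longest_run = max(len(list(g)) for _, g in itertools.groupby(np.sign(resid)))
    print(f"lag-1 autocorrelation of residuals: {lag1:.2f}")
    print(f"longest run of same-sign residuals: {longest_run}")

    # A strong autocorrelation and a long same-sign run both point to
    # structure the straight-line model has failed to capture.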
Cross validation
Cross validation is a method of sampling that involves leaving some parts of the data out of the fitting
process and then seeing whether those data that are left out are close or far away from where the model
predicts they would be. What that means practically is that cross validation techniques fit the model many,
many times with a portion of the data and compares each model fit to the portion it did not use. If the
models very rarely describe the data that they were not trained on, then the model is probably wrong.
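A minimal sketch of k-fold cross validation with NumPy (five folds and a straight-line model are arbitrary
illustrative choices):

    import numpy as np

    rng = np.random.default_rng(2)

    # Data from a straight line plus noise.
    x = np.linspace(0, 10, 40)
    y = 2.0 * x + 1.0 + rng.normal(scale=2.0, size=x.size)

    # Split the data into k folds; refit the model k times, each time
    # holding one fold out and scoring the model on that held-out fold.
    k = 5
    folds = np.array_split(rng.permutation(x.size), k)

    fold_errors = []
    for i, test_idx in enumerate(folds):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        model = np.polynomial.Polynomial.fit(x[train_idx], y[train_idx], deg=1)
        resid = y[test_idx] - model(x[test_idx])
        fold_errors.append(np.sqrt(np.mean(resid**2)))

    print("per-fold RMSE:", np.round(fold_errors, 2))
    print("mean held-out RMSE:", round(float(np.mean(fold_errors)), 2))

    # Setting k equal to the number of data points gives leave-one-out
    # cross validation, the variant mentioned in the Overview.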
See also
All models are wrong
Cross-validation (statistics)
Identifiability analysis
Internal validity
Model identification
Overfitting
Perplexity
Predictive model
Sensitivity analysis
Spurious relationship
Statistical conclusion validity
Statistical model selection
Statistical model specification
Validity (statistics)
References
1. National Research Council (2012), "Chapter 5: Model validation and prediction" (https://www.nap.edu/read/13395/chapter/7), Assessing the Reliability of Complex Models: Mathematical and statistical foundations of verification, validation, and uncertainty quantification, Washington, DC: National Academies Press, pp. 52–85, doi:10.17226/13395, ISBN 978-0-309-25634-6.
2. Batzel, J. J.; Bachar, M.; Karemaker, J. M.; Kappel, F. (2013), "Chapter 1: Merging mathematical and physiological knowledge", in Batzel, J. J.; Bachar, M.; Kappel, F. (eds.), Mathematical Modeling and Validation in Physiology, Springer, pp. 3–19, doi:10.1007/978-3-642-32882-4_1.
3. Deaton, M. L. (2006), "Simulation models, validation of", in Kotz, S.; et al. (eds.), Encyclopedia of Statistical Sciences, Wiley.
4. Mayer, D. G.; Butler, D. G. (1993), "Statistical validation", Ecological Modelling, 68 (1–2): 21–32, doi:10.1016/0304-3800(93)90105-2.
Further reading
Barlas, Y. (1996), "Formal aspects of model validity and validation in system dynamics", System Dynamics Review, 12 (3): 183–210, doi:10.1002/(SICI)1099-1727(199623)12:3<183::AID-SDR103>3.0.CO;2-4.
Good, P. I.; Hardin, J. W. (2012), "Chapter 15: Validation", Common Errors in Statistics (Fourth ed.), John Wiley & Sons, pp. 277–285.
Huber, P. J. (2002), "Chapter 3: Approximate models", in Huber-Carol, C.; Balakrishnan, N.; Nikulin, M. S.; Mesbah, M. (eds.), Goodness-of-Fit Tests and Model Validity, Springer, pp. 25–41.
External links
How can I tell if a model fits my data? (http://www.itl.nist.gov/div898/handbook/pmd/section4/pmd44.htm) —Handbook of Statistical Methods (NIST)
Hicks, Dan (July 14, 2017). "What are core statistical model validation techniques?" (https://stats.stackexchange.com/q/291481). Stack Exchange.