Validity
Reliability is a necessary, but not sufficient, condition for validity. The lower the reliability, the
lower the validity → Rxy ≤ √Rxx.
Rxy = validity coefficient (correlation between scores on procedure X and external criterion Y).
Rxx = reliability coefficient.
Rxt = Rxy / √Ryy → Rxt = correlation between scores on some procedure and the criterion's
"true score" (i.e., the correction for attenuation in the criterion).
Using different reliability estimates is likely to lead to different conclusions regarding validity.
An underestimation of Ryy produces an overestimation of the validity coefficient.
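A minimal Python sketch of these two relationships (the √Rxx ceiling and the Rxy/√Ryy correction); the function names and numbers are illustrative, not from the source:

```python
import math

def max_validity(rxx: float) -> float:
    """Upper bound on the validity coefficient: Rxy can be at most sqrt(Rxx)."""
    return math.sqrt(rxx)

def correct_for_attenuation(rxy: float, ryy: float) -> float:
    """Rxt = Rxy / sqrt(Ryy): correlation of procedure X with the criterion's
    'true score', correcting for unreliability in the criterion."""
    return rxy / math.sqrt(ryy)

observed_rxy = 0.30
print(round(max_validity(0.49), 2))                           # 0.70
print(round(correct_for_attenuation(observed_rxy, 0.80), 2))  # 0.34
# Underestimating Ryy (.60 instead of .80) inflates the corrected validity:
print(round(correct_for_attenuation(observed_rxy, 0.60), 2))  # 0.39
```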
* Traditionally, validity was viewed as the extent to which a measurement procedure actually
measures what it is designed to measure.
* Validation = the investigative process of gathering and evaluating the necessary data about:
1). WHAT a test or procedure measures;
2). HOW WELL it measures it.
* Validity is not a dichotomous variable (valid or not valid), but a matter of degree.
* No different “kinds” of validity, but only different kinds of evidence for analyzing validity.
* It is the inferences regarding the specific uses of a test or other measurement procedure that are
validated, not the test itself.
* Validity is an evolving property and validation is a continuing process.
1. Content-related evidence
The extent to which items cover the intended domain.
* Although it has its limitations, content-related evidence has made a positive contribution
by directing attention toward (1) improved domain sampling and job-analysis procedures,
(2) better behavior measurement, and (3) the role of expert judgment in confirming the
fairness of sampling and scoring procedures and in determining the degree of overlap
between separately derived content domains.
2. Criterion-related evidence
Empirical relationship between predictor and criterion scores.
Predictive study = oriented toward the future and involves a time interval during which
events take place → "Is it likely that Laura will be able to do the job?"
* 1. Measure candidates for the job;
2. Select candidates without using the results of the measurement procedure;
3. Obtain measurements of criterion performance at some later date;
4. Assess the strengths of the relationship between predictor and criterion.
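As a sketch of step 4, the predictor-criterion relationship is typically summarized by a correlation coefficient; the data below are hypothetical:

```python
import numpy as np

# Hypothetical data: test scores at hiring (step 1) and performance
# ratings obtained at some later date (step 3).
predictor = np.array([62, 75, 81, 58, 90, 70, 66, 85])
criterion = np.array([3.1, 3.8, 4.0, 2.9, 4.5, 3.5, 3.2, 4.2])

# Step 4: strength of the predictor-criterion relationship.
rxy = np.corrcoef(predictor, criterion)[0, 1]
print(f"Observed validity coefficient Rxy = {rxy:.2f}")
```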
* Statistical power: the probability of rejecting the null hypothesis when it is false.
Parameters: 1. The power of the test (1 – β)
2. Type I error (α)
3. Sample size, N (power increases as N increases)
4. Effect size (power increases as effect size increases).
* A power analysis should be conducted before the study itself is carried out; see the sketch below.
* Cohen's benchmarks for correlations: "small" (.10), "medium" (.30), or "large" (.50) effects.
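A sketch of such an a priori power analysis for a validity coefficient, using the common Fisher z approximation (the function is illustrative, not a prescribed procedure from the source):

```python
from math import atanh, sqrt
from scipy.stats import norm

def power_for_correlation(r: float, n: int, alpha: float = 0.05) -> float:
    """Approximate power of a two-tailed test of H0: rho = 0 when the true
    correlation is r, via the Fisher z transform (se = 1 / sqrt(n - 3))."""
    z_effect = atanh(r) * sqrt(n - 3)
    z_crit = norm.ppf(1 - alpha / 2)
    return norm.cdf(z_effect - z_crit) + norm.cdf(-z_effect - z_crit)

# Power to detect a "medium" (.30) effect grows with N:
for n in (30, 68, 100):
    print(n, round(power_for_correlation(0.30, n), 2))   # ~0.36, 0.70, 0.86
```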
* Assuming that multiple predictors are used in a validity study and that each predictor
accounts for some unique criterion variance, the effect size of a linear combination of the
predictors is likely to be higher than the effect size of any single predictor.
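A small simulation illustrating this point: when each predictor carries unique criterion variance, the multiple correlation R of their least-squares combination exceeds either zero-order r (data and weights are simulated, not from the source):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
x1 = rng.normal(size=n)                        # e.g., a cognitive ability test
x2 = rng.normal(size=n)                        # a second predictor with unique variance
y = 0.4 * x1 + 0.4 * x2 + rng.normal(size=n)   # criterion

r1 = np.corrcoef(x1, y)[0, 1]
r2 = np.corrcoef(x2, y)[0, 1]

# Multiple correlation R of the least-squares combination of x1 and x2:
X = np.column_stack([np.ones(n), x1, x2])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
R = np.corrcoef(X @ beta, y)[0, 1]
print(f"r(x1,y)={r1:.2f}  r(x2,y)={r2:.2f}  R={R:.2f}")   # R exceeds both
```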
* When has an employee been on the job long enough to appraise his or her performance?
When there is evidence that the initial learning period has passed (typically after about six months).
Concurrent study = oriented toward the present and reflects only the status quo at a
particular time → "Can Laura do the job now?"
* Criterion measures usually are substitutes for other, more important, costly, or complex
performance measures. They are valuable only if:
1). There is a relationship between the convenient/accessible measure and the
costly/complex measure; and
2). Using the substitute measure is more cost-effective.
* With cognitive ability tests, concurrent studies often are used as substitutes for
predictive studies.
* This design ignores the effects of motivation and job experience on ability.
Range Restriction
Because the size of the validity coefficient is a function of two variables, restricting the
range (truncating or censoring) of either the predictor or the criterion will serve to lower
the size of the validity coefficient (Figure 7-1, p. 150). Selection effects on validity
coefficients result from changes in the variance(s) of the variable(s).
→ Direct range restriction & indirect/incidental range restriction (when an experimental
predictor is administered to applicants but is not used as a basis for selection decisions).
* The range of scores also may be narrowed by preselection: when a predictive validity
study is undertaken after a group of individuals has been hired, but before criterion data
become available for them.
* Selection at the hiring point reduces the range of the predictor variable(s), and selection
on the job or during training reduces the range of the criterion variable(s).
To correct for range restriction, three scenarios can be distinguished (formulas → 7-4, 7-5,
7-6, p. 151):
1). Selection takes place directly on the predictor, and the unrestricted predictor variance is
known (see the sketch below this list);
2). Selection takes place on one variable (either the predictor or the criterion), but the
unrestricted variance is not known;
3). Incidental restriction takes place on a third variable z, and the unrestricted variance on z is
known.
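For scenario 1, the classic correction for direct restriction (often called Thorndike's Case 2) can be sketched as follows; the numbers are illustrative:

```python
from math import sqrt

def correct_direct_restriction(r: float, sd_restricted: float,
                               sd_unrestricted: float) -> float:
    """Correct a validity coefficient r, observed in a directly restricted
    (selected) sample, using the known unrestricted predictor SD:
    R = r*k / sqrt(1 - r^2 + r^2 * k^2), with k = SD_unrestricted / SD_restricted."""
    k = sd_unrestricted / sd_restricted
    return (r * k) / sqrt(1 - r**2 + (r**2) * (k**2))

# A validity of .25 among hires whose predictor SD is half that of the
# applicant pool corrects to about .46:
print(round(correct_direct_restriction(0.25, sd_restricted=5.0,
                                       sd_unrestricted=10.0), 2))
```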
→ In practice, there may be range-restriction scenarios that are more difficult to address with
corrections. These include:
1. Those where the unrestricted variance on the predictor, the criterion, or the third variable is
unknown; and
2. Those where there is simultaneous or sequential restriction on multiple variables.
Multivariate correction formula = can be used when direct restriction (on one or two variables)
and incidental restriction take place simultaneously. Also, the equation can be used repeatedly
when restriction occurs on a sample that is already restricted → the computer program RANGEJ
makes this correction easy to implement.
There are several correction procedures available. Criterion-related validation efforts focusing on
a multiple-hurdle process should consider appropriate corrections that take into account that
range restriction, or missing data, takes place after each test is administered. Corrections are
appropriate only when they are justified based on the target population (the population to which
one wishes to generalize the obtained corrected validity coefficient).
3. Construct-related evidence
The understanding of a trait or construct that a test measures.
Cross-validity = the degree to which the weights derived from one sample can predict outcomes
to the same degree in the population as a whole or in other samples drawn from the same population.
There are procedures available to compute cross-validity:
- Empirical cross-validation → fitting a regression model in a sample and using the resulting
regression weights with a second, independent cross-validation sample.
- Statistical cross-validation → adjusting the sample-based multiple correlation coefficient (R) by a
function of sample size (N) and the number of predictors (k); see the sketch below.
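One common shrinkage adjustment of this kind is the Wherry-type formula sketched below; it is offered as an illustration of the N-and-k adjustment, not necessarily the exact formula intended by the source:

```python
from math import sqrt

def adjusted_R(R: float, n: int, k: int) -> float:
    """Wherry-type shrinkage adjustment of a sample multiple correlation R
    as a function of sample size n and number of predictors k."""
    r2_adj = 1 - (1 - R**2) * (n - 1) / (n - k - 1)
    return sqrt(max(r2_adj, 0.0))

# R = .50 from a small sample with many predictors shrinks noticeably:
print(round(adjusted_R(0.50, n=60, k=8), 2))   # ~0.36
```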
Cross-validation, including rescaling and reweighting of items if necessary, should be continual
(recommended annually), for as values change, jobs change, and people change, so also do the
appropriateness and usefulness of inferences made from test scores.
In many cases, local validation may not be feasible due to logistical or practical constraints.
Several strategies are available to gather validity evidence in such situations:
Synthetic Validity
Process of inferring validity in a specific situation from a systematic analysis of jobs into
their elements, a determination of test validity for these elements, and a combination or
synthesis of the elemental validities into a whole.
Test Transportability
To be able to use locally a test that has been validated elsewhere, without the need for a local
validation study, evidence must be provided on several specific points (e.g., the similarity of the
jobs and of the applicant groups involved).
Empirical Bayes Analysis = this approach involves first calculating the average inaccuracy of a
meta-analysis and of a local validity study under a wide variety of conditions and then computing
an empirical Bayesian estimate, which is a weighted average of the meta-analytically derived and
local study estimates; see the sketch below.
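A minimal sketch of such a weighted average, assuming inverse-variance (precision) weights; the exact weighting used in the source procedure may differ:

```python
def empirical_bayes_estimate(local_r: float, local_var: float,
                             meta_r: float, meta_var: float) -> float:
    """Weighted average of a local validity estimate and a meta-analytically
    derived estimate, each weighted by the inverse of its sampling variance."""
    w_local = 1.0 / local_var
    w_meta = 1.0 / meta_var
    return (w_local * local_r + w_meta * meta_r) / (w_local + w_meta)

# A noisy local estimate (small N, large sampling variance) is pulled
# toward the more precise meta-analytic value:
print(round(empirical_bayes_estimate(local_r=0.15, local_var=0.020,
                                     meta_r=0.30, meta_var=0.004), 2))  # 0.28
```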