An individual's test performance can also be affected by largely temporary conditions, such as his mental alertness or emotional state, and by uncontrolled differences in test method facets, such as changes in the test environment from one day to the next, or idiosyncratic differences in the way different test administrators carry out their responsibilities.
Thorndike (1951) and Stanley (1971) begin their treatments of reliability with
general frameworks for describing the factors that cause test scores to vary from individual to individual:
general and specific lasting characteristics,
general and specific temporary characteristics,
and systematic and chance factors related to test administration and scoring.
Since we never know what the true or error scores are, we cannot know the reliability of the observed scores. To be
able to estimate the reliability of observed scores, then, we must define reliability operationally in a way that depends
only on
observed scores.
Thus, if the observed scores on two parallel tests are highly correlated, this indicates that the effects of the error scores are minimal, and that the observed scores can be considered reliable indicators of the ability being measured.
The definition of reliability (the basis for all estimates of reliability within CTS theory) is the correlation between the observed scores on two parallel tests, which we can symbolize as r_xx'.
Assumption: the observed scores on the two tests are experimentally independent. That is, an individual's
performance on the second test should not depend on how she performs on the first.
If an individual's observed score on a test is composed of a true score and an error score, the greater
the proportion of true score, the less the proportion of error score, and thus the more reliable the
observed score.
Thus, one way of defining reliability is as the proportion of the observed score variance that is true score variance:

r_xx' = s_t² / s_x²
Note: reliability refers to the test scores, and not the test itself.
Internal consistency is concerned with how consistent test takers’ performances on the different parts of
the test are with each other.
Inconsistencies in performance on different parts of tests can be caused by a number of factors, including
the test method facets.
SPLIT-HALF RELIABILITY ESTIMATES
One approach to examining the internal consistency of a test is the split-half method, in which we divide
the test into two halves and then determine the extent to which scores on these two halves are consistent
with each other.
In so doing, we are treating the halves as parallel tests, and so we must make certain assumptions about
the equivalence of the two halves, specifically that they have equal means and variances. In addition, we
must also assume that the two halves are independent of each other.
In some cases, where we are not sure that the items are measuring the same ability or that they are independent of each other, the test-retest and parallel forms methods are more appropriate for estimating reliability.
The Spearman-Brown split-half estimate
Once the test has been split into halves, it is rescored, yielding two scores - one for each half - for each test
taker.
In one approach to estimating reliability, we then compute the correlation between the two sets of scores.
This gives us an estimate of how consistent the two halves are with each other; however, we are interested in the reliability of the whole test, not just of its halves.
In general, a long test will be more reliable than a short one, assuming that the additional items correlate positively with the other items in the test. The Spearman-Brown prophecy formula corrects for this, estimating the reliability of the whole test from the correlation between its two halves:

r_xx' = 2r_hh' / (1 + r_hh')

where r_hh' is the correlation between the two halves.
The Guttman split-half estimate

The Guttman split-half estimate is computed from the variances of the two halves and of the total test:

r_xx' = 2(1 - (s_h1² + s_h2²) / s_x²)

Since this formula is based on the variance of the total test, it provides a direct estimate of the reliability of the whole test. Therefore, unlike the Spearman-Brown estimate, the Guttman split-half estimate does not require an additional correction for length.