Item Response Theory - An Introduction


Item Response Theory

An Introduction

This paper provides a basic introduction to Item Response Theory.

Prepared by: Kevin McCormack


Table of Contents

1 Background
1.1 Measuring latent trait
1.2 Classical and Item Response Theories
1.3 Item characteristic curve
1.4 Discrimination
1.4.1 Perfect Discrimination
1.5 Features of item characteristic curves

2 Item Characteristic Curve Models
2.1 The Logistic Function
2.1.1 Example
2.2 The Rasch, or One-Parameter, Logistic Model
2.2.1 Computational Example
2.3 The Three-Parameter Model
2.3.1 Computational Example
2.4 Negative Discrimination
2.5 Guidelines for Interpreting Item Parameter Values
2.6 Features of Item Characteristic Models

3 Estimating Item Parameters

1 Background

In many educational and psychological measurement situations, there is an underlying variable
of interest. This variable is often something that is intuitively understood, such as “intelligence.”
When people are described as being bright or average, the listener has some idea as to what
the speaker is conveying about the object of the discussion. Similarly, one can talk about
scholastic ability and its attributes, such as getting good grades, learning new material easily,
relating various sources of information, and using study time effectively. In academic areas, one
can use descriptive terms such as reading ability and arithmetic ability. Each of these is what
psychometricians refer to as an unobservable, or latent, trait.

1.1 Measuring latent trait

Although such a variable is easily described, and knowledgeable persons can list its attributes, it
cannot be measured directly as can height or weight, for example, since the variable is a
concept rather than a physical dimension. A primary goal of educational and psychological
measurement is the determination of how much of such a latent trait a person possesses. Since
most of the research has dealt with variables such as scholastic, reading, mathematical, and
arithmetic abilities, the generic term “ability” is used within item response theory to refer to such
latent traits.

If one is going to measure how much of a latent trait a person has, it is necessary to have a
scale of measurement, i.e., a ruler having a given metric. For a number of technical reasons,
defining the scale of measurement, the numbers on the scale, and the amount of the trait that
the numbers represent is a very difficult task.

In general, this problem is solved by simply defining an arbitrary underlying ability scale. It will
be assumed that, whatever the ability, it can be measured on a scale having a midpoint of zero,
a unit of measurement of one, and a range from negative infinity to positive infinity. Since there
is a unit of measurement and an arbitrary zero point, such a scale is referred to as existing at an
interval level of measurement.

The underlying idea here is that if one could physically ascertain the ability of a person, this ruler
would be used to tell how much ability a given person has, and the ability of several persons
could be compared. While the theoretical range of ability is from negative infinity to positive
infinity, practical considerations usually limit the range of values from, say, -3 to +3. However,
you should be aware that values beyond this range are possible.

1.2 Classical and Item Response Theories

The usual approach taken to measure an ability is to develop a test consisting of a number of
items (questions). Each of these items measures some facet of the particular ability of interest.
From a purely technical point of view, such items should be free-response items for which the
examinee can write any response that seems appropriate. The person scoring the test must
then decide whether the response is correct or not. When the item response is determined to be
correct, the examinee receives a score of one; an incorrect answer receives a score of zero
(i.e., the item is dichotomously scored).

Under classical test theory, the examinee’s raw test score would be the sum of the scores
received on the items in the test. Under item response theory, the primary interest is in
whether an examinee got each individual item correct or not, rather than in the raw test score.
This is because the basic concepts of item response theory rest upon the individual items of a
test rather than upon some aggregate of the item responses such as a test score.

From a practical point of view, free-response items are difficult to use in a test. In particular, they
are difficult to score in a reliable manner. As a result, most tests used under item response
theory consist of multiple-choice items. These are scored dichotomously: the correct answer
receives a score of one, and each of the wrong answers (distractors) yields a score of zero.

Items scored dichotomously are often referred to as binary items. A reasonable assumption is
that each examinee responding to a test item possesses some amount of the underlying ability.
Thus, one can consider each examinee to have a numerical value, a score, that places him or
her somewhere on the ability scale. This ability score will be denoted by the Greek letter theta,
θ.
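The dichotomous scoring described above, and the classical raw score from Section 1.2, can be sketched as follows. This is a minimal illustration. The answer key, the responses and the function name `score_dichotomously` are all hypothetical. The sketch shows that item response theory works with the vector of 0/1 item scores, while classical test theory sums that vector into a raw score.

```python
def score_dichotomously(responses, key):
    """Score each item 1 if the chosen option matches the key, else 0."""
    if len(responses) != len(key):
        raise ValueError("responses and key must have the same length")
    return [1 if chosen == correct else 0 for chosen, correct in zip(responses, key)]

# Hypothetical answer key and one examinee's choices on a five-item test.
key = ["B", "D", "A", "C", "A"]
responses = ["B", "C", "A", "C", "D"]

item_scores = score_dichotomously(responses, key)  # item-level 0/1 data used by IRT
raw_score = sum(item_scores)                       # classical test theory raw score
print(item_scores, raw_score)  # [1, 0, 1, 1, 0] 3
```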

1.3 Item characteristic curve

At each ability level, there will be a certain probability that an examinee with that ability will give
a correct answer to the item. This probability will be denoted by P(θ). In the case of a typical test
item, this probability will be small for examinees of low ability and large for examinees of high
ability. If one plotted P(θ) as a function of ability, the result would be a smooth S-shaped curve
such as that shown in Figure 1-1. The probability of correct response is near zero at the lowest
levels of ability. It increases with ability until, at the highest levels of ability, the probability of correct
response approaches 1. This S-shaped curve describes the relationship between the probability
of correct response to an item and the ability scale. In item response theory, it is known as the
item characteristic curve. Each item in a test will have its own item characteristic curve.

FIGURE 1-1. A typical item characteristic curve

The item characteristic curve is the basic building block of item response theory; all the other
constructs of the theory depend upon this curve. Therefore, considerable attention will be
devoted to this curve and its role within the theory. There are two technical properties of an item
characteristic curve that are used to describe it. The first is the difficulty of the item. Under item
response theory, the difficulty of an item describes where the item functions along the ability
scale. For example, an easy item functions among the low-ability examinees and a hard item
functions among the high-ability examinees; thus, difficulty is a location index. The second
technical property is discrimination, which describes how well an item can differentiate between
examinees having abilities below the item location and those having abilities above the item
location.

This property essentially reflects the steepness of the item characteristic curve in its middle
section. The steeper the curve, the better the item can discriminate. The flatter the curve, the
less the item is able to discriminate since the probability of correct response at low ability levels
is nearly the same as it is at high ability levels.

Using these two descriptors, one can describe the general form of the item characteristic curve.
These descriptors are also used to discuss the technical properties of an item. It should be
noted that these two properties say nothing about whether the item really measures some facet
of the underlying ability or not; that is a question of validity. These two properties simply
describe the form of the item characteristic curve. The idea of item difficulty as a location index
will be examined first.

In Figure 1-2, three item characteristic curves are presented on the same graph. All have the
same level of discrimination but differ with respect to difficulty. The left-hand curve represents
an easy item because the probability of correct response is high for low-ability examinees and
approaches 1 for high-ability examinees. The center curve represents an item of medium
difficulty because the probability of correct response is low at the lowest ability levels, around .5
in the middle of the ability scale and near 1 at the highest ability levels. The right-hand curve
represents a hard item. The probability of correct response is low for most of the ability scale
and increases only when the higher ability levels are reached. Even at the highest ability level
shown (+3), the probability of correct response is only .8 for the most difficult item.

FIGURE 1-2. Three item characteristic curves with the same discrimination but different levels
of difficulty

1.4 Discrimination

The concept of discrimination is illustrated in Figure 1-3. This figure contains three item
characteristic curves having the same difficulty level but differing with respect to discrimination.
The upper curve has a high level of discrimination since the curve is quite steep in the middle
where the probability of correct response changes very rapidly as ability increases. Just a short
distance to the left of the middle of the curve, the probability of correct response is much less
than .5, and a short distance to the right the probability is much greater than .5. The middle
curve represents an item with a moderate level of discrimination.

The slope of this curve is much less than that of the previous curve, and the probability of correct
response changes less dramatically as the ability level increases.
However, the probability of correct response is near zero for the lowest-ability examinees and
near 1 for the highest-ability examinees.

The third curve represents an item with low discrimination. The curve has a very small slope
and the probability of correct response changes slowly over the full range of abilities shown.
Even at low ability levels, the probability of correct response is reasonably large, and it
increases only slightly when high ability levels are reached. The reader should be warned that
although the figures only show a range of ability from -3 to +3, the theoretical range of ability is
from negative infinity to positive infinity. Thus, all item characteristic curves of the type used
here actually become asymptotic to a probability of zero at one tail and to 1.0 at the other tail.
The restricted range employed in the figures is necessary only to fit the curves reasonably on
the page.

FIGURE 1-3. Three item characteristic curves with the same difficulty but with different levels of
discrimination

1.4.1 Perfect Discrimination

One special case is of interest, namely, that of an item with perfect discrimination. The item
characteristic curve of such an item is a vertical line at some point along the ability scale. Figure
1-4 shows such an item.

To the left of the vertical line at θ = 1.5, the probability of correct response is zero; to the right of
the line, the probability of correct response is 1. Thus, the item discriminates perfectly between
examinees whose abilities are above and below an ability score of 1.5. Such items would be
ideal for distinguishing between examinees with abilities just above and below 1.5. However,
such an item makes no distinction either among those examinees with abilities above 1.5 or among
those examinees with abilities below 1.5.

FIGURE 1-4. An item that discriminates perfectly at θ = 1.5

At the present point in the presentation of item response theory, the goal is to allow you to
develop an intuitive understanding of the item characteristic curve and its properties. In keeping
with this goal, the difficulty and discrimination of an item will be defined in verbal terms.

Difficulty will have the following levels:

• very easy
• easy
• medium
• hard
• very hard

Discrimination will have the following levels:

• none
• low
• moderate
• high
• perfect

1.5 Features of item characteristic curves

1. When the item discrimination is less than moderate, the item characteristic curve is
nearly linear and appears rather flat.

2. When discrimination is greater than moderate, the item characteristic curve is S-shaped
and rather steep in its middle section.

3. When the item difficulty is less than medium, most of the item characteristic curve has a
probability of correct response that is greater than 0.5.

4. When the item difficulty is greater than medium, most of the item characteristic curve
has a probability of correct response less than 0.5.

5. Regardless of the level of discrimination, item difficulty locates the item along the ability
scale. Therefore, item difficulty and discrimination are independent of each other.

6. When an item has no discrimination, all choices of difficulty yield the same horizontal line
at a value of P(θ) = 0.5. Because the curve is flat, no point on the ability scale can be singled
out as the item’s location; that is, the item difficulty of an item with no discrimination is undefined.

7. If you have been very observant, you may have noticed that the point at which P(θ) = 0.5
corresponds to the item difficulty. When an item is easy, this value occurs at a low ability
level. When an item is hard, this value corresponds to a high ability level.

2 Item Characteristic Curve Models

In the first section of this paper, the properties of the item characteristic curve were defined in
terms of verbal descriptors. While this is useful to obtain an intuitive understanding of item
characteristic curves, it lacks the precision and rigor needed by a theory.

Consequently, in this section three mathematical models are introduced for the item
characteristic curve. These models provide a mathematical equation for the relation of the
probability of correct response to ability. Each model employs one or more parameters whose
numerical values define a particular item characteristic curve. Such mathematical models are
needed if one is to develop a measurement theory that can be rigorously defined and is
amenable to further growth. In addition, these models and their parameters provide a vehicle for
communicating information about an item’s technical properties.

For each of the three models, the mathematical equation will be used to compute the probability
of correct response at several ability levels. Then the graph of the corresponding item
characteristic curve will be shown. The goal of this section is to develop a sense of how the
numerical values of the item parameters for a given model relate to the shape of the item
characteristic curve.

2.1 The Logistic Function

Under item response theory, the standard mathematical model for the item characteristic curve
is the cumulative form of the logistic function. It defines a family of curves having the general
shape of the item characteristic curves shown in Section 1. The logistic function was first
derived in 1844 and has been widely used in the biological sciences to model the growth of
plants and animals from birth to maturity. It was first used as a model for the item characteristic
curve in the late 1950s and, because of its simplicity, has become the preferred model. The
equation for the two-parameter logistic model is given in equation 2-1 below.

$$P(\theta) = \frac{1}{1 + e^{-L}} = \frac{1}{1 + e^{-a(\theta - b)}} \qquad (2.1)$$

where:

e is the constant 2.718
b is the difficulty parameter
a is the discrimination parameter¹
L = a(θ - b) is the logistic deviate (logit) and
θ is the ability level.

The difficulty parameter, denoted by b, is defined as the point on the ability scale at which the
probability of correct response to the item is 0.5. The theoretical range of the values of this
parameter is -∞ ≤ b ≤ +∞. However, typical values have the range -3 ≤ b ≤ +3.

¹ In much of the item response literature, the parameter a is reported as a normal ogive model value that is then multiplied
by 1.70 to obtain the corresponding logistic value.

Due to the S shape of the item characteristic curve, the slope of the curve changes as a function
of the ability level and reaches a maximum value when the ability level equals the item’s
difficulty. Because of this, the discrimination parameter does not represent the general slope of
the item characteristic curve as was indicated in Section 1. The technical definition of the
discrimination parameter is beyond the level of this note. However, a usable definition is that
this parameter is proportional to the slope of the item characteristic curve at θ = b.

The actual slope at θ = b is a/4, but considering a to be the slope at b is an acceptable
approximation that makes interpretation of the parameter easier in practice. The theoretical
range of the values of this parameter is -∞ ≤ a ≤ +∞, but the usual range seen in practice is
-2.80 to +2.80.
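The a/4 figure follows from the derivative of equation 2-1, which is a·P(θ)[1 - P(θ)]. At θ = b, P(θ) = 0.5, so the slope there is a(0.5)(0.5) = a/4. The minimal Python sketch below checks this numerically with a central difference. The function names and the (a, b) pairs are illustrative assumptions and do not come from the text. It also shows that the slope half a unit away from b is smaller than the slope at b, which is consistent with the maximum occurring at θ = b.

```python
import math

def p_2pl(theta, a, b):
    """Two-parameter logistic item characteristic curve (equation 2-1)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def numerical_slope(theta, a, b, h=1e-5):
    """Central-difference approximation of dP/dtheta at the given ability level."""
    return (p_2pl(theta + h, a, b) - p_2pl(theta - h, a, b)) / (2.0 * h)

# Compare the slope at theta = b with a/4 for a few illustrative (a, b) pairs.
for a, b in [(0.5, 1.0), (1.3, 1.5), (2.0, -1.0)]:
    at_b = numerical_slope(b, a, b)
    off_b = numerical_slope(b + 0.5, a, b)   # half a unit above b
    print(f"a={a}, b={b}: slope at b = {at_b:.4f}, a/4 = {a / 4:.4f}, slope at b+0.5 = {off_b:.4f}")
```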

2.1.1 Example

To illustrate how the two-parameter model is used to compute the points on an item
characteristic curve, consider the following example.

The values of the item parameters are:

b = 1.0 is the item difficulty.

a = .5 is the item discrimination.

The illustrative computation is performed at the ability level θ = -3.0.

The first term to be computed is the logistic deviate (logit), L, where:

L = a (θ - b).

Substituting the appropriate values yields:

L = .5 (-3.0 - 1.0) = -2.0.

The next term computed is e (2.718) raised to the power -L.

Substituting yields:

EXP(-L) = EXP(2.0) = 7.389, where EXP(x) denotes e raised to the power x.

Now the denominator of equation 2-1 can be computed as:

1 + EXP (-L) = 1 + 7.389 = 8.389.

Finally, the value of P(θ) is:

P(θ) = 1/(1 + EXP(-L)) = 1/8.389 = 0.12

Thus, at an ability level (θ) of -3.0, the probability of responding correctly to this item is 0.12.

From the above, it can be seen that computing the probability of correct response at a given
ability level is very easy using the logistic model.
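The hand calculation above can be written as a short function that returns the same intermediate terms (L, EXP(-L), 1 + EXP(-L)) and P(θ). It is a minimal Python sketch, and the function name `two_pl_row` is hypothetical. Looping it over θ = -3, ..., +3 with b = 1.0 and a = 0.5 should reproduce the rows of Table 2-1 to the rounding shown there.

```python
import math

def two_pl_row(theta, a, b):
    """One row of the hand calculation: theta, L, EXP(-L), 1 + EXP(-L), P(theta)."""
    L = a * (theta - b)          # logistic deviate (logit)
    exp_neg_L = math.exp(-L)     # EXP(-L)
    denom = 1.0 + exp_neg_L      # denominator of equation 2-1
    return theta, L, exp_neg_L, denom, 1.0 / denom

a, b = 0.5, 1.0                  # item parameters from the example
for theta in range(-3, 4):
    t, L, e, d, p = two_pl_row(theta, a, b)
    print(f"{t:>3} {L:>5.1f} {e:>7.3f} {d:>7.3f} {p:>5.2f}")
```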

Table 2-1 shows the calculations for this item at seven ability levels evenly spaced over the
range of abilities from -3 to +3.

Two-Parameter Model

Ability (θ)   Logit (L)   EXP(-L)   1 + EXP(-L)   P(θ)
-3            -2.0        7.389     8.389         0.12
-2            -1.5        4.482     5.482         0.18
-1            -1.0        2.718     3.718         0.27
 0            -0.5        1.649     2.649         0.38
 1             0.0        1.000     2.000         0.50
 2             0.5        0.607     1.607         0.62
 3             1.0        0.368     1.368         0.73

Table 2-1. Item characteristic curve calculations under a two-parameter model, b = 1.0, a = 0.5

The item characteristic curve for the item of Table 2-1 is shown below. The vertical arrow
corresponds to the value of the item difficulty.

FIGURE 2-1. Item characteristic curve for a two-parameter model with b = 1.0, a = 0.5

2.2 The Rasch, or One-Parameter, Logistic Model

The next model of interest was first published by the Danish mathematician Georg Rasch in the
1960s. Rasch approached the analysis of test data from a probability theory point of view.
Although he started from a very different frame of reference, the resultant item characteristic
curve model was a logistic model.

Under this model, the discrimination parameter of the two-parameter logistic model is fixed at a
value of a = 1.0 for all items; only the difficulty parameter can take on different values.
Because of this, the Rasch model is often referred to as the one-parameter logistic model.

The equation for the Rasch model is given by the following:

$$P(\theta) = \frac{1}{1 + e^{-1(\theta - b)}} \qquad (2.2)$$

where: b is the difficulty parameter and θ is the ability level.

It should be noted that a discrimination parameter was used in equation 2-2, but because it
always has a value of 1.0, it usually is not shown in the formula.
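The Rasch model is the two-parameter model with the discrimination fixed at 1.0, so a sketch of it needs only b. The minimal Python example below makes the fixed a = 1.0 explicit inside the function, and the name `p_rasch` is hypothetical. With b = 1.0 and θ = -3, ..., +3, the rounded probabilities should agree with the P column of Table 2-2 in the computational example.

```python
import math

def p_rasch(theta, b):
    """Rasch (one-parameter logistic) model, equation 2-2."""
    a = 1.0                      # discrimination fixed at 1.0 for every item
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

b = 1.0
probs = [round(p_rasch(theta, b), 2) for theta in range(-3, 4)]
print(probs)  # [0.02, 0.05, 0.12, 0.27, 0.5, 0.73, 0.88]
```

Because every item shares the same a, Rasch items differ only in where their curves sit along the ability scale.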

2.2.1 Computational Example

Again the illustrative computations for the model will be done for the single ability level -3.0.
The value of the item difficulty parameter is:

b = 1.0.

The first term computed is the logit, L, where:

L = a (θ - b)

Substituting the appropriate values yields:

L = 1.0 (-3.0 - 1.0) = -4.0

Next, the exponential term EXP(-L) is computed, giving:

EXP(-L) = EXP(4.0) = 54.598

The denominator of equation 2-2 can be computed as:

1 + EXP(-L) = 1.0 + 54.598 = 55.598

Finally, the value of P(θ) is:

P(θ) = 1/(1 + EXP(-L)) = 1/55.598 = 0.02

Thus, at an ability level of -3.0, the probability of responding correctly to this item is 0.02. Table
2-2 shows the calculations for seven ability levels.

The item characteristic curve corresponding to the item in Table 2-2 is shown in Figure 2-2.

One-Parameter Model

Ability (θ)   Logit (L)   EXP(-L)   1 + EXP(-L)   P(θ)
-3            -4.0        54.598    55.598        0.02
-2            -3.0        20.086    21.086        0.05
-1            -2.0         7.389     8.389        0.12
 0            -1.0         2.718     3.718        0.27
 1             0.0         1.000     2.000        0.50
 2             1.0         0.368     1.368        0.73
 3             2.0         0.135     1.135        0.88

Table 2-2. Calculations for the one-parameter model, b = 1.0

FIGURE 2-2. Item characteristic curve for a one-parameter model with b = 1.0

2.3 The Three-Parameter Model

One of the facts of life in testing is that examinees will get items correct by guessing. Thus, the
probability of correct response includes a small component that is due to guessing. Neither of
the two previous item characteristic curve models took the guessing phenomenon into
consideration. Birnbaum (1968) modified the two-parameter logistic model to include a
parameter that represents the contribution of guessing to the probability of correct response.

Unfortunately, in so doing, some of the nice mathematical properties of the logistic function were
lost. Nevertheless, the resulting model has become known as the three-parameter logistic
model, even though it technically is no longer a logistic model. The equation for the three-
parameter model is:

$$P(\theta) = c + (1 - c)\,\frac{1}{1 + e^{-a(\theta - b)}} \qquad (2.3)$$

where:

b is the difficulty parameter
a is the discrimination parameter
c is the guessing parameter and
θ is the ability level.

The parameter c is the probability of getting the item correct by guessing alone. It is important to
note that by definition, the value of c does not vary as a function of the ability level. Thus, the
lowest and highest ability examinees have the same probability of getting the item correct by
guessing. The parameter c has a theoretical range of 0 ≤ c ≤ 1.0, but in practice, values above
0.35 are not considered acceptable.

A side effect of using the guessing parameter c is that the definition of the difficulty parameter is
changed. Under the previous two models, b was the point on the ability scale at which the
probability of correct response was 0.5. But now, the lower limit of the item characteristic curve
is the value of c rather than zero. The result is that the item difficulty parameter is the point on
the ability scale where:

$$P(\theta) = c + (1 - c)(0.5) = \frac{1 + c}{2}$$

This probability is halfway between the value of c and 1.0. What has happened here is that the
parameter c has defined a floor to the lowest value of the probability of correct response. Thus,
the difficulty parameter defines the point on the ability scale where the probability of correct
response is halfway between this floor and 1.0.

The discrimination parameter a can still be interpreted as being proportional to the slope of the
item characteristic curve at the point θ = b. However, under the three-parameter model, the
slope of the item characteristic curve at θ = b is actually a(1 - c)/4.

While these changes in the definitions of parameters b and a seem slight, they are important
when interpreting the results of test analyses.
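Both redefinitions can be checked numerically. Under equation 2-3, P(b) should equal (1 + c)/2, and the slope at θ = b should equal a(1 - c)/4. The minimal Python sketch below uses the item parameters of the computational example that follows (b = 1.5, a = 1.3, c = .2). The function names are hypothetical, and the slope is approximated by a central difference rather than derived symbolically.

```python
import math

def p_3pl(theta, a, b, c):
    """Three-parameter model, equation 2-3."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

def numerical_slope(theta, a, b, c, h=1e-5):
    """Central-difference approximation of dP/dtheta under the three-parameter model."""
    return (p_3pl(theta + h, a, b, c) - p_3pl(theta - h, a, b, c)) / (2.0 * h)

a, b, c = 1.3, 1.5, 0.2
print(f"P(b) = {p_3pl(b, a, b, c):.4f}, (1 + c)/2 = {(1 + c) / 2:.4f}")
print(f"slope at b = {numerical_slope(b, a, b, c):.4f}, a(1 - c)/4 = {a * (1 - c) / 4:.4f}")
```

Setting c = 0 in either check recovers the two-parameter values, 0.5 and a/4.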

2.3.1 Computational Example

The probability of correct response to an item under the three-parameter model will be shown
for the following item parameter values:

b = 1.5, a = 1.3, c =.2

at an ability level of θ = -3.0.

The logit is:

L = a (θ - b) = 1.3 (-3.0 - 1.5) = -5.85

The exponential term is:

EXP(-L) = EXP(5.85) = 347.234

The next term of interest is:

1 + EXP(-L) = 1.0 + 347.234 = 348.234

and then,

1/(1 + EXP(-L)) = 1/348.234 = .0029

Up to this point, the computations are exactly the same as those for a two-parameter model with
b = 1.5 and a = 1.3. But now the guessing parameter enters the picture. From equation 2-3
we have:

P(θ ) = c + (1 - c) (0.0029) and,

c = 0.2 so that:
P(θ ) = 0.2 + (1.0 - 0.2) (0.0029)
= 0.2 + (0.80)(0.0029)
= 0.2 + (0.0023)
= 0.2023

Thus, at an ability level of -3.0, the probability of responding correctly to this item is 0.2023.
Table 2-3 shows the calculations at seven ability levels.
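The arithmetic above extends to every row of the three-parameter table. The only change from the two-parameter computation is the final step, c + (1 - c)/(1 + EXP(-L)). The minimal Python sketch below uses a hypothetical function name, `three_pl_row`. With b = 1.5, a = 1.3 and c = .2, its P column should match Table 2-3 after rounding to two decimals.

```python
import math

def three_pl_row(theta, a, b, c):
    """One row of the hand calculation under equation 2-3."""
    L = a * (theta - b)            # logit, exactly as in the two-parameter model
    exp_neg_L = math.exp(-L)       # EXP(-L)
    denom = 1.0 + exp_neg_L        # 1 + EXP(-L)
    p = c + (1.0 - c) / denom      # the guessing parameter sets the floor of the curve
    return theta, L, exp_neg_L, denom, p

a, b, c = 1.3, 1.5, 0.2
for theta in range(-3, 4):
    t, L, e, d, p = three_pl_row(theta, a, b, c)
    print(f"{t:>3} {L:>6.2f} {e:>8.3f} {d:>8.3f} {p:>5.2f}")
```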

Three-Parameter Model

Ability (θ)   Logit (L)   EXP(-L)   1 + EXP(-L)   P(θ)
-3            -5.85       347.234   348.234       0.20
-2            -4.55        94.632    95.632       0.21
-1            -3.25        25.790    26.790       0.23
 0            -1.95         7.029     8.029       0.30
 1            -0.65         1.916     2.916       0.47
 2             0.65         0.522     1.522       0.73
 3             1.95         0.142     1.142       0.90

Table 2-3. Calculations for the three-parameter model, b = 1.5, a = 1.3, c = .2

FIGURE 2-3. Item characteristic curve for a three-parameter model with b = 1.5, a = 1.3, c = .2

2.4 Negative Discrimination

While most test items will discriminate in a positive manner (i.e., the probability of correct
response increases as the ability level increases), some items have negative discrimination. In
such items, the probability of correct response decreases as the ability level increases from low
to high. Figure 2-4 depicts such an item.

FIGURE 2-4. An item with negative discrimination under a two-parameter model with b = 0, a = -.75

Items with negative discrimination occur in two ways. First, the incorrect response to a two-
choice item will always have a negative discrimination parameter if the correct response has a
positive value. Second, sometimes the correct response to an item will yield a negative
discrimination index.

This tells you that something is wrong with the item: Either it is poorly written or there is some
misinformation prevalent among the high-ability students. In any case, it is a warning that the
item needs some attention. For most of the item response theory topics of interest, the value of
the discrimination parameter will be positive. Figure 2-5 shows the item characteristic curves for
the correct and incorrect responses to a binary item.

FIGURE 2-5. Item characteristic curves for the correct (b = 1.0, a = .9) and incorrect responses (b = 1.0,
a = -.9) to a binary item

It should be noted that the two item characteristic curves have the same value for the difficulty
parameter (b = 1.0) and the discrimination parameters have the same absolute value.
However, they have opposite signs, with the correct response being positive and the incorrect
response being negative.
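The mirror-image relationship in Figure 2-5 can be checked numerically. Under the two-parameter model, 1/(1 + e^(-x)) + 1/(1 + e^(x)) = 1. So for a binary item, the curves with a = .9 and a = -.9 and the same b sum to 1 at every ability level, and they cross at θ = b, where both equal 0.5. The minimal Python sketch below uses a hypothetical function name, `p_2pl`. It assumes no guessing parameter.

```python
import math

def p_2pl(theta, a, b):
    """Two-parameter logistic model, equation 2-1 (a may be negative)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

b = 1.0
for theta in [-3.0, -1.0, 0.0, 1.0, 2.0, 3.0]:
    p_correct = p_2pl(theta, 0.9, b)     # correct response, a = .9
    p_incorrect = p_2pl(theta, -0.9, b)  # incorrect response, a = -.9
    print(f"{theta:+.1f}  correct {p_correct:.3f}  incorrect {p_incorrect:.3f}  "
          f"sum {p_correct + p_incorrect:.3f}")
```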

2.5 Guidelines for Interpreting Item Parameter Values

In Section 1, verbal labels were used to describe the technical properties of an item
characteristic curve. Now the curves can be described via parameters whose numerical values
have intrinsic meaning. However, one needs some means of interpreting the numerical values
of the item parameters and conveying this interpretation to a non-technical audience. The verbal
labels used to describe an item’s discrimination can be related to ranges of values of the
parameter as follows:

Verbal Label   Range of Values
None           0
Very low       0.01 – 0.34
Low            0.35 – 0.64
Moderate       0.65 – 1.34
High           1.35 – 1.69
Very high      > 1.70
Perfect        +∞

Table 2-4. Labels for item discrimination parameter values

These relations hold when one interprets the values of the discrimination parameter under a
logistic model for the item characteristic curve. If one wants to interpret the discrimination
parameter under a normal ogive model, divide these values by 1.7.
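The discrimination labels can be applied mechanically with a small lookup function. The minimal Python sketch below rests on several assumptions of ours that are not stated in the text. The function name `discrimination_label` is hypothetical. A value that falls between two of the listed two-decimal ranges (for example 0.345) is assigned to the lower band, and a value of exactly 1.70 is treated as "very high". A normal ogive value is first multiplied by 1.7, which is equivalent to dividing the table's boundaries by 1.7 as the text describes. Negative values are rejected because the table does not cover them.

```python
import math

def discrimination_label(a, metric="logistic"):
    """Return the Table 2-4 verbal label for a discrimination value.

    Assumption: values between the listed two-decimal ranges go to the lower band.
    """
    if metric == "normal":
        a = a * 1.7                      # put a normal ogive value on the logistic metric
    elif metric != "logistic":
        raise ValueError("metric must be 'logistic' or 'normal'")
    if a < 0:
        raise ValueError("Table 2-4 covers non-negative discrimination values only")
    if a == 0:
        return "none"
    if math.isinf(a):
        return "perfect"
    if a < 0.35:
        return "very low"
    if a < 0.65:
        return "low"
    if a < 1.35:
        return "moderate"
    if a < 1.70:
        return "high"
    return "very high"

print(discrimination_label(0.5), discrimination_label(1.3), discrimination_label(0.5, metric="normal"))
```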

Establishing an equivalent table for the values of the item difficulty parameter poses some
problems. The terms easy and hard used in Section 1 are relative terms that depend upon some
frame of reference. A drawback of item difficulty as defined under classical test theory is that it
is defined relative to a group of examinees. Thus, the same
item could be easy for one group and hard for another group. Under item response theory, an
item’s difficulty is a point on the ability scale where the probability of correct response is 0.5 for
one- and two-parameter models and (1 + c)/2 for a three-parameter model.

Because of this, the verbal labels used in Section 1 have meaning only with respect to the
midpoint of the ability scale. The proper way to interpret a numerical value of the item difficulty
parameter is in terms of where the item functions on the ability scale. The discrimination
parameter can be used to add meaning to this interpretation. The slope of the item
characteristic curve is at a maximum at an ability level corresponding to the item difficulty. Thus,
the item is doing its best in distinguishing between examinees in the neighborhood of this ability
level.

Thus, one can speak of the item functioning at this ability level. For example, an item whose
difficulty is -1 functions among the lower ability examinees. A value of +1 denotes an item that
functions among higher ability examinees. Again, the underlying concept is that the item
difficulty is a location parameter.

Under a three-parameter model, the numerical value of the guessing parameter c is interpreted
directly since it is a probability. For example, c = 0.12 simply means that at all ability levels, the
probability of getting the item correct by guessing alone is 0.12.

2.6 Features of item characteristic models

1. Under the one-parameter model, the slope is always the same; only the location of the
item changes.

2. Under the two- and three-parameter models, the value of a must become quite large
(>1.7) before the curve is very steep.

3. Under Rasch and two-parameter models, a large positive value of b results in a lower
tail of the curve that approaches zero. But under the three-parameter model, the lower
tail approaches the value of c.

4. Under a three-parameter model, the value of c is not apparent when b < 0 and a < 1.0.
However, if a wider range of values of ability were used, the lower tail would approach
the value of c.

5. Under all models, curves with a negative value of a are the mirror image of curves with
the same values of the remaining parameters and a positive value of a.

6. When b = -3.0, only the upper half of the item characteristic curve appears on the
graph. When b = +3.0, only the lower half of the curve appears on the graph.

7. The slope of the item characteristic curve is the steepest at the ability level
corresponding to the item difficulty. Thus, the difficulty parameter b locates the point on
the ability scale where the item functions best.

8. Under the Rasch and two-parameter models, the item difficulty defines the point on the
ability scale where the probability of correct response for persons of that ability is 0.5.
Under a three-parameter model, the item difficulty defines the point on the ability scale
where the probability of correct response is halfway between the value of the parameter
c and 1.0. Only when c = 0 are these two definitions equivalent.
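Features 3 and 4 can be made concrete by evaluating the lower tail over a wider ability range than the -3 to +3 used in the figures. The minimal Python sketch below uses a hypothetical item with b = -1.0, a = 0.5 and c = 0.2, chosen to satisfy the b < 0, a < 1.0 condition of feature 4. It compares that item with the same item at c = 0. The function name is also hypothetical. At θ = -3 the three-parameter probability is still well above c. Further out, it approaches c, while the two-parameter version approaches zero.

```python
import math

def p_3pl(theta, a, b, c):
    """Three-parameter model (equation 2-3); c = 0 gives the two-parameter model."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

# Hypothetical item with b < 0 and a < 1.0 (feature 4).
a, b, c = 0.5, -1.0, 0.2
for theta in [-3.0, -6.0, -12.0, -24.0]:
    with_guessing = p_3pl(theta, a, b, c)
    without_guessing = p_3pl(theta, a, b, 0.0)
    print(f"theta={theta:6.1f}  3PL: {with_guessing:.4f}  2PL (c=0): {without_guessing:.4f}")
```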

3 Estimating Item Parameters

Because the actual values of the parameters of the items in a test are unknown, one of the
tasks performed when a test is analyzed under item response theory is to estimate these
parameters. The obtained item parameter estimates then provide information as to the technical
properties of the test items.

To keep matters simple in the following presentation, the parameters of a single item will be
estimated under the assumption that the examinees' ability scores are known. In reality, these
scores are not known, but it is easier to explain how item parameter estimation is accomplished
if this assumption is made.

In the case of a typical test, a sample of M examinees responds to the N items in the test. The
ability scores of these examinees will be distributed over a range of ability levels on the ability
scale. For present purposes, these examinees will be divided into, say, J groups along the scale
so that all the examinees within a given group have the same ability level θ_j, and there will be
m_j examinees within group j, where j = 1, 2, 3, ..., J.

Within a particular ability score group, r_j examinees answer the given item correctly. Thus, at an
ability level of θ_j, the observed proportion of correct response is p(θ_j) = r_j/m_j, which is an
estimate of the probability of correct response at that ability level. Now the value of r_j can be
obtained and p(θ_j) computed for each of the J ability levels established along the ability scale.
If the observed proportions of correct response in each ability group are plotted, the result will
be something like that shown in Figure 3-1.
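The grouping step can be sketched in a few lines. For each distinct ability level θ_j, count the m_j examinees in the group and the r_j who answered correctly, then form p(θ_j) = r_j/m_j. The minimal Python sketch below keeps the text's simplifying assumption that ability scores are known. The twelve examinees, their scores and the function name `observed_proportions` are all hypothetical.

```python
from collections import defaultdict

def observed_proportions(abilities, responses):
    """Return {theta_j: r_j / m_j} for each distinct ability level theta_j."""
    if len(abilities) != len(responses):
        raise ValueError("abilities and responses must have the same length")
    tallies = defaultdict(lambda: [0, 0])   # theta_j -> [r_j, m_j]
    for theta, u in zip(abilities, responses):
        tallies[theta][0] += u              # r_j: number answering the item correctly
        tallies[theta][1] += 1              # m_j: number of examinees in group j
    return {theta: r / m for theta, (r, m) in sorted(tallies.items())}

# Hypothetical data: 12 examinees in J = 3 ability groups, 0/1 scores on one item.
abilities = [-1.0] * 4 + [0.0] * 4 + [1.0] * 4
responses = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1, 0, 1]
print(observed_proportions(abilities, responses))  # {-1.0: 0.25, 0.0: 0.5, 1.0: 0.75}
```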
