Item Difficulty and Item Discrimination


Presented by Bhagavathi Shree B V

Item Difficulty
Item difficulty is a psychometric property that
measures how easy or difficult an item is for
respondents to answer correctly.
The formula used to measure item difficulty is quite
straightforward.
It involves counting how many students answered
an item correctly and dividing that count by the
number of students who attempted the item.
P = N_correct / N

N_correct (number of students who answered
the item correctly)
N (total number of students who answered
the item)
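The calculation above can be sketched in a few lines of Python. This is an illustration of the formula, not from the original slides; the function name and sample data are made up.

```python
# Classical item difficulty (P value) for a dichotomous item,
# scored 1 = correct, 0 = incorrect.

def item_difficulty(scores):
    """Proportion of respondents who answered the item correctly."""
    return sum(scores) / len(scores)

# 10 students attempted the item; 7 answered correctly.
responses = [1, 1, 0, 1, 1, 0, 1, 1, 0, 1]
print(item_difficulty(responses))  # 0.7
```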
Classical Item Difficulty (P Value):
In classical test theory (CTT), item difficulty is often
represented by the P value. This value indicates the
probability that examinees will answer the item
correctly.
To calculate classical item difficulty for dichotomous
items (items with two response options, like true/false
or yes/no):
Count the number of examinees who responded
correctly (or in the keyed direction).
Divide this count by the total number of
respondents.
The resulting proportion lies between 0 and 1.
Higher values indicate easier items, while lower
values indicate more difficult items.
To calculate item difficulty for polytomous items
(items with more than two response options):
Calculate the mean response value.
For example, on a 5-point Likert item where half
the respondents choose 4 and half choose 5,
the average is 4.5.
This is mathematically equivalent to the P value
when the points are 0 and 1 for a no/yes item.
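A minimal sketch of the polytomous case (the function name and responses are illustrative assumptions, not from the slides):

```python
# Item "difficulty" for a polytomous item is the mean response,
# illustrated with a 5-point Likert item.

def mean_response(responses):
    return sum(responses) / len(responses)

likert = [4, 5, 4, 5]          # half choose 4, half choose 5
print(mean_response(likert))   # 4.5

# With 0/1 scoring this reduces to the classical P value:
print(mean_response([1, 1, 0, 1]))  # 0.75
```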
The major reason for measuring item
difficulty is to select items of a suitable
difficulty level.
Most standardized ability tests are
designed to assess as accurately as
possible each individual's level of
attainment in the particular ability.
The closer the difficulty of an item
approaches 1.00 or 0, the less differential
information about test takers it
contributes. Conversely, the closer the
difficulty level approaches .50, the more
sharply the item can differentiate
among them.
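One way to see why mid-range difficulty is most informative: for a dichotomous item the score variance is p * (1 - p), which is zero when everyone answers the same way (p = 0 or 1) and largest at p = 0.5. This framing in terms of variance is an addition for illustration, not part of the original slides.

```python
# Item score variance p * (1 - p) peaks at p = 0.5, where the item
# splits test takers most evenly and so differentiates them best.

for p in [0.0, 0.25, 0.5, 0.75, 1.0]:
    print(f"p = {p:.2f}  variance = {p * (1 - p):.4f}")
```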
Purpose of Assessing Item Difficulty:

Improving Items: Item analysis helps
identify items that need improvement. By
analyzing student responses, educators
can refine ambiguous or misleading
items for future tests.
Test Construction: It enhances
instructors’ skills in test construction by
pinpointing areas that require greater
emphasis or clarity.
Content Emphasis: Item analysis reveals
specific areas of course content that may
need more attention.
Item Discrimination
Item discrimination measures the extent to which an
individual test item differentiates between participants
who possess high or low levels of the construct being
measured.
Essentially, it tells us how well an item can distinguish
between individuals with varying levels of the trait or skill
being assessed.
Discrimination Index

The discrimination index (DI) measures how discriminating the items in an
exam are, i.e. how well an item can differentiate between high-scoring
candidates and low-scoring candidates.
For each item, it is a measure based on a comparison of performance
between stronger and weaker candidates on the exam as a whole. The
discrimination index for an item ranges from -1 to +1, with values
above 0.2 generally indicating that the item discriminates
positively.
D.I. = 2 x (H - L) / N

D.I. = discrimination index
H = number of correct answers in the high group
L = number of correct answers in the low group
N = total number of students in both groups
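The discrimination index formula can be sketched as follows. The function name and the 20-student example are illustrative assumptions; the formula itself is the one given in the slides.

```python
# Discrimination index: D.I. = 2 * (H - L) / N, computed from
# upper and lower scoring groups of equal size.

def discrimination_index(high_correct, low_correct, n_total):
    """high_correct: correct answers in the high-scoring group;
    low_correct: correct answers in the low-scoring group;
    n_total: number of students in both groups combined."""
    return 2 * (high_correct - low_correct) / n_total

# 20 students split into two groups of 10: 9 of the high group
# and 4 of the low group answered the item correctly.
print(discrimination_index(9, 4, 20))  # 0.5
```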
The discrimination index ranges
from -1 to +1:
Values close to +1 indicate
that the item effectively
discriminates between high
and low performers.
Values near zero suggest
poor discrimination.
Values near -1 indicate that
the item tends to be
answered correctly by low-
performing individuals and
incorrectly by high-
performing individuals.
Importance:
High item discrimination is desirable
because it indicates that the item
effectively separates individuals based on
their abilities or attributes.
Items with good discrimination contribute
significantly to the overall test
performance and help differentiate
between high- and low-performing
examinees.
Conversely, items with poor
discrimination may not effectively
separate different ability
levels, undermining the test's validity and
reliability.
Evaluation
Classical Test Theory (CTT):

CTT provides a straightforward approach to
item analysis.
Item discrimination in CTT is assessed using
the point-biserial correlation coefficient.
The point-biserial correlation (r-pbis) is a
measure of the discriminating, or
differentiating, strength of the item. It
ranges from -1.0 to 1.0 and is the correlation
between item scores and total raw scores.
This coefficient measures the relationship
between an item's correct responses and
the total test score. Positive correlations
indicate good discrimination.
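Since r-pbis is just a Pearson correlation in which one variable is 0/1, it can be sketched from first principles. The function name and the eight-student data set below are illustrative assumptions, not from the slides.

```python
# Point-biserial correlation between a dichotomous item score and
# the total test score: a Pearson correlation where x is 0/1.

import math

def point_biserial(item, total):
    n = len(item)
    mx = sum(item) / n
    my = sum(total) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(item, total)) / n
    sx = math.sqrt(sum((x - mx) ** 2 for x in item) / n)
    sy = math.sqrt(sum((y - my) ** 2 for y in total) / n)
    return cov / (sx * sy)

# Students who got the item right also scored high overall,
# so the correlation comes out strongly positive.
item_scores  = [1, 1, 1, 0, 0, 1, 0, 0]
total_scores = [18, 16, 15, 9, 8, 14, 10, 7]
print(round(point_biserial(item_scores, total_scores), 2))
```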
Item Response Theory (IRT):

IRT evaluates item discrimination
using the slope of the item response
function, represented by the
a-parameter.
In IRT, values above 0.80 are
generally considered good
discrimination, while values below
0.80 indicate less effective
discrimination.
IRT also considers the evaluation of
different answers (distractors) for
polytomous items (items with more
than two response options).
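The role of the a-parameter can be sketched with the standard two-parameter logistic (2PL) item response function. The logistic form below is the conventional 2PL model, an assumption added here for illustration; the slides do not specify a model.

```python
# 2PL item response function: P(theta) = 1 / (1 + exp(-a * (theta - b))),
# where a is the discrimination (slope) and b the difficulty.

import math

def irf(theta, a, b):
    return 1 / (1 + math.exp(-a * (theta - b)))

# A steeper slope (larger a) separates examinees near b more sharply.
for theta in [-1.0, 0.0, 1.0]:
    print(theta, round(irf(theta, a=1.5, b=0.0), 3))
```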
Implementation
To perform item analysis, specialized
software designed for this purpose is
recommended.
Online assessment platforms often provide
item analysis output, including distractor
statistics.
Standalone software tools like Iteman (for
CTT) and Xcalibre (for IRT) offer more
advanced capabilities for professionals.
Thank you very much!