
Course Code : Z1870

Course Name : Psychometrics

Classical Test Theory and Reliability

Session 7 - 9
TODAY’S OUTLINE

• Basic concepts of reliability
• Different types of reliability
• Reliability and the standard error of measurement
Learning Outcomes
• Describe what measurement is and its various aspects in psychological research
• Explain the theory and practice of psychological testing and the important aspects of psychometry
• Analyse test or scale items critically and make comparisons of test or scale items
References
Raykov, Tenko & Marcoulides, George A. 2011. Introduction to Psychometric Theory. Routledge, New York. ISBN 0-203-84162-X.
Crocker, Linda & Algina, James. 2008. Introduction to Classical and Modern Test Theory. Cengage Learning, Ohio. ISBN 978-0-495-39591-1.
Classical Test Theory
The Nature of Measurement Error

Measurement error comes in two forms: systematic error and random error.
Foundations of Classical Test Theory

1. The Classical Test Theory equation → X = T + E
2. A definition of true score → T = E(X), the expected value of the observed score over repeated independent testing
3. The relationship between true score and error score → Corr(T, E) = 0
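These three statements can be checked numerically. The sketch below simulates true scores and errors with made-up variances and verifies that when Corr(T, E) = 0, the observed-score variance decomposes as Var(X) = Var(T) + Var(E):

```python
import random
import statistics

# Simulate X = T + E with assumed, illustrative variances.
rng = random.Random(1)
n = 10_000
T = [rng.gauss(50, 10) for _ in range(n)]   # true scores, Var(T) = 100
E = [rng.gauss(0, 4) for _ in range(n)]     # random errors, Var(E) = 16
X = [t + e for t, e in zip(T, E)]           # observed scores

mt, me = statistics.mean(T), statistics.mean(E)
cov_te = sum((t - mt) * (e - me) for t, e in zip(T, E)) / n

var_x = statistics.pvariance(X)
var_t = statistics.pvariance(T)
var_e = statistics.pvariance(E)
# Because T and E are uncorrelated, Var(X) = Var(T) + Var(E) up to sampling noise.
print(round(cov_te, 2), round(var_x, 1), round(var_t + var_e, 1))
```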
Models Based on Classical Test Theory

1. Model of Parallel Tests

Based on the CTT equation Xk = Tk + Ek (k = 1, …, p), the model of parallel tests asserts that all p tests (measures) X1 through Xp share the same true score; thereby, all differences among their observed scores come from differences in their error scores, which are also assumed to be equally varied, i.e., to have the same variance. That is, the model of parallel tests (MPT) assumes that Xk = T + Ek and Var(Ek) = Var(Ek′) for all pairs k ≠ k′.
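A quick simulation (with assumed, illustrative true-score and error variances) shows the defining property of parallel forms: the correlation between two parallel tests recovers the reliability Var(T)/Var(X):

```python
import random
import statistics

# Two parallel forms: same true score T, independent errors with equal variance.
rng = random.Random(7)
n = 20_000
true_sd, error_sd = 10.0, 5.0
T = [rng.gauss(50, true_sd) for _ in range(n)]
X1 = [t + rng.gauss(0, error_sd) for t in T]
X2 = [t + rng.gauss(0, error_sd) for t in T]

def corr(a, b):
    ma, mb = statistics.mean(a), statistics.mean(b)
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = (sum((x - ma) ** 2 for x in a)
           * sum((y - mb) ** 2 for y in b)) ** 0.5
    return num / den

# For parallel tests, Corr(X1, X2) equals the reliability Var(T)/Var(X).
r12 = corr(X1, X2)
print(round(r12, 2), true_sd**2 / (true_sd**2 + error_sd**2))  # theory: 0.8
```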
2. Model of True Score Equivalent Tests (Tau-Equivalent Tests)

The assumptions made in the MPT, namely that the p tests measure the same true score with the same precision (error variances), are often quite strong and restrictive in behavioural and social research. The MPT requires not only that the same true score be measured in the same units of measurement by the given set of tests but also that all tests be equally error prone (i.e., have identical error variances). Both of these assumptions substantially limit the range of applicability of the MPT in empirical work, making it rather difficult for substantive scholars to construct measures that fulfil it. It is therefore desirable to have available alternatives that do not make these strong assumptions.

The tau-equivalent model is such an alternative: it retains the assumption Xk = T + Ek, so all tests measure the same true score in the same units, but it allows the error variances Var(Ek) to differ across tests.
3. Model of Congeneric Tests

The assumption of the same true score being evaluated with the same units of measurement by a set of given tests, which underlies both the MPT and the TSEM, is rather strong and restrictive in the behavioural and social sciences. It requires not only that the same latent continuum be evaluated by all tests (measures) but also that they all be based on the same units of measurement.

In behavioural and social research, however, units of measurement often lack particular meaning, are arbitrary or perhaps even irrelevant, and are specific to different tests and therefore likely to be distinct across tests. This assumption of equal measurement units is fairly strong, because it requires what is in general far too much from arbitrary units that typically cannot be meaningfully interpreted. The congeneric model relaxes it: each test may have its own origin and unit, Xk = dk + bk·T + Ek, so the tests are required only to measure the same latent continuum T.
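As a small numeric sketch (the intercepts, loadings, and error variances below are made-up illustrative values), congeneric measures can share one latent continuum while differing in origin, unit, and error variance:

```python
import random
import statistics

# Congeneric measures X_k = d_k + b_k*T + E_k: test-specific intercepts d_k,
# units b_k, and error variances (all values assumed for illustration).
rng = random.Random(3)
n = 20_000
T = [rng.gauss(0, 1) for _ in range(n)]
X1 = [10 + 2.0 * t + rng.gauss(0, 1.0) for t in T]   # d1 = 10, b1 = 2
X2 = [50 + 5.0 * t + rng.gauss(0, 2.0) for t in T]   # d2 = 50, b2 = 5

# Different units give different observed-score variances,
# yet both tests reflect the same latent continuum T.
v1 = statistics.pvariance(X1)   # theory: 2^2 * 1 + 1^2 = 5
v2 = statistics.pvariance(X2)   # theory: 5^2 * 1 + 2^2 = 29
print(round(v1, 1), round(v2, 1))
```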
Reliability
Basic concept of reliability

The reliability of a test is a criterion of test quality relating to the accuracy of psychological measurements. The higher the reliability of a test, the freer it is, relatively, of measurement errors.

The concept of reliability underlies the computation of the error of measurement of a single score, whereby we can predict the range of fluctuation likely to occur in a single individual's score as a result of irrelevant chance factors.
Different types of reliability

There are five methods to measure the reliability of a test:
a) test–retest method,
b) method of parallel forms,
c) split-half reliability,
d) method of rational equivalence, and
e) Cronbach's alpha.
Two-trial methods:
• Test–retest method
• Method of parallel forms

Single-trial methods:
• Split-half reliability
• Method of rational equivalence
• Cronbach's alpha
Test–Retest Method

The most frequently used method to find the reliability of a test is to repeat the same test on a second occasion. The reliability coefficient (r) in this case is the correlation between the scores obtained by the same persons on the two administrations of the test.


The formula used to find the test–retest reliability is the Pearson product–moment formula:

r = [N ΣXY − (ΣX)(ΣY)] / √{[N ΣX² − (ΣX)²][N ΣY² − (ΣY)²]}

where
N = total number of observations,
X = scores in the first data set,
Y = scores in the second data set.
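The raw-score Pearson formula can be computed directly. The helper below implements it; the two score lists are hypothetical first and second administrations:

```python
def pearson_r(x, y):
    """Pearson product-moment correlation via the raw-score formula:
    r = (N*SumXY - SumX*SumY) / sqrt((N*SumX2 - (SumX)^2)(N*SumY2 - (SumY)^2))"""
    n = len(x)
    sx, sy = sum(x), sum(y)
    num = n * sum(a * b for a, b in zip(x, y)) - sx * sy
    den = ((n * sum(a * a for a in x) - sx * sx)
           * (n * sum(b * b for b in y) - sy * sy)) ** 0.5
    return num / den

# Hypothetical scores of five persons on two administrations of the same test.
first = [12, 15, 11, 18, 14]
second = [13, 16, 10, 19, 15]
print(round(pearson_r(first, second), 3))
```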
Split-half Reliability

The advantage that this method has over the test–retest method is that only one testing session is needed. This technique is also better than the parallel-forms method because only one test is required.

A reliability coefficient of this type is called a coefficient of internal consistency.

In this method, the single testing is scored as two halves, so that variation brought about by differences between two testing situations is eliminated.

The two halves of the test can be made by counting the number of odd-numbered items answered correctly as one half and the number of even-numbered items answered correctly as the other half.
Person  1 2 3 4 5 6   Odd (1,3,5)  Σ   Even (2,4,6)  Σ
A       0 0 0 0 0 0   0 0 0        0   0 0 0         0
B       0 0 0 0 1 0   0 0 1        1   0 0 0         0
C       1 0 1 1 1 0   1 1 1        3   0 1 0         1
D       1 1 1 1 1 1   1 1 1        3   1 1 1         3
E       1 1 1 1 1 1   1 1 1        3   1 1 1         3
F       0 0 1 0 0 0   0 1 0        1   0 0 0         0
G       0 0 1 1 1 0   0 1 1        2   0 1 0         1
H       1 1 1 1 1 0   1 1 1        3   1 1 0         2
I       0 0 0 1 0 0   0 0 0        0   0 1 0         1
J       0 1 0 1 0 1   0 0 0        0   1 1 1         3

These halves are then correlated with the help of the Pearson product–moment formula.

Reliability depends upon test length. When we score the test as two halves, we in fact cut the length of the original test in half. The reliability we have calculated is therefore equivalent to that of a test half the size of our original test, so we apply a correction for test length to obtain the reliability of the full original test.

For this, the Spearman–Brown formula for doubling the length of the test is used:

r_full = 2 r_half / (1 + r_half)

where r_half is the correlation between the two half-tests.
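Putting the pieces together, here is a sketch of the split-half procedure on the 10-person, 6-item table shown earlier: odd items form one half, even items the other, and the Spearman–Brown correction is applied at the end:

```python
# Item responses (1 = correct) for persons A..J on items 1..6,
# copied from the worked table above.
data = {
    "A": [0, 0, 0, 0, 0, 0], "B": [0, 0, 0, 0, 1, 0],
    "C": [1, 0, 1, 1, 1, 0], "D": [1, 1, 1, 1, 1, 1],
    "E": [1, 1, 1, 1, 1, 1], "F": [0, 0, 1, 0, 0, 0],
    "G": [0, 0, 1, 1, 1, 0], "H": [1, 1, 1, 1, 1, 0],
    "I": [0, 0, 0, 1, 0, 0], "J": [0, 1, 0, 1, 0, 1],
}

def pearson_r(x, y):
    n = len(x)
    sx, sy = sum(x), sum(y)
    num = n * sum(a * b for a, b in zip(x, y)) - sx * sy
    den = ((n * sum(a * a for a in x) - sx * sx)
           * (n * sum(b * b for b in y) - sy * sy)) ** 0.5
    return num / den

odd = [sum(row[0::2]) for row in data.values()]    # half-scores on items 1, 3, 5
even = [sum(row[1::2]) for row in data.values()]   # half-scores on items 2, 4, 6

r_half = pearson_r(odd, even)          # reliability of a half-length test
r_full = 2 * r_half / (1 + r_half)     # Spearman-Brown doubling correction
print(round(r_half, 3), round(r_full, 3))
```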
Method of Rational Equivalence

The coefficient of internal consistency can also be obtained with the help of Kuder–Richardson formula number 20 (KR-20).

To compute reliability with KR-20, the following procedure is used:
The first column in the worksheet lists the item numbers.
The second column gives the difficulty value (p) of each item, obtained during item analysis.
The third column gives q, where q = 1 − p.
The fourth column gives (p)(q), the product of columns two and three.
The Kuder–Richardson formula number 20 is

KR-20 = [N / (N − 1)] [1 − (Σpq) / σt²]

where N is the number of items on the test, σt² is the variance of the total test scores, and Σpq is the sum of column four, the products (p)(q), across all items.
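The KR-20 computation can be sketched on the same 10-person, 6-item table used in the split-half example (population variance is assumed here for σt²):

```python
# Dichotomous item responses (1 = correct), persons A..J on items 1..6.
rows = [
    [0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 1, 0], [1, 0, 1, 1, 1, 0],
    [1, 1, 1, 1, 1, 1], [1, 1, 1, 1, 1, 1], [0, 0, 1, 0, 0, 0],
    [0, 0, 1, 1, 1, 0], [1, 1, 1, 1, 1, 0], [0, 0, 0, 1, 0, 0],
    [0, 1, 0, 1, 0, 1],
]
n_items = len(rows[0])
n_persons = len(rows)

# p = proportion answering each item correctly; q = 1 - p.
p = [sum(r[i] for r in rows) / n_persons for i in range(n_items)]
sum_pq = sum(pi * (1 - pi) for pi in p)

# Variance of the total test scores (population formula).
totals = [sum(r) for r in rows]
mean_t = sum(totals) / n_persons
var_t = sum((t - mean_t) ** 2 for t in totals) / n_persons

kr20 = (n_items / (n_items - 1)) * (1 - sum_pq / var_t)
print(round(kr20, 3))
```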
Cronbach Alpha

• The Kuder–Richardson formula is applicable for finding the internal consistency of tests whose items are scored as right or wrong, or according to some other all-or-none system.
• Some tests, however, have multiple-choice items.
• On a personality inventory, likewise, there are more than two response categories.

For such tests a generalised formula has been derived, known as coefficient alpha (Cronbach, 1951).

In this formula, the value of Σpq is replaced by Σσi², the sum of the variances of the item scores.

The procedure is to find the variance of all individuals' scores on each item and then to add these variances across all items.
• The formula is

α = [n / (n − 1)] [1 − (Σσi²) / σt²]

where
n = number of items in the test,
σt² = variance of the total test scores,
σi² = variance of the ith item,
Σσi² = sum of the item variances.
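A sketch of coefficient alpha on hypothetical 5-point Likert-type responses (the data are invented for illustration; population variances are used throughout):

```python
# Hypothetical Likert responses: 5 persons x 4 items.
rows = [
    [3, 4, 3, 4],
    [2, 2, 3, 2],
    [5, 4, 4, 5],
    [1, 2, 1, 2],
    [4, 5, 4, 4],
]
n_items = len(rows[0])

def pvar(xs):
    """Population variance of a list of scores."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

item_vars = [pvar([r[i] for r in rows]) for i in range(n_items)]  # sigma_i^2
total_var = pvar([sum(r) for r in rows])                          # sigma_t^2

alpha = (n_items / (n_items - 1)) * (1 - sum(item_vars) / total_var)
print(round(alpha, 3))
```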
Reliability and the Standard Error of Measurement

The formula to compute the standard error of measurement (SEM) is

σE = σX √(1 − ρXX′)

where
σE = standard error of measurement,
σX = standard deviation of the observed scores,
ρXX′ = reliability coefficient.
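A minimal sketch of the SEM formula and its common use for a confidence band around an observed score (the SD and reliability values below are hypothetical):

```python
import math

def sem(sd_x, reliability):
    """Standard error of measurement: sigma_E = sigma_X * sqrt(1 - rho_XX')."""
    return sd_x * math.sqrt(1.0 - reliability)

# Hypothetical scale: SD = 15, reliability = 0.91, so SEM = 15 * 0.3 = 4.5.
s = sem(15, 0.91)
print(round(s, 2))

# An approximate 95% band around an observed score of 100: X +/- 1.96 * SEM.
lo, hi = 100 - 1.96 * s, 100 + 1.96 * s
print(round(lo, 2), round(hi, 2))
```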
