
LESSON 6:

ESTABLISHING
TEST VALIDITY
AND RELIABILITY
PED106 - ASSESSMENT OF LEARNING 1

DAMIANO, JOHN PAULO B.


JAMOLOD, ALBELYN Q.
MAMHOT, BABY JENN V.
MIRANDA, CHONA J.
Desired Significant Learning Outcomes
In this lesson, you are expected to:

• use procedures and statistical analysis to establish test validity and reliability;
• decide whether a test is valid or reliable; and
• decide which items are easy and which are difficult.
Significant Culminating Performance Task and Success Indicators
At the end of the lesson, you should be able to demonstrate your knowledge and skills in determining whether the test and its items are valid and reliable. You are considered successful in this culminating performance task if you have satisfied at least the following indicators of success:
SPECIFIC PERFORMANCE TASKS AND SUCCESS INDICATORS

Performance task: Use appropriate procedure in determining test validity and reliability.
Success indicator: Provided the detailed steps, decision, and rationale in the use of appropriate validity and reliability measures.

Performance task: Show the procedure on how to establish test validity and reliability.
Success indicator: Provided the detailed procedure from the preparation of the instrument, the procedure in pretesting, and the analysis in determining the test's validity and reliability.

Performance task: Provide accurate results in the analysis of item difficulty and reliability.
Success indicator: Made the appropriate computation, use of software, and reporting of results.
Suggested Time
Frame: 6 Hours
ARE YOU
READY?
What is test reliability?
Reliability is the consistency of the responses to a measure under three conditions:
(1) When retested on the same person
In the first condition, a consistent response is expected when the test is given again to the same participants.
(2) When retested on the same measure
Reliability is attained if the responses are consistent with those on an equivalent measure, that is, another test that measures the same characteristic.
(3) Similarity of responses across items that measure the same characteristic
There is reliability when the person responds in the same way, or consistently, across items that measure the same characteristic.
There are different factors that affect the reliability of a measure.
1. The number of items in a test
The more items a test has, the higher the likelihood of reliability. The probability of obtaining consistent scores is high because of the large pool of items.
2. Individual differences of participants
Every participant possesses characteristics that affect their performance in a test, such as fatigue, concentration, innate ability, perseverance, and motivation. These individual factors change over time.
3. External environment
The external environment may include room temperature, noise level, depth of instruction, exposure to materials, and quality of instruction, which could affect changes in the responses of examinees in a test.
What are the different ways to establish test reliability?

There are different ways of determining the reliability of a test. The specific kind of reliability will depend on the variable you are measuring, the type of test, and the number of versions of the test. The different types of reliability, and how each is done, are indicated below.
Method in Testing Reliability: TEST-RETEST

How is this reliability done?
You have a test, and you need to administer it at one time to a group of examinees. Administer it again at another time to the same group of examinees. There is a time interval of not more than 6 months between the first and second administration of tests that measure stable characteristics, such as standardized aptitude tests. The posttest can be given with a minimum time interval of 30 minutes. The responses in the two administrations should more or less be the same.

What statistics is used?
Correlate the test scores from the first and the next administration. A significant and positive correlation indicates that the test has temporal stability over time. Correlation refers to a statistical procedure in which a linear relationship is expected between two variables. You may use the Pearson Product Moment Correlation, or Pearson r, because test data are usually on an interval scale (refer to a statistics book for the computation).

Test-retest is applicable for tests that measure stable variables, such as aptitude and psychomotor measures (e.g., typing tests, tasks in physical education).

Method in Testing Reliability: PARALLEL FORMS

How is this reliability done?
There are two versions of a test. The items need to measure exactly the same skill. Each test version is called a "form." Administer one form at one time and the other form at another time to the same group of participants. The responses on the two forms should be more or less the same.

What statistics is used?
Correlate the test results for the first form and the second form. A significant and positive correlation coefficient is expected. The significant and positive correlation indicates that the responses in the two forms are the same or consistent. Pearson r is usually used for this purpose.

Parallel forms are applicable if there are two versions of the test. This is usually done when the test is repeatedly used for different groups, such as entrance examinations and licensure examinations. Different versions of the test are given to different groups of examinees.
Method in Testing Reliability: SPLIT-HALF

How is this reliability done?
Administer a test to a group of examinees. The items need to be split into halves, usually using the odd-even technique. In this technique, get the sum of the points of the odd-numbered items and correlate it with the sum of the points of the even-numbered items. Each examinee will have two scores coming from the same test. The scores on each set should be close or consistent.

What statistics is used?
Correlate the two sets of scores using Pearson r. After the correlation, use another formula called the Spearman-Brown coefficient. The correlation coefficients obtained using Pearson r and Spearman-Brown should be significant and positive to indicate internal consistency reliability.

Split-half is applicable when the test has a large number of items.
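To make the split-half procedure concrete, here is a minimal Python sketch with made-up item scores (the data and variable names are illustrative only): it forms odd and even halves, correlates them with Pearson r, and applies the Spearman-Brown correction r_full = 2r / (1 + r).

```python
import numpy as np
from scipy.stats import pearsonr

# Hypothetical scores: each row is one examinee, each column one item (1 = correct).
scores = np.array([
    [1, 1, 0, 1, 1, 0, 1, 1],
    [0, 1, 0, 0, 1, 0, 0, 1],
    [1, 1, 1, 1, 1, 1, 1, 0],
    [0, 0, 0, 1, 0, 0, 1, 0],
    [1, 0, 1, 1, 1, 1, 1, 1],
])

# Odd-even technique: sum the odd-numbered and even-numbered items per examinee.
odd_half = scores[:, 0::2].sum(axis=1)   # items 1, 3, 5, 7
even_half = scores[:, 1::2].sum(axis=1)  # items 2, 4, 6, 8

# Pearson r between the two half-test scores.
r_half, _ = pearsonr(odd_half, even_half)

# Spearman-Brown correction estimates the reliability of the full-length test.
r_full = 2 * r_half / (1 + r_half)
print(f"half-test r = {r_half:.2f}, Spearman-Brown = {r_full:.2f}")
```

The correction step matters because the correlation between two half-length tests understates the reliability of the full-length test.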
Method in Testing Reliability: TEST OF INTERNAL CONSISTENCY USING KUDER-RICHARDSON AND CRONBACH'S ALPHA METHOD

How is this reliability done?
This procedure involves determining if the scores for each item are consistently answered by the examinees. After administering the test to a group of examinees, it is necessary to determine and record the scores for each item. The idea here is to see if the responses per item are consistent with one another.

What statistics is used?
A statistical analysis called Cronbach's alpha or the Kuder-Richardson is used to determine the internal consistency of the items. A Cronbach's alpha value of 0.60 and above indicates that the test items have internal consistency.

This technique will work well when the assessment tool has a large number of items. It is also applicable for scales and inventories (e.g., a Likert scale from "strongly agree" to "strongly disagree").
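A minimal sketch of the alpha computation in Python, assuming scores arranged as a respondents-by-items matrix; the formula used is α = [k / (k − 1)] × (1 − ⅀s²item / s²total), with k items, item variances s²item, and total-score variance s²total. The data below are the five-student hygiene ratings that appear in a later example in this lesson.

```python
import numpy as np

def cronbach_alpha(scores: np.ndarray) -> float:
    """Cronbach's alpha for a (respondents x items) score matrix."""
    k = scores.shape[1]                         # number of items
    item_vars = scores.var(axis=0, ddof=1)      # variance of each item
    total_var = scores.sum(axis=1).var(ddof=1)  # variance of the total scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Five-point ratings: rows are respondents A-E, columns are items 1-5.
ratings = np.array([
    [5, 5, 4, 4, 1],
    [3, 4, 3, 3, 2],
    [2, 5, 3, 3, 3],
    [1, 4, 2, 3, 3],
    [3, 3, 4, 4, 4],
])
print(f"alpha = {cronbach_alpha(ratings):.2f}")  # ~0.11; the lesson reports this as 0.10
```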

Method in Testing Reliability: INTER-RATER RELIABILITY

How is this reliability done?
This procedure is used to determine the consistency of multiple raters when using rating scales and rubrics to judge performance. The reliability here refers to the similar or consistent ratings provided by more than one rater or judge when they use an assessment tool.

What statistics is used?
A statistical analysis called the Kendall's w coefficient of concordance is used to determine if the ratings provided by multiple raters agree with each other. A significant Kendall's w value indicates that the raters concur or agree with each other in their ratings.

Inter-rater reliability is applicable when the assessment requires the use of multiple raters.
LINEAR REGRESSION
• Linear regression is demonstrated when you have two variables that are measured, such as two sets of scores in a test taken at different times by the same participants.
• The straight line formed for the two sets of scores can produce a linear regression.
• When a straight line is formed, we can say that there is a correlation between the two sets of scores.
FIGURE 1. Scatterplot of the two sets of scores.
• The graph is called a scatterplot.
• Each point in the scatterplot is a respondent with two scores (one for each test).
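As a quick illustration, the sketch below plots a scatterplot in Python (assuming matplotlib is available), using the Monday and Tuesday spelling scores from the worked example that follows.

```python
import matplotlib.pyplot as plt

# Monday (X) and Tuesday (Y) spelling scores from the worked example below.
monday = [10, 9, 6, 10, 12, 4, 5, 7, 16, 8]
tuesday = [20, 15, 12, 18, 19, 8, 7, 10, 17, 13]

# Each point is one student: (Monday score, Tuesday score).
plt.scatter(monday, tuesday)
plt.xlabel("Monday test score (X)")
plt.ylabel("Tuesday test score (Y)")
plt.title("Scatterplot of the two sets of scores")
plt.show()
```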
COMPUTATION OF PEARSON r CORRELATION
• Correlation coefficient – the index of a linear regression.
• When the direction of the scatterplot is directly proportional, the correlation coefficient will have a positive value.
• If the line is inverse, then the correlation coefficient will have a negative value.
• Pearson r – the statistical analysis used to determine the correlation coefficient.
EXAMPLE:
Suppose that a teacher gave a spelling test of two-syllable words with 20 items on Monday and on Tuesday. The teacher wanted to determine the reliability of the two sets of scores by computing the Pearson r.

Formula:
r = [n⅀XY − (⅀X)(⅀Y)] / √([n⅀X² − (⅀X)²][n⅀Y² − (⅀Y)²])

X (Monday)   Y (Tuesday)   X²     Y²     XY
10           20            100    400    200
9            15            81     225    135
6            12            36     144    72
10           18            100    324    180
12           19            144    361    228
4            8             16     64     32
5            7             25     49     35
7            10            49     100    70
16           17            256    289    272
8            13            64     169    104
⅀X = 87      ⅀Y = 139      ⅀X² = 871    ⅀Y² = 2125    ⅀XY = 1328

LEGEND:
⅀X = Add all the X scores (Monday scores)
⅀Y = Add all the Y scores (Tuesday scores)
⅀X² = Square each X score (Monday scores), then add
⅀Y² = Square each Y score (Tuesday scores), then add
⅀XY = Multiply each X score by its Y score, then add
Substitute the values in the formula.
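Worked out from the totals in the table above (with n = 10 students), the substitution gives:

r = [10(1328) − (87)(139)] / √([10(871) − 87²][10(2125) − 139²])
  = (13280 − 12093) / √[(1141)(1929)]
  = 1187 / √2200989
  ≈ 1187 / 1483.57
  ≈ 0.80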

The value of a correlation coefficient does not exceed 1.00 or -1.00. A value of 1.00 or -1.00 indicates a perfect correlation. In a test of reliability, though, we aim for a high positive correlation, which means that there is consistency in the way the students answered the tests taken.
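The same result can be checked by software, as in this minimal Python sketch using scipy's pearsonr on the scores from the table above:

```python
from scipy.stats import pearsonr

monday = [10, 9, 6, 10, 12, 4, 5, 7, 16, 8]
tuesday = [20, 15, 12, 18, 19, 8, 7, 10, 17, 13]

# pearsonr returns the correlation coefficient and its p-value.
r, p = pearsonr(monday, tuesday)
print(f"r = {r:.2f}, p = {p:.4f}")  # r = 0.80: a very strong positive correlation
```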
DIFFERENCE BETWEEN A POSITIVE AND A NEGATIVE CORRELATION

Positive correlation – the higher the scores in X, the higher the scores in Y.
Negative correlation – when the value of the correlation coefficient is negative, the higher the scores in X, the lower the scores in Y, and vice versa.
DETERMINING THE STRENGTH OF A CORRELATION
• The correlation coefficient indicates the strength of the reliability of the test.
• This is indicated by the value of the correlation coefficient.
• The closer the value to 1.00 or -1.00, the stronger the correlation.

0.80 – 1.00   Very strong relationship
0.60 – 0.79   Strong relationship
0.40 – 0.59   Substantial/marked relationship
0.20 – 0.39   Weak relationship
0.00 – 0.19   Negligible relationship
DETERMINING THE SIGNIFICANCE OF THE CORRELATION
• The correlation obtained between two variables could be due to chance.
• In order to determine if the correlation is free of certain errors, it is tested for significance. When a correlation is significant, it means that the probability of the two variables being related is free of certain errors.
• In order to determine if the correlation coefficient value is significant, it is compared with an expected probability of correlation coefficient values called a critical value.
• If the computed value is greater than the critical value, then the information obtained has more than a 95% chance of being correlated and is significant (see the sketch after this list).
• Another statistical analysis mentioned to determine the internal consistency of a test is Cronbach's alpha.
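As referenced above, here is a minimal sketch of the significance check in Python. It uses the t-test form of the test, t = r√(n − 2) / √(1 − r²), with the example's r = 0.80 and n = 10; the t-test form is an assumption here, since the lesson itself compares the coefficient against a table of critical values.

```python
from math import sqrt
from scipy.stats import t

r, n = 0.80, 10            # correlation and sample size from the example
df = n - 2                 # degrees of freedom for a correlation test

# t statistic for testing whether r differs significantly from zero.
t_computed = r * sqrt(df) / sqrt(1 - r**2)

# Two-tailed critical value at the 0.05 level of significance.
t_critical = t.ppf(0.975, df)

print(f"computed t = {t_computed:.2f}, critical t = {t_critical:.2f}")
# computed t (3.77) > critical t (2.31), so the correlation is significant.
```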

EXAMPLE:
Suppose that five students answered a checklist about their hygiene with a scale of 1 to 5, wherein the following are the corresponding scores:
5 – always, 4 – often, 3 – sometimes, 2 – rarely, 1 – never

Student   Item 1   Item 2   Item 3   Item 4   Item 5   Total for each case (X)   Score – Mean   (Score – Mean)²
A         5        5        4        4        1        19                         2.8            7.84
B         3        4        3        3        2        15                        -1.2            1.44
C         2        5        3        3        3        16                        -0.2            0.04
D         1        4        2        3        3        13                        -3.2           10.24
E         3        3        4        4        4        18                         1.8            3.24
⅀(X − X̄)² = 22.8
X̄case = 16.2
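As a worked sketch of the alpha formula applied to this table (using sample variances with n − 1 = 4 in the denominator; the item variances are 2.2, 0.7, 0.7, 0.3, and 1.3):

⅀s²item = 2.2 + 0.7 + 0.7 + 0.3 + 1.3 = 5.2
s²total = 22.8 / 4 = 5.7
α = [k / (k − 1)] × (1 − ⅀s²item / s²total) = (5/4) × (1 − 5.2/5.7) ≈ 0.11

This is the low value, about 0.10, reported below.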
• The internal consistency of the responses to the hygiene checklist is 0.10, which is a low internal consistency.
• The consistency of ratings can also be obtained using a coefficient of concordance. The Kendall's w coefficient of concordance is used to test the agreement among raters.

On the next slide is the performance task demonstrated by the five students and rated by three raters. The rubric used a scale of 1 to 4, where 4 is the highest.

The scores given by the three raters are first computed by summing up the total ratings for each demonstration. The mean is obtained for the sums of ratings (X̄ratings = 8.4). The mean is subtracted from each of the sums of ratings (D). Each difference is squared (D²), and then the sum of the squared differences is computed (⅀D² = 33.2). The mean and the summation of squared differences are substituted in the Kendall's w formula. In the formula, m is the number of raters.
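A common form of the Kendall's w formula, applied here to the sums of ratings as the slide describes (with m = 3 raters, n = 5 demonstrations, and ⅀D² = 33.2):

W = 12⅀D² / [m²(n³ − n)]
  = 12(33.2) / [3²(5³ − 5)]
  = 398.4 / 1080
  ≈ 0.37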
WHAT IS TEST
VALIDITY?
• A measure is valid when it measures
what it is supposed to measure.
• If a quarterly exam is valid, then the
content should directly measure the
objectives of the curriculum.
• If an entrance exam is valid, then it
should predict students’ grades after
the first semester.
WHAT ARE THE
DIFFERENT WAYS
TO ESTABLISH
TEST VALIDITY?
• There are different ways to establish
test validity.
1. Content Validity
2. Face Validity
3. Predictive Validity
4. Construct Validity
5. Concurrent Validity
6. Convergent Validity
7. Divergent Validity
Type of validity: CONTENT VALIDITY
Definition: When the items represent the domain being measured.
Procedure: The items are compared with the objectives of the program.

Type of validity: FACE VALIDITY
Definition: When the test is presented well, free of errors, and properly administered.
Procedure: The test items and layout are reviewed and tried out on a small group of respondents.

Type of validity: PREDICTIVE VALIDITY
Definition: The measure should predict a future criterion.
Procedure: A correlation coefficient is obtained where the x variable is used as the predictor and the y variable as the criterion.

Type of validity: CONSTRUCT VALIDITY
Definition: The components or factors of a test should contain items that are strongly correlated.
Procedure: The Pearson r can be used to correlate the items for each factor. However, there is a technique called factor analysis to determine which items belong together in a factor.

Type of validity: CONCURRENT VALIDITY
Definition: When two or more measures that assess the same characteristic are present for each examinee.
Procedure: The scores on the measures should be correlated.

Type of validity: CONVERGENT VALIDITY
Definition: When the components or factors of a test are hypothesized to have a positive correlation.
Procedure: Correlation is done for the factors of the test.

Type of validity: DIVERGENT VALIDITY
Definition: When the components or factors of a test are hypothesized to have a negative correlation.
Procedure: Correlation is done for the factors of the test.
HOW TO
DETERMINE IF
AN ITEM IS EASY
OR DIFFICULT?
Below is a dataset of five items on the addition and subtraction of integers.

1. Get the total score of each student and arrange the scores from highest to lowest.

2. Obtain the upper and lower 27% of the group. Multiply 0.27 by the total number of students, and you will get a value of 2.7. The rounded whole number value is 3. Get the top three and the bottom three students based on their total scores. The top three students are students 2, 5, and 9. The bottom three students are students 7, 8, and 4. The rest of the students are not included in the item analysis.

3. Obtain the proportion correct for each item. This is computed separately for the upper 27% group and the lower 27% group. This is done by summing the correct answers per item and dividing the sum by the number of students in the group.

4. The item difficulty is obtained using the formula
item difficulty = (pH + pL) / 2
where pH is the proportion correct in the upper group and pL is the proportion correct in the lower group.

5. The index of discrimination is obtained using the formula
item discrimination = pH - pL

The value is interpreted using the table.
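Putting the five steps together, here is a minimal Python sketch; the 0/1 score matrix below is hypothetical, so the specific students selected will differ from those named in step 2.

```python
import numpy as np

# Hypothetical item scores: rows are students 1-10, columns are items 1-5 (1 = correct).
scores = np.array([
    [1, 0, 1, 1, 0],
    [1, 1, 1, 1, 1],
    [0, 1, 1, 0, 1],
    [0, 0, 0, 1, 0],
    [1, 1, 1, 1, 0],
    [1, 0, 1, 0, 1],
    [0, 0, 1, 0, 0],
    [0, 1, 0, 0, 0],
    [1, 1, 1, 0, 1],
    [0, 1, 0, 1, 1],
])

n_students = scores.shape[0]
k = round(0.27 * n_students)             # 27% of 10 -> 2.7 -> 3 students

# Step 1-2: rank students by total score; take the top and bottom 27%.
order = np.argsort(scores.sum(axis=1))   # indices from lowest to highest total
lower = scores[order[:k]]                # bottom three students
upper = scores[order[-k:]]               # top three students

# Step 3: proportion correct per item in each group.
pH = upper.mean(axis=0)
pL = lower.mean(axis=0)

# Steps 4-5: item difficulty and discrimination indices.
difficulty = (pH + pL) / 2
discrimination = pH - pL

for i, (dif, dis) in enumerate(zip(difficulty, discrimination), start=1):
    print(f"Item {i}: difficulty = {dif:.2f}, discrimination = {dis:.2f}")
```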
QUESTIONS?/
CLARIFICATIONS?/
VIOLENT
REACTIONS?
LESSON 6:
ESTABLISHING
TEST VALIDITY AND
RELIABILITY
PED106 - ASSESSMENT OF LEARNING 1
- ASSESSMENT -
DAMIANO, JOHN PAULO B.
JAMOLOD, ALBELYN Q.
MAMHOT, BABY JENN V.
MIRANDA, CHONA J.
ARE YOU READY?
DIRECTIONS: Read the
following statements.
Identify what is being
asked.
1. Part of the assessment of validity: it is the consistency or dependability of the scores given.

2. When retested on the same person, it is what is expected of the responses when the test is given to the same participants, in order to attain reliability.

3 – 5. Give the three factors that affect the reliability of a test measure.
6. In test – retest, it indicates
that the test has temporal
stability over time.
7. It refers to a statistical procedure in which a linear relationship is expected between two variables.
8. It is one of the statistical analyses used to determine the internal consistency of the items.

9. It is a method of testing reliability that involves determining whether the scores for each item are consistently answered by the examinees.
10. It is used to determine the
consistency of multiple raters when
using rating scales and rubrics to
judge performance.
11. It is the statistical value in inter-rater reliability that indicates that the raters concur or agree with each other in their ratings.

12. If the two sets of scores formed a straight line, then it is what we call __________.

13. If the correlation coefficient value is negative, then the direction of the line is _________.
14. If the correlation coefficient
value is positive, then the
direction of the scatterplot is
___________.
15. It is the index of the linear
regression.
16 – 17. Give at least 2
methods of testing reliability.
18. Assuming that the scores are given and the r value is computed, if the value of r is 1.00, then what does the correlation indicate?
19. The correlation obtained
between two variables could be
due to ________.
20. If the computed value is greater than the critical value, what percentage chance of being correlated should be obtained for the result to be significant?
21-23. Give at least three ways
to establish test validity.
24. What is the formula for
computing the item difficulty?
25. What is the formula for computing the item discrimination?
26. If the two sets of scores are plotted in the scatterplot and form a straight line, the two scores are said to be _______.
27. Give the formula of Pearson
Correlation Coefficient.
28. “Measure should predict a
future criterion.” What type of
validity is being asked?
29. It is the extent to which a test measures what it is designed to measure.
30. Give the complete name of
our report for this semester.
