Lesson 5 Construction of Written Test
There are many ways by which learners can demonstrate their knowledge
and skills and show evidence of their proficiencies at the end of the lesson, unit, or
subject. While authentic/performance-based assessments have been advocated as
the better and more appropriate methods in assessing learning outcomes,
particularly as they assess higher-level thinking skills, traditional written
assessment methods, such as multiple-choice tests, are also considered
appropriate and efficient classroom assessment tools for some types of learning
targets. This is especially true for large classes and when test results are needed
immediately for some educational decisions. Traditional tests are also deemed
reliable and exhibit excellent content and construct validity.
To learn or enhance your skills in developing good and effective test items for
a particular test format, you need to go back and review your prior knowledge on
different test formats; how and when to choose a particular test format that is the
most appropriate measure of the identified learning objectives and desired learning
outcomes of your subject; and how to construct good and effective items for each
format.
What are the general guidelines in choosing the appropriate test format?
Not every test is universally valid for every type of learning outcome. For
example, if an intended outcome for a Research Method 1 course is “to design and
produce a research study relevant to one’s field of study,” you cannot measure this
outcome through a multiple-choice test or a matching-type test.
To guide you on choosing the appropriate test format and designing fair and
appropriate yet challenging tests, you should ask the following important
questions:
1. What are the objectives or desired learning outcomes of the
subject/unit/lesson being assessed?
Deciding on what test format to use generally depends on your learning
objectives or the desired learning outcomes of the subject/unit/lesson.
Desired learning outcomes (DLOs) are statements of what learners are
expected to do or demonstrate as a result of engaging in the learning
process.
2. What level of thinking is to be assessed (i.e., remembering, understanding,
applying, analyzing, evaluating, and creating)? Does the cognitive level of the test
question match your instructional objectives or DLOs?
The level of thinking to be assessed is also an important factor to consider
when designing your test, as this will guide you in choosing the appropriate
test format. For example, if you intend to assess how much your learners are
able to identify important concepts discussed in class (i.e., remembering, or
understanding level), a selected-response format such as a multiple-choice
test would be appropriate. However, if you intend to assess how your students
will be able to explain and apply in another setting a concept or framework
learned in class (i.e., applying and/or analyzing level), you may consider
giving a constructed-response test such as an essay.
3. Is the test matched or aligned with the course’s DLOs and the course contents
or learning activities?
The assessment tasks should be aligned with the instructional activities and
the DLOs. Thus, it is important that you are clear about what DLOs are to
be addressed by your test and what course activities or tasks are to be
implemented to achieve the DLOs.
For example, if you want learners to articulate and justify their stand on
ethical decision-making and social responsibility practices in business (i.e.,
DLO), then an essay test and class debate are appropriate measures and
tasks for this learning outcome. A multiple-choice test may be used but only if you
intend to assess learners’ ability to recognize what is ethical versus unethical
decision-making practice. In the same manner, matching-type items may be
appropriate if you want to know whether your students can differentiate and
match the different approaches or terms to their definitions.
4. Are the test items realistic to the students?
Test items should be meaningful and realistic to the learners. They should
be relevant to their everyday experiences. The use of concepts, terms, or situations
that have not been discussed in the class or that they have never
encountered, read, or heard about should be minimized or avoided. This is to
prevent learners from making wild guesses, which undermine your
measurement of what they have really learned from the class.
What are the major categories and formats of traditional tests?
For the purpose of classroom assessment, traditional tests fall into two
general categories: (1) selected-response type, in which learners select the correct
response from the given options, and (2) constructed-response type, in which the
learners are asked to formulate their own answers. The cognitive capabilities
required to answer selected-response items are different from those required by
constructed-response items, regardless of content.
Selected-Response Tests require learners to choose the correct answer or best
alternative from several choices. While they can cover a wide range of learning
materials very efficiently and measure a variety of learning outcomes, they are
limited when assessing learning outcomes that involve more complex and
higher-level thinking skills. Selected-response tests include:
Writing multiple-choice items requires content mastery, writing skills, and
time. Only good and effective items should be included in the test. Poorly written
test items could be confusing and frustrating to learners and yield test scores that
are not appropriate to evaluate their learning and achievement. The following are
the general guidelines in writing good multiple-choice items. They are classified in
terms of content, stem, and options.
Content:
1. Write items that reflect only one specific content and cognitive processing
skill.
A. ANCOVA C. Correlation
B. ANOVA D. t-test
A. ANCOVA C. Chi-Square
B. ANOVA D. Mann-Whitney Test
2. Do not lift and use statements from the textbook or other learning materials
as test questions.
4. Edit and proofread the items for grammatical and spelling errors before
administering them to the learners.
Stem:
Faulty: Read each question and indicate your answer by shading the circle
corresponding to your answer.
Good: This test consists of two parts. Part A is a reading comprehension test,
and Part B is a grammar/language test. Each question is a multiple-
choice test with five (5) options. You are to answer each question but
will not be penalized for a wrong answer or for guessing. You can go
back and review your answers during the time allotted.
2. Write stems that are consistent in form and structure, that is, present all
items either in question form or in descriptive or declarative form.
Faulty: (1) Who was the Philippine President during Martial Law?
Good: (1) Who was the Philippine President during Martial Law?
3. Word the stem positively and avoid negatives, such as NOT and EXCEPT, as
well as double negatives in a stem. If a negative word is necessary, underline
or capitalize it for emphasis.
4. Refrain from making the stem too wordy or containing too much information
unless the problem/question requires the facts presented to solve the
problem.
Faulty: What does DNA stand for, and what is the organic chemical of
complex molecular structure found in all cells and viruses and codes
genetic information for the transmission of inherited traits?
Options:
1. Provide three (3) to five (5) options per item, with only one being the correct
or best answer/alternative.
2. Write options that are parallel or similar in form and length to avoid giving
clues about the correct answer.
Faulty: Which experimental gas law describes how pressure of a gas tends to
increase as the volume of the container decreases? (i.e., “The
absolute pressure exerted by a given mass of an ideal gas is inversely
proportional to the volume it occupies.”)
Good: Which experimental gas law describes how pressure of a gas tends to
increase as the volume of the container decreases? (i.e., “The
absolute pressure exerted by a given mass of an ideal gas is inversely
proportional to the volume it occupies.”)
5. Use None-of-the-above carefully and only when there is one absolutely correct
answer, such as in spelling or math items.
A. ANCOVA D. t-test
B. ANOVA E. None of the above
C. Correlation
Faulty: Who among the following has become the President of the Philippine
Senate?
Good: Who was the first ever President of the Philippine Senate?
The matching test item format requires learners to match a word, sentence,
or phrase in one column (i.e., premise) to a corresponding word, sentence, or
phrase in a second column (i.e., response). It is most appropriate when you need to
measure the learner’s ability to identify the relationship or association between
similar items. It works best when the subject content contains a lot of parallel
information. You can also find ways to make it applicable or useful in assessing
higher levels of thinking such as applying and analyzing.
The following are the general guidelines in writing good and effective
matching-type tests:
1. Clearly state in the directions the basis for matching the stimuli with the
responses.
Item #1’s instruction is less preferred as it does not detail the basis for
matching the stem and the response option.
2. Ensure that the stimuli are longer, and the responses are shorter.
A B
A B
Item #2 is a better version because the descriptions are presented in the first
column while the response options are in the second column. The stems are
also longer than the options.
3. For each item, include only topics that are related with one another and
share the same foundation of information.
Faulty: Match the following:
A B
____1. Indonesia A. Asia
____2. Malaysia B. Bangkok
____3. Philippines C. Jakarta
____4. Thailand D. Kuala Lumpur
____5. Year ASEAN was established E. Manila
F. 1967
Good: On the line to the left of each country in Column I, write the letter of
the country’s capital presented in Column II.
Column I Column II
A B
_____ Gold A. Au
_____ Hydrogen B. Magnetic metal used in steel
_____ Iron C. Hg
_____ Potassium D. K
_____ Sodium E. With lowest density
F. Na
_____ Gold A. Au
_____ Hydrogen B. Fe
_____ Iron C. H
_____ Potassium D. Hg
_____ Sodium E. K
F. Na
In Item #1, response options are not parallel in content and length. They are
also not arranged alphabetically.
5. Include response options that are reasonable and realistic and similar in
length and grammatical form.
A B
_____ History A. Studies the production and
distribution of goods/services
_____ Political Science B. Study of politics and power
_____ Psychology C. Study of society
_____ Sociology D. Understands role of mental functions in
social behavior
E. Uses narratives to examine and analyze
past events
A B
____1. Study of living things A. Biology
____2. Study of mind and behavior B. History
____3. Study of politics and power C. Political Science
____4. Study of recorded events in the D. Psychology
past
____5. Study of society E. Sociology
F. Zoology
Item #1 is less preferred because the response options are not consistent in
terms of their length and grammatical form.
A B
____ 1/4 A. 0.25
____ 5/4 B. 0.28
____ 7/25 C. 0.90
____ 9/10 D. 1.25
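The answer key for a numeric matching item like the fraction-to-decimal item above can be double-checked before the test is administered. The following is only a minimal sketch: the helper name `build_answer_key` is hypothetical, and the premises and lettered responses are copied from the item as shown.

```python
from fractions import Fraction

# Premises (Column A) and lettered responses (Column B) from the item above.
premises = ["1/4", "5/4", "7/25", "9/10"]
responses = {"A": "0.25", "B": "0.28", "C": "0.90", "D": "1.25"}


def build_answer_key(premises, responses):
    """Map each fraction to the letter of the response with the same value."""
    # Fraction parses both "7/25" and "0.28" exactly, so no rounding is involved.
    letter_by_value = {Fraction(text): letter for letter, text in responses.items()}
    return {premise: letter_by_value[Fraction(premise)] for premise in premises}


answer_key = build_answer_key(premises, responses)
for premise, letter in answer_key.items():
    print(f"{premise} -> {letter}")
```

A missing match raises a `KeyError`, which is a quick signal that a premise has no correct response among the options.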
True or false items are used to measure learners’ ability to identify whether a
statement or proposition is correct/true or incorrect/false. They are best used when
learners’ ability to judge or evaluate is one of the desired learning outcomes of the
course.
There are different variations of the true or false items. These include the following:
2. Yes-No Variation. In this format, the learner has to choose yes or no, rather
than true or false.
e.g., The following are kinds of tests. Circle Yes if it is an authentic test
and No if not:
3. A-B Variation. In this format, the learner has to choose A or B, rather than
true or false.
Extended-Response: Requires much longer and more complex responses.
Example: How do the leopard and the tiger differ? Support your answer with
details and information from the article.
Restricted-Response: Is much more focused and restrained.
Example: Tina is preparing a demonstration to display at her school’s science
fair. She needs to show the effects of salt on the buoyancy of an egg.
The following are the general guidelines in constructing good essay questions:
1. Clearly define the intended learning outcome to be assessed by the essay test.
3. Clearly define and situate the task within a problem situation as well as the type of
thinking required to answer the test.
Essay questions or prompts should provide clear and well-defined tasks to the
learners. It is important to carefully choose the directive verb, to write clearly the
object or focus of the directive verb, and to delimit the scope of the task. Having
clear and well-defined tasks will guide learners on what to focus on when
answering the prompts, thus avoiding responses that contain ideas that are
unrelated or irrelevant, too long, or focusing only on some part of the task.
Emphasizing the type of thinking required to answer the question will also guide
students on the extent to which they should be creative, deep, complex, and
analytical in addressing and responding to the questions.
4. Present tasks that are fair, reasonable, and realistic to the students.
Essay questions should contain tasks or questions that students will be able to do
or address. These include those that are within the level of instruction/training,
expertise, and experience of the students.
5. Be specific in the prompts about the time allotment and criteria for grading the
response.
Essay prompts and directions should indicate the approximate time given to the
students to answer the essay questions to guide them on how much time they
should allocate for each item, especially if several essay questions are presented.
How the responses are to be graded or rated should also be clarified to guide the
students on what to include in their responses.
Example: What is the mean of the following score distribution: 32, 44, 56,
69, 75, 77, 95, 96?
A. 68 D. 74
B. 69 E. 76
C. 72
The correct answer is A (68)
2. All possible answer choices – This type of question has four or five options,
and students are required to choose all of the options that are correct.
Example: Consider the following score distribution: 12, 14, 14, 14, 17, 24,
27, 28, 30. Which of the following is/are the correct measure/s of central
tendency? Indicate all possible answers.
A. Mean = 20 D. Median = 17
B. Mean = 22 E. Mode = 14
C. Median = 16
Options A, D, and E are all correct answers.
3. Type-In answer - This type of question does not provide options to choose
from. Instead, the learners are asked to supply the correct answer. The
teacher should inform the learners at the start how their answers will be
rated. For example, the teacher may require just the correct answer or may
require learners to present the step-by-step procedures in coming up with their
answers. On the other hand, for non-mathematical problem solving, such as a
case study, the teacher may present a rubric showing how their answers will be rated.
Example: Compute the mean of the following score distribution: 32, 44, 56,
69, 75, 77, 95, 96. Indicate your answer in the blank provided.
In this case, the learners will only need to give the correct answer without
having to show the procedures for computation.
Example: Lilian, a 55-year-old accountant, has been suffering from frequent
dizziness, nausea, and lightheadedness. During the interview, Lilian was
obviously restless, and sweating. She reported feeling so stressed and fearful
of anything without any apparent reason. She could not sleep and eat well.
She also started to withdraw from family and friends, as she experienced
frequent panic attacks. She also said that she was constantly worrying about
everything in work and at home. What might be Lilian’s problem? What
should she do to alleviate all her symptoms?
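The answer keys for the two score-distribution examples above can be verified with a short script before the items are used. This is only a sketch using Python's standard `statistics` module; the variable and function names are illustrative and not part of the lesson.

```python
from statistics import mean, median, multimode

# Score distributions copied from the two examples above.
single_answer_scores = [32, 44, 56, 69, 75, 77, 95, 96]
all_answers_scores = [12, 14, 14, 14, 17, 24, 27, 28, 30]


def central_tendency(scores):
    """Return the mean, median, and list of modes for a score distribution."""
    return {
        "mean": mean(scores),
        "median": median(scores),
        "modes": multimode(scores),  # a list, since there may be ties
    }


key_one = central_tendency(single_answer_scores)
key_two = central_tendency(all_answers_scores)

print("Single-answer item, mean:", key_one["mean"])
print("All-possible-answers item:", key_two)
```

The computed values agree with the keys given above: a mean of 68 for the first distribution, and a mean of 20, a median of 17, and a mode of 14 for the second.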
Problem-solving test items are a good test format as they minimize guessing,
measure instructional objectives that focus on higher cognitive levels, and measure
extensive amounts of contents or topics. However, they require more time for
teachers to construct, read, and correct, and are prone to rater bias, especially
when scoring rubrics/criteria are not available. It is therefore important that good
quality problem-solving test items are constructed.
The following are some of the general guidelines in constructing good
problem-solving test items.
Faulty: Tricia was 135.6 lbs. when she started with her Zumba/aerobics
exercises. After three months of attending the sessions three times a
week, her weight was down to 122.8 lbs. About how many lbs. did she
lose after three months? Write your final answer in the space provided
and show your computations. (This question asks, “about how many”
and does not indicate whether learners need to give the exact weight or
whether they need to round off their answer and to what extent.)
Good: Tricia was 135.6 lbs. when she started with her Zumba/aerobics
exercises. After three months of attending the sessions three times a
week, her weight was down to 122.8 lbs. How many lbs. did she lose after
three months? Write your final answer in the space provided and show your
computations. Write the exact weight; do not round off.
2. Be specific and clear about the type of response required from the students.
Faulty: ASEANA Bottlers, Inc. has been producing and selling Tutti Fruity
juice in the Philippines, aside from their Singapore market. The sales for
the juice in the Singapore market were S$5million more than those of
their Philippine market in 2016, S$3million more in 2017, and
S$4.5million in 2018. If the sales in the Philippine market in 2018 were
PHP35million, what were the sales in the Singapore market during that
year? (This is a faulty question because it does not specify in what
currency the answer should be presented.)
Good: ASEANA Bottlers, Inc. has been producing and selling Tutti Fruity
juice in the Philippines, aside from their Singapore market. The sales for
the juice in the Singapore market were S$5million more than those of
their Philippine market in 2016, S$3million more in 2017, and
S$4.5million in 2018. If the sales in the Philippine market in 2018 were
PHP35million, what were the sales in the Singapore market during that
year? (This is a better item because it specifies in what currency
the answer should be presented, and the exchange rate was given.)
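To see why the “about how many” wording in the Tricia item above leaves room for more than one defensible answer, it helps to compute both the exact and a rounded result. The sketch below assumes rounding to the nearest pound as one possible reading of “about”; the item itself does not say which rounding is meant, and the variable names are illustrative only.

```python
from decimal import Decimal, ROUND_HALF_UP

# Weights taken from the Tricia item; Decimal avoids binary floating-point noise.
start_weight = Decimal("135.6")
end_weight = Decimal("122.8")

# Exact answer, as required by the "Good" version of the item.
exact_loss = start_weight - end_weight

# One possible reading of "about how many" (assumption: nearest whole pound).
nearest_pound = exact_loss.quantize(Decimal("1"), rounding=ROUND_HALF_UP)

print("Exact loss (lbs.):", exact_loss)
print("Rounded to nearest lb.:", nearest_pound)
```

Because the exact and rounded results differ, an answer key for the faulty version would have to accept both, which is why the improved item states explicitly that the exact weight is required.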
4. How should the items for the above traditional tests be constructed?