EBS 234 Assessment in Basic Schools


COURSE OUTLINE

Course Title (Name): Assessment in Education


Course description: This course deals with both theoretical and practical issues in student assessment. The course presents the nature of assessment and its related concepts as well as the purposes and the principles of assessment. The issues of validity and reliability, which form the theoretical basis for assessment, are discussed, and guidelines for the construction of various assessment formats (objective and constructed response) are also discussed. The discussions will be largely interactive and practical.

Course Objectives: By the end of the course students will:


1. acquire the knowledge and skills necessary to plan, construct, and administer a variety of educational assessment instruments;
2. be able to interpret the results of educational assessments for decision-making; and
3. be able to explain validity and reliability in assessment.

Unit 1: Nature of Assessment


 Definition of Terms
 Principles and purpose of assessment
 Continuous assessment
 Continuous assessment in Ghana
 Strengths and weaknesses of continuous assessment

Unit 2: Goals and Learning Objectives of Instruction


 Taxonomies of Educational Objectives
 Cognitive Domain (Bloom’s taxonomy and Quellmalz’s taxonomy)
 Affective Domain
 Psychomotor domain

Unit 3: Characteristics of Tests I: Test Validity


 Nature of Validity
 Categories of Validity Evidence
 Factors affecting validity of test scores

Unit 4: Characteristics of Tests II: Test Reliability


 Nature of Reliability
 True Score Theory (Obtained scores, True Scores and error scores)
 Standard Error of Measurement
 Methods of estimating Reliability
 Factors affecting reliability

Unit 5: Planning Achievement Tests and Assessment


 Characteristics of Standardized Achievement Tests
 Characteristics of Classroom Achievement Tests
 Stages in Classroom Achievement Testing
Unit 6: Objective Test Items I
 Strengths and Limitations
 Multiple Choice

Unit 7: Objective Test Items II


 True/False
 Completion Type
 Matching Type
 Constructed Response Type
 Calculations Type

Unit 8: Essay Test Items


 Strengths and Limitations
 Crafting
 Scoring essays
Unit 9: Assembling, Administering and Appraising Achievement Tests
 Assembling classroom tests
 Administering classroom tests
 Appraising classroom tests

Unit 10: Interpreting Test Scores


 Norm-referenced (Percentile ranks, standard scores, Stanines)
 Criterion-referenced
UNIT 1

THE NATURE OF ASSESSMENT

Definition of Concepts and Terms

Definition of Terms:

The general public uses the terms assessment, test, measurement and evaluation interchangeably, but it is important for the student of assessment to distinguish among them.

Assessment: The process of obtaining information that is used for making decisions about

students, curricula and programs, and educational policy. It includes the full range of

procedures used to gain information about student learning. These procedures may be formal

(pencil and paper tests) or informal (observations). Certain concepts and terms are associated

with assessment. These are tests, measurement and evaluation.

Assessment can be said to be the purposeful, systematic and ongoing collection of information as evidence for use in making judgements about students' learning and the curriculum.

Test: A task or series of tasks, which are used to measure specific traits or attributes in people.

In educational settings, tests include paper and pencil instruments, which contain questions

that students and pupils respond to. The responses provided to the questions help the test

giver to obtain an estimate of the specific trait being measured. It answers the question,

‘How well does the individual perform?’


Measurement

The process of assigning (giving) numbers to the attributes or traits possessed by persons,

events or a set of objects according to specific rules. Educational measurement is the

assignment of numerals to such traits as achievement, aptitude, and performance in such a

way that the numbers describe a degree to which the person possesses the attribute or trait. It

is limited to the quantitative descriptions of students. It answers the question, ‘How much?’

Evaluation

Evaluation is the systematic investigation of the worth or merit of an object (a person, programme or a book). For example, a teacher may judge a student's writing as exceptionally good for his grade placement.

Evaluation involves gathering information, which can be qualitative or quantitative, on a person, programme or process and trying to make a value judgement about the effectiveness or worth of what is being assessed.

Two forms of Evaluation


Formative evaluation is the process of judging the worth of teaching and learning

constantly during the period of instruction. Formative evaluation of student’s

achievement means we are judging the quality of student’s achievement while the student

is still in the process of learning. Examples include teacher observations, classroom

questions, home assignments and short tests or quizzes. The main purpose is to provide

feedback to both the teacher and the learner about progress being made and not to grade

the student.

Summative evaluation is the process of judging the worth of teaching and learning at the

end of the period of instruction. It is judgmental in nature. It attempts to determine to

what extent the broad objectives of teaching and learning have been attained. In other
words, it is a judgement about the quality of students' achievement after the instructional or learning process is completed. Examples are end-of-term or end-of-programme examinations such as the BECE, WASSCE and UCC end-of-semester examinations.


GENERAL PRINCIPLES OF ASSESSMENT

Principles are fundamental truths and doctrines accepted by most authorities as

characteristics of assessment. Some of them are:

1. Test developer must be clear about the learning target to be assessed.

2. The assessment technique selected must match the learning target.

3. Proper use of assessment procedures requires that the user is aware of limitations of each

technique.

4. Assessment techniques must serve the needs of the learners.

5. Good assessments use multiple methods. Assessment needs to be comprehensive.

6. Assessment should be valid and reliable.

7. Good assessment is fair and ethical.

8. Good assessment appropriately incorporates technology.

9. Assessment should be diagnostic, formative and summative.

PURPOSES OF ASSESSMENT

Assessment provides information for decisions about students, curricula and programs, and

educational policy. These decisions are:

1. Instructional Management decisions

2. Selection decisions

3. Placement decisions

4. Counselling and Guidance decisions

5. Credentialing and Certification decisions


STEPS IN MEASUREMENT

Measurement involves three main steps. These are:

1. Identifying and providing a clear definition of the attribute/trait to be measured.

2. Determining the set of procedures/operations by which the attribute is to be

manifested.

3. Establishing a set of procedures/rules for quantifying the attribute/trait being

measured.

SCALES OF MEASUREMENT
The type of data obtained or collected determines the appropriate measurement scale

used. There are 4 types of measurement scales: Nominal, Ordinal, Interval, and

Ratio

1. Nominal Scales (categorical scores)

Nominal scale is summarised below

 A nominal scale classifies persons or objects into two or more categories

(groups). Whatever the classification, a person can only be in one category,

and members of a given category have a common set of characteristics.

 Places data (people or things) into a category (group)

 Categorical data are numbers that are simply used as identifiers or names

 Nominal scores cannot be ranked or ordered along any dimension.

 For identification purposes, categories are numbered (coded). For example:

Gender: Male 1 and Female 2.


College of Education: TACE 1, BACE 2, ST. VICENT 3, FOCE 4, TUCE 5,

WELSCE 6 and others

Halls of Residence: Atlantic 1, Oguaa 2, Adehye 3, Casford 4, VALCO 5, and

Kwame Nkrumah 6.

2. Ordinal Scales/Ordinal score

Ordinal scale is summarised below

 An ordinal scale does not only group subjects (data) but also ranks them in

terms of the degree to which they possess a characteristic/attribute of interest.

 Scores or data are ranked (ordered) along some dimension.

 No common unit of measurement exists between rankings.

 Comparisons cannot be made across different group rankings.

 An ordinal scale puts subjects in order from highest to lowest, from most to

least. For example the height of 5 students can be ranked from 1st to 5th.

NOTE: Though ordinal scales indicate that some subjects are higher or better than

others, they do not indicate how much higher or better. Thus, intervals between the

ranks are not equal.

3. Interval Scales/ Interval scores:

Interval scale is summarised below;


 Interval scale possesses all the features of Nominal and Ordinal scales and in

addition has equal intervals between adjacent scores or points (Unlike Ordinal

scale).

 Again, it has no true zero point. A zero score here does not mean the absence of the characteristic/trait; it is an arbitrary zero that simply represents an additional point of measurement on the scale.

 Values can be added and subtracted to and from each other.

 Values cannot be multiplied or divided.

 Examples include Celsius temperature, academic achievement scores and sea level, among others.

4. Ratio Scales/Ratio scores:

Ratio scale is summarised below;

 The ratio scale of measurement is similar to the interval scale.

 However, unlike the interval scale, this scale has an absolute/true/meaningful

zero (zero means complete absence of characteristics being measured).

 Height, Weight and time are examples.

 Values can be added, subtracted, multiplied and divided. For example, 60

minutes can be said to be 3 times as long as 20 minutes.


PURPOSE OF ASSESSMENT

In all, the purposes of assessment can be grouped under three main categories:

1. Assessment of learning (AoL),

2. Assessment as learning (AaL)

3. Assessment for learning (AfL).

1. Assessment Of Learning (AoL)

Assessment of Learning is summarised below;

 Assessment of learning is the assessment that becomes public and

results in statements or symbols about how well students are

learning.

 It is accompanied by a number or a letter grade (summative).

 It compares one student's achievement with standards.

 The results can be communicated to the students and teachers.

 It occurs at the end of the learning unit (summative).

Assessment of learning (AoL) is carried out purposely for grading and reporting.

TEACHERS' ROLES IN ASSESSMENT OF LEARNING

 A range of alternative mechanisms for assessing the same outcomes should be provided.

 A public and defensible reference point for making judgements should be available.

 Approaches to interpretation must be transparent.

 The description of the assessment process should be clear.

 Strategies for recourse in the event of disagreement about the decision should be considered.

2. Assessment as Learning (AaL)

 Here assessment is seen as a learning process.

 Students are able to learn about themselves as learners and become

aware of how they learn

 Students reflect on their work on a regular basis, usually through self

and peer assessment and decide (often with the help of the teacher,

particularly in the early stages) what their next learning will be.

 Assessment as learning helps students to take more responsibility for

their own learning and monitoring future directions.

Teacher’s Roles in Assessment as Learning

 Guide students in developing internal feedback or self-monitoring mechanisms to validate and question their own thinking, and to become comfortable with the ambiguity and uncertainty that are inevitable in learning anything new.

 Provide regular and challenging opportunities to practice, so that students can become confident, competent self-assessors.


 Monitor students’ metacognitive processes as well as their learning,

and provide descriptive feedback.

 Create an environment where it is safe for students to take chances

and where support is readily available.

3. Assessment for Learning (AfL)

 Comprises two phases— diagnostic assessment and formative

assessment

 Assessment can be based on a variety of information sources (e.g.,

portfolios, works in progress, teacher observation, conversation)

 Verbal or written feedback to the student is primarily descriptive and

emphasizes strengths, identifies challenges, and points to next steps

 As teachers check on understanding they adjust their instruction to

keep students on track

 No grades or scores are given - record-keeping is primarily anecdotal

and descriptive

 Occurs throughout the learning process (formative), from the outset of

the course of study to the time of summative assessment


 The UK Assessment Reform Group (1999) identified the following

seven key characteristics of assessment for learning.

1. It is embedded in a view of teaching and learning of which it is an

essential part.

2. It involves sharing learning goals with learners.

3. It aims to help pupils to know and to recognise the standards for which

they are aiming.

4. It involves pupils in self-assessment [and peer assessment].

5. It provides feedback that leads to pupils recognising their next steps and how to take them.

6. It is underpinned by the confidence that every student can improve.

7. It involves both teacher and pupils reviewing and reflecting on

assessment data.

 The Assessment Reform Group in the UK in 2002 derived 10 principles for guidance in Assessment for Learning. These principles are listed below.


1. Assessment for learning should be part of effective planning of teaching and learning.

2. Assessment for learning should focus on how students learn.

3. Assessment for learning should be recognised as central to classroom practice.

4. Assessment for learning should be regarded as a key professional skill for teachers.

5. Assessment for learning should be sensitive and constructive because any assessment has an emotional impact.

6. Assessment for learning should take account of the importance of learner motivation.

7. Assessment for learning should promote commitment to learning goals and a shared understanding of the criteria by which they are assessed.

8. Learners should receive constructive guidance about how to improve.

9. Assessment for learning develops learners' capacity for self-assessment so that they can become reflective and self-managing.

10. Assessment for learning should recognise the full range of achievements of all learners.


Continuous assessment
Definition
Ogunniyi (1984, p. 113) defined continuous assessment as ‘a formative evaluation process
concerned with finding out, in a systematic manner, the overall gains that a student has made in
terms of knowledge, attitudes and skills after a given set of learning experiences’. The definition
implies that a student’s final grade after a programme of instruction is an aggregation of all the
performances exhibited in the cognitive, affective and psychomotor domains over the duration of the course.

Characteristics
 Continuous assessment is cumulative
The final grade awarded a student at the end of the term or year is an accumulation of all
the attainments throughout the term or year.
 Continuous assessment is comprehensive
Opportunities are provided for the assessment of the total personality of the student.
 Continuous assessment is diagnostic
Continuous assessment involves a constant and continual monitoring of a student’s
performance and achievement. This process enables each student’s strengths and
weaknesses to be identified.
 Continuous assessment is formative
Continuous assessment allows for immediate and constant feedback to be provided to the
student on his performance.
 Continuous assessment is guidance-oriented
 Continuous assessment is systematic

STRENGTHS OF CONTINUOUS ASSESSMENT


1. Continuous assessment provides an excellent picture of a student’s performance over a
period of time.

2. It enables the classroom teacher as well as the school administration to be actively and more
meaningfully involved in the assessment of the students throughout the period of teaching
and learning.
3. It enables the measurement of the three important domains in the taxonomy of educational objectives, that is, the cognitive, affective and psychomotor domains.

4. It helps to minimise the students’ fears and anxieties about failure in the examinations.

5. Continuous assessment encourages students to work assiduously throughout the period of


teaching and learning. The student becomes more alert in class. He is punctual and attends classes regularly. This attitude comes about because every stage of the instructional process is assessed and these assessments count towards the ultimate grade or score he would obtain. He knows that complacency, absenteeism, laziness and malingering would prove disastrous to his goals in academic achievement, and he therefore works hard.

6. Constant feedback is given and this provides the groundwork for teachers to engage in
diagnostic teaching.

7. It promotes record keeping, which is an important aspect of the teaching and learning process.

8. Parents are provided with better and clearer pictures of their wards’ performance and
achievement in school over a period of time and learning experience.

WEAKNESSES OF CONTINUOUS ASSESSMENT


1. Continuous assessment brings about an increase in the workload of teachers.

2. To implement a continuous assessment programme, it is assumed that teachers have the requisite skills in test construction. However, most Ghanaian teachers lack the skills required for constructing tests, because most initial teacher training programmes do not make provision for a course in testing. Even where teachers have undergone a course of instruction in testing and assessment, few apply this knowledge in test construction.

3. In Ghana, one problem is the inadequacy of materials and equipment.

4. Continuous assessment, especially at the first and second cycle levels, means less dependence on an external examining body. This implies that the uniformity that goes with external written examinations, in the form of standard test items and scoring, is reduced to some extent. The fate of the individual student lies more in the hands of the classroom teacher. This situation generates fears, doubts and apprehensions in the minds of the public about the degree of fairness in assessing the achievement of students.
5. In the first and second cycle institutions in Ghana, certificates are based on performances and achievements in external examinations. This enables the certificates to have credibility, since efforts are made to maintain standards across years and test items. However, with continuous assessment, if schools award certificates based on the attainments of their own students, standards, and hence certificates, will vary from school to school. The credibility of certificates then becomes doubtful in most cases.

6. Another problem is that of supervision. Continuous assessment requires co-operation and co-
ordination at different levels. Close supervision is needed at all levels. Unfortunately,
supervisors in most cases who are heads of institutions are already laden with loads of work.
They are therefore not effective in their supervisory roles.

7. There is also an additional problem of record maintenance.

Role of the Ghanaian teacher

The Ministry of Education, as a matter of policy expects each teacher to:


(1) Give class assignments/exercises fortnightly and record the scores of four of them with a
maximum score of 10 each;
(2) Conduct three class tests in a term with a subtotal of 40;
(3) Give pupils at least four projects/homework in a term with a subtotal of 20.

The three assessments give a total score of 100, which is scaled down to 30% as the internal
mark for each pupil. The end of term examination is given 70%.

At the end of the junior and senior secondary schools, all the scores a pupil obtains are scaled to
30% and forwarded to the WAEC where 70% is obtained for external assessment.
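For illustration, this scaling can be expressed in a few lines of Python (a minimal sketch; the function name and sample scores below are invented):

```python
# A minimal sketch of the Ministry of Education scaling described above.
# Assumed inputs: four recorded class exercises (max 10 each), class tests
# with a subtotal out of 40, and projects/homework with a subtotal out of 20.

def internal_mark(exercise_scores, test_subtotal, project_subtotal):
    """Combine continuous-assessment scores (out of 100) and scale to 30."""
    assert len(exercise_scores) == 4 and all(0 <= s <= 10 for s in exercise_scores)
    raw_total = sum(exercise_scores) + test_subtotal + project_subtotal  # out of 100
    return raw_total * 30 / 100  # internal mark out of 30

# Example: 30/40 on exercises, 30/40 on tests, 15/20 on projects -> 75/100 -> 22.5/30
print(internal_mark([8, 7, 9, 6], 30, 15))  # 22.5
```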

For the policy to be successful, teachers are expected to perform the following roles.
1. The teacher must accept the philosophy of continuous assessment.
2. The teacher needs to be knowledgeable about continuous assessment.
3. At the beginning of each academic year and term (or semester), the teacher must make a
timetable for the assessments to be made.
4. The teacher must break the learning programme of the period of instruction into smaller,
specific and well-defined units.
5. The teacher must assess the learning outcomes and performances at the end of each unit of
instruction.
6. The teacher must spread the assessment over all areas of student’s behaviour. These are the
cognitive, affective and psychomotor domains.
7. The teacher must formulate measurable, specific and attainable instructional objectives for
each unit for instruction. This helps him to make his teaching more effective and meaningful.
8. The teacher must provide constant feedback. Class assignments and exercises, projects, tests and homework must be promptly scored and returned to the students.
9. The teacher must record all the assessment of the student in all the areas of learning and
instruction in the appropriate records. This must be done promptly at the end of each
measurement. The records must be well kept and maintained.
10. The teacher must be involved in remedial and individualised teaching.
11. The teacher must also engage in guidance and counselling. He must identify the weaknesses
and strengths of students in the various areas of learning. He should then use the information
to guide and counsel the student for his full personal development and growth as well as
preparing the student for his future career.
12. The teacher must engage in constant evaluation of himself and of the continuous assessment
programme. The scores obtained from the various assessments should be used to measure his
own performance and the effectiveness of his methods and techniques. He must also evaluate
the success of the programme regularly to identify the lapses and improve upon them.
SCHOOL-BASED ASSESSMENT (SBA)

The SBA replaced continuous assessment as the assessment practice in Ghanaian schools.

School Based Assessment

 A new School Based Assessment system (SBA), formerly referred to as Continuous Assessment, has been in use in Ghana as part of the new Educational Reforms starting September 2008.

 SBA is a very effective system for teaching and learning if carried out properly.

 The new SBA system is designed to provide schools with an internal assessment system

that will help schools to achieve the following purposes:

 Standardize the practice of internal school-based assessment in all schools in the country.

 Provide reduced assessment tasks for each of the primary school subjects.

 Provide teachers with guidelines for constructing assessment items/questions and other

assessment tasks.

 Introduce standards of achievement in each subject and in each class of the school

system.

 Provide guidance in marking and grading of test items/questions and other assessment

tasks.

 Introduce a system of moderation that will ensure accuracy and reliability of teachers’

marks.
 Provide teachers with advice on how to conduct remedial instruction on difficult areas of

the syllabus to improve pupil performance.

The marks for the SBA should together constitute the School Based Assessment component, marked out of 60. The emphasis is to improve students' learning by encouraging them to perform at a higher level. The SBA will hence consist of:

 End-of-month tests

 Homework assignments (specially designed for SBA)

 Project

 The SBA system will consist of 12 assessments a year instead of the 33 assessments in the previous continuous assessment system. This means a reduction of about 64% in the workload compared to the previous continuous assessment system.

 The 12 assessments are labelled Task 1 to Task 12.

 Tasks 1-4 will be administered in Term 1.

 Tasks 5-8 will be administered in Term 2.

 Tasks 9-12 will be administered in Term 3.

 Task 1 will be administered as an individual test coming at the end of the first month of

the term. The equivalent of Task 1 will be Task 5 (first individual test in second term)

and Task 9 (first individual test in third term).


 Task 2 (also task 6 and task 10 for second and third terms respectively) will be

administered as a Group Exercise and will consist of two or three instructional objectives

that the teacher considers difficult to teach and learn. The selected objectives could also

be those objectives considered very important and which therefore need pupils to put in

more practice. Task 2 will be administered at the end of the second month in the term.

 Task 3 (also Tasks 7 and 11 for the second and third terms respectively) will also be administered as an individual test under the supervision of the class teacher at the end of the 11th or 12th week of the term.

 Task 4 (and also Task 8 and Task 12) will be a project to be undertaken throughout the

term and submitted at the end of the term. Schools will be supplied with 9 project topics

divided into three topics for each term. A pupil is expected to select one project topic for

each term. Projects for the second term will be undertaken by teams of pupils as Group

Projects. Projects are intended to encourage pupils to apply knowledge and skills

acquired in the term to write an analytic or investigative paper, write a poem (as may be

required in English and Ghanaian Languages), use science and mathematics to solve a

problem or produce a physical three-dimensional product as may be required in Creative

Arts and in Natural Science.

Apart from the SBA, teachers are expected to use class exercises and home work as processes for

continually evaluating pupils’ class performance, and as a means for encouraging improvements

in learning performance.
End-of-Term Examination

The end-of-term examination is a summative assessment system and should consist of a sample

of the knowledge and skills pupils have acquired in the term. The end-of-term test for Term 3

should be composed of items/questions based on the specific objectives studied over the three

terms, using a different weighting system such as to reflect the importance of the work done in

each term in appropriate proportions. For example, a teacher may build an end of term 3 test in

such a way that it consists of 20% of the objectives studied in Term 1, 20% of the objectives studied in Term 2, and 60% of the objectives studied in Term 3.
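A minimal sketch of this 20/20/60 weighting, assuming a 40-item test (the function name and item total are invented):

```python
# Allocating items on an end-of-Term-3 test using the weighting above.
def allocate_items(total_items):
    weights = {"Term 1": 0.20, "Term 2": 0.20, "Term 3": 0.60}  # as in the example
    return {term: round(total_items * w) for term, w in weights.items()}

print(allocate_items(40))  # {'Term 1': 8, 'Term 2': 8, 'Term 3': 24}
```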

Combining SBA marks and End-of-Term Examination Marks

The new SBA system is important for raising pupils’ school performance. For this reason, the 60

marks for the SBA will be scaled to 50 in schools. The total marks for the end of term test will

also be scaled to 50 before adding the SBA marks and end-of-term examination marks to

determine pupils’ end of term results. The SBA and the end-of-term test marks will hence be

combined in equal proportions of 50:50. The equal proportions will affect only assessment in the

school system. It will not affect the SBA mark proportion of 30% used by WAEC for

determining examination results at the BECE.
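A minimal sketch of the 50:50 combination, assuming an SBA mark out of 60 and an examination marked out of 100 (names and sample figures, other than the 60 marks and 50:50 split given above, are invented):

```python
# Combine the SBA mark (out of 60) and the end-of-term examination mark
# in the equal 50:50 proportions described above.
def end_of_term_result(sba_mark, exam_mark, exam_total=100):
    sba_scaled = sba_mark * 50 / 60            # SBA's 60 marks scaled to 50
    exam_scaled = exam_mark * 50 / exam_total  # exam marks scaled to 50
    return sba_scaled + exam_scaled            # final result out of 100

print(end_of_term_result(48, 70))  # 40.0 + 35.0 = 75.0
```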


Grading Procedure

To improve assessment and grading and also introduce uniformity in schools, it is recommended

that schools adopt the following grade boundaries for assigning grades:

Grade A: 80 - 100% - Excellent

Grade B: 70 - 79% - Very Good

Grade C: 60 - 69% - Good

Grade D: 45 - 59% - Credit (Satisfactory)

Grade E: 35 - 44% - Pass

Grade F: ≤ 34% - Fail

The grading system presented above shows the letter grade system and equivalent grade

boundaries. In assigning grades to pupils’ test results, or any form of evaluation, you may apply

the above grade boundaries and the descriptors. The descriptors (Excellent, Very Good, etc.)

indicate the meaning of each grade. For instance, the grade boundary for “Excellent” consists of

scores between 80 - 100. Writing “80%” for instance, without writing the meaning of the grade,

or the descriptor for the grade i.e. “Excellent”, does not provide the pupil with enough

information to evaluate his/her performance in the assessment. You therefore have to write the

meaning of the grade alongside the score you write. Apart from the score and the grade

descriptor, it will be important also to write a short diagnosis of the points the pupil should

consider in order to do better in future tests etc. Comments such as the following may also be

added to the grades:

Keep it up
Has improved

Could do better

Hardworking

Not serious in class

More room for improvement, etc.

Note that the grade boundaries above are also referred to as grade cut-off scores. When you

adopt a fixed cut-off score grading system as in this example, you are using the criterion-

referenced grading system. By this system a pupil must make a specified score to earn the

appropriate grade. This system of grading challenges pupils to study harder to earn better grades.

It is hence very useful for achievement testing and grading.
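A sketch of this criterion-referenced grading in Python, using the recommended boundaries (the function name is invented):

```python
# Criterion-referenced grading with the recommended fixed cut-off scores.
def grade(score):
    """Return the letter grade and descriptor for a percentage score."""
    bands = [(80, "A", "Excellent"), (70, "B", "Very Good"), (60, "C", "Good"),
             (45, "D", "Credit (Satisfactory)"), (35, "E", "Pass"), (0, "F", "Fail")]
    for cutoff, letter, descriptor in bands:
        if score >= cutoff:
            return letter, descriptor

print(grade(83))  # ('A', 'Excellent')
print(grade(52))  # ('D', 'Credit (Satisfactory)')
```

Returning the descriptor alongside the letter makes it easy to write both next to the score, as recommended above.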


UNIT 2

GOALS AND LEARNING OBJECTIVES OF INSTRUCTION

Educational Goals

 An educational goal is a very general statement of what students will know and be

able to do.

 They are those human activities which can be acquired through learning and

contribute to the functioning of a society.

 In the school system, educational goals are listed as defining the mission of the

system.

 Some examples of educational goals are that students will be able to:

i. Know how to think critically and solve problems

ii. Work collaboratively with others

iii. Appreciate cultural differences

iv. Develop an appreciation for fine arts

v. Learn to think independently

vi. Become good citizens

Educational Outcome

 Once we have educational goals, there are educational outcomes.

 In very simple terms, educational outcomes are the products or end results of learning

experiences.

 Examples of Educational outcomes include:


i. Knowledge

ii. Understanding

iii. Application of knowledge and understanding to situations

iv. Thinking skills

v. General skills

vi. Attitudes

vii. Interests

viii. Appreciation and

ix. Adjustment

These outcomes, similar to educational goals, are broad. Each outcome can thus take several forms.

Learning Outcome

 A learning outcome is the particular knowledge, skill, or behaviour that a student is

expected to exhibit after a period of study.

 Learning outcomes reflect a nation’s concern with the level of knowledge acquisition

among its student population.

 Measuring learning outcomes provides information on what particular knowledge

(cognitive), skill or behaviour (affective) students have gained after instruction is

completed.

 They are typically measured by administering assessments at sub-national, national,

regional and international levels.


DEFINITION OF TERMS

Instructional Objective: An intended learning outcome in terms of the types of performance

students are able to demonstrate at the end of instruction to show that they have learned what

was expected of them. Example, “by the end of the lesson, students should be able to define

the term, taxonomy”.

Behavioural objectives: A statement that specifies what observable performance the learner

should be engaged in when the achievement of the objective is evaluated. Behavioural

objectives require action verbs such as discuss, write, read, state.

Learning objectives: These specify what the teacher would like the students to do, value, or feel at the completion of an instructional segment.

Importance Of Learning Objectives (Targets) For Classroom Assessment

1. Learning objectives make the general planning for an assessment procedure easier through

the knowledge of specific outcomes.

2. The selection, designing and construction of assessment instruments depend on knowing

which specific outcome should be assessed.

3. Evaluating an existing assessment instrument becomes easier when specific outcomes are

known.

4. They help to judge the content relevance of an assessment procedure. Specific learning

outcomes provide information for the judgment.


TAXONOMIES OF EDUCATIONAL OBJECTIVES

 Taxonomies are hierarchical schemes for classifying learning objectives into

various levels of complexity.

 There are three main domains of educational objectives.

Namely;

 Cognitive

 Affective

 Psychomotor

Cognitive domain objectives: produce outcomes that focus on knowledge and abilities

requiring memory, thinking, and reasoning processes.

Affective domain objectives: produce outcomes that focus on feelings, interests, attitudes,

dispositions and emotional states.

Psychomotor domain objectives: produce outcomes that focus on motor skills and perceptual

processes.

The Cognitive domain

 This domain was developed by Benjamin Bloom in 1956, hence it is known as

Bloom’s taxonomy of educational objectives.

 The taxonomy classifies educational objectives into 6 main headings.

i. Knowledge. This involves the recall of specific facts, methods and processes. It is often defined as the remembering of previously learned material. Illustrative verbs such as define, identify and label call for the student's knowledge.
ii. Comprehension. It is the ability to grasp the meaning of material. It is shown by

translating material from one form to another, or by interpreting material (explaining

or summarizing). Illustrative verbs include convert, explain, summarize

iii. Application. This refers to the ability to use learned material in new and concrete

situations. This includes the application of such things as rules, methods, concepts,

principles etc. Illustrative verbs include change, compute, and prepare.

iv. Analysis. This is the ability to break down material into its component parts so that

its organizational structure may be understood. This includes the identification of

parts, analysis of the relationships between parts etc. Illustrative verbs include break

down, differentiate, illustrate

v. Synthesis. This refers to the ability to put parts together to form a new whole. This

may involve the production of a unique communication, or a plan of operations.

Illustrative verbs include categorize, combine, organize

vi. Evaluation. This is the ability to judge the value of material (e.g. novel, poem, and

research report) for a given purpose. The judgments are based on definite criteria.

Illustrative verbs include appraise, contrast, support.

Revised Bloom’s Taxonomy

1. Remember: Retrieve relevant knowledge from long-term memory.
 Recognizing (identifying)
 Recalling (retrieving)

2. Understand: Construct meaning from instructional messages, including oral, written, and graphic communication.
 Interpreting (clarifying, paraphrasing, representing, translating)
 Exemplifying (illustrating, instantiating)
 Classifying (categorizing, subsuming)
 Summarizing (abstracting, generalizing)
 Inferring (concluding, extrapolating, interpolating, predicting)
 Comparing (contrasting, mapping, matching)
 Explaining (constructing models)

3. Apply: Carry out or use a procedure in a given situation.
 Executing (carrying out)
 Implementing (using)

4. Analyse: Break material into its constituent parts and determine how the parts relate to one another and to an overall structure or purpose.
 Differentiating (discriminating, distinguishing, focusing, selecting)
 Organizing (finding coherence, integrating, outlining, parsing, structuring)
 Attributing (deconstructing)

5. Evaluate: Make judgments based on criteria and standards.
 Checking (coordinating, detecting, monitoring, testing)
 Critiquing (judging)

6. Create: Put elements together to form a coherent or functional whole; reorganize elements into a new pattern or structure.
 Generating (hypothesizing)
 Planning (designing)
 Producing (constructing)
Continuation Of Taxonomies Of Educational Objectives

Quellmalz (1985) also proposed a cognitive taxonomy, which has five categories. These are:

i. Recall. This requires that students recognize or remember key facts, definitions,

concepts, rules, and principles. They require students to repeat verbatim or to

paraphrase given information. e.g. Who wrote the story?

ii. Analysis. Students divide a whole into component elements, e.g. What are the

different parts of the story?

iii. Comparison. This requires students to recognize or explain similarities and

differences. e.g. How was this story like the last one?

iv. Inference. Students are either given a generalization and required to recognize supporting evidence or details, or given evidence and details and required to come up with the generalization. e.g. What might be a good title for the story?

v. Evaluation. Students are required to judge quality, credibility, worth, or practicality.

e.g. Is this a good story?

The Affective domain

This was developed by David Krathwohl, Benjamin Bloom and Masia in 1964. They classified

educational objectives in the affective domain into 5 categories.

i. Receiving: It is the lowest level of learning outcomes in the affective domain. It is

the willingness of a student/pupil to attend to particular phenomena/stimuli (e.g.


classroom activities, reading textbook or library books, doing class assignments etc.).

Examples of general instructional objectives include; listen attentively, and attends

closely to the classroom activities. Illustrative verbs that are used include asks,

chooses, follows, gives, holds, names.

ii. Responding: It is the active participation of a student/pupil in given activities. The

student/pupil does not only attend to particular stimuli but also reacts to it in some

way. The student/pupil may read an assigned material or does an assignment or

project. Examples of general instructional objectives include; completes assigned

homework, obeys school rules and regulations. Illustrative verbs that are used include

answers, assists, complies, conforms, discusses, greets, practices, writes.

iii. Valuing: It is concerned with the value, worth or importance a student/pupil attaches

to a particular object, phenomenon or behaviour. The value ranges from a simple

acceptance of a value to a more complex level of commitment. Examples of general

instructional objectives include; shows concern for the welfare of others, appreciates

the role of science in everyday life. Illustrative verbs include completes, describes,

differentiates, explains, follows, initiates, invites, joins, reads.

iv. Organization: It is the ability to bring together different values, resolving conflicts between them, and beginning to build an internally consistent value system.

Students/pupils begin to develop philosophies of life. Examples of general

instructional objectives include; accepts responsibility for own behaviour,

understands and accepts own strengths and weaknesses. Illustrative verbs include

adheres, alters, arranges, combines, compares, completes, defends.


v. Characterization by a value or value complex: This is the highest level in the

affective domain. At this level, the individual student/pupil has a value system that

has controlled his/her behaviour for a sufficiently long time for him/her to have

developed a characteristic lifestyle. Examples of general instructional objectives

include; practices cooperation in group activities, maintains good study habits.

Illustrative verbs include acts, discriminate, displays, influences, listens, modifies,

performs, practices, proposes, qualifies.

The Psychomotor domain

Simpson (1972) and Harrow (1972) developed categories in this domain. Simpson produced 7

categories while Harrow had 6 categories.

Simpson’s categories

1. Perception. This is the lowest level. It is the ability to use the sense organs to

obtain cues that guide motor activity. For example relating the sound of drums to the

dance type. Illustrative verbs include, choose, describe, detect, identify.

2. Set. It is the readiness to take a particular type of action. Demonstrating a proper

position to save a penalty kick in a soccer game. Illustrative verbs include, begin,

displays, explains, shows, starts.

3. Guided response. It involves the early stages in learning a complex skill. For

example, starting a car while beginning to learn how to drive. Illustrative verbs

include, assemble, build, construct, display.


4. Mechanism. This occurs when a learned activity has become habitual and

movements are performed with confidence and proficiency. For example, typing,

operating a video recorder. Illustrative verbs include sketch, fix, fasten, dissect,

assemble.

5. Complex Overt Response. It is the ability to perform complex acts. For example, driving an articulated truck or performing skilfully on the piano. Illustrative verbs include assemble, build, construct, organize.

6. Adaptation. It is the ability to modify movement patterns from well-developed

skills to fit special requirements or situations. For example modify piano rhythms to

suit local songs. Illustrative verbs include adapt, alter, change, reorganize.

7. Origination. This is the highest level. It involves the ability to create new

movement patterns to meet a specific need or particular problem. Creativity and

originality are emphasized. For example, designing new computer software or creating a new musical dance. Illustrative verbs include arrange, create, design, originate.

Harrow’s (1972) categories

1. Reflex movements are actions elicited without learning in response to some stimuli.

Examples include: flexion, extension, stretch, postural adjustments.

2. Basic fundamental movements are inherent movement patterns which are formed by

combining of reflex movements and are the basis for complex skilled movements. Examples

are: walking, running, pushing, twisting, gripping, grasping, manipulating.

3. Perceptual abilities refer to the interpretation of various stimuli that enables one to make adjustments to the environment (visual, auditory, kinesthetic, or tactile discrimination). They suggest cognitive as well as psychomotor behavior. Examples include coordinated movements such as jumping rope, punting, or catching.

4. Physical activities require endurance, strength, vigor, and agility, which produce a sound, efficiently functioning body. Examples are: all activities which require a) strenuous effort for

long periods of time; b) muscular exertion; c) a quick, wide range of motion at the hip joints;

and d) quick, precise movements.

5. Skilled movements are the result of the acquisition of a degree of efficiency when

performing a complex task. Examples are: all skilled activities obvious in sports, recreation,

and dance.

6. Non-discursive communication is communication through bodily movements ranging from

facial expressions through sophisticated choreographies. Examples include: body postures,

gestures, and facial expressions efficiently executed in skilled dance movement and

choreographies.
UNIT 3

TEST VALIDITY (CHARACTERISTICS OF TESTS I)

Nitko (1996, p. 36) defined validity as the “soundness of the interpretations and use of students’

assessment results”. Validity emphasizes the interpretations and use of the results and not the

test instrument.

Evidence needs to be provided that the interpretations and use are appropriate.

NATURE OF VALIDITY

In using the term validity in relation to testing and assessment, five points have to be

borne in mind.

 Validity refers to the appropriateness of the interpretations of the results of an assessment procedure for a group of individuals. It does not refer to the procedure or instrument itself.

 Validity is a matter of degree. Assessment results may have high, moderate or low validity.

Results have different degrees of validity for different purposes and for different situations.

 Validity is always specific to some particular use or interpretation. No assessment is valid

for all purposes.

 Validity is a unitary concept that is based on various kinds of evidence.

 Validity involves an overall evaluative judgment. Several types of validity evidence should

be studied and combined.


PRINCIPLES FOR VALIDATION

There are four principles that help a test user/giver to decide the degree to which his/her

assessments results are valid.

1. The interpretations (meanings) given to students’ assessment results are valid only to the

degree that evidence can be produced to support their appropriateness.

2. The uses made of assessment results are valid only to the degree that evidence can be

produced to support their appropriateness and correctness.

3. The interpretations and uses of assessment results are valid only when the educational and

social values implied by them are appropriate.

4. The interpretations and uses made of assessment results are valid only when the

consequences of these interpretations and uses are consistent with appropriate values.

CATEGORIES OF VALIDITY EVIDENCE

There are 3 major categories of validity evidence.

1. Content-related evidence

 This type of evidence refers to the content representativeness and relevance of tasks

or items on an instrument.

 Judgments of content representativeness focus on whether the assessment tasks are a

representative sample from a larger domain of performance.


 Judgments of content relevance focus on whether the assessment tasks are included

in the test user’s domain definition when standardized tests are used.

Content-related evidence answers questions like:

i. How well do the assessment tasks represent the domain of important content?

ii. How well do the assessment tasks represent the curriculum as defined?

iii. How well do the assessment tasks reflect current thinking about what should be taught

and assessed?

iv. Are the assessment tasks worthy of being learned?

To obtain answers to these questions, a description of the curriculum and content to be learned (or already learned) is obtained. Each assessment task is checked to see if it matches important content and

learning outcomes. Each assessment task is rated for its relevance, importance, accuracy and

meaningfulness.

One important way to ascertain content-related validity is to inspect the table of specifications (illustrated below).
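For illustration, a hypothetical table of specifications for a 20-item test (the content areas, cognitive levels and item counts are invented for this example) might look like this:

Content area    Knowledge   Comprehension   Application   Total
Fractions           2             2              2           6
Decimals            2             2              2           6
Percentages         3             3              2           8
Total               7             7              6          20

Each cell shows the number of items planned; comparing the completed test against such a grid shows whether the items sample the content domain and the intended learning outcomes in the planned proportions.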

2. Criterion-related evidence

 This type of evidence pertains to the empirical technique of studying the relationship

between the test scores or some other measures (predictors) and some independent

external measures (criteria) such as intelligence test scores and university grade point

average.
 Criterion-related evidence answers the question, ‘How well the results of an assessment

can be used to infer or predict an individual’s standing on one or more outcomes other

than the assessment procedure itself’. The outcome is called the criterion.

 There are two types of criterion-related evidence. These are concurrent validity and

predictive validity.

 Concurrent validity evidence refers to the extent to which individuals’ current status on

a criterion can be estimated from their current performance on an assessment instrument.

For concurrent validity, data are collected at approximately the same time and the

purpose is to substitute the assessment result for the score of a related variable, e.g. a written test of swimming ability versus actual swimming performance scored by a rater.

 Predictive validity evidence refers to extent to which individuals’ future performance on

a criterion can be predicted from their prior performance on an assessment instrument.

For predictive validity, data are collected at different times. Scores on the predictor

variable are collected prior to the scores on the criterion variable. The purpose is to

predict the future performance of a criterion variable. e.g. Using WASSCE results to

predict the first year GPA in the University of Cape Coast.

Criterion-related validation is determined by the coefficient of correlation between the assessment result and the criterion. The correlation coefficient is a statistical index that quantifies the degree of relationship between the scores from one assessment and the scores from another. This coefficient is often called the validity coefficient and takes values from –1.0 to +1.0.
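As an illustrative sketch, a validity coefficient can be computed as the Pearson correlation between predictor and criterion scores; the data below are invented:

```python
# Pearson correlation between a predictor (e.g. WASSCE aggregate) and a
# criterion (e.g. first-year GPA); the figures below are made up.
from statistics import correlation  # available from Python 3.10

predictor = [62, 70, 55, 80, 68, 75]        # assessment results
criterion = [2.8, 3.2, 2.5, 3.7, 3.0, 3.4]  # external criterion measure

# The validity coefficient lies between -1.0 and +1.0.
print(round(correlation(predictor, criterion), 2))
```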
3. Construct-related evidence: This type of evidence refers to how well the assessment results

can be interpreted as reflecting an individuals’ status regarding an educational or

psychological trait, attribute or mental process. Examples of constructs are mathematical

reasoning, reading comprehension, creativity, honesty and sociability.

Factors affecting validity

1. Unclear directions. Validity is reduced if students do not clearly understand how to respond to the items, how to record their responses, or how much time is available.

2. Reading vocabulary and sentence structure that are too difficult tend to reduce validity. The assessment may end up measuring reading comprehension, which is not what is intended to be measured.

3. Ambiguous statements in assessment tasks and items. This confuses students and makes

way for different interpretations thus reducing validity.

4. Inadequate time limits. This does not provide students with enough time to respond and thus

may perform below their level of achievement. This reduces validity.

5. Inappropriate level of difficulty of the test items. Items that are too easy or too difficult do

not provide high validity.

6. Poorly constructed test items. These items may provide unintentional clues which may

cause students to perform above their actual level of achievement. This lowers validity.

7. Test items being inappropriate for the outcomes being measured lowers validity.

8. Test being too short. If a test is too short, it does not provide a representative sample of the performance of interest, and this lowers validity.


9. Improper arrangement of items. Placing difficult items at the beginning of the test may put some students off and unsettle them, causing them to perform below their actual level of performance, thus reducing validity.

10. Identifiable pattern of answers. Placing the answers to items such as multiple-choice and true/false types in an identifiable pattern enables students to guess the correct answers more easily, and this lowers validity.

11. Cheating. When students cheat by copying answers or helping their friends with answers to

test items, validity is reduced.

12. Unreliable scoring. Inconsistent scoring of test items, especially essay tests, lowers reliability and consequently validity.

13. Student emotional disturbances. These interfere with their performance thus reducing

validity.

14. Fear of the assessment situation. Students can be frightened by the assessment situation and

are unable to perform normally. This reduces their actual level of performance and

consequently, lowers validity.


UNIT 4

TEST RELIABILITY (CHARACTERISTICS OF TESTS II)

Definition

Reliability is the degree of consistency of assessment results. It is the degree to which

assessment results are the same when;

 the same tasks are completed on two different occasions

 different but equivalent tasks are completed on the same or different occasions

 two or more raters mark performance on the same tasks.

NATURE OF RELIABILITY

Note the following about reliability;

 Reliability refers to the results obtained with an assessment instrument and not to the

instrument itself.

 An estimate of reliability refers to a particular type of consistency.

 Reliability is a necessary condition but not a sufficient condition for validity.

 Reliability is primarily statistical. It is determined by the reliability coefficient, which is

defined as a correlation coefficient that indicates the degree of relationship between two

sets of scores intended to be measures of the same characteristic. It ranges from 0.0 to 1.0.
Definition of terms:

Obtained (Observed) score (X): Actual scores obtained in a test or assessment.

Error score (E): The amount of error in an obtained score.

True score (T): The difference between the obtained and the error scores. It is the portion of the

observed score that is not affected by error. An estimate of the true score of a student is the

mean score obtained after repeated assessments under the same conditions.

X = T + E

Reliability can be defined theoretically as the ratio of the true score variance to the observed score variance, i.e. r_xx = s_T^2 / s_X^2.

Standard Error of Measurement (SEM): It is a measure of the variation within individuals on a test. It is an estimate of the standard deviation of the errors of measurement. It is obtained by the formula SEM = S_X √(1 − r_xx), where S_X is the standard deviation of the obtained scores and r_xx is the reliability coefficient. For example, given that r_xx = 0.8 and S_X = 4.0:

SEM = 4 √(1 − 0.8) = 4 √0.2 = 4 × 0.447 ≈ 1.79

Interpreting standard errors of measurement

The SEM estimates the amount by which a student's obtained score is likely to deviate from her/his true score. For example, an SEM of 4 indicates that a student's obtained score is likely to lie within 4 points above or below the true score, so an obtained score of 75 suggests a true score between 71 and 79; the range 71-79 therefore provides a confidence band for interpreting the obtained score. A small standard error of measurement indicates high reliability, providing greater confidence that the obtained score is near the true score.
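A short Python sketch mirroring the worked example and the confidence band above:

```python
import math

def sem(sd, reliability):
    """Standard error of measurement: SEM = SD * sqrt(1 - r_xx)."""
    return sd * math.sqrt(1 - reliability)

print(round(sem(4.0, 0.8), 2))  # 1.79, as in the worked example

# Confidence band for an obtained score of 75 when SEM = 4:
obtained, s = 75, 4
print(obtained - s, obtained + s)  # 71 79
```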

Reliability coefficient: A correlation coefficient that indicates the degree of relationship

between two sets of scores intended to be measures of the same characteristic (e.g. correlation

between scores assigned by two different raters or scores obtained from administration of two

forms of a test).

Methods Of Estimating Reliability

1. Test-retest method. With this method, the same test is given to a group of students twice on

different occasions ranging from several minutes to years. The scores on the two

administrations are correlated (compared) and the result is the estimate of the reliability of

the test. The time interval should be reasonable, neither too short nor too long. This is a

measure of the stability of scores over a period of time.

2. Equivalent forms method. Two test forms, which are alternate or parallel with the same

content and level of difficulty for each item, are administered to the same group of students.

The forms may be given on the same or nearly the same occasion or a time interval will

elapse before the second form is given. The scores on the two administrations are correlated

and the result is the estimate of the reliability of the test.

3. Split-half method. This is a measure of internal consistency. A single test is given to the

students. The test is then divided into two halves for scoring. The two scores for each

student are correlated to obtain the estimate of reliability. The test can be split into two
halves in several ways. These include using (i) odd-even numbered items, and (ii) first half-second half (a code sketch follows this list).

4. Inter-rater reliability. Two raters (scorers) each score a student’s paper. The two scores for

all the students are correlated. This estimate of reliability is called scorer reliability or inter-

rater reliability. It is an index of the extent to which the raters were consistent in rating the

same students.
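As an illustration of the split-half method, the sketch below correlates odd- and even-item scores. Stepping the half-test correlation up to full test length with the Spearman-Brown formula is common practice, though that formula is not given in this unit; the data are invented.

```python
from statistics import correlation

# One row per student: item scores (1 = correct, 0 = wrong); made-up data.
responses = [
    [1, 1, 0, 1, 1, 0, 1, 1],
    [0, 1, 0, 0, 1, 0, 0, 1],
    [1, 1, 1, 1, 1, 1, 1, 0],
    [0, 0, 0, 1, 0, 0, 1, 0],
    [1, 0, 1, 1, 1, 1, 0, 1],
]

odd_half = [sum(r[0::2]) for r in responses]   # items 1, 3, 5, 7
even_half = [sum(r[1::2]) for r in responses]  # items 2, 4, 6, 8

r_half = correlation(odd_half, even_half)
r_full = 2 * r_half / (1 + r_half)  # Spearman-Brown step-up (assumed, see above)
print(round(r_half, 2), round(r_full, 2))  # 0.88 0.93 for this sample data
```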

Factors influencing reliability

1. Test length. Longer tests give more reliable scores. A test consisting of 40 items will give a

more reliable score than a test consisting of 25 items.

2. Group variability. The more heterogeneous the group, the higher the reliability. The

narrower the range of a group’s ability, the lower the reliability.

3. Difficulty of items. Too difficult or too easy items produce little variation in the test scores.

This in turn lowers reliability.

4. Scoring objectivity. Subjectively scored items result in lower reliability. More objectively scored assessment results are more reliable. For subjectively-scored items, multiple markers are preferred.

5. Speed. Tests, where most students do not complete the items due to inadequate allocation of

time, result in lower reliability. Sufficient time should be provided to students to respond to

the items.

6. Sole marking. Using multiple markers improves the reliability of the assessment results. A

single person grading may lead to low reliability especially of essay tests, term papers, and

performances. Averaging the results of several markers increases reliability.


7. Testing conditions. Where test administrators do not adhere strictly to uniform test regulations and practices, students' scores may not represent their actual levels of performance, and this tends to reduce reliability. The issue is of particular concern when the test-retest method of estimating reliability is used.


UNIT 5

PLANNING ACHIEVEMENT TESTS AND ASSESSMENTS

Definition

Achievement tests are tests that measure the extent of present knowledge and skills. In

achievement testing, test takers are given the opportunity to demonstrate their acquired

knowledge and skills in specific learning situations.

TYPES OF ACHIEVEMENT TESTS

There are two types of achievement tests. These are;

 Standardized achievement tests

 Teacher-made/classroom achievement tests.

The major difference between these two types of tests is that standardized achievement tests

are carefully constructed by test experts with specific directions for administering and scoring

the tests. This makes it possible for standardized achievement tests to be administered to

individuals in different places often at the same time.

CHARACTERISTICS OF STANDARDIZED ACHIEVEMENT TESTS

 Standardized, specific instructions are provided for test administration and scoring. Directions are so precise and uniform that the procedures are standard for different users of the test.
 The test items are developed by test experts and specialists who follow well-formulated

procedures for test development.

 The tests often have high quality. Reliability is often over 0.90.

 They use test norms which are based on national samples of students in the classes/forms

where the tests are intended for use.

 Test content is determined by curriculum and subject-matter experts and involves

extensive investigations of existing syllabi, textbooks and programs.

 Equivalent and comparable forms of the test are usually provided and administered.

 A test manual is available as a guide for test administration and scoring. It provides

information for evaluating the test for technical quality and interpretation and use of the

results.

Teacher-made/classroom achievement tests

 These tests are constructed by classroom teachers for specific uses in each classroom and

are closely related to particular objectives.

 They are usually tailored to fit the teacher’s instructional objectives.

 The content of the test is determined by the classroom teacher.

 The quality of the test is often unknown but is usually lower than that of standardized tests.

STAGES IN TEACHER MADE/CLASSROOM ACHIEVEMENT TESTING


Four principal stages are involved in classroom testing. These are:
1. Constructing the test,
2. Administering the test,
3. Scoring the test,
4. Analysing the test results.
Stage 1: Constructing the test

There are eight (8) steps in the construction of a good classroom test.

Step 1. Define the purpose of the test

 The basic question to answer is, “Why am I testing?” The tests must be related to the

teacher’s classroom instructional objectives.

 The teacher has to answer other questions such as ‘Why is the test being given at this

time in the course?’, ‘Who will take the test?’, ‘Have the test takers been informed?’,

‘How will the scores be used?’

Step 2. Determine the item format to use

 Test items could either be essay, objective or performance types.

 Objective-type tests include multiple-choice, short-answer, matching and true and false.

The choice of format must be appropriate for testing particular topics and objectives. It is

sometimes necessary to use more than one format in a single test.

 Mehrens and Lehmann (1991) mentioned 8 factors to consider in the choice of the

appropriate format. These include:

(1) the purpose of the test, (2) the time available to prepare and score the test,

(3) the number of students to be tested, (4) skill to be tested, (5) difficulty desired, (6) physical

facilities like reproduction materials, (7) age of pupils, (8) skills in writing the different types of

items.
Step 3. Determine what is to be tested.

The teacher asks himself or herself the question, 'What is it that I wish to measure?' The teacher has to determine what chapters or units the test will cover as well as what knowledge, skills and attitudes to measure. A test plan called a table of specifications or blueprint must be made. The specification table matches the course content with the instructional objectives.

To prepare the table, specific topics and sub-topics covered during the instructional period

are listed. The major course objectives are also specified and the instructional objectives

defined. The total number of test items is then distributed among the course content and

instructional objectives or behaviours.


Example: Table of specifications for a Mathematics test

Content \ Behaviour   Knowledge  Comprehension  Application  Analysis  Synthesis  Evaluation  Total
Sets                      1            1             1                                           3
Indices                   1            1             1                                           3
Angles                    1            1             1                                           3
Polygons                               2             1           1                               4
Factorization                          2                         1                               3
Number plane              1                          1           1          1                    4
Total                     4            7             5           3          1          0        20
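Where it helps, the blueprint can be kept as a small matrix and its totals checked mechanically. A minimal Python sketch with a hypothetical three-topic blueprint (the topics and counts are invented for illustration):

    # Hypothetical table of specifications: rows are topics, columns are behaviours.
    behaviours = ["Knowledge", "Comprehension", "Application", "Analysis"]
    blueprint = {
        "Sets":    [1, 1, 1, 0],
        "Indices": [0, 2, 1, 1],
        "Angles":  [1, 1, 0, 2],
    }

    row_totals = {topic: sum(cells) for topic, cells in blueprint.items()}
    col_totals = [sum(col) for col in zip(*blueprint.values())]
    grand_total = sum(row_totals.values())
    assert grand_total == sum(col_totals)   # the item distribution is consistent
    print(row_totals, dict(zip(behaviours, col_totals)), grand_total)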

The table of specifications has a number of advantages.

1. It makes sure that justice is done to all the topics covered in the course.

2. It helps the teacher to determine the content validity of the test.

3. It helps to weight the score distribution fairly

4. It avoids overlapping in the construction of the test items.

5. It helps students to determine the content and behavioural areas where they have difficulty.

Teachers can also determine areas where the class has difficulty.

Step 4. Write the individual items.

In writing the individual items (questions), the following general guidelines must be considered.

1. Keep the table of specifications before you and continually refer to it as you write the

items.

2. Items must match the instructional objectives.

3. Formulate well-defined items that are not vague or ambiguous; items should be grammatically correct and free from spelling and typing errors.

4. Avoid excessive verbiage. Avoid needlessly complex sentences.


5. The item should be based on information that the examinee should know.

6. Write the test items simply and clearly.

7. Prepare more items than you will actually need.

8. The task to be performed and the type of answers required should be clearly defined.

9. Include questions of varying difficulty.

10. Write the items and the key as soon as possible after the material has been taught.

11. Avoid textbook or stereotyped language

12. Write the items in advance of the test date to permit reviews and editing.

Step 5. Review the items.

 Carefully examine each item at least a week after writing the item.

 Items that are ambiguous and those poorly constructed as well as items that do not match

the objectives must be reworded or removed.

 Items must not be too difficult or too easy.

 Check the length of the test (i.e. number of items against the purpose, the kinds of test

items used and the ability level of the students).

 Assemble the test in the final form for administration.

Step 6. Prepare scoring key

 Prepare a scoring key or marking scheme while the items are fresh in your mind.

 List correct responses and acceptable variations for objective-type tests.

 Assign points to the various expected qualities of responses.


Step 7. Write directions (Instructions).

 Give clear and concise directions for the entire test as well as sections of the test.

 Clearly state the time limit for the test.

 Penalties for undesirable writing must be spelt out.

 Directions must include the number of items to respond to, how and where the answers will be written, the amount of time available, credit for orderly presentation of material (where necessary), and the mode of identification of respondents. For selection-type tests, indicate how guessing will be treated.

Step 8. Evaluate the test.

Before administration, the test should be evaluated by the following five criteria: clarity,

validity, practicality, efficiency and fairness.

Clarity: Who is being tested?

What material is the test measuring?

What kinds of knowledge is the test measuring?

Do the test items relate to content and course objectives?

Are the test items simple and clear?

Validity: Is the test a representative sampling of the material

presented in the chapter, unit, section or course?

Does the test faithfully reflect the level of difficulty of


material covered in the class?

Practicality: Will students have enough time to complete the test?

Are there sufficient materials available to present the test and complete it effectively?

Efficiency: Is this the best way to test for the desired knowledge, skill or

attitude?

What problems might arise due to material difficulties or

shortage?

Fairness: Were the students given advance notice?

Have I adequately prepared students for the test?

Do the students understand the testing procedures?

How will the scores affect the students’ lives?


UNIT 6
TYPES OF TEST ITEMS

There are two major types of test items. These are the essay-type tests and objective-type tests.

ESSAY TESTS                                          OBJECTIVE TESTS

1. Requires students to plan their own answers       1. Requires students to choose among several
   and to express them in their own words.              designated alternatives or write a short answer.

2. Consists of relatively few items that call        2. Consists of many items requiring only
   for extended answers.                                brief answers.

3. A lot of time is spent by students in thinking    3. A lot of time is spent by students in reading
   and writing when taking the test.                    and thinking when taking the test.

4. Quality of the test is determined largely by      4. Quality of the test is determined largely by
   the skill of the test scorer.                        the skill of the test constructor.

5. Relatively easy to prepare but rather tedious     5. Relatively tedious and difficult to prepare
   and difficult to score.                              but rather easy to score.

6. Permits and encourages bluffing.                  6. Permits and encourages guessing.

7. Affords both the student and the teacher the      7. Affords only the test constructor (teacher)
   opportunity to be individualistic.                   the opportunity to be individualistic.

8. Score distribution varies from one scorer         8. Score distribution is determined largely
   to another.                                          by the test.

9. Less amenable to item and statistical analysis.   9. Amenable to item and statistical analysis.

10. Scoring is subjective.                           10. Scoring is highly objective.

11. Content validity is low.                         11. Content validity is high.

12. Reliability of test scores is low.               12. Reliability of test scores could be high.
OBJECTIVE TEST ITEMS I
Description

An objective test requires a respondent to provide a brief response, usually not more than a sentence long. Such tests normally consist of a large number of items, and the responses are scored objectively.

TYPES OF OBJECTIVE TESTS
 The selection type. Examples are multiple-choice type, true and false type and matching
type.
 The supply type. Examples are completion, fill-in-the-blanks and short-answer.

Strengths and advantages of objective-type tests


1. Scoring is easy and objective.
2. They allow an extensive coverage of subject content.
3. They do not provide opportunities for bluffing.
4. They are best suited for measuring lower-level behaviours like knowledge and
comprehension.
5. They provide economy of time in scoring.
6. Student writing is minimized. Premium is not placed on writing.
7. They are amenable to item and statistical analysis.
8. Scores are not affected by extraneous factors such as the likes and dislikes of the scorer.

Weaknesses and disadvantages


1. They are relatively difficult to construct.
2. Item writing is time consuming.
3. They are susceptible to guessing.
4. Higher-order mental processes like analysis, synthesis and evaluation are difficult to
measure.
5. They place a premium on students' reading ability.

MULTIPLE-CHOICE TESTS
Description

 A multiple-choice test is a type of objective test in which the respondent is given a stem (question) and is to select, from among three or more alternatives (options or responses), the one that best completes or answers the stem.
 The incorrect options are called foils or distracters.
Types of multiple-choice tests
 Single ‘correct’ or 'best response' type
The single ‘correct’ or ‘best response’ type consists of a stem followed by three or
more responses and the respondent is to select only one option to complete the stem.

Examples:
Single correct response

Write 0.039387 as a decimal correct to 3 significant figures.

A. 0.394
B. 0.0394
C. 0.0393
D. 0.039

Single best response

In which one of the following sites would you, as a community health worker, advise a
community to dispose refuse?
A. A compost pit
B. An abandoned well
C. An incinerator
D. An uncultivated land

 Multiple response type
This consists of a stem followed by several true or false statements or words. The respondent is to select which statement(s) could complete the stem.
An example is:
Which of the following actions contribute(s) to the general principles of First Aid?

I. Arrest hemorrhage
II. Bathe the patient
III. Immobilize injured bone

A. I only
B. II only
C. I and II
D. I and III
E. I, II and III

Guidelines for constructing multiple-choice tests

1. The central issue of the item should be in the stem. It should be concise, easy to read and
understand.
The following are examples of poor and good items

Poor question                                    Good question

Ghana                                            The largest man-made lake in Africa is in
A. became independent in 1960                    A. Chad
B. has West Africa's largest population          B. Ghana
C. has the largest man-made lake in Africa       C. Kenya
D. is the world's leading cocoa producer         D. Tanzania
2. The options should be plausible. Distracters must be plausible enough to attract the uninformed.

Poor Good

The longest river in Africa is The longest river in Africa is


A. Benue A. Congo
B. Congo B. Niger
C. Gambia C. Nile
D. Nile D. Volta
E. Thames E. Zambesi
In the poor example, the Benue and Gambia rivers are not long enough to attract respondents, and the Thames is not in Africa.

3. All options for a given item should be homogeneous in content.


Poor                                             Good
Who was Ghana's first President/                 The first woman Prime Minister/
Head of State?                                   Head of State in the world was
A. Dr. Kwame Nkrumah                             A. Corazon Aquino
B. Haruna Iddrisu                                B. Golda Meir
C. Spio Gabra                                    C. Indira Gandhi
D. Kennedy Agyapong                              D. Margaret Thatcher
E. Dominic Nitiwul                               E. Sirimavo Bandaranaike
In the poor example, only option A has ever been a President/Head of State, so the options are not homogeneous in content.

4. All options for a given item should be homogeneous in grammatical structure.

Example (poor – the options are not grammatically parallel):
In constructing multiple-choice test items, options to an item should be …
A. arranged horizontally.
B. copied directly from class notes or textbooks.
C. must have a discernible pattern of responses.
D. homogeneous in content.
Option C does not complete the stem grammatically.

5. All options must follow syntax and punctuation rules.


Good:
A nurse observes that a colleague nurse reports to work always drunk. What
should be the nurse’s first reaction?
A. Ignore the drunk nurse.
B. Report the colleague to the union.
C. Request for transfer of the colleague.
D. Talk to the colleague.
6. Repetition of words in the options should be avoided.

Poor
'Which is the best definition of a contour-line?
A. A line on a map joining places of equal barometric pressure.
B. A line on a map joining places of equal earthquake intensity.
C. A line on a map joining places of equal height.
D. A line on a map joining-places of equal mean temperature.
E. A line on a map joining places of equal rainfall.
Good
A line on a map joining places of equal pressure is called an
A. isobar
B. isobront
C. isochasm
D. isogeotherm
E. isotherm

7. Specific determiners which are clues to the best/correct option should be avoided.

Poor Good
The first woman cosmonaut is a The first woman to go into space is a/an
A. American A. American
B. Englishman B. British
C. Irish C. French
D. Italian D. Italian
E. Russian E. Russian
In the poor example, the article 'a' gives a clue: all the options except Russian begin with vowel sounds and would require 'an', so the correct option must be Russian. In addition, only Russians use the term cosmonaut, and 'Englishman' is not parallel with the other options.

8. Vary the placement of the correct options. No discernible pattern of the correct/best
responses should be noticed.

9. Sentences should not be copied from textbooks or from others' (colleagues, friends, etc.) past test items. Original items should be made; this builds capacity in item writing.

10. The responses/options must be arranged in alphabetical/sequential order. This reduces unnecessary searching on the part of the respondents.

For example:
In constructing multiple-choice test items, options to an item should be
A. arranged horizontally.
B. copied from textbooks.
C. heterogeneous in content.
D. homogeneous in content.
11. Items measuring opinions should not be included. One option should clearly be correct or the best.

Poor Good
The best Ghanaian medical doctor is The Ghanaian medical doctor famous
for his work on the sickle-cell disease is
A. Charlotte Gardiner A. F. I. D. Konotey-Ahulu
B. F. I. D. Konotey-Ahulu B. F. O. Acheampong
C. Mary Grant . C. K. G. Korsah
D. Mohamed Mustafa D. M. K. Mustafa

12. The responses must be itemized vertically and not horizontally.

Poor
In constructing multiple-choice test items, options to an item should be A. arranged horizontally B. copied directly from class notes or textbooks C. must have a discernible pattern of responses D. homogeneous in content

Good
In constructing multiple-choice test items, options to an item should be
A. arranged horizontally
B. copied from textbooks.
C. heterogeneous in content.
D. homogeneous in content.

13. The responses must be parallel in form, i.e. options should be about the same length.

Poor
In constructing multiple-choice test items, options to an item should be
A. arranged horizontally.
B. copied directly from class notes or textbooks.
C. in a discernible pattern of responses for easy identification.
D. homogeneous in content.

Good
In constructing multiple-choice test items, options to an item should be
A. arranged horizontally.
B. copied from textbooks.
C. heterogeneous in content.
D. homogeneous in content.
14. Each option must be distinct. Overlapping alternatives should be avoided.
Poor Good
In a healthy adult, the liver weighs about In a healthy adult, the liver
weighs between
A. 3.0kg A. 6.5 – 7.5kg
B. 2.5kg B. 4.5 – 6.0kg
C. 2.0kg C. 3.0 – 4.0kg
D. 1.5kg D. 1.0 – 2.5kg

15. Avoid using “all of the above” as an option but "None of the above” can be used sparingly.
It should be used only when an item is of the 'correct answer' type and not the 'best answer'
type.
Poor                                             Good
The following are local signs and                In administering intramuscular
symptoms of inflammation except                  injection, the needle is inserted into
                                                 the muscle at an angle of
A. rashes.                                       A. 30°.
B. redness.                                      B. 45°.
C. restoration of function.                      C. 60°.
D. sleeplessness.                                D. 90°.
E. None of the above.                            E. None of the above.
In the poor example, there are other signs and symptoms not included, whereas in the good example there is one and only one correct answer.

16. Stems and options should be stated positively. However, a negative stem could be used sparingly, and the word not should be emphasized either by underlining it or writing it in capital letters.
An example is:
Which of these insects has NOT been incriminated to transmit diseases?

A. Bed-bug
B. Blackfly
C. Body louse
D. Housefly
E. Tsetsefly
17. Create independent items. The answer to one item should not depend on the knowledge of
the answer to a previous item.

For example:
Item 1. The perimeter of a rectangular field is 60 metres. If one side is 20 metres
long, what is the width of the field?

A. 10 metres
B. 20 metres
C. 30 metres
D. 40 metres
E. 60 metres

Item 2. Find the length of the diagonal of the rectangular field in item 1 above.

A. 10.0 metres
B. 20.0 metres
C. 22.4 metres
D. 30.6 metres
E. 40.0 metres

18. The expected response should not be put at the beginning of the stem.

Poor
…………..…printing devices transmit output to a printer via radio waves.
a. Infrared
b. Laser
c. Bluetooth
d. Large Format

Good
What printing device transmits output to a printer via radio waves?
A. Bluetooth
B. Cartridge
C. Infrared
D. Laser

19. Check each item to make sure that there is only one correct or best response to the item.

Poor
Amsterdam is the capital city of _______.
A. Holland
B. Hungary
C. Luxembourg
D. Netherlands
Both Holland and Netherlands name the same country, so the item has two correct options.
20. Be consistent in the number of options used. Four or five options are good for higher
education students.

21. Read through all items carefully to ensure that the answer to one question is not revealed in
another.

Example:
Q6. What do you use to test for sugar in urine?
A. Albustix
B. Clinitest
C. Ketostix
D. Uristix

Q20. This can also be used to test for sugar in urine if Clinitest is not available.
A. Albustix
B. Ictostix
C. Ketostix
D. Uristix
The stem of Q20 reveals that the answer to Q6 is B (Clinitest).
UNIT 7

True and False tests

Description

A true and false test consists of a statement to be marked true or false. A respondent is expected to demonstrate his command of the material by indicating whether the given statement is true or false.

Types

There are 4 variations/types.

 Simple True or False

 Complex True – False

 Compound True – False

 Multiple True – False

1. Simple True-False: This consists of only 2 choices; True, False

Example:

Sir Gordon Guggisberg was the governor who built the Takoradi Harbour. True or False

2. Complex True-False: This consists of 3 choices; True, False, Opinion

Example: Adrenaline can relax uterine smooth muscles.

True, False, opinion

3. Compound True-False: This consists of 2 choices, True and False, plus a conditional completion response.

Example; A nurse who values equality demonstrates honesty to patients. True, False.

If this statement is false, what makes it false?

4. Multiple True-False: This consists of a stem with three, four or five options, and the respondent indicates whether each option is True or False.

Example: The factors that reduce the reliability of classroom tests include:

A. Scoring test items objectively

B. Using homogeneous groups

C. Setting easy questions

D. Using unidentifiable pattern of answers

Guidelines for constructing true and false tests

 For Simple, Compound and Multiple types, statements must be definitely true or definitely

false.

Poor: The value of 2/3 as a decimal fraction is 0.7. True or False

Good: The value of 2/3 expressed as a decimal fraction correct to two decimal places is 0.66. True or False

 Avoid words that tend to be clues to the correct answer.

Words like some, most, often, many, may are usually associated with true statements. All,

always, never, none are associated with false statements. These words must therefore be

avoided.
 For the simple true-false type, approximately half (50%) of the total number of items should be false, because it is easier to construct statements that are true and the tendency is to have more true statements.

 Statements must be original. They must not be copied directly from

textbooks, past test items or any other written material.

 Statements should be worded such that superficial logic suggests a

wrong answer.

Poor: A patient took one tablet of a prescribed medicine and was healed in 24 hours. 8

tablets would therefore heal him in 3 hours. True or False

The true case is that 8 tablets would constitute an overdose.

 Statements should possess only one central theme.

Poor: Akropong Teacher Training College, built in 1900, is the first teacher training

institution in Ghana

Two main themes are in the statement.

 State each item positively. A negative item could, however, be used with the negative word 'not' emphasized by underlining or capital letters. Double negatives should be avoided.


 Statements should be short, simple and clear. Ambiguous as well as

tricky statements should be avoided.

Examples: (1) Abedi Pele is the best Ghanaian footballer. True or False

(2) Nana Aidoo won Ghana’s Presidential election in 2016.

True or False

Item 1 is ambiguous because best is relative while the trick in item 2 is the spelling of Aidoo.

 Statements should measure important ideas not trivia.

Poor: Dr. Kwame Nkrumah had artificial teeth. True or False.

Good: Dr. Kwame Nkrumah was the first President of Ghana. True or False.

 Arrange the items such that the correct responses do not form a discernible pattern like TTTT

FFFF TTTT FFFF.

 To avoid scoring problems, let students write the correct options in full.

 Double-barreled statements should be avoided. These statements have one part true and one

part false.

Poor: The Bond of 1844, signed by Governor Commander Hill declared the Northern

territories of Ghana a Protectorate.

The Bond was signed by Commander Hill but did not achieve the stated purpose.
 Avoid the use of unfamiliar vocabulary.

Poor: According to some politicians, the raison d’etre for capital punishment is retribution.

Good: According to some politicians, the justification for the existence of capital

punishment is retribution.

 Avoid using extreme items. Words such as; all, no, always, never, the very most; or very

least, usually make a statement false.

Eg. The weather in Northern Ghana is always hot.

MATCHING-TYPE TESTS

Description

The matching type of objective test consists of two columns. The respondent is expected to match (associate) an item in Column A with a choice in Column B on the basis of a well-defined relationship.

Column A contains the premises and Column B contains the responses or options.

Example.

Match the vitamins in Column A with the diseases and conditions which a lack of the vitamin causes in Column B.

Column A: Column B

Vitamins Diseases caused by lack


1. Vitamin A a. Beriberi

2. Vitamin C b. Kwashiorkor

3. Vitamin D c. Pellagra

d. Poor eyesight

e. Rickets

f. Scurvy

Guidelines for constructing matching-type tests

1. Do not use perfect matching. Have more responses than premises. There should be at least

three more responses than premises.

2. Arrange premises and responses alphabetically or sequentially. This reduces the amount of

unnecessary searching on the part of the person who knows the answer.

3. Column A (premises) should contain the list of longer phrases. The shorter items should

constitute the responses.

4. Limit the number of items in each set. For each set, the number of premises should not be

more than six per set with the responses not more than ten.

5. Use homogeneous options and items.

Poor:

Instruction: Select an option from List B to match list A.

A B
1. The Battle of Dodowa a. 1824

2. Built Korle Bu hospital b. Gordon Guggisberg

3. Longest river in Africa c. Nile

d. 1826

e. Lord Listowel

f. Congo

Good:

Instruction: Select a river from list B to complete the description in list A. Write the answer

against the number in list A.

A B

Description of river Name of river

1. Aswan Dam is built on it. a. Niger

2. The longest river in Africa. b. Nile

3. It is a tributary of River Congo. c. Orange

d. Ubangi

e. Volta

f. Zambezi

6. Provide complete directions. Instructions should clearly show what the rules are and also

how to respond to the items.


7. State clearly what each column represents.

8. Avoid clues (specific determiners) which indirectly reveal the correct option.

9. All options must be placed (and typed) on the same page.

10. Avoid using multiple correct choices for one premise.


Short-Answer type tests

Description

This type of objective test is also known as the supply, completion, or fill-in-the-blanks type. It is made up of an incomplete statement or question, and the respondent is required to complete it with a short answer, usually not more than one line.

Examples:

1. Modern nursing was introduced into Ghana in the year _________________.

2. What is the name of the first Ghanaian Prime Minister? ________________

3. The environment has three component parts: Name them.

Strengths and Advantages

1. Scoring is easy.

2. They allow an extensive coverage of subject content.

3. They do not provide opportunities for bluffing.

4. Minimizes guessing.

5. They are best suited for measuring lower-level behaviours, especially knowledge, comprehension and application.

6. Encourages students to study in a deeper and more integrated manner.

7. Can assess students’ reasoning skills.

8. Makes cheating more difficult and reduces its incidence as compared to multiple-choice

and true/false types of tests.


9. Discourages last minute rote learning.

10. Gives some practice in writing.

Weaknesses and Limitations

1. They are difficult to construct so that the desired response is clearly indicated.

2. Higher-order objectives and behaviours are difficult to measure.

3. Often includes more specific determiners.

4. Time consuming to score.

5. Difficult to score since more than one answer may have to be considered.

6. Penalizes students who write slowly or have poor writing skills.

GUIDELINES FOR CONSTRUCTING SHORT-ANSWER TESTS

1. Keep the number of missing words or blank spaces low. Preferably use one blank per item.

There should not be more than two blanks in one item.

Poor: The_____of _____ took place in _____________.

Good: The battle of Dodowa took place in the year________.

2. Use original statements that are carefully constructed. Statements should not be

lifted from textbooks or past items or any written material.

3. Avoid specific determiners which provide clues to the correct option.

4. Blanks must be placed at the end or near the end of the statement and not at the
beginning.

Poor: ____________ is an instrument used for measuring temperature.

Good: An instrument used for measuring temperature is called ___________.

5. Items should be so clearly written that the type of response required is clearly recognized.

Poor: The battle of Nsamankow was fought in ______________.

Good: The battle of Nsamankow was fought in the year _______.

6. Avoid lengthy and tortuous statements.

Poor: A specific disease in which acute glomerular damage occurs following distant infections, particularly with certain streptococci, which usually affects children and young adults, and whose clinical picture is commonly one of a dramatic onset of oedema and haematuria, is _______________________

Good: The disease in which acute glomerular damage occurs following distant infections is ___________________

7. Think of the intended answer first before constructing the item.

8. Missing words must be important ones. Avoid omitting trivial words to trick the student. Only test for important facts and knowledge.

Poor: The ___ of the June 4, 1979 revolution in Ghana was Flt. Lt. J. J. Rawlings.

9. Specify the degree of precision and the units of expression required in computational problems.

Poor: The value of 2.6 ÷ 0.07 is ____________.

Good: The value of 2.6 ÷ 0.07 correct to 3 decimal places is ____________.

10. Aim at providing items that belong to the correct-answer type and not the best-answer type.

Poor: The best audio-visual material to use in the classroom is _____________.

Good: Radios and tape recorders are regarded as ______________ audio-visual aids.

11. Keep all blanks the same length, and in a column to the right of the question.

12. A direct question is generally more desirable than an incomplete statement.

13. Allocate marks/scores fairly to each item where sub-items are used.

Example:
i. State the basic function of a retaining wall. (6 marks)
ii. State 3 design principles of a retaining wall. (7 marks)


UNIT 8
ESSAY-TYPE TESTS

Description:
An essay type test is a test that gives freedom to the respondent to compose his own response
using his own words. The tests consist of relatively few items but each item demands an
extended response.

Types Of Essay-Tests

1. The restricted response type limits the respondent to a specified length and scope of the response. For example, 'In not more than 200 words, discuss the causes of the 1948 riots.'
2. The extended response type does not limit the student in the form and scope of the answer. For example, 'Discuss the factors that led to the overthrow of Dr. Kwame Nkrumah's government in Ghana in 1966.'

Strengths and Advantages


1. They provide the respondent with freedom to organize his own ideas and respond within unrestricted limits.
2. They are easy to prepare.
3. They eliminate guessing on the part of the respondents.
4. Skills such as the ability to organize material, to write, and to arrive at conclusions are improved.
5. They encourage good study habits as respondents learn materials in wholes.
6. They are best suited for testing higher-order behaviours and mental processes such as analysis, synthesis and evaluation.
7. Little time is required to write the test items.
8. They are practical for testing a small number of students.

Weaknesses and Disadvantages


1. They are difficult to score objectively.
2. They provide opportunities for bluffing where students write irrelevant and
unnecessary material.
3. Limited aspects of student’s knowledge are measured as students respond to few items only.
4. The items are an inadequate sample of subject content. Several content areas are omitted.
5. A premium is placed on writing. Students who write faster, all things being equal are
expected to score higher marks.
6. They are time-consuming to both the teacher who scores the responses and the student who
writes the responses.
7. They are susceptible to the halo effect where the scoring is influenced by extraneous factors
such as the relationship between scorer and respondent.
8. Only a critical reader who is also a competent scorer can score responses effectively.
GUIDELINES IN CONSTRUCTING GOOD CLASSROOM ESSAY TESTS
1. Plan the test.
Give adequate time and thought to the preparation of the test items.

2. The items should be based on novel situations. Be original. Do not copy directly from
textbooks or colleagues/others’ past test items.

3. Test items should require the students to show adequate command of essential knowledge.
The items must be restricted to the measuring of higher mental processes such as
application, analysis, synthesis and evaluation.

Examples of items include:


(a) application:
You are in charge of a youth camp of 100 campers. Prepare a menu chart which
shows a balanced diet taking into consideration cost and nutritional value.
Here the student uses knowledge learnt in school to deal with a concrete situation.

(b) analysis:
A Form 1 student girl was severely and unfairly punished. Describe the feelings
such treatment aroused in her.

(c) synthesis:
You are the financial secretary of a society aimed at raising money to build a
fish pond in your community. Plan and describe a promotional campaign for
raising the money.

(d) evaluation:
Evaluate the function of the United Nations Organization as a promoter of
world peace.

4. The length of the response and the difficulty level of items should be adapted to
the maturity level of students (age and educational level).

5. Optional items should not be provided when content is relevant.


6. All items should be of equal difficulty if students are to select from a given number of items.

7. Prepare a scoring key (marking scheme) at the time the item is prepared.
8. Establish a framework and specify the limits of the problem so that the student
knows exactly what to do.

9. Present the student with a problem which is carefully worded so that only ONE
interpretation is possible. The questions/items must not be ambiguous or vague.

10. Indicate the value of the question and the time to be spent in answering it.
11. Structure the test item such that it will elicit the type of behaviour you really want to
measure.

12. The test items must be based on the instructional objectives for each content unit.

13. Give preference to a large number of items that require brief answers.
14. Statements and sub-questions for each item should be clearly related.
15. Avoid words such as what, list and who, as much as possible, in essay-type tests.

Commonly used words to start essay questions


1. Explain – to make plain or clear; to make known in detail; to tell what an activity/process is, how it works and why it works the way it does.
2. Describe – to tell or depict (a picture) in written words.
3. Analyze – to determine elements or essential features; to examine in detail in order to identify causes, key factors and possible results.
4. Assess – to estimate or judge the value, character, etc. of.
5. Examine – to inspect or scrutinize carefully; to inquire into or investigate.
6. Discuss – to consider or examine by argument or comment; to give points for and against the content of the question.
7. Evaluate – to judge or determine the significance, worth or quality of; involves discussion and making a judgment.
8. Give an account of – to describe a process/activity, giving reasons, causes, effects, etc.

SCORING ESSAY TESTS

Essay tests can be scored by using the analytic scoring rubrics (also known as the point-score
method) or holistic scoring rubrics (also called global-quality scaling or rating method).

Analytic Scoring

In analytic scoring, the main elements of the answer are identified and points awarded to each
element. This works best on restricted response essays.

Holistic Scoring
In holistic scoring, the response is judged as a whole against described levels of quality. Five (sometimes four) levels of quality are described and marks awarded, e.g.
A. Excellent 26 - 30
B. Very good 21 - 25
C. Good 16 - 20
D. Fairly good 11 - 15
E. Fail Below 11
Each response is read for a general impression of its adequacy as compared to the standard. The
general impression is then transformed into a numerical score.
• A: Excellent (26-30)
 Gives an introduction
 Discusses five reasons very well/in depth
 Very few grammatical errors/expression
 Gives conclusion
• B: Very Good (21-25)
 Gives an introduction
 Discusses five reasons but not too well or discusses four reasons very well
 Few grammatical errors/expression
 Gives conclusion

• C: Good (16-20)
 Gives an introduction
 Discusses five/four reasons but not in depth
OR discusses three reasons very well
 Many grammatical errors/expressions
 No conclusion
• D: Fairly Good (11-15)
 No introduction
 Discusses three reasons but not in depth
 Many grammatical errors/expressions
 No conclusion
• E: Fail (Below 11)
 No introduction
 Discusses one/two reasons but not in depth
 Many grammatical errors/expressions
 No conclusion
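A minimal Python sketch that maps a holistic mark out of 30 to the bands above (the function name is illustrative):

    def holistic_grade(score: int) -> str:
        # Map a mark out of 30 to the quality bands described above.
        if score >= 26:
            return "A (Excellent)"
        if score >= 21:
            return "B (Very good)"
        if score >= 16:
            return "C (Good)"
        if score >= 11:
            return "D (Fairly good)"
        return "E (Fail)"

    print(holistic_grade(23))   # B (Very good)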

PRINCIPLES FOR SCORING ESSAY TESTS

1. Prepare a form of scoring guide, either an analytic scoring rubric or a holistic scoring rubric.
2. Score (mark) tests without knowing whose paper is being scored. This reduces the halo effect. Different forms of identification could be used instead of names.
3. Grade the responses item by item and not script by script. Score all responses to each item
before going to the next item. This reduces the carryover effect. The carryover effect
occurs when the mark for a question is influenced by the performance on the previous
question.
4. Keep scores of previously graded items out of sight when evaluating the rest of the items.
5. Periodically rescore previously scored papers.
6. Before starting to score each set of items, the scripts should be shuffled.
7. Score the essay test when you are physically sound, mentally alert and in an
environment with very little or no distraction.
8. Constantly follow the scoring guide as you score. This reduces rater drift, which is the tendency either to stop paying attention to the scoring guide over time or to interpret it differently as time passes.
9. Score a particular question on all papers at one sitting. Break when fatigue sets in.
10. Arrange for an independent scoring of the responses or at least a sample of them where
grading decision is crucial.
11. Comments could be provided and errors corrected on the scripts for class tests to facilitate
learning.
12. Avoid being influenced by the first few papers read. These could make you either too harsh
or too lenient.
13. The mechanics of writing such as correct grammar usage, paragraphing, flow of
expression, quality of handwriting, orderly presentation of material and spelling
should be judged separately from the content.
UNIT 9

ASSEMBLING, ADMINISTERING AND APPRAISING ACHIEVEMENT TESTS

Guidelines for Assembling Achievement Tests

1. Review test items and assessment tasks.


 The item or task should not be excessively wordy.
 The point of the item or task as well as the desired response should be clear.
 A scoring rubric or scoring guide should be available.
 The item or task should be free from technical errors and irrelevant clues.
 The item or task should be free from racial, ethnic and gender bias.
2. Decide on the total number of items and the length of time to constitute the test.
3. Test items should be typed or written neatly.
4. Arranging test items.
 Items should be sequenced (especially objective-type tests) such that they appear in the
order of difficulty with the easiest ones placed first.
 Items should also be arranged in sections by item-type. The sections should progress from
easier formats to more difficult formats. Within each section, group items such that the
easier ones come first. For example, all true-false items should be grouped together, then
all matching items and so on.
 Items can also be arranged according to the order in which they are taught in class or the
order in which the content appeared in the textbook.
5. Provide directions to students.
6. Reproducing the test.

Guidelines in Administering Achievement Tests

1. Prepare students for the test. The following information is essential to students’ maximum
performance.
 When the test will be given (date and time).
 Under what conditions it will be given (timed or take-home, number of items, open
book or closed book, place of test).
 The content areas it will cover (study questions or a list of learning targets).
 Emphasis or weighting of content areas (value in points).
 The kinds of items on the test (objective-types or essay-type tests).
 How the assessment will be scored and graded.
 The importance of the results of the test.
2. Students must be made aware of the rules and regulations covering the conduct of the test.
Penalties for malpractice such as cheating should be clearly spelt out and clearly adhered to.
3. Avoid giving tests immediately before or after a long vacation, holidays or other important
events where all students are actively involved physically or psychologically/emotionally.
4. Avoid giving tests when students would normally be doing something pleasant e.g. having
lunch etc.
5. The sitting arrangement must allow enough space so that pupils will not copy each other's work.
6. Adequate ventilation and lighting are expected in the testing room.
7. Provision must be made for extra answer sheets and writing materials.
8. Pupils should start the test promptly and stop on time.
9. Announcements must be made about the time at regular intervals. Time left for the
completion of the test should be written on the board where practicable.
10. Invigilators are expected to stand at a point where they can view all students. They should once in a while move among the pupils to check for malpractice. Such movements should not disturb the pupils. Invigilators must be vigilant.
11. Invigilators should not be allowed to read novels, newspapers, grade papers or receive calls
on mobile phones.
12. Invigilators should avoid threatening behaviour. Remarks like 'If you don't write fast, you will fail' are threatening. Pupils should be made to feel at ease.
13. The testing environment should be free from distractions. Interruptions within and outside the classroom should be reduced. It is helpful to hang a "DO NOT DISTURB – TESTING IN PROGRESS" sign at the door.
14. Test anxiety should be minimized.
15. Do not talk unnecessarily before letting students start working.
16. Avoid giving hints to students who ask about individual items. Where an item is ambiguous,
it should be clarified for the entire group.
17. Expect and prepare for emergencies. Emergencies might include shortages of answer
booklets, question papers, power outages, illness etc.

Appraising Achievement Tests (Item Analysis)


Item analysis is the process of collecting, summarizing, and using information from students’
responses to make decisions about each test item. It is designed to answer the following
questions:
1. Did the item function as intended?
2. Were the test items of appropriate difficulty?
3. Were the test items free of irrelevant clues and other defects?
4. Was each of the distracters effective (in multiple-choice items)?

Benefits of item analysis


1. It helps to determine whether an item functions as intended.
2. Item analysis provides opportunity for difficult items to be identified and discussed.
Misinformation and misunderstanding of distracters can be corrected.
3. Item analysis provides feedback to the teacher about pupil difficulties. It brings to light
general areas of weakness that require more attention.
4. Item analysis data provide a basis for the general improvement of classroom instruction.
5. Item analysis procedures provide a basis for increased skill in test construction.
6. It helps to create item banks for use in future tests.
Item Analysis

According to Amedahe and Asamoah-Gyimah (2003), item analysis usually includes the following:

 Item difficulty
 Item discrimination
 Distracters analysis
 Item bias

Item difficulty

According to Amedahe and Asamoah-Gyimah (2003), item difficulty is the percentage of students who answer each test item correctly. It is calculated by dividing the number of students who answer the item correctly by the total number of examinees. Mathematically, P = R/T, where R = number of students who answer the item correctly and T = total number of examinees.

The difficulty index ranges from 0 to 1

If it is 0, it means no student answered the item correctly.

If it is 1, it means all students answered the item correctly.

This means that the smaller the difficulty index, the more difficult the item, and the greater the difficulty index, the less difficult the item.

It should be noted that item difficulty is calculated for each item.

The table below shows the difficulty indices of 30 items taken by 30 students.

Question No.   Number who answered correctly (R)   Proportion correct (P = R/T)
1              17                                  0.57
2              25                                  0.83
3              23                                  0.77
4               3                                  0.10
5              16                                  0.53
6              21                                  0.70
7              18                                  0.60
8               8                                  0.27
9               3                                  0.10
10             12                                  0.40
11             20                                  0.67
12             25                                  0.83
13              5                                  0.17
14             16                                  0.53
15             18                                  0.60
16             25                                  0.83
17             15                                  0.50
18             30                                  1.00
19             12                                  0.40
20             29                                  0.97
21             25                                  0.83
22              9                                  0.30
23             10                                  0.33
24             21                                  0.70
25             18                                  0.60
26             19                                  0.63
27             12                                  0.40
28             22                                  0.73
29             18                                  0.60
30             17                                  0.57
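A minimal Python sketch of the calculation, reproducing a few rows of the table above:

    def difficulty_index(correct: int, examinees: int) -> float:
        # P = R / T: proportion of examinees answering the item correctly.
        return correct / examinees

    T = 30
    for item, R in [(1, 17), (4, 3), (18, 30)]:
        print(item, round(difficulty_index(R, T), 2))   # 0.57, 0.1, 1.0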

Item discrimination

It is a measure of the degree to which an item differentiates between students with high and low performance (Tamakloe, Atta & Amedahe, 1996). The item discrimination is an indication of whether the item differentiates between examinees with higher and lower knowledge in the test. It shows whether the item is able to differentiate examinees who scored high in the entire examination from examinees who performed poorly.

 Discrimination index ranges from -1 to +1.


 When everyone in the upper group answers the item correctly, while everyone in the lower group answers it wrongly, the discrimination of the item becomes +1, a perfectly positively discriminating item.
 However, when everyone in the upper group answers the item wrongly while everyone in the lower group answers it correctly, the discrimination becomes -1, which indicates a perfectly negatively discriminating item. In such a situation either the item is ambiguous or it has been miskeyed, and it must therefore be discarded or reviewed.
 Again, an item is said to be non-discriminating when equal numbers of both groups (upper and lower) answer it correctly (discrimination index = 0).
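The text gives the range and interpretation but not a formula; one common index is D = (RU − RL) / n, where RU and RL are the numbers answering correctly in the upper and lower groups and n is the size of each group. A minimal Python sketch under that assumption:

    def discrimination_index(upper_correct: int, lower_correct: int, group_size: int) -> float:
        # D = (RU - RL) / n; ranges from -1 to +1.
        return (upper_correct - lower_correct) / group_size

    print(discrimination_index(15, 5, 20))   # 0.5: item favours the upper group
    print(discrimination_index(10, 10, 20))  # 0.0: non-discriminating
    print(discrimination_index(0, 20, 20))   # -1.0: likely miskeyed or ambiguous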

Distracters Analysis
 The quality of the items depends partly on the effective functioning of the
distracters selected by the examinees.
 By inspecting how options were selected by examinees, a good distracter should attract at least one examinee, especially from the lower group.
 A good distracter must be plausible enough to attract the unknowledgeable examinees (Amedahe & Asamoah-Gyimah, 2003).
 The function of the distracters is to determine whether examinees really know the correct answer to the item.
Examples:
1. Ideal

Options Upper Group Lower Group


A 0 2
B 2 4
C* 15 5
D 3 9

2. Ambiguous alternative
Options Upper Group Lower Group
A 1 4
B* 10 5
C 9 5
D 0 6
Options B and C seem equally attractive to the upper group. Option C, as well as the item, should be checked for ambiguity.
3. Miskeyed Item
Options Upper Group Lower Group
A 13 7
B 6 6
C 0 3
D* 1 4
The majority of the upper group selected A. Option A, not D, might be the correct response; the key should be checked.

4. Poor distractor
Options Upper Group Lower Group
A 2 6
B* 12 6
C 0 0
D 6 8
Option C attracted no student. It is a poor distracter and has to be replaced.
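The checks illustrated in these tables can be mechanized. A small Python sketch, with illustrative (not prescribed) decision rules:

    def review_distracters(counts, key):
        # counts: option -> (upper, lower) selection counts; key: correct option.
        notes = []
        for opt, (upper, lower) in counts.items():
            if opt == key:
                continue
            if upper == 0 and lower == 0:
                notes.append(f"{opt}: attracted no one - replace this distracter")
            elif upper > counts[key][0]:
                notes.append(f"{opt}: chosen by more of the upper group than the key - possible miskey")
        return notes

    poor = {"A": (2, 6), "B": (12, 6), "C": (0, 0), "D": (6, 8)}
    print(review_distracters(poor, key="B"))   # flags option C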
UNIT 10
INTERPRETATION OF TEST SCORES

 Scores obtained in classroom quizzes, tests and examinations are known as raw scores.
 They give very little information about the performance or achievement of a student.
 For example if Ayisha obtained 48 in a test, it is difficult to know her level of
performance unless more information is provided.
 Such types of information include;
 Maximum score/best score
 mean or median score
 the variability of the group
 the difficulty level of the items
 the number of test questions and the amount of time allowed for the test.
 To interpret and obtain meaning from the scores, they need to be referenced or
transformed into other scores.

WAYS OF INTERPRETING TEST SCORES

There are two popular ways of interpreting test scores so the meaning can be derived from the
scores. These are:
1. Norm-referenced Interpretation
2. Criterion-referenced Interpretation

NORM-REFERENCED INTERPRETATION
 These describe test scores or performance in terms of a student’s position in a
reference group that has been assessed.
 In other words, it compares an individual's performance with that of others in the group who have taken the same test.
 The reference group is called the norm group.
 In the earlier example, Ayisha’s score of 48 can be compared with the mean score for
the class.
 If the mean score is 40, then one could say that Ayisha’s performance was above the
mean/average.
 If the median score is 40, then one could also say that Ayisha’s performance could be
placed in the upper half of the class.
 The score of 40 can appropriately be called the norm and the class that provided the
mean or median of 40 is called the norm group.
Types of norm-referenced scores
The following are the most popular norm-referenced scores;

1. Class raw score ranks. Raw scores in a class are often ordered from the highest score (1st position) to the lowest score (last position). The ranks show how a student performs compared with the others in the group.
2. Percentile and percentile ranks. A percentile is a point in a distribution below
which a certain percentage of the scores fall while a percentile rank is a person’s relative
position such that a given percentage of scores fall below the score obtained. If a raw
score of 48 is the 60th percentile, it means that a student who obtains 48 in a test, has done
better than 60 percent of all those in the group that took the test.

3. Standard scores. These are either Z scores or T scores.

 Z-scores are based on the normal distribution such that the mean is 0. Raw scores are transformed to Z-scores using the formula Z = (X − X̄) / S, where X is the raw score, X̄ is the group mean and S the group standard deviation.
 Negative values show that performance is below average
 Positive values mean that performance is above average.

 T-scores are based on Z-scores and use the formula T = 50 + 10Z. Scores above 50 show above-average performance and scores below 50 show below-average performance.
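A minimal Python sketch of both transformations, using invented class scores (the standard deviation is taken over the whole group):

    import statistics

    scores = [40, 48, 52, 36, 44]     # hypothetical class raw scores
    mean = statistics.mean(scores)     # 44.0
    sd = statistics.pstdev(scores)     # population standard deviation of the group

    def z_score(x):
        # Z = (X - mean) / SD
        return (x - mean) / sd

    def t_score(x):
        # T = 50 + 10Z
        return 50 + 10 * z_score(x)

    print(round(z_score(48), 2), round(t_score(48), 1))   # positive Z, T above 50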

4. Stanines (Standard Nine). These are derived scores based on the normal distribution with a mean of 5 and a standard deviation of 2. They use the integers 1 – 9. The percentage of scores at each stanine is: 9 (top 4%), 8 (next 7%), 7 (next 12%), 6 (next 17%), 5 (next 20%), 4 (next 17%), 3 (next 12%), 2 (next 7%) and 1 (lowest 4%), as shown in the table below.

Stanine (grade)        1    2    3     4     5     6     7     8    9
Percentage of group    4%   7%   12%   17%   20%   17%   12%   7%   4%
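A small Python sketch that assigns a stanine from a percentile rank; the cut points 4, 11, 23, 40, 60, 77, 89 and 96 are simply the running totals of the stanine percentages in the table above:

    # Cumulative upper bounds (in percentile-rank terms) for stanines 1-8.
    CUTS = [4, 11, 23, 40, 60, 77, 89, 96]

    def stanine(percentile_rank: float) -> int:
        # Map a percentile rank (0-100) to a stanine of 1-9.
        for grade, cut in enumerate(CUTS, start=1):
            if percentile_rank < cut:
                return grade
        return 9

    print(stanine(50))   # 5 (middle 20%)
    print(stanine(97))   # 9 (top 4%)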

Uses of norm-referenced interpretations

1. Selection decisions. In selecting students for awards and prizes, a norm-referenced


approach is often used. To award the prize for the best student, a ranking is done and the student
at the top position is awarded the prize.

2. Comparison decisions. Comparison of performance across subjects is made easy. For


example performance in Mathematics and English can be compared by using Z and T scores.
Norm referenced scores provide the information needed for the comparisons between two
classes. Mean or median scores would provide information as to which class performs better.
3. Achievement testing. Examination bodies such as the West African Examinations Council use norm-referenced scores in interpreting the results of students in some examinations, such as the BECE.

4. Monitoring decisions. Norm-referenced scores are useful in monitoring the general


progress of individual students. A student who was at the 10th percentile in Mathematics in the first term but moved up to the 75th percentile in the third term has made much progress.

CRITERION-REFERENCED INTERPRETATION

 These describe test scores or performance in terms of the kinds of tasks a person with a
given score can do.
 The performance can be compared to a pre-established standard or criterion.
 For example a student may be able to solve 8 problems out of 10 concerning fractions.
A level of performance can be established at 6.
 The criterion or standard can be used as a competency/mastery score so that students who
have obtained scores that are greater than 6 are termed competent or have mastered skills
in a particular domain.
 Criterion-referenced interpretations generally indicate what an individual can or cannot do with respect to a specified domain of knowledge, attitudes or skills.

Types of criterion-referenced scores

1. Percent correct scores. This is the percentage of items that a student got correct. For
example if a student obtained 8 marks out of 10, the percent correct is 80.
2. Competency scores. These are cut-off scores set to match acceptable performance. Students who obtain the cut-off score are believed to have achieved the required level of competency. Cut-off scores should not be arbitrarily set; there should be a support or basis for them.
3. Quality ratings. This is the quality level at which a student performs a task. For
example, a student can be rated as A for outstanding, B+ for excellent etc.
4. Speed of performance scores. These indicate the amount of time a student uses to
complete a task or the number of tasks completed within a specified time. For example, a
student may type 30 words in a minute or an athlete may run 100 meters in 11.5 seconds.
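A minimal Python sketch of the first two score types (the cut-off of 6 echoes the fractions example earlier; the function names are illustrative):

    def percent_correct(raw: int, total: int) -> float:
        # Percentage of items the student got correct.
        return 100 * raw / total

    def is_competent(raw: int, cutoff: int) -> bool:
        # Competency decision against a pre-established cut-off score.
        return raw > cutoff

    print(percent_correct(8, 10))   # 80.0
    print(is_competent(8, 6))       # True: the score exceeds the cut-off of 6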

Uses of criterion-referenced interpretations

1. Certification decisions. Certificates are needed in several areas of work to demonstrate


the acquisition of skills and knowledge. Criterion-referenced scores provide information
about whether an applicant has the required level of skill or not and certificates of
achievement attest to this.
2. Minimum competency decisions. Certain curricula are structured such that a student
needs to achieve a certain level of competency before moving on to a higher level.
Criterion-referenced scores are used to determine whether a student or a class can move
on to a higher level of study.
3. Diagnostic decisions. Criterion-referenced scores help the teacher to discover the
learning difficulties of the pupils. They help the teacher to diagnose and know which
topics or learning targets have not been grasped. It helps the teacher to provide
individual or class learning activities that will best adapt to students’ requirements and
thereby maximize their opportunities to attain chosen learning targets.
4. Placement decisions. Criterion-referenced scores provide information as to whether a
student can succeed in a programme or not. For example, to determine whether a person
can be a medical doctor or not, a test can be given such that performance on the test can
determine whether the individual has the pre-requisite skills to succeed in the medical
programme.
5. Programme evaluation. Criterion-referenced scores provide information about national progress in education. The performance of students can indicate whether a particular curriculum is successful in its implementation or not. In Ghana, criterion-referenced scores were used in the 1990s to assess the level of mathematics and English literacy.
Percentiles: They are points in a distribution below which a given percent, P, of the cases
(scores) lie.
 There are 99 percentiles that divide a distribution into 100 equal parts. Percentiles
are individual scores.
Notation: P30 = 60. Sixty (60) is the score below which 30% of the scores lie in a

specific group after the scores have been arranged sequentially. This means that a
student who obtains a score of 60 has done better than 30% of the members in the
group.
P75 = 50. Fifty (50) is the score below which 75% of the scores lie in a specific

group after the scores have been arranged sequentially. This means that a student
who obtains a score of 50 has done better than 75% of the members in the specific
group.
 A score in one group may be a different percentile in another group.
For example, in Mathematics Quiz 1, a student with a score of 15 may be at P90 in the Arts

class but the same score may put the student at P85 in the Home Economics class.

 P50 is the same as the median. P25 is the first quartile and P75 is the third

quartile.

Percentile Ranks: The percentage of cases falling below a given point on the measurement scale. It is the position on a scale of 100 at which an individual score lies.

Notation: PR of 60 = 75. Seventy-five is the position for a score of 60 when the distribution
is divided into 100 parts. This means that a student who obtains a score of 60 has
75% of the scores falling below him/her in the group.
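A minimal Python sketch of this definition; it counts only scores strictly below the given score, matching the wording above (some texts also add half of the scores tied at that value), and the group is invented for illustration:

    def percentile_rank(scores, value):
        # Percentage of scores in the group falling below the given score.
        below = sum(1 for s in scores if s < value)
        return 100 * below / len(scores)

    group = [35, 42, 50, 55, 60, 60, 62, 70, 75, 80]   # hypothetical distribution
    print(percentile_rank(group, 60))   # 40.0: 4 of the 10 scores fall below 60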
