Assessment in Maths


VALLEY VIEW UNIVERSITY

(COURSE OUTLINE)

COURSE CODE:
MATE 464

COURSE TITLE:
ASSESSMENT IN MATHEMATICS

COURSE OUTLINE

LECTURER:
Isaac Owusu-Darko
[IOD]

2021
EDST 464: ASSESSMENT IN MATHEMATICS

The course is designed to assess the behaviours of students in terms of performance in order to identify the strengths and weaknesses that may help in the decision-making process. The assessment is based on the profile dimensions – knowledge and understanding, which are acquired through the receptive skills of listening and reading, and the use of knowledge, which is acquired through the productive skills of writing and speaking. Both the formative and summative types of assessment will be covered. The criterion-referenced testing procedure will be used in the areas of class tests, class assignments, homework and projects (practical and investigative study) more frequently than the norm-referenced testing procedures. In the construction of a test, the test purpose, content specification, test development, etc. may be covered.

COURSE TITLE: ASSESSMENT IN MATHEMATICS
COURSE CODE: MATE 464
LECTURER: ISAAC OWUSU-DARKO,
MPHIL MATHEMATICS (APPLIED MATHEMATICS- STATISTICS); M.ED (MATHEMATICS
EDUCATION); BED(MATHEMATICS EDUCATION), DIP. (BASIC EDUCATION) ‘A’-3YR POSTSEC

TEL: 0204228266 / 0270388248 / 0243388248 / 0267631702


E-Mail: [email protected]


Course Description
The course is designed to assess the behaviours of students in terms of performance in order to identify the strengths and weaknesses that may help in the decision-making process. The assessment is based on the profile dimensions – knowledge and understanding, which are acquired through the receptive skills of listening and reading, and the use of knowledge, which is acquired through the productive skills of writing and speaking. Both the formative and summative types of assessment will be covered. The criterion-referenced testing procedure will be used in the areas of class tests, class assignments, homework and projects (practical and investigative study) more frequently than the norm-referenced testing procedures. In the construction of a test, the test purpose, content specification, test development, etc. may be covered.

Course Objective:
By the end of the course, students will be able to apply the basic concepts of assessment techniques, which are essential for practical classroom assessment and for further studies in Mathematics, including classroom test planning, formal and informal assessment, continuous assessment, formative and summative assessment procedures, and the analysis of test results using educational statistics and information technology.
Course Requirements
• You are to revise the course content on measurement and evaluation as well as the general educational assessment course already introduced.
• Any assignment not submitted on the date specified will not be accepted.
• Students should switch off their mobile phones or put them on silent during lectures.
• Every student should be present for every class lecture, test, etc.
Evaluation
• Assignments ……………………………..10%
• Quizzes………………………………………10%
• Mid- Semester Examination ……. 20%
• End of Semester Examination …… 60%
Total 100%

Grading System
Grades will be assigned as follows
A  = 80 – 100        C+ = 56 – 60
A− = 75 – 79         C  = 50 – 55
B+ = 70 – 74         C− = 45 – 49
B  = 65 – 69         D  = 40 – 44
B− = 61 – 64         F  = 0 – 39
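As an illustration only (not part of the official grading policy), here is a minimal Python sketch that applies the evaluation weights above (assignments 10%, quizzes 10%, mid-semester 20%, end-of-semester 60%) and maps the weighted total to a letter grade; the function and variable names are hypothetical.

# Hypothetical sketch: combine assessment components using the course weights
# and map the weighted total to the letter grades in the table above.

def letter_grade(total: float) -> str:
    """Return the letter grade for a weighted total out of 100."""
    bands = [(80, "A"), (75, "A-"), (70, "B+"), (65, "B"), (61, "B-"),
             (56, "C+"), (50, "C"), (45, "C-"), (40, "D")]
    for cutoff, grade in bands:
        if total >= cutoff:
            return grade
    return "F"

def weighted_total(assignments, quizzes, mid_sem, end_sem):
    """Each component score is assumed to be marked out of 100."""
    return 0.10 * assignments + 0.10 * quizzes + 0.20 * mid_sem + 0.60 * end_sem

if __name__ == "__main__":
    total = weighted_total(assignments=85, quizzes=70, mid_sem=65, end_sem=72)
    print(round(total, 1), letter_grade(total))   # 71.7 B+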

Calendar of Events
This will be communicated to students later.

Class Contribution
Your contribution is an essential component of the overall teaching and learning process.
Contribution takes place in many forms: asking informed questions in class, making intelligent
comments, reading the case and being prepared to discuss the issues, actively listening to your
peers and working with others. Please remember that quantity is no substitute for quality.
There will be ample opportunity to contribute to the class. The format of the in-class discussions
of cases may take a variety of forms including: group analysis of single case issues during class,
presentation of issues and leading discussions of the case issues and participating in group
discussions.

It is your responsibility to ensure that you take an active role in class. If this is a problem for
you, I urge you to talk to me to discuss ways that you can make a contribution. The grading
for the class contribution in each class is as follows:

Grading Scale:

A  80 – 100        C+ 56 – 60
A− 75 – 79         C  50 – 55
B+ 70 – 74         C− 45 – 49
B  65 – 69         D  40 – 44
B− 61 – 64         F  0 – 39

Dress Code
All students are expected to dress formally for classes. For gentlemen, a shirt, trousers (if possible with a tie) and shoes are acceptable. For ladies, a top, a skirt and shoes are required. Jeans, any form of T-shirt, and slippers or sandals of any kind are not acceptable dress for students undertaking this course. The dress code is intended to inculcate in students the need to dress appropriately, as pertains in the business environment.

How to Pass the Course


Ø Attend all lectures and review lecture notes thoroughly.
Ø Focus on solving problems and doing the calculations, as well as mastering the theories, footnotes, graphs, deductions and formulas, and solve more questions involving calculations.
Ø Re-work and Master all examples done in class

Ø Study (not Read) the Textbook
Ø Ask questions in class
Ø Chat with me after class or at appointed times
Ø Form study groups

TIME TO ADDRESS STUDENTS' ISSUES

Students who wish to see me for counselling and other guidance or assistance can do so at the following times:
Thursdays: 10:00am - 2:00pm
Fridays: 10:00am - 2:00pm

TERM PAPER/PROJECT

Students in their groups should present a solution to one of the following questions in the form of a project.

SEE ME FOR THE ASSIGNMENT AT THE 7TH LECTURE ENCOUNTER.

NOTE: This will carry 10% of your semester grade.

TABLE OF CONTENTS (COURSE OUTLINE)

• Our lectures will concentrate on the following course outline, defined for the course in the University bulletin:
WEEK 1
Concept definitions in assessment
WEEK 2
Types of assessment
WEEK 3
Profile dimension (lesson objectives as a form of assessment)
WEEK 4
Planning classroom assessment.
WEEK 5
Validity of assessment
WEEK 6
Reliability
WEEK 7
Planning classroom test
WEEK 8
Types of test [multiple-choice, true/false, matching, fill-in and essay-type tests]

WEEK 9
Interpretation of test scores [measures of central tendency]

WEEK 10
Variability in test scores
WEEK 11
Relative position of students in test evaluation
WEEK 12
The standard normal distribution curve and performance interpretation
WEEK 13
Performance interpretation: skewness and kurtosis
WEEK 14
Marking scheme interpretation in mathematics assessment
WEEK 15
Revision and examinations
WEEK 16
Examinations

REFERENCES

1. Etsey, K. A. Measurement and evaluation. UCC.

2. Allen, M. J., & Yen, W. M. (1979). Introduction to measurement theory. Monterey, CA:
Brooks/Cole.

3. Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16, 297-334.

4. Gulliksen, H. (1950). Theory of mental tests. New York: Wiley.

5. Kuder, G. F., & Richardson, M. W. (1937). The theory of the estimation of test reliability. Psychometrika, 2, 151-160.

6. Nunnally, J. C., & Bernstein, I. H. (1994). Psychometric theory. New York: McGraw-Hill.

CHAPTER ONE

ASSESSMENTS
Assessment of student learning requires the use of a number of techniques for measuring
achievement. This is done through a systematic process that plays a significant role in effective
teaching. It begins with the identification of learning goals and ends with a judgment concerning
how well those goals have been attained.

Thus for Linn and Gronlund (2000, 31-32) assessment is:


“A general term that includes the full range of procedures used to gain information about student learning (observations, ratings of performances or projects, paper-and-pencil tests) and the formation of value judgments concerning learning progress….”

For Savage & Armstrong (1987): “Assessment includes objective data from measurement …
(and) from other types of information, some of which are subjective (anecdotal records and
teacher observations and ratings of student performance). In addition, assessment also includes
arriving at value judgments made on the basis of subjective information.”

In each of the definitions above, a process is outlined. It is clear that some sort of instrument /
technique must be administered / used in order to obtain data /information. This data /information
can then be used to judge the level of understanding or standard of student performance in relation
to knowledge, skills, attitude and pattern of behaviour.

Assessment in Mathematics

Assessment is the systematic collection, review, and use of information about educational
programs undertaken for the purposes of improving learning and development. It is the process
of gathering and discussing information from multiple and diverse sources in order to develop a
deep understanding of what students know, understand, and can do with their knowledge as a
result of their educational experiences.

Assessment in education generally refers to the process for obtaining information that is used for
making decisions about students, curricula, programmes and educational policy.

Assessments are designed to help schools, parents, districts and students to determine the level of
proficiency of student understanding of mathematics learning standards.

Continuous Assessment
Continuous assessment can be defined as the daily process by which one gathers information
about students’ progress in achieving instructional objectives.

According to Ogunniyi (1984) continuous assessment is a formative evaluation procedure


concerned with finding out in a systematic manner the overall gains that a student has made in
terms of knowledge, attitudes and skills after a given set of learning experiences. The
assessment is continuous because (a) it occurs at various times as part of instruction, and (b) it may occur following a lesson.

From the definition, continuous Assessment implies that a student’s final grade in instructional
programmes is a cumulative total of his performance on planned learning activities given during
the course. The main purpose of continuous assessment is to help every student become a
successful learner as well as to remedy the shortcomings of the traditional “one-shot” examination.

Characteristics of Continuous Assessment

Continuous assessment as a method of evaluation should have the following characteristics or features. It must be:

(a) Systematic
It must have an operational plan; this must be done before the programme starts. The plan includes:
Measurements to be made, or what is to be measured.
Determining the tools or instruments to be used.
When and how assessment or measurement will be done (periods).
Taking and filing of records of organized information.

(b) Comprehensive (Detailed)


It covers all aspects of human behaviour. This includes wide areas of skills, knowledge, attitudes
and processes learned.
It combines all scores obtained in class assignments, homework, projects, class tests, etc.
It uses many types of evaluative tools, processes or methods.
It looks at the different outcomes in the cognitive, affective and psychomotor domains.

(c) Cumulative
It takes account of learners' achievement or performance over a period of time.
Decisions made at any point in time take into account previous decisions made on the learners or pupils and represent many items put together.

(d) Guidance-Oriented
It points out or reveals areas of weakness and strength from time to time to allow redirection and motivation of pupils. Information obtained about pupils is used to guide them for further growth and development.

(e) Formative
It uses measurement to diagnose pupils' problems and to help them overcome the problems or master the task at hand. This results in pupils becoming adjusted to new forms and shapes.

Strengths of Continuous Assessment


1. It provides a more representative sampling of a student's performance both across time and across tasks.
2. It is fairer to the student because he/she has more than one chance to exhibit the behaviour being assessed.
3. The student is forced to make a continuous effort throughout the instructional programme
because he knows that each assessment counts towards the final grade.
4. By making assessment part of the learning process, continuous assessment recognizes
that education is a process not an event. It thus enables students to learn from the
evaluation etc. and change. It is therefore oriented towards development of the
individual.
5. It provides continuous feedback to the student and the teacher for corrective or remedial
action to be taken towards the improvement of performance.
6. It helps to minimize the tensions, fears, anxieties, etc. associated with one-shot examinations and so reduces occurrences of malpractice.
7. It facilitates the collection and keeping of up-to-date information on pupils, thus providing adequate data for guidance and counseling.

8. It makes possible the measurement of all educational outcomes, especially those cognitive, psychomotor and affective abilities that can only be measured over a reasonably long period of time or are not measured at all under examination conditions.

Weaknesses of Continuous Assessment

1. Continuous assessment requires a great deal of measurement of students' performance. This means more work for the teacher.

2. It is costly in terms of materials, time and energy.

3. The high population of pupils in a class and the high number of teaching periods per teacher are likely to have an adverse influence on the teacher's attitude to the work.

4. The candidate is anonymous under the system of external examinations and external examiners, so that in theory the examiner has no way of favouring or victimizing enemies. In continuous assessment, the teacher knows the student well and there is the possibility of a student-tutor relationship influencing the tutor's assessment. This possibility can put the reliability of continuous assessment in doubt.

5. The same score awarded by a teacher or teachers from different schools may not mean the same level of performance. There is the possibility of schools and teachers trying to impress the public by giving easy tests or inflating scores. This leads to a lowering of academic standards.

6. In continuous assessment, the fate of the student is determined to some extent by individual classroom teachers. Each teacher designs his own assessment, so standards will vary. This is bound to generate fears about lack of uniformity and fairness in assessing students.

7. Many tutors in Ghana lack the skill of constructing classroom tests; a poorly constructed classroom test will yield biased information.

Nature of Assessment

Assessment of student learning requires that the classroom teacher review the nature of
assessment in order to effectively link teaching, learning and assessment.
Here are seven principles on the nature of assessment which emphasize its importance.

The classroom teacher must know:

1. How to assess:
Teachers must select from among all the techniques (methods) at their disposal.
Thus they must decide whether to use oral method or written techniques in assessing students.

2. What to assess:
Teachers must be aware and decide what they are looking for in the individuals involved in the
learning process. Thus teachers must identify what exactly they want to assess in their students.
• Achievement (the extent to which students grasp content taught)

• Performance (how fast students can work out a given task)

3. When to assess:
Teachers must establish the purpose for assessment to be administered.
• Before instruction • During instruction • After Instruction

4. What instruments to use:


Teachers must be knowledgeable about the variety of methods available to assess students' performance and patterns of behaviour. They must decide on the type of instrument to use:
• Standardized tests • Teacher-made tests • Observation schedules • Questionnaires • Inventories.

5. The developmental level of the students: Teachers must use their knowledge of learning
theories to plan appropriate assessment corresponding to students’ level of development, as well
as individual differences. Thus they must consider • Chronological, • Mental, • Physical, and
• Emotional, state of students before coming out with the assessment tasks.

6. How to interpret results: Teachers must consider the purpose and consequence of
assessment to facilitate the method of interpreting scores.

7. Provide feedback: Teachers must share strengths and weaknesses with the stakeholders of education. Thus • Students, • Parents, • Administrators and • Policy makers must be kept abreast of the overall outcome of educational assessment in order to make informed decisions affecting the various stakeholders.

General Principles of Assessments

1. Specify what is to be observed


2. Select an assessment procedure which is most relevant to the characteristics or
performance to be assessed.
3. Use a variety of procedures to obtain comprehensive information on the students
4. Be aware of the limitation of the assessment procedure you may use.
5. Assessment is a means to an end and not an end in itself

The Assessment is based on the Profile Dimensions

The myriad of educational outcomes has been classified to make it easier, in mathematical assessment, to identify the most important goals and objectives to consider when teaching specific subject matter.

Taxonomies of Educational Objective


Taxonomies are hierarchical schemes for classifying learning objectives into various levels of complexity. A taxonomy can help bring to mind the wide range of important learning objectives and thinking skills, so that teachers avoid narrowly focusing on some lower-level objectives only. It assists teachers to focus on all areas of the child's learning behaviours. Taxonomies of learning outcomes are highly organized schemes for classifying educational outcomes into various levels of complexity, namely the
1. cognitive domain
2. affective domain
3. psychomotor domain

The Cognitive Domain:

Generally the cognitive domain refers to educational outcomes that focus on knowledge and
abilities requiring memory, thinking and reasoning processes (Nitko 2001). In other words, the
cognitive domain deals with all mental processes, including perception, memory and information processing, by which individuals acquire knowledge, solve problems and plan for the future.

Bloom's taxonomy

Bloom, Engelhart, Furst, Hill and Krathwohl developed this taxonomy in 1956. It is generally known as Bloom's taxonomy. It is a comprehensive outline of a range of cognitive abilities that might be taught in a course. The taxonomy describes general instructional outcomes and classifies cognitive performance into six major categories arranged from simple to complex. Each major learning outcome of the classification is explained below, with examples to illustrate it.

Knowledge

Knowledge refers to facts and to tested and accepted explanations (theories).

Knowledge in the cognitive domain involves the recall of facts, principles and procedures among
others. As Bloom and others define it, knowledge refers to recall of knowledge of specifics and
knowledge of ways and means of dealing with specifics. Knowledge of specifics includes
knowledge of terminology and knowledge of specific facts. For instance, we can talk about
knowledge of dates and events.

The knowledge of ways and means of dealing with specifics embraces knowledge of conventions,
classifications, criteria and methodology among others. Thus, we can talk of knowledge of the
criteria by which facts, principles and conduct are tested or judged. Thus, as a teacher if you ask
yourself whether your pupils can recall the main characters of the short story you told them or
whether they can recall the procedures in solving a problem, then you are within the realm of
knowledge in the Bloom’s taxonomy. However, for measurement purposes, the recall situation
involves little more than bringing to mind the appropriate material. Some of the action verbs that
can be used to state knowledge outcomes in specific terms include recall, identify and list.

Comprehension

Comprehension refers to a type of understanding that indicates that the individual knows what is
being communicated and can make use of the material or idea being communicated without
necessarily relating it to other materials or ideas. Comprehension is a bit more complex than
knowledge. One can recall a piece of information without necessarily understanding it. The
achievement of comprehension is evidenced by the recipient being able to carefully and accurately translate, interpret and determine the implications of what is communicated.

An implication of comprehension is that, one can say what is understood in a different way
accurately. Examples of action verbs that can be used to specifically indicate comprehension
include explain, give and find.

Application:

Application is more complex than comprehension. It involves the use of abstractions in particular concrete situations. The process of concept formation is termed ‘abstraction’. Abstraction is becoming aware of the similarities among our experiences, which we recognize on future occasions. What this means is the use of ideas, procedures or generalized methods to solve new or novel problems. Thus, at this level of complexity, you do not only know and understand but are also able to apply the knowledge and understanding to solve relevant problems.

It is necessary that, in educating our students, we emphasize this learning outcome of cognitive
domain. The emphasis should not be on memorizing facts and figures and recalling them but on making use of the knowledge and understanding achieved to solve new mathematical problems.

Analyses

Analysis is a higher level of cognitive ability. It involves the breakdown of a communication into its component parts with a view to making the relationship between the parts clear. Analysis may include identifying the organization, the systematic arrangement and the structure that hold the communication together. An example of analysis is asking your students to show proof of a mathematical deduction, or asking them to solve mathematics-related problems in essay form.

Example: Abi can do a piece of work in 6 days while Joe can do the same piece of work in 10 days. How many days will the two take to do the piece of work together? This question requires a higher level of thinking, so students are expected to display analytical ability.

Syntheses

It is simply concerned with putting together elements and parts so as to form a whole. It involves the process of working with pieces, parts, elements, etc., arranging and combining them in such a way as to constitute a pattern or structure not clearly there before.

Example: Asking students to show similarities between two mathematical phenomena, such as comparing and contrasting a square and a rhombus. We note that analysis and synthesis both concern parts of a whole. While in analysis the whole is broken into its component parts, in synthesis the elements or parts are put together.

Example 2: Suppose we are asked to obtain the equation whose roots are 𝑥₁ = 3 and 𝑥₂ = −2. Here we need to put the various ‘parts’ of the roots together to get the equation.
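A short worked illustration in LaTeX notation (assuming the intended roots are 3 and −2, as reconstructed above):

\[
(x - 3)(x + 2) = 0
\quad\Longleftrightarrow\quad
x^{2} - \underbrace{(3 + (-2))}_{\text{sum of roots}}\,x + \underbrace{(3)(-2)}_{\text{product of roots}} = x^{2} - x - 6 = 0.
\]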

Evaluation:

By evaluation we refer to judgment about the value of materials, methods and things for their
effectiveness. Judgment may be about the extent to which materials satisfy specific criteria or
standard. It is the most complex cognitive area in Bloom's taxonomy of educational outcomes.

Quellmalz’s Taxonomy

Quellmalz, just like Bloom et al., classified the cognitive taxonomy into recall, analysis, comparison, inference and evaluation.

Recall: This refers to recognizing or remembering key facts, definitions, concepts, rules and principles. Bloom's taxonomy levels of knowledge and comprehension are subsumed in Quellmalz's category of recall.

Analysis: in Quellmalz's classification, this is the same as in Bloom's taxonomy. It involves dividing a whole into its component parts.

Comparison: It is defined as the ability to recognize or explain similarities and differences.

Inferences: involve both deductive and inductive reasoning.

In deductive reasoning, we operate from a generalization to the specific. It is the method in which a law is accepted and applied to a number of specific examples [deduction]. Students do not discover the law but develop skills in applying it, proceeding from general to specific, or from abstract to concrete.

Consider the following logical reasoning

𝑄: All humans are mortal

𝑃: Kofi is a human

{𝑃 → 𝑄}: Kofi is mortal

Inductive reasoning is the opposite of deductive reasoning; it operates from the specific to the general. Induction is that form of reasoning in which a general law is derived from a study of particular objects or specific processes. Students use measurement, manipulatives or constructive activities, patterns, etc. to discover a relation. They later formulate a law or rule about the relationship based on their observations, experiences, inferences and conclusions.

An example of this application is found in mathematical induction and other proofs such as contrapositive proofs.

Example: 4⁴ × 4⁵ = 4⁴⁺⁵ = 4⁹

3⁴ × 3⁵ × 3⁴ = 3⁴⁺⁵⁺⁴ = 3¹³

𝑥⁵ × 𝑥¹⁰ = 𝑥¹⁰⁺⁵ = 𝑥¹⁵

Therefore 𝑎ᵐ × 𝑎ⁿ × 𝑎ᵖ = 𝑎ᵐ⁺ⁿ⁺ᵖ
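As a small illustration of the inductive step, a hypothetical Python check of the specific cases (the exponents follow the reconstruction of the examples above) before the general law is conjectured:

# Check the specific cases used to induce the law a^m × a^n × ... = a^(m+n+...)
cases = [(4, (4, 5)), (3, (4, 5, 4)), (2, (5, 10))]   # (base, exponents); x is taken as 2 here
for base, exponents in cases:
    product = 1
    for e in exponents:
        product *= base ** e
    assert product == base ** sum(exponents)
print("Every specific case satisfies the product-of-powers law.")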

Evaluation: This category of learning outcome is concerned with judging quality, credibility, worth or practicality. It is related to Bloom's levels of synthesis and evaluation.

The Affective Domain

The affective domain is concerned with educational outcomes that focus on feelings, attitudes,
disposition and emotional states. In other words, the affective domain describes our feelings, likes
and dislikes and our experience as well as the resulting behaviours (reactions).

Krathwohl et al. identified five main categories of outcomes in the affective domain. These are
Receiving, Responding, Valuing, Organizing and Characterization

• Receiving: refers to attending to something. It represents the lowest level of learning in the affective domain. It involves awareness, willingness to receive, and being willing to tolerate a given stimulus. Action verbs concerning this level include identify, choose, select, and describe.

• Responding: this category refers to active participation on the part of the individual. At this level, the individual does not only attend to a particular phenomenon or stimulus but also reacts to it in some way. Learning outcomes involve obedience or compliance, willingness to respond, and satisfaction, e.g. when a student voluntarily reads beyond what is assigned or solves more of the given exercises than instructed.

• Valuing: it concerns the worth or value an individual attaches to a particular object or behaviour. It is based on the internalization of a set of specific values. It embraces acceptance of values, commitment and appreciation.

• Organizing: it refers to bringing together different values, resolving conflicts between


them, and building of an internally consistent value system.
Organization takes two forms: conceptualization (involving comparing, relating and synthesizing values), and organization of value systems (bringing together a complex of values into an ordered relationship with one another), e.g. students developing a career plan to be a mathematician or mathematics teacher/lecturer.

• Characterization: characterization by a value complex is the last of the levels in


Krathwohl and others' taxonomy of the affective domain. At this level the individual has a value system that has controlled his/her behaviour for a sufficiently long time for him/her to develop a characteristic life style. Thus the individual's behaviour becomes pervasive, consistent and predictable.

The Psychomotor Domain

The psychomotor domain refers to educational outcomes that focus on motor (movement) skills
and perceptual processes. Motor skills relate to movement whilst perceptual processes are
concerned with interpretation of stimuli from various modalities providing data for the learner to
make adjustment to his environment.

Harrow’s taxonomy of psychomotor and perceptual objectives has six levels including: reflex
movement, basic-fundamental movements, perceptual abilities, physical abilities, skilled
movements and non-discursive communication.

Reflex movement: Reflex movements are movements elicited without conscious volition on the part of the individual in response to some stimuli. Examples of such movements include extension, stretching and postural adjustment. The sub-categories of reflex movement according to Harrow (1972) are segmental reflexes, inter-segmental reflexes and supra-segmental reflexes.

Basic fundamental movements: This category is concerned with inherent movement patterns that are formed from a combination of reflex movements and are the basis for complex skilled movements, e.g. walking, running, jumping, bending, pulling. Sub-categories of this level include locomotor movements, non-locomotor movements and manipulative movements.

Perceptual abilities: it refers to interpretation of stimuli from various modalities providing


information for an individual to make adjustment to his/her environment. There are five sub-
categories of this level: kinesthetic discrimination, visual discrimination, auditory discrimination,
tactile (touching) discrimination, and coordinated abilities (e.g. walking within 100m, jumping a
rope)

Physical abilities: physical abilities involve functional characteristics of organic vigor which are
essential to the development of highly skilled movement. The category entails endurance, strength, flexibility and agility; examples include distance running, distance covered or measured, weight lifting, wrestling and typing, etc.

Skilled movements: it refers to complex movement tasks with degree of efficiency based on
inherent movement patterns. It builds up locomotor and manipulative movements. Three sub-
categories include adaptive skills, compound adaptive skills and complex adaptive skills.

Non-discursive communication: This last category refers to communication through bodily movements ranging from facial expressions to sophisticated choreographies. This category has two levels: expressive movement and interpretative movement. Body posture, gestures, facial expressions and skilled dance movements are included in this category.

Purposes of Assessment.

As a society and as educators, we assess both performance and competence in education in a variety of ways and for a variety of purposes. Broadly speaking, the purposes are:

Ø Serving instruction
Ø Accountability
Ø Selection
Ø Licensure

Assessing students’ performance in order to inform instruction is something that all teachers
do. It is often the case that an external agency of some sort gets involved in the assessment,
normally to serve instruction. The time lapse between the administration of the tests and the
reporting of ‘scores’ to teachers who might be able to use the information is such that there
is little reason to assume that any such testing by an external agency has much to contribute
to assessment for instruction.

Assessment for the purpose of saying how well a student, or a class, or a school, or an
instructional program is doing is the primary purpose of assessment for accountability.
Traditionally, such information has been presented in one of two quite different forms: norm-referenced and criterion-referenced. Norm-referenced accountability statements involve comparing students' performance (or that of classes or schools) to one another and then presenting the results of those comparisons in rank order. It should be noted that this can only be done if the performance of the students can be encoded in a one-dimensional measure. Criterion-referenced accountability statements involve comparing students' performance (or that of classes or schools) to some predetermined set of performance criteria without
regard to how they compare to one another. It should be noted that this can only be done if
one has a clearly defined set of performance criteria that reflect one’s theory of competence
in the domain being assessed.
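A minimal Python sketch (with hypothetical scores and a hypothetical cut-off) of the difference between the two kinds of accountability statement:

# Norm-referenced: rank students' performance against one another.
# Criterion-referenced: compare each student to a predetermined performance criterion.
scores = {"Ama": 78, "Kofi": 64, "Esi": 86, "Yaw": 51}   # hypothetical class scores
criterion = 60                                           # hypothetical mastery cut-off

ranked = sorted(scores, key=scores.get, reverse=True)
print("Norm-referenced rank order:", ranked)

report = {name: ("meets criterion" if mark >= criterion else "below criterion")
          for name, mark in scores.items()}
print("Criterion-referenced report:", report)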

Assessing for selection is normally done for the purpose of helping to ascertain whether a
student will have access to limited resources. Such assessment is often employed in order to
inform decisions about access to select universities, polytechnics, colleges of education,
program for gifted music students, special education programs, etc.

Assessing for the purpose of licensure is normally done in order to ascertain whether the
people being assessed have exceeded some threshold of minimal competence and are thus
permitted to practice in an unsupervised fashion the skill that they have demonstrated. Such
skills include driving automobiles, swimming in the deep part of the pool, barbering,
butchering, working as an electrician or plumber, etc.

Other Purposes of Assessment:


Other purposes of assessment can be outlined as follows:
i. Judging pupils' mastery of skills and knowledge;
ii. Evaluating the instructional method;
iii. Ascertaining the effectiveness of the curriculum;
iv. Encouraging good study habits;
v. Measuring growth;
vi. Ranking pupils;
vii. Diagnosing difficulties;
viii. Providing feedback;
ix. Motivating students;
x. Reporting to stakeholders;
xi. Certifying examinees.

Mehrens & Lehmann (1984, 7–12) conclude that the main purpose of assessment, therefore, is
to make EDUCATIONAL DECISIONS.
These include the following:

Ø Instructional decisions (teacher & students)


Ø Guidance decisions
Ø Administrative decisions
Ø Research decisions

Generally, we want to find out about our students in order to make decisions related to:
• Placement • Selection • Aptitude • Achievement • Classification • Guidance
• Promotion

Placement (entry behaviour):
“Have the students already achieved the intended outcomes?”

Formative (during instruction):
“Which learning tasks are students handling satisfactorily?”
“Which learning tasks do students need help with?”

Diagnostic (during instruction):
“Which students need remedial work?”

Summative (end of instruction):
“What grade should I assign to each student?”
“Is the method I am using effective?”

CHAPTER TWO

MEASUREMENT

Measurement refers to the procedure for assigning numbers or scores to a specific attribute or
characteristics of a person in such a way that the numbers describe the degree to which the person
possesses the attributes. It is the process of assigning numbers or numerical index to an attribute
or a trait possessed by a person or a learner or an event or a set of objects or whatever quality that
is being assessed according to specific rules. The purpose is to indicate the differences among
those who are being assessed in the degree to which they possess the characteristics being
measured. Thus the essence of measurement is to find the amount of the attribute possessed by the people or objects.

Educational measurement is about assigning numbers to such attributes as achievement,


performance, aptitude. It is limited to the quantitative description of learners.

Measurement involves three main steps.


1. Identifying and providing a clear definition of the attributes/traits to be measured;
2. Determining the set of procedures by which the attribute is to be manifested;
3. Establishing a set of rules for quantifying the attribute.

Scale of measurement
Depending upon the traits / attributes, characteristics and the way they are measured, different
kinds of data result representing different scales of measurement.
Thus, measurement implies the use of scales. Four measurement scales exist: nominal, ordinal, interval and ratio.

1. Nominal
Nominal is hardly measurement. It refers to quality more than quantity. A nominal level of
measurement is simply a matter of distinguishing by name, e.g., 1 = male, 2 = female. Even
though we are using the numbers 1 and 2, they do not denote quantity. The binary category of 0
and 1 used for computers is a nominal level of measurement. They are categories or
classifications. The categories are established by the researcher and an item is counted when it
falls into this category.
The most significant point about nominal scales is that they do not imply any ordering among the responses.

For example, when classifying people according to their favorite color, there is no sense in
which green is placed “ahead of” blue. A nominal level of measurement is the least precise form
of measurement.

Examples:
1. Meal Preference: Breakfast, Lunch, Dinner
2. Religious Preference: 1 = Buddhist, 2 = Muslim, 3 = Christian, 4 = Jewish, 5 = Other
3. Political Orientation: NDC, NPP, PNC, PPP, CPP, GFP.
4. Number of males or females.
5. Number of individuals who fall under the category of introvert or extrovert
6. Height – the number of tall, medium or short people in a group
7. Counting the number of participants who did or did not experience anxiety

2. Ordinal refers to “order” in measurement. An ordinal scale indicates direction, in addition to
providing nominal information. Low / Medium / High; or Faster/Slower are examples of ordinal
levels of measurement. Ranking an experience as a "nine" on a scale of 1 to 10 tells us that it
was higher than an experience ranked as a "six." Many psychological scales or inventories are at
the ordinal level of measurement.

Unlike nominal levels of measurement, ordinal measurement allows comparisons of the degree
to which two subjects possess the dependent variable.

For example, placing feelings as “very unsatisfied”, “satisfied”, or “very satisfied” makes it meaningful to assert that one person is more satisfied than another with the way the country Ghana is managed. Such an assertion reflects the first person's use of a verbal label that comes later in the list than the label chosen by the second person. However, ordinal data fail to capture the precise difference between the data points. In particular, it cannot be assumed that differences between two levels of ordinal data are the same as the differences between two other levels. For instance, it cannot be assumed that the difference between “very unsatisfied” and “satisfied” is the same as the difference between “satisfied” and “very satisfied”.

In the same way, it cannot be assumed that if we rank a group of people from tallest to shortest, the difference between the tallest person and the second tallest person in the group is the same as the difference between the 4th and 5th tallest people in the group. In other words, ordinal-level data lack a degree of specific information.

Examples:
a. Rank: 1st place, 2nd place, ... last place
b. Level of Agreement: No, Maybe, Yes
c. Rating of Attractiveness on a scale of 1 to 10
d. Race Results – which racers came in 1st, 2nd, 3rd, etc. (actual times or intervals may be widely different)
e. Height: a group of people in order from shortest to tallest
f. Time of day: Dawn, Morning, Noon, Afternoon, Evening

3. Interval scales provide information about order, and also possess equal intervals. From the
previous example, if we knew that the distance between 1 and 2 was the same as that between 7
and 8 on our 10-point rating scale, then we would have an interval scale.

An example of an interval scale is temperature, either measured on a Fahrenheit or Celsius scale.


A degree represents the same underlying amount of heat, regardless of where it occurs on the
scale. Measured in Fahrenheit units, the difference between a temperature of 46 and 42 is the
same as the difference between 72 and 68.

Equal-interval scales of measurement can be devised for opinions and attitudes. However,
constructing them involves an understanding of mathematical and statistical principles beyond those covered in this course. But it is important to understand the different levels of measurement when using and interpreting scales.

Examples:
a. Time of Day on a 12-hour clock
b. Political Orientation: Score on standardized scale of political orientation. Thus the vote of
the poor and rich has the same magnitude.
c. Other scales constructed so as to possess equal intervals
d. Height of a person(s) in centimeters or Inches
Interval – an example is time of day on an analog (12-hr.) clock: equal intervals; the difference between 1 and 2 pm is the same as the difference between 11 and 12 am.

4. Ratio
The ratio scale of measurement is the most informative level of measurement. It is really just an
“interval” measurement with the additional property that its zero position indicates the absence
of the quantity being measured. You can think of a ratio scale as the three earlier scales rolled up
in one. Like a nominal scale, it provides a name or category for each object (the numbers serve
as labels). Like an ordinal scale, the objects are ordered (in terms of the ordering of the numbers).
Like an interval scale, the same difference at two places on the scale has the same meaning. And
in addition, the same ratio at two places on the scale also carries the same meaning. In other
words, In addition to possessing the qualities of nominal, ordinal, and interval scales, a ratio scale
has an absolute zero (a point where none of the quality being measured exists).

Using a ratio scale permits comparisons such as being twice as high, or one-half as much.
Reaction time (how long it takes to respond to a signal of some sort) uses a
ratio scale of measurement -- time.

Although an individual's reaction time is always greater than zero, we conceptualize a zero point
in time, and can state that a response of 24 milliseconds is twice as fast as a response time of 48
milliseconds. Example:

In memory experiments, the dependent variable is often the number of items correctly recalled.
What scale of measurement is this? You could reasonably argue that it is a ratio scale. First, there
is a true zero point: some subjects may get no items correct at all. Moreover, a difference of one
represents a difference of one item recalled across the entire scale. It is certainly valid to say that
someone who recalled 12 items recalled twice as many items as someone who recalled only 6
items. However, the items must be of roughly the same level of difficulty.

Other Examples
a. Ruler: inches or centimeters
b. Years of work experience
c. Income: Money earned last year
d. Memory – number of correctly remembered items from a list of words (if equal difficulty)

e. GPA: Grade point average
f. Number of children a couple has
Ratio – A 24-hr. time format has an absolute 0 (midnight); 14 o'clock is twice as long from
midnight as 7 o'clock
ADDITIONAL NOTES
• The level of measurement for a particular variable is defined by the highest category that it
achieves.

For example, categorizing someone as extroverted (outgoing) or introverted (shy) is nominal.

If we categorize people 1 = shy, 2 = neither shy nor outgoing, 3 = outgoing, then we have an
ordinal level of measurement. If we use a standardized measure of shyness (and there are such
inventories), we would probably assume the shyness variable meets the standards of an interval
level of measurement.

• As to whether or not we might have a ratio scale of shyness, although we might be able to
measure zero shyness, it would be difficult to devise a scale where we would be comfortable
talking about someone's being 3 times as shy as someone else.

• Measurement at the interval or ratio level is desirable because we can use the more powerful
statistical procedures available for Means and Standard Deviations. To have this advantage, often
ordinal data are treated as though they were interval; for example, subjective ratings scales
(1 = poor, 2 = fair, 3 = good, 4 = excellent). The scale probably does not meet the requirement of
equal intervals -- we don't know that the difference between 1 (poor) and 2 (fair) is the same as
the difference between 3(good) and 4 (excellent).

• In order to take advantage of more powerful statistical techniques, researchers often assume
that the intervals are equal.
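The practical consequence of the four scales is which statistics are meaningful for the data. A minimal Python sketch (with hypothetical data) illustrating the usual choices:

from statistics import mean, median, mode

# Hypothetical data at each level of measurement
nominal  = ["NDC", "NPP", "NDC", "CPP", "NDC"]   # categories only -> report the mode
ordinal  = [1, 2, 2, 3, 1]                       # 1 = poor, 2 = fair, 3 = good -> report the median
interval = [46, 42, 72, 68]                      # Fahrenheit temperatures -> means and differences are meaningful
ratio    = [6, 12, 0, 9]                         # items recalled -> ratios ("twice as many") are also meaningful

print("Nominal  -> mode:", mode(nominal))
print("Ordinal  -> median:", median(ordinal))
print("Interval -> mean:", mean(interval))
print("Ratio    -> mean:", mean(ratio), "| 12 is twice 6:", 12 / 6 == 2)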

CHAPTER THREE

TEST
Formal and Informal Assessment
There are two major approaches to assessment. These are the Formal and Informal
assessment.
Formal Assessment: This is also known as the test technique (pencil-and-paper test). As described by Passing, it is the type of test done at the end of a lesson, topic, unit, school term or year, etc. It is planned, structured and well designed. In formal assessment, the design may be an objective test or an essay-type test. It is quantitative in nature.

A test is a task or series of tasks, which are used to measure specific attributes or traits of people
in educational setting. Tests are classified in various ways using criteria like purposes, uses and
nature. Some of the common ones are diagnostic, aptitude, intelligence and achievement tests.

Achievement tests essentially measure knowledge obtained from formal learning situations. They measure the degree of a student's learning in the specific curricular areas for which he has received instruction. They focus on more concrete objectives in the measurement of ability. Achievement tests therefore measure previously acquired knowledge.
Achievement tests can be classified into two as:
• Teacher-made achievement tests e.g. Objective and essay tests.
• Standardized achievement test (SAT)

The major difference between standardized and teacher-made achievement test is that
standardized tests are carefully constructed by test experts, administered and scored under
specific uniform conditions. In addition, the scores are interpreted in terms of established norms
stated in the test manual while teacher-made tests are not necessarily so. Teacher-made tests may
be confined to some specific content covered.

Standardized Tests
Ø These are tests carefully constructed by test experts, administered and scored under specified uniform conditions. In addition, the scores are interpreted in terms of established norms stated in the test manual. In administering the test, the test administrator must adhere strictly to the instructions. Any deviations or violations render the results useless. The test usually has its validity clearly stated. The norms and their interpretations are very often indicated. The results are usually expressed in grade equivalents, percentile ranks and standard scores.
Examples of standardized achievement tests are the Stanford Achievement Test (SAT), the California Achievement Test (CAT) and the Comprehensive Test of Basic Skills (CTBS). They tend to be more or less commercial.

Purpose of Standardized Test


The main purpose of a standardized test is to compare a child's performance to that of a normative group. This group shares the characteristics of the child. The test gives information relating to the extent to which the child deviates from the norm.
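Standard scores are the usual way of expressing how far a child deviates from the norm. A minimal Python sketch (the norm-group mean and standard deviation here are hypothetical):

# z-score: how many standard deviations a raw score lies above or below the norm-group mean
def z_score(raw, norm_mean, norm_sd):
    return (raw - norm_mean) / norm_sd

# Hypothetical norm group with mean 50 and standard deviation 10
print(z_score(raw=65, norm_mean=50, norm_sd=10))   # 1.5 -> one and a half SDs above the norm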

Characteristics of Standardized Tests


Ø They are well structured.
Ø The test Manual specifically spells out how the test should be administered, scored and
interpreted.
Ø Professionally competent personnel like psychologists and special educators are required
to use them.

Strengths of standardized Tests
§ The inherent validity and reliability makes the results genuine.
§ The information they give truly represent the trait measured hence permitting decisions
that are well informed.
§ Professionals are enabled to make decisions related to eligibility and placement.
§ The tests have no room for subjective tendencies. This means that the assessor or
evaluator cannot depend on his / her wits to interpret results.
§ The test protocol specifically describes how the tests should be administered, scored and
interpreted.
§ The outcomes are objective. Professionals can easily compare an individual to normative
group. The extent to which a child is deviating can be known.
§ Standardized tests are very often used as screening devices to sort out individuals who deviate from the norm group. This helps professionals to categorize pupils.

Weaknesses of Standardized Tests


The major problems found with standardized tests are that
• the tests do not give sufficient information, which can be used to plan individualized
instruction. Standardized tests more often than not sample general rather than specific areas. Consequently, professionals are not able to determine what the real needs of the individual are.

• the tests do not favour children in deprived localities. Most of the children may find it
difficult coping with the instructional requirements. In certain instances, since the child
is aware that he is being assessed he / she could put up behaviour, which may not be
natural. This may provide misleading results.

• the tests do not provide sufficient information on why the test taker fails to achieve.

• some of the tests are not culturally fair; they may be full of biases. Test takers may not
understand the language. Most seriously, since the experiences one has have effects on
his performance, we can imagine the mess that will arise when the test is applied to
children in an environment different from where the test was normed. This is why it
becomes important for test administrators to be careful when selecting tests. Leaving tests in the hands of inexperienced individuals will lead to enormous wreckage on the lives of innocent children.

Teacher-made Tests
Unlike standardized tests, these are constructed by teachers in the classroom. For instance, after teaching, a teacher can construct a test made up of a few items to test the degree of students' learning in that specific unit. In doing this, the teacher does not go through any elaborate process as in the construction of standardized tests. Moreover, the test may be confined to the specific content covered within a given period. The test may be either an objective or an essay test. Teacher-made tests are a means to an end. They aid in decision-making. As a recapitulation, teacher-
constructed classroom achievement tests are used to

(a) Determine what students know.
(b) Identify student’s learning problems and areas that should receive remedial teaching.
(c) Determine the effectiveness of pedagogical strategies.
(d) Find out to what extent students are meeting set out instructional objectives.
(e) Give guidance and counseling to students on how and what to study as well as choice of
content.
(f) Encourage and motivate students to learn.
(g) Give students feedback on their performance to enable them buck up in areas in which
they are weak.
(h) Select and promote students from one grade to another.
(i) Group and select students for instructional purposes.
(j) Predict students’ future performance.
(k) Provide parent or guardians with information on the performance of their children
or wards.

Strengths of Teacher-made Tests


Teacher-made tests can be progressive, as the teacher examines students gradually on every topic or unit covered.

Students may be at ease with the examination, as the question items are constructed by their own teacher. Cultural biases may be removed. Teachers may be comfortable administering the test items.

Weaknesses of Teacher-made Tests


• The test items may have subjective tendencies. The assessor can interpret test scores
using his / her discretion.

• There is also the tendency of teachers being partial in administering and scoring test
items.

• The reliability and validity of test items may not be assured.

Test Items Construction

Components of an Assessment Blue Print / Plan

1. The Planning Stage:


Test construction, like any other purposeful activity, needs to be adequately planned and executed. What, then, goes into planning a test?
At the planning phase, preliminary steps that could facilitate the writing of useful and relevant items are taken. This stage involves four main interrelated steps. These are:
(a) Listing the main objectives of the subject matter for which the test is being
constructed.

(b) Listing the main topics covered or to be covered
(c) Marrying the objectives and the list of topics to build the table of specifications for
the test and
(d) Determining the appropriate test item types.
It is worth noting that in planning, one does not only list the objectives but also tries to classify them. For instance, those dealing with recall, comprehension, application, interpretation, etc. should be clearly delineated. This information is used to build the table of specification.

Constructing a Table of Specification


The table of specification is a two-way table, or chart, which relates behaviour changes or desired
learning outcomes to the course content. It shows the content or topics covered and instructional
objectives or processes to be tested. The behavioural changes can be classified into many
categories. For example, the six principal categories of Bloom’s taxonomy of educational
objectives.

Preparing a Table of Specification


To prepare the table of specification, one needs the following:
(a) Specific topics and sub-topics covered during instructional period must be listed.
(b) Again list the major course objectives and also instructional objectives
(c) The total number of test items of each type
(d) Kinds of task the items will represent
(e) Number of items under each task
(f) The content area you want to test
(g) Number of items in each content area.

A convenient way to set up the table of specification is to have the objectives or the abilities to
be demonstrated across the top of the page and the subject matter contents or topics in a column
on the left hand side of the page. An example for Mathematics based on Bloom’s Taxonomy is
given in table below.

Table 1: Table of specification for a Mathematics test (based on Bloom's taxonomy)

CONTENT       KNOWLEDGE   COMPREHENSION   APPLICATION   ANALYSIS   SYNTHESIS   EVALUATION   TOTAL
ADDITION          1             1               1                                              3
SUBTRACTION       1                                                                            1
DIVISION                                        1            1                                 2
RENAMING          2                             1            1                                 4
DEFINITION                      2                                                              2
TOTAL             4             3               3            2                                12

In the table of specification, the number of items is indicated in the cell where the content and the objective meet. In Table 1 the writer has indicated that, over all the content, 4 items will be constructed to test for knowledge and 3 items to test for application. Not all the cells in the table of specification need to have items, since certain processes will be unsuitable or irrelevant for certain topics. The number of items devoted to each topic and objective, as well as the importance attached to them, indicates the relative weight given to each area of content and behaviour. A table of specification is usually used more in the case of objective test items than with essay items, because objective items tend to measure single units of behaviour in content areas. However, the table of specification is still applicable in the construction of essay tests.
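As an illustration of the blueprint as a simple data structure, a minimal Python sketch (using the cell entries of Table 1 as reconstructed above, which are partly inferred) that recomputes the row and column totals:

# A table of specification as a nested dictionary: content area -> {objective: number of items}
blueprint = {
    "Addition":    {"Knowledge": 1, "Comprehension": 1, "Application": 1},
    "Subtraction": {"Knowledge": 1},
    "Division":    {"Application": 1, "Analysis": 1},
    "Renaming":    {"Knowledge": 2, "Application": 1, "Analysis": 1},
    "Definition":  {"Comprehension": 2},
}

row_totals = {topic: sum(cells.values()) for topic, cells in blueprint.items()}

col_totals = {}
for cells in blueprint.values():
    for objective, n in cells.items():
        col_totals[objective] = col_totals.get(objective, 0) + n

print(row_totals)                                  # {'Addition': 3, 'Subtraction': 1, ...}
print(col_totals)                                  # {'Knowledge': 4, 'Comprehension': 3, ...}
print("Total items:", sum(row_totals.values()))    # 12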
Importance of the Table of Specification
1. It helps the teacher to cover adequately the topics treated during the term. Also the
behaviours that students were expected to acquire are all catered for when the table of
specification is used.
2. It helps the teacher to determine the content validity of the test. In that the teacher is
able to do a sampling to cover all that has been taught during the term.
3. It helps the teacher to do a meaningful weighting of the test items in each cell of the table of specification accordingly.
4. It avoids overlapping in the construction of the test items.
5. It helps teachers to determine content areas where students / pupils have difficulty.

Item Construction Stage


The writing of the test items immediately follows the planning stage. At this point, the test
constructor proceeds to write the individual items. This is the phase at which specific items are
written in accordance with the table of test specifications. Whichever test item types are being constructed, they should follow the basic principles laid down for them.
For convenience, the original draft of items should exceed the number of items intended for the test. The rationale behind this is that, after eliminating unsuitable items in an attempt to refine the test, enough items will be left for the test, in keeping with the importance attached to it.

General Guidelines for Question Writing


(Eight Steps in constructing a good Classroom Achievement Test)
For the classroom teacher to succeed in his quest to present meaningful and content-valid test items to pupils or students, he must follow certain steps. These are:

1. Define the Purpose of the Test


The teacher must know what he wants a student to be able to do by the end of the test. The test
items must match with classroom instructional objectives. He must ask himself these useful
questions;
• Why am I testing?
• Why is the test being given at this time?
• Who will take the test?
• Have the testees been sufficiently informed?
• How will the scores be used?

2. Determine the item Format to Use
The teacher has several options as far as the format of the test items is concerned. That is, he may either use the essay-type test, where the student produces his own answer in an extended form, or the objective-type test, where the student is required to select an answer from some alternatives supplied by the teacher. The format must be appropriate for testing the topic and objectives concerned. It may be necessary to use more than one item format.
The format depends on:
• The purpose of the test
• Time available for writing items
• Number of students to be tested
• Physical facilities available
• Academic standard of testees
• Writer’s skill in item writing
• Ages of the students.
3. Determine what is to be tested [define the task and shape the problem situation]
The teacher must know what the test intends to measure and the content area that he seeks to cover, so that the expected knowledge, skills and attitudes of students can be measured. Test items should, as much as possible, reflect the content and instructional objectives. Test items should also match the maturity level of the testees. Ideally, a test plan made up of a table of specifications or a blueprint should be prepared.
The following are some important clues that will guide the teacher in determining what to be
tested.
• Define instructional objectives
• Know the chapter or units that the test is to cover
• Make sure test items match with course objective
• Prepare a table of specifications

4. Writing the Individual Items


Writing the individual items is a crucial aspect of test writing, which demands a high level of skill from the teacher if the test is to be effective. Etsey et al. (2001) provide the following guidelines for writing individual items.
a. Keep the table of specification before you and constantly refer to it in order to cover
important content areas.
b. Items must match the instructional objectives (content valid)
c. Formulate well defined items that are not vague and ambiguous. Items should
be grammatically correct and free from spelling and typing errors.
d. Avoid needlessly complex sentences. Avoid excessive use of words.
e. Write items simply and clearly (clarity)
f. The test items should be based on information that the student should know.
g. Prepare more items than you actually need.
h. The task to be performed and the type of answers required should be clearly defined.

i. Include questions of varying difficulty.
j. Write the items and the scoring scheme as soon as possible after the material has
been taught.
k. Avoid lifting questions directly from textbooks and past questions.
l. Write test items well in advance of the test date to permit review and editing.

5. Review of Items [Polish the test]


This is another important stage in test item construction. The teacher takes another look at the items written and, in the process, makes corrections where necessary: ambiguous questions may be re-worded, and questions that are too long may be dropped or simplified. If any question seems to be out of context, it may be dropped. The items should have the tendency to discriminate between low and high achievers. When the review is completely done, the test can be administered to students.

6. Writing Directions
Every test must be provided with some directions. The directions provided will help the student
respond to the questions appropriately. The directions should include
• Number of questions to answer.
• The time limit for the questions.
• The various sections in the examination and how they are to select questions from the
sections.
• Penalties for offences committed.
• How and where the answers are to be written.
• Clarity of expression etc.
• Marks allocated to the various items.
7. Preparing the Scoring Scheme
Objective type test: Here the best way is to compare a key, which contains the correct
best answer to each question to the answer a student gives.

Essay Test
This type of assessment usually requires students to solve a mathematical problem and present their own solution. Depending on the amount of latitude given to the testee, essay-type tests can be divided into two types: the restricted response type and the extended response type.
Restricted response type: it limits the content and the form the testee's answer should take.

Example: 1. Discuss three factors affecting validity of research in education.


2. Draw |AB| = 5 cm and |BC| = 9 cm. Construct ∠ABC = 75°.
Extended response type: This type does not limit the testee in the form and scope of the
answer.

Example: 1. Discuss how a teacher can improve validity of a test


2 Draw a triangle and inscribe a circle

Advantages of essay tests
• They are easy to prepare.
• Little time is required to prepare the items.
• They encourage global learning.
• Skills such as the ability to organize material, to write and to arrive at conclusions are improved.
• They are best suited for measuring higher-order behaviours and mental acuity.
Disadvantages of essay tests
§ Scoring objectively is difficult.
§ They can be time-consuming for both the test taker and the marker.
§ They are prone to the halo effect, where scoring can be influenced by extraneous factors such as relationships and handwriting.
§ Content validity can be reduced, since essay tests necessarily sample only a limited portion of the subject matter.
§ Bluffing by testees may arise, where students provide unnecessary material.
§ Students who write faster may score higher marks, since a premium is placed on writing.

Marking Scheme

A marking scheme is a step-by-step procedure outlining (detailing) how a given (mathematical) question is to be solved. It indicates the marks that should be awarded at each step of interest. Thus, in writing the scheme, marks must be allocated to the various expected qualities or behaviours you want your students to demonstrate.
In Mathematics, the letters M (method marks), A (accuracy marks, awarded for correct results that follow from a correct method) and B (marks awarded for correct statements or results independently of method) are used in the award of marks.

Example
1. (a) Show that log_a x = log_b x / log_b a.    4 marks
   (b) Hence, solve for x if log_3 x = 4 log_x 3.    6 marks

Solution:
(a) Let log_a x = p, so that x = a^p.    M1
Taking log_b of both sides gives log_b x = log_b a^p.    M1
p log_b a = log_b x    B1
p = log_a x = log_b x / log_b a    M1

(b) log_3 x = 4 log_x 3 = 4 log_3 3 / log_3 x    M1
log_3 x = 4 / log_3 x    M1
(log_3 x)² = 4    B1
log_3 x = ±2
x = 3² or 3⁻², so x = 9 or x = 1/9    A2

Example
Given that y = (x² − 2x + 3)/3, find the value of y when x = −2.

Solution:
Student A
y = [(−2)² − 2(−2) + 3]/3        M2
  = (4 + 4 + 3)/3                B1
  = 11/3 or 3.6666…              A1

Student B
y = [(−5)² − 2(−5) + 3]/3        M2
  = (25 + 10 + 3)/3              B1
  = 38/3 or 12.333…              A0

In the two solutions presented by the two students, Student A scored all 4 marks. Student B scored 3 marks. He got the M and B marks because he performed the substitution correctly and the intermediate values (25 and 10) are correct. His final answer is also arithmetically correct for the value he substituted, but it is not the answer to the question asked, so he lost the A marks.

Try: Set a mathematics question that would attract 10 marks. Prepare a marking scheme indicating clearly the marks at each step of interest.

OBJECTIVE TYPE TESTS


Writing Multiple Choice Items
An objective type test requires respondents to provide a brief response which is usually not more
than one sentence long. There are two types of objective test (response choice items).
These are the selection and the supply type.
The selection type consists of multiple-choice type, true or false type and matching type. The
supply type varies from completion, short-answers to fill-in the blank spaces.

Multiple-choice test is the most frequently used and most highly regarded objective test.
A multiple choice test is a type of objective test in which the respondent is given a stem and he is
to select from among three or more alternatives, options or responses, the one that best completes
the stem. The incorrect options are called foils or distracters.
The multiple-choice item consists of two parts. The stem contains the problem or the incomplete statement introducing the test item. A list of suggested answers, known as responses, options, alternatives or choices, follows.
There are two types of multiple-choice test. These are:
The single best response and the multiple best responses
The single best response type consists of a stem followed by three or more responses and the
respondent is to select one option to complete the stem.
The multiple best response type consists of a stem followed by several true or false statements or
words. The respondent is to select which statements could complete the stem.

Guidelines for Constructing Multiple Choice Test Stem


Characteristics
1. Items should be concise, specific and easy to read and understand.

2. Specific determiners should be avoided. They lead to guess work e.g. ‘an’, ‘a’, ‘some’,
‘most’, ‘often’, ‘all’, ‘always’, ‘never’, ‘none’
3. Items should be stated in positive terms rather than in negative terms.
4. Test items should not be copied directly from textbooks or from other people past test items.
Original items should always be constructed.
5. Create independent items. The answer to one item should not depend on knowledge of the answers to previous items.
6. Items that measure opinions should be avoided. One option should clearly be the best answer.

Writing the Alternatives


1. An alternative (incorrect) response should be able to attract some uninformed students by being related to the stem in some way. The options should be plausible.
Example
The kind of assessment that goes on hand-in-hand with teaching and learning is
a. continuous evaluation
b. continuous assessment
c. formative assessment
d. continuous assignment

2. Be sure that there is only one correct or clearly best answer.


3. Focus the items to specific learning targets. E.g. Knowledge, comprehension and analysis.
Most teacher-made test focus too much on measuring understanding or comprehension.
The tester must be innovative. He must try to set questions to cover the other domains of
cognitive learning.
4. Vary the placement of the correct alternatives. If the test takers get to know the clue or pattern
of arrangement of the correct answers, they will score all questions though they have no idea
about the answers.
5. Alternatives should be vertically arranged.
6. All options for a given item should be homogeneous in content, form and grammatical structure.

Advantages of Multiple Choice Tests


1. Scoring is highly objective
2. It is easy to be scored by anyone using the scoring key
3. It allows for extensive sampling
Disadvantages of Multiple Choice Tests
1. Construction of the test requires much time.
2. It is not appropriate when we want to measure the student’s ability to organize and present
ideas.

3. The test occupies much space
4. It cannot be used to measure certain problem-solving skills.
Example of multiple – choice question

1. If 𝑋 = {1, 2}, find the number of subsets of 𝑋.


A. 2
B. 3
C. 4
D. 5

2. The length of a rectangle is twice its width. If the perimeter of the rectangle is 42 meters,
find its width.
A. 9 m
B. 8 m
C. 7 m
D. 5 m

3. If 𝑥 < 𝑦 and 𝑦 < 𝑧, then


A. 𝑥 = 𝑧
B. 𝑥 > 𝑧
C. 𝑧 > 𝑥
D. 𝑥 + 𝑦 > 𝑧

4. A man bought a car for Gh¢15,000.00. He later sold it at a profit of 20%. What was the
selling price?
A. Gh¢3,000.00
B. Gh¢18,000.00
C. Gh¢12,500.00
D. Gh¢30,000.00

The Venn diagram below shows a class of 35 students studying one or more of three subjects,
Mathematics (M), Economics (E) and Geography (G).

[Venn diagram: U = 35; three intersecting circles labelled M, E and G, with the regions containing the values 5, k, 7, 3, 2, 4 and 10.]
Use it to answer Questions 5 and 6.
5. Find the value of k.
A. 6
B. 5
C. 4
D. 3

6. Find the number of students who study only one subject.


A. 21
B. 18
C. 15
D. 12

CHAPTER FOUR

VALIDITY

In order to ensure a high degree of reliability, suitability, objectivity and validity, there are several
approaches the teacher can utilize to evaluate assessment.

In the statistical analysis of students' assessment scores, estimating reliability and validity is a task that educational and social-science researchers frequently encounter. Measurement issues differ in the social sciences in that they relate to the quantification of abstract, intangible and unobservable constructs. In many instances, then, the meaning of the quantities is only inferred.

It is important to bear in mind that validity and reliability are very important in analyses of test
results.

Validity is the extent to which a test measures what it is supposed to measure and the accuracy of
inferences and decisions made on the basis of the assessment results. It refers to the degree to
which evidence and theory support the interpretation of test score entailed by proposed uses of
tests. In other words validity refers to the soundness or appropriateness of interpretations and uses
of students’ assessment results.

For example, if a timed test of one-digit multiplication is used to determine how quickly students
can recall their multiplication facts, the test is measuring what it was designed to measure. If the
same test were employed to assess students’ capacity to determine whether to use addition,
subtraction, multiplication, or division to solve a variety of problems, the test would not meet the
criterion. To the extent that a standards-based mathematics test is valid, we should be confident
that a student who does well on it is in fact competent in the mathematics skills and processes
specified in the standards. To be valid, an assessment should also be fair or equitable; that is, it
should enable students to demonstrate their mathematical competence, regardless of their
language or cultural background, or physical disabilities.

The question of validity is raised in the context of the three points- the form of the test,
the purpose of the test and the population for whom it is intended. Therefore, we cannot ask the
general question “Is this a valid test?” The question to ask is “how valid is this test for the decision
that I need to make?” or “how valid is the interpretation I propose for the test?” We can divide
the types of validity into logical and empirical.

It is important to note that:

Ø The concept of validity refers to the ways in which we interpret and use the assessment results, and not to the assessment itself.
Ø The assessment results have different degrees of validity for different purposes and for different situations.
Ø Judgments about the validity of interpretations or uses of assessment results should be made only after studying and combining several types of validity evidence.

Principle for validation

Validation (of assessment) refers to ascertaining the appropriateness or soundness of the uses and interpretations of assessment results based on available evidence. Nitko (2001) noted that validity judgments must be based on four principles:

1. The interpretations or meanings you give to your students' assessment results are valid only to the degree that you can point to evidence that supports their appropriateness and correctness. E.g. consider a situation where Ansah has taken the mathematics achievement test each year but his score suddenly rose this year. Ansah's score has several possible interpretations: his conceptual skills in mathematics have improved, he is highly motivated, or his methods of solving mathematical problems have improved, etc.

2. The uses you make of your assessment results are valid only to the degree to which you can point to evidence that supports their correctness and appropriateness. E.g. Ansah's teacher can use his score in a number of ways: diagnosis, placement, certification, etc.

3. The interpretation and uses you make of your assessment results are valid only when
the values implied by them are appropriate.

4. The interpretation and uses you make of your assessment results are valid only when
the consequences for the interpretations and uses are consistent with appropriate values.

Content Validity:

When we want to find out if the entire content of the behaviour/construct/area is represented in the test, we compare the test tasks with the content of the behaviour. This is a logical method, not an empirical one. For example, if we want to test knowledge of the area of plane figures, we must not limit the questions to, say, rectangles; they should cover as many plane figures as possible.

Also, if we want to test knowledge on Ghanaian Geography, it is not fair to have most questions
limited to the geography of Brong Ahafo Region but questions must cover the whole of Ghana.

Face Validity:

Basically face validity refers to the degree to which a test appears to measure what it purports to
measure.

For example, suppose we want to test the students' understanding of the term 'area'. We can present the students with various plane figures and ask them to find their areas. Here we are only interested in the 'area' and nothing else.

Criterion-Oriented or Predictive Validity:

Criterion-related validity is concerned with empirical methods of studying the relationship


between the test scores or other measures and some independent external measures. There are two sub-categories:

Ø Predictive validity evidence: it is when the criterion data are gathered at a later date.
E.g. When a student’s JHS mathematics results are used to predict performance in SHS

Ø Concurrent validity evidence: when the scores, both test scores and criterion scores are
collected at the same time, we have concurrent validity evidence.

Concurrent validity: is the degree to which the scores on a test are related to the scores on another, already established test administered at the same time, or to some other valid criterion available at the same time. For example, a new, simpler test may be intended to replace an old, cumbersome one that is considered useful; measurements are obtained on both at the same time. Logically, predictive and concurrent validation are the same; the term concurrent validation is used to indicate that no time elapses between the measures.

When you expect a future performance based on scores obtained currently by a measure, correlate the scores obtained with the later performance. The later performance is called the criterion and the current score is the predictor. This is an empirical check on the value of the test – a criterion-oriented or predictive validation.

The method used in determining and expressing validity is the same for concurrent and predictive validity. We use correlation analysis (the correlation coefficient) to measure and quantify the strength of the relationship between the scores. The appropriate correlation to compute is the Pearson product-moment correlation coefficient, given as

r = S_xy / √(S_xx S_yy)    [the covariance method]

which can be rewritten as

r_xy = Σ(xᵢ − x̄)(yᵢ − ȳ) / √[ Σ(xᵢ − x̄)² · Σ(yᵢ − ȳ)² ]    … (1)

where the sums run over i = 1, …, n. That is, if r_xy = ρ, then

ρ = Σ(xᵢ − x̄)(yᵢ − ȳ) / √[ Σ(xᵢ − x̄)² · Σ(yᵢ − ȳ)² ]

Example

Student   Quiz 1   Quiz 2   (X − X̄)   (X − X̄)²   (Y − Ȳ)   (Y − Ȳ)²   (X − X̄)(Y − Ȳ)
No.         X        Y
1           4        6        −2          4         −1          1              2
2           8        8         2          4          1          1              2
3          10        9         4         16          2          4              8
4           7        7         1          1          0          0              0
5           6        8         0          0          1          1              0
6           3        2        −3          9         −5         25             15
7           8        9         2          4          2          4              4
8           5       10        −1          1          3          9             −3
9           5        6        −1          1         −1          1              1
10          4        5        −2          4         −2          4              4
Total      60       70                   44                    50             33

r = 33 / √(44 × 50) = 33 / (10√22) = 0.7035

However, to make the computation easier, the following formula is used:

r = [ n ΣXY − (ΣX)(ΣY) ] / √{ [ n ΣX² − (ΣX)² ] [ n ΣY² − (ΣY)² ] }    … (2)
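
As an illustration only (not part of the original notes), formula (2) can be checked with a short Python sketch using the ten pairs of quiz scores from the worked example above.

# A minimal sketch applying formula (2) to the quiz data in the worked example.
import math

x = [4, 8, 10, 7, 6, 3, 8, 5, 5, 4]    # Quiz 1
y = [6, 8, 9, 7, 8, 2, 9, 10, 6, 5]    # Quiz 2
n = len(x)

sum_x, sum_y = sum(x), sum(y)
sum_xy = sum(a * b for a, b in zip(x, y))
sum_x2 = sum(a * a for a in x)
sum_y2 = sum(b * b for b in y)

numerator = n * sum_xy - sum_x * sum_y
denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
r = numerator / denominator
print(round(r, 2))   # approximately 0.70, matching the result obtained above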

The concept of correlation provides information about the extent of the relationship between two
variables. Two variables are correlated if they tend to ‘go together’. For example, if high scores
on one variable tend to be associated with high scores on a second variable, then both variables
are correlated. Correlations aim at identifying relationships between variables and also to be able
to predict performances based on known results.

The statistical summary of the degree and direction of the linear relationship or association
between any two variables is given by the coefficient of correlation. Correlation coefficients range
between – 1.0 𝑎𝑛𝑑 + 1.0. Correlation coefficients are normally represented by the
symbols, 𝑟 (for sample) and 𝜌 (rho) (for populations). When two sets of data are strongly linked
together we say they have a High Correlation. The word Correlation is made of Co- (meaning
"together") and Relation (related or have something in common)
Correlation is perfect when the value is 1 or −1.
1. When the relationship is strong (high), the value of the correlation coefficient, r, is greater than 0.60 or less than −0.60, i.e. r > 0.60 or r < −0.60.

2. When the relationship is moderate (mild), the value of the correlation coefficient, r, lies
between 0.40 𝑎𝑛𝑑 0.60 𝑜𝑟– 0.60 𝑎𝑛𝑑 − 0.40,

𝑖. 𝑒. 0.40 < 𝑟 < 0.60 𝑜𝑟 − 0.40 > 𝑟 > −0.60

3. When the relationship is weak (low), the value of the correlation coefficient, r, lies between −0.40 and 0.40, i.e. −0.40 < r < 0.40.

4. When the relationship is perfect, the value of the correlation coefficient, r, is 1.0
or -1.0 i.e. r = 1 for perfect positive r = -1.0 for perfect negative.
5. When there is no linear relationship, the value of the correlation coefficient, r, is 0.0

• Correlation is positive when there is direct relation. I.e. One variable increases as the
other also increases and
• Correlation is Negative when one value decreases as the other increases

Scatter Plots
A scatter plot or scatter diagram gives a pictorial representation of the two variables and shows
the nature of the relationship between the two variables. It is important that scatter plots are drawn
before any analysis is done on the variables. This is because scatter plots could either be linear or
curvilinear.

The following diagrams show the trend or nature of correlation between dependent and
independent variables.

How can we determine the strength of association based on the Pearson correlation coefficient?

The stronger the association of the two variables, the closer the Pearson correlation coefficient, r,
will be to either +1 or -1 depending on whether the relationship is positive or negative,
respectively. Achieving a value of +1 or -1 means that all your data points are included on the
line of best fit - there are no data points that show any variation away from this line. Values for r
between +1 and -1 (for example, r = 0.8 or -0.4) indicate that there is variation around the line of
best fit. The closer the value of r to 0 the greater the variation around the line of best fit. Different
relationships and their correlation coefficients are shown in the diagram

Are there guidelines to interpreting Pearson's correlation coefficient?

Yes, the following guidelines have been proposed:

Strength of Association     Positive (r)      Negative (r)
Small                       0.1 to 0.3        −0.1 to −0.3
Medium                      0.3 to 0.5        −0.3 to −0.5
Large                       0.5 to 1.0        −0.5 to −1.0

Remember that these values are guidelines and whether an association is strong or not will also
depend on what you are measuring.

Does the Pearson correlation coefficient indicate the slope of the line?
It is important to realize that the Pearson correlation coefficient, r, does not represent the slope of
the line of best fit. Therefore, if you get a Pearson correlation coefficient of +1 this does not mean
that for every unit increase in one variable there is a unit increase in another. It simply means that
there is no variation between the data points and the line of best fit. This is illustrated below:

What assumptions does Pearson's correlation make?

There are five assumptions that are made with respect to Pearson's correlation:

1. The variables must be either interval or ratio measurements.
2. The variables must be approximately normally distributed.
3. There is a linear relationship between the two variables.
4. Outliers are either kept to a minimum or are removed entirely.
5. There is homoscedasticity of the data.

Calculating the Rank-Order Correlation Coefficient


This method involves calculating the differences between the ranks. The formula is given
below.
ρ = 1 − (6 Σd²) / (N(N² − 1))
Here are the steps to follow when using the rank order correlation method.
1. Obtain the sample size, N.
2. Obtain d, the differences between the ranks: d = R1 − R2.
3. Square the differences and sum them up: Σd².
4. Multiply the result in Step 3 by 6: 6Σd².
5. Obtain the value of N(N² − 1).
6. Divide the result in Step 4 by the result in Step 5.
7. Subtract the result in Step 6 from 1 to obtain ρ (rho).

Given the following scores:


Student   Quiz 1   Quiz 2      d          d²
No.       Ranks    Ranks    R1 − R2
1          8.5      7.5       1.0        1.00
2          2.5      4.5      −2.0        4.00
3          1        2.5      −1.5        2.25
4          4        6        −2.0        4.00
5          5        4.5       0.5        0.25
6         10       10         0.0        0.00
7          2.5      2.5       0.0        0.00
8          6.5      1         5.5       30.25
9          6.5      7.5      −1.0        1.00
10         8.5      9        −0.5        0.25
                             Σd² =      43.00

(Steps 1 to 3 are carried out in the table above.)


Now compute the correlation coefficient by substituting the values into the formula (Steps 4 to 7):

ρ = 1 − (6Σd²) / (N(N² − 1)) = 1 − 6(43) / (10(100 − 1)) = 1 − 258/990 = 1 − 0.26 = 0.74

The result, ρ = 0.74 shows that there is a strong positive relationship between Quiz 1 and Quiz 2.
Follow the steps and calculate the rank-order correlation coefficient for the data below.

Student 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
Quiz 1 7 6 14 14 18 3.5 3.5 19 20 8.5 17 5 16 12 8.5 1.5 11 10
Quiz 2 6 18 12 14 7 23 20 4 9 19 18 5 17 8 13 2 10.5 10.5

If your answer is 0.87 or close, then congratulations, you have done well.
If your answer is very different, then check your steps and your calculations again.
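
As an illustration only (not part of the original notes), the rank-order calculation can be verified with a short Python sketch; the ranks below are copied from the worked example above, not from the exercise.

# A minimal sketch applying the rank-order formula to the ranked quiz data above.
r1 = [8.5, 2.5, 1, 4, 5, 10, 2.5, 6.5, 6.5, 8.5]   # Quiz 1 ranks
r2 = [7.5, 4.5, 2.5, 6, 4.5, 10, 2.5, 1, 7.5, 9]   # Quiz 2 ranks
n = len(r1)

sum_d2 = sum((a - b) ** 2 for a, b in zip(r1, r2))   # 43.0
rho = 1 - (6 * sum_d2) / (n * (n ** 2 - 1))
print(round(rho, 2))   # 0.74, as in the worked example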

Example: Compute the spearman rank correlation coefficient


English 56 75 45 71 61 64 58 80 76 61
Maths 66 70 40 60 65 56 59 77 67 63

Using the Product Method
This method uses the products of the two variables and the squares of each variable for the computations. The formula is given below.

r_xy = [ n ΣXY − (ΣX)(ΣY) ] / √{ [ n ΣX² − (ΣX)² ] [ n ΣY² − (ΣY)² ] }    … (2)
Here are the steps to follow when using the product method.
1. Identify the sample size, n.
2. Obtain the product of the two variables 𝑋𝑌
3. Obtain the sum of the products in Step 2 𝛴𝑋𝑌.
4. Multiply the result in Step 3 by n, 𝑛𝛴𝑋𝑌
5. Find the sum of the X values, 𝛴𝑋
6. Find the sum of the Y values𝛴𝑌
7. Find the product of 𝛴𝑋 𝑎𝑛𝑑 𝛴𝑌 = (𝛴𝑋)(𝛴𝑌)
8. Subtract Step 7 from Step 4. (𝑛𝛴𝑋𝑌) – (𝛴𝑋)(𝛴𝑌). This gives the numerator.
9. Square the X values and find the sum, ΣX².
10. Multiply the result in Step 9 by the sample size n: nΣX².
11. Square the result in Step 5 and subtract it from the result in Step 10: nΣX² − (ΣX)².
12. Square the Y values and find the sum, ΣY².
13. Multiply the result in Step 12 by the sample size n: nΣY².
14. Square the result in Step 6 and subtract it from the result in Step 13: nΣY² − (ΣY)².
15. Multiply the results in Step 11 and 14 and find the square root. This gives the
denominator.
16. Divide the result in Step 8 by the result in Step 15 to get the answer.
An example is done for you. The steps are highlighted.

Students   Quiz 1   Quiz 2    XY     X²     Y²
No.          X        Y
1            4        6       24     16     36
2            8        8       64     64     64
3           10        9       90    100     81
4            7        7       49     49     49
5            6        8       48     36     64
6            3        2        6      9      4
7            8        9       72     64     81
8            5       10       50     25    100
9            5        6       30     25     36
10           4        5       20     16     25
Total       60       70      453    404    540

(Step 1: n = 10.  Steps 5 and 6: ΣX = 60, ΣY = 70.  Step 3: ΣXY = 453.  Steps 9 and 12: ΣX² = 404, ΣY² = 540.)

Using the product formula:

r_xy = [ n ΣXY − (ΣX)(ΣY) ] / √{ [ n ΣX² − (ΣX)² ] [ n ΣY² − (ΣY)² ] }    … (2)

Therefore, r_xy = [10(453) − (60)(70)] / √[(10(404) − 60²)(10(540) − 70²)]
                = 330 / √(440 × 500) = 330 / 469.04 = 0.70

You will notice that the answer we had here is the same as the answer we got with the
covariance method. It does not matter therefore which method is used. The answers will always
be exact or very close.

Now, follow the steps and calculate the correlation coefficient for the data below, which give students' scores in Quiz 1 as predictors of their scores in Quiz 2, using equation (1).

Student 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
Quiz 1 14 16 10 10 8 18 18 8 8 13 10 16 10 12 13 20 13 12 20
Quiz 2 13 14 13 11 12 15 15 10 11 14 14 14 11 12 13 15 12 12 16

The answer should be close to 0.85.

Exercise
Calculate the correlation coefficient for the data below.

Student (xi) 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
Mid-sem 11 17 12 15 8 15 16 10 17 12 17 15 12 14 13 15 20 20 12 9
End-sem 10 14 15 16 12 16 15 15 18 16 18 18 15 16 10 12 20 19 14 11

Given the data below, compute the Pearson product moment correlation coefficient using:
1. The covariance method
2. The product method
Interpret your result in relation to the strength of relationship between students’ performance in
Mid-semester and end of semester examinations.

3. Do the same for the following table

Student   1   2   3   4   5   6   7   8   9   10  11  12  13  14  15  16  17  18  19  20
Physics   70  75  88  56  60  80  45  50  68  90  40  55  74  64  58  64  80  65  90  88
History   50  60  48  85  60  70  92  86  72  58  85  45  66  75  80  70  46  60  42  50

Construct Validity:

Construct validity is the degree to which a test measures an intended hypothetical construct.
Many times psychologists assess / measure abstract attributes or constructs. The process of
validating the interpretations about that construct as indicated by the test score is construct
validation. A construct validation may be defined as the process of determining the extent to which performance on an assessment can be interpreted in terms of one or more constructs. A construct is an individual characteristic that we assume exists in order to explain some aspect of behaviour (e.g. mathematical reasoning, mathematical conceptualization, abstraction and generalization, perception, and anxiety-related behaviours). This can be done experimentally; for example, if we want to validate a measure of mathematical reasoning or anxiety and we hypothesize that anxiety increases when subjects are under the threat of an electric shock, then the threat of an electric shock should increase anxiety scores. (Note: not all construct validation is this dramatic!)

A correlation coefficient is a statistical summary of the relation between two variables. It is the
most common way of reporting the answer to such questions as the following: Does this test
predict performance on the job? Do these two tests measure the same thing? Do the ranks of
these people today agree with their ranks a year ago?

According to Cronbach, to the question “what is a good validity coefficient?” the only sensible
answer is “the best you can get”, and it is unusual for a validity coefficient to rise above 0.60,
though that is far from perfect prediction.

All in all, we need to always keep in mind the contextual questions: what is the test going to be
used for? How expensive is it in terms of time, energy and money? What implications are we
intending to draw from test scores? You can use several methods to establish the construct
validity of your test results:

Ø Defining the domain


Ø Analyzing the mental process required by the assessment tasks
Ø Comparing the scores of known groups in terms of the construct of interest.
Ø Correlating the scores with other measures of similar construct or the same
construct.

How can the teacher improve Validity?


§ Design a table of specifications.
§ Test only what is taught.
§ Consider ‘for whom’ and ‘for what’.
§ Ensure that instructions are clear.
§ Use item types that enhance reliability of tests – both subjective and objective items.
§ Ensure appropriate sampling content.
§ Determine which low discriminating items to discard after item analysis.
§ Pay attention to scoring procedures and test administration.

Furthermore, the teacher must be aware of the many factors which may influence the
validity of tests, measurement, or evaluation results at any given time in the assessment
process. Therefore, the teacher must pay attention to:
(1) the test;
(2) administration and scoring;
(3) pupil’s responses;
(4) the group and the criterion.

These factors are outlined below.

Factors which may influence Validity:
1. Factors in the test:
a. Unclear directions
b. Poor sentence structure
c. Inappropriate level of difficulty of items
d. Poorly constructed test items
e. Ambiguity
f. Test items inappropriate for items being measured
g. Test too short
h. Improper arrangement of items
i. Identifiable patterns of items

2. Factors in test administration and scoring:


a. Insufficient time to complete test
b. Unfair aid to individuals
c. Cheating
d. Unreliable scoring of items e.g. essays
e. Adverse conditions (physical; psychological)
3. Factors in pupils'/students' responses:
a. Invalid test interpretations
b. Emotional disturbances
c. Test anxiety
d. Set pattern of answering

4. Nature of the group and the criterion:
a. Age
b. Sex
c. Ability level
d. Educational background
e. Cultural background

CHAPTER FIVE

RELIABILITY

Reliability refers to how consistently an assessment measures students' knowledge, skills and understanding.

Reliability is the degree to which a test consistently measures whatever it measures.

Reliability refers to the consistency in assessment scores over time on a population of individuals
or group.

In general, reliability refers to the degree to which students' assessment results are the same when:

Ø They complete the same task(s) on two different occasions.


Ø They complete different but equivalent or alternative tasks on the same or different
occasions.
Ø Two or more assessors score (mark) their performance on the same task(s); that is, different raters agree (for example, would three different teachers scoring the same student's response give the same rating?).
Research requires dependable measurement. Measurements are reliable to the extent that they
are repeatable and that any random influence which tends to make measurements different from
occasion to occasion or circumstance to circumstance is a source of measurement error. Errors
of measurement that affect reliability are random errors and errors of measurement that affect
validity are systematic or constant errors.

Test reliability

Applied to test, test reliability refers to the consistency of the score obtained by the same
individuals when examined with the same test (or with alternate forms) on different occasions.

Test-retest, equivalent forms and split-half reliability are all determined through correlation.

SCORE

A score is a point (or mark) obtained when we engage in a competition or we undertake an


examination. There are different types of score. These are:

Obtained scores, True scores and Error Scores

Obtained scores:

When you conduct any test, the scores or marks your students obtain when you assess and mark them are called obtained scores. These obtained scores can contain errors.

True scores:

The true score is the portion of the observed score that contains no measurement errors

Error Scores:

The error score is the remaining portion of the obtained score when the hypothetical true score is
taken away from it. It is referred to as error of measurement. For example, if a student is assessed
ten times and obtained scores recorded as follows:

Test   Obtained score   Error score (eᵢ)   eᵢ²
1 68 3 9
2 68 3 9
3 57 −8 64
4 70 5 25
5 70 5 25
6 69 4 16
7 72 7 49
8 65 0 0
9 55 −10 100
10 56 −9 81

If the mean score is 65 and the error variance is 37.8, then the standard deviation is 6.15. The
concept of reliability focuses on the consistency of assessment results whilst the measurement of
error focuses on inconsistency of assessment results.
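
As an illustration only (not part of the original notes), the error-score calculation in the table above can be reproduced with a short Python sketch; it treats the mean of the ten obtained scores as the hypothetical true score, as the table does.

# A minimal sketch reproducing the error-score calculation above.
scores = [68, 68, 57, 70, 70, 69, 72, 65, 55, 56]   # obtained scores

mean = sum(scores) / len(scores)                    # 65.0, taken as the true score here
errors = [s - mean for s in scores]                 # error scores
error_variance = sum(e ** 2 for e in errors) / len(errors)   # 37.8
sd = error_variance ** 0.5                          # about 6.15
print(mean, error_variance, round(sd, 2))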

Standard error of measurement (SEM):

It is the standard deviation of errors of measurement that is associated with the test score for a
specified group of test takers. It is the measure of the variability of the errors of measurement.

It is estimated by using the equations

SEM = SDₓ √(1 − reliability coefficient)

SEM = σₓ √(1 − r)

where SDₓ = σₓ = the standard deviation of the obtained scores of the group, and r = the reliability coefficient.

If a test has a standard deviation of 10 and a reliability coefficient of 0.89, then the standard error of measurement will be 3.32:

SEM = σₓ √(1 − r) = 10√(1 − 0.89) = 3.32
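
As an illustration only (not part of the original notes), a minimal Python sketch of the SEM formula above:

# A minimal sketch of the SEM calculation; the function name is hypothetical.
def standard_error_of_measurement(sd, reliability):
    # SEM = SD * sqrt(1 - reliability coefficient)
    return sd * (1 - reliability) ** 0.5

print(round(standard_error_of_measurement(10, 0.89), 2))   # 3.32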

Methods of obtaining reliability

Test-retest Reliability:

Test-retest reliability is the degree to which scores are consistent over time. It indicates score
variation that occurs from testing session to testing session as a result of errors of measurement.
Problems: Memory, Maturation and Learning can contribute to score variation.

Consistency or stability over time is measured by test–retest reliability. This type of reliability is in line with the traditional view of reliability and is usually measured by correlating tests given to a group of subjects twice over a suitable time interval, during which nothing has happened to the participants to affect their results. Therein lies the major disadvantage of this method of estimating reliability. Other problems concern the influence of the first test on the retesting; perhaps there is some type of learning effect, where taking the first test teaches one how to take the second test. The problems of history and maturation are additional limitations of this type of reliability.
Parallel-form reliability:

The other form of stability over time is parallel-form reliability. This reliability is determined by
correlating two forms of a test that measure the same concept. This form of reliability assumes
that the two test versions are equally worded and the word and reading difficulty are as similar as
possible, a hard criterion to justify (a limitation of the method).

Equivalent-Forms or Alternate-Forms Reliability:

Two tests that are identical in every way except for the actual items included are administered. This approach is used when it is likely that test takers will recall responses made during the first session and when alternate forms are available. The two sets of scores are then correlated. The obtained coefficient is called the coefficient of stability or coefficient of equivalence.

Problem: Difficulty of constructing two forms that are essentially equivalent. This is a method
used to provide a measure of the degree to which generalizations about students’ performance
from one assessment to another are justified. Both of the above require two administrations.

Split-Half Reliability:

Requires only one administration and especially appropriate when the test is very long. The most
commonly used method to split the test into two is using the odd-even strategy. Since longer tests
tend to be more reliable, and since split-half reliability represents the reliability of a test only half
as long as the actual test, a correction formula must be applied to the coefficient. Split-half
reliability is a form of internal consistency reliability and measures the internal consistency of a
test.

If you conceptualize reliability in terms of stability of the internal structure of a test then the split
half or internal consistency reliabilities are the preferred procedures. Split half reliability is
determined by correlating a sub-score obtained by adding up the first half of the test items with a sub-score obtained by adding up the remaining items. For example, a teacher can give 50 short questions, mark them, and correlate the sub-scores on the odd-numbered items with the sub-scores on the even-numbered items. Since learning may
influence latter item placement, where exposure to early test item may influence your score on
latter items (a limitation of this form of reliability), usually the sum of odd items are correlated
with the sum of even items. Another limitation of the split-half method is that the reliability is
based on just half the test items, not the items of the total test. This restriction of the number of
items will lead to an under estimation of the reliability. The split half correlation needs to be
adjusted for test length; this concept is called attenuation in the literature. The formula for
accomplishing this calculation is called the Spearman-Brown formula.

The formula reads as follow:


r_xx = 2 r_ab / (1 + r_ab)

where r_xx is the estimated reliability of the full test and r_ab is the correlation between the two halves of the test.
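
As an illustration only (not part of the original notes), the split-half procedure and the Spearman-Brown correction can be sketched in Python. The 0/1 item scores below are invented purely for illustration, and the function name is a hypothetical choice.

# A minimal sketch of the split-half procedure with the Spearman-Brown correction.
import math

def pearson(x, y):
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sx2, sy2 = sum(a * a for a in x), sum(b * b for b in y)
    return (n * sxy - sx * sy) / math.sqrt((n * sx2 - sx ** 2) * (n * sy2 - sy ** 2))

# Hypothetical 0/1 item scores for six students on an eight-item test.
items = [
    [1, 1, 0, 1, 1, 0, 1, 1],
    [1, 0, 0, 1, 0, 0, 1, 0],
    [1, 1, 1, 1, 1, 1, 1, 1],
    [0, 0, 0, 1, 0, 0, 0, 1],
    [1, 1, 0, 0, 1, 1, 1, 0],
    [0, 1, 0, 0, 0, 0, 1, 0],
]

odd_half = [sum(row[0::2]) for row in items]    # sub-score on items 1, 3, 5, 7
even_half = [sum(row[1::2]) for row in items]   # sub-score on items 2, 4, 6, 8

r_half = pearson(odd_half, even_half)           # reliability based on half the test
r_full = (2 * r_half) / (1 + r_half)            # Spearman-Brown correction
print(round(r_half, 2), round(r_full, 2))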

Internal consistency refers to the consistency of the items comprising your instrument. You
might view this method of reliability as a logical extension of the split half method, where the test
items are viewed as individual sub-tests. Typically these protocols are made of a variety of
questions (items) that are responded to on a dichotomous format. For example, a yes (1) or no
(0) format or a true (1)-false (0) format. The task now is determining the mean correlation among
the various items comprising the test. These correlations are typically calculated by means of a
phi correlation coefficient. This average correlation then needs to be adjusted using Cronbach's alpha (Cronbach, 1947).

The formula is as follows:
α = N r̄ / (1 + (N − 1) r̄)

Where ‘N’ is the number of items in the test and ‘r’ is the average correlation among test items.
If the average ‘r’ is small, then the alpha approaches zero.

Rational Equivalence Reliability:

Rational equivalence reliability is not established through correlation; rather, it estimates internal consistency by determining how all items on a test relate to all other items and to the total test.

Internal Consistency Reliability:

Determining how all items on the test relate to all other items. It is an estimate of reliability that
is essentially equivalent to the average of the split-half reliabilities computed for all possible
halves.

Reliability Coefficient for Internal Consistency

There are several statistical indexes that may be used to measure the amount of internal
consistency for an exam. The most popular index (and the one reported in Testing & Evaluation’s
item analysis) is referred to as Cronbach’s alpha. Cronbach’s alpha provides a measure of the
extent to which the items on a test, each of which could be thought of as a mini-test, provide
consistent information with regard to students’ mastery of the domain. In this way, Cronbach’s
alpha is often considered a measure of item homogeneity; i.e., large alpha values indicate that the
items are tapping a common domain.
The formula for Cronbach's alpha is as follows:

α̂ = ( k / (k − 1) ) × ( 1 − Σ pᵢ(1 − pᵢ) / σ²_Total )

where k is the number of items on the exam; pᵢ, referred to as the item difficulty, is the proportion of examinees who answered item i correctly; and σ²_Total is the sample variance of the total scores.
To illustrate, suppose that a five-item multiple-choice test was administered with the following percentages of correct response:
p₁ = 0.4, p₂ = 0.5, p₃ = 0.6, p₄ = 0.75, p₅ = 0.85, and σ²_Total = 1.84.

Cronbach's alpha would be calculated as follows:

α̂ = ( 5 / (5 − 1) ) × ( 1 − 1.045 / 1.840 ) = 0.54
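
As an illustration only (not part of the original notes), the calculation above can be reproduced with a short Python sketch using the item difficulties and total-score variance given in the example.

# A minimal sketch of the alpha calculation for the five-item example above.
p = [0.4, 0.5, 0.6, 0.75, 0.85]     # item difficulties (proportion correct)
total_score_variance = 1.84          # sample variance of the total scores
k = len(p)

sum_item_variance = sum(pi * (1 - pi) for pi in p)          # 1.045
alpha = (k / (k - 1)) * (1 - sum_item_variance / total_score_variance)
print(round(alpha, 2))   # 0.54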

Cronbach’s alpha ranges from 0 to 1.00, with values close to 1.00 indicating high consistency.
Professionally developed high-stakes standardized tests should have internal consistency
coefficients of at least .90. Lower-stakes standardized tests should have internal consistencies of
at least .80 or .85. For a classroom exam, it is desirable to have a reliability coefficient of .70 or
higher.
High reliability coefficients are required for standardized tests because they are administered only
once and the score on that one test is used to draw conclusions about each student’s level on the
trait of interest. It is acceptable for classroom exams to have lower reliabilities because a student’s
score on any one exam does not constitute that student’s entire grade in the course. Usually grades
are based on several measures, including multiple tests, homework, papers and projects, labs,
presentations, and/or participation.

Suggestions for Improving Reliability
There are primarily two factors at an instructor’s disposal for improving reliability: increasing
test length and improving item quality.

1. Test Length. In general, longer tests produce higher reliabilities. This may be seen in the old
carpenter’s adage, “measure twice, cut once’’. Intuitively, this also makes a great deal of sense.
Most instructors would feel uncomfortable basing midterm grades on students’ responses to a
single multiple-choice item, but are perfectly comfortable basing mid- term grades on a test of 50
multiple-choice items. This is because, for any given item, measurement error represents a large
percentage of students’ scores. The percentage of measurement error decreases as test length
increases. It is evident that even very low achieving students can answer a single item correctly,
especially through guessing. However it is much less likely that low achieving students can
correctly answer all items on a 20-item test.

Although reliability does increase with test length, the reward is more evident with short tests
than with long ones. Increasing test length by 5 items may improve the reliability substantially if
the original test was 5 items, but might have only a minimal impact if the original test was 50
items. The Spearman-Brown prophecy formula (shown below) can be used to predict the
anticipated reliability of a longer (or shorter) test given a value of Cronbach’s alpha for an existing
test.
α_new = m α_old / (1 + (m − 1) α_old)

where α_new is the new reliability estimate after lengthening (or shortening) the test; α_old is the reliability estimate of the current test; and m equals the new test length divided by the old test length. For example, if the test is increased from 5 to 10 items, m is 10/5 = 2.

Consider the reliability estimate for the five-item test used previously (α̂ = 0.54). If the test is doubled to include 10 items, the new reliability estimate would be

α_new = 2(0.54) / (1 + (2 − 1)(0.54)) = 0.70,

a substantial increase. Note, however, that increasing a 50-item test (with the same reliability) by
5 items will result in a new test with a reliability of just 0.56. It is important to note that in order
for the Spearman-Brown formula to be used appropriately, the items being added to lengthen a
test must be of a similar quality as the items that already make-up the test. In addition, before
lengthening a test, it is important to consider practical constraints such as time limit and examinee
fatigue. As a general guideline, it is wise to use as many items as possible while still allowing
most students to finish the exam within a specified time limit.
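
As an illustration only (not part of the original notes), the Spearman-Brown prophecy formula can be wrapped in a small Python function; the two calls reproduce the 5-to-10-item and 50-to-55-item examples discussed above. The function name is hypothetical.

# A minimal sketch of the prophecy formula described above.
def predicted_reliability(alpha_old, old_length, new_length):
    m = new_length / old_length
    return (m * alpha_old) / (1 + (m - 1) * alpha_old)

print(round(predicted_reliability(0.54, 5, 10), 2))    # 0.70
print(round(predicted_reliability(0.54, 50, 55), 2))   # 0.56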

Item Quality
Item quality has a large impact on reliability in that poor items tend to reduce reliability while
good items tend to increase reliability. How does one know if an item is of low or high quality?
The answer lies primarily in the item’s discrimination. Items that discriminate between students
with different degrees of mastery based on the course content are desirable and will improve
reliability. An item is considered to be discriminating if the “better” students tend to answer the
item correctly while the “poorer” students tend to respond incorrectly.
Item discrimination can be measured with a correlation coefficient known as the point-biserial
correlation (rpbi). rpbi is the correlation between students’ scores on a particular item (1 if the
student gets the item correct and 0 if the student answers incorrectly) and students’ overall total
score on the test. A large, positive rpbi indicates that students with a higher test score tended to
answer the item correctly while students with a lower test score tended to respond incorrectly.

Items with small, positive rpbi’s will not improve reliability much and may even reduce reliability
in some cases. Items with negative rpbi’s will reduce reliability. For a classroom exam, it is
preferable that an item’s rpbi be 0.20 or higher for all items. Note that the item analysis provided
by Testing and Evaluation Services reports the rpbi for each item.
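
As an illustration only (not part of the original notes), the point-biserial correlation can be sketched in Python by correlating 0/1 scores on a single item with total test scores. The data below are invented purely for illustration.

# A minimal sketch of the point-biserial idea: correlate item scores (0/1)
# with total scores using the ordinary Pearson formula.
import math

def pearson(x, y):
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sx2, sy2 = sum(a * a for a in x), sum(b * b for b in y)
    return (n * sxy - sx * sy) / math.sqrt((n * sx2 - sx ** 2) * (n * sy2 - sy ** 2))

item_scores = [1, 0, 1, 1, 0, 1, 0, 1, 1, 0]          # 1 = correct, 0 = incorrect
total_scores = [18, 9, 16, 14, 11, 17, 8, 15, 13, 10]  # overall test scores

r_pbi = pearson(item_scores, total_scores)
print(round(r_pbi, 2))   # a large positive value suggests a discriminating item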

Regarding item difficulty, it is best to avoid using too many items that nearly all of the students
answer correctly or incorrectly. Such items do not discriminate well and tend to have very low
rpbi’s. In general, 3-, 4-, and 5- alternative multiple-choice items that are answered correctly by
about 60% of the students tend to produce the best rpbi’s. For 2-alternative items, the target item
difficulty is 75% correct.

Standard Error of Measurement:

Reliability can also be expressed in terms of the standard error of measurement. It is an estimate
of how often you can expect errors of a given size.

Factors affecting reliability of test results

Some of the factors that might affect the degree of reliability of test results include:

1. Characteristics of the test


2. Test difficulty
3. Test length
4. Time allocated to the test
5. Subjectivity in scoring
6. Testing conditions
7. Group variability
8. Other factors such as motivation, attitude of testees, guessing of answers, etc.

How can the teacher improve Reliability?


§ Avoid ambiguous questions and directions or instructions.
§ Sample more items with similar contents.
§ Use well defined scoring / marking schemes.
§ Train raters / markers in an effort to standardize marking or interpretation of
students’ work

CHAPTER SIX

MEASURES OF CENTRAL TENDENCY

Purposes of The Measures Of Central Tendency

Nature of the Measures


The measures of central tendency are also known as measures of location. They are often referred
to as averages. They are numbers that tend to cluster around the “middle” of a set of values. They
provide single values which are used to summarize a set of observations or data. The three main
measures that are mainly used in educational practice are the arithmetic mean, the median and the
mode.

Purposes of the Measures


The measures of central tendency or location serve three main purposes. These purposes are
described below.
Purpose One
Measures of location are used as single scores to describe data. For example, in an inter-class
spelling competition, one student may be chosen to represent a class. Several factors may be
considered before choosing the particular student. Once selected, the student becomes a
representative (or a typical value) of the class. In a similar sense, a single score may be needed
to represent a set of scores. One of the ways of selecting a single (typical) score is to use the
measures of central tendency.

Suppose you have 200 students in your class. If you give them a quiz and mark it, you will have 200 scores before you. What meaning can you give to the performance of the class? Has
the class performed well? Has the performance of the class been poor? To answer these
questions, it will not be wise to start calling out the names of the individual students and their
scores. It will be a tiring and fruitless exercise. The best thing to do is to compute a typical score.
This typical score would either be the mean, median or the mode.

Purpose Two

They help to know the level of performance by comparing with a given standard performance.
Very often teachers are asked about the general performance of their students. The answers are often like "The performance this year is very poor", "This year the students did not do well at all", or "Oh, I tell you, my students did extremely well this year." These responses are based on subjective comparisons. Phrases such as very poor, not do well, and extremely well do not have any scientific basis. One teacher's perception of "poorness" may be different from another's.

To solve the problem of subjectivity and ambiguity, measures of central tendency are obtained
for a group and these measures are compared with a known standard. Therefore, instead of saying
the performance is poor, a teacher can say, the performance is above average, or average, or
below average, where average would be the known standard or criterion.
In a school where the grading system is A, B, C, D and E, the average performance could be C, and the midpoint of the C range can be taken to be the standard or criterion. For example, if the C range is from 60 – 70, then a possible standard would be 65 (i.e. (60 + 70)/2). In some situations, there is a pass or fail category, based on a pass mark. Suppose the pass mark is 55; those who score 55 and above have passed, and those below 55 have failed. The standard or criterion is therefore 55.

For individual cases, a measure of central tendency or location can be taken as the standard or
criterion for comparison. Instead of an individual responding to a question about his/her
performance as poor, very good, excellent, it is better to say performance is above average, far
above average, below average or far below average. In this instance, there is no subjectivity.
Performances are being compared to actual values that are taken as an average.

Consider the following set of scores for 40 pupils in a Social Studies class.

68 42 58 45 60 72 80 50 70 90
75 80 45 60 72 85 60 75 58 62
48 65 60 65 55 48 74 68 66 59
36 90 54 58 62 68 44 90 65 78

The mean for these scores is 64.0 and the median is 63.50. If it is assumed that the average or
standard or criterion performance is 60, then one can say that the performance of this class is
above average since 64 and 63.5 are above the standard of 60.

Purpose Three
They give the direction of student performance. One can compute the values of the mean,
median and the mode and make comparisons.

Let us see how this works.


1. When mean > Median > Mode, the distribution is skewed to the right (positive skewness)
showing that performance tends to be low.
For example, in a class test, the following values may be obtained for the measures of central
tendency. 𝑀𝑒𝑎𝑛 = 50 𝑀𝑒𝑑𝑖𝑎𝑛 = 45 𝑀𝑜𝑑𝑒 = 38
You can observe that the Mean is greater than the Median and is also greater than the Mode. Also
the Median is greater than Mode. This frequency polygon below illustrates the point.

[Frequency polygon for a positively skewed distribution: the mode lies to the left, the median in the middle, and the mean to the right.]

2. Where Mean < Median, or Mean < Mode, or Median < Mode, the distribution is skewed to the left (negative skewness), showing that performance tends to be high.

For example, in a class test, the following values may be obtained for the measures central
tendency. Mean = 55 Median = 60 Mode = 75

You can observe that the Mean is less than the Median and is also less than the Mode. Also the
Median is less than the Mode. This information implies that the performance tended to be high
in this class test. The frequency polygon below illustrates the point.

[Frequency polygon for a negatively skewed distribution: the mean lies to the left, the median in the middle, and the mode to the right.]

However, if the mean, median and mode have the same value, then the distribution of the values is symmetrical (normal). In that case, Mean = Median = Mode.
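
As an illustration only (not part of the original notes), the comparison of the mean, median and mode can be sketched in Python; the scores below are invented purely for illustration.

# A minimal sketch: judge the direction of skewness by comparing the three averages.
from statistics import mean, median, mode

scores = [38, 38, 38, 42, 45, 50, 55, 60, 70, 90]   # invented scores for illustration

m, md, mo = mean(scores), median(scores), mode(scores)
if m > md > mo:
    direction = "skewed to the right (performance tends to be low)"
elif m < md < mo:
    direction = "skewed to the left (performance tends to be high)"
else:
    direction = "approximately symmetric or mixed"
print(m, md, mo, direction)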

Exercise

1. In a class quiz, a mean of 48 was obtained with a median of 62. How would the performance of the class be described?
(a) Average
(b) Below average
(c) High
(d) Low
2. Measures of location can be used to determine the direction of student performance.
(a) True
(b) False
3. In a class test, the mean was 55 and the mode was 68. Performance is therefore high.
(a) True
(b) False
4. The median in an entrance examination was 62 with a mode 54. Performance of the
group was low.
(a) True
(b) False

5. When the mean is equal to the median, performance is skewed to the right

(a) True
(b) False

6. When a distribution is negatively skewed, the mode is less than the mean.
(a) True
(b) False

THE ARITHMETIC MEAN


Computing the Mean
In Statistics, there are three types of the mean. These are arithmetic, geometric and
harmonic means. In Education however, the arithmetic mean is the most useful. In this
session, we shall adopt the term, Mean, to represent the Arithmetic mean. The Arithmetic
mean (or the Mean) is the sum of the observations in a set of data divided by the total
number of observations.

The arithmetic mean is often represented by the symbol, 𝑋F pronounced X bar. The mean
can be computed from raw data (ungrouped data) and from grouped data. It can also be easily obtained from Microsoft Excel, SPSS and other statistical software.

Computing from Raw Data (Ungrouped Data)

Given the following scores, 15, 12, 10, 10, 9, 20, 14, 11, 13, 16, to obtain the mean, all the
scores are added and divided by the total number of observations.
The mean for the scores above is:
x̄ = (15 + 12 + 10 + 10 + 9 + 20 + 14 + 11 + 13 + 16) / 10 = 130/10 = 13

The above expression can be written in algebraic form, as learnt in Session 2, as

x̄ = Σx / N = 130/10 = 13

The general equation is written as x̄ = Σx / N,
Where N is the total number of observations.

Now compute the mean for the following set of scores:

45, 82, 75, 87, 60, 48, 90, 72, 65, 80, 65, 49, 52, 56, 68, 72, 64, 80, 70, 58

Compare your answer with the following result: X̄ = 66.9

1.2 Computing from Grouped Data


Two methods are generally used. The methods are used with frequency distribution tables.
The long method uses the following formula: X̄ = Σfx / n or X̄ = Σfx / N, where f is the frequency and x the class marks or class midpoints. Note that n and N refer to the number of observations and can also be written here as Σf, which is the total frequency from the frequency distribution table.

The following steps are used, when given a frequency distribution table.
Step 1. Obtain the class marks or class midpoints.
Step 2 Multiply each class mark or class midpoint by its frequency to obtain the fx column.
Step 3 Add the values in the 𝑓𝑥 column
Step 4. Divide the result in Step 3 with total frequency to obtain the mean.

Now follow the example in Table

Scores Midpoint Freq


X f fx
46 – 50 48 4 192
41 – 45 43 6 258
36 – 40 38 10 380
31 – 35 33 12 396
26 – 30 28 8 224
21 – 25 23 7 161
16 – 20 18 3 54
Total 50 1665
Applying the formula gives us x̄ = Σfx / Σf = 1665/50 = 33.3
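For readers who prefer to verify the long method with software, here is a minimal Python sketch (an illustration, not part of the original notes):

# Long method for the grouped-data mean: x-bar = (sum of fx) / (sum of f)
midpoints   = [48, 43, 38, 33, 28, 23, 18]   # class marks x
frequencies = [ 4,  6, 10, 12,  8,  7,  3]   # frequencies f
sum_fx = sum(f * x for f, x in zip(frequencies, midpoints))   # 1665
sum_f  = sum(frequencies)                                     # 50
print(sum_fx / sum_f)                                         # 33.3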

I trust that you have followed the steps and understood them well.


Now let us try the coding method.
In the coding method, the following steps are used. Before you use the coding method, make
sure that all class intervals are of equal size.

Step 1 Obtain the class midpoints or class marks.

Step 2 Create a new column after the frequency column and give a heading, d.

Step 3 Choose the class that is in the middle of the distribution, but if there is no exact
middle class, choose one of the two middle classes (preferably the one with the
higher frequency). Under the column, d, code this class with
'0' (zero).
Step 4 Give a code of 1 to the class immediately above the class coded 0. The next higher
class is given a code of 2, the next higher one, a code of 3. Continue till you
reach the topmost class.
Step 5. Give a code of -1 to the class immediately below the class coded 0. The next lower
class is given a code of -2, the next lower one, a code of -3. Continue until you reach the
bottom class.
Step 6. Create another column 𝑓𝑑, where you put in the values of the product of
frequencies and the codes.
Step 7. Add the values in the 𝑓𝑑 column.
Step 8 Divide the result in Step 7 by the total frequency and multiply the result by the
class size, i.
Step 9. Add the result in Step 8 to the midpoint of the class coded 0 and obtain the final
answer. This midpoint is called the assumed mean (AM).

The nine steps above are summarized in the formula for the coding method as
mean (x̄) = AM + (Σfd / Σf) × i,

where AM is the assumed mean, f is the frequency, d is the code for each class, Σf is the total
frequency (or N) and i is the class size.

Now follow the example in Table below

Computing the mean using the coding method


Score Midpoint (𝒙) Freq ( f) Code (d) fd
46-50 48 4 3 12
41-45 43 6 2 12
36-40 38 10 1 10
31-35 33 12 0 0
26-30 28 8 -1 -8
21-25 23 7 -2 -14
16-20 18 3 -3 -9
Total 50 3

Applying the formula gives us x̄ = AM + (Σfd / Σf) × i = 33 + (3/50) × 5 = 33 + 15/50 = 33.3
You will notice that both methods give the same result. The coding method is more appropriate
where the frequencies are large in value. It is also easier to use when the midpoints have
fractions such that multiplying them with the frequencies produces large values.
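The nine steps of the coding method reduce to a few lines of code. The sketch below (illustrative only, using the codes and assumed mean from the worked example) reproduces the same answer:

# Coding (assumed mean) method: x-bar = AM + (sum of fd / sum of f) * i
frequencies = [4, 6, 10, 12, 8, 7, 3]
codes       = [3, 2, 1, 0, -1, -2, -3]   # d, with the 31-35 class coded 0
AM, i = 33, 5                            # assumed mean and class size
sum_fd = sum(f * d for f, d in zip(frequencies, codes))   # 3
print(AM + (sum_fd / sum(frequencies)) * i)               # 33.3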

TRY
Use both methods to obtain the mean for the frequency distribution below.
Classes Frequency
61-70 15
51-60 20
41-50 25
31-40 17
21-30 12
11-20 11

The answer is 43.1

Properties of the Mean


The mean has features that distinguish it from the other measures of central tendency.

These features of properties are listed below.


1. The mean is influenced by every score or value that makes it up. If a score changes, the
value of the mean changes. For example, for the scores 3, 4, 2, 4, 7, the mean is 4.
However, if we change the 2 to 7 to get 3, 4, 7, 4, 7, the mean changes to 5. Thus a change in
just one value changes the mean.
2. The mean is very sensitive to extreme scores, which are called outliers.
The mean for the scores 4, 2, 3, 6, 5 is 4. If 3 is changed to 23, we have 4, 2,
23, 6, 5. The new value of 23 is an extreme score considering the fact that all the other scores
are below 7. The mean changes from 4 to 8. The value of 8 is greater than the majority
of the scores.
3. The mean is a function of the sum (or aggregate or total) of the scores. If this important
number is missing, the mean cannot be obtained. Of the three measures, it is the only
one that is a function of the sum of the scores.

This property also makes it possible to calculate the mean for a combined group if only the
means and number of scores (N) are available since 𝑁𝑥̅ = ∑ 𝑥

For example, Sir Lovely’s class has a mean of 5 with a class of 20 while Mr IOD’s class has a
mean of 6 with a class size of 30. The mean for the combined class can be obtained by finding
the sum for Sir Lovely’s class and the sum for Mr IOD’s class. The results are added and
divided by the total number of students. The calculation is shown below.
Mean for the total group: x̄ = [(5 × 20) + (6 × 30)] / 50 = 280/50 = 5.6
(A short Python sketch of this calculation appears after the list of properties below.)

4. If the mean is subtracted from each individual score and the differences are summed,
the result is 0. Given the scores 4, 2, 3, 6, 5 with a mean of 4, if we subtract the
mean from each individual score and we sum up the results, we will get 0.
This is illustrated below.
4 – 4 = 0
2 – 4 = -2
3 – 4 = -1
6 – 4 = 2
5 – 4 = 1
The distance of a score from the mean is known as the deviation or the spread about the mean.
The values of 0, -2, -1, 2, and 1 are called deviations, and the sum of the deviations is 0.

5. If the same value is added to or subtracted from every number in a set of scores, the
mean goes up or down by that value. For example, given the scores 8, 2, 10, 4, the mean
x̄ = 6. If we add 2 to each score we obtain 10, 4, 12, 6, which gives a mean of x̄ = 8, which
is the original mean plus the value added to each score, i.e. 6 + 2.

6. If each score is multiplied or divided by the same value, the mean is multiplied or
divided by that value. For example, given the scores 8, 2, 10, 4, the mean x̄ = 6. If we
multiply each score by 3 we obtain 24, 6, 30, 12, which gives a mean of x̄ = 18, which is the
original mean times 3, i.e. 6 × 3.
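As referenced under property 3 above, the combined mean for Sir Lovely's and Mr IOD's classes can be checked with this minimal Python sketch (illustrative, not part of the original notes):

# Combined mean of two groups, using N * x-bar = sum of the scores
n1, mean1 = 20, 5    # Sir Lovely's class
n2, mean2 = 30, 6    # Mr IOD's class
print((n1 * mean1 + n2 * mean2) / (n1 + n2))   # 280 / 50 = 5.6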

The mean has a number of strengths and weaknesses.

Strength of the Mean

1. It uses every score in the data set. Thus every score contributes to obtaining the mean.
2. It is the best summary score for a set of observations that is normally distributed with no
extreme scores.
3. It is used a lot for further statistical analysis. As we shall see later, the two other
measures, median and mode, have limited statistical use.

Weakness of the Mean


1. It is influenced by extreme scores. These extreme scores distort the value of the mean
and result in a wrong interpretation of the data.
2. It is very sensitive to a change in the value of any score. Since the mean is based on all
scores, the moment one score changes, the mean will also change.
3. It cannot be computed if a score is missing and the sum of the scores or observation
cannot be obtained.

Uses of the Mean


As a measure of central tendency or location, the classroom teacher will find the mean useful in
improving teaching and learning.

1. It is useful when the actual magnitude of the scores is needed to get an average. For
example, to select a student to represent a whole class in a statistics competition, the
student’s total performance in statistics is used for selection.

2. Several descriptive statistics are based on the mean. These descriptive statistics such as
the standard deviation, variance, correlation coefficients, z-scores and T-scores are very
useful in teaching and learning. Without the mean, they cannot be computed.

3. It is the most appropriate measure of central tendency when the scores are
Symmetrically distributed (i.e. normal). A symmetrical or normal distribution does not
have extreme scores to influence the mean

4. It provides a direction of performance when compared with the other measures of
location, especially the median. Where Mean > Median, the distribution is skewed to
the right (positive skewness), showing that performance tends to be low, and where
Mean < Median, the distribution is skewed to the left (negative skewness), showing that
performance tends to be high.

5. It serves as a standard of performance with which individual scores are compared. For
example, for normally distributed scores, where the mean is 56, an individual score of
80 can be said to be far above average. Performance can also be described as just
above average, far below average, or just below average, considering the individual scores.

Exercise
1. The mean score obtained by 10 students in a statistics quiz was 20 out of a total of 25.
It was found later that a student who obtained 5 actually had 20. How would the
discovery affect the mean?
(a) More information is needed
(b) New mean is greater than old mean.
(c) Old mean is greater than the new mean.
(d) There is no change in the old mean.

2. A group of 20 students earned a class mean of 30 on a quiz. A second group of 30


students had a mean score of 45 on the same test. What is the mean score of the 50
students?
(a) 32.5
(b) 39.0
(c) 41.0
(d) 45.0

3. One strength of the mean as a measure of location is that it is


(a) Appropriate for nominal scale variables.
(b) Limited in further statistical analysis.
(c) Not affected by extreme score.
(d) Useful for symmetrical distributions.

The table below gives the distribution of the ages of teachers in a school.

Age Numbers of teachers


45-49 25
40-44 36
35-39 77
30-34 47
25-29 15
Total 200

4. What is the value of the mean age?


(a) 35.2
(b) 36.2
(c) 37.2
(d) 38.2

5. One weakness of the mean as a measure of central tendency is that it


(a) Cannot be used when data is complete.
(b) Is influenced largely by extreme scores.
(c) Is most appropriate for normal distributions.
(d) Uses few values in a distribution.

THE MEDIAN

Nature of the median


The median is a score for a set of observations such that approximately one-half (50%) of the
scores are above it and one-half (50%) are below it when the scores are arrayed. It is regarded as
the 'middle score' in a distribution after the scores have been arrayed. It is often represented by
the symbol Mdn.

Computing the median


The median can be computed from both ungrouped and grouped data.

Computing from Ungrouped Data


Suppose you have the scores, 8, 4, 9, 1, 3, to obtain the median, you arrange the scores in
sequential order, say from the lowest score to the highest score. In the given set of scores, this
gives, 1, 3,4,8,9. Locate the score in the middle. This gives you 4
In several instances, the data set you will have may not be as few as this. You may have about 40,
80, 200 scores or value. Simple formulae have been derived to help us locate the median.
To find the median, first arrange the scores in an ascending or descending order.
Then, for an odd number of scores, locate the median at the ((n + 1)/2)th position. For an even
number of scores, the median is the average of the two middle scores, i.e. the scores at the
(n/2)th and ((n/2) + 1)th positions; equivalently, it lies at the ((n + 1)/2)th position, half-way
between the two middle scores.

Let us look at a few examples.


1. For odd set of numbers.
Suppose you are given a set of observation as: 8 11 26 7 12 9 6 20 14
There are 9 observations so this is an odd number of scores.
1. Rearrange the scores in a sequential order: 6 7 8 9 11 12 14 20 26
2. Find the ((n + 1)/2)th position, i.e. ((9 + 1)/2)th = (10/2)th = 5th position.
3. The score at the 5th position is 11.

The advantage with this procedure is that you do not need to rearrange the entire set of
scores. When you locate the score at the required position, you stop

1.1 For even set of numbers


Suppose you are given a set of numbers as 48 50 36 54 62 71 69 45 58 32
There are 10 observations so this is an even number of scores.
i. Rearrange the scores in sequential order: 32 36 45 48 50 54 58 62 69 71
ii. Locate the two middle scores, add them and divide by two, i.e.
(50 + 54)/2 = 104/2 = 52
Alternatively, you can find the ((n + 1)/2)th position = ((10 + 1)/2)th = (11/2)th = 5½th position. This
means that the median lies half-way between the 5th and 6th positions.
iii. The score at the 5th position is 50 and at the 6th position is 54. Half-way between 50 and
54 is (50 + 54)/2 = 104/2 = 52. The median score is therefore 52.
You notice that for the even set of numbers, both methods provide the same results.
Now obtain the median for the following set of numbers.
45, 82, 75, 87, 60, 48, 90, 72, 65, 80, 65, 49, 52, 56, 68, 72, 64, 80, 70, 58
The answer is 66.5.
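A short Python sketch of the positional rule described above (illustrative only) gives the same answer:

# Median of raw scores using the (n + 1)/2 positional rule
scores = sorted([45, 82, 75, 87, 60, 48, 90, 72, 65, 80,
                 65, 49, 52, 56, 68, 72, 64, 80, 70, 58])
n = len(scores)
if n % 2 == 1:                                    # odd: the middle score
    median = scores[(n + 1) // 2 - 1]
else:                                             # even: average of the two middle scores
    median = (scores[n // 2 - 1] + scores[n // 2]) / 2
print(median)                                     # 66.5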

Computing the Median from Grouped Data


Computing the median involves 4 simple steps. These steps are described below.
Step 1. Obtain cumulative frequencies for the frequency distribution.
Step 2. Identify the median class. It is the class that will contain the middle score or the
median. Find the value of Σf/2, or N/2, where Σf (or N) is the total frequency.
This is the position of the middle score or median. Checking from the cumulative frequency
column, find the value that is equal to the position or the smallest value that is greater than
the position.
Step 3. Identify the lower class boundary of the median class and the class size.
Step 4. Apply the formula below by substituting the respective values into the formula.
Step 4. Apply the formula below by substituting the respective values into the formula.

Mdn = L1 + ((N/2 − cf) / f_mdn) × i, where

L1 is the lower class boundary of the median class

N is the total frequency

cf is the cumulative frequency of the class just below the median class
i is the class size/width
f_mdn is the frequency of the median class

Now follow the example in Table 3.3

classes Midpoint Freq Cum Freq

X f cf
46- 50 48 4 50
41-45 43 6 46
36- 40 38 10 40
31- 35 33 12 30
26-30 28 8 18
21-25 23 7 10
16- 20 18 3 3

Total 50

The total frequency is 50, therefore N/2 = 50/2 = 25. Now there is no 25 in the cumulative frequency
column, so we select the smallest value that is greater than 25. This value is 30, which belongs to
the 31 – 35 class. The median class is therefore 31 – 35. The lower class boundary is 30.5 and the
class size is 5. Substituting the values in the table into the formula above, we have
Median = 30.5 + ((25 − 18)/12) × 5 = 30.5 + (7/12) × 5 = 30.5 + [0.58] × 5 = 30.5 + 2.9 = 33.4
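The grouped-data median formula can also be evaluated with a short Python sketch (illustrative, using the values identified from the table above):

# Mdn = L1 + ((N/2 - cf) / f_mdn) * i
L1, N, cf, f_mdn, i = 30.5, 50, 18, 12, 5
median = L1 + ((N / 2 - cf) / f_mdn) * i
print(round(median, 1))   # 33.4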
34

Properties of the Median

The median has a number of properties that distinguishes it from the other measures of
central tendency. These properties are listed below.

1. It is often not influenced by extreme scores as the mean is. For example, the median
for the numbers 2, 3, 4, 5, 6 is 4. If 6 changes to 23 as an extreme score,
the median remains 4.
2. It does not use all the scores in a distribution but uses only one value.
3. It has limited use for the further statistical work.
4. It can be used when there is incomplete data at the beginning or the end of the
distribution.
5. It is mostly appropriate for the data from interval and ratio scales.
6. Where there are very few observations, the median is not representative of the data.
7. Where the data set is large, it is tedious to arrange the data in an array for ungrouped data
computation of the median.

Strengths of the median


1. It is not affected by extreme scores.
2. It is the most appropriate measure of central tendency when the distribution of
scores is skewed.
3. It can be obtained even if data is incomplete. If data is missing at the beginning
and end of the sequential arrangement, the median can still be obtained.

Weakness of the Median

1. It has limited use in further statistical work. Most statistical distributions are assumed normal,
so the median does not come into focus much.

2. Where there are very few scores or an odd pattern of scores, the median may not be accurate.
For example, in a class of 20 where 15 students had 10, 4 students had 18 and 1 student had
20, the distribution of scores looks like this: 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10,
10, 18, 18, 18, 18, 20.
What would be the middle score? In a situation like this, an estimate of the median may not be
accurate.
3. It uses very little of the information available in the set of scores. It depends on only one score
and ignores information at the ends of the distribution. It does not use all the scores in the
distribution.

4. It cannot be used where the variables are from the nominal scale of measurement.

5. It is not sensitive to changes in the distribution, except where the changes occur in the middle
of the distribution.

Uses of the Median

1. As a measure of central tendency, the classroom teacher and other educational practitioners will
find the median a useful measure of central tendency or location when there is reason to believe
that the distribution is skewed. For skewed distributions, the best measure of central tendency,
which provides a summary score or the typical score, is the median.

2. It is used as the most appropriate measure of location when there are extreme scores to affect
the mean. For example in an establishment of senior and junior staff, the best measure of the
‘average’ or ‘typical’ salary is the median because the senior staff salaries will inflate the mean.

3. It is useful when the exact midpoint of the distribution is wanted.

4. It provides a standard of performance for comparison with individual scores when the score
distribution is skewed. For example, if the median score is 60 and an individual student obtains
55, performance can be said to be below average/median. Performance can also be described as
just above average, far below average, or just below average.

5. It can be compared with the mean to determine the direction of student performance.
Where Median < Mean, the distribution is skewed to the right (positive skewness), showing that
performance tends to be low, and where Median > Mean, the distribution is skewed to the left
(negative skewness), showing that performance tends to be high.

Exercise

1. One strength of the median as a measure of location is that it is

A. appropriate for nominal scale variables.


B. limited in further statistical analysis.
C. not affected by extreme scores.
D. Useful for symmetrical distributions.
2. The following scores were available for 9 students in a Statistics class.
18 20 15 12 12 10 8 17 13
The score for the 10th student was missing but it was known to be the second highest score.
What would be the median for the distribution?
A. 12
B. 15
C. 16
D. 17

3. The median score for a group of 19 students was 58. A 20th student who had a score of 45
joined the group. What is the new median score?

A. 10.5
B. 45.0
C. 58.0
D. It cannot be determined

4. The median score for 15 students in a test was 67. Fourteen of the students had a median
score of 66. What was the score for the 15th student?
A. 66
B. 67
C. 68
D. More information is required.

5. One limitation of the median as a measure of location is that it
A. can be used when data is incomplete.
B. depends largely on extreme scores.
C. is inappropriate for skewed distributions.
D. uses few values in a distribution.

6. Compute the median age in the following distribution.

Age Number of teachers


45-49 25
40-44 36
35-39 77
30-34 47
25-29 15
Total 200

THE MODE

5.1 Nature of Mode

The mode is the number in a distribution that occurs most frequently.


A distribution may have only one mode; such distributions are called unimodal. A distribution
may have no mode at all. There are also multi-modal distributions, with 2 modes (bimodal), 3
modes (trimodal), etc. Let us look at the following sets of data.

Set 1: 14, 14, 15, 15, 18, 18, 18, 22, 24


Set 2: 21, 24, 25, 18, 32, 50, 45, 26, 35
Set 3: 42, 42, 50, 50, 62, 62, 68, 68, 70
Set 4: 12, 13, 14, 15, 16, 17, 18, 19, 20

Which numbers occur most frequently in each of the sets above?

In Set 1, 18 occurred most frequently. It occurred 3 times. Therefore there is only one mode.

In Set 2, no number occurred more than once. Therefore there is no mode.

In Set 3, 42, 50, 62 and 68 each occurred the same number of times, i.e. 2 times. There are therefore
4 modes.

Computing the Mode


The mode can be computed from both ungrouped (raw) data and grouped (frequency
distributions).

Computing from Raw Data (Ungrouped Data)


Suppose you are given the following set of scores:
12, 18, 21, 56, 45, 75, 48, 21, 36, 35, 38, 45, 65, 72, 45, 48, 21, 45, 21, 45
To obtain the mode, you do a visual search to determine the number that occurs most frequently.
This method however, wastes a lot of time. To reduce the amount of search and the degree of

errors, a tally method is recommended. Here you list the numbers and as each appears you
represent it with a slash. At the end, find the value that has the most number of slashes.

The data above can be represented as follows:


Number 12 18 21 56 45 75 48 36 35 38 65 72
Tally / / //// / ///// / // / / / / /
Frequency 1 1 4 1 5 1 2 1 1 1 1 1

From the distribution of raw data, the mode is 45. It appeared 5 times, which is more than the
others.
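The tally can also be done with software. A minimal Python sketch (illustrative only) is shown below:

# Tallying raw scores to find the mode
from collections import Counter
scores = [12, 18, 21, 56, 45, 75, 48, 21, 36, 35,
          38, 45, 65, 72, 45, 48, 21, 45, 21, 45]
counts = Counter(scores)            # frequency of each distinct score
print(counts.most_common(1))        # [(45, 5)] -> the mode is 45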

Computing from Grouped Data


The mode can be obtained from grouped data by three simple steps. These steps are outlined
below.
Step 1. Determine the modal class i.e. the class with the highest frequency.
Step 2. Determine the lower class boundary of the modal class and the class size or width

Step 3. Apply the following formula.


Mode = L1 + [(f_mod − f_-1) / ((f_mod − f_-1) + (f_mod − f_1))] × i = L1 + [∇1 / (∇1 + ∇2)] × i

where

L1 is the lower class boundary of the modal class

f_-1 is the frequency of the class just below the modal class, and ∇1 = f_mod − f_-1
f_1 is the frequency of the class just above the modal class, and ∇2 = f_mod − f_1
i is the class size/width
f_mod is the frequency of the modal class.

Now follow the example in Table


Computing the mode from grouped data
Classes Frequency
46-50 4
41-45 6
36-40 10
31-35 12
26-30 8
21-25 7
16-20 3
Total 50
The class with the highest frequency is 31-35 (frequency = 12). This gives the modal class. The
lower class boundary of the modal class is 30.5.

Applying the formula:

Mode = 30.5 + [(12 − 8) / ((12 − 8) + (12 − 10))] × 5 = 30.5 + (4/6) × 5 ≈ 33.8
Alternatively,
Mode = L1 + [∇1 / (∇1 + ∇2)] × i = 30.5 + [4 / (4 + 2)] × 5 ≈ 33.8
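The same calculation in a minimal Python sketch (illustrative, using the values from the table above):

# Grouped-data mode: Mode = L1 + (D1 / (D1 + D2)) * i,
# with D1 = f_mod - f_below and D2 = f_mod - f_above
L1, i = 30.5, 5
f_mod, f_below, f_above = 12, 8, 10
d1, d2 = f_mod - f_below, f_mod - f_above
print(round(L1 + (d1 / (d1 + d2)) * i, 1))   # 33.8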

If you have not understood it, go over it again and find the mode for the following distribution.

Classes Frequency
61 -70 15
51-60 20
41-50 25
31-40 17
21-30 12
11-20 11

The answer is approximately 46.7

Strengths and Weaknesses of the Mode


The mode has a number of strengths and weaknesses. These are stated below.

Strengths of the Mode


1. It is easy to find from raw or ungrouped data.
2. It is not affected by extreme values or outliers.
3. It is the most appropriate measure of central tendency when the variable is measured on a
nominal scale.
4. It is not affected by whether the distribution of scores is normal or skewed.
5. When a distribution is normal, the mode is of the same value as the mean and the median.

Weaknesses of the Mode


1. It can be absent in a distribution, i.e. a distribution may not have a mode at all.
2. A distribution can have two or more modes and it is difficult to select one as the measure of
central tendency or location.
3. It does not take into account all the values in a distribution.
4. For a frequency distribution, where the data involved is discrete, it is difficult to obtain a mode
which has a discrete value.
5. It has limited statistical use. For further statistics, the mode is not used because a distribution
may have no mode or may have multiple modes.

Uses of the mode


As a measure of central tendency, the mode has limited use in improving teaching and learning due
to the apparent weaknesses listed above. However, in some cases the mode proves useful:
1. It is useful when there is the need for a rough estimate of the measure of central tendency or
location. Computing the mean and median takes more time, so the mode gives a quick estimate of a
summary score for a group.
2. It is useful when there is the need to know the most frequently occurring value. For example, in
the fashion world there may be the need to know the most common dress style. The mode
provides the answer.
3. When a unique mode is available, it provides a standard of performance for comparison with
individual scores. For example, if the modal score is 48 and an individual student obtains 55,
performance can be said to be above average. A performance can also be described as just above
average.
4. It can be compared with the mean to determine the direction of students' performance.
Where Mode < Mean, the distribution is skewed to the right (positive skewness), showing that
performance tends to be low, and where Mode > Mean, the distribution is skewed to the left (negative
skewness), showing that performance tends to be high.

5. It can be compared with the median to determine the direction of students' performance.

Where Mode < Median, the distribution is skewed to the right (positive skewness), showing that
performance tends to be low, and where Mode > Median, the distribution is skewed to the left
(negative skewness), showing that performance tends to be high.

Exercise
1. One strength of the mode as a measure of location is that it is

A. affected by extreme scores.


B. appropriate for nominal scale variables.
C. not limited in further statistical analysis.
D. sensitive to every individual score.

2. One weakness of the mode as a measure of central tendency for a distribution is that, it
A. is appropriate for nominal scale data.
B. is used if there is incomplete data.
C. provides more than one modal score.
D. uses every score in the distribution.

3. The mode for a group of 19 students was 58. A 20th student had a score of 57.
What is the new mode?
A. 20
B. 57
C. 58
D. It cannot be determined.

4. The mode for a group of 30 students in a test was 55. For twenty-nine of the students, the
mode was 54. What was the score for the 30th student?
A. 1
B. 54
C. 55
D. More information is required.

5. Compute the mode, mean and median of the ages in the following distribution and deduce
whether the distribution is skewed positively, skewed negatively or normal. Justify your
answer.

Age Number of teachers


45-49 25
40-44 36
35-39 77
30-34 47
25-29 15

QUARTILES

Nature of Quartiles

Quartiles are individual scores of location that divide a distribution into 4 equal parts such that
each part contains 25% of the data. Practically, there are 3 quartiles: the first (lower) quartile, the
second (middle) quartile and the third (upper) quartile. The second (middle) quartile is the
median which you studied in Session 4. The symbols used to represent the quartiles are:

Q1 – First (lower), quartile; Q2 – Second (middle) quartile;

Q3 – Third (upper) quartile

The quartiles are illustrated below

Q1 Q2 Q3

Median

Computing the Quartiles

Quartiles can be computed from both ungrouped and grouped data. Our focus is on the lower
quartile and the upper quartile since we have studied the middle quartile (median) already.

Computing from Ungrouped Data


There are two methods in computing quartiles from ungrouped data. These are the median
method and the formula method.

The Median Method


1. First arrange the scores in a sequential order (either ascending or descending)
2. Find the overall median (i.e. the score at the ((n + 1)/2)th position) for the data set.
The overall median divides the distribution into two equal parts.
3. Find the median for the first half/part. This median becomes Q1, the first quartile.
4. Find the median for the second half/part. This median becomes the Q3, third quartile.

Example.
Suppose you are given the following scores: 8, 10, 12, 7, 6, 13, 18, 25, 4, 22, 9.
1. Arrange the scores in ascending order as, 4, 6, 7, 8, 9, 10, 12, 13, 18, 22, 25
2. Median: the score at the ((n + 1)/2)th = ((11 + 1)/2)th = (12/2)th = 6th position, which is 10.

4, 6, 7, 8, 9 | 10 | 12, 13, 18, 22, 25

3. Find the median for the first part: 4, 6, 7, 8, 9. This gives Q1 as 7.

4. Find the median for the second part: 12, 13, 18, 22, and 25. This gives Q3 as 18.

Now let us look at the formula method

The Formula Method

1. First arrange the scores in a sequential order (either ascending or descending)


2. Locate Q1 at the ((1/4)(n + 1))th position.
3. Locate Q3 at the ((3/4)(n + 1))th position.

Let us look at an example.

Suppose you are given the following scores: 8, 10, 12, 7, 6, 13, 18, 25, 4, 22, 9.

1. Arrange the scores in ascending order as, 4, 6, 7, 8, 9, 10, 12, 13, 18, 22, 25
2. Find the ((1/4)(11 + 1))th position. This gives us 12/4 = 3rd position. The score at the
3rd position is 7, which is Q1.
3. Find the ((3/4)(11 + 1))th position. This gives us 36/4 = 9th position. The score at the 9th
position is 18, which is Q3.

For an even set of numbers, the positions may end up with fractions. Let us look at an
example. Suppose you are given a set of observations as: 8 11 26 7 12 9 6 20 14
18 10 22.
There are 12 observations so this is an even number of scores.
To find the quartiles:

1. Arrange the scores in ascending order as: 6, 7, 8, 9, 10, 11, 12, 14, 18, 20, 22, 26
2. To obtain Q1, find the ((1/4)(12 + 1))th position. This gives us
13/4 = 3¼th position. This means that Q1 lies between the 3rd and 4th positions. Now, at
the 3rd position is 8 and at the 4th position is 9. Multiply the difference between 8 and 9 by 1/4.
This gives us 1/4 × 1 = 1/4.
Add the answer to 8 to obtain Q1 as 8¼ or 8.25.

3. To obtain Q3, find the ((3/4)(12 + 1))th position. This gives us 39/4 = 9¾th position.
This means that Q3 lies between the 9th and 10th positions. Now, at the 9th position is 18
and at the 10th position is 20. Multiply the difference between 18 and 20 by 3/4.
This gives you 3/4 × 2 = 1½. Add the answer 1½ to 18 to obtain Q3 as 19½ or 19.5.

Now obtain the lower quartile and the upper quartile for the following set of numbers using both
the median method and the formula method.

45, 82, 75, 87, 60, 48, 92, 72, 65, 80, 65, 49, 52, 56, 68, 72, 64, 80, 70, 58

The answers are: Median method Q1 = 57 Q3 = 77.5

Formula method Q1 = 56.5 Q3 = 78.75
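The formula method lends itself to a small helper function. The Python sketch below (illustrative only) reproduces the formula-method answers:

# Quartiles from raw scores by the k(n + 1)/4 positional (formula) method
def quartile(scores, k):                     # k = 1 for Q1, k = 3 for Q3
    data = sorted(scores)
    pos = k * (len(data) + 1) / 4            # position of the quartile
    lower = int(pos)                         # whole-number part of the position
    frac = pos - lower                       # fractional part of the position
    if frac == 0:
        return data[lower - 1]
    return data[lower - 1] + frac * (data[lower] - data[lower - 1])

scores = [45, 82, 75, 87, 60, 48, 92, 72, 65, 80,
          65, 49, 52, 56, 68, 72, 64, 80, 70, 58]
print(quartile(scores, 1), quartile(scores, 3))   # 56.5 78.75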

Computing the Quartiles from Grouped Data

Computing the quartiles involves 4 simple steps. These steps are described below

Step 1. Obtain cumulative frequencies for the frequency distribution.

Step 2. Identify the quartile classes.


For Q1, it is the class that will contain the lower quartile. Find the value of (1/4)Σf, or N/4, where
Σf (or N) is the total frequency. This is the position of the lower quartile. Checking from the
cumulative frequency column, find the value that is equal to the position or the smallest value
that is greater than the position.

For Q3, it is the class that will contain the upper quartile. Find the value of (3/4)Σf, or 3N/4, where
Σf (or N) is the total frequency.

This is the position of the upper quartile. Checking from the cumulative frequency column, find
the value that is equal to the position or the smallest value that is greater than the position.

Step 3. Identify the lower class boundaries of the lower quartile and upper quartile classes and
the class size.

Step 4. Apply the formula below by substituting the respective values into the formula.

Q1 = L1 + ((N/4 − cf) / f_Q1) × i

where

L1 is the lower class boundary of the lower quartile class

N is the total frequency
cf is the cumulative frequency of the class just below the lower quartile class
i is the class size/width
f_Q1 is the frequency of the lower quartile class

Q3 = L3 + ((3N/4 − cf) / f_Q3) × i

where
L3 is the lower class boundary of the upper quartile class
N is the total frequency
cf is the cumulative frequency of the class just below the upper quartile class
i is the class size/width
f_Q3 is the frequency of the upper quartile class

Follow the example in Table


Computing the quartiles from grouped data.

Classes Midpoint Freq Cum Freq


X f cf
46 – 50 48 4 50
41 – 45 43 6 46
36 – 40 38 10 40
31 – 35 33 12 30
26 – 30 28 8 18
21 – 25 23 7 10
16 – 20 18 3 3
Total 50
The total frequency is 50, therefore for Q1, N/4 = 50/4 = 12.5. Now there is no 12.5 in the cumulative
frequency column, so we select the smallest value that is greater than 12.5.
This value is 18, which belongs to the 26 – 30 class. The lower quartile (Q1) class therefore is 26 – 30.
The lower class boundary is 25.5 and the class size is 5. Substituting the values in the table into the
formula above, we have:
Q1 = 25.5 + ((12.5 − 10)/8) × 5 = 25.5 + (2.5/8) × 5 = 25.5 + [0.31] × 5 = 25.5 + 1.56 = 27.06 ≈ 27.1

For Q3, (3/4) × 50 = 37.5. Now there is no 37.5 in the cumulative frequency column, so we select the
smallest value that is greater than 37.5. This value is 40, which belongs to the 36 – 40 class. The
upper quartile (Q3) class therefore is 36 – 40, its lower class boundary is 35.5 and the class size is
5. Substituting the values in the table into the formula above, we have:
Q3 = 35.5 + ((37.5 − 30)/10) × 5 = 35.5 + (7.5/10) × 5 = 35.5 + [0.75] × 5 = 35.5 + 3.75 = 39.25
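Both grouped-data quartiles follow the same pattern, so they can be checked with one small Python function (illustrative, using the values identified from the table above):

# Qk = L + ((k*N/4 - cf) / f) * i
def grouped_quartile(L, N, cf, f, i, k):
    return L + ((k * N / 4 - cf) / f) * i

print(round(grouped_quartile(25.5, 50, 10,  8, 5, 1), 2))   # Q1 = 27.06
print(round(grouped_quartile(35.5, 50, 30, 10, 5, 3), 2))   # Q3 = 39.25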

TRY:

Calculate Q1 and Q3 for the distribution in the table below.

Classes Frequency
61 – 70 15
51 – 60 20
41 – 50 25
31 – 40 17
21 – 30 12
11 – 20 11

The answers are: Q1 = 31.7 and Q3 = 55.5.

Exercise

1. What is the lower quartile in the following distribution?


82 90 66 78 88 72 60 80
A. 69
B. 78
C. 79
D. 85

2. What is the third quartile in the following distribution?


14 22 8 46 28 30 17 29 10 60 40 33

A. 10.5
B. 15.5
C. 42.0
D. 43.0

3. What is the value of the first quartile in the following distribution?


48 88 98 76 78 68 54 60 90 65 94
A. 60
B. 62.5
C. 76
D. 90

4. What is the third quartile in the following distribution?
12 18 10 19 22 25 17 20 14
A. 8
B. 13
C. 18
D. 21

CHAPTER SEVEN

MEASURES OF VARIABILITY IN TEST SCORES

Nature of the Measures of Variation

The measures of variation are also called measures of variability, dispersion or scatter. The main
measures used in educational practice are:

1. The range
2. The Variance
3. The Standard Deviation
4. The Quartile Deviation (also known as the semi-interquartile range).

The variance and the standard deviation are closely related. The variance is the square of the
standard deviation and the standard deviation is the square root of the variance. Thus if the
variance is 144, the standard deviation is 12. If the standard deviation is 9, then the variance is 81.

Measures of variation provide the degree of differences within a set of observations.

Let us consider the following situation.

Set 1 scores: 48, 51, 47, 50 Total = 196

Set 2 scores: 30, 72, 90, 4 Total = 196

What are the means of the two sets of scores?

The answers are 49 and 49.

Now compare what you have noticed with this observation.

You will notice that in the first data the scores are close to each other. All the scores are close to
the mean of 49, or they cluster around the mean, which serves as the centre point. In the second
set, the scores are far from each other. For example, 4 is so far from 90 but both sets have the
same mean.

The measures of variation tell us how far the scores are from each other. This information is
important for teaching and learning. If there is a big variation within a class, the teacher needs to
adopt a method to suit the wide dispersion of abilities.

However, if the variation is small, this means that all the students are at about the same level of
performance, which may be low, moderate or high, Again the teacher needs to adopt the
appropriate teaching method to suit the class.

Purposes of the Measures

The measures of variation or variability serve two main purposes. These purposes are described
below.

1. Purpose One

Measures of variation are used as single scores to describe differences within data. They are
scores that are used to indicate whether there are variations in the group. Where there is variation,
the group is believed to be heterogeneous and where the scores are around a typical value, the
group is homogeneous.

Let us consider the following example.

Set 1: 44, 48, 40, 42, 42, 45, 43, 40, 40, 41, 40, 40, 41, 40, 40, 42, 46

Set 2: 20, 48, 50, 50, 50, 48, 12, 10, 55, 54, 48, 58, 59, 35, 24, 56, 30, 51, 30, 52

Is the mean score of each set 32 or 42 or 48 or 50?

Your answer is correct if you got 42.

Now let us look at the highest score and the lowest score for each of the sets.

Set 1. Highest score = 48. Lowest score = 40. The difference is 8.

Set 2. Highest score = 59. Lowest score = 10. The difference is 49

You will notice that though both sets of scores have mean scores of 42, the difference between
the highest and lowest scores differ. In set 1, it is 8 units and in set 2 is 49 units.

Set 1 can be taken to be a homogeneous group while set 2 is a heterogeneous group.

For a heterogeneous class, the classroom teacher will notice that there are high achievers as well
as low achievers. It is a mixed ability group. As a teacher, you need a method to cater for the
high achievers, moderate achievers as well as the low achievers.

On the other hand, where the class is homogeneous, the teacher has to find out the performance
by computing the measures of central tendency. In our example the mean is 42. Assume that the
proficiency level is 30. Since 42 is higher than 30, the class can be described as performing above
the proficiency level.

2. Purpose Two

They provide tools for further statistical analysis.

Measures of variability are descriptive statistics. They are single numbers that are used to
describe a group. To know the correlation or relationship between groups, you need to obtain a
measure of variability. Here the most appropriate measures are the variance and the standard
deviation. Knowledge of the standard deviation or variance will help you to understand the
formula used in computing the correlation coefficient, which is a measure of the relationship
between variables.

THE RANGE
Nature of the range
The range is defined as the difference between the highest (largest) and the lowest (smallest)
values in a set of data. For example, for the set of data 48, 51, 47, 50, the largest value is 51 and
the smallest value is 47. The range is therefore 51 – 47 = 4. The range is the simplest of all the
measures of variation.

Computing the Range

The range can be computed for both raw (ungrouped) data and group data. Procedures are
described below.

Computing the Range from Raw (Ungrouped) Data

Three simple steps are involved in computing the range from raw data.

These steps are:

1. Determine the highest (largest) value (H) in the data set.


2. Determine the lowest (smallest) value (L) in the data set.
3. Find the difference between the two values i.e. H – L
Let us look at an example.

Given the following set of observations, determine the range.

14 22 8 56 46 28 30 17 29 10 60 40 33

The highest value (H) is 60 and the lowest value (L) is 8. The range is H − L = 60 − 8 = 52.
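A one-line Python check of this example (illustrative only):

# Range of raw data: H - L
scores = [14, 22, 8, 56, 46, 28, 30, 17, 29, 10, 60, 40, 33]
print(max(scores) - min(scores))   # 60 - 8 = 52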

Now obtain the range for the following set of data.

82 90 66 78 88 72 60 80

Now let us look at grouped data.

Computing the range from grouped data


There are three steps:
1. Determine the lower class boundary of the bottom class and denote it as L.
2. Determine the upper class boundary of the topmost class and denote it as H.
3. Compute the range by finding the difference between H and L.

Given the following frequency distribution table, compute the range

Classes Frequency
46 - 50 4
41 - 45 6
36 - 40 10
31 - 35 12
26 - 30 8
21 - 25 7
16 - 20 3

The bottom class is 16 − 20 and the lower class boundary (L) is 15.5. The topmost class is 46 −
50 and the upper class boundary (H) is 50.5. The range is H – L = 50.5 − 15.5 = 35.

Strengths and Weaknesses of the Range

The range has a number of strengths and weaknesses. These are listed below

Strengths of the Range


1. It is easy to compute
2. It is easy to interpret
3. It is simple
4. It can be used when data is incomplete, provided the highest and lowest values are available.

Weaknesses of the Range
1. It does not take into account all the data/scores. It uses only two values.
2. It ignores the actual spread of all the scores. It may therefore give a misleading picture of the
variation in the data.
3. It does not consider how the scores relate to each other.
4. It does not consider the typical observations in the distribution but considers only the
extreme values.
5. Different distributions can have the same range, which could lead to wrong conclusions.
6. It is only a crude or rough measure of variation.

Uses of the Range


Due to the numerous weaknesses, the range has limited use.
1. When data is too scanty or too scattered to justify the computation of a more precise
measure, the range provides a fair estimate of the extent of variability available.

2. It may be necessary to require knowledge of only the extreme scores or total spread in a set
of observations. In a test, a teacher may be interested in only the highest score and the lowest.
The range will conveniently serve that purpose.

Exercise
Compute the range for the following sets of data.
1. 18, 22, 48, 45, 90, 93, 65, 62, 28, 75, 15, 30, 35, 80, 82
2. 44, -8, 14, -14, 24, 28, -30, 52, 58, 40, 42, 48, 50, -1
3. -4, -15, -18, -56, -52, -40, -75, -18, -36, -19, -50, -55, 0
4.
Classes Frequency
61 - 70 15
51- 60 20
41 – 50 25
31 - 40 17
21 - 30 12
11 - 20 11

5. One strength of the range as a measure of variation is that it


A. can be used when data is incomplete.
B. depends largely on extreme scores.
C. disregards the actual spread of the scores
D. uses few values in a distribution.

THE VARIANCE

Nature of the Variance

The variance is the mean square deviation. It is defined as the mean of the squares of the
deviations of the scores from the mean of the distribution. The symbols used are σ² or S² for the
population variance and s² for the sample variance.

Computing the variance

The variance can be computed from both the raw (ungrouped) data and grouped data

Computing from Raw Data (Ungrouped Data)

The variance can be computed from raw data by using two formulas. These are the conventional
formula and the computational formula. The procedures are described below.
i. The conventional formula for variance is given by s² = Σ(x − x̄)² / N
Example: Given the set of scores, find the variance
15, 12, 10, 10, 9, 20, 14, 11, 13, 16
Answer 10.2
ii. The computational formula for variance is given by s² = Σx² / N − (Σx / N)²
Use this method to solve for the variance of
15, 12, 10, 10, 9, 20, 14, 11, 13, 16
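Both variance formulas can be verified with a short Python sketch (illustrative, not part of the original notes):

# Conventional and computational variance formulas for raw scores
scores = [15, 12, 10, 10, 9, 20, 14, 11, 13, 16]
N = len(scores)
mean = sum(scores) / N
conventional  = sum((x - mean) ** 2 for x in scores) / N        # sum of squared deviations / N
computational = sum(x ** 2 for x in scores) / N - mean ** 2     # sum(x^2)/N - (sum(x)/N)^2
print(round(conventional, 1), round(computational, 1))          # 10.2 10.2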

Calculating the variance from a grouped data.


i. The conventional formula for variance is given by
s² = Σf(x − x̄)² / N
ii. The computational formula for variance is given by
s² = Σfx² / N − (Σfx / N)²
Example:
Compute the variance for the group data below

Classes 46-50 41-45 36-40 31-35 26-30 21-25 16-20


Frequency 4 6 10 12 8 7 3

Note:
The coding method can also be used to calculate for variance

Properties of Variance
i. The variance of a constant is zero.
ii. It is not resistant. It is affected by extreme scores or outliers.
iii. The variance is independent of a change of origin.
iv. The variance is not independent of a change of scale.

Strengths and weaknesses of the variance

Strengths of the variance


1. It uses every score in the data set. Thus every score contributes to obtaining variance.
2. It is used a lot for statistical analysis
3. It is appropriate for scores that are normally distributed

Weaknesses of variance
1. It is influenced by extreme scores. It gives more weight to these extreme scores
resulting in a wrong interpretation of results
2. It is sensitive to change in the value of any score in the distribution.
3. It cannot be computed if missing data is reported since the variance depends on
every individual score.
4. It is not appropriate for judging the variation within a set of observations

The standard deviation

Nature of the standard deviation

It is the most used measure of variation. It is the square root of the mean square deviation. It is
denoted by the symbol σ (for a population) or s (for a sample).

Computing The Standard Deviation.

The standard deviation can be computed from both the raw (ungrouped) data and grouped data.

Computing from Raw Data (Ungrouped Data)

The procedures are described below.

i. The conventional formula for the standard deviation is given by

s = √[ Σ(x − x̄)² / N ]

Example: Given the set of scores, find the standard deviation.


15, 12, 10, 10, 9, 20, 14, 11, 13, 16
The computational formula for the standard deviation is given by
s = √[ Σx² / N − (Σx / N)² ]
Example: Use this method to solve for the standard deviation of
15, 12, 10, 10, 9, 20, 14, 11, 13, 16

Calculating the standard deviation from a grouped data.


iii. The conventional formula for the standard deviation from grouped data is given by
s = √[ Σf(x − x̄)² / N ]
iv. The computational formula for the standard deviation from grouped data is given by
s = √[ Σfx² / N − (Σfx / N)² ]
Example:
Compute the standard deviation for the group of data below
Classes 46 -50 41- 45 36 - 40 31-35 26 -30 21- 25 16- 20
Frequency 4 6 10 12 8 7 3

Note: The coding method can also be used to calculate for standard deviation.
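For checking the grouped-data answer with software, here is a minimal Python sketch (illustrative; the result of about 8.15 follows from the table above):

# Grouped-data standard deviation: s = sqrt(sum(f*x^2)/N - (sum(f*x)/N)^2)
from math import sqrt
midpoints   = [48, 43, 38, 33, 28, 23, 18]
frequencies = [ 4,  6, 10, 12,  8,  7,  3]
N = sum(frequencies)
sum_fx  = sum(f * x for f, x in zip(frequencies, midpoints))
sum_fx2 = sum(f * x * x for f, x in zip(frequencies, midpoints))
print(round(sqrt(sum_fx2 / N - (sum_fx / N) ** 2), 2))   # approximately 8.15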

Properties of the standard deviation


1. The standard deviation of a constant is zero.
2. It is not resistant. It is affected by extreme scores or outliers.
3. It is independent of a change of origin.
4. It is not independent of a change of scale.

Strengths of the standard deviation


1. It uses every score in the data set. Thus every score contributes to obtaining standard
deviation.

2. It is used a lot for statistical analysis
3. It is appropriate for scores that are normally distributed

Weaknesses of standard deviation


1. It is influenced by extreme scores. It gives more weight to these extreme scores
resulting in a wrong interpretation of results
2. It is sensitive to change in the value of any score in the distribution.
3. It cannot be computed if missing data is reported since the standard deviation depends on
every individual score.

Coefficient of variation

Nature of the coefficient of variation (CV)


It is considered as a relative measure of variation. It is defined as the ratio of the standard deviation
to the mean. It is often expressed as a percentage, so that the value is multiplied by 100. It is only
defined for non – zero means, and is most useful for variables that are always positive. It is
appropriate for ratio and interval scales of measurement.

Computing the Coefficient of Variation.

The coefficient of variation for both grouped and ungrouped data is given by
CV = (s / x̄) × 100,

where: 𝑠 is the standard deviation of the data. 𝑥̅ is the arithmetic mean.
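Here is a minimal Python sketch of the CV calculation for the raw scores used earlier in this chapter (illustrative only):

# Coefficient of variation: CV = (s / x-bar) * 100
scores = [15, 12, 10, 10, 9, 20, 14, 11, 13, 16]
N = len(scores)
mean = sum(scores) / N                                   # 13
s = (sum((x - mean) ** 2 for x in scores) / N) ** 0.5    # standard deviation, about 3.19
print(round(s / mean * 100, 1))                          # about 24.6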

Strengths of the coefficient of variation


1. It is easy to compute
2. It is unit less and this makes it possible to compare variability for different distributions.
3. Where the distribution is normal it is based on every score in the distribution.
4. It is easy to interpret. The larger the CV, the greater the variability.

Weakness of the coefficient of variation


1. It is affected by extreme values.
2. It cannot be used when the mean is negative, zero, or near zero.
3. It is sensitive to a change in the value of any score in the distribution.
4. It cannot be computed if missing data is reported, since it depends on every individual
score.

Uses of the Coefficient of variation.


1. It is used to determine whether a group is homogeneous or heterogeneous. If the value
of CV is 33% or less then the group is homogeneous, otherwise it is heterogeneous.
2. It is used to compare variations within or between groups where there are different units
of measurement.
3. It is used to compare variations within or between groups where there are different
means but with the same unit of measurement.

Exercise

1. The variance for set of scores is 25. What is the standard deviation?

A. 2
B. 5
C. 25
D. 625

2. Measures of dispersion can be used to determine the direction of performance.


A. False
B. True

3. In a homogeneous class, students perform at about the same ability level.


A. False
B. True

4. The standard deviation for a set of scores is 9. The variance for the same set of scores is 3.
A. True
B. False

5. The standard deviation is indispensable in the computation of the z – scores.


A. True
B. False

TRY QUESTIONS[PASCO]

January, 2015 2½ hours.

GENERAL INSTRUCTION: answer all the questions in section A, B, C and one (1)
question from section D.

SECTION A: MULTIPLE-CHOICE ITEMS

INSTRUCTION: This section consists of 10 items. Circle the most appropriate option in ink
once only. One mark for each question.

1. Why is it necessary for the teacher to specify what he or she wants to assess?
A. To ensure easiness in the development procedure
B. To ensure the reliability of the procedure used
C. To ensure the selection of appropriate procedures
D. None of the above
2. Which of the following is the most specific?
A. Instructional aims
B. Instructional objectives
C. Educational goals
D. Educational outcomes
3. One of the general principles of assessment is that
A. Good assessments are provided by multiple indicators of performance
B. Good assessments focus on students’ critical thinking objectives
C. Assessment techniques require knowledge about student learning
D. Assessment techniques must serve the needs of the community
4. Taxonomy means the same as
A. Organization
B. Selection
C. Classification
D. Demarcation
5. The hierarchical sequence of Bloom’s taxonomy is
A. Knowledge, comprehension, synthesis, application, analysis and evaluation
B. Knowledge, comprehension, analysis, application, synthesis and evaluation
C. Knowledge, application, comprehension, analysis, synthesis and evaluation
D. Knowledge, comprehension, application, analysis, synthesis and evaluation
6. Which of the following statements depict the nature of validity?
A. Assessment results may have high, moderate or low validity for a situation
B. A single validity is most appropriate for an evaluative judgement
C. Validity refers to the appropriateness of the test items to meet learning
objectives
D. Validity refers to whether the assessment measures what it purports to measure

7. When constructing her test items for the end-of-term examination in mathematics Ms.
Sarpong checked each item to see if it matched the material that she taught the class.
What type of evidence was Ms. Sarpong looking for?
A. Construct –related evidence
B. Concurrent-related evidence
C. Content-related evidence
D. Predictive-related evidence
8. The objectivity of a test refers to the
A. Format of its items
B. Selection of items for the test
C. Use made of the results
D. Scoring of the students responses
9. Which of the following item formats is the best to use to assess the analysis type of learning
behaviour?
A. Short answer type item
B. Essay items
C. Multiple choice items
D. True – false items
10. Instructional outcomes that aim at inculcating movement abilities in students are
concerned with the
A. Affective domain
B. Quellmalz domain
C. Cognitive domain
D. Psychomotor domain
11. The following scores were available for 9 students in an elective mathematics class.
18 20 15 12 12 10 8 17 13
The score for the 10th student was missing but it was known to be the second highest
score. What would be the median for the distribution?
A. 14
B. 15
C. 16
D. 17
12. The median score for a group of 19 students was 58. A 20th student who had a score of
45 joined the group. What is the new median score?
A. 10.5
B. 45.0
C. 58.0
D. More information is required
13. The variance for set of scores is 25 and the mean score is 12.81. What is the coefficient
of variation?
A. 256%
B. 93%
C. 37.81%
D. 39%

14. The mean score obtained by 10 students in a statistics quiz was 20 out of a total of 25.
It was found later that a student who obtained 5 actually had 20. How would the
discovery affect the mean score?
A. More information is needed.
B. New mean score is greater than old mean.
C. Old mean score is greater than the new mean.
D. There is no change in the old mean score.

15. A group of 20 students earned a class mean of 30 on a quiz. A second group of 30


students had a mean score of 45 on the same test. What is the mean score of the 50
students?
A. 32.5
B. 39.0
C. 41.0
D. 45.0

SECTION B: TRUE OR FALSE

INSTRUCTION: This section has ten (10) TRUE OR FALSE items. Write the appropriate
response in ink once only. One mark for each question.

16. Measures of dispersion can be used to determine the direction of performance.


A. True
B. False
17. A teacher can evaluate her students without measuring them?
A. True
B. False

18. One advantage of the essay-type test is that a premium is placed on writing speed.
A. True
B. False
19. Educational goals are geared towards meaningful functioning of the society.
A. True
B. False
20. An assessment can be done through interviewing.
A. True
B. False
21. Statements that pose more than one central theme should be avoided when constructing
short-answer tests.
A. True
B. False
22. Test scores are perfect measures of student’s performance.
A. True
B. False
23. Assessment is necessary for making certification decision.
A. True
B. False

24. Test scores that have high validity are necessarily reliable.
A. True
B. False
25. Communication using gestures is an example of a sub-domain in the affective domain.
A. True
B. False

SECTION C: SHORT ANSWERS


INSTRUCTION: Supply the appropriate answer in the space provided

26. State two characteristics of continuous assessment. (2 marks)


……………………………………………………………………………………………………
……………………………………………………………………………………………………
27. Outline any two classifications of tests. (2 marks)

……………………………………………………………………………………………………
……………………………………………………………………………………………………
28. A test or examination needs to be planned before being administered and scored. Determine the
four principal stages involved in classroom testing. (4 marks)
……………………………………………………………………………………………………
……………………………………………………………………………………………………
……………………………………………………………………………………………………
……………………………………………………………………………………………………
29. Explain the following terms. (1 mark each)
a. Obtained score
……………………………………………………………………………………………………
……………………………………………………………………………………………………

b. True score
..........................................................................................................................................................
..........................................................................................................................................................

c. Error score
……………………………………………………………………………………………………
……………………………………………………………………………………………………
30. Differentiate between an essay test and an objective test. (2 marks)

……………………………………………………………………………………………………
……………………………………………………………………………………………………

31. Identify any two factors that affect test validity. (2 marks)

……………………………………………………………………………………………………
……………………………………………………………………………………………………

SECTION D (ESSAY)

INSTRUCTION: Answer any one (1) Question. Each question carries equal marks of 20

Question 1
(a) Outline the categories (levels) of Benjamin Bloom's (1956) cognitive domain. (6
marks)

(b) With a practical example, identify and demonstrate how to set an essay question at
each taxonomy level of learning outcomes of educational activities on the topic "graphs
of relations and functions". (9 marks)

(c) Given the following scores

15, 12, 10, 10, 9, 20, 14, 11, 13, and 16

Compute the coefficient of skewness of the test scores and deduce whether the general
performance is good or weak. (5 marks)

Question 2
(a) Explain the view that "assessment is a means to an end and not an end in itself". (5
marks)

(b) The following data represent the recorded scores of two quizzes conducted and marked
over 10
Quiz 1(x) 3 3 4 5 6 7 7 8 9 6
Quiz 2(y) 4 6 5 4 6 8 7 7 9 9
Calculate the Pearson-product-moment correlation coefficient for the two quizzes and
interpret your results in relation to concurrent validity between the two assessments. (5 marks)

(c) Suppose that a five-item multiple-choice exam was administered with the following
percentages of correct response: p1 = .4, p2 = .5, p3 = .6, p4 = .75, p5 = .85, and
σ²X = 1.84. Compute the internal consistency of the test using the Cronbach's alpha
estimator
α = [k / (k − 1)] × [1 − Σ pi(1 − pi) / σ²X],
where k is the number of items, pi is the proportion of correct responses to item i, and σ²X is the
variance of the total test scores.

Question 3
(a) (i) Explain the term variability in test scores (8 marks)

(ii) The table below shows the end of term examination scores of Otwebeweate SHS
Science 1 class
in elective mathematics marked over 70.

Test Scores 0-9 10-19 20-29 30-39 40-49 50-59


Number of students 2 7 8 16 9 8

Calculate the coefficient of variation in the test distribution for the class and interpret
your results.

(b) Discuss four guidelines for assembling test in mathematics assessment.

(c) Compare and contrast the norm and criterion-referenced interpretation of mathematics
test scores.
