Measurement & Evaluation




Test, Measurement
Assessment, Evaluation
Types of tests
Test by method
Test by purpose
Qualities of tests
Validity and its types
Types of assessment
Types of reporting
Assessment agencies
Item analysis
Table of specification
Guessing correction formula
Assessment for, of, and as learning
Test / Measurement

  Test: A test is narrow in scope.

  “A test is an instrument or systematic procedure for measuring some sample of behavior.” (Norman E. Gronlund)
  A test is a set of questions, each with a correct answer, answered in writing or orally.
  It is a tool to measure a sample of behavior.
  It is a formal and systematic procedure for getting information.
  A test answers the question “How well?”
  A test is a sum of questions.

  Measurement:
  It assigns a numerical value to test performance.
  Measurement is quantitative in nature.
  It is limited to the quantitative description of the pupil.
  It is the process of obtaining numerical values.
  The product of measurement is a score.
  It answers the question “How much?”

  Assessment:
  Assessment involves the interpretation of measurement data.
  It makes sense of the data collected on student performance.
  It is the interpretation of numerical values.

  Evaluation:
  It is the highest / broadest in scope.
  It means giving a judgment or decision about worth or value.
  It determines the worth and value of something.
  It is qualitative in nature.
  It is expressed in words.
  Evaluation answers the question “How good?”

PHASES OF EVALUATION
  First phase: It involves situation analysis and selection of objectives.
  Second phase: In this phase the evaluation is conducted.
  PRODUCT PHASE: It involves test analysis, scoring, interpretation of data, and making recommendations on the basis of the results.
Test by Purpose

  1. Teacher Made Test:

  Teacher made tests are classroom assessments normally prepared by the teacher.
  They have not been tried out on a sample population.
  They are usually conducted to help teachers check students’ understanding of a particular body of knowledge.
  They are used for evaluating pupils’ day-to-day progress.

  2. Standardized Test:
  These are expertly constructed and have well-defined objectives. Objectivity is maximum. These are externally mandated tests, that is, third-party evaluation. Development has the following steps:
  Purpose: The objective of the test is set.
  Specification: A blueprint / outline of the content is prepared.
  Development: An expert panel writes each section of test items according to the specification.
  Pilot: An initial application is done to ensure applicability and to remove any flaws, and then re-piloting is done.
  Forms: Final forms / papers are assembled.

  3. Norm Referenced Test (NRT):

  Student performance is compared with that of other students in a group / class.
  It tells an individual’s relative standing in some known group.
  Ranks / positions are given: 1st, 2nd, 3rd / gold, silver, bronze.
  It uses percentiles to tell a student’s position. It is the test used to make comparisons among students.
  In an NRT, the best item is one with difficulty near 50%.

  4. Criterion Referenced Test (CRT):

  Student performance is compared with clearly defined learning tasks and set standards.
  It uses percentages to tell a student’s performance.
  It checks student performance against some set criteria.
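
The contrast between the percentile of a norm referenced test and the percentage of a criterion referenced test can be sketched in Python. This is only an illustration under stated assumptions: the function names and the class scores are hypothetical, and percentile rank is taken here to mean the share of the group scoring strictly below the student (other definitions exist).

```python
def percentage_score(raw_score, max_score):
    """Criterion referenced view: share of the total marks obtained."""
    return 100 * raw_score / max_score


def percentile_rank(raw_score, group_scores):
    """Norm referenced view: percent of the group scoring below raw_score.

    Assumption: 'strictly below' is used; some texts count ties differently.
    """
    below = sum(1 for s in group_scores if s < raw_score)
    return 100 * below / len(group_scores)


# Hypothetical class scores on a test marked out of 50.
class_scores = [12, 18, 25, 25, 30, 34, 38, 41, 44, 47]

print(percentage_score(38, 50))           # 76.0
print(percentile_rank(38, class_scores))  # 60.0
```

The same raw score of 38 reads as 76% against the criterion (total marks) but as the 60th percentile against this particular group, which is why the two kinds of test answer different questions.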
Types of Test by Method

I. Supply type Test

a- Extended Response
b- Restricted Response
c- Short Answer
d- Completion Item
II. Objective type Test
a- Multiple choice question (MCQ)
b- Alternative Question
c- Matching Items
Supply Type Test
  These are subjective type tests. They draw on expression of opinion, logical thinking, and writing power. The types are as follows.
  Extended Response:
  In extended response, no restriction is placed on students.
  Items require lengthy, logical, and coherent answers that involve higher-order skills of thinking and expression. Items need detailed answers. Example: essay type test. These provide an opportunity to organize knowledge.
  It offers the ability to select, organize, and integrate ideas.
  The main advantage is that it can measure complex learning outcomes.
  There is little objectivity in scoring, so these are less reliable.
  Restricted Response:
  These require answers with limited detail. They measure more specific learning outcomes.
  These are a type of essay item, but the content is limited.
  Short Answer: These tests require a short answer of about two lines.
  Completion Items: These require a blank to be filled with a word or phrase (fill in the blanks).
Objective Type Test
  Multiple Choice Question (MCQ):
  It consists of a problem statement and two or more options, one of which is the answer. It is the most popular and most widely used test, both in the classroom and around the world.
  It is the most popular test used by classroom teachers.
  The stem of a multiple choice question should be meaningful.
  Its scores are reliable. It does not measure higher cognitive skills.
  Stem: The statement of an MCQ is called the stem.
  Distractor: The incorrect options in an MCQ are called distractors.
  Answer: The correct option in an MCQ is called the answer.
  Suggested answers: The list of all options / choices / alternatives.
  Alternative Question: These items require a choice between two alternatives,
  i.e. true/false, correct/incorrect, right/wrong.
  The most significant advantage of true/false items is wide sampling.

  Matching Items: A test format that requires students to match a series of responses
  with corresponding terms in a stimulus list is called matching items / column matching.
  Usually it has two columns.
  Premise: The items in the column for which a match is sought are called premises.

  OBJECTIVE TYPE questions are easy to mark.

Types of Test

1- Maximum Performance Test:

It is a procedure used to determine a person’s abilities. It determines what an individual can do when performing at his or her best.
Examples: (i) Aptitude Test (ii) Achievement Test
(i) Aptitude Test: It measures the probability of success in an activity. It is designed to predict future performance. It
measures potential ability, such as hidden skill, dexterity, and creativity.
(ii) Achievement Test: A test that measures the learning outcomes of the students.

2- Typical Performance Test: It determines what individuals will do under natural conditions.
i. Attitude test: A test used for attitude / behavior measurement. The Likert scale is used for attitude measurement.
ii. Peer appraisal: Evaluation is done by colleagues and fellows, and their feedback is collected.
iii. Personality inventory: The choices, priorities, or interests of a person are known through a personality inventory.

3- Written Test: Students are required to answer questions in writing.

4- Oral Test: Students are required to answer orally / by speaking.
5- Mastery Test: A test used to measure minimum basic knowledge and skills. It is an early grade readiness test.
6- Speed Test: It measures the number of items an individual can answer correctly in a given time.
7- Power Test: It is designed to measure the level of performance when ample time is allowed, rather than speed.
Qualities of Test

  Validity:
  The quality of a test that it “measures what it is intended to measure”,
  or measures what it claims to measure, is called validity.

  Reliability:
  The quality of a test to give the same / consistent scores when administered on different occasions.

  Usability:
  The quality of a test showing ease of time, cost, administration, and interpretation is called usability.

  Objectivity:
  The scoring of the test is not affected by any factor. No one’s opinion can influence the test score.

  Adequacy: The sample of questions in the test is sufficiently large. This quality is called adequacy.

  Differentiability:
  The characteristic of a test to discriminate between high achievers and low achievers is called differentiability.
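
Reliability, described above as consistent scores on different occasions, is often checked by correlating the scores from two administrations of the same test. The sketch below is one possible way to do that and is not taken from these notes: the use of the Pearson correlation, the helper name `pearson_r`, and both score lists are assumptions made for illustration.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Hypothetical scores of the same five students on two occasions.
first_sitting = [12, 15, 20, 22, 30]
second_sitting = [13, 14, 21, 23, 29]

r = pearson_r(first_sitting, second_sitting)
# A value close to 1 suggests the test ranks students consistently.
print(round(r, 2))
```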
Types of Validity

1- Content Related
a- content validity
b- Face validity
c- Construct validity
2- Criterion Related
a- Concurrent validity
b- Predictive validity
c- Internal Validity
d- External Validity
Content Related Validity

  Content Validity:
  The degree to which a test measures the intended content area.
  The ability of the test to cover all related content.
  Items of the test should be appropriate to the objectives of study.
  For this purpose, a table of specification is used.
  If a test does not cover the related content, it will show poor content validity.

  Face Validity:
  The test is valid by definition.
  It is the extent to which it is self-evident that the test is measuring what it is intended to measure.
  Does a test appear to test what it aims to test? It seems logically related when someone looks at it.

  Construct Validity:
  When we construct a test, we assume a hypothesis: that students possess some level of a trait or skill. We then verify that assumption through the test.
  Construct validity is established through logical analysis.
Criterion Related Validity

➢ Concurrent Validity:
➢ The score or performance on a test is compared with some already measured or established test. Does the
test relate to an existing similar measure?

➢ Predictive Validity:
➢ The degree to which a test can predict how an individual will do in the future. Does the test predict
later performance on a related criterion?

  Internal Validity:
  A test is internally valid if the difference on the dependent variable is due to the independent variable and not to any other variable.

  External Validity:
  A test is externally valid if its results can be generalized to the population outside the sample.
Threats to internal Validity

  History: An unexpected event occurs that affects the dependent variable.

  Maturation: Mental and physical changes over a period of time affect the dependent variable.
  Pre-Test Sensitization: The subjects’ familiarity with the instrument can affect results.
  Instrumentation: Unreliability of, or a change in, the instrument can affect results.
  Statistical Regression: It is regression to the mean. It happens when the sample is not selected randomly, for example when subjects are chosen for their extreme scores.
  Bias / Subjective Selection: The researcher divides groups on personal bias.
  Mortality: It is the dropout of subjects from the study.

  Being observed: The subject of the study is influenced by the fact that he/she is being recorded
or observed. Performance is affected by the fear of being observed. This is commonly called the Hawthorne effect.
Types of Assessment

  Placement Assessment: Assessment conducted to know whether pupils possess the prerequisite skills needed to
succeed in a unit, that is, the earlier knowledge of the student. It is the test conducted to place a student in the appropriate class or
level. It is done before instruction starts, when a child is admitted to school, to place him or her in a class or grade.

  Diagnostic Assessment: Assessment that is conducted to find out the underlying (for example, medical) reasons for a student’s difficulties. It is the type of
assessment in which learning difficulties are diagnosed. It is used to know the problems of students. It is done
before the start of instruction to check permanent learning difficulties of students, so that the curriculum can be adapted or adjusted
according to the unique needs of the students.

  Formative Assessment: Assessment that is conducted during the teaching-learning process. It is
conducted during instruction to monitor pupils’ learning progress, and it provides ongoing feedback to pupils.
  Benchmark Assessment: Assessment that is conducted during instruction after the completion of a unit or chapter.

  Summative Assessment / Evaluation: Assessment conducted at the end of the teaching-learning session. It is done after
instruction, at the end of the year or on completion of a course of study, when the final examination is held. Grades are assigned.
  It serves to certify achievement. Students are promoted to the next class on the basis of summative assessment.
Types of marking and reporting

  Traditional marking system: positions (1st, 2nd, 3rd) and grades (A, B, C).

  Pass-fail system: passing or failing against set criteria.
  Checklist of objectives: It is checked which objectives the student achieved and which not.
  Letter to parents: Parents are contacted and results are shared.
  Multiple marking and reporting system: using multiple methods, e.g. grading and parent-teacher meetings (PTM).

Assessment Agencies

  BISE: Board of Intermediate and Secondary Education. BISE Lahore was established in 1954 and started working
under PU. Now 9 BISEs are working in Punjab, one in each division. DANISH SCHOOLS work under BISE Lahore.
  PEC: Punjab Examination Commission was established on 16th January 2006.
  It evaluates the students of grades 5 and 8 in Punjab.
  Before PEC, the Director Public Instructions (DPI) was responsible for the evaluation.
  NEAS: National Education Assessment System, established in 2003.
  PEAS: Provincial Education Assessment System.
  ASER: Annual Status of Education Report, established in 2008. It provides reliable estimates of the schooling
status of children aged 3-16 years residing in all rural and a few urban districts of Pakistan.
Item Analysis

  Item analysis is done to analyse the items of a test and check whether they are
fulfilling the objectives of the test or not. It analyses:
  the appropriate level of difficulty
  the discrimination power
  the effectiveness of distractors

1- Item difficulty / Difficulty level / Facility index:

The facility index of an item determines its ease or difficulty level.
Its formula is: F = (NR / NT) x 100
F = Facility index
NR = No. of students with right answers
NT = No. of total students
❖ An item is acceptable if its facility index ranges from 30% to 70%.
❖ A test item is very difficult when the value of its facility index is less than 30%.
❖ A test item is very easy when its value is higher than 70%.
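
Reading the facility index as the percentage of students who answered an item correctly, (NR / NT) x 100, the 30% and 70% cut-offs stated above can be applied with a few lines of Python. This is a minimal sketch: the function names and the student counts are hypothetical, chosen only to show one item in each band.

```python
def facility_index(num_right, num_total):
    """Percentage of students who answered the item correctly."""
    return 100 * num_right / num_total


def classify_difficulty(f):
    """Apply the 30%-70% rule of thumb from the notes."""
    if f < 30:
        return "very difficult"
    if f > 70:
        return "very easy"
    return "acceptable"


# Hypothetical items answered by a class of 40 students.
for right in (8, 18, 36):
    f = facility_index(right, 40)
    print(right, f, classify_difficulty(f))
# 8 20.0 very difficult
# 18 45.0 acceptable
# 36 90.0 very easy
```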

  Discrimination Power:
High achievers and low achievers are sorted out by discrimination power.
Formula: D = (NH - NL) / n
D = Discrimination power
NH = No. of high achievers who answered the item correctly
NL = No. of low achievers who answered the item correctly
n = No. of total students in each group (high or low)

❖ The discrimination power of an item is acceptable when its value ranges from 0.30 to 1.
❖ A test item discriminates 100% when its value is 1.
❖ A test item discriminates poorly if its value is less than 0.30.
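
The discrimination rule can be tried on numbers with a short Python sketch. It assumes the common form D = (NH - NL) / n, where NH and NL count the high and low achievers who answered the item correctly and n is the size of each group, so that D reaches 1 when every high achiever and no low achiever is correct. The function names and the counts are hypothetical.

```python
def discrimination_power(high_right, low_right, group_size):
    """D = (NH - NL) / n, with n taken as the size of each group (assumption)."""
    return (high_right - low_right) / group_size


def is_acceptable(d):
    """Apply the 0.30 threshold from the notes."""
    return d >= 0.30


# Hypothetical item: 10 high achievers and 10 low achievers.
d = discrimination_power(high_right=9, low_right=4, group_size=10)
print(d, is_acceptable(d))  # 0.5 True

# Every high achiever right, every low achiever wrong: full discrimination.
print(discrimination_power(10, 0, 10))  # 1.0
```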
Effectiveness of Distractor

  Good Distractor:
  A good distractor is one which attracts low achievers more than high achievers. It is also known as a foil or trap
that attracts students with a misconception or an error in thinking.

  Bad Distractor:
  A distractor is bad if it attracts high achievers more than low achievers,
  does not attract any student at all, or attracts low and high achievers equally.
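
The good and bad distractor rules above can be written as a small Python check. This is an illustrative sketch only: the function name, the option letters, and the counts of high and low achievers choosing each wrong option are hypothetical.

```python
def judge_distractor(high_count, low_count):
    """Judge one wrong option from how many students in each group chose it."""
    if high_count == 0 and low_count == 0:
        return "bad (attracts no one)"
    if low_count > high_count:
        return "good"
    if high_count > low_count:
        return "bad (attracts high achievers more)"
    return "bad (attracts both groups equally)"


# Hypothetical counts: option -> (high achievers, low achievers) who chose it.
picks = {"B": (2, 7), "C": (6, 3), "D": (0, 0), "E": (4, 4)}

for option, (high, low) in picks.items():
    print(option, judge_distractor(high, low))
# B good
# C bad (attracts high achievers more)
# D bad (attracts no one)
# E bad (attracts both groups equally)
```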

  Portfolio: A collection of student work products used to evaluate the performance of the student. It is a compilation of the skills and learning activities of students.

  Working portfolio: A collection of a student’s ongoing work that shows improvement over time.
  Showcase portfolio: A collection of a student’s best work.

  Table of specification (TOS):

  A table of specification is also called a test blueprint.
  It is used for test development by the teacher.
  It ensures uniform coverage of content from the book.
  It is a two-way chart which relates instructional objectives to the course
content. It is a draft of the course content.
  It ensures validity and adequate sampling of a test.
  BISE follows BLOOM TAXONOMY. PEC follows SOLO TAXONOMY.

  Rubrics: It is a tool used for scoring purposes. It is a scoring guide, a scale that describes grading criteria.

  Anecdotal record: It is a running description of the actual and active behavior of the
student as observed by the teacher.

  Assessment for Learning (Formative Assessment):

  Assessment for learning is continuous, ongoing assessment that allows the teacher to monitor students on a daily basis.
  It is also called formative assessment.

  Assessment of Learning (Summative):
  Assessment of learning is used to evaluate students’ achievement at the END of the course.

  Assessment as Learning:
  It is the use of ongoing self-assessment by students themselves in order to monitor their own learning progress.
  Students reflect on and monitor their own learning progress. It is also called meta-learning.

  Likert Scale:
  A scale used for attitude measurement.
  The respondent is given a question or statement and shows his/her level of agreement or disagreement.
  Developed by Rensis Likert.
  Often a 5-point scale (some psychometricians also use 7 or 9 points).
Example: “Using social media is essential today.”
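
Responses to a statement like the one above are usually turned into numbers before they are summarised. The Python sketch below shows one way this might be done on a 5-point scale. The response labels, the 1 to 5 coding, the function name, and the five responses are all assumptions for illustration and do not come from these notes.

```python
# Assumed 5-point coding; the labels are illustrative only.
LIKERT_5 = {
    "Strongly disagree": 1,
    "Disagree": 2,
    "Neutral": 3,
    "Agree": 4,
    "Strongly agree": 5,
}


def mean_agreement(responses):
    """Average coded score for one statement across respondents."""
    scores = [LIKERT_5[r] for r in responses]
    return sum(scores) / len(scores)


# Hypothetical responses to "Using social media is essential today."
responses = ["Agree", "Strongly agree", "Neutral", "Agree", "Disagree"]
print(mean_agreement(responses))  # 3.6
```

A mean above the midpoint of 3 leans toward agreement with the statement under this coding.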




Naeem Ullah Qureshi

