Measurement & Evaluation
Test / Measurement
Assessment / Evaluation
Types of tests
Test by method
Test by purpose
Qualities of tests
Validity and its types
Types of assessment
Types of reporting
Assessment agencies
Item analysis
Table of specification
Rubrics
Guessing correction formula
Assessment for, of, as learning
Test / Measurement
Measurement :
It assigns a numerical value to a test.
Measurement is quantitative in nature.
It is limited to a quantitative description of a pupil.
Its purpose is to obtain numerical values.
The product of measurement is a score.
It answers the question "How much?".
ASSESSMENT / EVALUATION
Assessment:
Assessment involves the interpretation of measurement data.
It makes sense of the data collected on student performance.
It is the interpretation of numerical values.
EVALUATION:
It is highest in scope / broader in scope.
It gives a judgment or decision about worth or value.
It determines the worth and value of something.
It is qualitative in nature.
It is expressed in words.
Evaluation answers the question "How good?".
PHASES OF EVALUATION
PLANNING PHASE:
It is the first phase, which involves situation analysis and selection of objectives.
PROCESS PHASE:
Evaluation is conducted in this phase.
PRODUCT PHASE:
It involves test analysis, scoring, interpretation of data, and making recommendations on the basis of the students' results.
TYPES OF TESTS
Test by Method
2. Standardized Test:
These are expertly constructed tests with well-defined objectives; objectivity is maximum. Construction follows these steps:
Purpose: The objective of the test is set. These are externally mandated tests (third-party evaluation).
Specification: A blueprint / outline of the content is prepared.
Development: An expert panel writes each section of test items according to the specification.
Pilot: An initial trial is done to ensure applicability and to remove any flaws; re-piloting is done if needed.
Forms: Final forms / papers are assembled.
TEST BY METHOD
Matching Items: A test format that requires students to match a series of responses
with corresponding terms in a stimulus list is called matching items / column matching.
Usually it has two columns.
Premise: An item in the column for which a match is sought is called a premise.
2- Typical Performance Test: It determines what individuals will do under natural conditions.
i. Attitude test: A test used for attitude / behavior measurement. A Likert scale is used for attitude measurement.
ii. Peer appraisal: Evaluation is done by colleagues and fellows, and their feedback is collected.
iii. Personality inventory: The choices, priorities, or interests of a person are known through a personality inventory.
Validity:
The quality of a test to measure what it is intended to measure,
or what it claims to measure, is called validity.
Reliability:
The quality of a test to give the same / consistent scores when administered on different occasions.
Usability:
The quality of a test showing ease of time, cost, administration, and interpretation is called usability.
Objectivity:
The scoring of the test is not affected by any external factor; no one's opinion can influence the test score.
Adequacy: The quality of a test whose sample of questions is sufficiently large is called adequacy.
Differentiability:
The characteristic of a test to discriminate between high achievers and low achievers is called differentiability.
Types of Validity
1- Content Related
a- content validity
b- Face validity
c- Construct validity
2- Criterion Related
a- Concurrent validity
b- Predictive validity
c- Internal Validity
d- External Validity
Content Related Validity
Content Validity:
The degree to which a test measures the intended content area.
It is the ability of the test to cover all related content.
Items of the test should be appropriate to the objectives of study.
For this purpose, a table of specification is used.
If the test does not cover the related content, it will show poor content validity.
Face Validity:
The test is valid by definition, on its face.
It is the extent to which it is self-evident that the test is measuring what it is supposed / intended to measure.
Does the test appear to test what it aims to test? It seems logically related when someone looks at it.
Construct Validity:
When we construct a test, we assume (hypothesize) some level of a trait
or skill in students, and verify that assumption through the test.
Construct validity is established through logical analysis.
Criterion Related Validity
➢ Concurrent Validity:
➢ The score or performance on a test is compared with some already established measure. Does the
test relate to an existing similar measure?
➢ Predictive Validity:
➢ The degree to which a test predicts how an individual will perform in the future. Does the test predict
later performance on a related criterion?
Internal Validity:
A test (or study) is internally valid if differences on the dependent variable are due to the treatment, not to any other variable.
External Validity:
A test is externally valid if its results can be generalized to the population outside the sample.
TYPES OF ASSESSMENT
Placement Assessment: Assessment conducted to know whether pupils possess the prerequisite skills needed to
succeed in a unit; it checks the earlier knowledge of the student. It is the test conducted to place a student in the
appropriate class or level. It is done before instruction starts, when a child is admitted to school, to place him in a class or grade.
Diagnostic Assessment: A type of assessment in which learning difficulties are diagnosed, much as a doctor
diagnoses an illness. It is used to identify the problems of students. It is done before the start of instruction to check
persistent learning difficulties of students, so the curriculum can be adapted or adjusted to their unique needs.
Formative Assessment: Assessment that is conducted during the teaching-learning process. Formative assessment is
conducted during instruction to monitor pupils' learning progress, and it provides ongoing feedback to pupils and
teachers.
Benchmark Assessment: Assessment that is conducted during instruction, after the completion of a unit or chapter.
Summative Assessment / Evaluation: Assessment conducted at the end of a teaching-learning session. It is done after
instruction, at the end of the year, on completion of the course of study; the final examination is conducted and grades are assigned.
It certifies judgment. Students are promoted to the next class on the basis of summative assessment.
ASSESSMENT AGENCIES
BISE: Board of Intermediate and Secondary Education. BISE Lahore, established in 1954, started working
under PU. Now 9 BISEs are working in Punjab, one in each division. DANISH SCHOOLS work under BISE Lahore.
PEC: Punjab Examination Commission was established on 16th January 2006.
It evaluates the students of grades 5 and 8 in Punjab.
Before PEC, the Director Public Instructions (DPI) was responsible for evaluation.
NEAS: National Education Assessment System, established in 2003.
PEAS: Provincial Education Assessment System.
ASER: Annual Status of Education Report, established in 2008. It provides reliable estimates of the schooling
status of children aged 3-16 years residing in all rural and a few urban districts of Pakistan.
Item Analysis
Item analysis is done to analyze the items of a test, to check whether they are
fulfilling the objectives of the test. It analyzes:
the appropriate level of difficulty,
discrimination power,
and effectiveness of distractors.
Discrimination Power:
High achievers and low achievers are sorted out by Discrimination power.
Formula: D = (NH - NL) / n
D = discrimination power
NH = no. of high achievers who answered the item correctly
NL = no. of low achievers who answered the item correctly
n = total no. of students
❖ The discrimination power of an item is acceptable when its value ranges from 0.30 to 1.
❖ A test item discriminates 100% when its value is 1.
❖ A test item cannot discriminate well if its value is less than 0.30.
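The formula above can be sketched as a small helper. This is a minimal illustration, assuming (as the notes define) that NH and NL count how many high and low achievers answered the item correctly, and n is the total number of students; the function names are hypothetical.

```python
def discrimination_power(nh, nl, n):
    """Compute D = (NH - NL) / n for one test item.

    nh: high achievers who answered the item correctly (assumption)
    nl: low achievers who answered the item correctly (assumption)
    n:  total number of students, as the notes define it
    """
    return (nh - nl) / n

def is_acceptable(d):
    """An item is acceptable when D falls in the 0.30-1 range."""
    return 0.30 <= d <= 1

# Example: 20 of the high group and 5 of the low group answer
# the item correctly, out of 50 students in total.
d = discrimination_power(20, 5, 50)
print(d, is_acceptable(d))  # 0.3 True
```

Note that some textbooks divide by the size of one group rather than the total; here we follow the definition given in these notes.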
Effectiveness of Distractor
Good Distractor:
A good distractor is one which attracts low achievers more than high achievers. It is also known as a foil or trap
that attracts students with a misconception or an error in thinking.
Bad Distractor:
A distractor is bad if it attracts high achievers more than low achievers,
does not attract any student at all, or attracts low and high achievers equally.
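The good/bad rules above can be written as a tiny classifier. This is a sketch only; the function name and the pick counts are hypothetical.

```python
def classify_distractor(high_picks, low_picks):
    """Label a distractor by how the two groups choose it.

    high_picks / low_picks: how many high / low achievers chose
    this wrong option (hypothetical counts).
    """
    if high_picks == 0 and low_picks == 0:
        return "bad"   # attracts no student at all
    if low_picks > high_picks:
        return "good"  # traps more low achievers than high achievers
    return "bad"       # attracts high achievers more, or both equally

print(classify_distractor(2, 9))  # good
print(classify_distractor(6, 6))  # bad
```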
Portfolio: The collection of a student's work products to evaluate the performance of the student. It is a collection of
student work; a compilation of the skills and learning activities of a student.
Working portfolio: It is a collection of a student's ongoing work that shows improvement over time.
Showcase portfolio: It is the collection of a student's best work.
ASSESSMENT OF / AS LEARNING
Assessment of Learning-(Summative):
Assessment of learning is used to evaluate students' achievement at the END of the course.
ASSESSMENT AS LEARNING:
The use of ongoing self-assessment by students themselves in order to monitor their own learning progress.
Students reflect on and monitor their own learning progress. It is also called meta-learning.
Likert Scale:
A scale used for attitude measurement.
The respondent is given a question or statement and shows his/her level of agreement or disagreement.
Developed by Rensis Likert.
Often a 5-point scale (some psychometricians also use 7- or 9-point scales).
Example: "Using social media is essential today."
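A minimal sketch of how responses to a 5-point Likert item might be coded and averaged; the labels, numeric codes, and sample responses are illustrative assumptions, not part of the notes.

```python
# Numeric codes for a 5-point agreement scale (a common convention).
LIKERT_5 = {
    "strongly disagree": 1,
    "disagree": 2,
    "neutral": 3,
    "agree": 4,
    "strongly agree": 5,
}

def mean_likert(responses):
    """Average the numeric codes of the given responses."""
    codes = [LIKERT_5[r.lower()] for r in responses]
    return sum(codes) / len(codes)

# Statement: "Using social media is essential today."
sample = ["agree", "strongly agree", "neutral", "agree"]
print(mean_likert(sample))  # (4 + 5 + 3 + 4) / 4 = 4.0
```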