Lessons 1+2

TESTING AND ASSESSMENT IN
LANGUAGE TEACHING
Introduction to
the course
1. Evaluate a test using basic principles of testing and assessment.
2. Analyze and apply the test development and test administration procedures.
3. Develop English tests for high school students.
4. Assess/Rate speaking and writing performances of high school students.
5. Work with test results.

Compulsory Materials
[1] Bachman, L.F. & Palmer, A.S., Language Testing in Practice: Designing and Developing Useful Language Tests, New York: Oxford University Press, 2000.
[2] Brown, H.D., Language Assessment: Principles and Classroom Practices, Longman, 2003.
Reference books
[3] Bachman, L.F., Fundamental Considerations in Language Testing, New York: Oxford University Press, 2001.
[4] Airasian, P.W. & Russell, M., Classroom Assessment: Concepts and Applications (6th ed.), Boston: McGraw-Hill Higher Education, 2008.
Now, let’s start!
1 The definition of a test
“The test is a method of measuring
a person’s ability, knowledge, or
performance in a given domain.”
Brown (2002)
Sometimes we confuse the
terms test and assessment…
Test
- Administratively prepared
- Students’ responses are measured and evaluated
- Employs many tasks
Assessment
- Ongoing process
- Wider domain
- Assessed by self, teacher, or peers
- Incidental or intended
2 Basic principles of test and assessment
- Reliability
- Validity
- Practicality
- Washback
- Usefulness
- Transparency
- Security
RELIABILITY
Reliability
/rɪˌlaɪ.əˈbɪl.ə.ti/
the quality of being able to
be trusted or believed because
of working or behaving well
The term reliability is used
to refer to the consistency
of test scores.
According to Brown (2010), a reliable
test:
- Is consistent
- Gives clear directions for
scoring/evaluation
- Has uniform rubrics for scoring
- Contains items/tasks that are
unambiguous to the test-takers.
Reliability
- The degree or extent to which an
assessment tool produces stable and
consistent results.
- Consistency, stability, dependability
and accuracy of the test results.
(McMillan, 2001)
Test-Retest Reliability
- The same test is re-administered to
the same people.
- The correlation between the two sets
of scores is expected to be high.
- Practice and memory effects may
influence the correlation value.
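As a rough illustration of the correlation mentioned above, the sketch below computes a Pearson correlation between two sittings of the same test. The slides do not say which correlation coefficient to use, so Pearson's r is an assumption here, and the function name `pearson_r` and the score lists are hypothetical.

```python
import math


def pearson_r(first, second):
    """Pearson correlation between two equal-length lists of scores."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    n = len(first)
    mean_a = sum(first) / n
    mean_b = sum(second) / n
    # Sum of cross-products and sums of squared deviations from the means.
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(first, second))
    var_a = sum((a - mean_a) ** 2 for a in first)
    var_b = sum((b - mean_b) ** 2 for b in second)
    if var_a == 0 or var_b == 0:
        raise ValueError("scores must vary for the correlation to be defined")
    return cov / math.sqrt(var_a * var_b)


# Hypothetical scores of five students on two sittings of the same test.
first_sitting = [12, 15, 9, 18, 14]
second_sitting = [13, 16, 10, 17, 15]
retest_r = pearson_r(first_sitting, second_sitting)
print(f"test-retest correlation: {retest_r:.2f}")
```

A value close to 1 suggests that students keep roughly the same rank order across the two sittings; it does not by itself rule out practice or memory effects.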
Inter-Rater Reliability
- Two or more judges or raters are
involved in grading.
- A score is a more reliable and
accurate measure if two or more raters
agree on it or assign similar
results.
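One common way to put a number on agreement between two raters is Cohen's kappa, which corrects raw agreement for the agreement expected by chance. The slides do not name a statistic, so the choice of kappa is an assumption, and the function name `cohens_kappa` and the band scores below are hypothetical.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who scored the same performances."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty, equal-length lists of scores")
    n = len(rater_a)
    # Proportion of performances on which the raters gave the same band.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's band frequencies.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[band] * counts_b[band] for band in counts_a) / (n * n)
    if expected == 1:
        raise ValueError("kappa is undefined when both raters use one identical band")
    return (observed - expected) / (1 - expected)


# Hypothetical band scores (1-5) given by two raters to eight speaking performances.
rater_a = [3, 4, 4, 2, 5, 3, 4, 2]
rater_b = [3, 4, 3, 2, 5, 3, 4, 3]
kappa = cohens_kappa(rater_a, rater_b)
print(f"kappa: {kappa:.2f}")
```

Kappa treats bands as unordered categories, so a one-band disagreement counts the same as a three-band disagreement; a weighted variant may suit rating scales better.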
Intra-Rater Reliability
- The consistency of grading by a single
rater.
- When a rater grades tests at different
times, he/she may become inconsistent
in grading for various reasons.
Test Administration Reliability
- This involves the conditions in which
the test is administered.
- Unreliability may occur due to outside
interference including noise, variations
in photocopying, light and sound in
different parts of the room.
Factors affecting test reliability
- Test factor
- Teacher and student factor
- Environment factor
- Test administration factor
- Marking factor
1. Test factor
- Longer tests produce higher reliability.
- Because scores depend partly on
chance and guessing, they will be
more accurate if the test is longer.
- An objective test has higher
consistency compared to a subjective
test.
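The claim that longer tests are more reliable is often quantified with the Spearman-Brown prophecy formula, which predicts reliability after a test is lengthened by a factor k: r_new = k·r / (1 + (k − 1)·r). The slides do not mention this formula, so it is brought in here as an assumption; it also assumes the added items are comparable to the existing ones. The function name `spearman_brown` and the figures are hypothetical.

```python
def spearman_brown(reliability, length_factor):
    """Predicted reliability when a test is lengthened by length_factor."""
    if not 0 <= reliability <= 1:
        raise ValueError("reliability must be between 0 and 1")
    if length_factor <= 0:
        raise ValueError("length_factor must be positive")
    return length_factor * reliability / (1 + (length_factor - 1) * reliability)


# Hypothetical case: a test with reliability 0.60 is doubled in length.
doubled = spearman_brown(0.60, 2)
print(f"predicted reliability after doubling: {doubled:.2f}")
```

The gain flattens out as the test grows, so lengthening a test has to be weighed against practicality.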
2. Teacher and student factor
- In most cases, teachers construct
and administer tests for their students.
- The teacher-student relationship can
affect the consistency of test results.
- The teacher’s encouragement, a positive mental
and physical condition, and familiarity with the test
formats could lead to higher consistency.
3. Environment Factor
- An examination environment certainly
influences test-takers and their scores.
- A favorable environment will improve
the reliability of the test.
4. Test administration factor
- Students’ performance is dependent
on the way tests are administered
(instructions, time allowance, or careful
monitoring of tests).
5. Marking factor
- Human judges/raters have many
opportunities to introduce error in
scoring.
- Different raters may award different
marks for the same answer.
VALIDITY
The term VALIDITY is used
to refer to whether the test
is actually measuring what
it claims to measure
(Arshad, 2004)
Test scores reflect the achievement
of learning outcomes and the test-taker’s
ability.
The test is valid when it reflects what
the learners can do in a language.
VALIDITY
- Face validity
- Content validity
- Construct validity
1. Face Validity
- The test looks like a test even at first
impression.
- Mousavi (2009) refers to face validity
as the degree to which a test looks
right, and appears to measure the
knowledge and abilities it claims to
measure.
2. Content Validity
- Assessment of course content
with clear reference to goals and
outcomes
- Use of formats and tasks familiar
to students
3. Construct validity
- Refers to whether the underlying
theoretical constructs that the test
measures are themselves valid
- Proficiency, communicative
competence, and fluency are
examples of linguistic constructs;
self-confidence and motivation are
psychological constructs.
- Grammar and Vocabulary – an essay or
multiple-choice?
- Reading – reading aloud or texts and
comprehension questions?
- Listening – a lecture or a series of dialogues?
- Writing ability – a dictation or a cover letter?
- Speaking – reading aloud tasks or face-to-face
interviews?
- Does the test assess the skill (construct) that you
focus on in your class?
- Does the test cover the content that you have been
teaching?
- Does the test look as if it is testing what it is
supposed to be testing?
- Is it challenging / formal / adequate enough in the
eyes of the test-takers?
Put the following words into the correct column:
Construct, Inter-rater, Face, Content, Environment factors, Intra-rater, Consistency, Curriculum, Outcomes, Test results

Reliability            Validity

Answers:
Reliability            Validity
Inter-rater            Construct
Intra-rater            Face
Environment factors    Content
Consistency            Curriculum
Test results           Outcomes
PRACTICALITY
“The logistical, down-to-earth administrative
issues involved in designing, administering, and
scoring.”
These include “costs and amount of time it
takes to construct and to administer, the ease
of scoring, and ease of reporting/interpreting
results” (Mousavi, 2009).
A PRACTICAL TEST
- Stays within budgetary limits.
- Can be completed by test-takers within the
appropriate time constraints.
- Has clear directions for administration.
- Appropriately utilises the available human
resources.
- Does not exceed available material resources.
- Considers the time and effort involved in both
designing and scoring.
IMPRACTICAL!
• …a test which is prohibitively expensive
• …a test of language proficiency that would take students 10 hours to complete
• …a speaking test that requires an individual 10-minute one-to-one talk for a group
of 50 test-takers and only one scorer
• …a test that takes students a few minutes to complete and several hours for
the examiner to prepare and/or correct
• …a test which can be scored only by computer in a location without easy access
to computers and an internet connection
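The speaking-test example above can be checked with simple arithmetic. The sketch below estimates total examining time; it assumes interviews run back to back with no breaks and that the workload is split evenly among scorers. The function name `scoring_hours` is hypothetical.

```python
def scoring_hours(test_takers, minutes_each, scorers=1):
    """Hours each scorer needs if interviews are shared out evenly."""
    if scorers < 1:
        raise ValueError("at least one scorer is required")
    return test_takers * minutes_each / scorers / 60


# The case from the list above: 50 test-takers, 10 minutes each, one scorer.
one_scorer = scoring_hours(50, 10)
print(f"one scorer: {one_scorer:.1f} hours")

# Hypothetical alternative: the same group shared among five scorers.
five_scorers = scoring_hours(50, 10, scorers=5)
print(f"five scorers: {five_scorers:.1f} hours each")
```

With one scorer the interviews alone take more than a full working day, which is why the list treats this design as impractical.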
AUTHENTICITY
“The degree of correspondence of the
characteristics of a given language test task to
the features of a target language task”
(Bachman & Palmer, 1996)
Language learners are more motivated to
perform when they are faced with tasks that
reflect real-world situations and contexts.
AN AUTHENTIC TEST
- Contains language that is as natural as
possible.
- Has items that are contextualized rather than
isolated.
- Includes meaningful, relevant and interesting
topics.
- Provides some thematic organization to items,
such as through a story line or episode.
- Offers tasks that replicate real-world tasks.
Let’s think about a listening test!
How can the test be made more authentic?
- Different accents
- Hesitations and pauses
- Background noises
- Monologue – Dialogue
- Interesting topics
- Interruptions
WASHBACK
WASHBACK EFFECT
- “WASHBACK” or “BACKWASH” (Hughes,
2003) refers to the impact that tests have on
teaching and learning.
- It can have a positive or negative impact on the
teaching and learning process.
POSITIVE WASHBACK
On learners
• Provide a qualification
• Provide motivation
• Serve as a revision tool
• Provide feedback
On teachers
• Identify struggling learners in a class
• Diagnose common learner errors to modify instruction
On teaching institutions and schools
• Increase accountability of school
• Identify weaknesses of a syllabus
• Encourage a balanced curriculum
POSSIBLE NEGATIVE WASHBACK
- Preparation for a test may take up teaching
time.
- A test can be used as a way for teachers to
exert their authority.
- Learners only practice the things that they
know will be in the test, and ignore everything
else.
- Learners feel stressed or nervous about the
test conditions, the results and their image.
- Learners feel demotivated either by the prospect of
revising for the test or at the thought of getting low
marks.
- The way the test is marked may penalize errors
rather than give credit for what the learner has done
correctly.
- Test results may cause a feeling of division within
the class.
- Improving test results can seem more important
than learning – this often means that the range of
skills taught becomes narrower.
TRANSPARENCY
- Availability of information about
assessment
- Information should include:
  - what students have to do to succeed,
  outcomes
  - expected content and format
  - time allocated for task, deadlines
  - weighting of items or sections
  - grading criteria
  - useful feedback for improvement
SECURITY
Students:
Cheating, “collaborative” test-taking,
plagiarism or any other kind of intellectual
dishonesty is forbidden.
Staff:
There are clear security guidelines for all
stages of assessment that must be
followed.
There are severe consequences for
breaches of security.
PRACTICE
Handout: In the handout, you will find a description of the Preliminary English Test
(PET, Level B1) for speaking skills, the test procedures and guidelines, the
sample speaking test, and the speaking assessment scale.
Purpose: The test is intended to be used as a speaking test in the national
examination for high school students. During the high school years, the course
book issued by the Ministry of Education and Training is used.
Test administration: In each test location, there are about 500 Grade 12
students whose expected level of proficiency is B1. About 30 examiners
are invited to be raters, and they are given a one-day training course on
the assessment scale.
Two months prior to the test day, information about the test and its format is
available on the website of the Ministry of Education and Training. Information is also
circulated to high schools throughout the country.
