Assessment and Learning
Edited by John Gardner
SAGE Publications
London • Thousand Oaks • New Delhi
1
5
6
12
15
Item Text
Staff draw on good practice from others as a means to further their own
professional development.
Staff read research reports as one source of useful ideas for improving their
practice.
Staff use the web as one source of useful ideas for improving their practice.
Students are consulted about how they learn most effectively.
Staff relate what works in their own practice to research findings.
Staff modify their practice in the light of published research.
Staff carry out joint research/evaluation with one or more colleagues as a way of
improving practice.
Table 2.5 'Building Social Capital' (factor B2) items
Factor B2: Building social capital: learning, working, supporting and talking with each other
(alpha = 0.7476)
Item No
16
19
20
21
22
23
24
Item Text
Staff regularly collaborate to plan their teaching.
If staff have a problem with their teaching they usually turn to colleagues
for help.
Staff suggest ideas or approaches for colleagues to try out in class.
Teachers make collective agreements to test out new ideas.
Teachers discuss openly with colleagues what and how they are learning.
Staff frequently use informal opportunities to discuss how students learn.
Staff offer one another reassurance and support.
variables was accounted for by our selected teacher learning variables. Separate
models were tested for teachers without managerial responsibility and for
middle and senior managers.
Results for teachers without managerial responsibility
As Table 2.8 shows, the teacher learning practices, taken together, accounted for
a rather low proportion of the variance in all three variables of classroom
assessment practices.
Table 2.6 'Critical and Responsive Learning' (factor B3) items
Factor B3: Critical and responsive learning: through reflection, experimentation
and by responding to feedback (alpha = 0.7573)
Item No
,
,
10
13
14
Item Text
Staff are able to see how practices that work in one context might be adapted to
other contexts.
Staff reflect on their practice as a way of identifying professional learning needs.
Staff experiment with their practice as a conscious strategy for improving
classroom teaching and learning.
Staff modify their practice in the light of feedback from their students.
Staff modify their practice in the light of evidence from self-evaluations of their
classroom practice.
Staff modify their practice in the light of evidence from evaluations of their
classroom practice by managers or other colleagues.
Table 2.7 'Valuing Learning' (factor B4) items
Factor B4: Valuing learning: believing that all students are capable of learning; the provision by
teachers of an affective environment in which students can take risks with their learning, and
teachers contributing to the learning orientation of their schools by identifying themselves as
well as their students as learners (alpha = 0.6252)
1
25
26
27
Staff as well as students learn in this school.
Staff believe that all students are capable of learning.
Students in this school enjoy learning.
Pupil success is regularly celebrated.
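The alpha values quoted in the factor captions above (0.7476, 0.7573, 0.6252) are internal-consistency reliability coefficients. For readers unfamiliar with the statistic, the following is a minimal sketch of how such a figure (Cronbach's alpha) is computed; the function and the ratings data are invented for illustration and are not the project's questionnaire responses.

```python
# Illustrative sketch only: how an internal-consistency coefficient like
# the alpha values in the factor captions (e.g. alpha = 0.7476) is
# calculated. The formula is Cronbach's alpha; the ratings below are
# invented data, not Learning How to Learn questionnaire responses.

def variance(xs):
    """Population variance of a list of numbers."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def cronbach_alpha(item_scores):
    """item_scores: one list per questionnaire item, each holding the
    ratings given by the same respondents in the same order."""
    k = len(item_scores)
    sum_item_vars = sum(variance(item) for item in item_scores)
    totals = [sum(ratings) for ratings in zip(*item_scores)]  # per respondent
    return (k / (k - 1)) * (1 - sum_item_vars / variance(totals))

# Three hypothetical five-point items answered by five respondents
items = [
    [3, 4, 2, 5, 3],
    [2, 4, 2, 5, 4],
    [3, 5, 1, 4, 3],
]
print(round(cronbach_alpha(items), 2))
```

Values in the 0.6-0.8 range, as reported for factors B2-B4, are conventionally read as acceptable internal consistency for research scales of this kind.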
Table 2.8 The proportion of variance in each of the three dimensions of classroom
assessment accounted for by teacher learning practices (teachers' responses)
Variables                        Variance (%)
Making learning explicit         13.0
Promoting learning autonomy      ?.0
Performance orientation          0.1
However, when we compared the strength of association between each of the
teacher learning variables and each of the classroom assessment variables we
found that 'inquiry' (teachers' uses of and responses to evidence, and their col-
laboration with colleagues in joint research and evaluation activity) was most
Professional Learning as a Condition for Assessment for Learning
strongly associated with 'making learning explicit' and the 'promotion of learn-
ing autonomy'. 'Building social capital' and 'critical and responsive learning'
had only weak and non-significant associations with all three classroom assess-
ment variables. None of the teacher learning variables was significantly related
to the 'performance orientation' variable.
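The 'proportion of variance accounted for' figures reported in Tables 2.8 and 2.9 are R-squared statistics from regression models. As an illustration only (the scores and variable names below are invented, and the project's actual models entered several teacher learning predictors jointly rather than one), a simple least-squares sketch shows how such a proportion is derived:

```python
# Illustrative sketch only: the 'proportion of variance accounted for'
# figures in Tables 2.8 and 2.9 are R-squared values from regression
# models. This single-predictor version shows the idea; the scores below
# are invented, and the project's models combined several predictors.

def r_squared(x, y):
    """R^2 for an ordinary least-squares fit of y on one predictor x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1 - ss_res / ss_tot  # share of variance in y explained by x

# Hypothetical factor scores: 'inquiry' vs. 'making learning explicit'
inquiry = [2.1, 3.4, 2.8, 4.0, 3.1, 2.5]
making_explicit = [2.0, 3.0, 2.6, 3.9, 3.3, 2.2]
print(round(100 * r_squared(inquiry, making_explicit), 1))  # as a percentage
```

The percentages in the tables play exactly this role: the closer a figure is to zero, as for 'performance orientation', the less of that assessment dimension the teacher learning variables explain.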
Results for middle and senior managers
As Table 2.9 shows, when we analysed the school managers' responses, the
three teacher learning variables accounted for much more of the variance in
each of the three classroom assessment practices than was the case with data
from teachers without managerial responsibility.
Table 2.9 The proportion of variance in each of the three dimensions of classroom
assessment accounted for by teacher learning practices (managers' responses)
Dependent variables              Variance (%)
Making learning explicit         36
Promoting learning autonomy      29
Performance orientation          7
Independent variables: inquiry, building social capital, critical and responsive learning
All teacher learning variables were significantly associated with 'making learning
explicit'. 'Critical and responsive learning' and 'inquiry' had the strongest associ-
ations. 'Inquiry' was the only teacher learning variable that was significantly
related to the 'promotion of learning autonomy' and here the relationship was
strong. 'Building social capital' and 'critical and responsive learning' had only
weak and non-significant relationships with 'performance orientation'.
Discussion
A number of themes of considerable interest arise from this analysis. First, there
are the differences between the responses of teachers and managers. There is a
stronger relationship between managers' perceptions of teacher learning prac-
tices and classroom assessment practices in their schools than there is between
'ordinary' teachers' perceptions of learning by teachers in their school and their
own classroom practices. On the surface this might be expected because man-
agers with some responsibility for the work of others are likely to perceive these
aspects more clearly, or may even want to 'talk them up'. However, we have
been cautious about coming to any such conclusion because teachers and man-
agers were asked different questions about classroom assessment (related to
teachers' own practices but managers' perceptions of others' practices), which
render direct comparisons problematic. Also we know that a few managers,
including one head teacher, either through choice or misunderstanding, actu-
ally completed the teachers' version of the questionnaire.
More interesting are the areas where results for both teachers and managers
are similar, although the strength of association differs. Four points are worthy
of note. First, the three teacher learning variables account only poorly for the
variance in 'performance orientation' in classroom assessment. This suggests a
weak association and might reasonably lead to a conclusion that any perform-
ance orientation in classroom assessment derives very little from teachers'
learning practices, and probably owes more to structural constraints in the
environment such as curriculum prescriptions and performance management.
Second, and in contrast, teacher learning variables do seem to account for
variance in the two classroom assessment variables most closely allied with
assessment for learning: 'making learning explicit' and 'promoting learning
autonomy'. The fact that the strongest associations are with 'making learning
explicit' is not surprising. Although this 'baseline' questionnaire was
administered in 2002, before project development work was properly begun in
Learning How to Learn schools, there was already considerable activity in
England under the banner of assessment for learning: some stimulated by
national initiatives such as the DfES Key Stage 3 and Primary Strategies, and
some promoted by researchers, such as the King's College, London assessment
group (see Chapter 1), or by consultants (Sutton, 1995; Clarke, 1998, 2001). The
national strategies put particular emphasis on making learning objectives and
success criteria explicit, as does Clarke in her advice to primary schools.
Third, the relationship of 'inquiry' to 'promoting learning autonomy' is par-
ticularly interesting. The strength of the relationship is not as strong for teach-
ers as it is for managers but the clear association suggests that teachers' uses of,
and responses to, different sources of evidence (from more formal research and
their own inquiries) together with their collaboration with colleagues in joint
research and evaluation activity, are important for the development of assess-
ment practices that lead to autonomous, independent and active learning
among their students. This insight may be a key finding because other analyses
of Section A data (to be reported in detail in other publications) suggest that
'promoting learning autonomy' (for example, giving students opportunities to
decide their own learning objectives and to peer and self-assess) was the dimen-
sion of classroom assessment practice that teachers were having the greatest
difficulty implementing, despite believing it to be important or crucial.
Finally, and perhaps surprisingly, 'building social capital' does not appear to
be strongly related to change in classroom assessment practice, at least not in a
straightforward, linear kind of way. This implies that teacher learning practices
focused on building social capital through, for example, team building,
networking, building trust and mutual support, may be of limited value
without a clear focus on specific changes to be brought about in classrooms. In
other words, processes need content (a point made in Chapter 1). So, we may
need to be cautious about allocating time, energy and resources to building
social capital if it lacks an explicit classroom focus. Indeed, teachers might
develop and use social capital as a 'polite' way of protecting their classroom
privacy. By agreeing to collaborate with colleagues in 'safe' aspects of their
work, such as giving and receiving moral support, exchanging resources and
the like, they can effectively keep colleagues at 'arm's length' from the class-
room issues that really need attention. In particular, classroom-based modes of
collaboration can be avoided.
Conclusion: the centrality of learning by teachers for the
development of assessment for learning
These interpretations of results from the Learning How to Learn Project carry
three strong messages for teachers' professional learning if it is intended to
support the development of assessment for learning. First, classroom assess-
ment for learning practices are underpinned most strongly by teachers' learn-
ing in the contexts of their classrooms with a clear focus on change in teacher
and learner roles and practices and on interactions between assessment, cur-
riculum and pedagogy. Insights from work in other projects, as described in
Chapters 1 and 5, are thus corroborated by evidence from the Learning How to
Learn project's survey of 1,000 teachers. This implies that programmes of pro-
fessional development, whether school-based or course-based, should be
focused on classrooms and classroom practice.
The growth of interest in 'research lessons' (Stigler and Hiebert, 1999) offers
one possible approach. The idea derives from Japan where teams of teachers
identify an aspect of their teaching which is likely to have an impact on an area
of need in students' learning. They spend between one and three years working
in groups, planning interventions which may prove effective, closely observing
these 'research lessons' and deconstructing and writing up what they learn -
from failures as well as successes. At the end of a cycle of studies they may
teach a 'public research lesson' before an audience of peers from local schools
and colleges in order to share the practice and widen the critique. These studies
are widely read by Japanese teachers who contribute more than 50 per cent of
the educational research literature produced in the country (Fernandez, 2002).
Lesson study has been developed in a number of locations in the USA over the
past seven years. It is also used in the National College for School Leadership's
Networked Learning Communities projects in England and is the particular
focus of a research training fellowship linked to the Learning How to Learn
Project (Dudley, 2004). There are other possible approaches, of course, and the
Learning How to Learn Project will report some of these in other publications.
Second, as the above account of research lessons demonstrates, both indi-
vidual and social processes of teacher learning are to be valued. Our survey
findings indicate that both are important conditions for the promotion of
assessment for learning in classrooms. This justifies the approach taken in the
KMOFAP and the Learning How to Learn Project in providing, or encouraging,
opportunities for teachers to learn together in in-school teams, departmental
teams and across-school groups.
Third, if 'promoting learning autonomy' is the ultimate goal but the greatest
challenge, as our evidence suggests it is, and if 'inquiry' approaches to teacher
learning are productive in this respect, then more emphasis needs to be placed
on providing opportunities and encouragement to teachers to engage with and
use research relevant to their classroom interests, and recognizing the value of,
and supporting, teachers' collaborative inquiries into their own practices. The
first was a strength of the way that the findings of the 1998 Black and Wiliam
review of classroom assessment research was disseminated to teachers and
used as a basis for in-service work (see Chapter 1). The second reflects
Stenhouse's (1975) belief that 'it is teachers who, in the end, will change the
world of the school by understanding it' and that a 'research tradition which is
accessible to teachers and which feeds teaching must be created if education is
to be significantly improved' (p. 208). He argued for teacher research on the
grounds that, 'It is not enough that teachers' work should be studied, they
need to study it themselves' (p. 208). In the thirty years since he wrote these
words, many forms of teacher research and inquiry have flourished, some
more focused on student learning than others. Our research suggests that
classroom-based teacher research and inquiry is not only an important strand
of teachers' continuing learning, as Stenhouse argued, but also an important
factor in helping students develop independence and autonomy in their
learning. The explanation for this might be quite simple, yet profound. If
teachers are prepared and committed to engage in the risky business of
problematizing their own practice, seeking evidence to evaluate in order to
judge where change is needed, and then to act on their decisions, they are thus
engaging in assessment for learning with respect to their own professional
learning. Helping students to do the same with respect to their learning
becomes less challenging because teachers are familiar with the principles and
processes through inquiry into their own practices. In other words, they are
well on the way to conceptualizing, developing and valuing expanded roles
for themselves and their students in teaching and learning.
Assessment for learning and inquiry-based learning by
teachers as parallel processes
In a presentation on the theme of 'Conditions for Lifelong Learning', to an audi-
ence of policy makers from the UK Department for Education and Skills and
related agencies in July 2003, we argued that teachers and schools needed to
develop the processes and practices of learning how to learn if they are to create
the conditions for students to learn and to learn how to learn. We saw assess-
ment for learning at the heart of this. For teachers this implied that they need
to: value learning and engage with innovation; draw on a wide range of evi-
dence, for example peer observation, student consultation, research results and
web resources; reflect critically on and modify their practice; and engage in
both individual and collective learning, in an atmosphere of confidence that
they can help students improve. We made reference to the Assessment Reform
Group's (ARG, 2002a) definition of assessment for learning and argued that:
'Whether learners are students, teachers or schools, learning how to learn is
achieved when they make sense of where they are in their learning, decide
where they need to go, and how best to get there'. We hypothesized that the
processes are parallel for both students' learning and teachers' learning. The
evidence of analysis of responses to the Learning How to Learn Project's teacher
questionnaire suggests that this hypothesis was well founded.
Implications
In the context of this book, the implications of this study are substantial
because there is still evidence that classroom assessment practices need to be
improved. The Annual Report of Her Majesty's Chief Inspector of Schools for
2003/04 states that teaching and learning could be improved by the better use
of assessment in primary schools. In secondary schools the situation is even
worse: 'The use of assessment in meeting individual students' needs remains a
weakness generally and is unsatisfactory in well over a tenth of schools' (see
OFSTED, 2005). If this situation is to be remedied, our evidence suggests that
proper attention needs to be given to teachers' professional development in
this area. However, current approaches to 'rolling out' the lessons from
effective small-scale research and development may not meet the need. As the
Learning How to Learn Project has discovered, whilst some changes in
practice, such as sharing learning objectives with students, have been achieved
by many teachers there is a danger that these can remain at the level of surface
changes in procedures. Deeper changes, and those that are perhaps even more
fundamental, such as promoting independent and autonomous learning,
remain more difficult. These changes do not happen through the agency of
consultants and the distribution of ring-binders full of material, although
these can have a role. In the end, change can only be embedded if teachers
actively engage with the ideas and principles underpinning the advocated
practices and if the environments in which they work are supporting such
engagement.
Our evidence suggests that the encouragement and support required to
make classrooms the focus and site of individual and collective professional
development are vital. In this way teachers can, in the context of reflective prac-
tice, develop the principles for action that provide frames of reference when
confronted with unexpected circumstances. For this reason, an inquiry-based
approach is also vital, although it has profound implications for school leader-
ship and policy more widely. Not only do teachers need access to relevant
knowledge and skills for drawing on research and developing their own
inquiries, they also need permission to experiment, and occasionally to fail, and
then to learn from these failures. In the current climate of accountability this is
difficult - but not impossible. As one infants' teacher involved in the Learning
How to Learn Project claimed:
The focus on learning how to learn enabled professional dialogue to flourish, pro-
moted collaborative learning opportunities for children and adults and developed a
clearer understanding of some of the elements that contribute to successful
learning. It has been one of the most powerful professional development opportu-
nities in my career and has [enhanced] my teaching and learning.
Likewise, a secondary school head teacher:
Assessment for learning has been a joy. It is intellectually profound, yet eminently
practical and accessible. It has enhanced the learning of us all. I have no doubt
that our children are now better taught than ever before. It has been the best edu-
cational development of my career.
What is particularly interesting about these two statements is that neither
clearly distinguishes between the learning of students and the learning of teach-
ers. The two are closely associated. Moreover, the classroom and school become
the environment for effective learning by both groups, and assessment for
learning is at the heart of this.
Notes
1 The Learning How to Learn - in Classrooms, Schools and Networks Project
was a four-year development and research project funded from January 2001
to June 2005 by the UK Economic and Social Research Council as part of
Phase II of the Teaching and Learning Research Programme (see
http://www.tlrp.org ). The Project (ref: L139 25 1020) was directed by Mary
James (Institute of Education, University of London) and co-directed by
Robert McCormick (Open University). Other members of the research team
were Patrick Carmichael, Mary-Jane Drummond, John MacBeath, David
Pedder, Richard Procter and Sue Swaffield (University of Cambridge), Paul
Black, Bethan Marshall (King's College, London), Leslie Honour (University
of Reading) and Alison Fox (Open University). Past members of the team
were Geoff Southworth, University of Reading (until March 2002), Colin
Conner and David Frost, University of Cambridge (until April 2003 and April
2004 respectively) and Dylan Wiliam and Joanna Swann, King's College,
London (until August 2003 and January 2005 respectively). Further details
are available at http://www.learntolearn.ac.uk.
2 Forty-three schools were initially recruited to the project from five local edu-
cation authorities (Essex, Hertfordshire, Medway, Oxfordshire, Redbridge)
and one virtual education action zone (Kent and Somerset VEAZ). During the
lifetime of the project five schools withdrew. The criteria used for selection of
schools were:
A willingness by schools to be involved in the project for the duration,
and actively to contribute to project ideas;
Six schools to be chosen from each main LEA (fewer for Redbridge and
the VEAZ) with the proportion of one secondary school to two primary
schools, preferably in cluster groups;
A range of contexts to be represented in the overall sample: urban/rural;
small/large; mono-ethnic/multi-ethnic;
Schools' performance at one or two key stages to have been allocated a 'C'
benchmark grade, in the Office for Standards in Education Performance
and Assessment (PANDA) Report at the beginning of the project, that is,
based on their results in 2000. This is a crude measure of 'averagely' per-
forming schools. Not all schools in Redbridge, which were added to the
sample in response to a special request from the LEA, conformed to this
criterion;
Schools to be within a reasonable distance from the university
bases of researchers.
3 The dual-scale format adopted for all three sections of this questionnaire was
shaped by assumptions similar to those that informed the design of the
'teacher questionnaire' used in the Improving School Effectiveness Project
(ISEP) (MacBeath and Mortimore, 2001).
Part II Theory
Chapter 3
Assessment, Teaching and Theories of Learning
Mary James
The discussion of formative assessment practice and implications for teachers'
professional learning, in Chapters 1 and 2, draws attention to the close rela-
tionship between assessment and pedagogy. Indeed, the argument in both
chapters is that effective assessment for learning is central and integral to teach-
ing and learning. This raises some theoretical questions about the ways in
which assessment and learning are conceptualized and how they articulate.
This chapter considers the relationship between assessment practice and the
ways in which the processes and outcomes of learning are understood, which
also has implications for the curriculum and teaching.
Starting from an assumption that there should be a degree of alignment
between assessment and our understanding of learning, a number of different
approaches to the practice of classroom assessment are described and analysed
for the perspectives on learning that underpin them. Three clusters of theories
of learning are identified and their implications for assessment practice are dis-
cussed. The point is made that learning theorists themselves rarely make state-
ments about how learning outcomes within their models should be assessed.
This may account for the lack of an adequate theoretical base for some assess-
ment practices and, conversely, for a lack of development of assessments
aligned with some of the most interesting new learning theory. The chapter con-
cludes with a discussion of whether eclectic or synthetic models of assessments
matched to learning are feasible. The intention here is to treat the concepts
broadly and to provide a basis for more specific consideration of particular
issues in the two chapters following this one, and indeed in the rest of the book.
Thus Chapter 4 focuses on the role of assessment in motivation for learning and
Chapter 5 on the theory of formative assessment.
Alignment between assessment and learning?
The alignment (Biggs, 1996; Biggs and Tang, 1997) of assessment with learning,
teaching and content knowledge is a basis for claims for the validity of assess-
ments (see Chapter 8), but the relationship is not straightforward and cannot be
taken for granted. Indeed there are plenty of examples of assessment practices
that have only tenuous or partial relationships with current understanding of
learning within particular domains. Take, for instance, short answer tests in
science that require a recall of facts but do not begin to tap into the under-
standing of concepts or the investigative processes, which are central to the
'ways of thinking and doing' (Entwistle, 2005) that characterize science as a
subject-discipline. Nor do assessment practices always take sufficient account of
current understanding of the ways in which students learn subject matter, the
difficulties they encounter and how these are overcome.
Historically, much assessment practice was founded on the content and
methods of psychology, the kind of psychology especially that deals with
mental traits and their measurement. Thus classical test theory has primarily
been concerned with differentiating between individuals who possess certain
attributes, or in determining the degree to which they do so. This 'differential-
ist' perspective is still very evident in popular discourse (see, for example,
Phillips, 1996). The focus tends to be on whether some behaviour or quality can
be detected rather than the process by which it was acquired. However, during
the twentieth century our understanding of how learning occurs has developed
apace. It is no longer seen as a private activity dependent largely, if not wholly,
on an individual's possession of innate and usually stable characteristics such
as general intelligence. Interactions between people, and mediating tools such
as language, are now seen to have crucial roles in learning. Thus the assessment
of learning outcomes needs to take more account of the social as well as the
individual processes through which learning occurs. This requires expansion of
perspectives on learning and assessment that take more account of insights
from the disciplines of social psychology, sociology and anthropology.
Similarly, insofar as assessments are intended to assess 'something', that is,
some content, account needs to be taken also of the way the subject domain of
relevance is structured, the key concepts or 'big ideas' associated with it, and
the methods and processes that characterize practice in the field. This is an
important basis for construct validity without which assessments are valueless
(see Chapter 8). This requirement implies some engagement with ideas from
the branch of philosophy that deals with the nature of knowledge, that is, epis-
temology. Thus psychological, social-psychological, sociological and epistemo-
logical dimensions all need to be taken into consideration at some level in the
framing of assessment practice. This is no easy task for assessment experts and
may seem far too great an expectation of classroom teachers; yet one might
expect their training to provide them minimally with pedagogical content
knowledge (Shulman, 1987), a basic understanding of how people learn (learn-
ing theory), and some assessment literacy (Earl et al., 2000) in order to put these
things together. The difficulty, in the climate that has developed around initial
teacher training over the last fifteen years, has been the reduction of teaching to
a fairly atomistic collection of technical competences. This is antithetical to the
synoptic and synthetic approach that teachers may need to acquire in order to
align their teaching and assessment practice with their understanding of learn-
ers, learning and subject knowledge.
Teachers are not helped by the fact that formal external assessments - often
with high stakes attached to them - are often not well aligned either. Whilst
exciting new developments in our understanding of learning unfold, develop-
ments in assessment systems and technology sometimes lag behind. Even some
of the most innovative and novel developments, say, in e-assessment, are
underpinned by models of learning that are limited or, in some cases, out-of-
date. This is understandable too because the development of dependable
assessments - always an important consideration in large-scale testing - is asso-
ciated with an elaborate technology which takes much time and the skills of
measurement experts, many of whom have acquired their expertise in
the very specialist field of psychometrics. This is especially true in the USA
which has a powerful influence on other anglophone countries (see Chapter 10).
In this book we are primarily interested in classroom assessment by teachers,
but research tells us that teachers' assessment practice is inevitably influenced
by external assessment (Harlen, 2004) and teachers often use these assessments
as models for their own, even if they do not use them directly. By using models
of assessment borrowed from elsewhere, teachers may find themselves sub-
scribing, uncritically or unwittingly, to the theories of learning on which they
are based. Some teachers do have clear and internally consistent theories of
learning to underpin their assessment practice, and they are able to articulate
them, as teachers involved in the KMOFAP (Black et al., 2003; see Chapter 1)
and others investigated by Harlen (2000) illustrate. But some disjunction
between 'espoused theory' and 'theory-in-practice' (Schon, 1983) is common, as
is a lack of theoretical coherence. This raises a question about whether it really
matters which conceptions of learning underpin classroom assessment prac-
tices if they are deemed to 'work' well enough, and whether the need for con-
sistency between teaching, learning and assessment might be overrated.
My view is that it does matter because some assessment practices are very
much less effective than others in promoting the kinds of learning outcomes that
are needed by young people today and in the future (see James and Brown, 2005,
for a discussion of questions for assessment arising from different conceptions of
learning outcomes). As Chapter 4 makes clear, the most valuable learning out-
comes in enabling human flourishing - as citizens, as workers, as family and
community members and as fulfilled individuals - are those that allow continued
learning, when and where it is required, in a rapidly changing information- and
technology-rich environment. There is a need, therefore, for teachers to have a
view about the kinds of learning that are most valuable for their students and to
choose and develop approaches to teaching and assessment accordingly.
Helping teachers to become more effective may therefore mean both changes
in their assessment practice and changes in their beliefs about learning. It will
entail development of a critical awareness that change in one will, and should,
inevitably lead to the need for change in the other. So, for instance, implement-
ing assessment for learning/formative assessment may require a teacher to
rethink what effective learning is, and his or her role in bringing it about. Sim-
ilarly, a change in their view of learning is likely to require assessment practice
to be modified. While the focus of this book is mainly on formative assessment,
a good deal is relevant to classroom-based summative assessment by which
teachers summarize what has been achieved at certain times.
Assessment and Learning
Examples of different classroom assessment practices
So, what might classroom assessment practices, aligned with different theories
of learning, look like? Consider the following examples. They are written as car-
icatures of particular approaches in order to provide a basis for subsequent dis-
cussion. In reality, the differences are unlikely to be so stark and teachers often
blend approaches. The focus of the examples is a secondary school teacher who
has just received a new student into her English class. He has recently arrived
in the country and English is an additional language for him although he
speaks it reasonably well. The teacher wants to assess his writing. If she chooses
one of the following approaches, what would it say about her model of knowl-
edge and learning?

Example 1
She sits him in a quiet room by himself and sets him a timed test that consists
of short answer questions asking him, without recourse to reference material or
access to other students, to: identify parts of given sentences (nouns, verbs, arti-
cles, connectives); make a list of adjectives to describe nouns; punctuate sen-
tences; spell a list of ten words in a hierarchy of difficulty; write three sentences
describing a favourite animal or place; write the opening paragraph of a story.
She then marks these using a marking scheme (scoring rubric) which enables
her to identify incorrect answers or weaknesses and compare his performance
with others in the class. As a result she places him in a group with others at a
similar level and then provides this group with additional exercises to practise
performance in areas of weakness. When he shows improvement she is liberal
with her praise and then moves on to the next set of skills to be learnt. Learn-
ing by rote and practice is a dominant feature of this approach.
Example 2
As part of her class teaching, she has been covering work on 'genre' in the pro-
gramme of study. Her current focus is narrative and especially the aspect of
temporal sequencing. The class has been reading Tolkien's The Hobbit and she
used this as a stimulus for their own writing of stories of journeys in search of
treasure. The students discuss the qualities of The Hobbit that make it a good
story, including structure, plot, characterization, use of language and dramatic
tension (all key concepts to be understood). These they note as things to con-
sider in their own writing. Using a writing frame they first plan their stories and
then try out opening paragraphs. They write their stories over a series of
lessons. At draft stages they review their work, individually with the teacher
and through peer discussion using the criteria they have developed. Then they
redraft to improve their work using the feedback they have received.
The teacher monitors this activity throughout and observes that her new
student has a rich experience of travel to draw on, although some of those expe-
riences have been negative and need to be handled sensitively. With English as
Assessment, Teaching and Theories of learning
an additional language he knows more than he can say and needs to be helped
to acquire a wider vocabulary. He also has problems with sequencing which she
thinks could indicate a specific learning difficulty or a different cultural con-
ception of time. She makes a mental note to observe this in future activities. In
the meantime she decides to provide lots of opportunities for him to engage in
classroom talk to help with the first difficulty. To help with the sequencing dif-
ficulty, she suggests that he writes topic sentences on card and cuts them out so
that he can physically move them round his table until he gets them in a satis-
factory order. When his story is complete, the student is asked to record his own
self-evaluation and the teacher makes comments on this and his work which
they discuss together to decide next steps. She does not make much use of
praise or numerical scores or grades because, by making learning explicit, he
understands the nature and substance of the progress he has made.
Example 3
The teacher regards one of her main aims as helping to develop her students as
writers. To this end she constructs her classroom as a writing workshop. The
new student is invited to join this workshop and all participants including the
teacher and any learning support assistants are involved, on this occasion, in
writing stories for children of a different age to themselves. Although their own
writing or that of others including established authors is used to stimulate
thinking and writing, all members in the group, from the most expert to the
most novice, are encouraged to set their own goals and to choose an individual
or group task that will be challenging but achievable with the help of the
knowledge and skill of others in the group. There is no concept of a single spe-
cific goal to be achieved or a performance 'gap' to be closed but rather a
'horizon of possibilities' to be reached. The broad learning goal is for all
members of the group to develop their identities as writers.
By participating together in the activity of writing, each member of the group
has the opportunity to learn from the way others tackle the tasks (rather than
being told how to do things). Different members of the group take on the role
of student and teacher according to the particular challenges of a given activity.
For example, if the teacher wants to write a story for young people she might
need to learn about street language from her students. Thus they become her
teachers. At intervals the members of the group read their work to the rest and
the group appraises it, drawing on the criteria they use to judge what counts as
good work. These criteria may be those shared by writers more generally (as in
Examples 1 and 2 above) but the dynamic of the group might allow new crite-
ria to emerge and be accepted as norms for this group. For example, the intro-
duction of a new student member with a different cultural background could
encourage more experimental work in the group as a whole.
The model is in some respects similar to apprenticeship models, although
these tend to be associated with the preservation and maintenance of guild
knowledge. In other respects it goes beyond this and, like the University of East
Anglia's well-known creative writing course, it seeks to foster creativity. Our new
student begins by being a peripheral participant in this writing workshop,
observing and learning from what others do, but gradually he is brought into the
group and becomes a full participating member. Assessment in this context is
ongoing, continuous and shared by all participants (not just the preserve of the
teacher) but linked very specifically to the particular activity. There is often less
concern to make general statements about competence and more concern to
appraise the quality of the particular performance or artefact, and the process of
producing it. It is considered especially important to evaluate how well the
student has used the resources (tools) available to him, in terms of materials, tech-
nology, people, language and ideas, to solve the particular problems he faced.
The learning is focused on an authentic project so one of the most important
indicators of success will be whether the audience for the stories produced
(other children) responds to them positively. Their response will also provide
key formative feedback to be used by the individual student and the group in
future projects. The role of the English teacher is therefore not as final arbiter of
quality, but as 'more expert other' and 'guide on the side'. Learning outcomes
are best recorded and demonstrated to others through portfolios of work, rather
like those produced by art students, or through the vehicle of the 'masterpiece'
(the 'piece for the master craftsman' designed to be a demonstration of the best
of which an apprentice is capable - also a model for the doctoral thesis).
Each of these examples looks very different as a model of teaching, learning
and assessment, yet each is internally consistent and demonstrates alignment
between: a conception of valued knowledge in the sub-domain (writing in
English); a view of learning as a process and its implications for teaching; and
an appropriate method for assessing the process and product of such learning.
Of course, each of these elements may be contested, as are the theories on which
they are founded. These theories are elaborated in the next section.
The theoretical foundations of learning and
assessment practice
In this section I will consider three views of learning, identifying their manifes-
tation in classroom practice and the role of assessment in each. The three exam-
ples given in the previous section were attempts to portray what each of these
might look like in the real world of schools: to put flesh on theoretical bones. In
reality, however, teachers combine these approaches by, for instance, incorpo-
rating elements of Example 1 into Example 2, or combining elements of
Example 2 with Example 3. Thus boundaries are blurred. Similarly, the per-
spectives on learning considered in this section are broad clusters or families of
theories. Within each cluster there is a spectrum of views that sometimes over-
laps with another cluster, therefore it is difficult to claim exclusivity for each
category. For example, constructivist rhetoric can be found in behaviourist
approaches and the boundary between cognitivist constructivism and social
constructivism is indistinct. This may be helpful because, in practice, teachers
often 'cherry-pick'. Whilst theorists can object that this does violence to the
coherence of their theories and their intellectual roots, I will argue, in the next
section of this chapter, that teachers may have grounds for combining
approaches.
In the US literature (Greeno et al., 1996; Bredo, 1997; Pellegrino et al., 2001)
the three perspectives are often labelled 'behaviorist', 'cognitive' and 'situated',
but within the UK, drawing more on European literature, the labels 'behav-
iourist', 'constructivist' and 'socio-cultural' or 'activist' are sometimes pre-
ferred. These two sets of labels are combined in the descriptions below because
they are roughly equivalent. Each of these perspectives is based on a view of
what learning is and how it takes place; it is in respect to these key questions
that they differ. However - and this is an important point - they do not neces-
sarily claim to have a view about the implications for the construction of learn-
ing environments, for teaching or assessment. This has sometimes created
problems for learning theorists because practitioners and policy makers usually
expect them to have a view on these matters, and if this is not the case then
there are those who will try to fill the gap; some successfully and others less so.
The Learning Working Group set up in 2004 by David Miliband, the then
Minister for School Standards in England, noted this with respect to Gardner's
theory of multiple intelligences:
In the case of multiple intelligences there have undoubtedly been consequences in
education that Gardner did not intend, and soon he began to distance himself from
some of the applications in his name that he witnessed in schools:

'... I learned that an entire state in Australia had adopted an educational program
based in part on MI theory. The more I learned about this program, the less com-
fortable I was. While parts of the program were reasonable and based on research,
much of it was a mishmash of practices, with neither scientific foundation nor clin-
ical warrant. Left-brain and right-brain contrasts, sensory-based learning styles,
neuro-linguistic and MI approaches commingled with dazzling
promiscuity.' (Hargreaves, 2005: 15)
The theory of multiple intelligences is not a theory of learning, strictly speak-
ing, but a theory of mental traits. The point is an important one because the
scholarship of learning theorists is, by definition, focused on learning per se and
not necessarily the implications and application of their ideas for pedagogic
practice. To take this second step requires applications to be equally rigorously
investigated if they are to be warranted (see James et al., 2005). In Gardner's
case this was the reason for his key role in Harvard's Project Zero, which applied
his ideas to practice (Project Zero, 2005).
Bearing these cautions in mind, the following account summarizes, in a
schematic and necessarily brief way, the key ideas associated with each of the
three families of learning theories: first, how learning takes place (the process
and environment for learning) and second, how achievement (the product of
learning) is construed. This is as far as some theories go. However, and very
tentatively, I will also extract some implications for teaching and assessment
that would seem to be consistent with the theory, as illustrated in the examples
in the section above.
Behaviourist theories of learning
Behaviourist theories emerged strongly in the 1930s and are most popularly
associated with the work of Pavlov, Watson, Skinner and Thorndike. Behav-
iourism remained a dominant theoretical perspective into the 1960s and 1970s,
when some of today's teachers were trained, and can still be seen in behaviour
modification programmes as well as everyday practice. Bredo (1997), who is
particularly interesting on the subject of the philosophical and political move-
ments that provide the background to these developments, notes the associa-
tion with the political conservatism that followed the end of World War I and
the growth of positivism, empiricism, technicism and managerialism.
According to these theories the environment for learning is the determining
factor. Learning is viewed as the conditioned response to external stimuli.
Rewards and punishments, or at least the withholding of rewards, are power-
ful ways of forming or extinguishing habits. Praise may be part of such a
reward system. These theories also take the view that complex wholes are
assembled out of parts, so learning can best be accomplished when complex
performances are deconstructed and when each element is practised, reinforced
and subsequently built upon. These theories have no concept of mind, intelli-
gence or ego; there is 'no ghost in the machine'. This is not to say that
such theorists deny the existence of human consciousness but that they do not
feel that this is necessary to explain learning; they are only interested in observ-
able behaviour and claim that this is sufficient. From this perspective, achieve-
ment in learning is often equated with the accumulation of skills and the
memorization of information (facts) in a given domain, demonstrated in the for-
mation of habits that allow speedy performance.
Implications for teaching construe the teacher's role as being to train people
to respond to instruction correctly and rapidly. In curriculum planning, basic
skills are introduced before complex skills. Positive feedback, often in the form
of non-specific praise, and correction of mistakes are used to make the connec-
tions between stimulus and response. As for the environment for learning,
these theories imply that students are best taught in homogeneous groups
according to skill level or individually according to their rate of progress
through a differentiated programme based on a fixed hierarchy of skill acquisi-
tion. Computer-based typing 'tutors' are paradigm examples of this, although
the approach is also evident in vocational qualifications post-16 (for example,
the UK General National Vocational Qualification or GNVQ) where learning
outcomes are broken down into tightly specified components. In the early days
of the national curriculum the disaggregation of attainment levels into atom-
ized statements of attainment reflected this approach. The current widespread
and frequent use of Key Stage 2 practice tests to enhance scores on national tests
in England also rests on behaviourist assumptions about learning.
Implications for assessment are that progress is measured through unseen
timed tests with items taken from progressive levels in a skill hierarchy. Perfor-
mance is usually interpreted as either correct or incorrect and poor performance
is remedied by more practice in the incorrect items, sometimes by deconstruct-
ing them further and going back to even more basic skills. This would be the
only feasible interpretation of formative assessment according to these theories.
Example 1 in the previous section comes close to this characterization.
Cognitive, constructivist theories of learning
As with behaviourist and socio-cultural theories, these derive from a mix of
intellectual traditions including positivism, rationalism and humanism. Noted
theorists include linguists such as Chomsky, computer scientists such as Simon,
and cognitive scientists such as Bruner (who in his later writing moved towards
socio-cultural approaches; see Bruner, 1996). Recently, neuroscientists have
joined these ranks and are offering new perspectives on theories that began
their real growth in the 1960s alongside and often in reaction to behaviourism.
Learning, according to these theories, requires the active engagement of
learners and is determined by what goes on in people's heads. As the reference
to 'cognition' makes clear, these theories are interested in 'mind' as a function
of 'brain'. A particular focus is on how people construct meaning and make
sense of the world through organizing structures, concepts and principles in
schema (mental models). Prior knowledge is regarded as a powerful determi-
nant of a student's capacity to learn new material. There is an emphasis on
'understanding' (and eliminating misunderstanding) and problem solving is
seen as the context for knowledge construction. Processing strategies, such as
deductive reasoning from principles and inductive reasoning from evidence,
are important. Differences between experts and novices are marked by the way
experts organize knowledge in structures that make it more retrievable and
useful. From this perspective, achievement is framed in terms of understanding
in relation to conceptual structures and competence in processing strategies.
The two components of metacognition - self-monitoring and self-regulation -
are also important dimensions of learning.
This perspective on learning has received extensive recent attention for its
implications in relation to teaching and assessment. The two companion volumes
produced by the US National Research Council (Bransford et al., 2000; Pellegrino
et al., 2001) are perhaps the best examples of the genre currently available. With
the growth of neuroscience and brain research, there are no signs that interest will
diminish. The greatest danger seems to be that the desire to find applications will
rush ahead of the science to support them (see the quote from Gardner above).
Cognitivist theories are complex and differentiated and it is difficult to summa-
rize their overall implications. However, in essence, the role of the teacher is to
help 'novices' to acquire 'expert' understanding of conceptual structures and pro-
cessing strategies to solve problems by symbolic manipulation with 'less search'.
In view of the importance of prior learning as an influence on new learning, form-
ative assessment emerges as an important integral element of pedagogic practice
because it is necessary to elicit students' mental models (through, for example,
classroom dialogue, open-ended assignments, thinking-aloud protocols and
concept-mapping) in order to scaffold their understanding of knowledge struc-
tures and to provide them with opportunities to apply concepts and strategies in
novel situations. In this context teaching and assessment are blended towards the
goals of learning, particularly the goal of closing the gap between current under-
standing and the new understandings sought. Example 2 in the previous section
illustrates some aspects of this approach. It is not surprising therefore that many
formulations of formative assessment are associated with this particular theoret-
ical framework (see Chapter 5). Some experimental approaches to summative
assessment are also founded on these theories of learning, for example, the use
of computer software applications for problem-solving and as
a measure of students' learning of knowledge structures (see Pellegrino et al.,
2001; and Bevan, 2004, for a teacher's use of these applications). However, these
assessment technologies are still in their infancy and much formal testing still
relies heavily on behavioural approaches, or on psychometric or 'differentialist'
models. As noted earlier, these are often not underpinned by a theory of learning
as such because they regard individual ability to learn as being related to innate
mental characteristics such as the amount of general intelligence possessed.
Socio-cultural, situated and activity theories of learning
The socio-cultural perspective on learning is often regarded as a new develop-
ment but Bredo (1997) traces its intellectual origins back to the conjunction of
functional psychology and philosophical pragmatism in the work of James,
Dewey and Mead at the beginning of the twentieth century. Associated also
with social democratic and progressivist values, these theoretical approaches
actually stimulated the conservative backlash of behaviourism. Watson, the
principal evangelist of behaviourism, was a student of Dewey at Chicago but
admitted that he never understood him (cited in Bredo, 1997: 17). The interac-
tionist views of the Chicago school, which viewed human development as a
transaction between the individual and the environment (actor and structure),
derived from German and British (Darwin) thought but also had some-
thing in common with the development of cultural psychology in Russia, asso-
ciated with Vygotsky (1978) and derived from the dialectical materialism of
Marx (see Edwards, 2005, for an accessible account). Vygotsky was in fact
writing at the same time as Dewey and there is some evidence that they actu-
ally met (Glassman, 2001).
Vygotsky's thinking has subsequently influenced theorists such as Bruner
(1996) in the USA and Engestrom (1999) in Finland. Bruner has been interested
in the education of children but Engestrom is known principally for reconfig-
uring Russian activity theory as an explanation of how learning happens in the
workplace. Other key theorists who regard individual learning as situated in
the social environment include Rogoff (1990) and Lave and Wenger (Lave and
Wenger, 1991; Wenger, 1998) who draw on anthropological work to character-
ize learning as 'cognitive apprenticeship' in 'communities of practice'. Given
the intellectual roots - deriving as much from social theory, sociology and
anthropology as from psychology - the language and concepts employed in
socio-cultural approaches are often quite different. For example, 'agency', 'com-
munity', 'rules', 'roles', 'division of labour', 'artefacts' and 'contradictions'
feature prominently in the discourse.
According to this perspective, learning occurs in an interaction between the
individual and the social environment. (It is significant that Vygotsky's seminal
work is entitled Mind in Society.) Thinking is conducted through actions that
alter the situation and the situation changes the thinking; the two constantly
interact. Especially important is the notion that learning is a mediated activity
in which cultural artefacts have a crucial role. These can be physical artefacts
such as books and equipment but they can also be symbolic tools such as lan-
guage. Since language, which is central to our capacity to think, is developed in
relationships between people, social relationships are necessary for, and
precede, learning (Vygotsky, 1978). Thus learning is by definition a social and
collaborative activity in which people develop their thinking together. Group
work is not an optional extra. Learning involves participation and what is
learned is not necessarily the property of an individual but shared within the
social group, hence the concept of 'distributed cognition' (Salomon, 1993) in
which the collective knowledge of the group, community or organization is
regarded as greater than the sum of the knowledge of individuals. The out-
comes of learning that are most valued are engaged participation in ways that
others find appropriate, for example, seeing the world in a particular way and
acting accordingly. The development of identities is particularly important; this
involves the learner shaping and being shaped by a community of practice.
Knowledge is not abstracted from context but seen in relation to it, thus it is dif-
ficult to judge an individual as having acquired knowledge in general terms,
that is, extracted from practice.
These theories provide very interesting descriptions and explanations of
learning in communities of practice but the newer ones are not yet well worked
out in terms of their implications for teaching and assessment, particularly in
the case of the latter and especially in school contexts. Example 3 in the section
above is my attempt to extrapolate from the theory. According to my reading,
socio-cultural approaches imply that the teacher needs to create an environ-
ment in which people can be stimulated to think and act in authentic tasks (like
apprentices) beyond their current level of competence (but in what Vygotsky
calls their 'zone of proximal development'). Access to, and use of, an appropri-
ate range of tools are important aspects of such an expansive learning environ-
ment. It is important to find activities that learners can complete with assistance
but not alone so that the 'more expert other', in some cases the teacher but often
a peer, can 'scaffold' their learning (a concept shared with cognitivist
approaches) and remove the scaffold when they can cope on their own. Tasks
need to be collaborative and students must be involved both in the generation
of problems and of solutions. Teachers and students jointly solve problems and
all develop their skill and understanding.
Assessment within this perspective is weakly conceptualized at present.
Since the model draws extensively on anthropological concepts one might
expect forms of ethnographic observation and inference to have a role.
However, Pellegrino et al. (2001: 101) devote only one paragraph to this possi-
bility and make a single reference to 'in vivo' studies of complex situated
problem solving as a model. In the UK, Filer and Pollard (2000) provide an
ethnographic account of the way children build learning identities and the role
assessment plays in this. As they show, learning can be inferred from active par-
ticipation in authentic (real-world) activities or projects. The focus here is on
how well people exercise 'agency' in their use of the resources or tools (intel-
lectual, human, material) available to them to formulate problems, work pro-
ductively, and evaluate their efforts. Learning outcomes can be captured and
reported through various forms of recording, including audio and visual
media. The portfolio has an important role in this although attempts to 'grade'
portfolios according to 'scoring rubrics' seem to be out of alignment with the
socio-cultural perspective. Serafini (2000) makes this point about the state-man-
dated Arizona Student Assessment Program, a portfolio-based system, which
reduced the possibilities for 'assessment as inquiry' largely to 'assessment as
procedure' or even 'assessment as measurement'. Biggs and Tang (1997) argue
that judgement needs to be holistic to be consistent with a socio-cultural or sit-
uated approach. Moreover, if a key goal of learning is to build learning identi-
ties then students' own self-assessments must be central. However, this raises
questions about how to ensure the trustworthiness of such assessments when
large numbers of students are involved and when those who are interested in
the outcomes of such learning cannot participate in the activities that generate
them. Clearly, more work needs to be done to develop approaches to assess-
ment coherent with a socio-cultural perspective on learning.
Possibilities for eclecticism or synthesis
The previous two sections have attempted to show the potential to develop
consistency between assessment practice and beliefs about learning and to
provide a basis for arguing that change in one almost always requires a change
in the other. I have noted, however, that assessment practice is sometimes out
of step with developments in learning theory and can undermine effective
teaching and learning because its washback effect is so powerful, especially in
high stakes settings. It would seem, therefore, that alignment between assess-
ment practice and learning theory is something to strive for. But is this realistic
and how can it be accomplished? Teachers are very interested in 'what works'
for them in classrooms and will sometimes argue that a blend or mix of practi-
cal approaches works best. They will wonder if this is acceptable or whether
they have to be purist about the perspective they adopt. They might ask: Do I
have to choose one approach to the exclusion of others? Can I mix them? Or is
there a model that combines elements of all? These questions are essentially
about purism, eclecticism or synthesis. An analogy derived from chemistry
might help to make these distinctions clear.
The paradigm purist might argue that, like oil and water, these theories do
not mix. A theory, if it is a good theory, allempts to provide as complete an
account as possible of the phenomenil in question. Therefore one good theory
should be sufficient. Howe\'er, if the bounds around a set of phenomena are
drJwn slightly differently, as they can be with respect to teaching ,md learning
beciluse it is a wide and complex field of study, then a number of theories may
overlap. Thus behaviourist approaches seem to work perfectly well when the
focus is on the development of some basic skills or habitual behaviours. In these
contexts, too much thought might actually get in the way of execution. On the
other hand, cognitivist approaches seem to be best when deep understanding
of conceptual structures within subject domains is the desired outcome. Thus,
'fitness for purpose' is an important consideration in making such judgements
and a blending of approaches, like a mixture of salt and bicarbonate of soda as
a substitute for toothpaste, might work well. Such a combination would consti-
tute an eclectic approach. Nonetheless, there are practices that contradict each
other and to employ them both could simply confuse students. The use of non-
specific praise is a case in point. Whilst the use of such praise to reinforce the
desired behaviour may be effective in one context, in another context it can be
counter-productive to the development of understanding (see Chapter 4 for
more discussion).
The nature of the subject domain might also encourage consideration of
whether priority should be given to one approach in preference to another. For
example, subject disciplines such as science and mathematics, with hierarchi-
cally-ordered and generally-accepted conceptual structures, may lend them-
selves to constructivist approaches better than broader 'fields' of study with
contested or multiple criteria of what counts as quality learning (Sadler, 1987),
such as in the expressive arts. It is perhaps no surprise that teaching and assess-
ment applications from a constructivist perspective draw on an overwhelming
majority of examples from science and mathematics (see Bransford et al., 2000,
and Pellegrino et al., 2001). Many elaborations of formative assessment also do
so (Black et al., 2003) although accounts of applications in other subjects are
being developed (Marshall and Hodgen, 2005) with a resulting need to critique
and adapt earlier models (see Chapter 5).
Most importantly, the constructivist approach in both theory and practice
has taken on board the importance of the social dimension of learning: hence
the increasing use of the term 'social constructivism'. Similarly, there is now
evidence that socio-cultural and activity theory frameworks are involved in a
'discursive shift' to recognize the cognitive potential to explain how we learn
new practices (Edwards, 2005). This seems to suggest possibilities for synthesis
whereby a more complete theory can emerge from blending and bonding key
elements of previous theories. The analogy with chemistry would be the cre-
ation of a new compound (for example, a polymer) through the combining of
elements in a chemical reaction. Thus synthesis goes further than eclecticism
towards creating a new alignment. Could it be that one day we will have a more
complete meta-theory which synthesizes the insights from what now appear to
be rather disparate perspectives? Could such a theory permit a range of assess-
ment practices to fit different contexts and purposes whilst still maintaining an
internal consistency and coherence? Chapter 5 goes some way to meeting this
challenge with respect to formative assessment/assessment for learning. Cer-
tainly, the possibility for a more complete and inclusive theory of learning to
guide the practice of teaching and assessment seems a goal worth pursuing. In
the end, however, decisions about which assessment practices are most
appropriate should flow from educational judgements as to preferred learning
outcomes. This forces us to engage with questions of value - what we consider
to be worthwhile, which in a sense is beyond both theory and method.
Chapter 4
The Role of Assessment in Developing
Motivation for Learning
Wynne Harlen
This chapter is about motivation for learning and how assessment for different
purposes, used in various ways, can affect it, both beneficially and detrimen-
tally. It begins with a brief discussion of some key components of motivation for
learning and some of the theories relevant to it. This is followed by reference to
research evidence relating to the impact of summative assessment on motiva-
tion for learning. Despite the great range and variety in the research studies,
their findings converge in providing evidence that some summative assessment
practices, particularly high stakes tests, have a negative impact. At the same
time, the evidence points towards ways of avoiding such impact. Not surpris-
ingly, these actions suggest classroom practices that reflect many of the features
of 'formative assessment', or 'assessment for learning', these two terms being
used interchangeably here to describe assessment when it has the purpose and
effect of enabling students to make progress in their learning. The chapter ends
by drawing together implications for assessment policy at the school, local and
national levels.
The importance of motivation for learning
Motivation has been described as 'the conditions and processes that account for
the arousal, direction, magnitude, and maintenance of effort' (Katzell and
Thompson, 1990: 144), and motivation for learning as the 'engine' that drives
teaching and learning (Stiggins, 2001: 36). It is a construct of what impels learn-
ers to spend the time and effort needed for learning and solving problems
(Bransford et al., 2000). It is clearly central to learning, but is not only needed as
an input into education. It is also an essential outcome of education if students
are to be able to adapt to changing conditions and problems in their lives
beyond formal schooling. The more rapid the change in these conditions, the
more important is strong motivation to learn new skills and to enjoy the chal-
lenge.
Consequently, developing motivation for learning is seen as an important
outcome of education in the twenty-first century and it is essential to be aware
of what aspects of teaching and learning practice act to promote or inhibit it.
Assessment is one of the key factors that affect motivation. Stiggins claims that
teachers can enhance or destroy students' desires to learn more quickly and
more permanently through their use of assessment than through any other
tools at their disposal (2001: 36). In this chapter we look at this association and
take it further to suggest ways of using assessment to enhance motivation for
learning. However, it is first necessary to consider the nature of motivation in
some detail, for it is not a single or simple entity. By recognizing some of its
complexity we can see how assessment interacts with it.
The concept of motivation for learning
In some sense all actions are motivated, as we always have some reason for
doing something, even if it is just to fill an idle hour, or to experience the sense
of achievement in meeting a challenge, or to avoid the consequences of taking
no action.
People read, or even write books, climb mountains or take heroic risks for
these reasons. We may undertake unpleasant and apparently unrewarding
tasks because we know that by doing so we avoid the even more unpleasant
consequences of inaction or, in other circumstances, achieve the satisfaction of
helping others. In tasks that we enjoy, the motivation may be in the enjoyment
of the process or in the product; a person might take a walk because he or she
enjoys the experience or because the destination can only be reached on foot, or
because of the knowledge that the exercise will be good for the health. In such
cases the goals are clear and the achievement, or non-achievement, of them is
made evident in a relatively short time. In relation to learning, however, the
value of making an effort is not always apparent to the student. This underlines
the importance of understanding how learning contexts and conditions, and
particularly the crucial role of assessment, impact on motivation.
Extrinsic and intrinsic motivation
There is a well-established distinction between intrinsic and extrinsic motiva-
tion. When applied to motivation for learning it refers to the difference between
the learning process being a source of satisfaction itself or the potential gains
from learning being the driving force. In the latter case, extrinsic motivation, the
benefit derived may be a result of achieving a certain level of attainment but is
not related to what is learned; learning is a means to an end, not an end in itself.
On the other hand intrinsic motivation describes the situation in which learners
find satisfaction in the skills and knowledge that result and find enjoyment in
learning them. Intrinsic motivation is seen as the ideal, since it is more likely to
lead to a desire to continue learning than learning motivated extrinsically by
rewards such as stars, certificates, prizes or gifts in the absence of such external
incentives. Most teachers have come across students who constantly ask 'Is it
for the examination?' when asked to undertake a new task. This follows years
of being told how important it is to pass the examination rather than to become
aware of the usefulness and interest in what is being learned.
The distinction between intrinsic and extrinsic motivation for learning is a
useful one when one considers the extremes. There are times when effort is made
in undertaking a task because of enjoyment in the process and satisfaction in the
knowledge or skills that result. There are also times when the effort is made
because either there are penalties for not accomplishing a task according to expec-
tations or there are rewards that have little connection with the learning task
(such as a new bicycle for passing an examination). However, there is a large area
between the extremes where it is difficult to characterize a reward as providing
extrinsic or intrinsic motivation. For example, the desire to gain a certificate
which enables a learner to pass on to the next stage of learning could be regarded
as extrinsic motivation, but on the other hand the certificate can be seen as sym-
bolic of the learning achieved. Similarly praise can be a confirmation that one has
achieved something worthwhile or a reason for expending effort.
Furthermore, to regard all extrinsic sources of motivation as 'bad' and all
intrinsic motivation as 'good' ignores the reality of the variety of learning, of
learning contexts and of goals for learning. Hidi (2000) suggests that what may
apply to short-term or simple tasks may not apply to long-term and complex
activities. She contends that 'a combination of intrinsic rewards inherent in
interesting activities and external rewards, particularly those that provide per-
formance feedback, may be required to maintain individuals' engagement
across complex and often difficult - perhaps painful - periods of learning' (Hidi
and Harackiewicz, 2000: 159). Nevertheless, there is strong evidence, reviewed
by Deci et al. (1999), that external rewards undermine intrinsic motivation
across a range of activities, populations and types of reward. Kohn has written
extensively about the destructive impact of external rewards, such as money, on
student learning. From experimental studies comparing rewarded and non-
rewarded students he concludes that those students offered external rewards:
choose easier tasks, are less efficient in using the information available to solve
novel problems, and tend to be answer-orientated and more illogical in their
problem-solving strategies. They seem to work harder and produce more activity,
but the activity is of a lower quality, contains more errors, and is more stereotyped
and less creative than the work of comparable nonrewarded subjects working on
the problems. (1993: 471-2)
Although the quality of this particular research by Kohn has been criticized
(Kellaghan et al., 1996), the findings are supported by similar studies and Kel-
laghan et al. (1996) themselves report evidence that intrinsic motivation is asso-
ciated with levels of engagement in learning that lead to conceptual
understanding and higher level thinking skills. The review by Crooks (1988)
also drew attention to research that indicates the problems associated with
extrinsic motivation in tending to lead to 'shallow' rather than 'deep' learning.
'Intrinsic' and 'extrinsic' are descriptions of overall forms of motivation but
to understand how to promote intrinsic motivation in individual learners it is
necessary to consider some underlying factors. Rewards and punishments are
only one way of influencing motivation and people vary in their response to
them; the reward has to be valued if it is to promote the effort needed to achieve
it. The effort required for learning is influenced by interest, goal-orientation,
locus of control, self-esteem, self-efficacy and self-regulation. These are inter-
connected components of motivation for learning and there is a good deal of
evidence that assessment has a key role in promoting or inhibiting them and
hence affects the nature of the learning achieved in particular circumstances.
Components of motivation for learning
Interest
Interest is the result of an interaction between an individual and certain aspects
of the environment. It has a powerful impact on learning. Hidi and Harackiewicz
suggest that 'it can be viewed as both a state and a disposition of a person, and it
has a cognitive, as well as an affective, component' (2000: 152). As it depends on
the individual as well as on the environment, studies have identified two aspects:
individual or personal interest, and 'situational' interest, residing in contextual
factors of the environment. Individual interest is considered to be a relatively
stable response to certain experiences, objects or topics that develops over time as
knowledge increases and enhances pleasure in the activity. Situational interest
resides in certain aspects of the environment that attract attention and may or
may not last. Not surprisingly, those with personal interest in particular activities
persist in them for longer, learn from them and enjoy the activities more than
those with less personal interest. Where personal interest is absent, situational
interest is particularly important for involvement in learning. Features of learn-
ing activities such as novelty, surprise and links to existing experience provide a
meaningful context and can therefore help to engage students' interest. Some
potentially boring activities can be made interesting through, for example,
making them into games. It has also been found that changing the social envi-
ronment can encourage interest; for instance, some students show more interest
when working with others than by themselves (Isaac et al., 1999).
The aim of creating situational interest is to get students to participate in
learning tasks that they do not initially find interesting, in the hope that per-
sonal interest may develop, at the same time as some learning taking place. This
is more likely to happen if students are encouraged to see the purpose of their
involvement as learning. Thus the development of interest that leads to learn-
ing is connected with goal orientation and with the type of feedback they
receive, both of which are closely connected with assessment as discussed later.
Goal orientation
How learners see the goals of engaging in a learning task determines the direc-
tion in which effort will be made and how they will organize and prioritize (or
not) time spent for learning. The nature of the goal that is adopted is clearly crit-
ical. Goals will only be selected if they are understood, appear achievable, and
are seen as worthwhile. As Henderson and Dweck (1990) point out, if students
do not value the goals of academic achievement they are unlikely to be moti-
vated to achieve them.
The relationship between the goals embraced by a learner and how they
respond to learning tasks is expressed in terms of two main types of goal. These
are described as 'learning (or mastery) goals' and 'performance (or ego) goals'
(Ames, 1992). Those motivated by goals identified in terms of learning apply
effort in acquiring new skills, seek to understand what is involved rather than
just committing information to memory, persist in the face of difficulties, and
generally try to increase their competence. Those oriented towards goals iden-
tified as a level of performance seek the easiest way to meet requirements and
achieve the goals, compare themselves with others, and consider ability to be
more important than effort.
A good deal of research evidence supports the superiority of goals as learn-
ing over goals as performance. For example, Ames and Archer (1988) found
those who hold with goals as learning seek challenging tasks and Benmansour
(1999) found a particularly strong association between goal orientation and the
use of active learning strategies. The use of more passive learning strategies and
avoidance of challenge by those who see goals as performance is particularly
serious for lower achieving students. Indeed Butler (1992) found that the effects
of different goal orientations are less evident among high achieving students or
those perceiving themselves as performing well than among those performing
less well. But the extent to which goal orientation is a dichotomy has been chal-
lenged by evidence that goals as learning and goals as performance are uncor-
related (McInerney et al., 1997) and that there may be students who endorse one
or other, both or neither. The fact that researchers have set up experimental sit-
uations that induce different goal orientations in order to investigate their effect
(as in the study by Schunk, 1996, outlined later) indicates that they are subject
to change and manipulation and so can be influenced by classroom culture.
The evident value for school work of goals as learning leads to the question
of how students can be oriented or re-oriented towards these rather than goals
as performance. This question of how individuals come to embrace goals is dis-
cussed by Kellaghan et al. (1996). They cite evidence of the need to ensure that
goals are understood, that they are challenging but achievable, seen to be ben-
eficial to the learner and are valued by them, and that the social and cultural
context facilitates opportunities for learning. In relation to the last of these con-
ditions they comment:
Social and cultural factors are important aspects of motivation because they
can influence students' perceptions of self, their beliefs about achievement, and the
selection of goals. Thus a student may, or may not, adopt achievement goals to gain
or keep the approval of others. ... If academic achievement is not valued in a
student's neighbourhood, peer group, or family, the student will be affected by this
in considering whether or not to adopt academic goals. Even if academic achieve-
ment and the rewards associated with it are perceived to have value, a student may
decide that home and school support are inadequate to help him or her succeed.
(1996: 13-14)
This further underlines the interrelatedness of the components of motivation
chosen for discussion here. It also draws attention to the extent to which learn-
ers feel themselves to be in control of their learning, the 'locus of control', the
point to which we now turn.
Locus of control
As just suggested, 'locus of control' refers to whether learners perceive the
cause of their success or failure to be under their control (internal locus) or to
be controlled by others (external locus). Locus of control is a central concept in
attribution theory (Weiner, 1979). A sense of internal control is evident in those
who recognize that their success or failure is due to factors within themselves,
either their effort or their ability. They see themselves as capable of success and
are prepared to invest the necessary effort to meet challenges. Those with a
sense of external control attribute their success or failure to external fadors,
such as their teacher or luck. They have less motivation to make an effort to
overcome problems and prefer to keep to tasks where they can succeed.
In addition, the beliefs of learners about whether their ability is something
that can or cannot be changed by effort affects their response to challenging
tasks (Dweck, 1999). Those with a view that their effort can improve their ability
will not be deterred by failure, but will persist and apply more effort. Those
with a view of their ability as fixed find, in success, support for their view. But
failure casts doubt on the ability they regard as fixed. So risk of failure is to be
avoided; when not confident of success, they are likely to avoid challenge. As
in the case of goal orientation, the consequences are most serious for those who
perceive their ability to be low, for the chance of failure is higher and they learn
to expect it. The implication for their feeling of self-worth as a learner, and self-
esteem more generally, is clear.
Self-esteem
Self-esteem refers to how people value themselves both as people and as learn-
ers. It shows in the confidence that the person feels in being able to learn. Those
who are confident in their ability to learn will approach a learning task with an
expectation of success and a determination to overcome problems. By contrast,
those who have gained a view of themselves as less able to succeed are likely to
be tentative in attempting new tasks and deterred by problems encountered. As
a result they appear to make less effort to learn and find less and less enjoyment
in the learning situation. As noted, this is related to their view of whether they
have control over their performance and whether effort can improve it.
Self-efficacy
Self-efficacy is closely related to self-esteem and to locus of control, but is more
directed at specific tasks or subjects. It refers to how capable the learner feels
of succeeding in a particular task or type of task. It is characterized as 'I can'
"
The Role of Assessment in Developing Motivation for Learning
versus 'I can't' by Anderson and Bourke (2000: 35) who state that it is a learned
response, the learning taking place over time through the student's various
experiences of success and failure. Clearly, the more a student experiences
failure in relation to a type of task the more likely it is that they will become
convinced of not being able to succeed. The student develops a condition
described as 'learned helplessness', characterized by a lack of persistence with
a task or even an unwillingness to put enough effort into it to have a chance of
success. Assessment must have a key role in this development, so it is impor-
tant for learning that the assessment is conducted so as to build self-efficacy.
Self-regulation
Self-regulation in learning refers to the will to act in ways that bring about
learning. It refers to learners' consciously controlling their attention and actions
so that they are able to solve problems or carry out tasks successfully. Self-reg-
ulated learners select and use strategies for learning and evaluate their success.
They take responsibility for their own learning and make choices about how to
improve. Those not able to regulate their own learning depend on others to tell
them what to do and to judge how well they have done it. Young children are
able to regulate their learning by adopting simple strategies relevant to learn-
ing, such as focusing their attention on key features to detect changes or 'clus-
tering' to aid their memory. Bransford et al. (2000) quote the example of third
year school students outperforming college students in memorizing a list of 30
items. The younger students grouped the items into dusters with meaning for
them which aided recall. It would appear from examples such as this that learn-
ing depends on a control of strategies and not just on an increase in experience
and information.
Consciously selecting relevant strategies is a step towards students reflecting
on learning and becoming aware of their own thinking, leading to meta-cogni-
tion. For this they need a language to use when talking about learning and about
themselves as learners. Developing and using this language, in a context where
each person is valued, were found by Deakin Crick et al. (2002) to be central in
developing students' strategic awareness of their learning. Promoting self-regu-
lation and meta-cognition enables effort to be directed to improve performance.
Assessment and motivation for learning
How learning is assessed is intimately related to views of learning. Behaviourist
views of learning, which continue to permeate classrooms and indeed to influ-
ence education policy decisions, are based on reinforcing required behaviour
with rewards and deterring unwanted behaviour with punishments. Student
assessment is generally the vehicle for applying these rewards and punish-
ments. Constructivist views of learning focus attention on the processes of
learning and the leamer's role. Teachers engage students in self-assessment and
use their own assessment to try to identify the learner's current understanding
and level of skills. These are matters discussed in detail in Chapter 3. Our focus
here is on how assessment affects each of the components of motivation dis-
cussed in the last section. As we will see there are both negative and positive
effects and by considering both we can draw out, in the next section, the ways
in which assessment can promote motivation for learning.
The research studies of how assessment impacts on motivation for learning
are variable in design, population studied, and in quality. A systematic review
of research on this impact, conducted by Harlen and Deakin Crick (2002, 2003)
identified 183 potentially relevant studies, of which 19 remained after succes-
sive rounds of applying inclusion and exclusion criteria, and making judgments
on the weight of evidence each study provided for the questions addressed. The
research discussed here draws heavily on this review, mainly on the 12 studies
that provided evidence of high weight for the review questions. The focus was
on the impact of summative assessment, some conducted by teachers and some
by external agencies. These are the most common forms of assessment encoun-
tered by students for, as Black and Wiliam (1998a) point out, current practice of
assessment lacks many of the features that are required for assessment to be
formative. The findings indicate how assessment can be practised so that, even
though its purpose is summative, it can support rather than detract from moti-
vation for learning.
Motivation, as we have seen, is too complex a concept for it to be studied as
a single dependent variable. Rather, research studies have concerned one or
more of the components indicated in the last section, underlining their inter-
relatedness. The studies do not fit neatly into categories identified by the com-
ponents of motivation as dependent variables. Thus the approach taken here is
to outline the findings from some key studies grouped according to the inde-
pendent variable, the assessment being studied, and then to draw together the
motivation-related themes emerging from them.
Studies of the impact of the national testing and assessment in
England and Wales
Several studies were able to take advantage of the introduction into England
and Wales of formal tests and teachers' assessments from the beginning of the
1990s in order to explore the changes associated with the innovation. In primary
schools the national curriculum tests represented a considerable change from
previous practice and a unique opportunity to compare students' experiences
before and after this innovation. Part of one such study was reported by Pollard
et al. (2000). The research was one element of a larger longitudinal study, which
mapped the educational experiences of a cohort of students as they passed
through primary school beginning just one year before the introduction of the
national tests and assessment in England and Wales. Over the eight years of the
study, personal interviews with head teachers, teachers and students were
some of the most important sources of data. Other procedures included ques-
tionnaires for teachers, observation in classrooms using systematic quantitative
procedures and qualitative approaches, open-ended or partially structured
field notes, and children's cartoon bubble completions. Sociometric data on chil-
dren's friendship patterns and tape recordings of teachers' interactions with
children were also collected.
The study found that in the initial stages of national testing the teachers tried
to 'protect' students from the effects of the new assessment requirements, which
they saw as potentially damaging. But as time went on, teachers became more
accepting of a formal structured approach to student assessment. As the stu-
dents became older they were aware of assessment only as a summative activ-
ity. They used criteria of neatness, correctness, quantity, and effort when
commenting on their own and others' work. There was no evidence from stu-
dents that teachers were communicating any formative or diagnostic assess-
ment to them. Feelings of tension, uncertainty and test anxiety were reported.
The researchers concluded that pressure of external assessment had had an
impact on students' attitudes and perceptions. Students became less confident
in their self-assessments and more likely to attribute success and failure to
innate characteristics. They were less positive about assessment interactions
that revealed their weaknesses. The assessment process was intimately associ-
ated with their developing sense of themselves as learners and as people. They
incorporated their teachers' evaluation of them into the construction of their
identity as learners.
Another study of the impact of the national curriculum tests in England and
Wales focused specifically on students' self-esteem. Davies and Brember (1998,
1999) conducted a study beginning two years before the introduction of
national tests and extending for several years afterwards, using successive
cohorts of Year 2 (7-year-old) and Year 6 (11-year-old) students. They adminis-
tered measures of self-esteem and some standardized tests in reading and
mathematics. For Year 2 children, self-esteem dropped with each year, with the
greatest drop coinciding with the introduction of the national curriculum tests.
Although there was a small upturn for the fifth cohort, the level still remained
lower than the third and very much below the second cohort. Mean levels of
self-esteem for the pre-national test cohorts were significantly higher than for
the post-national test cohorts. The difference in self-esteem across cohorts was
highly significant for Year 2 children but not for Year 6 children. Before the
introduction of the national tests there was no overall relationship between self-
esteem and achievement in reading and maths on the standardized tests.
However, there was a positive correlation between self-esteem and perform-
ance after the introduction of national curriculum tests. The authors suggested
that the lack of correlation between achievement and self-esteem before the
national curriculum tests meant that the children's view of themselves was
apparently less affected by their attainments than in the case of the post-
national test group.
A small-scale study by Reay and Wiliam (1999) concerned the experiences of
Year 6 (11-year-old) students in one primary school in the term before taking the
national tests. The researchers observed in the class for over 60 hours and inter-
viewed students in groups. They described the class as being at 'fever pitch'
because of the impending tests. The results of these tests had in fact little conse-
quence for the students, but because the school was held responsible for the levels
that they reached and was charged to make improvements in scores from one
year to another, the tests had high stakes for the teachers involved. In the
observed class, the teacher's anxieties were evident in the way he berated the chil-
dren for poor performance in the practice tests. Even though the students recog-
nized that the tests were about how well they had been taught they still worried
about their performance and about possible consequences for their own future.
They were beginning to view themselves and others differently in terms of test
results, equating cleverness with doing well in the tests, and increasingly refer-
ring to the levels they expected themselves and others to achieve.
Studies of selection tests in Northern Ireland
While the tests for 11-year-old students in England and Wales were not used for
selection until 2003, tests of 11-year-olds in Northern Ireland were used for the
highly competitive selection for admission to grammar school. Two studies of
contrasting design reported different kinds of evidence about the impact of the
tests on aspects of students' motivation for learning. Johnston and McClune
(2000) investigated the impact on teachers, students and students' learning
processes in science lessons through interviews, questionnaires and classroom
observations. Leonard and Davey (2001) reported the students' perspectives of
the process of preparing for, taking, and coming to terms with the results of these
tests, generally known as 11-plus tests.
Johnston and McClune (2000) used several instruments to measure students'
learning dispositions, self-esteem, locus of control and attitude to science and
related these to the transfer grades obtained by the students in the 11-plus
examination. They found four main learning dispositions, using the Learning
Combination Inventory (Johnston, 1996). These were described as:
'Precise processing' (preference for gathering, processing and utilizing lots of
data, which gives rise to asking and answering many questions and a pref-
erence for demonstrating learning through writing answers and factual
reports);
'Sequential processing' (preference for clear and explicit directions in
approaching learning tasks);
'Technical processing' (preference for hands-on experience and problem-
solving tasks; willingness to take risks and to be creative);
'Confluent processing' (typical of creative and imaginative thinkers, who
think in terms of connections and links between ideas and phenomena and
like to see the 'bigger picture').
Classroom observation showed that teachers were teaching in ways that gave
priority to sequential processing and which linked success and ability in science
to precise/sequential processing. The statistical analysis showed a positive cor-
relation between precise/sequential learning dispositions and self-esteem. The
more positive a student's disposition towards precise/sequential or technical
processing, the higher is their self-esteem and the more internal their locus of
control. Conversely, the more confluent the student's learning orientation, the
more external their locus of control and the lower is their self-esteem. Inter-
views with teachers indicated that they felt the need to teach through highly
structured activities and transmission of information on account of the nature
of the selection tests. However, the learning dispositions of students showed a
preference for technical processing, that is, through first-hand exploration and
problem solving. Thus teachers appeared to be valuing precise/sequential pro-
cessing approaches to learning more than other approaches and in doing so
were discriminating against and demoralizing students whose preference was
to learn in other ways.
A study by Leonard and Davey (2001), funded by Save the Children, was
specifically designed to reveal students' views on the 11-plus tests. Students
were interviewed in focus groups on three occasions, and they wrote stories
and drew pictures about their experiences and feelings. The interviews took
place just after taking the test, then in the week before the results were
announced, and finally a week after the results were known. Thus the various
phases of the testing process and its aftermath could be studied at times when
these were uppermost in the students' minds. As well as being the cause of
extreme test anxiety, the impact on the self-esteem of those who did not meet
their own or others' expectations was often devastating. Despite efforts by
teachers to avoid value judgements being made on the basis of grades achieved,
it was clear that among the students those who achieved grade A were per-
ceived as smart and grade D students were perceived as stupid. The self-esteem
of those receiving a grade D plummeted. What makes this impact all the more
regrettable is that the measures are so unreliable that many thousands of stu-
dents are misgraded (see Chapter 7).
Studies of regular classroom assessment in North
America
Brookhart and DeVoge (1999) studied US third grade students' perceptions of
assessment 'events' taking place in the course of regular classroom work. They
collected data by questionnaire from students about their perceptions of a task
(as 'easy' or 'difficult', and so on) before attempting it. After the event they
asked students about how much effort they felt they had applied. Selected stu-
dents were then interviewed about their perceptions of the assessment. The
results were used to test a model of the role of classroom assessment in student
motivation and achievement. The findings indicated that students' self-efficacy
judgements about their ability to do particular classroom assessments were
based on previous experiences with similar kinds of classroom assessments.
Results of previous spelling tests, for example, were offered as evidence of how
students expected to do on the current spelling test. Judgemental feedback from
previous work was used by students as an indication of how much effort they
needed to invest. Students who were sure that they would succeed in the work
might not put effort into it. However, this would depend on their goal orienta-
tion. Those seeing goals as performance might apply effort, if this was how they
would be judged, in order to gain approval.
The authors also found that teachers' explicit instructions and how they pre-
sented and treated classroom assessment events affected the way students
approached the tasks. When a teacher exhorted a student to work towards a good
grade that teacher was, on the one hand, motivating students and on the other
was setting up a performance orientation that may have decreased motivation.
Duckworth et al. (1986) also studied the impact of normal classroom grading
procedures but in this case with high school students in the USA across differ-
ent subjects. Their aim was to understand the relationship between effort, effi-
cacy and futility in relation to types of teacher feedback at the individual
student level, at the class level, and at the school level. Questionnaires were
administered to a cross-section of students in 69 schools to provide indices of
effort, efficacy and futility. At the individual level they found efficacy positively
correlated with effort across all ability levels and subjects. These same relation-
ships were stronger at class level. However, there was only scattered support
for the hypothesis that the fit between the tests and what had been studied
would be positively associated with efficacy and negatively associated with
futility. At the school level, collegiality (amount of constructive talk about
testing) among teachers was related to students' perceptions of desirable testing
practices and students' feelings of efficacy and effort. School leadership was
needed to develop and foster such collegial interaction.
Some of the detailed findings anticipated those of Brookhart and DeVoge
(1999). In particular, Duckworth et al. (1986) found students' perceptions of the
communication, feedback and helpfulness of their teachers to be strongly
related to their feelings of the efficacy versus futility of study and of their own
efforts to study. The authors suggested that the difference found between
results for specific events and the more general reactions was possibly due to
the informal culture of expectations, built up over the year by teachers'
remarks and reactions, that had operated independently of the specific
practices studied. This may be part of a 'halo' effect from desirable class testing
practices. They therefore argued that increasing student perceptions of desirable
class testing practices may increase feelings of efficacy and levels of effort.
Students' understanding of the grades they were given by their teachers was
the subject of a study by Evans and Engelberg (1988). Data were collected by
questionnaire from students in grades 4 to 11 in the USA, about understanding
of grades, attitude to grades, and attribution. In terms of understanding of
grades the authors found, as hypothesized, that older students understood
simple grades more than younger ones, but even the older students did not
understand complex systems of grades in which judgments about effort and
behaviour were combined with academic achievement. The experience of being
given a grade, or label, without knowing what it meant seemed likely to lead to
a feeling of helplessness. In terms of attitudes to grades, not surprisingly,
higher-achieving students were more likely to regard grades as fair and to like
being graded more than lower-achieving students. Clearly, receiving low
grades was an unpleasant experience which gave repeated confirmation of low
personal value rather than help in making progress. It was found that younger stu-
dents perceived grades as fair more than older ones, but they also attached less
importance to them. Evans and Engelberg also looked at attribution and found
that lower achieving and younger students made more external attributions
than higher achieving and older students, who used more ability attributions.
This suggested that low-achieving students attempted to protect their self-
esteem by attributing their relative failure to external factors.
In her study of self-regulated learning conducted in Canada, Perry (1998)
divided teachers of grade 2 and 3 students into two groups based on a survey
of their classroom activities in teaching writing. One group was of teachers
whose natural teaching style encouraged self-regulated learning. In these high
self-regulated classrooms teachers provided complex activities, and they offered
students choices, enabling them to control the amount of challenge, to collabo-
rate with peers, and to evaluate their work. The other group was of teachers
who were more controlling, who offered few choices, and students' assessments
of their own work were limited to mechanical features (spelling, punctuation
and so on). These were described as 'low self-regulated classrooms'. Question-
naires were administered to students in these two groups of classes and a
sample of students in each group was observed in five sessions of writing.
Although there were some limitations to this study, the findings were of
interest. There was a difference between the responses of children in high and
low self-regulated classrooms to being asked what they would want the
researcher to notice about their writing whilst looking through their work.
Although a large proportion of students in both contexts indicated that the
mechanical aspects of writing were a focus for them, many more students in
high self-regulated classrooms alluded to the meaningful aspects and intrinsic
value of their work. Students in the low self-regulated classrooms also were
more likely to respond 'I don't know' or suggest that they did not care. Simi-
larly, in interviews, the students observed in the high self-regulated classrooms
indicated an approach to learning that reflected intrinsic motivation. They
showed a task focus when choosing topics or collaborators for their writing and
focused on what they had learned about a topic and how their writing had
improved when they evaluated their writing products. In contrast, the students
in the low self-regulated classrooms were more focused on their teacher's eval-
uations of their writing and how much they got right on a particular assign-
ment. Both the high and low achievers in the low self-regulated classes were
concerned with getting 'a good mark'.
Studies of experimental manipulation of feedback and goal orientation
A study by Butler (1988) of the effect of different forms of feedback, involving
fifth and sixth grade students in Israel, is widely cited for its results relating to
changes in levels of achievement. However, the study also reported on the
interest shown in the tasks used in the study following different forms of feed-
back. The students, fifth and sixth graders, were randomly allocated to groups
and were given both convergent and divergent tasks. After working on these
tasks they received feedback on their performance and answered an interest
questionnaire. Three feedback conditions were applied to different groups:
Comments only: feedback consisted of one sentence, which related specifi-
cally to the performance of the individual student (task involving);
Grades only: these were based on the scores after conversion to follow a
normal distribution with scores ranging from 40 to 99 (ego-involving);
Grades plus comments.
High achieving students expressed similar interest in all feedback conditions,
whilst low achieving students expressed most interest after comments only. The
combined interest of high achieving students receiving grades and grades plus
comments was higher than that of the lower achieving students in these condi-
tions. However, the interest of high and low achieving students in the com-
ments only condition did not differ significantly. The author concluded that the
results indicated that the ego-involving feedback, whether or not combined with
task-involving feedback, induced ego-involving orientation, that is, a motiva-
tion to achieve high scores rather than promoting interest in the task. On the
other hand, promoting task involvement by giving task-related, non-ego-involv-
ing feedback may promote the interest and performance of all students, with
particular value for the lower achieving students.
In the experimental study of goal orientation and self-assessment by Schunk
(1996) in the USA, fourth grade students were randomly assigned to one of
four experimental conditions: goals as learning with self-assessment; goals as
learning without self-assessment; goals as performance with and without self-
assessment. The students studied seven packages of material, covering six
major types of skill in dealing with fractions and a revision package, for 45
minutes a day over seven days. The difference between the goal instructions
lay in a small change in wording in presenting each package. Self-assessment
was undertaken by the relevant groups at the end of each session. Measures of
goal orientation, self-efficacy, and skills in the tasks (addition of fractions) were
administered as pre- and post-tests. The result of this study was that the effect
of goal orientation on achievement was only apparent when self-assessment
was absent. Self-evaluation appeared to swamp any effect of goal orientation.
Therefore, in a second study all students engaged in self-assessment but only
at the end of the programme rather than in each session, to equalize and
reduce its effect. With self-assessment held constant, the results showed
significant effects of goal orientation for self-efficacy and for skill in the
addition of fractions. The scores of the group working towards learning-goals
were significantly higher than those of the performance-goals group on both
measures.
Of relevance here are several studies, not included in the systematic review,
reported by Dweck (1999). When Elliott and Dweck (1988) introduced some
tasks to different groups of fifth grade students in the USA, they did this in a
way whereby some regarded the goal as performance and others as learning.
The two groups performed equally well when they experienced success, but
there was some difference in the groups' response to difficult problems. Many
of those given goals as performance began to show patterns of behaviour
reflecting helplessness, and their problem-solving strategies deteriorated, whilst
most of those who saw goals as learning remained engaged and continued to
use effective strategies.
Dweck and Leggett (1988) found a relationship between students' theories
about their general ability (intelligence) and goal orientation. This was one of a
series of investigations into the effects of believing, on the one hand, that intel-
ligence is innate and fixed, and on the other, that intelligence can be improved
by effort. The view of intelligence held by some eighth grade students was
identified by asking for their agreement or disagreement with statements such
as 'you can learn new things but you can't really change your basic intelligence'
(Dweck, 1999: 21). The students were then offered a series of tasks, some of
which were described in terms of 'goals as performance' and some in terms of
'goals as learning'. They found a significant relationship between beliefs about
their ability and the students' choice of task, with those holding a fixed view of
their ability choosing a performance goal task.
These findings suggest that students who are encouraged to see learning as
their goal feel more capable, apply effort, and raise their performance. This is
less likely to happen where students are oriented to performance, which other
research shows inevitably follows in the context of high stakes summative
assessment. For instance, Pollard et al. (2000) found that after the introduction
of national tests, teachers increasingly focused on performance outcomes rather
than the learning process. Schunk's (1996) findings, however, suggest that
student self-assessment has a more important role in learning than goal orien-
tation, but when it is combined with goals as learning it leads to improved per-
formance and self-efficacy.
Using assessment to promote motivation for learning
In the foregoing sections we have discussed various forms and components of
motivation and considered some evidence of how it is affected by assessment.
As a start in bringing these together, it is useful to restate the reasons for being
concerned with motivation for learning. In plain terms, these are because we
want, and indeed society needs, students who:
Want to learn and value learning;
Know how to learn;
Feel capable of learning;
Understand what they have to learn and why;
Enjoy learning.
How does assessment affect these outcomes? We will first bring together the
features of assessment practice that need to be avoided. Then we will look at the
more positive side of the relationship.
Impacts of assessment to be avoided
Assessment, particularly when high stakes art' attached to the results, creates a
strong reason for learning. But this reason is, for the vast majority of srudents, to
pass the test/examination at thl' necessary level to achieve the reward. Students
who are extrinsically motivated in this way see their gools as perfonnance rather
than as learning.. and the evidence shows that this is associated with seeking the
easiest route to the necessary perfonnance. Students with such goal orientation
use passive rather than active learning strategies and avoid challenges; their
learning is described as 'shallow' rather than 'deep' (Ames and Archer, 1988;
Benmansour, \999; Crooks, 1988; Harlen and James, 1997). Students are encour-
aged, sometimes unWittingly, by their teachers in this approach to their work. The
way in which teachers introduce tasks to students can orientate stud<!llts to goals
as performance rather than goals as learning (Brookhart and [)('Voge, 2000;
Schunk, 1996). Repeated tests, in which are they encouraged to perform well to
get high scores, teaches students that performance is what mailers. This perme-
ales throughout classroom transactions, affecting students' approach to thl'ir
work (Pollard et aI., 2000; Reay and Wiliam, 1999).
Pollard et al. (2000) suggest that making teachers accountable for test scores
but not for effective teaching encourages the administration of practice tests.
Many teachers also go further and actively coach students in passing tests
rather than spending time in helping them to understand what is being tested
(Gordon and Reese, 1997; Leonard and Davey, 2001). Thus the scope and depth
of learning are seriously undermined. As discussed in Chapter 8, this may also
affect the validity of the tests if coaching in test-taking enables students to
perform well even when they do not have the required knowledge, skills and
understanding.
Even when not directly teaching to the tests, teachers change their approach.
Johnston and McClune (2000) reported that teachers adjusted their teaching
style in ways they perceived as necessary because of the tests. They spent
most time in direct instruction and less in providing opportunities for students
to learn through enquiry and problem solving. This impairs learning, and the
feeling of being capable of learning, for those students who prefer to do this in
a more active way.
The research confirms that feedback to students has a key role in determin-
ing their feeling of being capable of learning, of tackling their classroom activi-
ties and assessment tasks successfully. Feedback can come from several sources:
from the reactions of the teachers to their work, from others, including their
peers, and from their own previous performance on similar tasks. In relation to
teachers' feedback, there is strong evidence that, in an atmosphere dominated
by high stakes tests, teachers' feedback is largely judgemental and rarely form-
ative (Pollard et al., 2000). Butler's (1988) experimental study of different kinds
of feedback indicated that such feedback encourages interest in performance
rather than in learning and is detrimental to interest in the work, and achieve-
ment, of lower achieving students.
The feedback that students obtain from their own previous performance in
similar work is a significant element in their feeling of being able to learn in a
particular situation (Brookhart and DeVoge, 1999). Consequently, if this is gen-
erally judgemental in nature it has a cumulative impact on their self-efficacy.
The opportunity for past experience to help further learning is lost.
Feedback from these different directions adds to the general impression that
students have of their teachers' helpfulness and interest in them as learners.
Indeed, Roderick and Engel (2001) reported on how a school providing a high
level of support was able to raise the effort and test performance of very low
achieving and disaffected students to a far greater degree than a comparable
school providing low level support for similar students. High support meant
creating an environment of social and educational support, working hard to
increase students' sense of self-efficacy, focusing on learning related goals,
making goals explicit, using assessment to help students succeed and creating
cognitive maps which made progress evident. They also displayed a strong
sense of responsibility for their students. Low teacher support meant teachers
not seeing the target grades as attainable, not translating the need to work
harder into meaningful activities, not displaying recognition of change and
motivation on the part of students, and not making personal connections with
students in relation to goals as learning. There are implications here and in
Duckworth et al.'s (1986) study for school management. Pollard et al. (2000) and
Hall and Harding (2002) also found that the assessment discourse and quality
of professional relationships teachers have with their colleagues outside the
classroom influence the quality of teaching and learning inside the classroom.
In summary, assessment can have a negative impact on student motivation
for learning by:
Creating a classroom culture which favours transmission teaching and
undervalues variety in ways of learning;
Focusing the content of teaching narrowly on what is tested;
Orienting students to adopt goals as performance rather than goals as learn-
ing;
Providing predominantly judgmental feedback in terms of scores or grades;
Favouring conditions in which summative judgements permeate all teach-
ers' assessment transactions.
Assessment practices that preserve student motivation
Each item in the above list indicates consequences to be avoided and so sug-
gests what not to do. However, the research evidence also provides more posi-
tive implications for practice. One of the more difficult changes to make is to
convince teachers that levels of achievement can be raised by means other than
by teaching to the tests. Certainly students will have to be prepared for the tests
they are required to take, but this best takes the form of explaining the purpose
and nature of the test and spending time, not on practising past test items, but
on developing understanding and skills by using assessment to help learning.
The work of Black et al. (2003) in development of practical approaches to using
assessment for learning has added to the evidence of the positive effect of form-
ative assessment on achievement (see Chapter 1). Since the measures of change
in achievement used in this work are the same statutory tests as are used in all
schools, the results show that improvement can be brought about by attention
to learning without teaching to the test.
The particularly serious impact of summative assessment and tests on lower
achieving students results from their repeated experience of failure in compar-
ison with more successful students. There are implications here for two kinds
of action that can minimize the negative impact for all students. The first is to
ensure that the demands of a test are consistent with the capability of the stu-
dents, that is, that students are not faced with tests that are beyond their reach
(Duckworth et al., 1986). The notion of 'testing when ready' is relevant here. It
is practised in the Scottish national assessment programme, where students are
given a test at a certain level when the teachers are confident, based on their
professional judgement, that they will be able to succeed. Thus all students can
experience success, which preserves their self-esteem and feeling of self-efficacy.
The result also helps students to recognize the progress they are making in their
learning, noted as important in the research (Roderick and Engel, 2001; Duck-
worth et al., 1986). The second action is for teachers actively to promote this
awareness of progress that each student is making and to discourage students
from comparing themselves with each other in terms of the levels or scores that
they have attained.
The research also underlines the value of involving students in self-assess-
ment (Schunk, 1996) and in decisions about tests (Leonard and Davey, 2001;
Perry, 1998). Both of these necessitate helping students to understand the
reasons for the tests and the learning that will be assessed, thus helping to
promote goals as learning. These practices are more readily applied to those
tests that teachers control rather than to external tests. However, there is abun-
dant evidence that by far the majority of tests that students undergo are
imposed by teachers, either as part of regular checking or in practising for exter-
nal tests. Thus a key action that can be taken is to minimize the explicit prepa-
ration for external tests and use feedback from regular classwork to focus
students on the skills and knowledge that will be tested.
If teachers are to take these actions, they need support at the school level in
the form of an ethos and policy that promotes the use of assessment to help
learning as well as serving summative purposes. There are implications for the
management of schools in establishing effective communication about assess-
ment and developing and maintaining collegiality through structures and
expectations that enable teachers to avoid the negative impact of assessment on
motivation for learning. These school procedures and policies have also to be
communicated to parents.
Finally, there are of course implications for local and national assessment
policies. The force driving teachers to spend so much time on direct preparation
for tests derives from the high stakes attached to the results. The regular
national or state-wide tests for all students throughout primary and secondary
school have greater consequences for teachers and schools than for students.
But whether the stakes are high for the student (as when the results are used for
certification or selection) or for the teacher and school (as when aggregated
student tests or examination results are used as a measure of teacher or school
effectiveness), the consequence is that teaching and learning are focused on
what is tested with all the consequences for motivation for learning that have
been discussed here.
The irony is that, as an outcome of the high stakes use, the tests do not
provide the valid information required for their purposes. In particular, tests
taken by all students can only cover a narrow sample (and the most reliably
marked sample) of student attainment; teaching how to pass tests means that
students may be able to pass even when they do not have the skills and under-
standing which the test is intended to measure (Gordon and Reese, 1997).
Further, the reliability of the tests as useful indicators of students' attainment is
undermined by the differential impact of the testing procedures on a significant
proportion of students. Girls and lower achieving students are likely to have
high levels of test anxiety that influence their measured performance (Evans
and Engelberg, 1988; Benmansour, 1999; Reay and Wiliam, 1999). Older lower
achieving students are likely to minimize effort and may even answer ran-
domly since they expect to fail anyway (Paris et al., 1991). Thus results may be
unreliable and may exaggerate the difference between the higher and lower
achieving students.
To avoid these pitfalls, the Assessment Reform Group (ARG), as a result of
consultation with policy makers and practitioners on the implications of the
research, concluded that designers and users of assessment systems and tests
should:
Be more actively aware of the limited validity of the information about pupil
attainment that is being obtained from current high stakes testing pro-
grammes;
Reduce the stakes of such summative assessments by using, at national and
local levels, the performance indicators derived from them more selectively
and more sensitively. They should take due account of the potential for those
indicators to impact negatively on learning, on teaching and on the curricu-
lum;
Be more aware of the true costs of national systems of testing, in terms of
teaching time, practice tests and marking. This in turn should lead policy
makers to come to reasoned conclusions about the benefits and costs of each
element in those systems;
Consider that for tracking standards of attainment at national level it is
worth testing a sample of pupils rather than a full age cohort. This would
reduce both the negative impacts of high stakes tests on pupil motivation
and the costs incurred;
Use test development expertise to create forms of tests and assessments that will make it possible to assess all valued outcomes of education, including for example creativity and problem solving;
Develop a broader range of indicators to evaluate the performance of schools. Indicators that are derived from summative assessments should therefore be seen as only one element in a more broadly-based judgment. This would diminish the likely impact of public judgments of school performance on those pupils whose motivation is most 'at risk' (ARG, 2002b: 11-12).
This chapter has discussed evidence that the way in which assessment is used, both inside the classroom by teachers and outside by others, has a profound impact on students' motivation for learning. It is evident that motivation has a key role in the kind of learning in which students engage; a central concern of this book.
It is natural for students and teachers to aim for high performance, but when this is measured by external tests and when the results are accompanied by penalties for low performance, the aim becomes to perform well in the tests and this is often not the same as to learn well. Moreover, when there are high stakes attached to the test results the tests are inevitably designed to have high reliability and focus on what can be tested in this way. Although the reliability of these tests may not be as high as assumed (see Chapter 7), the attempt to aspire to 'objectivity' is generally to the detriment of the validity of the test. The inevitable consequence, as the research shows, is to narrow the learning experiences of the students. However, the impact of high stakes testing may well have longer-term consequences than the narrowness of curriculum experience. Further learning and continued learning throughout life depend on how people view themselves as learners, whether they feel they can achieve success through effort, whether they gain satisfaction from learning: all aspects of motivation for learning.
The impact that assessment can have on students can be either positive, as discussed in Chapter 1, or negative as set out in this chapter. What happens depends on how the teacher mediates the impact of assessment on students. Chapter 3 showed that teachers' views of learning affect their pedagogy. When teachers see their role as helping students to pass tests, by whatever means, their teaching methods and the experiences of the students are distorted. The alignment of assessment, curriculum and pedagogy is most easily upset by changes in assessment and this has to be taken into account in designing assessment policy.
Chapter 5
Developing a Theory of Formative Assessment
A model for classroom transactions
Whilst previous chapters have described the development of formative assessment practices, and have explored various specific aspects of these and their operation, the aim in this chapter is both more holistic and more ambitious. We will attempt to set out a theory of formative assessment. Such a theory should help interrelate the discussion so far within a single comprehensive framework and thereby provide a basis for further exploration. It would be extravagant to claim that it achieves this purpose, not least because its limited basis is our findings from the King's-Medway-Oxfordshire Formative Assessment Project, the KMOFAP example as described in Chapter 1.
That project was designed to enhance learning through the development of formative assessment. The basic assumptions that informed the design of the work were in part pragmatic, arising from the evidence that formative assessment work did enhance students' performance, and in part theoretical. One theoretical basis was to bring together evidence about classroom questioning practices (for example research on optimal 'wait time') with the general principle that learning work must start from the learner's existing ideas. The other was provided by arguments from Sadler that self-assessment and peer assessment were essential to the effective operation of formative assessment, a view that was supported in some of the research evidence, notably the work of White and Frederiksen (1998).
However, these are too narrow a basis for making sense of our project's outcomes. The need to expand the theoretical base was signalled in the response made by Perrenoud to our review:

This [feedback] no longer seems to me to be the central issue. It would seem more important to concentrate on theoretical models of learning and its regulation and their implementation. These constitute the real systems of thought and action, in which feedback is only one element. (1998: 86)
By 'regulation', he meant the whole process of planning, classroom implementation, and adaptation, by which teachers achieve their learning intentions for their students. In what follows, we will try to link the ideas
expressed in this statement with an expanded theoretical perspective. The principal aim is to provide a framework within which we can make sense of what it was that changed in those classrooms where teachers were developing their use of formative assessment.
It is obvious that a diverse collection of issues is relevant to the understanding of classroom assessment and so it follows that, if there is to be a unifying framework, it will have to be eclectic yet selective in eliciting mutually consistent messages from different perspectives. As one study expresses it:
… an attempt to understand assessment must involve a critical combination and co-ordination of insights derived from a number of psychological and sociological standpoints, none of which by themselves provide a sufficient basis for analysis. (Torrance and Pryor, 1998: 105)
However, if such a framework is to be more than a mere collection, it will have to serve to interrelate the collection in a way that illuminates and enriches its components. It should also suggest new interpretations of evidence from classrooms, and new ideas for further research and development work.
In what follows, we will develop our theory on the basis of the work
described in Chapter 1. However, other approaches are mentioned throughout,
and near the end we shall use the framework to make comparisons between this
and other projects which were also designed to study or change teaching and
learning in classrooms.
Starting points
We will begin by considering the classroom as a 'community of practice' (Lave
and Wenger, 1991; Wenger, 1998) or as a 'figured world' (Holland et al., 1998).
In both these perspectives, the focus is not so much on 'what is' but rather on
what the various actors involved take things to be:
By 'figured world', then, we mean a socially and culturally constructed realm of interpretation in which particular characters and actors are recognized, significance is attached to certain acts, and particular outcomes are valued over others. Each is a simplified world populated by a set of agents … who engage in a limited range of meaningful acts or changes of state … as moved by a specific set of forces. (Holland et al., 1998: 52)
The focus of the approach is a careful delineation of the constraints and affordances (Gibson, 1979) provided by the 'community of practice' or 'figured world', combined with a consideration of how the actors or agents, in this case the teacher and the students, exercise agency within these constraints and affordances. Their actions are to be interpreted in terms of their perceptions of the structure in which they have to operate, in particular the significance they attach to beliefs or actions through which they engage, that is, the ways in
which they as agents interact with the other agents and forces. These ways serve to define the roles that they adopt. Many of the changes arising in our project can be interpreted as changes in the roles adopted, both by teachers and students. However, these perspectives proved inadequate as explanatory or illuminative mechanisms.
This was because although the notions of communities of practice and figured worlds accounted well for the ways in which the actions of agents are structured (and that of the figured world in particular accounts for the differing degrees of agency exhibited), neither conceptual framework provides for the activities of agents to change the structure. In Wenger's example people learn to become claims processors, and are changed in the process, but the world of claims processing is hardly changed at all by the enculturation of a new individual. Similarly, in the examples used by Holland et al., agents develop their identities by exercising agency within the figured worlds of, for example, college sororities, or of Alcoholics Anonymous, but the figured worlds remain substantially unaltered. In contrast, the agency of teachers and students, both as individuals and as groups within the classroom, can have a substantial impact on what the 'world of that classroom' looks like. Furthermore, our particular interest here is more in the changes that occurred in teachers' practices, and in their classrooms, than in the continuities and stabilities.
For this reason, we have found it more productive to think of the subject classroom as an 'activity system' (Engeström, 1987). Unlike communities of practice and figured worlds, which emphasize continuity and stability, '… activity systems are best viewed as complex formations in which equilibrium is an exception and tensions, disturbances and local innovations are the rule and the engine of change' (Salomon, 1993: 8-9).
For Engeström the key elements of an activity system are defined as follows:
The subject refers to the individual or subgroup whose agency is chosen as the point of view in the analysis. The object refers to the 'raw material' or 'problem space' at which the activity is directed and which is moulded or transformed into outcomes with the help of physical and symbolic, external and internal tools (mediating instruments and signs). The community comprises multiple individuals and/or subgroups who share the same object. The division of labour refers to both the horizontal division of tasks between the members of the community and to the vertical division of power and status. Finally the rules refer to the explicit and implicit regulations, norms and conventions that constrain actions and interactions within the activity system. (Engeström, 1993: 67)
These elements form two interconnected groups. The first group constitutes the
sphere of production - the visible actions undertaken within the system directed
towards achieving the desired goals - but these are merely the 'tip of the
iceberg'. Underlying these elements are the social, cultural and historic conditions
within which the goals are sought, and these two groups of elements and the
dialectic between them together constitute an activity system.
As noted above, we believe that the most useful starting point for analysis is to analyse the classroom as an activity system. It would, of course, be possible to consider the whole school or even the wider community as an activity system, but such an analysis would necessarily ignore the particularities of the features of individual classrooms and would in our view paint too simplistic a picture. At the other extreme, we could view small groups of students in classrooms as an activity system, with the classroom as the wider context in which they act, but such groups are not well defined in most of the classrooms we observed and thus would be rather artificial. Adopting the classroom as the activity system allows other sources of influence to be taken into account. The students' motivations and beliefs are strongly shaped by their lives outside the school, whilst the classroom is itself embedded in the context of a particular school.
How teachers act, and how their students participate, in classrooms studying particular subjects will be influenced by their experiences in other subject classrooms, by the ethos of the school and by the wider community. Therefore, we believe that it is important that the activity system is the subject classroom. There are important differences between a group of students and a teacher gathering in a particular place for the learning of mathematics and those meeting to learn science or English. Whilst this view derives in part from the initial emphasis of our work on classrooms in secondary/high schools, our more recent experiences with primary schools also suggest that, in primary classrooms also, the subject being taught at the time exerts a strong influence on the way that formative practices are implemented.
Before considering the implications of treating the subject classroom as an activity system, we need to discuss in more detail the changes in the practice of the KMOFAP teachers. We shall do this in terms of four key aspects, which we will suggest provide the minimal elements of a theory of formative assessment. First, we discuss changes in the relationship between the teacher's role and the nature of the subject discipline. Second, we discuss changes in the teachers' beliefs about their role in the regulation of the learning process (derived from their implicit theories of learning). Third, we discuss the student-teacher interaction, focusing specifically on the role of feedback in this process, which involves discussion of the levels of feedback, the 'fine-grain of feedback', and a brief discussion of the relevance of Vygotsky's notion of the 'zone of proximal development' (ZPD) to the regulation of learning. The fourth element of the model is the role of the student.
While a theory that focuses on these four components and the way that they play out in the classroom may not have sufficient explanatory power to be useful, we do not believe that any attempt to understand the phenomena that we are describing without taking these factors into account is likely to be successful. We have formulated these components because we believe, on the basis of the data available to us, that they form key inputs for the formulation of any theory. Our intention is also to show that these four components form a framework which can be incorporated in, and illuminated by, a treatment of the subject classroom as an activity system.
First component: teachers, learners and the subject discipline
As the project teachers became more thoughtful about the quality, both of the questions they asked and of their responses to students' answers, it became evident that the achievement of this quality depended both on the relevance of questions and responses in relation to the conceptual structure of the subject matter, and on their efficacy in relation to the learning capacities of the recipients. Thus there was a need to analyse the interplay between teachers' views of the nature of the subject matter, particularly of appropriate epistemology and ontology, and the selection and articulation of goals and subject matter that followed on the one hand, and their models of cognition and of learning (new theories of cognition could well be central here - see Pellegrino et al., 1999) on the other. The types of classroom interaction entailed in the learning contexts of different subject matters will not necessarily have a great deal in common with one another.
Comparisons between our experiences of work with teachers of English, science and mathematics respectively have strengthened our view that the subject disciplines create strong differences between both the identities of teachers and the conduct of learning work in their classes (Grossman and Stodolsky, 1994; Hodgen and Marshall, 2005). One clear difference between the teaching of English and the teaching of mathematics and science is that in the latter there is a body of subject matter that teachers tend to regard as giving the subject unique and objectively defined aims. It is possible to 'deliver' the subject matter rather than to help students to learn it with understanding, and even where help with understanding is given priority, this is often simply designed to ensure that every student achieves the 'correct' conceptual goal.
In the teaching of writing, there is little by way of explicit subject matter to 'deliver', except in the case of those teachers who focus only on the mechanics of grammar, spelling and punctuation. So there is no single goal appropriate for all. Thus most teachers of this subject are naturally more accustomed to giving individual feedback to help all students to improve the quality of their individual efforts at written communication. There is a vast range of types of quality writing: the goal can be any point across an entire horizon rather than one particular point. These inter-subject differences might be less defined if the aims of the teaching were to be changed. For example, open-ended investigations in mathematics or science, or critical study of the social and ethical consequences of scientific discoveries, are activities that have more in common with the production of personal writing or critical appreciation in English.
It is also relevant that many teachers of English, at least at high-school level, are themselves writers, and students have more direct interaction with the 'subject' through their own reading and writing than they do with (say) science. Nevertheless, whilst teachers of English might naturally engage more with the use of feedback than many of their science colleagues, the quality of the feedback that they provide and the overall strategies in relation to the metacognitive quality of that feedback still need careful, often radical, development.
While much research into teacher education and teacher development has focused on the importance of teachers' subject knowledge, such research has
rarely distinguished between abstract content knowledge and pedagogical content knowledge (Shulman, 1986). A study of elementary school teachers conducted for the UK's Teacher Training Agency in 1995-1996 (Askew et al., 1997) found no relationship between learners' progress in mathematics and their teachers' level of qualification in mathematics, but a strong positive correlation existed regarding their pedagogical content knowledge. This would suggest that it is important to conceptualize the relationship between teacher and subject matter as a two-way relationship, in that the teacher's capacity to explore and reinterpret the subject matter is important for effective pedagogy.
What is less clear is the importance of change in the interaction between students and the subjects they are studying. In the main, most middle and high school students seem to identify a school subject with the subject teacher: this teacher generally mediates the student's relationship with the subject, and there cannot be said to be any direct subject-student interaction. However, one aim of the teacher could well be to enhance the learner's capacity to interact directly with the subject's productions, which would involve a gradual withdrawing from the role of mediator. The meaning to be attached to such a change, let alone the timing and tactics to achieve this end, will clearly be different between different subjects. In subjects that are even more clearly performance subjects, notably physical education and musical performance, feedback is even less problematic in that its purpose can be evident to both teacher and student, and it is clear that the learning is entirely dependent on it. The students-as-groups aspect may also emerge more clearly insofar as students work together to reproduce, or at least to simulate, the community practices of the subject areas, for example as actors in a stage drama, or as a team in a science investigation.
Second component: the teacher's role and the regulation of
learning
The assessment initiatives of our project led many teachers to think about their teaching in new ways. Two of them described the changes as follows:
I now think more about the content of the lesson. The influence has shifted from 'what am I going to teach and what are the pupils going to do?' towards 'how am I going to teach this and what are the pupils going to learn?'

There was a definite transition at some point, from focusing on what I was putting into the process, to what the pupils were contributing. It became obvious that one way to make a significant sustainable change was to get the pupils doing more of the thinking. I then began to search for ways to make the learning process more transparent to the pupils. Indeed I now spend my time looking for ways to get pupils to take responsibility for their learning at the same time making the learning more collaborative. This inevitably leads to more interactive learning activities in the classroom.
These teachers' comments suggested a shift from the regulation of activity ('what are the students going to do?') to the regulation of learning ('what are the students going to learn?'). In considering such regulation, Perrenoud (1998) distinguishes two aspects of teacher action. The first involves the way a teacher plans and sets up any lesson. For this aspect, we found that a teacher's aim of improving formative assessment led them to change the ways in which they planned lessons, with a shift towards creating 'didactic situations' - in other words, they specifically designed their questions and tasks so that they generated 'teachable moments' - occasions when a teacher could usefully intervene to further learning. The second involves teacher action during the implementation of such plans, determined by the fine detail of the way they interact with students. Here again teachers changed, using enhanced wait time and altering their roles from simple presentation to encouraging dialogue.
Overall, it is also clear from these two quotations that the teachers were engaged in 'interactive regulation' by their emphasis on the transfer to the students of responsibility for their learning. This transfer led teachers to give enhanced priority to the need to equip students with the cognitive strategies required to achieve transition to the new understandings and skills potentially accessible through the subject matter. This implied giving more emphasis to cognitive and metacognitive skills and strategies than is usually given in schools. Such changes were evident in the shifts in questioning, in the skilful use of comments on homework, and particularly in the new approach to the use of tests as part of the learning process. It is significant that, a few months into the project, the teachers asked the research team to give them a talk on theories of learning, a topic that we would have judged too theoretical at the start of the project.
Some teachers have seemed quite comfortable with this transfer of responsibility to the student, and the implications for change in the student's role and in the character of the teacher-student relationship are clear. However, some other teachers found such changes threatening rather than exciting. Detailed exploration of the trajectories of development for different teachers (see for example, Lee, 2000, and Black et al., 2003) showed that the changes have been seen as a loss of control of the learning by some who were trying seriously to implement them. Although one can argue that, objectively, teacher control was going to be just as strong and just as essential, subjectively it did not feel like that to these particular teachers, in part because it implied a change in their conception of how learning is mediated by a teacher. Such a shift alters the whole basis of 'interactive regulation', which is discussed in more detail in the following section.
Third component: feedback and the student-teacher interaction
The complex detail of feedback
It emerges from the above discussion that in the four-component model that we would propose, the crucial interaction is that between teacher and student, and this is clearly a central feature in any study of formative assessment. As already
pointed out, our starting position was based in part on the seminal paper by Sadler (1989) on formative assessment. One main feature of his model was an argument that the learner's task is to close the gap between the present state of understanding and the learning goal, and that self-assessment is essential if the learner is to be able to do this. The teacher's role, then, is to communicate appropriate goals and to promote self-assessment as students work towards them. In this process, feedback in the classroom should operate both from teacher to students and from students to the teacher.
Perrenoud (1998) criticized the treatment of feedback in our 1998 review. Whilst we do not accept some of his interpretations of that paper, his plea that the concept of feedback be treated more broadly, as noted earlier, is a valuable comment. The features to which he drew attention were:
The relationship of feedback to concepts of teaching and learning;
The degree of individualization (or personalization) of the feedback;
The way the nature of the feedback affects the cognitive and the socio/affective perspectives of the pupils;
The efficacy of the feedback in supporting the teachers' intentions for the pupils' learning;
The synergies between feedback and the broader context of the culture of classroom and school, and the expectations of the pupils.
Some aspects of these points have already been alluded to above. However, a more detailed discussion is called for, which will be set out here under three headings: the different levels of feedback; the fine-grained features of feedback; the relevance of Vygotsky's notion of the zone of proximal development (and in particular the importance of differentiation).
Levels of feedback
The enactment of a piece of teaching goes through a sequence of stages as follows:
a) A design with formative/feedback opportunities built in;
b) Implementation in which students' responses are evoked;
c) Reception and interpretation of these responses by a teacher (or by peers);
d) Further teaching action based on the interpretation of the responses;
e) Reception and interpretation of these responses by the student;
f) Moving on to the next part of the design.
This is set out to make clear that the students in (b) and (e) and the teachers in (c) and (d) are involved in feedback activities. Feedback can involve different lengths of loop, from the short-term loops (c) to (d) to (e) and back to (c), to longer-term loops around the whole sequence, that is, from (a) to (e) and then back again when the whole sequence may be redesigned. The concept of regulation involves all of these.
Two points made by Perrenoud are relevant here. One is to emphasize that the mere presence of feedback is insufficient in judging the guidance of learning (see Deci and Ryan, 1994). The other is that learning is guided by more than the practice of feedback. In particular, not all regulation of learning processes uses formative assessment. If, for example, the teaching develops metacognitive skills in the students, they can then regulate their own learning to a greater extent and thus become less dependent on feedback from others. More generally, it is important to look broadly at the 'regulation potential' of any learning activity, noting however that this depends on the context, on what students bring, on the classroom culture that has been forged 'upstream' (that is, the procedures whereby a student comes to be placed in a context, a group, a situation), and on ways in which students invest themselves in the work. Several of the project teachers have commented that when they now take a class in substitution for an absent teacher, the interactive approaches that they have developed with their own classes cannot be made to work.
The fine-grain of feedback
Whilst the inclusion in our framework of models of learning, of teachers' perceptions of the subject matter and of their pedagogical content knowledge deals in principle with the necessary conditions for effective feedback, these are but bare bones and in particular may mislead in paying too little attention to the complexity of what is involved. The complexities are discussed in some detail by Perrenoud, and some of his main points are briefly summarized here.
The messages given in feedback are useless unless students are able to do something with them. So the teacher needs to understand the way students think and the way in which they take in new messages, both at general (subject discipline) and specific (individual) levels. The problem is that this calls for a theory relating to the mental processes of students which does not yet exist (although some foundations have been laid: see Pellegrino et al., 2001). Teachers use intuitive rudimentary theories, but even if good theory were to be available, applying it in any specific context would be a far from straightforward undertaking.
For both the teacher, and any observer or researcher, it follows that they can only draw conclusions from situations observed in the light of theoretical models. As Perrenoud argues:

Without a theoretical model of the mediations through which a situation influences cognition, and in particular the learning process, we can observe thousands of situations without being able to draw any conclusions. (1998: 95)
In framing and guiding classroom dialogue, judgments have to be grounded in activity but must achieve detachment from it (that is, to transcend it) in order to focus on the knowledge and the learning process. A teacher's intervention to regulate the learning activity has to involve:

… an incursion into the representation and thought of the pupil to create a breakthrough in understanding, a new point of view or the shaping of a notion which can immediately become operative. (1998: 97)
Torrance and Pryor (1998) studied the fine grain of feedback through video recordings of episodes in primary school classrooms. Many of their findings echo those of our study, albeit as an analysis of the variations in practice between teachers rather than as part of an intervention. What they are keen to emphasize is the complexity of the social interaction in a classroom, which leads them to look closely at issues of power, mainly as exercised by teachers at different levels, for example exerting power over students with closed questioning, or sharing power with students (Kreisberg, 1992) using more open questioning. Torrance and Pryor also give an example of how feedback, which does no more than guide the group discussion that a teacher is mainly trying to observe, transfers power. However, this is then unevenly distributed amongst the students.
The zone of proximal development and differentiation
Sadler's emphasis on a teacher's task in defining the gap between what the learner can achieve without help and what may be achieved with suitable help, and the fact that this lays emphasis on the social and language aspects of learning, might seem to connect directly with a common interpretation of Vygotsky's concept of a Zone of Proximal Development (Vygotsky, 1986). Also relevant are the concepts of scaffolding as developed by Wood et al. (1976), and Rogoff's (1990) broader notion of guided participation, which serve to emphasize and clarify the role of a teacher.
However, discussions of the ZPD are difficult to interpret without knowing
precisely how the authors interpret the concept. Here we draw on the analysis
of Chaiklin (2005), who points out that for Vygotsky the zone has to be defined
in terms of a model of development. These different 'ages' of development are
defined as a sequence of coherent structures for interacting intellectual functions.
A learner will have achieved a particular 'age' of development, and
possess immature but maturing functions which will lead to the next 'age'. In
an interactive situation, one which may be aimed at diagnosis rather than at
specific teaching purposes, the learner may be able to share, in collaboration,
only the mature functions: 'the area of immature, but maturing, processes
makes up the child's zone of proximal development' (Vygotsky, 1998: 202).
Teaching should then focus on those maturing functions which are needed to
complete the transition to the next age period. Whilst the age periods are
objectively defined, the ZPD of each learner will be subjectively defined.
Interventions such as those by the thinking skills programmes (Shayer and Adey, 1993)
may succeed because they focus on maturing processes of general importance.
It follows that what is needed is those learning tasks in which a learner is
involved in interaction with others, and these will serve to identify the particular
areas of intellectual function which, in relation to achieving the next 'age' of
development for that learner, are still immature. This has to be done in the light
of a comprehensive model of 'ages' of intellectual development.
This is clearly a task of immense difficulty, one that is far more complex than
that implied by the notion of a 'gap', which many see as implied by Sadler's
analysis. It is probably true that less sophisticated notions of a 'gap', and of
scaffolding interventions to close such gaps, are of practical value. However, they cannot
be identified with Vygotsky's concept of a ZPD, and they will not attend to the
real complexity of the obstacles that learners encounter in advancing the maturity
of their learning.
This argument serves to bring out the point that success in fostering and
making use of enhanced teacher-student interactions must depend on the
capacity to adapt to the different ZPDs in a class, that is, on the capacity of a
teacher to handle differentiation at a rather subtle level of understanding of
each learner. However, it does not follow that the problem reduces to a one-on-one
versus whole-class dichotomy, for social learning is a strong component of
intellectual development and capacity to learn in interaction is an essential
diagnostic tool. Peer assessment, peer teaching and group
learning in general have all been enhanced in our project's work, and the way
that the need for differentiation is affected by these practices remains to be
studied. The fact that in some research studies enhanced formative assessment
has produced the greatest gains for those classified initially as 'low-achievers'
may be relevant here.
The overall message seems to be that in order to understand the determinants
of effective feedback, or broaden the perspective whilst detecting and
interpreting indicators of effective regulation, we will need theoretical models
that acknowledge the situated nature of learning (Greeno et al., 1998) and the
operation of teaching situations. We have to understand the context of schemes
of work by teachers and we have to study how they might plan for and interact
on the spot to explore and meet the needs of different students. This sets a
formidable task for any research study of formative work in the classroom.
Fourth component: the student's role in learning
The perceptions of our teachers, as reported above, are that their students have
changed role from being passive recipients to being active learners who can
take responsibility for and manage their own learning. Another teacher
reported this as follows:

They feel that the pressure to succeed in tests is being replaced by the need to understand
the work that has been covered and the test is just an assessment along the
way of what needs more work and what seems to be fine ... They have commented
on the fact that they think I am more interested in the general way to get to an
answer than a specific solution and when Clare [a researcher] interviewed them they
decided this was so that they could apply their understanding in a wider sense.
Other, albeit very limited, interviews with students have also produced evidence
that students saw a change in that their teacher seemed really interested
in what they thought and not merely in whether they could produce the right
answer. Indeed, one aspect of the project has been that students responded very
positively to the opportunities and the stimulus to take more responsibility for
their own learning.
These changes can be interpreted in terms of two aspects. One, already mentioned
in an earlier section, is the development of meta-cognition, involving as
it must some degree of reflection by the student about his or her own learning
(Hacker et al., 1998). Of significance here also is the concept of self-regulated
learning as developed by Schunk (1996) and Zimmerman and Schunk (1989),
and the findings of the Melbourne Project for Enhanced Effective Learning
(PEEL) summarized in Baird and Northfield (1992).
Analysis of our work may be taken further along these lines, by relating it
to the literature on 'meta-learning' (Watkins et al., 2001). Many of the activities
described in our first section could readily be classified as meta-cognitive, on
the part of both teachers and their students. The distinction, emphasized by
Watkins et al., between 'learning orientation' and 'performance orientation'
(see Dweck, 1986, 1999) is also intrinsic to our approach. The achievement of
meta-learning is less clear, for what would be required is that students would
reflect on the new strategies in which they had been involved, and would seek
to deploy these in new contexts. The practice of active revision in preparation
for examinations, or the realization that one needs to seek clarity about aims if
one is to be able to evaluate the quality of one's own work, may well be
examples of meta-learning, but evidence about students' perceptions of and
responses to new challenges would be needed to support any claims about
outcomes of this type.
A second aspect, involving conative and affective dimensions, is reflected in
changes in the students' perceptions of their teacher's personal interest in them.
Mention has been made above, in the report on the abandonment of giving
marks or grades on written work, of Butler and Neuman's (1995) account of the
importance of such a change. It is not merely that a numerical mark or grade is
ineffective for learning because it does not tell you what to do; it also affects
your self-perception. If the mark is high, you are pleased but have no impetus
to do better. If it is low, it might confirm your belief that you are not able to learn
the subject. Many other studies have explored the negative effects not only on
learning but also on self-concept, self-efficacy and self-attribution of the classroom
culture in which marks and grades come to be a dominant currency of
classroom relationships (see for example, Ames, 1992; Cameron and Pierce,
1994; Butler and Winne, 1995; Vispoel and Austin, 1995). In particular, as long
as students believe that efforts on their part cannot make much difference
because of their lack of 'ability', efforts to enhance their capability as learners
will have little effect.
The importance of such issues is emphasized by Cowie's (2004) study which
explored students' reactions to formative assessment. One of her general
findings was that students are in any activity balancing three goals
simultaneously, namely, completion of work tasks, effective learning and
social-relationship goals. When these conflict they tend to prioritize the
social-relationship goals at the expense of learning goals; so, for example,
many will limit disclosure of their ideas in the classroom for fear of harm to
their feelings and reputation. The way in which the teacher deals with such
disclosures is crucial. The respect shown them by a teacher and their trust in
that teacher affect students' responses to any feedback - they need to feel safe
if they are to risk exposure. Cowie also found that the students' responses to
formative feedback cannot be assumed to be uniform. Some prioritize learning
goals and so look for thoughtful suggestions, preferably in one-to-one
exchanges, whilst others pursue performance goals and so want help to
complete their work without the distraction of questions about their
understanding. Sadly, many felt that the main responsibility for their learning
rested with the teacher and not with themselves. In an activity theory
representation, as exemplified later in this chapter (Figures 5.1 and 5.2), all of
the issues raised by such work are represented by the element labelled
'community'; the connections of this element with the other elements of the
diagram are both important and complex.
Much writing about classroom learning focuses on the learner as an individual
or on learning as a social process. Our approach has been to treat the social-individual
interaction as a central feature, drawing on the writings of Bredo
(1994) and Bruner (1996). Thus, feedback to individuals and self-assessment has
been emphasized, but so have peer assessment, peer support in learning and
class discussion about learning.
For the work of students in groups, the emphasis by Sadler (1989, 1998) and
others that peer assessment is a particularly valuable way of implementing
formative assessment has been amply borne out in the work reported here.
Theoretically, this perspective ought to be evaluated in the broader context of the
application to classrooms and schools of analyses of the social and communal
dimensions of learning as developed, for example, in Wenger's (1998) study of
communities of practice. These points are illustrated by the following extract
from an interview with a student in the KMOFAP, discussing peer marking of
his investigation:
After a pupil marking my investigation, I can now acknowledge my mistakes
easier. I hope that it is not just me who learnt from the investigation but the pupil
who marked it did also.

Next time I will have to make my explanations clearer, as they said 'It is hard to
understand', so I must next time make my equation clearer. I will now explain my
equation again so it is clearer.
This quotation also bears out Bruner's (1996) emphasis on the importance of
externalizing one's thoughts by producing objects or oeuvres which, being
public, are accessible to reflection and dialogue, leading to enrichment through
communal interaction. He points out that awareness of one's own thinking, and
a capacity to understand the thinking of others, provide an essential reasoned
base for interpersonal negotiation that can enhance understanding.
The importance of peer assessment may be more fundamental than is
apparent in accounts by teachers of their work. For self-assessment, each
student has to interact mainly with text; interactions with the teacher, insofar as
they are personal, must be brief. Discussing the work of Palincsar and Brown
(1984) on children's reading, Wood states:
This work, motivated by Vygotsky's theory of development and by his writings on
literacy, started from the assumption that some children fail to advance beyond the
initial stages of reading because they do not know how to 'interact' with text, that
is, they do not become actively engaged in attempts to interpret what they read.
The intervention techniques involved bringing into the open, making
public and audible, ways of interacting with text that skilled readers usually
undertake automatically and soundlessly. (1998: 220-1)
Thus if a student's interpretation of aims and of criteria of quality of performance
is to be enriched, such enrichment may well require 'talk about text', and,
given that it is impracticable to achieve this through teacher-student interactions,
the interactions made possible through peer assessment may meet an
essential need.
Overall, it is clear that these changes in a student's role as learner are a significant
feature in the reform of classroom learning, that our formative assessment
initiative has been effective in its impact on these features, and that
changes in a student's own beliefs and implicit models of learning also underlie
the developments involved.
Applying activity theory
In considering the interpretation of these four components in terms of a representation
of the subject classroom as an activity system, we have concentrated
mainly on the 'tip of the iceberg': subjects, objects and cultural resources, and
the relationships between these three elements. As will be clear in our exposition
of these ideas, the nature of these relations is strongly influenced by the
other elements of activity systems, that is, rules, community, and division of
labour. The discussion of these relationships will be brief; a full exploration
would require a far longer treatment than is possible here.
In the activity system of the subject classroom, the tools or cultural resources
that appear to be particularly important in the development of formative
assessment are:

Views and ideas about the nature of the subject, including pedagogical
content knowledge;
Methods for enhancing the formative aspects of interaction, such as rich
questions, ideas about what makes feedback effective and techniques such as
'traffic lights' and so on;
Views and ideas about the nature of learning.
The subjects are, as stated earlier, the teacher and the students, although it is
important to acknowledge that it is useful to distinguish between students as
individuals and students in groups in the classroom (Ball and Bass, 2000).
The object in most of the subject classrooms we studied was increased student
success, either in terms of better quality learning or simply better scores on
state-mandated tests. Many teachers spoke of their interest in participating in
the project because of the promise of better results. However, as well as this
object which, as noted above, was secured by most of the participating teachers,
the outcomes of the projects included changes in the expectations that teachers
had of their students, and also changes in the kinds of assessments that these
teachers used in their routines. The most important change in the teachers' own
assessments was a shift towards using those that provided information for the
teacher not only about who had learnt what, but also proffered some insights
into why this was, in particular - when interpreted appropriately - those that
gave some idea as to what to do about it. In other words, a shift towards assessments
that could be formative for the teacher.
[Figure: diagram linking the elements Tools, Subjects, and Objects/outcomes]
Figure 5.1: Patterns of influence in the KMOFAP and BEAR projects (solid-headed arrows
represent influences in KMOFAP; open-headed arrows represent influences in BEAR)
Figure 5.1 uses the context of the KMOFAP and a US example, the Berkeley
Evaluation and Assessment Research project (BEAR; see Wilson and Sloane,
2000), to illustrate the various components of the theoretical framework outlined
above and their interrelationships. Components 1, 2 and 4 are represented
as tools, while component 3 is represented in the links between the teacher and
the students (both individually and in groups). Solid-headed arrows are used
to represent the key influences in the KMOFAP project while the open-headed
arrows represent influences in the BEAR project. Using this framework, the
course of the KMOFAP can be seen as beginning with tools (in particular findings
related to the nature of feedback and the importance of questions) which
prompt changes in the relationship between the subjects (that is, in the relationship
between the teacher and the students) which in turn prompt changes in the
subjects themselves (that is, changes in the teacher's and students' roles). These
changes then trigger further changes in other tools such as the nature of the
subject and the view of learning. In particular, the changes prompted in the
teacher's classroom practices involved moving from simple associationist views
of learning to embracing constructivism and taking responsibility for learning
linked to self-regulation of learning, metacognition and social learning.
Figure 5.1 does not represent an activity system in the canonical way. This
more common representation, using the nested triangles, is shown in Figure
5.2. Here the relationships are brought out more clearly by placing tools at the
apex with subjects, and objects and outcomes on the base of the upper triangle.
Thus it would be possible in principle to map Figure 5.1 into this part of Figure
5.2 but much of the detail would either be lost or appear confusingly complex.
Figure 5.2: Elements of activity systems (Engeström, 1987)
However, what the canonical representation makes more explicit are the elements
in the lowest row of Figure 5.2 and their links with the rest. Whilst the
community, deemed here to be the subject classroom, is a given, both the rules and the
division of labour are changed by a formative innovation. For the rules, if teachers
cease to give grades or marks on homework in order to focus on feedback
through comments, they may be in conflict with management rules and
parental expectations in many schools. Yet in two of the KMOFAP schools such
rules were eventually changed, the new rule being, for the whole school, that
marks and grades were not to be given as feedback on written homework. The
more pervasive 'rule' - that schools are under pressure to produce high grades
in national tests - did limit some formative developments, and it is clear that
synergy between teachers' formative practices and their responsibilities for
summative assessments would be hard to achieve without some room for
manoeuvre in relation to high-stakes testing.
The division of labour is a feature that is radically transformed, as made clear in
the second component for changes in the teacher's role, and in the fourth component
for changes in the student's role. One aspect of the transfer of power and
responsibility that is involved here is that the students begin to share ownership
of the tools, for example by involvement in summative testing processes, and by
becoming less dependent on the teacher for their access to subject knowledge.
What is obvious from this discussion is that there are strong interactions
between the various elements of the system. This suggests that any attempt to
record and interpret the dynamics of change in an innovation, notably in formative
assessment, could do well to adopt and adapt an activity theory approach
along the lines sketched here.
Strategies for development
The KMOFAP and BEAR projects
It is useful at this point to contrast the approach adopted in our project with an
alternative strategy, clearly exemplified in the BEAR project (Wilson and
Sloane, 2000) and impressive in the evidence of learning gains associated with
an emphasis on formative assessment. This differed from the work described in
the first part of this paper in the following ways:
It was part of a curriculum innovation into which were 'embedded' new
formative assessment practices;
An important aim was to secure and establish the reliability and validity of
assessment practices so that assessment by teachers could withstand
public scrutiny and claim equal status with the external standardized
tests which have such negative effects on education in the USA;
The aims were formulated as a profile of a few main components, with each
component being set out as a sequence of levels to reflect the expected progression
of learning within each;
The assessment instruments were written tests provided externally, some to
be used as short-term checks on progress, some of greater length to be used
as medium-term checks;
Whilst formative use was emphasized, there was very little account of the
ways in which feedback was deployed or received by students.
To over-simplify, it could be said that the apparent weakness of the BEAR
project lies in those aspects in which our project was strong. At the same time
its strengths, in the quality of the assessment instruments and the rigour in their
use and interpretation, throw into sharp relief the weakness of our project, for
the cognitive quality of the questions used by our teachers and of their feedback
comments, whether oral or written, still needs further attention. Whilst the two
approaches may be seen as complementary, and each may have been the
optimum approach for the particular context and culture in which it was
designed to operate, there remains the issue of whether some aspects of either
could be incorporated, albeit at a later stage in implementation, in the other.
In terms of our model, the BEAR project imports theories of the subject and
of learning and requires teachers to work to these models, but is not explicit on
the nature of the teacher-student interactions or the change in roles of either
teachers or students. Thus the project does not seem to have affected the classroom
community through any significant shift in the division of labour. Similar,
although not identical, contrasts could be drawn by analysis of many of the
research initiatives described in the 1998 review by Black and Wiliam.
The contrast between our work and that of the BEAR project is brought out clearly
in Figure 5.1, which shows the patterns of influence in the two projects.
This comparison can help to draw attention to the options available in any
programme of teacher development. The partial successes of our own approach
have a peculiar significance in that they have led to changes transcending the
boundaries envisaged by our initial concentration on formative assessment.
This expansion may in part have arisen because of our emphasis on the responsibility
of the teachers as partners with us, sharing responsibility for the direction
of change. It might have been predictable that their initiatives would
broaden the scope, because their work has to marry into the full reality of classroom
work and cannot be limited to one theoretically abstracted feature. Indeed
we have come to think of formative assessment as a 'Trojan Horse' for more
general innovation in pedagogy - a point to which we shall return in the
concluding section below.
Other related research and development studies
The BEAR study was similar in many respects to our own, so it is particularly
interesting to explore the comparison in detail. However, we have developed
the view that what is at issue is a theory of classroom pedagogy, and from this
perspective the number of relevant studies becomes far too great for any synthesis
to be attempted here.
Three examples of related studies may suffice to indicate possibilities. The
first is the cognitive acceleration work associated with Shayer (1999). In comparison
with the cognitive acceleration initiative, our formative intervention
did not target specific reasoning skills and so does not call for ad hoc teaching,
although within the 'set piece' lessons of that initiative many of the practices
have much in common with the formative practices. In terms of the scheme of
Figure 5.1, the work involves very specific tools and is characterized by a more
explicit - and thereby less eclectic - learning analysis which impacts directly on
the role of the teacher. It resembles the BEAR project in these respects, but it does
not resemble it in respect of the direct link to externally set tests and criteria.
A second example is the work on 'Talk Lessons' developed by Neil Mercer
and his colleagues (Mercer, 2000; Mercer et al., 2004). These lessons could
indeed be seen as a powerful way of strengthening the development of peer
assessment practices in enhancing students' capacity to learn. This initiative
develops different specific tools but it also, in terms of Figure 5.1, works to make
direct links between the learning analysis, the interaction methods and the division
of labour by focusing its effort on the role of the student in a group.
The third example is related to the second, but is the broader field summarized
in Alexander's (2004) booklet Towards Dialogic Teaching, which draws on a
range of studies of classroom dialogue. The main argument here starts from the
several studies that have shown that classroom dialogue fails to develop students'
active participation, reducing dialogue to a ritual of superficial questions
in a context of 'delivery' teaching whereby thoughtful participation cannot
develop. His arguments call for an emphasis on all three of the tools areas in
Figure 5.1, and put extra emphasis on the community element represented
directly in Figure 5.2 but only indirectly in the connecting arrows between
teacher role and student roles in Figure 5.1.
Conclusions and Implications
We have focused the discussion in this chapter on our own study, in part
because our approach to theory was grounded in that work, in part because we
do not know of any other study which is grounded in a comparably comprehensive
and sustained development with a group of teachers. Whilst we regard
the theory as a promising start there is clearly further work to be done in both
developing it and relating empirical evidence to it.
If we consider the potential value of the four-component model that we have
explored and discussed, an obvious outcome is that it could be used to suggest
many questions which could form the starting point for further empirical
research, many of which would require fine-grained studies of teacher-student
interactions (see for example, Torrance and Pryor, 1998; Cowie, 2004).
However, the more ambitious target for this chapter is more fundamental - to
help guide the direction and interpretation of further research through the theoretical
framework that is proposed.
We have explored above, very briefly, the possibility of developing the
theory through attempting new interpretations of initiatives already published.
This exploration, which involves attempting to embed the formative aspect in a
broader view of pedagogy, reflects the point made by Perrenoud quoted at the
beginning of this chapter that it is necessary to consider formative feedback in
the wider context of 'models of learning and its regulation and their implementation'.
This may seem to be over-ambitious in attempting a complete
theory of pedagogy rather than only that particular aspect of pedagogy which
is labelled 'formative assessment'. However, such an attempt seems inevitable
given our experience of the initially limited aim of developing formative assessment
leading to much more radical changes.
One function of a theoretical framework should be to guide the optimum
choice of strategies to improve pedagogy, by identifying those key determinants
that have to be evaluated in making such choices and in learning lessons
from experiences in other contexts. It follows that the framework might be used
to evaluate, retrospectively or prospectively, the design of any initiative in
teaching and learning. In the case of the KMOFAP initiative, it should help
answer the question of whether it was the optimum way of devoting effort and
resources towards the improvement of classroom pedagogy. This would seem
a very difficult question to answer in the face of the potential complexity of a
comprehensive theory of pedagogy that might provide the basis for an answer.
However, some significant insight could be distilled in a way that would at least
help resolve the puzzle of the project's unexpected success, represented by the
metaphor of the Trojan Horse mentioned in the previous section.
The argument starts by pointing out that the examples of change which the
teachers described seemed to confirm that working to improve the teacher-student
interaction through formative assessment could serve to catalyse changes
in both the teacher's role and those adopted by that teacher's students. The
changes motivate, perhaps demand, an alteration in the various interactions of
both students and teachers with their theories of learning, and with the ways in
which they perceive and relate to the subject matter that they are teaching. Thus
whilst we cannot argue that development of formative assessment is the only
way, or even the best way, to open up a broader range of desirable changes in
classroom learning, we can see that it may be peculiarly effective, in part because
the quality of interactive feedback is a critical feature in determining the quality
of learning activity, and is therefore a central feature of pedagogy.
We might also speculate that a focus on innovation in formative assessment
may be productive because many teachers, regardless of their perceptions of their
teaching role and of the learning roles of their students, can see the importance of
working on particular and limited aspects of feedback but might then have their
perspectives shifted as they undertake such work. In the project, the tools provided
led teachers to think more deeply - about their pedagogical content knowledge,
about their assumptions on learning and about interactions with their
students; hence activating all of the components of our framework for them.
Given that a development of formative assessment has this peculiar potential
to catalyse more radical change, a theory that helps design and track such change
would be an important resource. The approach sketched out here may help such
tracking, inasmuch as the components of our model, interpreted in terms of an
activity system framework, do seem to interact strongly and dynamically, and
would help in interpreting any change process. A central feature may be that
inconsistencies between the various elements of the classroom system are hard
for the actors to tolerate. The interaction lines in the frameworks of Figures 5.1
and 5.2 are all-important for they signal that any innovation that succeeds in
changing one element might well destabilize the existing equilibrium, so that the
whole pattern of pedagogy is affected to achieve a new equilibrium.
Acknowledgements
We would like to acknowledge the support given by the Nuffield Foundation in
funding the first phase of the KMOFAP, and by the National Science Foundation
for funding the subsequent phase through their support of our partnership
with the Stanford CAPITAL project (NSF Grant REC-9909370). This present
paper reports the findings of our work in England to date; comparative and
synthesized findings with the Stanford partners will be subjects for later study.
We are grateful to Sue Swaffield from Medway and Dorothy Kavanagh from
Oxfordshire who, on behalf of their authorities, helped to create and nurture our
links with their respective schools. The teachers in this project have been the main
agents of its success. Their willingness to take risks with our ideas was essential,
and their voices are an important basis for the main message of this paper.
[Figure: goals at two levels of detail. Fine-grained: progressive criteria for the skill or
understanding developed in different lessons, e.g. planning investigations. Coarse-grained:
criteria for reporting levels of achievement, e.g. enquiry skills.]
Figure 6.3: Goals at various levels of detail
These considerations show that it is quite possible to use evidence gathered
as part of teaching both to help learning and for reporting purposes. But, as
with the use of summative data for formative purposes, there are limitations to
the process. By definition, in this context, the judgement is made by the teacher
and for summative purposes there needs to be some assurance of dependabil-
ity. Thus some quality assurance procedures need to be in place. The more
weight that is given to the summative judgement, the more stringent the quality
assurance needs to be, possibly including some inter-school as well as intra-
school moderation in judgements of evidence. This is difficult in relation to out-
comes where the evidence is ephemeral and a consequence of this can be that
the end-use of the evidence influences what evidence is gathered and how. It
could result in a tick-list approach to gathering evidence or a series of special
tasks that give concrete evidence, making formative assessment into a succes-
sion of summative assessments.
A further limitation is that if summative assessment is based solely on evi-
dence gathered within the context of regular classroom activities, this evidence
will be limited by the range and richness of the educational provision and the
efficiency of the teachers in gathering evidence. In some circumstances the evi-
dence required to summarize learning may need to be supplemented by intro-
ducing special tasks if, for instance, a teacher has been unable for one reason or
another to collect all that is necessary to make judgements about all the stu-
dents. For formative purposes it is often appropriate to consider the progress of
groups rather than of individual students. Additional evidence may then be
needed when making a judgement on the achievement of individual students.
On the Relationship Between Assessment for Formative and Summative Purposes
In summary, limitations on using evidence gathered for formative assess-
ment, if it is to meet the requirements of summative assessment, are:

It is essential to reinterpret the evidence in relation to the same criteria for
each student;
It is important for teachers to be very clear about when level-based criteria
are appropriate and not to use them for grading students when the purpose
of the assessment is formative;
Since the formative use of evidence depends on teachers' judgements, addi-
tional quality assurance procedures will be needed when the information is
used for a different purpose;
Teachers may need to supplement evidence from regular classroom events
with special tasks to ensure that all necessary evidence is collected for all stu-
dents;
The difficulty of dealing with ephemeral evidence could lead to a tick-list
approach or a series of summative tasks;
This may well change the nature of formative assessment, making it more
formal.
Revisiting the relationship
A dichotomy or a dimension?
The discussion in the previous two parts of this chapter indicates that there is
no sharp discontinuity between assessment for learning and assessment to
report learning. In particular, it is possible to view the judgements of evidence
against the progressive criteria in the middle column of Figure 6.3 both as form-
ative, in helping decisions about next steps, and as summative in indicating
where students have reached. This suggests that the relationship between form-
ative and summative assessment might be described as a 'dimension' rather
than a 'dichotomy'. Some points along the dimension are indicated in Figure 6.4
(derived from Harlen, 1991).
At the extremes are the practices and uses that most typify assessment for
learning and assessment of learning. At the purely formative end is assessment
that is integral to student-teacher interaction and is also part of the student's
role. The teacher and student consider work in relation to the goals that are
appropriate for the particular learner and so the judgements are essentially
student-referenced. The central purpose is to enable teacher and students to
identify the next steps in learning and to know how to take these. At the purely
summative end of the dimension the purpose is to give an account of what has
been achieved at certain points. For this purpose, the assessment should result
in a dependable report on the achievements of each individual student.
Although self-assessment may be part of the process, the ultimate responsibil-
ity for giving a fair account of how each student's learning compares with the
criteria or standards rests with the teacher.
Assessment and Learning
Between these ends it is possible to identify a range of procedures having
various roles in teaching and learning. For instance, many teachers would begin
a new topic by finding out what the students already know, the purpose being
to inform the teaching plans rather than to identify the point of development of
each individual. Similarly, at the end of a section of work teachers often give an
informal test (or use 'traffic-lighting' - see Chapter 1) to assess whether new
ideas have been grasped or need consolidation.
Figure 6.4: A possible dimension of assessment purposes and practices

Columns, from the formative to the summative extreme: informal formative;
formal formative; informal summative; formal summative.

Major focus: What are the next steps in learning? (informal and formal
formative) / What has been achieved to date? (informal and formal summative)
Purpose: To inform next steps in learning / To inform next steps in teaching /
To monitor progress against plans / To record achievement of individuals
How is evidence collected? As normal part of class work / Introduced into
normal class work / Introduced into normal class work / Separate task or test
Basis of judgement: Student referenced / Student and criterion referenced /
Criterion referenced / Criterion referenced
Judged by: Student and teacher / Teacher / Teacher / Teacher or external marker
Action taken: Feedback to students and teacher / Feedback into teaching plans /
Feedback into teaching / Report to student, parent, other teachers etc.
Epithet: Assessment for learning / Matching / Dipstick / Assessment of learning
There is some parallel here with intermediate purposes for assessment iden-
tified by others. Cowie and Bell (1999) interpreted their observation of the
assessment practices of ten teachers in New Zealand as indicating two forms of
formative assessment: planned and interactive. Planned formative assessment
concerns the whole class and the teacher's purpose is to find out how far the
learning has progressed in relation to what is expected in the standards or cur-
riculum. Information gathering, perhaps by giving a brief class test or special
task, is planned and prepared ahead; the findings are fed back into teaching.
This is similar to 'informal summative'. Interactive formative assessment is not
planned ahead in this way; it arises from the learning activity. Its function is to
help the learning of individuals and it extends beyond cognitive aspects of
learning to social and personal learning; feedback is both to the teacher and the
learners and is immediate. It has the attributes of 'informal formative'.
Cowie and Bell's interactive formative assessment is similar to the classroom
assessment that Glover and Thomas (1999) describe as 'dynamic'. Like Black
and Wiliam (1998b), they emphasize the involvement of students in learning
and indeed speak of 'devolving power to the learners' and suggest that without
this, dynamic assessment is not possible. Unlike Cowie and Bell, however, they
claim that all assessment must be planned.
There are also different degrees of formality towards the summative end.
What is described as 'informal summative' may involve similar practice to
'formal formative', as is illustrated in Figure 6.4. However, the essential difference
is the use made of the evidence. If the cycle is closed, as in Figure 6.1, and the evi-
dence is used in adapting teaching, then it is formal formative. If there is no feed-
back into teaching, as in Figure 6.2, then it falls into the category of 'informal
summative', even though the evidence may be the same classroom test.
Yet rather than trying to make even more distinctions among assessment
procedures, this analysis perhaps ought to be taken as indicating no more than
that there are different ways of practising and using formative assessment and
summative assessment. If this is so, do we then need the distinction at all?
Is there formative and summative assessment or just good
assessment?
Some of those involved in developing assessment have argued that the forma-
tive/summative distinction itself is not helpful and that we should simply strive
for 'good assessment'. Good formative assessment will support good judge-
ments by teachers about student progress and levels of attainment and good
summative assessment will provide feedback that can be used to help learning.
Maxwell (2004) describes progressive assessment as blurring the boundary
between formative and summative assessment.
The discussion of Figure 6.4 certainly indicates a blurred boundary. Added
to this, the recognition of how evidence can be used for both purposes would at
first sight seem to add to the case against retaining the distinction between
formative and summative assessment. In both cases there are limitations in the
dual use of the evidence, but on closer inspection these are seen to be of rather
different kinds. The limitation of using evidence which has initially been gath-
ered for a summative purpose to help learning bears on the validity of the evi-
dence; it is just not sufficiently rich and readily available to be adequate for
formative use. The limitation of using evidence which has initially been gath-
ered to help learning, to report on learning, bears on the reliability of the evi-
dence. In this case there are steps that can be taken to address the limitation and
increase reliability; training can ensure that teachers collect evidence systemat-
ically and with integrity whilst moderation can optimize comparability.
When procedures are in place to assure quality in this way, then evidence
gathered by teachers at a level of detail suitable for helping learning can also be
used for dependable assessment of learning. This is what happens in the
Queensland Senior Certificate, where all the information needed to grade students
comes from evidence collected and judged by teachers. However, the
reverse situation cannot be found; there are no examples of all the needs of
assessment for learning being provided from evidence collected for summative
purposes. Of course, it is not logical that this could be so.
This asymmetry in dual use seems to be a strong argument for maintaining
the distinction in purposes. We need to know for what purpose the evidence
was gathered and for what purpose it is used. Only then can we evaluate
whether it is 'good' or not. One can conduct the same assessment and use it for
different purposes just as one can travel between two places for different pur-
poses. As the purpose is the basis for evaluating the success of the journey, so
the purpose of assessment enables us to evaluate whether the purpose has been
achieved. If we fuse or confuse formative and summative purposes, experience
strongly suggests that 'good assessment' will mean good assessment of learn-
ing, not for learning.
Conclusion
The notion of progression is one of two key elements in the discussion in this
chapter. To develop students' understanding and skills teachers need to have in
mind some developmental criteria in order to see how the goals of specific
lessons are linked to the progression of more general concepts and skills. For
formative purposes, these criteria do not need to be linked to levels; they just
provide a guide to the next steps in learning. For summative assessment pur-
poses, where a summary is required, the use of levels, standards or grades is a
way of communicating what a student has achieved in terms of the criteria rep-
resented by the levels, standards or grades. This process condenses evidence and
necessarily means a loss of detail. The uses to which summative information is
put require that the levels and their like mean the same for all students. That is,
putting aside for the moment the possibility of mis-grading discussed in Chapter
7, a level or grade X for student A means this student has achieved roughly the
same as student B who also achieved a level or grade X. It is only for convenience
that we use levels; in theory we could report performance in terms of a profile
across a succession of progressive criteria, but this would probably provide far
too much detail for most purposes where a concise summary is required.
It is the different purposes of the information, the second key feature of this
chapter, that create a distinction between formative and summative assessment.
We have argued that evidence gathered as part of teaching and learning can be
used for both formative and summative purposes. It is used to help learning
when interpreted in terms of individuals' progress towards lesson goals. The
same evidence can also be interpreted against the general criteria used in
reporting achievement in terms of levels or grades. However, we have noted
that evidence gathered in a form that is already a summary, as from a test or
examination, generally lacks the detail needed to identify and inform next steps
in learning. This means that we cannot use any evidence for just any purpose.
It argues for maintaining a clear distinction between formative and summative
purposes in terms of the use made of the evidence.
Although there are shades of formality in the ways of conducting formative
assessment and summative assessment, as indicated in Figure 6.4, the differ-
ence in purpose remains. We cannot make an assumption that the way in which
evidence is gathered will determine its use in learning; a classroom test can be
used to inform teaching without any reference to levels or it can be used to
provide a grade or level for end of stage reporting. The asymmetrical relation-
ship - that evidence collected for formative assessment can be used for sum-
mative assessment but not vice versa - means that removing the labels and
referring only to 'assessment' would inevitably favour summative purposes.
It is both a weakness and a strength that summative assessment derived by
reinterpreting formative evidence means that both are in the hands of the
teacher. The weakness arises from the known bias and errors that occur in
teachers' judgements. All assessment involves judgement and will therefore be
subject to some error and bias. While this aspect has been given attention in the
context of teachers' assessment for summative uses, it no doubt exists in teach-
ers' assessment for formative purposes. Although it is not necessary to be over-
concerned about the reliability of assessment for this purpose (because it occurs
regularly and the teacher will be able to use feedback to correct for a mistaken
judgement), the more carefully any assessment is made the more value it will
have in helping learning. The strength, therefore, is that the procedures for
ensuring more dependable summative assessment, which need to be in place in
a system using teachers' judgements, will benefit the formative use, the
teacher's understanding of the learning goals and the nature of progression in
achieving them. Experience shows that moderation of teachers' judgements,
necessary for external uses of summative assessment, can be conducted so that
this not only serves a quality control function but also has a quality assurance
function, with an impact on the process of assessment by teachers (ASF, 2004).
This will improve the collection and use of evidence for a formative as well as
a summative purpose.
This chapter has sought to explore the relationship between formative
assessment and summative assessment with a view to using the same evidence
for both purposes. We have seen that there are potential dangers for formative
assessment in assuming that evidence gathered for summative assessment can
serve formative purposes. Similarly, additional measures need to be put in
place if summative assessment based on evidence gathered and used for form-
ative assessment is to be adequately reliable. These issues are key to protecting
the integrity of assessment and in particular to protecting the integrity of forma-
tive assessment so that assessment has a positive impact on learning, which is
the central concern of this book.
Chapter 7
The Reliability of Assessments
Paul Black and Dylan Wiliam
The discussion at the end of the previous chapter raises the question of the part
that teachers play in the summative assessment of their students. For many of
the decisions made within schools, summative assessments made by teachers
play an important role and affect the progress of students. For summative
assessments that are used outside the school, whether for progress to employ-
ment, further stages of education or for accountability purposes, the stakes are
even higher. The question of the extent to which such assessments should be
entrusted to teachers and schools is a key issue in assessment policy.
Any assessments should be so designed that the users of the results, be they
the students, their parents, their teachers or the gatekeepers for further stages of
education or employment, can have confidence in the results. There are two main
criteria of quality of an examination result that should be a basis for such confi-
dence: reliability and validity. This chapter is concerned only with the first of
these, although there are areas of overlap between them. The term 'dependabil-
ity' is used to signify the overall judgement of quality for an assessment which
may be influenced by both reliability and validity, and by other features also.
It is not possible to optimize the systems for producing summative assess-
ments, either for use within schools or for more general public use, unless both
the reliability and the validity of the various methods available are carefully
appraised. Both qualities are essential. However, the public in general and policy
makers in particular do not understand or pay attention to reliability. They
appear to have faith in the dependability of the results of short tests when they
are in fact ignorant of the sizes of the inescapable errors that accompany this and
any other measure. This is a serious failing. Decisions which will have an impor-
tant effect on a student's future may be taken by placing more trust in a test score
than in other evidence about that student, when such trust is not justified.
In this chapter, the first section discusses what is meant by the reliability of
the score obtained from a summative test and the second examines published
evidence about test reliabilities. The third section then looks at decision consis-
tency, that is, the effects of limited reliability on the errors that ensue in assign-
ing candidates to specific grades or levels on the basis of test scores. The scope
of the considerations then broadens in the next three sections, which discuss in
turn the overlap between reliability and validity, the reliability of formative
assessments and the broader issue of dependability. The leading issues are then
highlighted in a closing summary.
Threats to reliability
No test is perfectly reliable. It is highly unlikely that the score that someone gets
on one occasion would be exactly the same as on another occasion, even on the
same test. However, if they took the same or similar tests on a number of occa-
sions then the average of all those scores would, in general, be a good indicator
of their capability on whatever it was that the test was measuring. This average
is sometimes called the 'true score'. Thus, the starting point for estimating the
reliability of a test is to hypothesize that each student has a 'true score' on a par-
ticular test - this does not mean that we believe that a student has a true 'ability'
in (say) reading, nor that the reading score is in any sense fixed.
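The idea of a 'true score' as the long-run average over hypothetical re-sittings can be illustrated with a short simulation. Everything here is invented for illustration - the true score of 62 and the error spread of 5 marks are assumptions, not figures from any real test:

```python
import random

random.seed(1)

TRUE_SCORE = 62.0   # hypothetical candidate's true score (assumed)
ERROR_SD = 5.0      # hypothetical occasion-to-occasion error spread (assumed)

def observed_score():
    # One sitting: the observed score is the true score plus random error.
    return TRUE_SCORE + random.gauss(0, ERROR_SD)

# A single sitting can easily miss the true score by several marks...
single = observed_score()

# ...but the mean over many hypothetical re-sittings converges on it.
mean_of_many = sum(observed_score() for _ in range(10_000)) / 10_000
print(round(single, 1), round(mean_of_many, 1))
```

The candidate never actually re-sits ten thousand times, of course; the true score is a theoretical construct, which is exactly the point the chapter makes next.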
The main sources of error that can threaten the reliability of an examination
result are:

Any particular student may perform better or worse depending on the actual
questions chosen for the particular administration of the test;
The same student may perform better or worse from day-to-day;
Different markers may give different marks for the same piece of work.
(Black and Wiliam, 2002)
The first of these three is a problem of question sampling. On any syllabus,
there will be a very large number of questions that can be set. Questions can
differ both in their content (for example, force, light, electricity in physics) and
in the type of attainment that they test (for example, knowledge of definitions,
solution of routine short problems, design of an experiment to test a hypothe-
sis). Those who set the UK General Certificate of Secondary Education (GCSE)
and Advanced-level examinations usually work with a two-dimensional grid
with (say) content topics as the rows and types of attainment as the columns.
The relative weights to be given to the cells of this grid are usually prescribed
in the syllabus (but they can vary between one syllabus and another); so across
any one examination, the examiners must reflect this in the distribution of the
questions (for example, one on the definition of force, two on applying the
concept in simple quantitative problems, and so on). In addition, they may
deploy different types of questions, for example using a set of 40 multiple-
choice questions to test knowledge and simple applications so as to cover many
cells, and then having a small number of longer problems to test application of
concepts and synthesis of ideas (for example, design of an experiment involv-
ing detection of light with devices which give out electrical signals).
What the examples here demonstrate is that the composition of an examina-
tion is a delicate balancing act. There is a huge number of possible questions
that can be set on any one syllabus: the examiners have to select a tiny propor-
tion and try to make their selection a fair sample of the whole.1 If the time
allowed for the test or tests is very short, the sample will be very small. The
smaller the sample, the less confidence one can have that the result for any one
candidate would be the same as that which would be given on another sample
composed in the same way. Thus, any examination can become more reliable if
it can be given a longer time.
No examination can produce a perfect, error-free result. The size of the errors
due to the first of the sources of error listed above can be estimated from the inter-
nal consistency of a test's results. If, for a test composed of several items, candi-
dates are divided according to their overall score on the test, then one can look at
each component question (or item) to see whether those with a high overall score
have high scores on this question, and those with low overall scores have low
scores on this question. If this turns out to be the case, then the question is said to
have 'high discrimination'. If most of the questions have high discrimination, then
they are consistent with one another in putting the candidates in more or less the
same order. The reliability coefficient that is often quoted for a test is a measure of
the internal consistency between the different questions that make up the test. Its
value will be a number between zero and one. The measures usually employed
are the Kuder-Richardson coefficient (for multiple-choice tests) or Cronbach's
alpha (for other types of test); the principle underlying these two is the same.
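As a concrete illustration (a minimal sketch, not the procedure of any particular examining body), Cronbach's alpha can be computed directly from a matrix of item scores; for right/wrong items scored 0/1, as here, the same formula gives the Kuder-Richardson coefficient. The candidate data below are invented:

```python
# Each row: one (invented) candidate's scores on the test's items
# (1 = correct, 0 = wrong).
scores = [
    [1, 1, 1, 0, 1],
    [1, 1, 0, 0, 1],
    [0, 1, 0, 0, 0],
    [1, 1, 1, 1, 1],
    [0, 0, 0, 0, 1],
    [1, 0, 1, 0, 1],
]

def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

k = len(scores[0])                                  # number of items
item_vars = [variance([row[i] for row in scores]) for i in range(k)]
total_var = variance([sum(row) for row in scores])  # variance of total scores

# Cronbach's alpha: internal consistency of the item set.
alpha = (k / (k - 1)) * (1 - sum(item_vars) / total_var)
print(round(alpha, 2))
```

When items discriminate consistently, total-score variance dominates the summed item variances and alpha approaches one; uncorrelated items drive it towards zero.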
If this internal consistency is high, then it is likely that a much longer test
sampling more of the syllabus will give approximately the same result.
However, if checks on internal consistency reveal (say) that the reliability of a
test is at the level of 0.85, then in order to increase it to 0.95 with questions of
the same type, it would be necessary to more than triple the length of the test.
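The arithmetic behind that claim follows from the Spearman-Brown prophecy formula, a standard classical-test-theory result (not specific to this chapter) for a test lengthened by a factor k with questions of comparable quality:

```python
def lengthened_reliability(r, k):
    # Spearman-Brown prophecy formula: reliability of a test lengthened
    # by a factor k:  r_k = k * r / (1 + (k - 1) * r)
    return k * r / (1 + (k - 1) * r)

def required_factor(r_old, r_new):
    # Inverting the formula: how much longer must a test of reliability
    # r_old be made to reach reliability r_new?
    return (r_new * (1 - r_old)) / (r_old * (1 - r_new))

k = required_factor(0.85, 0.95)
print(round(k, 2))   # about 3.35, i.e. more than tripling the test's length
```

So a one-hour paper with reliability 0.85 would need to grow to well over three hours of comparable questions to reach 0.95, which is why the alternative of replacing weak items is attractive.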
Reliability could be increased in another way - by removing from the test all
those questions which had low discrimination and replacing them with ques-
tions with high discrimination. This can only be done if questions are pre-
tested, and might have the effect of narrowing the diversity of issues
represented in the test in order to homogenize it.
Indices based on such checks are often claimed to give the reliability of an
examination result. Such a claim is not justified, however, for it takes no account
of other possible sources of error. For example, the second source of error means
that the actual score achieved by a candidate on a given day could vary sub-
stantially from day to day. Again, this figure could be improved, but only by
setting the test in sections with each taken on different days. Data on this source
are hard to find so it is usually not possible to estimate its effect. It would seem
hard to claim a priori that it is negligible.
The third source - marker error - is dealt with in part by careful selection and
training of markers, in part by rigorous rules of procedure laid down for
markers to follow and in part by careful checks on samples of marked work.
Whilst errors due to this source could be reduced by double marking of every
script, this would also lead to very large increases both in the cost of examina-
tions and in the time taken to determine results. Particular cases of marker error
justifiably attract public concern, yet overall the errors due to this source are
probably small in comparison with the effects of the other sources listed here.
It is important to note, therefore, that the main limitations on the accuracy of
examination results are not the fault of testing agencies. All of the sources could
be tackled, but only if increases in costs, examining times and times taken to
produce results were to be accepted by the educational system. Such acceptance
seems most unlikely; in this, as in many other situations, the public gets what it
is prepared to pay for.
Evidence about reliability
Because there are few published studies relating to the reliability of public
examinations, the proportions of candidates awarded the 'wrong' grade on any
one occasion are not known. It is very surprising that there are no serious
attempts to research the effects of error in public examinations, let alone publish
the results.
The crucial criterion is therefore how close the score we get on a particular
testing occasion is to the 'true score', and given possible error in a final mark,
there follows the possibility that a candidate's grade, which is based on an inter-
pretation of that mark, will also be in error.
Thus this criterion is concerned with the inevitable chance of error in any
examination result. Four studies serve to illustrate the importance of this crite-
rion. The first is a study by Rogosa (1999) of standardized tests used in the state
of California. This shows that even for tests with apparently high indices of reli-
ability, the chances of a candidate being mis-classified are high enough to lead
to serious consequences for many candidates. His results were expressed in
terms of percentiles, a measure of position in the rank order of all candidates. If
a candidate is on (say) the 40th percentile this means that 40 per cent of all can-
didates have marks at or below the mark achieved by that candidate. His results
showed, for example, that in grade 9 mathematics there is only a 57 per cent
probability that candidates whose 'true score' would put them in the middle of
the rank order of candidates, that is, on the 50th percentile, will actually be clas-
sified as somewhere within the range 40th to 60th percentile, so that the other
43 per cent of candidates will be mis-classified by over ten percentile points. For
those under-classified, this could lead to a requirement to repeat a grade or to
attend a summer school. It could also result in assignment to a lower track in
school, which would probably prejudice future achievement. Of the three
sources of error listed above, this study explored the effects of the first only, that
is, error due to the limited sample of all possible questions.
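The scale of this effect is easy to reproduce with a rough simulation under classical test-theory assumptions. Everything below is illustrative, not drawn from Rogosa's analysis: the reliability of 0.9 is an assumed figure, yet even that generous value leaves only a modest chance of a median candidate being reported within ten percentile points of their true position:

```python
import math
import random

random.seed(0)

RELIABILITY = 0.9   # assumed for illustration; not Rogosa's figure
N = 100_000

# Classical model: observed = true + error. With true scores scaled to
# variance 1, an error variance of (1/r - 1) gives reliability r.
error_sd = math.sqrt(1 / RELIABILITY - 1)
observed_sd = math.sqrt(1 + error_sd ** 2)

def percentile(score):
    # Position in the rank order of all candidates (normal population).
    return 100 * 0.5 * (1 + math.erf(score / (observed_sd * math.sqrt(2))))

# Candidates whose true score is exactly the median (50th percentile):
within = sum(
    1 for _ in range(N)
    if 40 <= percentile(random.gauss(0, error_sd)) <= 60
)

share = within / N   # chance of being reported in the 40th-60th band
print(round(share, 2))
```

With these assumptions the share comes out in the high 50s per cent, the same order of magnitude as the 57 per cent Rogosa reports for grade 9 mathematics.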
The second is a report by Black (1963) of the use of two parallel forms of tests
for first-year physics undergraduates, the two being taken within a few days of
one another and marked by the same markers to common criteria. The tests
were designed to decide who should proceed to honours study. Out of 100 can-
didates, 26 failed the first paper and 26 failed the second, but only 13 failed
both. Half of those who would be denied further access to the honours course
on the one paper would have passed on the second, and vice versa. Until that
year decisions about proceeding to different courses had been taken on the
results of a single paper. The effects illustrated by this study could have arisen
from the first two sources of error listed above.
The third study has provided results which are more detailed and compre-
hensive. Gardner and Cowan (2000) report an analysis of the 11-plus selection
examination in Northern Ireland, where each candidate sits two parallel forms
of test with each covering English, mathematics and science. They were able to
examine both the internal consistency of each test and the consistency between
them. Results are reported on a six-grade scale and each selective (grammar)
school admits its applicants on the basis of their grade, starting with the
highest, and working down the grades until all the places are filled. Their
analysis shows that if one expects to infer a candidate's true grade from the
reported grade and one wants to be correct in this inference 95 per cent of the
time, then for a candidate in a middle grade one can say only that the true score
lies somewhere between the highest and the lowest grades (the '95 per cent con-
fidence interval' thus ranges from the lowest to the highest grade). For a candi-
date just in the highest grade the true score may be anywhere within the top
four grades; given that this is the 95 per cent confidence interval, 5 per cent of
students will be mis-classified by an even greater margin. Of course, for stu-
dents close to the threshold, even a small mis-classification might lead to the
wrong decision. Given that 6-7000 candidates secure selective places, it is likely
that around 3000 will be mis-classified to the extent that either secures or denies
acceptance of their entry to grammar school due to the unreliability of the test.
This study reflects the effects of all three possible sources of error.
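Confidence intervals of this kind follow from the standard error of measurement of classical test theory, SEM = SD x sqrt(1 - reliability). The figures below are purely illustrative (they are not the Northern Ireland test's published statistics), but they show how a respectable-looking reliability still yields a wide 95 per cent band:

```python
import math

def sem(sd, reliability):
    # Standard error of measurement (classical test theory).
    return sd * math.sqrt(1 - reliability)

# Illustrative values only - assumed, not taken from any real test:
sd, reliability = 15.0, 0.85
s = sem(sd, reliability)
half_width = 1.96 * s   # half-width of an approximate 95% interval

# A reported mark m is consistent, at the 95% level, with true scores
# anywhere in m +/- half_width - a band wide enough to straddle several
# grade boundaries on a six-grade scale.
print(round(s, 1), round(half_width, 1))
```

Here the SEM is about 5.8 marks, so the 95 per cent band spans roughly 23 marks in all, which makes the Gardner and Cowan finding of multi-grade uncertainty unsurprising.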
The fourth source of evidence was provided by an analysis carried out by
Wiliam (2001) of the key stage tests used in England at ages 7, 11 and 14 respec-
tively. He concluded that the chances of a student's level result being wrong by
one level were around 20-30 per cent - this being an underestimate as it was
based only on the variability revealed by the internal consistency of perform-
ances on the single test occasion. This example is discussed in more detail
below. This study is similar to that by Rogosa quoted above, in that it explores
only the effects of errors due to the limited sample of all possible questions.
One can note that three of the reliability studies quoted above were carried
out by critics outside the systems criticized. There have been no formal attempts
by governments or their agencies to conduct thorough research to establish the
reliabilities of high-stakes examinations. If this were to be done, it seems likely
that the resulting probabilities of mis-grading would be large enough to cause
some public concern. Thus it is essential that such research be undertaken and
the data made public. The following conclusion of Gardner and Cowan about
the Northern Ireland test applies with equal force to all of our public testing:
The published information on the test does not meet the requirements of the interna-
tional standards on educational testing, both generally in the provision of standard
reliability and validity information and particularly, for example, in the validation of
the Test outcomes in relation to its predictive power (for 'potential to
benefit from a grammar school education'), establishing norms, providing informa-
tion on potential mis-classification, and accommodating disability. (2000: 9)
Estimating the consequences: decision consistency
As noted above, an individual's true score on a test is simply the average score
that the individual would get over repeated takings of the same or a very
similar test. The issue to be explored in this section is the possible effect of
errors in their actual test scores on decisions taken about the classification of
123
Copyrighted Material
Assessment and Learning
candidates in grades or levels (for the examples quoted in the previous section
it was these consequences which were the focus of attention).
Knowing a student's mark on a test is not very informative unless we know
how difficult the test is. Because calibrating the difficulty of tests is complex, the
results of many standardized tests are reported on a standard scale, which
allows the performance of individuals to be compared with the performance of
a representative group of students who took the test at some point in the past.
When this is done, it is conventional to scale the scores so that the average score
is 100 and the standard deviation of the scores is 15. This means that:
68 per cent (that is, roughly two-thirds) of the population score between 85
and 115;
For the other 32 per cent, 16 per cent score below 85 and 16 per cent score
above 115;
96 per cent score between 70 and 130.
So we can say that the level of performance of someone who scores 115 on a
reading test would be achieved or surpassed by 16 per cent of the population,
or that this level of performance is at the 84th percentile.
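On a normal distribution these percentile statements follow directly from the standard scale; a minimal Python sketch (the mean of 100 and SD of 15 are the values given in the text):

```python
from statistics import NormalDist

# Standardized-score scale described in the text: mean 100, SD 15.
scale = NormalDist(mu=100, sigma=15)

def percentile(score: float) -> float:
    """Percentage of the population scoring at or below `score`."""
    return 100 * scale.cdf(score)

print(round(percentile(115)))  # 84 - the 84th percentile; 16% score above 115
print(round(percentile(85)))   # 16 - the mirror-image point one SD below the mean
```
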
From this it would be tempting to conclude that someone who scored 115 on
the test really is in the top 16 per cent of the population, but this may not be the
case because of the unreliability of the test. To explore the consequences of error
in the score of any candidate, the first step is to examine the internal consisten-
cies amongst the test's scores. This can be used to calculate the conventional
measure known as the 'reliability coefficient'. A value for this coefficient of 1.0
means that the errors are zero, so there is no error and the test is perfectly reli-
able. A coefficient of 0.0 means that the errors are very variable and the spread
in their likely values is the same as that of the observed scores, that is, the scores
obtained by the individuals are all error so there is no information about the
individuals at all! When a test has a reliability of zero the result of the test is
completely random.
The reliability of tests produced in schools is typically around 0.7 to 0.8 while
that for commercially produced educational tests ranges from 0.8 to 0.9, and can
be over 0.9 for specialist psychological tests (a reputable standardized test will
provide details of the reliability and how it was calculated). To see what this
means in practice, it is useful to look at some specific kinds of tests.
If we assume a value for the reliability of a test, then we can estimate how far
the observed score is likely to be from the true score. For example, if the relia-
bility of a test is 0.75, then the standard deviation (SD) of the errors (a measure
of the spread in the errors) turns out to be 7.5 (see note 2). The consequences of this for the
standardized test will be that:
For 68 per cent of the candidates their actual scores will be within 7.5 (that is,
one 5D) of their true scores;
For 96 per cent of the candidates their actual scores will be within 15 (that is,
two SDs) of their true scores;
124
For 4 per cent of the candidates their actual score will be at least 15 away
from their true score.
For most students in a class of 30, their actual score will be close to their true
score (that is, what they 'should' have got), but it is likely that for at least one
the score will be 'wrong' by 15 points (but of course we do not know who this
student is, nor whether the score they got was higher or lower than their true
score). For a test with a reliability of 0.75, this means that someone who scores
115 (who we might think is in the top sixth of the population) might on another
occasion score just 100, making them appear average, or as high as 130, putting
them in the top 2 per cent (often used as the threshold for considering a student
'gifted'). If the reliability were higher then this spread in the errors would be
smaller - for a reliability of 0.85, the above value of 7.5 for the SD would be
replaced by a value of 6.
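The error calculation used here can be sketched in a few lines; the formula SEM = SD × √(1 − reliability) is the 'key formula' referred to in note 2, and the SD of 15 and the two reliability values come from the text:

```python
import math

SD = 15  # standard deviation of the standardized score scale

def error_sd(reliability: float, sd: float = SD) -> float:
    """Standard deviation of the errors (the standard error of measurement):
    SEM = SD * sqrt(1 - reliability)."""
    return sd * math.sqrt(1 - reliability)

print(round(error_sd(0.75), 1))  # 7.5 - the value used in the text
print(round(error_sd(0.85), 1))  # 5.8 - 'just under 6'
```
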
Because the effects of unreliability operate randomly, the averages across
groups of students, however, are quite accurate. For every student whose actual
score is lower than their true score there is likely to be one whose actual score
is higher than their true score, so the average observed score across a class of
students will be very close to the average true score. But just as the person with
one foot in boiling water and one foot in ice is quite comfortable 'on average',
we must be aware that the results of even the best tests can be wildly inaccurate
for a few individual students, and therefore high-stakes decisions should never
be based solely on the results of single tests.
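A small simulation illustrates both points at once: individual scores can be badly wrong while the class average stays close to its true value. The class size of 30 and the SEM of 7.5 (reliability 0.75) follow the text; the assumption of normally distributed true scores and errors is ours:

```python
import random

random.seed(1)
SD_TRUE, SEM = 15, 7.5  # reliability 0.75 on the standardized scale

# One class of 30: individual errors can be large...
true_scores = [random.gauss(100, SD_TRUE) for _ in range(30)]
observed = [t + random.gauss(0, SEM) for t in true_scores]
worst = max(abs(o - t) for o, t in zip(observed, true_scores))

# ...but the errors largely cancel out across the group.
avg_gap = abs(sum(observed) / 30 - sum(true_scores) / 30)
print(round(worst, 1), round(avg_gap, 1))  # worst individual error vs. gap in averages
```

On any run the worst individual error is many times larger than the error in the class average, which is the chapter's point about groups versus individuals.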
Making sense of reliability for the key stage tests used in England is harder
because these are used to assign levels rather than marks, for good reason. It is
tempting to regard someone who gets 15 per cent in a test as being better than
someone who gets 14 per cent, even though the second person might actually have
a higher true score. In order to avoid unwarranted precision, therefore, we often
just report levels. The danger, however, is that in avoiding unwarranted precision
we end up falling victim to unwarranted accuracy - while we can see that a mark
of 15 per cent is only a little better than 14 per cent, it is tempting to conclude that
level 2 is somehow qualitatively better than level 1. Firstly, the difference in per-
formance between someone who scored level 2 and someone who scored level 1
might be only a single mark, and secondly, because of the unreliability of the test,
the person scoring level 1 might actually have had a higher true score.
Only limited data have been published about the reliability of national cur-
riculum tests, although it is likely that the reliability of national curriculum tests
is around 0.80 - perhaps slightly higher for mathematics and science. Assum-
ing this reliability value, it is possible to calculate the proportion of students
who would be awarded the 'wrong' levels at each key stage of the national cur-
riculum. The proportion varies as a result of the unreliability of the tests as
shown in Table 7.1.
It is clear that the greater the precision (that is, the more different levels into
which students are to be classified as they move from KS1 to KS3) the lower the
accuracy. What is also clear is that although the proportion of mis-classifications
declines steadily as the reliability of a test increases, the improvement is very slow.
125
Table 7.1: Changes in the proportion of students misclassified in national curriculum tests with test reliability

Reliability of Test     0.60  0.65  0.70  0.75  0.80  0.85  0.90  0.95

Percentage (%) of Students Misclassified at Each Key Stage
KS1                       27    25    23    23     …     …     …     …
KS2                        …     …     …    36     …    27    23     …
KS3                       55    53     …     …     …     …     …     …
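A rough decision-consistency simulation in the spirit of Table 7.1 reproduces both patterns described in the text: finer classifications are misclassified more often, and raising reliability improves accuracy only slowly. The level boundaries and score scale below are illustrative assumptions of ours, not the actual key stage cut-scores:

```python
import math
import random

random.seed(42)

def misclassified_pct(reliability: float, n_levels: int, trials: int = 20_000) -> float:
    """Share of simulated students whose observed level differs from their
    true level, with true scores N(100, 15) and error SD = 15*sqrt(1 - rel)."""
    sem = 15 * math.sqrt(1 - reliability)
    # Illustrative equal-width level boundaries between 70 and 130.
    cuts = [70 + 60 * i / n_levels for i in range(1, n_levels)]

    def level(score: float) -> int:
        return sum(score > c for c in cuts)

    wrong = 0
    for _ in range(trials):
        true = random.gauss(100, 15)
        observed = true + random.gauss(0, sem)
        wrong += level(true) != level(observed)
    return 100 * wrong / trials

# More levels (finer classification) -> more misclassification...
print(round(misclassified_pct(0.80, 3)), round(misclassified_pct(0.80, 6)))
# ...and a large gain in reliability buys only a modest gain in accuracy.
print(round(misclassified_pct(0.90, 6)))
```
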
We can make tests more reliable by improving the items included in the tests
and by making the marking more consistent, but in general the effect of such
changes is small. There are only two ways of achieving a significant increase in
the reliability of a test: make the scope of the test narrower so you ask more
questions on fewer topics, or make the test longer so you ask more questions on
all of the topics.
It turns out (see note 3) that if we have a test with a reliability of 0.75, and we want to
make it into a test with a reliability of 0.85, we would need a test 1.9 times as
long. In other words, doubling the length of the test would reduce the propor-
tion of students misclassified by only 8 per cent at Key Stage 1, by 9 per cent at
Key Stage 2 and by 4 per cent at Key Stage 3. It is clear that increasing the reli-
ability of the test has only a small effect on the accuracy of the levels. In fact, if
we wanted to improve the reliability of Key Stage 2 tests so that only 10 per cent
of students were awarded the incorrect level, we should need to increase the
length of the tests in each subject to over 30 hours.
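The lengthening calculation above uses the Spearman-Brown relation given in note 3; a one-function sketch:

```python
def lengthening_factor(r: float, R: float) -> float:
    """Spearman-Brown: the factor n by which a test of reliability r must be
    lengthened to reach reliability R, n = R(1 - r) / (r(1 - R))."""
    return R * (1 - r) / (r * (1 - R))

print(round(lengthening_factor(0.75, 0.85), 2))  # 1.89 - 'a test 1.9 times as long'
```
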
Now it seems unlikely that even the most radical proponents of school tests
would countenance 30 hours of testing for each subject. In survey tests, which
use only limited samples of students to obtain an overall evaluation of students'
achievement, it is possible by giving different tests to different sub-samples to
use 30 hours of testing (see, for example, the UK Assessment of Performance
Unit surveys - Black, 1990). However, the reliability of the overall test perform-
ance of a group will be far higher than that for any one individual, so that in
optimizing the design of any such survey the extra testing time available has
been used mainly to increase the variety of performance outcomes assessed,
that is, to enhance validity.
Fortunately, there is another way of increasing the effective length of a test,
without increasing testing time, and that is through the use of teacher assess-
ment. By doing this we would, in effect, be using assessments conducted over
tens if not hundreds of hours for each student so that there would be the poten-
tial to achieve a degree of reliability that has never been achieved in any system
of timed written examinations. This possibility has to be explored in the light of
evidence about the potential reliability of teachers' summative assessments. A
review of such evidence (see Harlen, 2004; ASF, 2004) does show that it is pos-
sible to achieve high reliability if the procedures by which teachers arrive at
summative judgments are carefully designed and monitored.
The overlap between reliability and validity
There are several issues affecting the interpretation of assessment results that
involve overlap between the concepts of reliability and validity. One such issue
bears on whether or not questions have been so composed and presented that
the student's response will give an authentic picture of the capability being
tested - a feature which may be called the 'disclosure of a question' (Wiliam,
1992). Good disclosure is not easy to attain. For example, several research
studies have established that in multiple-choice tests in science many of those
making a correct choice among the alternatives had made their selection on the
basis of incorrect reasoning, whilst others had been led to a wrong choice by
legitimate reasoning combined with unexpected interpretations of the question.
It would seem that in such tests approximately one third of students are incor-
rectly evaluated on any one question (Tamir, 1990; Towns and Robinson, 1993;
Yarroch, 1991). It has also been shown, for open-ended questions, that misin-
terpretation frequently leads candidates to fail to display what they know and
understand (Gauld, 1980). This source of error might arise from a random
source, for example careless reading by the student, and might have less impact
if the student were to attempt a larger number of questions; it would then
become a reliability issue. However, it might reflect a systematic weakness in
the reading and/or interpretation of questions, which is not relevant to the per-
formance that the test is designed to measure; it would then be a validity issue.
A similar ambiguity of overlap arises in considering the use of tests for pre-
diction. For example, we might, like most secondary schools in the UK, want to
use the results of IQ or aptitude tests taken at the age of 11 to predict scores on
GCSE examinations taken at 16, or use such tests at the end of high school to
predict performance in tertiary level work (Choppin and Orr, 1976). What we
would need to do would be to compare the GCSE scores obtained by students
at age 16 with those scores which the same students obtained on the IQ tests
five years earlier, when they were 11. In general we would find that those who
got high scores in the IQ tests at 11 get high grades in GCSE, and low scorers
get lower grades. However, there will also be some students getting high scores
on the IQ tests who do not go on to do well at GCSE and vice versa. How good
the prediction is - often called the 'predictive validity' of the test - is usually
expressed as a correlation coefficient. A correlation of one means the correlation
is perfect, while a correlation of zero would mean that the predictor tells us
nothing at all about the criterion. Generally, in educational testing, a correlation
of 0.7 between predictor and criterion is regarded as good.
In interpreting these coefficients, care is often needed because they are fre-
quently reported after 'correction for unreliability'. The validity of IQ scores as
predictors of GCSE is usually taken to mean the correlation between true scores
on the predictor and true scores on the criterion. However, as we have seen, we
127
never know the true scores - all we have are the observed scores and these are
affected by the unreliability of the tests. When someone reports a validity coef-
ficient as being corrected for unreliability, they are quoting the correlation
between the true scores on the predictor and criterion by applying a statistical
adjustment to the correlation between the observed scores, which will appear to
be much better than we can actually do in practice because the effects of unre-
liability are inescapable. For example, if the correlation between the true scores
on a predictor and a criterion - that is, the validity 'corrected for unreliability'
- is 0.7, but each of these is measured with tests of reliability 0.9, the correlation
between the actual values on the predictor and the criterion will be less than 0.6.
A decline from 0.7 to 0.6 might seem small, but it should be pointed out that the
proportion of the common variance in the results depends on the square of the
correlation coefficient, so that in this case there will be a decrease from 49 per
cent to 35 per cent in the variance in the scores that is common to the two tests.
A similar issue arises in the common practice of using test results to select
individuals. If we use a test to group a cohort of 100 students into four sets for
mathematics, with, say, 35 in the top set, 30 in set 2, 20 in set 3 and 15 in set 4,
how accurate will our setting be? If we assume that our selection test has a pre-
dictive validity of 0.7 and a reliability of 0.9, then of the 35 students that we
place in the top set, only 23 should actually be there - the other 12 should be in
sets 2 or 3. Perhaps more importantly, given the rationale used for setting, 12
students who should be in set 1 will actually be placed in set 2 or even set 3.
Only 12 of the 30 students in set 2 will be correctly placed there - nine should
have been in set 1 and nine should have been in sets 3 and 4. The complete sit-
uation is shown in Table 7.2.
Table 7.2: Accuracy of setting with a test of validity of 0.7

                           Number of Students that Should be Placed in Each Set
                           Set 1    Set 2    Set 3    Set 4
Set in which Students
Are Actually Placed
  Set 1                      23        9        3        0
  Set 2                       9       12        6        3
  Set 3                       3        6        7        4
  Set 4                       0        3        4        8
In other words, because of the limitations in the reliability and validity of the test,
only half of the students are placed where they 'should' be. Again, it is worth
noting that these are not weaknesses in the quality of the tests but fundamental
limitations of what tests can do. If anything, the assumptions made here are
rather conservative - reliabilities of 0.9 and predictive validities of 0.7 are at the
limit of what we can achieve with current methods. As with national curriculum
126
testing, the key to improved reliability lies with increased use of teacher assess-
ment, standardized and moderated to minimize the potential for bias.
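The setting example can be checked with a simulation. This sketch simplifies the chapter's assumptions by treating 0.7 directly as the correlation between the test score and the criterion ability; the set sizes of 35/30/20/15 follow the text:

```python
import math
import random

random.seed(7)
N = 10_000
CUTS = (0.35, 0.65, 0.85)  # cumulative set sizes: 35/30/20/15 per cent
VALIDITY = 0.7             # assumed test-criterion correlation

# Bivariate-normal sketch: a criterion ability, and a test score built to
# correlate 0.7 with it.
ability = [random.gauss(0, 1) for _ in range(N)]
score = [VALIDITY * a + math.sqrt(1 - VALIDITY**2) * random.gauss(0, 1)
         for a in ability]

def to_sets(values):
    """Assign each student to a set (1 = top) by rank on `values`."""
    order = sorted(range(N), key=lambda i: values[i], reverse=True)
    set_of = [0] * N
    for rank, i in enumerate(order):
        set_of[i] = 1 + sum(rank / N >= c for c in CUTS)
    return set_of

should_be, placed = to_sets(ability), to_sets(score)
correct = sum(s == p for s, p in zip(should_be, placed)) / N
print(round(100 * correct))  # roughly half are placed in the 'right' set
```
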
A different issue in the relationship between reliability and validity is the
'trade-off' whereby one may be enhanced at the expense of the other. An
example here is the different structures in the UK GCSE (see note 5) papers for different
subjects. A typical 90-minute paper in science includes approximately 12 struc-
tured questions giving a total of about 50 sub-sections. For each of these, the
space allowed on the examination paper for the response is rarely more than
four lines. The large number of issues so covered, and the homogeneity in the
type of response demanded, help to enhance reliability but at the expense of
validity, because there is no opportunity for candidates to offer a synthesis or
comprehensive discussion in extended prose. By contrast, a paper in (say) a
social science may require answers to only two or three questions. This makes
the test valid in prioritizing modes of connected thinking and writing, but
undermines reliability in that some candidates will be 'unlucky' because these
particular questions, being a very small sample of the work studied, are based
on topics that they have not studied thoroughly.
An extreme example of this trade-off is the situation where reliability is dis-
regarded because validity is all important. A PhD examination is an obvious
example; the candidate has to show the capability to conduct an in-depth explo-
ration of a chosen problem. The inference that may be made is that someone
who can do this is able to 'do research', that is, to do work of similarly good
quality on other problems. This will be a judgment by the examiners: there is
only a single task, and no possibility of looking for consistency of performance
across many tasks, so that estimates of reliability are impossible. Here, validity
does not depend on reliability. This is quite unlike the case of inferring that a
GCSE candidate is competent in mathematics on the basis of an aggregate score
over the several questions attempted in a written test. Here, reliability is a prior
condition - necessary but not sufficient - for achieving validity. There are more
complex intermediate cases, for example if a certificate examination is com-
posed of marks on a written paper and an assessment of a single substantial
project; in such a case, the automatic addition of a test paper score and a project
score may be inappropriate.
Reliability for formative assessments
A different arena of overlap is involved in the consideration of formative assess-
ment, given that all of the above discussion arises in relation to summative assess-
ments. The issues here are very different from the summative issues. Any
evidence here is collected and interpreted for the purpose of guiding learning on
the particular task involved, and generalization across a range of tasks to form an
overall judgment is irrelevant. Furthermore, inadequate disclosure which intro-
duces irrelevant variation in a test and thereby reduces reliability (as well as
validity) is less important if the teacher can detect and correct for it in continuing
interaction with the learner. However, some formative assessment takes place
over longer time intervals than that of interactions in (say) classroom dialogue
129
and involves action in response to a collection of several pieces of evidence. One
example would be response to a class test, when a teacher has to decide how
much time to give to remedial work in order to tackle weaknesses revealed by
that test: here, both the reliability and validity of the test will be at issue, although
if short-term interactions are to be part of any 'improvement' exercise any short-
comings in the action taken should become evident fairly quickly and can be cor-
rected immediately. This issue was expanded more fully in Wiliam and Black
(1996), where the argument was summed up as follows:
As noted above, summative and formative functions are, for the purpose of this dis-
cussion, characterized as the ends of a continuum along which assessment can be
located. At one extreme (the formative) the problems of creating shared meanings
beyond the immediate setting are ignored; assessments are evaluated by the extent
to which they provide a basis for successful action. At the other extreme (the sum-
mative) shared meanings are much more important, and the considerable distor-
tions and undesirable consequences that arise are often justified by appeal to the
need to create consistency of interpretation. Presenting this argument somewhat
starkly, when formative functions are paramount, meanings are often validated by
their consequences, and when summative functions are paramount, consequences
are validated by meanings. (1996: 544)
Conclusion
This chapter ends with incomplete arguments because of the overlaps between
reliability and validity. Of importance here is dependability, which is essentially
an overall integrating concept in which both reliability and validity are sub-
sumed. It follows that a comprehensive consideration of this overall issue
belongs to the next chapter on validity.
However, the issues discussed here are clearly of great importance and ought
to be understood both by designers of assessment systems and by users of test
results. One arena in which this has importance is the use of assessment results
within schools as guidance for students and decisions about them. The fact that
teachers may be unaware of the limited reliability of their own tests is thus a
serious issue. Where assessment results are used for decisions beyond schools,
knowledge of reliability is also important. The fact that, at least for public exam-
inations in the UK, reliability is neither researched nor discussed is a serious
weakness. Data on reliability must be taken into account in designing test
systems for optimum 'trade-off' between the various constraints and criteria
that determine dependability. In the absence of such data, optimum design is
hardly possible because it is not possible to evaluate fully alternative design
possibilities. As emphasized at the beginning of this chapter, this absence is also
serious because all users can be seriously misled. For example, decisions that
have an important effect on a student's future may be taken by placing more
trust in a test score than in other evidence about that student, when such trust
is clearly not justified.
130
Overall, one consequence of the absence of reliability data is that most teach-
ers, the public in general, and policy makers in particular do not understand or
attend to test reliability as an issue, and some are indeed reluctant to promote
research into reliability because of a fear that it will undermine public confi-
dence in examinations. Of course, it may well do so to an unreasonable degree
where the media and the public generally do not understand concepts of uncer-
tainty and error in data. A debate that promotes the development of such
understanding is long overdue.
Notes
1 In examination jargon, the whole collection of possible questions is called a
'domain', and the issue just discussed is called 'domain sampling'. In any
subject, it is possible to split the subject domain (say physics) into several sub-
domains, either according to content or to types of attainment (say under-
standing of concepts, application in complex problems, design of experi-
ments). One might then test each domain separately with its own set of ques-
tions and report on a student's attainment in each of these domains separately,
so giving a profile of attainments instead of a single result. However, if this is
done by splitting up but not increasing the total testing time, then each
domain will be tested by a very small number of questions so that the score
for each element of the profile will be far less reliable than the overall score.
2 Since the standard deviation of the scores is 15, for a reliability of 0.85, from
our key formula we can say that the standard deviation of the errors is:
√(1 − 0.85) × 15
which is just under 6. Similarly, for a reliability of 0.75, the same formula will
give a value of 7.5.
3 In general, if we have a test of reliability r and we want a reliability of R, then
we need to lengthen the test by a factor of n given by:
n = R(1 − r) / r(1 − R)
4 The classification consistency increases broadly as the fourth root of the test
length, so a doubling in classification consistency requires increasing the
test length 16 times.
5 GCSE is the General Certificate of Secondary Education which comprises a
set of subject examinations, from which most students in secondary schools
in England, Wales and Northern Ireland at age 16, that is, at the end of com-
pulsory education, choose to take a few (generally between 4 and 7) subjects.
131
Chapter 8
The Validity of Formative Assessment
Gordon Stobart
The deceptively simple claim of this chapter is that for formative assessment to be
valid it must lead to further learning. The validity argument is therefore about the
consequences of assessment. The assumption is that formative assessment gener-
ates information that enables this further learning to take place - the 'how to get
there' of our working definition of assessment for learning (see the Introduction).
One implication of this is that assessments may be formative in intention but are
not so in practice because they do not generate further learning.
This 'consequential' approach differs from how the validity of summative
assessments is generally judged. Here the emphasis is on the trustworthiness of
the inferences drawn from the results. It is about the meaning attached to an
assessment and will vary according to purpose. Reliability is more central to
this because if the results are unreliable, then the inferences drawn from them
will lack validity (see Chapter 6).
This chapter examines current understandings of validity in relation to both
summative and formative assessment. It then explores the conditions that
encourage assessment for learning and those which may undermine it. Two key
factors in this are the context in which learning takes place and the quality of
feedback. The learning context includes the socio-cultural and policy environ-
ment as well as what goes on in the classroom. Feedback is seen as a key
element in the teaching and learning relationship. These factors relate directly
to the treatment of 'making learning explicit' in Chapter 2, to motivation in
Chapter 4 and the formative-summative relationship in Chapter 6. Reliability in
formative assessment is discussed in Chapter 7.
Validity
Most of the theorizing about validity relates to testing and this will be used as
the basis for looking at validity in formative assessment. In relation to testing,
validity is no longer simply seen as a static property of an assessment, which is
something a test has, but is based on the inferences drawn from the results of
an assessment. This means that each time a test is given, the interpretation of
the results is part of a 'validity argument'. For example, if a well-designed
mathematics test is used as the sole selection instrument for admission to art
133
school we may immediately judge it as an invalid assessment. If it is used to
select for mathematics classes then it may be more valid. It is how the assess-
ment information is understood and used that is critical. At the heart of current
understandings of validity are assumptions that an assessment effectively
samples the construct that it claims to assess. Is the assessment too restricted in
what it covers or does it actually assess different skills or understandings to
those intended? This is essentially about fitness-for-purpose. It is a property of
the test scores rather than the test itself. The 1985 version of the American Edu-
cational Research Association's Standards for Educational and Psychological Testing
was explicit on this: 'validity always refers to the degree to which ... evidence
supports the inferences that are made from the scores' (1985: 9).
This approach was championed by Messick:
Validity is an integrated evaluative judgment of the degree to which empirical evi-
dence and theoretical rationales support the adequacy and appropriateness of infer-
ences and actions based on test scores or other modes of assessment. (1989: 13)
The validity of summative assessment is therefore essentially about trustwor-
thiness, how well the construct has been assessed and the results interpreted.
This brings into play both the interpretation of the construct and the reliability
of the assessment. Any unreliability in an assessment weakens confidence in the
inferences that can be drawn. If there is limited confidence in the results as a
consequence of how the test was marked or how the final grade was decided,
then its validity is threatened.
However, a test may be highly reliable yet sample only a part of a con-
struct. We can then have only limited confidence in what it tells us about a
student's overall understanding. Take the example of a reading test. To be valid
this test must assess competence in reading. But what do we mean by reading?
Thirty years ago a widely used reading test in England was the Schonell
Graded Word Reading Test, which required readers to pronounce correctly
single decontextualized words of increasing difficulty (for example, tree - side-
real). The total of correctly read words, with the test stopping after ten consec-
utive failures, was then converted into a 'Reading Age'. By contrast, the current
national curriculum English tests for 11-year-olds in England are based on a
construct of reading that focuses on understanding of, and making inferences
from, written text. Responses are written and there is no 'reading out loud'
involved. Clearly a key element in considering validity is to agree the construct
that is being assessed. Which of the above provides a more valid reading score
or do both suffer from 'construct under-representation' (Messick, 1989)?
This approach links to the previous two chapters. Because the purpose of the
assessment is a key element in validity - 'it does what it claims to do' - then the
validity argument differs for formative and summative assessment (see
Chapter 6). In formative assessment it is about consequences - has further
learning taken place as a result of the assessment? In summative assessment it
is the trustworthiness of the inferences that are drawn from the results - does
our interpretation of students' results do justice to their understanding? On this
134
The Validity of Formative Assessment
basis much of what is discussed as reliability in Chapter 7 is subsumed into
validity arguments.
Threats to validity
This chapter uses the idea of 'threats to validity' (Crooks et al., 1996) to explore
where the validity argument may be most vulnerable. Crooks et al. use an
approach which sees the validity process as a series of linked stages. The
weakest link in the chain is the most serious threat to validity. If there is limited
confidence in the results as a consequence of a highly inconsistent administra-
tion of a test, then this may be the most important threat to validity. Or it could
be that a test was fairly marked and graded but the interpretation of the results,
and the decisions made as a consequence, were misguided and therefore under-
mine its validity.
One of the key threats to validity in test-based summative assessment is that,
in the quest for highly reliable assessment, only the more easily and reliably
assessed parts of a construct are assessed. So speaking and listening may be left
out of language tests because of reliability issues, and writing may be assessed
through multiple-choice tests. While reliability is necessary for validity, a highly
reliable test may be less valid because it sampled only a small part of the construct
– so we cannot make confident generalizations from the results. One of the strong
arguments for summative teacher assessment is that it does allow a construct to
be more fully, and repeatedly, sampled. So even if it may seem less reliable
because it cannot be standardized as precisely as examination marking, it may be
a more dependable assessment of the construct being measured (see Chapter 7).
A second major threat to validity is what Messick obscurely calls 'construct
irrelevant variance'. If a test is intended to measure reasoning skills but
students can do well on it by rote learning of prepared answers, then it is not
doing what it claims it is doing; it therefore lacks validity. This is because
success has come from performance irrelevant to the construct being assessed
(Frederiksen and Collins, 1989).
This is important because it means that learning cannot simply be equated
with performance on tests. Good scores do not necessarily mean that effective
learning has taken place. This was evidenced by Gordon and Reese's
conclusions on their study of the Texas Assessment of Academic Skills, which
students were passing

even though the students have never learned the concepts on which they are being
tested. As teachers become more adept at this process, they can even teach students
to correctly answer test items intended to measure students' ability to apply, or
synthesize, even though the students have not developed application, analysis or
synthesis skills. (1997: 364)
If increasing proportions of 11-year-olds reach level 4 on the national curricu-
lum assessment in England, have educational standards risen? There is limited
public recognition that there may be an 'improved test taking' factor that
accounts for some of this, and there may not be the same degree of improvement
on other, similar, measures for which there has not been extensive preparation
(Tymms, 2004; Linn, 2000).
This brief review of how validity is being interpreted in summative assessment
provides a framework for considering the validity of formative assessments. Here
validity arguments go beyond the focus on the inferences drawn from the results
to consider the consequences of an assessment, a contested approach in relation
to summative assessment (see Shepard, 1997; Popham, 1997).
Valid formative assessment
It is consequential validity which is the basis for validity claims in formative
assessment. By definition, the purpose of formative assessment is to lead to
further learning. If it fails in this then, while the intention was formative, the
process was not (Wiliam and Black, 1996; Wiliam, 2000). This is a strict, and cir-
cular, definition which implies that validity is central to developing or practis-
ing formative assessment.
Validity in formative assessment hinges on how effectively this learning
takes place. What gets in the way of this further learning can be treated as a
threat to the validity of formative assessment. This parallels processes for
investigating the validity of tests (Crooks et al., 1996) and of teacher-based
performance assessments (Kane et al., 1999). In the following sections some of
the key factors that may support or undermine formative assessment are
briefly considered. The learning context in which formative assessment takes
place is seen as critical. This includes what goes on outside the classroom, the
social and political environment, as well as expectations about what and how
teachers teach and learners learn within the classroom. At a more individual
level, feedback has a key role in formative assessment. What we know about
successful feedback is discussed, along with why some feedback practices may
undermine learning.
The learning context
If validity is based on whether learning takes place as a consequence of an
assessment, what can encourage or undermine this learning? Perrenoud (1998)
has argued that formative assessment is affected by what goes on 'upstream' of
specific teacher-learner interactions and that this context is often neglected,
partly because it is so complex. Some of the cultural assumptions on which
assessment for learning is based are largely a product of developed anglophone
cultures (particularly Australia, New Zealand, the UK and the USA), with their
'whole child' approaches, individualism and attitude to motivation. It is there-
fore worth briefly considering how some different social and cultural factors
may affect what goes on in the classroom, since these are likely to provide dif-
fering threats to effective formative assessment.
Outside the classroom
At the macro level, the role and status of education within a society will impact
on students' motivation to learn and the scope for formative assessment. In a
society where high value is placed on education, for example in Chinese edu-
cation, student motivation to learn may be a 'given' (Watkins, 2000) rather than
having to be fostered by schools, as is often the assumption in many UK and
North American schools (Hidi and Harackiewicz, 2000; ARG, 2002a). Similarly,
the emphasis on feedback being task-related rather than self-related may sit
more comfortably in cultures which see the role of the teacher as to instruct (for
example, France) rather than as to care for the 'whole child' (for example the
UK; see Raveaud, 2004). There are also cultural differences around the extent to
which education may be seen as a collective activity, with group work as a
natural expression of this, or as an individualistic activity in which peer assess-
ment may seem alien (Watkins, 2000).
The curriculum and how it is assessed is another key 'outside' contextual
factor. The opportunities for formative assessment in a centralized curriculum
with high-stakes national testing will be different from those for teachers who
enjoy more autonomy over what they have to cover and how they assess it. In
Chapter 6 the question is raised as to whether an outcomes-based/criterion-
related curriculum, with its attempts to make learning goals and standards
explicit, provides improved opportunities for formative assessment.
Inadequate training and resources are obvious threats to formative
assessment. In many large and badly resourced classrooms, ideas of individual
feedback or of regular group work are non-starters. In some countries very
large classes may be taught by teachers with limited subject knowledge who
are teaching an unfamiliar curriculum – all of which will limit potential (Meier,
2000).
The culture of schooling will also impact on the effectiveness of
formative assessment. Entrenched views on teaching and learning may
undermine or support formative assessment, as might deeply embedded
assessment practices. For example, in a culture where the dominant model of
teaching is didactic, moves towards peer and self-assessment by learners may
involve radical, and managerially unpopular, changes to the classroom ethos
(Carless, 2005; Black et al., 2003). Similarly, where continuous assessment by
teachers is high stakes, determines progression and is well understood by
parents, any move by a classroom teacher to provide feedback through
comments rather than marks or grades is likely to meet resistance (Carless,
2005). This may come from both outside and inside the school – parents may
see the teachers as not doing their jobs properly, and the students may not co-
operate because work that does not receive a mark does not count, and so is
not worth doing.
These are just a few examples of the ways in which the social and educational
context will shape and control what is possible inside the classroom and in indi-
vidual teacher-student interactions. These social and cultural factors will con-
dition how effective formative assessment in the classroom will be.
Inside the classroom: learning context and feedback
For formative assessment to lead to learning, the classroom context has to be
supportive and the feedback to the learner productive. If the conditions militate
against learning they become threats to validity.
Crooks (2001) has outlined what he considers are the key issues that influ-
ence the validity of formative assessment in the classroom. He groups these into
four main factors: affective, task, structural and process. To reflect the themes of
this book they can be organized in terms of trust and motivation; explicit learn-
ing; and the formative/summative relationship.
Trust and motivation
Chapter 4 demonstrates that learning involves trust and motivation. For
Crooks, trust implies supportive classroom relationships and attitudes where
the student feels safe to admit difficulties and the teacher is constructive and
encouraging. Motivation involves both teacher commitment to the student's
learning and the student's own wish to learn and improve. There may also be a
strong contextual element in this, so that it will also fluctuate across different
situations – I may be much more willing to learn in drama than I am in maths.
Crooks's approach to affective factors may reflect the cultural assumptions of
English-speaking industrialized societies. In other cultures, motivation and trust
may be differently expressed. For example, commentaries on Russian schooling
suggest a very different attitude to praise and to building self-esteem. Alexander
observes that while there was only a handful of praise descriptors in Russian, 'the
vocabulary of disapproval is rich and varied' (2000: 375). Yet in this more critical
climate, there is strong evidence of Russian students pursuing mastery goals and
being willing to risk mistakes. Hufton and Elliott report from their comparative
study that 'it was not unusual for students who did not understand something,
to request to work in front of the class on the blackboard, so that teacher and
peers could follow and correct their working' (2001: 10). Raveaud's (2004) account
of French and English primary school classes offers some similar challenges to
anglophone understandings of fostering trust and motivation.
It may be that we need a more robust view of trust to make sense of such find-
ings. The Russian example of trust may be a lot less likely to occur in cultures in
which teachers seek to minimize the risk of error and to protect learners' self-
esteem. The trust seems to be based on the assumption that the teacher is there to
help them learn but is not necessarily going to rescue them immediately from mis-
takes or misunderstandings. It is this kind of trust that makes the idiosyncratic and
unplanned formative interactions in the classroom powerful; there is confidence in
the teacher who has, in turn, confidence in the student's capacity to learn.
Explicit learning
An element of this trust is that the teacher knows what is to be learned. Explicit
learning incorporates the teacher's knowledge and understanding of the task,
the criteria and standards that are to be met and how effectively these are com-
municated to the learners. Clarke (2001, 2005) has continued to draw attention
to the importance of being explicit about 'learning intentions'. The importance
of subject knowledge may have been underplayed in some of the earlier writ-
ings on assessment for learning, though there is now an increased recognition
of the importance of pedagogical content knowledge (see Chapter 2).
The reason for being more explicit about learning intentions is, in part, to
engage the student in understanding what is required. One of the threats to
validity is that students do not understand what they are supposed to be learn-
ing and what reaching the intended standard will involve. A recent survey of
13-year-olds in England (Stoll et al., 2003), which involved 2,000 students, asked
'what helps you learn in school?'. The largest group of responses involved the
teacher making clear what was being learned:
My science teacher and English teacher writes out aims for the lesson, which
helps understand what we are going to do in lesson. I don't like teachers
who just give you a piece of work and expect us to know what to do, they have to
explain the piece of work. (2003: 62-3)
Other work with students has also brought home how bewildered some are
about what is being learned:

It's not that I haven't learnt much. It's just that I don't really understand what
I'm doing. (15-year-old student, Harris et al., 1995)
Understanding 'where they need to get to in their learning' is a key element in the
definition of assessment for learning (ARG, 2002a). For Sadler (1989) it is this
understanding of where they need to get to ('the standard') that is critical to suc-
cessful feedback. When we are not sure what is needed, it is hard to make sense
of feedback. At a more theoretical level this can be linked to construct validity –
what is to be learned and how does this relate to the domain being studied?
Dilemmas in 'making explicit'
How explicit should learning intentions be? How do we strike a balance which
encourages deep learning processes and mastery learning? If the intentions are
too general the learner may not be able to appreciate what is required. If they
are too specific this may lend itself to surface learning of 'knowledge in bits'.
The level descriptions used in the assessment of the national curriculum in
England, and other comparable 'outcomes-based' approaches around the
world, run the risk of being either too general or too 'dense' to be self-evident.
They may need considerable further mediation by teachers in order for learners
to grasp them. For example, this is the level description for the performance in
writing expected of 11-year-olds in England:
Level 4 Writing
Pupils' writing in a range of forms is lively and thoughtful. Ideas are often sus-
tained and developed in interesting ways and organized appropriately for the
purpose of the reader. Vocabulary choices are often adventurous and words are used
for effect. Pupils are beginning to use grammatically complex sentences, extending
meaning. Spelling, including that of polysyllabic words that conform to regular
patterns, is generally accurate. Full stops, capital letters and question marks are
used correctly, and pupils are beginning to use punctuation within the sentence.
Handwriting style is fluent, joined and legible. (QCA, 2005)
While this may provide a good basis for explicit learning intentions, the practice
of many schools to 'level' (that is, to award a level to) each piece of work is likely
to be unproductive in terms of understanding the standard. This is especially so
as the descriptions are used in a 'best fit' rather than in a criterion-referenced way,
allowing the student to gain a level 4 without fully meeting all the requirements.
In a criterion-referenced system, in which the student must meet every state-
ment at a level to gain that level, the threat is that the standard may become too
detailed and mechanistic. This may encourage a surface learning approach in
which discrete techniques are worked on in a way that may inhibit 'principled'
understanding. For example, some of the occupational qualifications in
England have been made so specific that 'learning' consists of meeting hun-
dreds of competence statements, leading to a 'tick-box' approach in which stu-
dents are 'hunters and gatherers of information without deep engagement in
either content or process' (Ecclestone, 2002: 36). This approach is paralleled in
highly detailed 'assessment objectives' in national tests and examinations, which
may encourage micro-teaching on how to gain an extra mark, rather than a
broader understanding.
The formative/summative relationship
Crooks (2001) identifies 'connections' and 'purposes' as structural factors which
influence the validity of formative assessment. His concern is the relationship
of formative assessment to the end product and to its summative assessment
(see Chapter 6). His assumption is that formative assessment is part of work-in-
progress in the classroom. How does the final version benefit from formative
assessment? While the salience of feedback on a draft version of a piece of in-
class coursework may be obvious, it is less clear if the summative element is an
external examination. The threat to validity in the preparing-for-tests classroom
is that the emphasis may shift from learning to test-taking techniques, encour-
aging 'construct-irrelevant' teaching and learning.
When the work is strongly criterion-related, so that the standard of perform-
ance required to reach a certain level is specified, then this has implications for
the teacher's role. The formative element in this process involves feedback on
how the work relates to the criteria and how it can be improved to reach a par-
ticular standard. The summative judgement is whether, at a given point, the
work meets the standard. The dilemma here for many teachers is how to play
the roles of both facilitator and examiner. There is evidence from portfolio-
based vocational qualifications that some teachers have found this problemati-
cal (Ecclestone, 2002).
In summary, within the classroom, factors such as trust and motivation,
clarity about what is being learned and the relationship of formative assessment
to the summative goals will all affect the validity of formative assessments. This
leads to the recognition that the possibilities for formative assessment are better
in some learning contexts than others. The task is then to improve the learning
context so as to increase the validity of formative assessment.
Validity and feedback
Feedback is one of the central components of assessment for learning. If feed-
back is defined in terms of 'closing the gap' between actual and desired per-
formance, then the key consequential validity issue is whether this has occurred.
What the research evidence makes clear, however, is just how complex the role
of feedback in learning is. While we can give feedback that is intended
to help the learner to close the gap, this may not happen. It is not
just that feedback may not improve learning; it may even interfere with it.
Kluger and DeNisi conclude from their meta-analysis of the psychological
research that:
In over one third of the cases Feedback Interventions reduced performance ... I
believe that researchers and practitioners confuse their feelings that feedback
is desirable with the question of whether Feedback Intervention benefits perform-
ance. (1996: 275, 277)
For feedback in the classroom, the following play an important role in the estab-
lishment of valid feedback:

It is clearly linked to the learning intention;
The learner understands the success criteria/standard;
It gives cues at appropriate levels on how to bridge the gap:
a) self-regulatory/metacognitive
b) process/deep learning
c) task/surface learning;
It focuses on the task rather than the learner (self/ego);
It challenges, requires action, and is achievable.
The first two points refer back to the task factors and reinforce the relationship
between clarity about what is being learned and those assessment criteria
which relate directly to it.
'Cues at appropriate levels' is derived from psychological constructs used by
Kluger and DeNisi and needs some decoding. The thrust of their argument is
that if feedback is pitched at a particular level then the response to it is likely to
be at that level. For example, if feedback is in terms of encouraging persever-
ance with a task ('self-regulation') the response will be in terms of more effort.
While this in itself will not lead to new learning, it may provide the context for
seeking further feedback at the process or task level. Feedback is most power-
ful when it is provided at the process level and seeks to make connections and
grasp underlying principles. Feedback at the task level is productive when it
deals with incorrect or partial information, though less so when the
task/concept is not understood.
This line of reasoning means that feedback at the self/ego level will focus atten-
tion at this level. The gap to be closed is then less about students' learning than
their self-perception. Kluger and DeNisi discuss this in terms of reducing 'self-
related discrepancy', which may involve switching to other tasks 'that would
signal attainment of self-view' (1996: 266), a process which depletes the
cognitive resources available for the task. If I am given feedback that my work has
disappointed my teacher, who knows I could do better, I will seek ways of rec-
onciling these judgements to my own self-understanding. I may attribute the
quality of my work to lack of effort, protecting my view of myself as having the
ability to do it (a favoured male technique?). However, if the teacher's judgement
was on a task I had done my best on, I may begin to doubt my ability – a process
which, if continuously repeated, may lead to a state of 'learned helplessness'
(Dweck, 1999). In this I declare 'I am no good at this' and may avoid any further
exposure, for example by dropping that subject and finding an easier one.
Research into classroom assessment (Gipps et al., 2000) has shown that even
with expert teachers relatively little of this process or task-focused 'descriptive'
feedback takes place. Rather, most feedback is 'evaluative' and takes the form
of the teacher signalling approval or disapproval, with judgements about the
effort made. While evaluative feedback may have a role in terms of motivation
and effort, it is unlikely to lead directly to learning and so is not valid formative
assessment.
Marks and grades as threats to valid formative assessment. Treating marks and
grades as threats to valid formative assessment is one of the most provocative
issues in assessment for learning. It is not a new claim. Thorndike, one of the
founding fathers of behaviourism, claimed that grades can impede learning
because as a feedback mechanism 'its vice was its relativity [comparison to others]
and indefiniteness [low level of specificity]' (1913: 286). Building on the work of
Butler (1988), which showed significant student learning gains from comment-
only marking when compared with marking which used grades and comments,
there has been encouragement to move to 'comment-only' marking (Black et al.,
2003; Clarke, 2001). The rationale for this is that grades, marks and levels do not
provide information about how to move forward; any information is too deeply
encoded. For many students they will have a negative effect because:
Learning is likely to stop on the task when a summative grade is awarded for
it (Kohn, 1993);
The level of response may shift to a self/ego level in which the learners' ener-
gies go into reconciling the mark with their view of themselves as learners;
They may encourage a performance orientation in which the focus is success
in relation to others rather than learning. This in turn may have negative
motivational and learning consequences for those who get low grades (ARG,
2002b; Reay and Wiliam, 1999).
For many this is an area in which social and political expectations make any
such move problematic, as evidenced by the press furore when Inside the Black
Box (Black and Wiliam, 1998b) was launched, with national newspaper head-
lines such as:
DON'T MARK HOMEWORK – It upsets children, says top education expert
(Daily Mirror, 06/02/98);

and

TWO OUT OF TEN – For educationalists who want the world to be a different
place (The Times, editorial, 06/02/98)
Smith and Gorard (2005) also provide a cautionary example of where 'comment
only' was introduced with some negative learning consequences. This was the
result of teachers simply providing the evaluative comments they usually made
alongside their marks ('try and improve'), rather than providing feedback on
'where ... to go and how best to get there'. For the students it meant they were
confused about the standard they had achieved, as marks at least provide an
indication of the relative merit of the work.
Praise as a threat to valid formative assessment. This is another highly sensitive
area. The logic behind this claim is that praise is unable to directly improve
learning. What it may do is motivate or encourage future learning, but this does
not constitute formative assessment. The threat is that it may even get in the
way of learning. Praise is essentially self, rather than task, focused (or will be
treated that way by the recipient). Kluger and DeNisi (1996) suggest that while
praise may help when a task is relatively simple, it impairs performance on cog-
nitively demanding tasks, partly because it shifts attention away from the task.
What Gipps et al. (2001) have shown is that praise was one of the predomi-
nant forms of feedback in the classrooms they observed, with task-focused feed-
back infrequent. While we might understand this as busy teachers keeping
students motivated, with detailed feedback a luxury under normal classroom
conditions, the teachers would probably consider that they were giving forma-
tive feedback. This is a misunderstanding that has regularly been noted (ARG,
1999). An unpublished study by Bates and Moller (2000) has supported
this. Their research involved examining, as part of a local authority review of
schools' assessment policies, the marking comments over a seven-month period
across 12 subjects in one 11-year-old's work books. Over 40 per cent of the 114
written comments were praise unaccompanied by feedback. A further 25 per
cent were presentational comments: 'don't squash up your work'; 'please take
care with spelling'; 'very good – always write in pen'. The highly generalized
nature of the feedback was borne out by it being impossible to determine to
which subjects the majority of the feedback related. In only 23 per cent of the
cases was there specific process level feedback, for example: 'what parts of her
character do you like?'; 'why did you do 2 different tests?'; 'Why is this? What
do you think you really learned?'.
Dweck (1999) and Kohn (1993) have cited the negative impact of praise and
rewards on conceptions of learning. Dweck's experiments have shown how
those receiving constant praise and rewards are likely to attribute their success
to their ability. This is perceived as a fixed entity, as opposed to 'incrementalists'
who take a more situational and effort-based view of successful learning. The
consequence of this can be that 'straight A' students will do all they can to pre-
serve their reputation, including taking easier courses and avoiding any risk of
failure. The emphasis is then on performance – gaining good grades ('proving
competence') – rather than on mastery with its attendant risks of set-backs and
even failure ('improving competence'; Watkins et al., 2001). Dweck goes on to
show the negative impact on 'top students' (particularly females) when they
progress to colleges where success may be more elusive, generating self-
doubt about whether they really had the 'ability' they had been conditioned
into thinking they possessed.
This approach raises questions about the use of merits, gold stars and smiley
faces in the classroom. These are not about learning so much as motivation, and
this form of motivation can undermine deep learning and encourage a per-
formance motivation (see Chapter 2). Clarke has taken a practical look at some
of the 'sticky issues' of external rewards (2001: 120) in relation to formative
assessment in the primary classroom.
Valid feedback challenges, requires action and is achievable. Clarke (2001) has also
observed that one problem area of classroom feedback is that while students are
given feedback on a piece of work they are often not required to do anything
active with it; in effect it is ignored. This is particularly unproductive when the
comment is made repeatedly (for example, 'you must improve your presenta-
tion'). Research from the LEARN project (Weeden et al., 2002) found that
written feedback was sometimes undermined by teachers being unclear, both
because the handwriting was hard to read and because the language used was
difficult to understand.
While too little challenge in feedback does not directly encourage learning,
too much can make the gap seem impossible to bridge. Most of us will have
experienced 'killer feedback' which makes such huge, or numerous, demands
that we decide it is not worth the effort.
A further, and salutary, factor in feedback not leading to learning is that the
learner has a choice as to what to do with the feedback. If 'learners must ulti-
mately be responsible for their learning since no-one else can do it for them'
(ARG, 1999: 7) then the manner in which they use feedback is part of this. The
risk of only making limited use increases when feedback is given in the form of
a 'gift' – handed over by the giver to the recipient – rather than as part of a dia-
logue (Askew and Lodge, 2000). Kluger and DeNisi (1996) also show how the
learner has options when faced with feedback, and can choose to:

Increase their effort rather than lower the standard;
Modify the standard;
Abandon the standard ('retire hurt');
Reject the feedback/messenger.
The first response is more likely when 'the goal is clear, when high commitment is
secured for it and when belief in eventual success is high' (1996: 260). We see the
other three options being exercised (and exercise them ourselves) when students
settle for 'all I want to do is pass', having started with more ambitious goals; when
they declare they are 'rubbish at ...' and make no further effort; and when, to
punish a teacher they do not like, they deliberately make no effort in that subject.
Other sources of feedback
Feedback in this chapter has been treated largely in terms of the teacher-learner
relationship. It can, however, come from a variety of sources. What is increas-
ingly being recognized is that peer and self-assessment have a significant role
to play in valid formative assessment. The logic of this is that, for these forms
of assessment to be effective, students have to be actively aware of the learning
intention and the standard that has to be met. Sadler argued that the ultimate
aim of formative assessment is:

to download that evaluative [assessment] knowledge so that students eventually
become independent of the teacher and intelligently engage in and monitor their
own development. If anything, the guild knowledge of teachers should consist less
in knowing how to evaluate student work and more in knowing ways to download
evaluative knowledge to students. (1989: 141)
While the aim of feedback is to reduce trial and error in the learning process, it
is not intended to completely exclude them. Kluger and DeNisi make the point
that 'even when FI [feedback intervention] is accompanied by useful cues, they
may serve as crutches, preventing learning from errors (natural feedback)
which may be a superior learning mode' (1996: 265). One has only to watch
skateboarders practising techniques (no manuals, no adults to offer feedback)
to see the point of this claim.
Feedback has been a key element in this discussion of validity because of its
critical role in leading to further learning, the concept at the heart of conse-
quential validity in formative assessment. What has to be recognized is the
complexity of feedback processes, and how activities that pass for feedback
may not be valid. The challenge is whether the consequence of the feedback is
further learning, rather than improved motivation or changes to self-esteem.
These may have a place, but are not themselves formative assessment. A further
thought is that some forms of feedback may sometimes undermine the deep
learning we claim to encourage.

What has not been considered so far is the role of reliability in these forma-
tive assessment processes. Does it pose a threat to validity in the same way as it
does in summative assessment?
Reliability and formative assessment
Unlike summative assessments, conventional concepts of reliability such as
marker consistency do not play a part in this particular validity argument. For
formative purposes judgements are essentially student-referenced rather than
needing to be consistently applied, since a variety of students with similar out-
comes may need different feedback to 'close the gap' in their learning. This is a
strength rather than a problem.
How is reliability to be interpreted in relation to formative assessment? Re-
conceptualizing reliability in terms of the trustworthiness of the teacher's
assessment has potential in relation to formative assessment. Wiliam (1992: 13)
has proposed the useful concepts of disclosure ('the extent to which an assess-
ment produces evidence of attainment from an individual in the area being
assessed') and fidelity ('the extent to which evidence of attainment that has
been disclosed is recorded faithfully') as alternative ways of thinking about reli-
ability. The concept of disclosure is useful in thinking about the reliability of
formative assessment. Has the formative assessment gathered the quality of
evidence needed to understand where the learner is? Would a different task
have led to a different understanding? While, ideally, feedback is repeated and
informal and errors in interpretation will be self-correcting, the intention is to
provide relevant feedback. 'Unreliable', in the sense of the limited dependabil-
ity of the quality of interpretation and feedback, may have some salience. This
is particularly relevant when feedback is being given in relation to any form of
criterion-related standards, since any misinterpretation of these by the teacher
could lead to feedback that misdirects learning.
Conclusion
Validity is central to any assessment. It is directly related to the purpose, fonn and
context of an assessment; as these vary, so do the key threats to the validity of an
assessment.1be validation process involves judgements about the inferences and
consequences of an assessment and what may undermine confidence in these.
Reliability issues are part of this process in relation to summative assessment, but
less so to formative assessment since unreliable results will undermine the
dependability of the inferences that are made. So too will a failure to sample effec-
tively the construct being assessed. even if the assessment is reliable.
In formative assessment validity is about consequences. Did further learning
take place as a result of formative assessmenl? 1lle threats 10 validity are those
things that get in the way of this learning. 1hese may be related to the classroom
context, itself affected by the larger sodo-cultural context. and the conditions for
learning. nus is exemplified in how feedback. a key concept in formative assess-
ment, is used in classroom interactions. Many current feedback practices may nol
lead to further learning, and therefore may not be valid formative assessment.
Part IV Policy
Chapter 9
Constructing Assessment for Learning in the
UK Policy Environment
Richard Daugherty and Kathryn Ecclestone
The rise of interest in assessment for learning in the UK has, as earlier chapters
show, produced a parallel increase in theoretical and technical activity in rela-
tion to teachers' assessments of their own students and in mechanisms to
promote the validity and reliability of such assessments. All of these dimen-
sions have important policy implications for national assessment systems. The
chapters on teachers' practice show that there was also, over the same period, a
growing professional interest in assessment for learning. However, despite
attempts in the late 1980s to include notions of assessment for learning within
national curriculum assessment in England and Wales, UK policy makers have
only recently taken an interest in this crucial aspect of the assessment of stu-
dents' attainments. For example in England, where the policy environment had
appeared to be unfavourable, policy makers have at the time of writing linked
assessment's role in support of learning to the 'personalization' of learning,
which is a central plank in the current Labour government's approach to 'per-
sonalized public services' (see Leadbetter, 2004).
This chapter explores the rise of assessment for learning as a feature of edu-
cation policy in the four countries of the UK and shows how assessment poli-
cies are a pivotal element in the distinct, and increasingly divergent, policy
environments in England, Scotland, Wales and Northern Ireland. Each of the
four countries of the UK is evolving its own education system and this process
has accelerated since 1999 when structural changes in the constitution of the UK
resulted in increased policy-making powers for the Scottish parliament and for
assemblies in Wales and in Northern Ireland. The chapter considers the rising
prominence of assessment for learning within the broader education policy
scene as one of the major ways in which governments aim to alter professional
and public expectations of assessment systems.
We take as our starting point Broadfoot's (1996) reminder that assessment
practices and discourses are embedded in and emanate from cultural, social
and political traditions and assumptions. These affect policies and teachers'
practices in subtle, complex and often contradictory ways. In relation to
assessment, the past thirty years have seen fundamental changes in
expectations about the social, political and educational purposes that
assessment systems must serve. Growing political interest in assessment for
learning has occurred partly in response to a shift from norm-referenced
systems engineered to select the highest achieving students, towards various
forms of criterion-based systems that aim to be both meritocratic and inclusive.
At the same time, attempts to introduce more holistic approaches to
assessment in post-14 and post-compulsory education and training, such as
records of achievement and portfolio assessment, aim to expand the range of
outcomes that can be certificated and recognized formally (see Broadfoot,
1986; Hargreaves, 1995; Jessup, 1991).
The broader background of changing ideas about what counts as legitimate,
educational and useful assessment forms the context for considering debates
and policy shifts around assessment for learning. This chapter will explore
debates inside policy processes and among academic and professional con-
stituencies about assessment for learning in the compulsory school system. In
the first part we will outline some theoretical tools that are useful for analysing
these debates and processes. In the second part we will show how in England
ideas about assessment for learning were debated and contested amongst
policy makers, academics and professional constituencies as national curricu-
lum assessment was developed and implemented. In the third part we will
explain how policies and politics in Scotland set up a quite different context for
debate and practice in relation to assessment for learning. In the fourth part we
will refer to policy developments in Wales and Northern Ireland and also
review a range of recent policy initiatives across all four countries. Finally, we
shall summarize the main shifts in conceptions and enactments of assessment
for learning in order to show how its educational potential for making learning
deeper and more motivating can be subverted.
Analysing assessment policy
It is important to define 'policy' and while Ball acknowledges that this is
fraught with conceptual confusion, he offers a useful working definition:

[Policies] are, pre-eminently, statements about practice - the way things could or
should be - which rest upon, derive from, statements about the world - about the
way things are. They are intended to bring about individual solutions to diagnosed
problems. (1990: 22)
Further clarification about what we mean here by 'policy' is offered by Dale,
who differentiates between the 'politics of education' as the broader agenda for
education, created through particular processes and structures, and 'education
policy' as processes that operate inside official government departments and
agencies and through engagement with other interested groups. These
processes are convoluted, often contentious and opaque to those outside them,
but they work to translate a political agenda into proposals to which institu-
tions and practitioners respond (Dale, 1994). He argues that a focus on educa-
tion politics makes little sense unless there is 'a more or less explicit reference
to, and appreciation of, the politics of education' (1994: 35).
Following these broad notions of policy and politics, one approach to analy-
sis in this chapter would be to locate debates about assessment for learning in a
broader structural analysis of the ways in which the economy, education and
culture interact. Or, we could analyse how various groups, individuals and
interested constituencies interact both within formal policy processes and
broader advocacy and the 'epistemic communities' that contribute ideas and
information to policy makers. We could also undertake a discursive analysis of
how a particular notion, in this case assessment for learning, is symbolized and
then enacted through political conceptualization, formation and transmission.
Combining all three approaches enables an analysis of assessment for learning
as a prominent theme in education policy and the politics of education to be
traced to previous problems and debates (see for example, Ecclestone, 2002).

We recognize here the need to remember broader structural and cultural
influences on debates about assessment for learning. We also acknowledge the
need to know more about the effects of debates about assessment for learning
at macro-, meso- and micro-levels of policy and practice and how these connect
national policy, institutional responses to policy, and the shaping of individual
identity and social actions in classrooms. However, for reasons of space and
clarity, we will confine our analysis of assessment for learning in recent assess-
ment policy to two notions offered by Ball and other colleagues, namely 'policy
as text' and 'policy as discourse' (Ball, 1990, 1994; Bowe et al., 1992).
Policy as text
Key texts, such as acts of parliament, are translated at various levels of the policy
process into other official texts, such as national curriculum policy statements
and regulations, and then into what Bowe et al. call 'secondary texts', such as
non-statutory guidelines and advice on practice. At all stages of the policy
process official positions about assessment emerge in subtle and often contra-
dictory ways through various texts and discussions. Texts, therefore, represent
policy and encode it in complex ways through the struggles, compromises and
public interpretation of political intentions. Texts are then decoded through
implementations and new interpretations, by individuals and constituencies,
moving in and out of policy processes. As Ball points out, attempts to present
policy may spread confusion as various mediators of policy try to relate their
understandings of policy to particular contexts. It is therefore crucial to recog-
nize that texts are not

clear or closed or complete [but] the products of compromises at various stages (at
points of initial influence, in the micropolitics of legislative formation, in the par-
liamentary process and in the politics and micropolitics of interest group articula-
tion). (1994: 16)
Interest in assessment for learning at all levels of the UK's education system has
generated a deluge of texts that follow on from the official key texts: draft and
final assessment specifications; guidance to specification writers; advice to
teachers; guidelines to awarding body officers; decisions and debates recorded
in minutes of policy meetings and public documents such as policy papers and
textbooks. In addition, interest groups and professional bodies offer their own
interpretations of assessment for learning while the speeches of policy makers,
official videos and websites add further layers of meaning. These texts can all
be seen as '... cannibalized products of multiple (but circumscribed) influences
and agendas. There is ad hocery, negotiation and serendipity within the state,
within policy formation' (Ball, 1994: 16).

In addition, as Bowe et al. argue, texts vary in the extent to which they are
'readerly' and offer minimum opportunities for interpretation by readers, or
'writerly', where they invite the reader to join in, to co-operate and feel some
ownership of the ideas. Making sense of new texts leads people into a '...
process of trying to translate and make familiar the language and attendant
embedded logics' (1992: 11).
For teachers, parents, professional bodies, policy makers and implementers
of policy, such as inspectors, a plurality of texts produces a plurality of read-
ings. Such complexity means that we need to bear in mind constantly, as we
review policy debates about assessment for learning, that '... the expression of
policy is fraught with the possibility of misunderstandings, texts are general-
ized, written in relation to idealizations of the "real world" and can never be
exhaustive' (1992: 21).
Assessment for learning may therefore be robustly and overtly defined, or it
may emerge in subtle and more implicit ways. Its various representations
reflect, again in overt and implicit ways, beliefs about desirable educational
goals and practices. In addition, further negotiation and understanding come
from a very diverse range of bodies and individuals who make use of policy
texts. These include awarding body officers, inspectors, staff development
organizers, unions and professional organizations, local education authority
advisers, teachers, students, parents and employers. All create and amend the
official texts and offer competing interpretations of policy aims. Exploration of
different texts can therefore reveal the influences and agendas viewed as legit-
imate both inside policy processes and within institutions. It also reveals how
these change over time as key actors move on or are removed from processes
and debates. Charting how policy texts have evolved enables us to understand
more about how teachers and students interpret their intentions and turn them
into 'interactive and sustainable practice' within particular social, institutional
and cultural contexts (Ball, 1994: 19).
Policy as discourse
Despite the importance of texts for understanding policy debates and
processes, focusing analysis too heavily on them can produce an over-rational
and linear account of debates about assessment for learning. The notion of
'policy as discourse' is therefore a crucial parallel notion because it enables
researchers, practitioners and implementers of policy to see how discourses in
policy construct and legitimize certain possibilities for thinking and acting
while tacitly excluding others. Through language, symbols and codes and their
presentation by different authors, discourses embody subtle fusions of particu-
lar meanings of truth and knowledge through the playing out of power strug-
gles inside and outside policy. They construct our responses to policy through
the language, concepts and vocabulary that they make available to us, and legit-
imize some voices and constituencies as legitimate definers of problems and
solutions whilst silencing others (Ball, 1994). Focusing on discourse encourages
analysts of texts to pay close attention to the language and to its adequacy as a
way of thinking about and organizing how students learn. It also reminds us
how texts reflect shifts in the locus of power between different groups and indi-
viduals in struggles to maintain or change views of schooling.
However, analysis of the ways in which particular discourses legitimize
voices, problems and solutions must also take account of the 'silences' in the
text, namely the voices and notions that it leaves out. Silences operate within a
text to affect how we view educational problems and policies, but they also
come from other discourses and the policy processes that produce them. For
example, a discourse about assessment for learning needs to be interpreted in
the light of discourses in other texts about accountability or the need for nation-
ally reliable assessment.
In this chapter we will focus on selected texts in order to identify the interac-
tions and goals of different bodies in the production of texts and the discourses
of assessment for learning that permeate them, either overtly or more subtly.
Assessment for learning in England
The introduction of national curriculum assessment
The transformation of education policy brought about by the Education Reform
Act of 1988 included within it, for the first time in the modern era, provision for
a statutory national curriculum and associated 'assessment arrangements' cov-
ering the years of compulsory schooling (ages 5 to 16). With relatively little
prior thought seemingly having been given to what form such arrangements
might take, the Minister for Education remitted an expert group chaired by an
academic, Professor Paul Black, to draw up proposals.
The Task Group on Assessment and Testing (TGAT), working to an open-
ended all-purpose remit from government, chose to place assessment of stu-
dents by their teachers at the centre of its framework for assessment. The
group's recommendations (DES/WO, 1988a) drew on experience in the 1970s
and 1980s, in particular in relation to graded tests, of teachers' assessments con-
tributing both to students' learning and to periodic summative judgments
about their attainments. In the earliest paragraphs of the report - 'our starting
point' - the formative purpose of assessment is identified as a central feature of
the work that teachers undertake:
Promoting children's learning is a principal aim of schooling. Assessment lies at
the heart of this process. (para. 3)

... the results [of national assessments] should provide a basis for decisions about
pupils' further learning needs: they should be formative. (para. 5)
The initial formal response of ministers to the TGAT recommendations was to
signal acceptance of what would clearly be an innovative system for assessing
students' attainments and their progress. The minister responsible, Kenneth
Baker, echoed TGAT's focus on the individual student in his statement to par-
liament accepting the group's main and supplementary reports and most of its
recommendations:

The results of tests and other assessments should be used both formatively to help
better teaching and to inform next steps for a pupil, and summatively at ages 7, 11,
14 and 16 to inform parents about their child's progress. (quoted by Black, 1997: 37)
Yet it is clear, from the memoirs of the politicians in key roles at the time (Baker,
1993; Thatcher, 1993) as well as from the work of academics (Ball, 1990;
Callaghan, 1995; Taylor, 1995), that formal government acceptance of most of
the TGAT recommendations did not mean acceptance either of a discourse of
formative assessment or of its translation into ideas for practice. From a very
early stage of policy development, though only the proposals for the consis-
tency of teachers' assessments to be enhanced by group moderation (DES/WO,
1988b) had actually been formally rejected by the minister, the discourse
amongst policy makers concerned test development at each of the first three
'key stages'. The ideas that had shaped the TGAT blueprint for national cur-
riculum assessment quickly came to be 'silences' in the policy discourse and in
the texts about assessment that emanated from government and its agencies.
As Black's own account of Whatever happened to TGAT makes clear, several
factors including the growing influence of the 'New Right' had the effect of
undermining TGAT and transforming national curriculum assessment into a
very differently oriented set of assessment policies (Black, 1997). The need for
national assessments to supply indices of school performance for accountability
purposes, an aspect of the policy that had been downplayed when the govern-
ment was enlisting support for the Education Act's passage through parliament,
came to the fore once the legislation was in place. The ideological nature of the
debates about TGAT within the governing party is evident from the characteris-
tically blunt comments of the then Prime Minister in her memoirs:

The fact that it [the TGAT Report] was welcomed by the Labour Party, the
National Union of Teachers and the Times Educational Supplement was enough to
confirm for me that its approach was suspect. It proposed an elaborate and complex
system of assessment - teacher dominated and uncosted. (Thatcher, 1993: 594)
In short, TGAT was perceived as the work of an insidious left-leaning 'education
establishment' intent upon subverting the government's best intentions to raise
educational standards. This neo-conservative discourse, epitomized by the lan-
guage used in Marsland and Seaton's The Empire Strikes Back (1993), was in the
ascendancy amongst education policy makers in England in the early 1990s (see
Black, 1995, for a fuller discussion of this). Even though its influence was beginning
to wane by the time of the Dearing Review of the national curriculum and its
assessment in the middle of the decade, the voice of the Centre for Policy Studies
(Lawlor, 1993) was still a prominent feature of the policy discourse at national level.
As detailed policies for each element in the new assessment system were
developed, the national curriculum assessment arrangements emerged as a
system of time-limited and end-of-stage tests in the 'core' subjects only, the
main purpose of which was to supply data that would place each student on a
10 (later 8+) level scale (Daugherty, 1995). It also became increasingly obvious
that such data would be aggregated and published as indicators of the per-
formance of teachers, schools, local education authorities and the system as a
whole. Although TGAT had envisaged arrangements that would focus on the
formative use of data on individual students, the evaluative use of aggregate
data coloured the multifarious texts spawned by national curriculum assess-
ment. In parallel, policy discourses associated with those texts reinforced this
performance indicator and target-driven view of assessment. Without ever
being superseded by a revised policy, the TGAT recommendations were dis-
torted and then abandoned, thereby illustrating how policy texts are reworked
at the whole system level as policies move from an initial blueprint through
development and implementation. In this respect the discourse of assessment
for learning carried the ominous silences, of accountability and concerns about
the reliability of teacher assessment, from other parallel discourses.
Over the same period, agencies and individuals responsible for implementa-
tion were interpreting and mediating those policies. There is substantial research
evidence about the ways in which national curriculum assessment came to be
understood and operationalized, both by officials in departments and agencies of
government and by the headteachers and teachers in schools on whose practices
the system depended. This was happening in spite of government antipathy to
teachers' practices as biased and too student-centred and the low profile of
'teacher assessment' in the policy texts of the time. Evidence of the impact of those
policies can be found in findings both from large-scale longitudinal studies in the
primary curriculum (Osborn et al., 2000; Pollard et al., 2000 - see also Chapter 4)
and from many other empirical studies reporting on assessment practices in
schools (for example, Tunstall and Gipps, 1996; Torrance and Pryor, 1998; Reay
and Wiliam, 1999). Taken together, these studies show the effects of a target-led
approach to assessment and the subtle changes to teachers' and students' roles
and perceptions of the purposes and outcomes of assessment.
The potential for local education authorities to have a significant role in
implementing, moderating and monitoring national curriculum assessment
was never fully developed because policy makers at the national level, more
often implicitly than explicitly, acted as if what was decreed in London would
be accepted and acted upon in every classroom in every school in every LEA in
the country. Local education authorities were also perceived by some policy
activists on the political right as being prime movers in a malign influence of
the 'education establishment' on the education system. However, as agencies
that provided training for teachers and therefore mediated the texts published
at the centre, their influence on schools and teachers would be considerable. As
Conner and James have shown, some local authorities went beyond the
'attempt to accommodate state policy within a broad framework of local values
and practice' (1996: 164), developing local initiatives such as procedures for
moderating teachers' assessments of their students. Local moderation as a nec-
essary component in any national system that makes use of teachers' judgments
had, once TGAT's proposals in this respect had been rejected, been neglected by
policy makers at the national level.
Education policies in general, including assessment policies, were being
shaped by what Broadfoot (2000) and Ball (2000), using Lyotard's term, have
characterized as a culture of 'performativity'. Performativity came to dominate
the thinking of policy makers in government during that period to such an
extent that

the clear policy emphasis [of the 1990s was] on assessment as a measurement
device, the results of which are used to goad students, teachers and institutions as
a whole to try harder. It is not surprising that, faced with these pressures, schools
have typically succumbed to them. (Broadfoot, 2000: 143)
As both Broadfoot (2000) and Ball (2000) have argued, the setting and regula-
tion of political targets influence teachers in subtle and profound ways. Sum-
mative assessment by teachers of their own students, a residual feature from
the original TGAT framework, was still given a notional status in the overall
framework, for example in the Dearing Review of curriculum and assessment
in 1993/4. But the policy texts and associated discourses of that period showed
that recognition of the teacher's role in using assessment to guide and support
learning disappeared from sight. Black and Wiliam conclude that

... by 1995 nothing was left of the advances made in the previous decades. Gov-
ernment was lukewarm or uninterested in formative assessment; the systems to
integrate it with the summative had gone, and the further development of tools was
only weakly supported. (2003: 626)
The strengthening of academic and professional discourses
In contrast, academic and professional assessment discourses retained forma-
tive assessment as a crucial aspect of assessment practice in educational insti-
tutions. In addition, such discourses presented formative assessment as a
necessary component of any assessment policy that sought to meet several pur-
poses, which might be served by student data. For example, Torrance (1993)
was writing about the 'theoretical problems' and 'empirical questions' associ-
ated with formative assessment. Contributions by academics to wider theoreti-
cal debates about assessment such as by Gipps (1994), and texts written by
academics for practitioners such as by Stobart and Gipps (1990 and subsequent
editions), also recognized and promoted the importance of formative assess-
ment. Crooks's (1988) review of the impact of assessment practices on students
and Sadler's (1989) seminal paper on formative assessment helped fuel contin-
uing debates in academic circles that were in contrast to the preoccupation
amongst policy makers in England with the summative and evaluative uses of
assessment data.
In this context, and with the explicit aim of influencing assessment policy
discourses and texts, a small group of academics was established in 1989 as one
of several British Educational Research Association policy task groups and con-
tinued as an unaffiliated Assessment Reform Group after 1997. Among its early
publications was a critical commentary on the development of national cur-
riculum assessment which reiterated the role of assessment in 'the improve-
ment of education' (Harlen et al., 1992). The group then obtained funding for a
survey of the research literature on formative assessment, undertaken by Paul
Black and Dylan Wiliam. The outcomes of that review, published both in the
form of a full report in an academic journal (Black and Wiliam, 1998a) and in a
pamphlet for wider circulation to practitioners and policy makers (Black and
Wiliam, 1998b), would in time be acknowledged as a major contribution to
reorienting the discourse associated with assessment policies in the UK (see
also Chapter 1).
This initial optimistic foray by a group of academics hoping to influence
policy was supplemented by later publications from the team working with
Black and Wiliam at King's College, London and from the Assessment Reform
Group. As part of a strategy for communicating more effectively with policy
makers, making use of pamphlets and policy seminars, the Assessment Reform
Group chose the more accessible term 'assessment for learning' rather than
using the technical terminology of 'formative assessment'. Assessment for
learning also became increasingly prominent in the professional discourse
about assessment, supported by organizations such as the AAIA, with its mem-
bership mainly drawn from assessment inspectors and advisers in local gov-
ernment, and by other advocates of formative assessment operating from a base
in higher education such as Clarke (2001).
New government, continuing discourse
Government policy on curriculum and assessment in England during the late
1990s remained strongly wedded to the notion that the 'raising of standards' of
attainment in schools should be equated with improvement in the aggregate
scores of successive cohorts of students as they passed through the 'key stages'
of the national curriculum. This applies at least as much to the 'Blairite' educa-
tion policies of the Labour administrations since 1997 as to the policies of the
Conservative governments of the 1980s and the early to mid 1990s. In some
respects, the education policies of the incoming government in 1997 gave a
fresh impetus to the culture that had dominated policy texts and discourses
earlier in the decade, endorsing rather than seeking to change the ideological
stance that had underpinned education policies:
Assessment and Learning
... many of New Labour's changes to the Conservative agenda were largely cos-
metic. In some of its manifestations New Labour's so-called Third Way looked
remarkably similar to quasi-markets. (Whitty, 2002: 127)
Reinforcing a general perception that the most important role for data from
national curriculum assessments was to fuel performance indicators, the new
government's first major policy paper Excellence in Schools (DfEE, 1997) sig-
nalled that schools and local authorities would be expected to set and meet
'benchmarking' targets. The then Secretary of State for Education, David Blun-
kett, raised the public profile of benchmarking by stating that he would resign
if the government's national targets, based on national curriculum test data,
were not met. At the school and local authority level these policies were policed
by an inspection agency, OFSTED, whose head revelled in his public image as
the scourge of 'low standards' in classrooms and in schools. This discourse was
dominated by the role of assessment in relation to accountability and, alongside
this, centrally-driven national strategies emerged, first in literacy (from 1998)
and then in numeracy (from 1999). Both were underpinned by the publication
of pedagogical prescriptions in which the formative functions of classroom
assessment had no official role.
The performativity culture was thus retaining its hold on policy makers and
also on practitioners whose performance, individually and collectively, was
being judged in those terms. Looking back on the first five years of Labour edu-
cation policies in England, Reynolds sums up in these terms:
[The Labour government] kept in place virtually entirely the 'market-based' educa-
tional policies introduced by the government from 1988 to 1997,
involving the systematic tightening of central control on the nature of the cur-
riculum and on assessment outcomes, combined with devolution to schools of the
determination of the 'means', at school and classroom level, to determine outcomes.
(2002: 97)
In such circumstances, there was no place in the official policy discourse in
England for assessment's role as an informal and personal source of support
for the learner or as a key element in learners' genuine engagement with
learning. Instead, the silences in both texts and discourses in relation to
formative assessment as integral to meaningful learning led to an implicit
presentation of it as an instrumental adjunct to the goal of raising formal levels
of achievement. In policy discourse and text, 'achievement' and 'learning'
became synonymous. This image of assessment prevailed amongst English
policy makers into the middle years of the next decade, with no
acknowledgment that an assessment culture which geared every aspect of the
classroom experience to test performance - 'a SATurated model of pupildom'
(Hall et al., 2004) - was not conducive to fostering effective assessment for
learning. And the research into primary and secondary school students'
attitudes to assessment and learning, cited above, showed just how strong an
influence a summative image of assessment was.
Constructing Assessment for Learning in the UK Policy Environment
Scotland - distinct politics, distinctive policies
From guidelines to national survey
Scotland did not experience the kind of major transformation of its schools that
the Education Reform Act brought about in England and Wales, relying instead
on a series of directive but not statutory national 'guidelines'. In relation to assess-
ment policy, it retained a national system of periodic sampling of student attain-
ments introduced in 1983 - the Assessment of Achievement Programme (AAP) -
and did not follow England in introducing a statutory curriculum backed up by
a system of external tests. Assessment for learning as a major official policy pri-
ority emerged in the early years of the twenty-first century as a product of a dif-
ferent political environment and distinctive policy processes in Scotland (Humes,
1997, 1999) that predated the establishment of a Scottish Parliament in 1999
(Bryce, 1999; Bryce and Humes, 1999; Paterson, 2003; Finlay, 2004).
Concurrently with the passage of the Education Reform Act through the UK
parliament, the Scottish Office published in November 1987 a policy text, Cur-
riculum and Assessment in Scotland: A Policy for the 1990s, which set out aims for
the education of the 5 to 14 age group. In terms of policy process some Scottish
Office ministers favoured statutory regulation. However, after strong opposi-
tion from parents and teachers to proposals for statutory testing, the guidelines
published in the early 1990s were to be the Scottish response to the broader
trend, within the UK and internationally, towards greater central government
control of the curriculum and assessment. Although this was a seemingly softer
approach to regulation of the curriculum and assessment in schools than was
found elsewhere in the UK, Finlay has argued that too much should not be
made of the use of the word 'guidelines':
Her Majesty's Inspectorate of Education ... uses the guidelines as the basis of inspec-
tions of primary schools and secondaries, and the expectation is that they will
find a close correspondence between the guidelines and practice. (2004: 6)
According to Hayward et al. (2004: 398) the guidelines on assessment, Assess-
ment 5-14 (SOED, 1991), ensured that there were 'clear principles, advocating
the centrality of formative assessment'. And yet, in spite of a supportive policy
discourse, it became apparent from both academic studies (Swann and Brown,
1997) and from a report by the schools inspectorate (HMI, 1999) that the impact
of Assessment 5-14 on the ground in schools was patchy. A national consulta-
tion on future assessment policies for Scottish schools, prompted by the find-
ings of the 1999 HMI survey, was undertaken by the Scottish Executive
Education Department (SEED) (Hayward et al., 2004). The consultation
revealed 'clear, almost unanimous support for the principles of Assessment
5-14' (Hayward et al., 2004). However, by the late 1990s it was clear that the
overall assessment system was fragmented and fulfilling none of its purposes
particularly effectively.
Among the issues to emerge from the consultation report were the difficulties
teachers faced in establishing 'assessment for learning' practices in their class-
rooms and the tensions within an education culture where teachers were
expected to be able to reconcile expectations that assessment practices should
serve both formative and accountability purposes. Interestingly, the seeds of
subsequent developments were already in evidence in that report with overtly
supportive reference being made to Black and Wiliam's review of research. The
academic discourse developing from that review came to have a more direct
influence on government policy over the next few years in Scotland than was
the case in England.
There were other assessment policy initiatives during the 1990s from the
Scottish Executive, which was still answerable at that time, via the Secretary of
State for Scotland, to the UK parliament. These had their roots in the priorities
of the UK government but took a different form from parallel developments in
the three other UK countries. 'Neither the National Test system nor the AAP
survey had been designed to meet the new data requirements; the 'test when
ready' system did not provide conventional test scores that could readily be
used for monitoring and accountability purposes' (Hutchinson, 2005). The per-
ceived need to make available evidence about students' attainments in key cur-
riculum areas, the driver behind publication of school performance tables in
England, led to the introduction of the National 5-14 Survey of Achievement in
1998. Up until that point, the national tests in Scotland had been offered to
schools as test units in reading, writing and maths, devised by what was to
become the Scottish Qualifications Authority, to be used by teachers to confirm
their judgments as to the attainment levels (A to F) which students had reached.
After 1998, SEED collected aggregate attainment information from every school
in those curriculum areas and there was an associated expectation that the
reported levels would have been confirmed by national tests. Taken together,
these moves represented a considerable raising of the stakes because test data
became an overt part of the accountability policy discourse.
The 'Assessment is for Learning' Project
Despite these influences, politics and policy making in Scotland had given rise
to a distinctive set of assessment policies during the 1990s, and the establish-
ment of a Scottish Parliament in 1999 gave fresh impetus to a wide range of poli-
cies on education. A flurry of activity in education - a major responsibility of
the new devolved legislature - led to the passing by parliament of the Stan-
dards in Scotland's Schools Act with its five 'National Priorities'. The Minister
for Education and Young People initiated a 'National Debate on Education' in
2001 and published Educating for Excellence in 2003, setting out the Executive's
response to views expressed in the National Debate. Assessment policy for
Scottish schools was the subject of a major parliamentary debate in 2003. Polit-
ical power was in the hands of a Labour/Liberal Democrat coalition, with influ-
ential individuals such as Jack McConnell, initially the minister responsible for
schools policy and subsequently First Minister, to the fore. National agencies,
such as Learning and Teaching Scotland (LTS) and Her Majesty's Inspectorate
of Education (HMIE), were drawn into policy development but the shaping of
education policy in the early years of the new century was driven from within
the Executive and strongly supported by key politicians.
The establishment of an Assessment Action Group was the next stage in
assessment policy development, drawing in a range of interest groups includ-
ing national agencies, representatives of teachers, parents and researchers. Its
role was to oversee a programme that developed assessment for students from
3 to 14. This programme was subsequently, and significantly, entitled 'Assess-
ment is for Learning' (AifL). The AifL programme had considerable resources
invested in it, mainly to allow teachers time away from the classroom to engage
in developing their practice. And yet, though strongly led and guided from the
centre, the developmental model adopted was for teachers at school level across
Scotland being recruited to a series of parallel projects and given opportunities
to shape and to share their practice.
One of those projects focused on 'Support for Professional Practice in For-
mative Assessment' and was explicitly based on the work of Black and Wiliam,
involving a team from King's College London led by them as consultants. The
report of its external evaluation (Hallam et al., 2004) is positive in tone: 'rela-
tively few difficulties in implementation', 'dramatic improvement in pupils'
learning skills', 'a shift from teacher-centred pedagogy'. And yet, whilst recog-
nizing that progress had been made in the pilot schools, the evaluators also
highlighted the challenges ahead if the project's gains in terms of improving
learning through formative assessment were to be sustained and disseminated
more widely. Perceived obstacles to further successful development included
the tensions between formative assessment strategies and what was required of
teachers in relation to summative assessment. Some teachers reported that time
pressures militated against being able to cover required curriculum content.
Evaluators of the programme also argued that it was crucial to continue teacher
'ownership' of the policy development process:
The project has had a promising start ... [but] successful dissemination requires
continued funding to enable new participants to have sufficient time to develop
and implement their ideas and reflect upon and evaluate their progress. (Hallam et
al., 2004: 13)
The evaluators' conclusions, together with insights from other studies of the
programme such as Hayward et al.'s (2004), offer helpful pointers to the issues
that need to be addressed in any policy context which aspires to the major ped-
agogical innovation that the widespread adoption of assessment for learning
classroom practices entails.
In November 2004, Scotland's policy journey from the generalities of the 1991
guidelines through reformulation and reinvigoration of policy priorities
reached the point where, in Assessment, Testing and Reporting 3-14: Our Response
(SEED, 2004a), ministers adopted most of the main policy recommendations of
the consultation earlier that year on the AifL programme. By early 2005, officials
in the SEED were embarking on plans for implementing this latest policy text
on assessment in Scotland in parallel with the equivalent official published text,
A Curriculum for Excellence: Ministerial Response (SEED, 2004b), setting priorities
and targets for the curriculum.
How had Scotland come to take this particular route to rethinking assess-
ment policy? In his review of Scottish education policies Finlay (2004) argues
that almost all of the major policy developments in education would have been
possible in the pre-1999 era of 'administrative devolution'; indeed many of
those developments were initiated and had progressed prior to 1999:
The contribution of political devolution in Scotland has been the realized opportu-
nity to engage the demos much more widely in democratic processes. Inviting
people to contribute at early stages to identifying long term political priorities is
quite different from giving people the freedom to choose. (2004: 8, emphasis
in original)
With Scotland being the first of the four UK countries to identify assessment for
learning as a policy priority and to move, from 2005, into whole system imple-
mentation it will be interesting to see the extent to which that distinctive polit-
ical ideology continues to colour the realization of assessment for learning in
the day-to-day practices of schools and classrooms.
Multiplying policies - proliferating discourses
Whilst assessment policies in Scotland and England diverged during the 1990s,
Wales, operating within the same legislative framework as England, used the
limited scope allowed by 'administrative devolution' to be more positive about
the value of teachers' assessments and to adopt a less aggressive accountability
regime (Daugherty, 2000). The absence of primary school performance tables, a
different approach to the inspection of schools (Thomas and Egan, 2000) and an
insignificant Wales-based daily press all meant that the media frenzy about
'failing' teachers and schools was not a feature of the Welsh public discourse.
There is evidence (see, for example, Daugherty and Elfed-Owens, 2003) from
the pre-1999 era of administrative devolution that a distinctive policy environ-
ment in Wales resulted in the recontextualizing of London-based policies.
However, after 1999 political devolution undoubtedly speeded up the pace of
change as Wales became an arena for policy formulation as well as policy
implementation. The rhetoric of The Learning Country, published by the Welsh
Assembly government in 2001, was followed up by, amongst other policy deci-
sions, the abolition of national testing for 7-year-olds from 2002. Assessment for
learning did not become a feature of the policy agenda in Wales until the
Daugherty Assessment Review Group, established by the Assembly Minister
for Education in 2003, published its recommendations (Daugherty, 2004).
Encouraged by the minister's espousal of 'evidence-informed policy-making',
the group was influenced by research evidence from the Black and Wiliam
review, from the Assessment Reform Group and from other assessment spe-
cialists. One of its main recommendations was that 'The development of assess-
ment for learning practices should be a central feature of a programme for
development in Wales of curriculum and assessment' (2004: 31).
The discourse of assessment policy in Wales, without the steady evolution
since 1991 that had led to the Scottish Executive's endorsement of assessment
for learning, had thus changed markedly after 1999. By 2004 assessment for
learning was an uncontested aspect of official policy in Wales. As part of its
more broadly based recommendations the agency with statutory responsibility
for advising the Welsh Assembly government also quoted the Assessment
Reform Group's definition of assessment for learning in its advocacy of policy
change: 'ACCAC recommends that it should be remitted to establish a pro-
gramme to develop assessment for learning' (ACCAC, 2004: 41).
The minister in Wales, Jane Davidson, announced in July 2004 that she would
be implementing these recommendations from the Review Group and from
ACCAC and her support for a new assessment policy framework was unequiv-
ocal:
There is clear evidence ... that change is needed if we are to get the best from our
pupils, the curriculum and our teachers. I propose, therefore, to move away over
the next four years from the current testing regime to a system which is more
geared to the pupil, more focused on skills and puts teacher assessment at its heart.
(Davidson, 2004: 2)
In Wales, as in Scotland, political devolution had given a fresh impetus to the
rethinking of education policies in general (Rees, 2005).
Northern Ireland, since its establishment as a state in 1922, had developed
what McKeown refers to as a tradition of
adoption (sometimes with minor adaptation) of policy from GB so as to obtain
parity of provision, the non-implementation of GB policy when deemed inappro-
priate, and the development of policy initiatives, where feasible, which are relevant
specifically to Northern Ireland. (2004: 3)
Any review of assessment policies in that part of the UK can be framed in those
terms although England, rather than Wales or Scotland, was usually the source
for policy borrowing rather than 'GB'. Thus the Northern Ireland framework is
recognizably a first cousin of that to be found in England (and in Wales), but the
system of mainly faith-based schooling and the continued existence of aca-
demic selection for secondary schooling at age 11 are the product of the
country's distinctive social context.
A culture in which 'assessment' is closely associated with the testing of 11-
year-olds would not appear to be favourable to the development, politically or
professionally, of assessment for learning. And yet, even in a policy environ-
ment where deep and longstanding political conflicts stalled the establishment
of a devolved legislative assembly, initiatives taken by the agency responsible
for advising on curriculum and assessment brought assessment for learning
into the policy discourse (CCEA, 2003; Montgomery, 2004). As White's com-
mentary on the CCEA 'Pathways' proposals notes:
The major priority is that assessment should help pupils to learn, teachers to teach,
and parents - all co-educators - to support and supplement what goes on in
schools. This is why the emphasis ... is on assessment for learning rather than
assessment of learning. (White, 2004: 14)
The emphasis in the policy discourses in Northern Ireland has now been con-
solidated with the key assessment for learning approaches established within
the Key Stage 1 and 2 curriculum as 'Ongoing Integrated Assessment' (CCEA,
2004: 10). At Key Stage 3 CCEA's Pathways consultation document proposed
that the '... research carried out by the Assessment Reform Group and others
has produced substantial evidence to show that by adopting these approaches
[assessment for learning], the progress and achievement of pupils in all ability
ranges and the professional competences of teachers can be significantly
enhanced' (2003: 103). The then minister for education, Barry Gardiner, gave the
final go-ahead for all of the curriculum proposals in June 2004.
In England there were, during the second term of the Blair government (from
2001), two notable patterns of development relating to assessment for learning.
The first was the increasing attention to it in publications by England-based
organizations representing teachers (SHA, Swaffield and Dudley, 2002;
NUT, 2004; GTC(E), 2004). A joint publication from three of the teacher associa-
tions, directed at ministers in England, made the case for assessment for learn-
ing to become a significant part of the policy agenda for 'raising
standards' whilst also airing doubts about the way in which that term had
become part of the official policy discourse:
The model of assessment for learning now being promoted, relying
mainly on teachers' analysis and management of data to diagnose and target
pupils' learning needs, is in direct contradiction to, for example, the effec-
tiveness of the highly successful approach to assessment for learning adopted by
King's College, London and by the national Assessment Reform Group. (ATL,
NUT and PAT, 2004: 4)
Within these texts not only are the well-established differences on policy
between teacher representatives and government apparent but there are also
substantial variations in the ways in which the teacher organizations define
assessment for learning and how its potential might be realized.
The second significant series of developments during Blair's second term
brought 'assessment for learning' for the first time into the official discourses
associated with education policy in England. The second term of a Blair-led
government was as wedded as the first had been to 'strategies', accompanied
by documentation and an infrastructure of agencies implementing
national directives. The language of target-setting and the discourse of perfor-
mativity remained, but attempts were also made in some policy texts to leaven
the discourse with a 'softer' message in which the individual student's needs
were acknowledged. The 2004 Primary Strategy (for students aged 5 to 11)
included guidance materials for schools and teachers in which assessment for
learning figured prominently. And yet, as a critique of the materials by an
organization representing assessment professionals points out, the materials
are 'problematic' and based on a model of assessment that is one of 'frequent
summative assessment not formative assessment' (AAIA, 2005a: para 4.3). The
Key Stage 3 Strategy (for students aged 11 to 14), whilst drawing more directly
on the work of Black and Wiliam and the Assessment Reform Group, also
appeared trapped in a mindset that sees target-setting by learners and schools
as the only route to higher achievement. That same uneasy mix of disparate dis-
courses was apparent in a ministerial speech early in 2004 which placed 'per-
sonalized learning' at the centre of the government's new policy agenda for
schools. Assessment for learning would be one of five 'key processes' in realiz-
ing this ambition to 'personalize' learning: 'Assessment for Learning that feeds
into lesson planning and teaching strategies, sets clear targets, and clearly iden-
tifies what pupils need to do to get there' (Miliband, 2004: 4).
Developing student autonomy through self- and peer assessment, which is
central to the view of assessment for learning that its academic advocates had
been promoting, is nowhere to be seen in this teacher-led and target-dominated
usage of the term.
The report in 2005 of a 'Learning Working Group', commissioned by the
Minister for Schools in England but without any official status in the policy
process, refers both to the growing awareness of assessment for learning and to
the proliferation of discourses when it comments that
Assessment for learning is spreading rapidly, in part because it, or more accurately
a version of it (some would argue a perversion), contributes to the Key Stage 3
Strategy in England, and in part because teachers find that it works - the scien-
tific evidence and the practice evidence are aligned and mutually supportive. (Har-
greaves, 2005: 9)
Assessment for learning had, by 2005, been incorporated into the official policy
discourses in the other three UK countries and the term was increasingly fea-
tured in policy-related texts and associated discourses in England. But doubts
remained about the commitment to it of policy makers and ministers:
... 'assessment for learning' is becoming a catch-all phrase, used to refer to a range
of practices. In some versions it has been turned into a series of ritualized proce-
dures. In others it is taken to be more concerned with monitoring and record-
keeping than with using information to help learning. (James, 2004: 2)
Conclusion
Within less than a decade assessment for learning became established as an
element in the official policy discourses in each of the four countries of the UK.
It did so in ways that reflected the distinctive cultures, policy environments and
policy processes of each country. Whilst there are some common roots dis-
cernible in discourses across the UK there are also aspects of the process of
policy development that are fundamentally different and can be expected to
give rise to differences in the impact of policy on practice.
The academic discourse within the UK as a whole, though mainly based on
the English experience, evolved during the 1990s. By the end of the century
there was an enriched literature, drawing on a growing body of empirical evi-
dence. The work of the PACE project is significant in this respect (Osborn et al.,
2000; Pollard et al., 2000) as is that of Torrance and Pryor (1998) with its theo-
rizing in terms of 'convergent' and 'divergent' modes of assessment. But it was
the review by Black and Wiliam (1998a), supported by the ways in which its
authors and the Assessment Reform Group targeted policymakers in advocat-
ing formative assessment, which was increasingly recognized and reinterpreted
in the professional and policy discourses. The new post-devolution administra-
tions in Scotland and Wales made overt commitments to 'evidence-informed
policy'. This contributed to the evidence and argument from Black and Wiliam's
review of research, advocated by the review's authors in Scotland and by
Daugherty in Wales, becoming an explicit influence on the assessment
policy decisions announced during 2004.
In contrast, the dominant policy discourses and the main official policy texts
in England seemed at first to be largely unaffected by the evidence from
research or the advocacy of academics. Instead, it was developments in certain
localities, such as the KMOFAP and Learning How to Learn Project (see Chap-
ters 1 and 2), plus the work with groups of teachers by others such as Shirley
Clarke, that fuelled a groundswell of interest in schools. Only when that growth
in interest amongst teachers found echoes in some of the continuing centrally-
driven policy initiatives of the DfES from 2003 onwards did the language of
assessment for learning enter the policy discourses at national level in England.
Yet, as is evident from the examples quoted above, this infiltration into the
official discourses brought with it sometimes worryingly disparate versions of
both the 'why' and the 'how' of assessment for learning. 'Personalized learning',
with assessment for learning as one of five key components, was highlighted in
England in the run-up to elections in 2005 as the educational dimension of the
government's new drive to 'personalize' public services. Pollard and James
welcome its potential to re-orientate policies for schools towards the needs of
learners whilst also warning of the dangers of 'slipping back into over-simpli-
fied consideration of teaching provision and systems' (2005: 5). But
an accountability culture so strongly coloured by performativity has meant
that, for many English professionals working at the school level, discourses asso-
ciated with student assessments are linked to parental choice of schools and
public measures of school performance.
It is here that the concept of 'policy as discourse' is powerful. The 'silences'
of an enthusiastic policy rhetoric about assessment for learning come from
another, seemingly separate discourse - that of performativity. Other silences
lie within the new discourse of personalized learning, which suggests an
individualized approach to assessment that is far from the constructivist and
social learning notions that underpin assessment for learning (see Chapter 3).
Such silences speak louder to many than do the official policy texts in England
which now refer routinely to 'learning', displacing the discourse of 'what must
be taught' that had been dominant in the 1990s.
The distinctive social and political cultures of these four countries are thus
increasingly apparent. For example, the assessment policy texts and discourses
at national level in Scotland and Wales acknowledge the reality of the tensions
created by trying to use evidence from assessment directly in support of learn-
ing whilst also using data, both about individuals and on cohorts of students,
for summative and evaluative purposes. The Scottish Executive's approach to
accountability was through the active involvement of representatives of all
stakeholder interests in developing policy. This is in marked contrast to the
English approach which seeks to empower people by offering them choice in a
supposed market for schools, with aggregate data from student assessments as
the main indicator of school performance. The emphasis on school self-evalua-
tion in the Scottish policy framework is another contribution to changing how
assessment data are used, thereby changing the perceived role of assessment in
the school system. Wales has been distancing itself from the inheritance of an
'England and Wales' assessment policy. But it is not yet clear whether those
responsible for policy in Wales, whether at national level or in schools and local
education authorities, realize how much of a culture shift is needed for teach-
ers to be able to develop their assessment practices in ways that ensure assess-
ment for learning is a major priority.
Understanding the social and political context is therefore crucial for an
understanding of the current status of assessment for learning in each of the
four countries. Understanding the interplay of discourse and text within each
country is also crucial for any judgement about the prospects for assessment
that supports learning and fosters student autonomy becoming embedded in
the routine practices of thousands of schools and tens of thousands of teachers.
It will be evident from this chapter that there are four 'policy trajectories' to
be found within the UK. At one level those who have long argued that fonna-
tive assessment has been neglected as a policy priority can be encouraged by
the fact that assessment for learning has moved up the official policy agenda in
all four countries over the past decade. But it must also be remembered that the
policy developments reviewed here have all been located at the early stages of
the policy cycle, namely those concerned with initiating policy at the national
level and articulating broad policy intentions for the system as a whole.
For a short time in the late 1980s, the TGAT Report put formative assessment
at the centre of a framework for national curriculum assessment for England
and Wales; the stages of policy development and implementation which fol-
lowed that Report ensured that, in 'policy as practice', it disappeared without
trace during the 1990s. A markedly more favourable social and political context,
at least in Scotland and Wales, now offers better prospects for the ambitions of
recent policy texts in those countries being implemented in ways that, while
inevitably mediated by practitioners, do not lose sight of the original policy
aims. In all four countries the policy cycle is only now beginning to unfold.
Chapter 10
Assessment for Learning: Why no Profile in US
Policy?
Dylan Wiliam
The aim of this chapter is not to provide an overview of assessment for learn-
ing in US schools' policy - given the lack of good evidence on this point, such a
chapter would either be very short, or highly speculative. Instead, it is to
attempt to account for the current position with regard to assessment for learn-
ing in the USA in the light of the history of assessment more generally. In the
broadest terms, the expectation of high reliability and objectivity in the assess-
ment of students' learning within a culture of accountability and litigation
when things go wrong, has tended to deflect policy developments from any
consideration of improving learning through assessment.
The main story of this chapter, therefore, is how one highly specialized role
for assessment, the selection of students for higher education, and a very spe-
cialized solution to the problem, the use of an aptitude test, gained wide
acceptance and usage. By eventually dominating other methods of selecting
students for university and ultimately influencing the methods of assessment
used for other purposes, such approaches to assessment have eclipsed the use
of and to some extent discourse on formative assessment; that is, assessment
designed to support learning.
The chapter begins with a brief account of the creation of the College
Entrance Examination Board and its attempts to bring some coherence to the
use of written examinations in university admissions. The criticisms that were
made of the use of such examinations led to explorations of the use of
intelligence tests, which had originally been used to diagnose learning
difficulties among Parisian school students but which had been modified in
the USA to enable blanket testing of army recruits in the closing stages of the
First World War. Subsequent sections detail how the army intelligence test was
developed into the Scholastic Aptitude Test and how this test came to
dominate university admissions in the USA. The final sections discuss how
assessment in schools developed over the latter part of the twentieth century,
including some of the alternative methods of assessment such as portfolios
which were explored in the 1980s and 1990s. These methods, with clear links
to assessment for learning, were ultimately eradicated by the press for cheap,
scalable methods of testing for accountability - a role that the technology of
aptitude testing was to fill.
Assessment and Learning
Assessment in US schools
For at least the last hundred years, the experience of US school students has
been that assessment means grading. From the third or fourth grade (age 8 to
9), and continuing into graduate studies, almost all work that is assessed is eval-
uated on the same literal grade scale: A, B, C, D or F (fail). Scores on tests or
other work that is expressed on a percentage scale are routinely converted to a
letter grade, with cut-offs for A typically ranging from 90 to 93, B from 80 to 83,
C from 70 to 73 and D from 60 to 63. Scores below 60 are generally graded as F.
In high schools (and sometimes earlier) these grades are then cumulated by
assigning 'grade-points' of 4, 3, 2, 1 and 0 to grades of A, B, C, D and F respec-
tively, and then averaged to produce the grade-point average (GPA). Where stu-
dents take especially demanding courses, such as Advanced Placement courses
that confer college credit, the grade-point equivalences may be scaled up, so
that an A might get 5. However, despite the extraordinary consistency in this
practice across the USA, exactly what the grade represents and what factors
teachers take into account in assigning grades and assessing students in general
are far from clear (Madaus and Kellaghan, 1992; Stiggins et al., 1986), and there
are few empirical studies on what really goes on in classrooms.
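The conversion and averaging described above are mechanical enough to sketch in a few lines of code. This is a hypothetical illustration only, assuming the common 90/80/70/60 cut-offs and the unweighted 4-point scale (no Advanced Placement bonus):

```python
# Sketch of the grading arithmetic described in the text: a percentage
# maps to a letter grade, letters map to grade-points, and the GPA is
# the mean of those points. Cut-offs of 90/80/70/60 are one common
# choice; as noted, the A cut-off varies between about 90 and 93.
GRADE_POINTS = {"A": 4, "B": 3, "C": 2, "D": 1, "F": 0}

def letter_grade(percent: float) -> str:
    """Convert a percentage score to a letter grade."""
    if percent >= 90:
        return "A"
    if percent >= 80:
        return "B"
    if percent >= 70:
        return "C"
    if percent >= 60:
        return "D"
    return "F"

def grade_point_average(grades: list[str]) -> float:
    """Average the grade-points (4, 3, 2, 1, 0) for a list of letter grades."""
    return sum(GRADE_POINTS[g] for g in grades) / len(grades)
```

With these cut-offs, a score of 85 earns a B, and a transcript containing one each of A to F averages to a GPA of 2.0.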
Several studies conducted in the 1980s found that while teachers were
required to administer many tests, they had relied on their own observations or
tests they had constructed themselves in making decisions about students (Stig-
gins and Bridgeford, 1985; Herman and Dorr-Bremme, 1983; Dorr-Bremme et al.,
1983; Dorr-Bremme and Herman, 1986). Crooks (1988) found that such teacher-
produced tests tended to emphasize low-order skills such as factual recall rather
than complex thinking. Stiggins et al. (1989) showed that the use of grades both
to communicate to students and parents about student learning on the one hand,
and to motivate students on the other, was in fundamental conflict.
Perhaps because of this internal conflict, it is clear that the grade is rarely a
pure measure of attainment and will frequently include how much effort the
student put into the assignment, attendance and sometimes even behaviour in
class. The lack of clarity led Dressel to define a grade as 'an inadequate report
of an inaccurate judgment by a biased and variable judge of the extent to which
a student has attained an undefined level of mastery of an unknown proportion
of an indefinite material' (Chickering, 1983).
Inconsistency in the meanings of grades from state to state and even district to
district may not have presented too many problems when the grades were to be
used locally, but at the beginning of the twentieth century as students applied to
higher education institutions increasingly further afield, and as universities
switched from merely recruiting to selecting students, methods for comparing
grades and other records from different schools became increasingly necessary.
Written examinations
Written examinations were introduced into the Boston public school system in
1845 when the superintendent of instruction, Mann, decided that the 500 most
able 14-year-olds should take an examination on the same day (Travers, 1983).
The idea was quickly taken up elsewhere and the results were frequently used
to make 'high-stakes' decisions about students such as promotion and gradua-
tion. The stultifying effects of the examinations were noted by the superinten-
dent of schools for Cincinnati:
... they have occasioned and made well-nigh imperative the use of mechanical and
rote methods of teaching; they have occasioned cramming and the most vicious
habits of study; they have caused much of the overpressure charged upon schools,
some of which is real; they have tempted both teachers and pupils to dishonesty;
and last but not least, they have permitted a mechanical method of school supervi-
sion. (White, 1888: 519)
Admission to higher education institutions in the USA at the time was a rather
informal process. Most universities were recruiting rather than selecting stu-
dents; quite simply there were more places than applicants, and at times,
admission decisions appear to have been based on financial as much as aca-
demic criteria (Levine, 1986).
In the period after the civil war, universities had begun to formalize their
admissions procedures. In 1865 the New York Board of Regents, which was
responsible for the supervision of higher education institutions, put in place a
series of examinations for entry to high school. In 1878, they added to these
examinations for graduation from high schools which were used by universities
in the state to decide whether students were ready for higher education. Stu-
dents who did not pass the Regents examinations were able to obtain 'local'
high school diplomas if they met the requirements laid down by the district.
Another approach, pioneered by the University of Michigan, was to accredit
high schools so that they were able to certify students as being ready for higher
education (Broome, 1903) and several other universities adopted similar mech-
anisms. Towards the end of the century, however, the number of higher educa-
tion institutions to which a school might send students and the number of
schools from which a university might draw its students both grew. In order to
simplify the accreditation process, a large number of reciprocal arrangements
were established. Although attempts to co-ordinate these were made (see Krug,
1969), particularly in the elite institutions, it appears that university staff resis-
ted that loss of control over admissions decisions. The validity of the Michigan
approach was also weakened by accumulating evidence that teachers' grading
of student work was not particularly reliable. Not only did different teachers
give the same piece of work different grades, but even the grades awarded by
a particular teacher were inconsistent over time (Starch and Elliott, 1912, 1913).
As an alternative, the Ivy League universities (Brown, Columbia, Cornell, Dart-
mouth, Harvard, Pennsylvania, Princeton and Yale) proposed the use of common
written entrance examinations. Many universities were already using written
entrance examinations, for example Harvard and Yale since 1851 (Broome, 1903),
but each university had its own system with its own distinctive focus. The purpose
behind the creation of the College Entrance Examination Board in 1899 was to
establish a set of common examinations scored uniformly that would bring some
coherence to the high school curriculum, while at the same time allowing indi-
vidual institutions to make their own admission decisions. Although the idea of
a common high school curriculum and associated examinations was resisted by
many institutions, the College Boards, as the examinations came to be known,
gained increasing acceptance after their introduction in 1901.
The original College Boards were highly predictable tests - even the specific
passage of Homer or Virgil that would be tested was made public - and so there
was concern that the tests assessed the quality of coaching rather than the talent
of the student. For this reason the College Board introduced its New Plan exam-
inations in 1916, focusing on just four subjects and placing greater emphasis on
higher-order skills. Originally the New Plan examinations were taken almost
exclusively by students applying for Harvard, Princeton or Yale. However,
other universities quickly began to see the benefits of the 'New Plan' examina-
tions and for two reasons. Firstly, they provided information about the capabil-
ity of applicants to reason critically as opposed to regurgitating memorized
answers, and secondly, they freed schools from having to train students on a
narrow range of content. Although there was also some renewed interest in
models of school accreditation (for example in New England), the New Plan
examinations became increasingly popular and were quickly established as the
dominant assessment for university admission.
However, these examinations were still a compromise between a test of
school learning and a test of 'mental power'; more focused on the latter than the
original College Boards, but still an assessment that depended strongly on the
quality of preparation received by the student. It is hardly surprising, therefore,
that the predominance of the 'College Boards' was soon to be challenged by the
developing technology of intelligence testing.
The origins of intelligence testing
The philosophical tradition known as British empiricism held that all knowl-
edge comes from experience (in contrast to the continental rationalist tradition
which emphasized the role of reason and innate ideas). Therefore, when Galton
sought to define measures of intellectual functioning as part of his arguments
on 'hereditary genius' it is not surprising that he focused on measures of
sensory acuity rather than knowledge (Galton, 1869). Building on this work, in
1890 Cattell published a list of ten mental tests that he proposed might be used
to measure individual differences in mental processes. To a modern eye,
Cattell's tests look rather odd. They measured grip strength, speed of move-
ment of the arm, sensitivity to touch and pain, the ability to judge weights, time
taken to react to sound and to name colours, accuracy of judging length and
time and memory for random strings of letters.
In contrast, Binet had argued throughout the 1890s that intellectual func-
tioning could not be reduced to sensory acuity. In collaboration with Simon he
produced a series of 30 graduated tests that focused on attention, communica-
tion, memory, comprehension, reasoning and abstraction. Through extensive
field trials, the tests were adjusted so as to be appropriate for students of a par-
ticular age. If a child could answer correctly those items in the Year 4 tests, but
not the Year 5 tests, then the child could be said to have a mental age of four.
However, the results were interpreted as classifications of children's abilities,
rather than measurements, and were used in particular to identify those stu-
dents who would require additional teaching to make adequate progress. In
fact, Binet stated explicitly:
I do not believe that one may measure one of the intellectual aptitudes in the sense
that one measures a length or a capacity. Thus, when a person studied can retain
seven figures after a single audition, one can class him, from the point of his
memory for figures, after the individual who retains eight figures under the same
conditions, and before those who retain six. It is a classification, not a measure-
ment. (cited in Varon, 1936: 47)
Binet's work was brought to the USA by Goddard who translated the tests into
English and administered them to the children at the New Jersey Training
School in Vineland. He was somewhat surprised to discover that the classifica-
tion of children on the basis of the tests agreed with the informal assessments
made by Vineland teachers; 'It met our needs. A classification of our children
based on the Scale agreed with the Institution experience' (1916: 5).
In the same year, Terman (1916) adopted the structure of the Binet-Simon
tests, but discarded items he felt were inappropriate for US contexts. He added
40 new items, which enabled him to increase the number of items per test to six.
The resulting tests, known as the Stanford-Binet tests, were then developed
in multiple-choice versions for use with army recruits. Known as Army Alpha
and Army Beta tests, the US Army trials proved successful, providing scores
that correlated highly with officers' judgments about the capabilities of their
men. This resulted in their full adoption and by the end of January 1919, the
tests had been administered to 1,726,966 men (Zenderland, 2000).
Intelligence tests in university admissions
The Army Alpha test results demonstrated the feasibility of large-scale, group-
administered intelligence tests and shortly after the end of the First World War,
many universities began to explore the utility of intelligence tests for a range of
purposes.
In 1919, both Purdue University and Ohio University administered the Army
Alpha to all their students and, by 1924, the use of intelligence tests was wide-
spread in US universities. In some, the intelligence tests were used to identify
students who appeared to have greater ability than their work at university
indicated; in others, the results were used to inform placement decisions both
between programmes and within programmes (that is, to 'section' classes to
create homogeneous ability groups). Perhaps inevitably, the tests were also
used as performance indicators: to compare the ability of students in different
departments within the same university and to compare students attending dif-
ferent universities. In an early example of an attempt to manipulate 'league
table' standings, Terman, still at Stanford, which was at the time regarded as a
'provincial' university, suggested selecting students on the basis of intelligence
test scores in order to improve the university's position in the reports of uni-
versity merit then being produced (Terman, 1921).
Around this time, many universities began to experience difficulties in
meeting demand. The number of high school graduates had more than doubled
from 1915 to 1925 and although many universities had tried to expand their
intake to meet demand, some were experiencing substantial pressure on places.
As Levine noted '... a small but critical number of liberal arts colleges enjoyed
the luxury of selecting their student bodies for the first time' (1986: 136). In
order to address this issue, in 1920 the College Board established a commission
'... to investigate and report on general intelligence examinations and other
new types of examinations offered in several secondary school subjects'. The
task of developing 'new types of examinations' of content was given to
Thorndike and Wood of Columbia Teachers College, who presented the first
'objective examinations' (in algebra and history) to the College Board in 1922.
Four years earlier, some of the leading public universities had founded the
American Council on Education (ACE) to represent their interests. In 1924 ACE
asked Thurstone, a psychologist at the Carnegie Institute of Technology, to
develop a series of intelligence tests. Thurstone had hoped that his work would
be embraced by the College Board but they in turn set up their own Committee
of Experts to investigate the use of 'psychological tests'. Although the commit-
tee included notable psychologists, no-one from Teachers College was invited,
despite the foundational work of Thorndike and Wood in both intelligence
testing and the development of 'objective' tests of subject knowledge. This was
to have severe and far-reaching implications for the development of the test that
came to be known as the Scholastic Aptitude Test. As Hubin notes '... from its
inception, the Scholastic Aptitude Test was isolated from advances in education
and learning theory and ultimately isolated from the advances in a field that
later would be called cognitive psychology' (1988: 198).
The Scholastic Aptitude Test
The first version of the Scholastic Aptitude Test was produced in 1926 and
administered to 8026 students. As Brigham wrote in the introduction to the
manual that accompanied the test:
The term 'scholastic aptitude test' has reference to the type of examination now in
current use and variously called 'psychological tests', 'intelligence tests', 'mental
ability tests', 'alertness tests' et cetera. The Committee uses the term 'apti-
tude' to distinguish such tests from tests of training in school. Any claims
that aptitude tests now in use really measure 'general intelligence' or 'general
ability' may or may not be substantiated. It has, however, been very generally
established that high scores in such tests usually indicate ability to do a high order
of scholastic work. The term 'scholastic aptitude' makes no stronger claim for such
tests than that there is a tendency for individual differences in scores in tests
to be associated positively with individual differences in subsequent academic
attainment. (1926: 1)
Initially, the acceptance of the SAT was slow. Over the first eleven years, the
number of test takers grew only 1.5 per cent per year. Most members of the
College Board (including Columbia, Princeton and Yale) required students to
take the examination but two (Harvard and Bryn Mawr) did not, although since
most students applied to more than one institution both Harvard and Bryn
Mawr did have SAT scores on many of their students, which provided evidence
that could be used in support of the SAT's validity, and this evidence was crucial
when Conant, appointed as president of Harvard in 1933, began his attempts to
make Harvard more meritocratic.
One of Conant's first acts was to establish a new scholarship programme and
he determined that the SAT, together with school transcripts and recommenda-
tions, should form the basis of the Harvard National Scholarships administered
in 1934-6. The SAT proved to be an immediate success. Students awarded
scholarships on the basis of SAT scores did well at Harvard; indeed the 1981
Nobel Prize winner (Economic Science), James Tobin, was one of the early recip-
ients of a Harvard scholarship. Emboldened by the success of the SAT, Conant
persuaded 14 of the College Board universities to base all scholarship decisions
on objectively scored multiple-choice tests from 1937 onwards.
From its first use in 1926, the outcomes on the SAT had been reported on the
familiar 200 to 800 scale, by scaling the raw scores to have a mean of 500 and a
standard deviation of 100. From 1926-40, this norming was based on the stu-
dents who took the SAT each year, so that the meaning of a score might change
from year to year according to the scores of the students who took the test. Since
the early period of the SAT was one of experimentation with different sorts of
items and formats, the difference in meaning from year to year may have been
quite large even if the population of test-takers did not change much. Respond-
ing to complaints from administrators, in 1941 the College Board introduced a
system of equating tests so that each form of the verbal test was equated to the
version administered in April 1941 (Angoff, 1971) and the mathematics test to
that administered in April 1942. At the same time, the traditional College Board
written examinations were withdrawn.
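The pre-1941 norming just described can be sketched in a few lines. This is an illustration only, assuming a simple linear rescaling against the cohort's own mean and (population) standard deviation with clipping to the reporting range; the equating procedure adopted in 1941 was quite different:

```python
import statistics

def norm_to_sat_scale(raw_scores):
    """Rescale a cohort's raw scores to mean 500 and standard deviation
    100, then clip to the familiar 200-800 reporting range. Because the
    reference population is the cohort itself, the meaning of a given
    scaled score shifts from year to year, as the text notes."""
    mean = statistics.mean(raw_scores)
    sd = statistics.pstdev(raw_scores)
    scaled = (500 + 100 * (x - mean) / sd for x in raw_scores)
    return [min(800, max(200, round(s))) for s in scaled]
```

Since the cohort itself is the reference population, the same raw score can map to different scaled scores in different years - precisely the drift that administrators complained about.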
At the time of these changes, the test was taken by fewer than 20,000 students
but by 1951, three years after the Educational Testing Service began to admin-
ister them, the number of SAT takers had grown to 81,000 and by 1961 to
805,000. In 2004, the SAT was taken by 1,419,007 students (College Board, 2004).
While the SAT itself remained substantially unchanged for over sixty years, its
name did not. In 1990, the College Board changed the name to the Scholastic Assess-
ment Test, and in 1996, it decided that the letters did not stand for anything. It
was just the SAT.
The most serious and enduring challenge to the predominance of the SAT
came in 1959, when Lindquist and McCarrell of Iowa University established Amer-
ican College Testing (now called simply ACT). Lindquist was an acknowledged
leader in the field of psychometrics and had edited the first edition of the field's
'bible', Educational Measurement (Lindquist, 1951). ACT was strong where the
College Board was weak. They had very strong links with public universities,
especially in the mid-west, and had a strong interest in measuring school
achievement. And where the College Board was interested in helping the elite
universities in selecting students, ACT was much more interested in placement -
helping universities decide which programmes would suit an individual. In
reality, however, the differences between the ACT and the SAT are not that clear
cut. Despite its origins in the idea of assessing intelligence, the SAT has always
been a test of skills that are developed at school; students with higher levels of
reasoning skills find mastering the material for the ACT easier. In fact, the corre-
lation between the scores on the SAT and the ACT is 0.92 (Dorans et al., 1997;
Dorans, 1999). To all intents and purposes, the two are measuring the same thing.
Nevertheless, many students take both tests in order to maximize their chances
of getting into their chosen university, and almost as many students take the ACT
each year (1,171,460 in 2004) as take the SAT (ACT, 2004).
Ever since its introduction, the SAT has been subjected to much critical
scrutiny (again, see Lemann, 1999 for a summary), but things came to a head in
2001 when Richard Atkinson, president of the University of California,
announced that he had asked the senate of the university not to require SAT
reasoning test scores in considering applicants. In doing so, he said:
All too often, universities use SAT scores to rank applicants in determining
who should be admitted. This use of the SAT is not compatible with the US view
on how merit should be defined and opportunities distributed. The strength of US
society has been its belief that actual achievement should be what matters most.
Students should be judged on the basis of what they have made of the opportuni-
ties available to them. In other words, in America, students should be judged on
what they have accomplished during four years of high school, taking into account
their opportunities. (Atkinson, 2001)
Because the SAT and the ACT are, as noted above, essentially measuring the
same thing, these criticisms are not well-founded in terms of the quality of deci-
sions made on the basis of test scores. The criticism is really one about the
message that is sent by calling something 'general reasoning' rather than
'school achievement' - essentially an issue of value implications (Messick,
1980). Nevertheless, the threatened loss of income was enough to make the
College Board change the SAT to focus more on achievement and to include a
writing test. The new test was administered for the first time in March 2005.
The SAT therefore appears set to dominate the arena of admissions to US
universities for years to come. No-one really understands what the SAT is meas-
uring, nor how a test is able to predict college grades almost as well
as the high-school grade point average (GPA) which is built up from hundreds
of hours of assessed work. Nevertheless, the SAT works. It works partly because
it is attuned to the US higher education system. In most European uni-
versities, selection to university is combined with placement into a specific pro-
gramme, so information is needed on the applicant's aptitude for a particular
programme of study. In US universities, students do not select their 'major'
until the second or third year, so at admission information on specific aptitudes
is not needed. The SAT works also because it is well-suited to a society with a
propensity to litigate. The reliability of the SAT is extremely high (over 0.9) and
there is little evidence of bias (minority students get lower scores on the test, but
also do less well at college).
In terms of what it sets out to do, therefore, the SAT is a very effective assess-
ment. The problem is that it set the agenda for what kinds of assessment are
acceptable or possible. As the demand to hold schools accountable grew during
the final part of the twentieth century, the technology of multiple-choice testing
that had been developed for the SAT was easily pressed into service for the
assessment of younger children.
The rise and rise in assessment for accountability
One of the key principles of the constitution of the USA is that anything that is
not specified as a federal function is 'reserved to the states', and this notion (that
has within the European Union been given the inelegant name of 'subsidiarity')
is also practised within most states. Education in particular has always been a
local issue in the US, so that for example decisions about curricula, teachers' pay
and conditions of service and organizational structures are not made at the state
level but in 17,000 school districts. Most of the funding for schools is raised
in the form of taxes on local residential and commercial property. Since the
school budget is generally determined by locally elected Boards of Education
there is a very high degree of accountability, and the annual surveys produced
by the Phi Delta Kappan organization indicate that most communities are
happy with their local schools.
From the 1960s, however, state and federal sources had become greater and
greater net contributors (Corbett and Wilson, 1991: 25), which led to demands
that school districts become accountable beyond the local community, and the
state has thus played a greater and greater role in education policy and funding.
For example, in 1961 California introduced a programme of achievement
testing in all its schools although the nature of the tests was left to the districts.
In 1972, the California Assessment Program was introduced which mandated
multiple-choice tests in language arts and mathematics in grades 2, 3, 6 and 12
(tests for grade 8 were added in 1983). Subsequent legislation in 1991, 1994 and
1995 enacted new state-wide testing initiatives that were only partly imple-
mented. However, in 1997 new legal requirements for curriculum standards
were passed which in 1998 led to the Standardized Testing and Reporting
(STAR) Program. Under this programme, all students in grades 2 to 11 take the
Stanford Achievement Test - a battery of norm-referenced tests - every year.
Those in grades 2 to 8 are tested in reading, writing, spelling and mathematics,
and those in grades 9, 10 and 11 are tested in reading, writing, mathematics,
science and social studies. In 1999 further legislation introduced the Academic
Performance Index (API), a weighted index of scores on the Stanford Achieve-
ment Tests, with awards for high-performing schools and a combination of
sanctions and additional resources for schools with poor performance. The
same legislation also introduced requirements for passing scores on the tests for
entry into high school, and for the award of a high-school diploma.
Portfolios
Many states experimented with alternatives to standardized tests for mon-
itoring the quality of education and for attesting to the achievements of indi-
vidual students. In 1974, the National Writing Project (NWP) had been
established at the University of California, Berkeley. Drawing inspiration from
the practices of professional writers, the National Writing Project emphasized
the importance of repeated redrafting in the writing process and so, to assess
the writing process properly, one needed to see the development of the final
piece through several drafts. In judging the quality of the work, the degree of
improvement across the drafts was as important as the quality of the final draft.
The emphasis on the process by which a piece of work was created, rather
than the resulting product, was also a key feature of the Arts-PROPEL project -
a collaboration between the Project Zero research group at Harvard University
and the Educational Testing Service. The idea was that students would '... write
poems, compose their own songs, paint portraits and tackle other "real-life"
projects as the starting point for exploring the works of practising artists'
(Project Zero, 2005). Originally, it appears that the interest in portfolios was
intended to be primarily formative but many writers also called for perform-
ance or authentic assessments to be used instead of standardized tests (Berlak
et al., 1992; Gardner, 1992).
Two states in particular, Vermont and Kentucky, did explore whether
portfolios could be used in place of standardized tests to provide evidence for
accountability, and some districts also developed systems in which
portfolios were used for summative assessments of individual students. How-
ever, the use of portfolios was attacked on several grounds such as being
'... costly indeed, and slow and cumbersome' and '... its biggest flaw as an
external assessment is its subjectivity and unreliability' (Finn, cited in
Mathews, 2004).
In 1994, the RAND Corporation released a report on the use of portfolios in
Vermont (Koretz et 31., 1994), which is regarded by many as a turning point in
the uS(' of portfolios (_ fur example, Mathews, 20(4). Koretz and his team
found that the meanings of grades or scores on portfolios were rarely compa-
rable from schoo1 to school because !here witS liltle agreement about what sorts
of elements should be included. The standards for reliability that had been set
by the SAT simply cou.ld not be matchrd with portfolios. While advocates
might claim that the latter were more valid measures of learning, the fact that
the same portfolio would gel diffel'1"J'll scores according to who dKi the scoring
made their use for summative purposes impossible in the US context.
Assessment for Learning: Why no Profile in US Policy?
In fact, even if portfolios had been able to attain high levels of reliability, it is
doubtful that they would have gained acceptance. Teachers did feel that the use
of portfolios was valuable, although the time needed to produce worthwhile
portfolios detracted from other priorities. Mathematics teachers in particular
complained that portfolio activities took time away from basic skills and com-
putation. Furthermore, even before the RAND report, the portfolio movement
was being eclipsed by the push for 'standards-based' education and assessment
(Mathews, 2004).
Standards
In 1989, President Bush convened the first National Education Summit in Charlottesville, Virginia, led by (the then) Governor Clinton of Arkansas. Those attending the summit, mostly state governors, were perhaps not surprisingly able to agree on the importance of involving all stakeholders in the education process, of providing schools with the resources necessary to do the job, and of holding schools accountable for their performance. What was not so obvious was the agreement that all states should establish standards for education and that they should aspire to having all students meet those standards. In many ways this harked back to the belief that all students would learn if taught properly, a belief that underpinned the 'payment by results' culture of the first half of the nineteenth century (Madaus and Kellaghan, 1992).
The importance attached to 'standards' may appear odd to Europeans, but the idea of national or regional standards has been long established in Europe. Even in England, which lacked a national curriculum until 1989, there was substantial agreement about what should be in, say, a mathematics curriculum, since all teachers were preparing students for similar sets of public examinations.
Prominent in the development of national standards was the National Council of Teachers of Mathematics (NCTM), which published its Curriculum and Evaluation Standards for Mathematics in 1989 and Professional Standards for Teaching Mathematics two years later (NCTM, 1989, 1991). Because of the huge amount of consultation which the NCTM had undertaken in constructing the standards, they quickly became a model for states to follow, and over the next few years every state in the USA except Iowa adopted state-wide standards for the major school subjects. States gradually aligned their high-stakes accountability tests with the state standards, although the extent to which written tests could legitimately assess the high-order goals contained in most state standards is questionable (Webb, 1999).
Texas had introduced a state-wide high-school graduation test in 1984. In 1990, the graduation tests were subsumed within the Texas Assessment of Academic Skills (TAAS), a series of untimed standards-based achievement tests for grades 3 to 10 in reading, writing, mathematics and social studies. Apart from writing, these tests are in multiple-choice format. Massachusetts introduced state-wide testing in 1986. The original aim of the assessment was to provide information about the quality of schools across the state, much in the same way as the National Assessment of Educational Progress (NAEP) had done for the country as a whole (Jones and Olkin, 2004). Students were tested in reading, mathematics and science at grade 4 and grade 8 in alternate years until 1996, and only scores for the state as a whole were published. In 1998, however, the state introduced the Massachusetts Comprehensive Assessment System (MCAS), which tests students at grades 4, 8 and 10 in English, mathematics, science and technology, social studies and history (the last two in grade 8 only). The tests use a variety of formats, including multiple-choice and constructed-response items.
In reviewing the development of state-wide testing programmes, Bolon suggests that many states appeared to be involved in a competition which might be called 'Our standards are stiffer than yours' (2000: 11). Given that political timescales tend to be very short, it is perhaps not surprising that politicians have been anxious to produce highly visible responses to the challenge of raising student achievement. However, the wisdom of setting such challenging standards was called into question when, in January 2002, President Bush signed into law the No Child Left Behind (NCLB) Act of 2001.
No Child Left Behind
Technically, NCLB is a reauthorization of the Elementary and Secondary Education Act originally passed in 1965 (in the USA much legislation expires unless reauthorized) and is a complex piece of legislation, even by US standards. The main requirement of the act is that, in order to receive federal funds, each state must propose a series of staged targets for achieving the overall goal of all students in grades 3-8 being proficient in reading and mathematics by 2014 (although the definition of 'proficient' is left to each state). Each school is judged to be making 'adequate yearly progress' (AYP) towards this goal if the proportion of students being judged as 'proficient' on annual state-produced standards-based tests exceeds the target percentage for the state for that year. Furthermore, the AYP requirements apply not only to the totality of students in a grade but also to specific sub-groups of students (for example, ethnic minority groups), so that it is not possible for good performance by some student sub-groups to offset poor performance in others. Among the many sanctions that the NCLB mandates, if schools fail to make AYP then parents have the right to have their child moved to another school at the district's expense.
It has been claimed by some (see, for example, Robson, 2004) that NCLB was designed by Republicans to pave the way for mass school privatization by showing the vast majority of public schools to be failing. In fact, the act had strong bipartisan support. Indeed, some of the most draconian elements of the legislation, such as the definition of 'adequate yearly progress', were insisted on by Democrats, because they did not want schools to be regarded as successful if low performance by some students (for example, those from minority ethnic communities) were offset by high performance by others. However, it is clear that the way the legislation was actually put into practice appears to be very different from what was imagined by some of its original supporters, and an increasing number of both Republican and Democratic politicians are calling for substantial changes in the operation of the Act.
Failure to make AYP has severe consequences for schools, and as a result many schools and districts have invested both time and money in setting up systems for monitoring what teachers are teaching and what students are learning. In order to ensure that teachers cover the curriculum, most districts have devised 'curriculum pacing guides' that specify which pages of the set texts are to be covered every week (and sometimes each day). With such rigid pacing, there are few opportunities for teachers to use information on student performance to address learning needs.
Very recently, there has also been a huge upsurge of interest in systems that monitor student progress through the use of regular formal tests that are designed to predict performance on the annual state tests - some reports suggest that this may be the fastest growing sector of the education market. The idea of such regular testing is that students who are likely to fail the state test, and may therefore prevent a school from reaching its AYP target, can be identified early and given additional support. For this reason these systems are routinely described in the USA as 'formative assessment', even though the results of the assessments rarely impact on learning and as such might be better described as 'early-warning summative'. In many districts such tests are given once a week, on a Friday. Thursdays are consumed with preparation for the test, and Mondays with reviews of the incorrect answers, leaving only 40 per cent of the available subject time for teaching. While the pressure on schools to improve the performance of all students means that schools in the USA are now more than ever in need of effective formative assessment, the conditions for its development seem even less promising than ever.
Conclusion
In Europe, for most of the twentieth century, education beyond the age of 15 or 16 was intended only for those going on to university. The consequence of this has been that the alignment between school and university curricula is very high - indeed, it can be argued that the academic curriculum for 16- to 19-year-olds in Europe has been determined by the universities, with consequent implications for the curriculum during the period of compulsory schooling. In the USA, however, despite the fact that for most of the twentieth century a greater proportion of school leavers went on to higher education, the high-school curriculum has always been an end in itself and determined locally. The advantage of this is that schools are able to serve their local communities well. The disadvantage is that high-school curricula are often poorly aligned with the demands of higher education, and this has persisted even with the introduction of state standards (Standards for Success, 2003).
When higher education was an essentially local undertaking, the problems caused by lack of alignment could be addressed reasonably easily, but the growth of national elite universities rendered such local solutions unworkable. At the time of its 'ossification' in 1941, the SAT was being taken by less than 20,000 students each year (Hubin, 1988), and it is entirely possible that it would have remained a test required only of those students applying for the most selective universities, with a range of alternatives, including achievement tests, also in use. It would be unfair to blame the SAT for the present condition of assessment in US schools, but it does seem likely that the dominance of the SAT and the prevalence of multiple-choice testing in schools are both indications of the same convictions, deeply and widely held in the USA, about the importance of objectivity in assessment.
Once multiple-choice tests were established (and, not long afterwards, the machine marking of tests - see Hubin, 1988), it was probably also inevitable that any form of 'authentic' assessment, such as examinations that required extended responses, let alone portfolios, would have been found wanting in comparison. This is partly due to such assessments tending to have lower reliability than multiple-choice items because of the differences between raters, although this can be addressed by having multiple raters. A more important limitation, within the US context, is the effect of student-task interaction - the fact that, with a smaller number of items, the particular set of items included may suit some students better than others. In Europe, such variability is typically not regarded as an aspect of reliability - it is just 'the luck of the draw'. However, in the USA, the fact that a different set of items might yield a different result for a particular student would open the possibility of expensive and inconvenient litigation.
Once the standards-based accountability movement began to gather momentum, in the 1980s, the incorporation of the existing technology of machine-scored multiple-choice tests was also probably inevitable. Americans had got used to testing students for less than $10 per test, and to spend $30 or more for a less reliable test, as is commonplace in Europe, whatever the advantages in terms of validity, would be politically very difficult.
However, even with annual state-mandated multiple-choice testing, it could be argued that there was still space for the development of effective formative assessment. After all, one of the key findings of the research literature in the field was that attention to formative assessment raises scores even on state-mandated tests (Crooks, 1988; Black and Wiliam, 1998a). Nevertheless, the prospects for the development of effective formative assessment within US education seem more remote than ever. The reasons for this are of course complex, but two factors appear to be especially important.
The first is the extraordinary belief in the value of grades, both as a device for communication between teachers on the one hand and students and parents on the other, and also as a way of motivating students, despite the large and mounting body of evidence to the contrary (see Chapter 4).
The second is the effect of an extraordinary degree of local accountability in the USA. Most of the 17,000 district superintendents in the USA are appointed by directly-elected boards of education, which are anxious to ensure that the money raised in local property taxes is spent efficiently. Under NCLB, the superintendents are required to ensure that their schools make 'adequate yearly progress'. The adoption of 'early-warning summative' testing systems therefore represents a highly visible response to the task of ensuring that the district's schools will meet their AYP targets.
There are districts where imaginative leaders can see that the challenge of raising achievement, and reducing the very large gaps in achievement between white and minority students that exist in the USA, requires more than just 'business as usual, but with greater intensity'. But political timescales are short and educational change is slow. A superintendent who is not re-elected will not change anything. Balancing the political press for quick results with the long-term vision needed to produce effective long-term improvement is an extraordinarily difficult and perhaps impossible task. There has never been a time when the USA needed effective formative assessment more but, perversely, never have the prospects for its successful development looked so bleak.
"3
Chapter 11
Policy and Practice in Assessment for Learning:
the Experience of Selected OECD Countries
Judy Sebba
Studies of assessment for learning in countries other than the USA or those in the UK potentially provide a rich and stimulating source of evidence for understanding practices in assessment for learning. In Chapter 9, Daugherty and Ecclestone provide an analysis of policy developments in the four countries of the UK and, in Chapter 10, Wiliam outlines reasons for assessment for learning not featuring in policy in the USA. This chapter draws on illustrative examples of practice from selected countries that participated in an OECD study of formative assessment. Daugherty and Ecclestone quoted Ball (1990) as suggesting that policies are pre-eminently statements about practice, intended to bring about solutions to problems identified by individual teachers and schools. Classroom practice can thus be seen as a key measure of policy implementation, and it is examples of classroom practice from different countries that are presented and analysed in this chapter. Some brief comments are made about policies in these countries, but a comprehensive analysis of educational policies in these countries is beyond the scope of this chapter.
In 2005, the Centre for Educational Research and Innovation (CERI) at OECD published research (OECD, 2005) on 'formative assessment' in lower secondary education, drawing on case studies involving eight countries: Canada, Denmark, England, Finland, Italy, New Zealand, Australia and Scotland. I undertook the case study of Queensland, Australia, which was written up with Graham Maxwell, a senior manager working in the locality. This chapter draws heavily on examples from the OECD study, including those case studies in Canada, Denmark, New Zealand and, in particular, Queensland. There were no significant differences between these countries and the others not mentioned - it is simply that these case studies provided illustrations of some emerging themes. It is important to acknowledge that the eight countries were from Europe, North America and Australasia and did not include a country that by any definition could be described as 'developing'. Furthermore, the chapter draws only on illustrative examples of policy and practice and the analysis cannot therefore be claimed to provide a comprehensive or definitive picture of the countries involved.
The OECD study provides a basis for identifying some common themes in assessment for learning policy and practice that can be compared across countries. These involve, at the most basic level, what is included in assessment for learning ('formative assessment' as it is called in the study), the nature and role of feedback, self- and peer-assessment, the relationship between student grouping strategies and assessment for learning, and teacher development. In addition to these classroom-level issues, there are contextual factors in schools and beyond which enhance or inhibit the implementation of assessment for learning strategies. These factors, such as the role of leadership, schools as learning organizations and students as agents of change, are not specific to developing assessment for learning and might more appropriately be viewed as strategies for school improvement. They do, however, vary within and across different countries, thus influencing the capacity for assessment for learning strategies to be effective. The links between assessment for learning at the classroom level, teacher development and school improvement are further explored in Chapter 2.
Before considering these themes, four underlying tensions are acknowledged which need to be taken into account when drawing interpretations and inferences from comparisons of assessment for learning across countries. These are: focusing on difference at the expense of similarities; the influence of cultural contexts; problems of transferability of strategies across countries; and the methodological limitations of research involving short visits to a small sector of provision in other countries. These issues have been more extensively debated in the literature on comparative education (for example, Vulliamy et al., 1990), but need to be mentioned here to urge caution in drawing generalized inferences from the examples presented.
Focusing on difference at the expense of similarities
In undertaking any comparative analysis, there is a danger of focusing exclusively on differences and ignoring or under-acknowledging similarities. Throughout this chapter, an attempt is made to draw parallels between the experiences of assessment for learning across different countries and to seek multiple interpretations of both the similarities and differences observed. It is important to attempt to distinguish between differences in terminology, definitions and meaning and real differences in policy and practice. International comparisons are frequently hampered by a lack of agreed, consistent terminology, which acts as a barrier to communication, development of understanding and the drawing of conclusions. For example, much greater emphasis was put in some countries on the use of test data to inform teaching as a component of formative assessment, whereas others saw this as distinctively separate from formative assessment as such.
The influence of cultural contexts
A second underlying tension concerns the need to acknowledge the cultural contexts within which assessment for learning strategies are being implemented. Attempting to understand the cultural differences between countries and, indeed, between different areas within countries, is a considerable challenge extensively debated in comparative education (for example, Vulliamy et al., 1990). Broadfoot et al. (1993) provided strong evidence of the influence of culture on the educational organization and processes in different countries. The interaction between national cultures and educational policies and structures adds further complexity to this. For example, national policies prescribing curricula, assessment and accountability systems provide structural contexts which may reflect, or indeed create, particular cultural contexts. Vulliamy (2004) argues that the increasing knowledge and information associated with globalization are in danger of strengthening those positivist approaches that threaten the centrality of culture in comparative education.
Problems of transferability
"Thirdly, and partly related to issues of cultural context, the transferability of
strategies and innovations from one country to another is a further underlying
tension. The assumption that educational innovations that have an effect in one
context will have the same or any effect in another has been challenged by
many writers (for example Crossley, 1984). The Black and Wiliam research
review (1998a), which indicated the substantial impact of assessmenl for learn-
ing strategies on students' learning and generated extensiw international inter-
est, drew on a large number of studies from different countries but was
ultimately limited to those written in English. The OECD study included
appendices of reviews of the literature in English. French and German.
However, more often the findings of a single study undertaken in one country
are taken as evidence for the efficacy of that strategy in another country. Fur-
thermore, as Fielding et al. (2005) have demonstrated, the concept of 'best prac-
tice' is contested and its transfer from one individual, institution or group to
another is much more complicated than national policies tend to acknowledge.
Methodological limitations of research based on short 'expert' visits
A final underlying tension is the methodological limitations of what Vulliamy et al. (1990) referred to as 'jetting in' experts or researchers to other countries for short periods. The OECD study involved 'experts' from one country visiting another country for a short time (1-2 weeks) and, in partnership with 'experts' from the host country, visiting schools, analysing documentation and interviewing those individuals and groups involved as consumers or providers of assessment for learning. While many comparative studies are probably similarly limited, the methodological weaknesses in research design, data collection, analysis and interpretation, and of trying to identify appropriate research questions, construct a research design and implement it in a short period with limited understanding of the context, raise serious questions. Drawing inferences from a necessarily partial picture, in contexts where others have determined what documents are accessed, who is interviewed and observed and perhaps the views that are conveyed, is problematic. For example, it is clear that observations undertaken in two schools in Queensland cannot be assumed to be representative of that state, let alone typical of Australia as a whole. Generalizations should therefore be minimized and interpretations treated as illustrative rather than indicative. Vulliamy et al. argue that in-depth qualitative research undertaken prior to the full study can inform the subsequent study questions and design, thereby increasing relevance, validity (for example, in interpreting terminology) and understanding. They also discuss ways in which research capacity can be developed with local populations.
Despite these limitations, the OECD study provided rich descriptions of a variety of forms of practice in assessment for learning, in a range of different contexts, which offered interesting insights and suggested that some classroom practices and challenges may share more similarities than differences.
What is included in assessment for learning/formative assessment?
In the OECD study, 'formative assessment' is defined as:

... frequent, interactive assessments of student progress and understanding to identify learning needs and adjust teaching appropriately. (2005: 21)

This definition differs significantly from that which is provided in the Introduction (ARG, 2002a) and which has been adopted in the national primary and secondary strategies in the UK. The Assessment Reform Group definition puts considerably greater emphasis on the use to be made by learners of the assessment information. The OECD definition instead stresses the adjusting of teaching in the light of the assessment.
Similarities in the strategies encompassed in formative assessment across the eight countries included: establishing learning goals and tracking of individual students' progress towards these goals; ensuring student understanding (rather than just skills or knowledge) is adequately assessed; providing feedback that influences subsequent teaching; and the active involvement of students in the learning process. But the emphasis given to each of these, and the additional strategies which were included under the formative assessment umbrella, varied considerably. For example, in the Danish case study schools there was greater emphasis on developing self-confidence and verbal competence. In New Zealand, formative assessment was linked to the Maori Mainstream Programme, within which the importance of culture is emphasized through group work, co-construction of knowledge and peer solidarity (Bishop and Glynn, 1999). Several of the case studies included use of summative assessment data as part of their formative strategies, even where the use has been for whole-school improvement rather than individual learning, which arguably falls outside the ARG definition given in the Introduction.
The nature and role of feedback
At one of the two schools in Queensland (Sebba and Maxwell, 2005), students were working on their individual assignments in the library, using books, articles and the internet to research globalization in the context of a company they had chosen, for example Nike or McDonald's. The teacher individually saw about half of the 25 students in the group to review their progress. She asked challenging open questions to encourage them to extend and deepen their investigations and gave specific feedback on what they needed to target for improvement.
In each of the two schools, I analysed more than 20 students' files from across year groups and ability ranges (as identified by the teachers) in order to check if and how grades were used. I collected examples of comment marking and looked for evidence that students had acted upon the comments. One of the schools used no grades at all and the comment marking was characterized as very specific and almost always included targets for improvement. What distinguished it in particular from the comment marking I have experienced in England was that even positive comments were elaborated to ensure that students were left in no doubt about why a piece of work was so good. For example:

You girls have done a fantastic job!! Not only is your information accurate and well-researched, but you have also successfully completed the extension tasks! Try and keep an eye on the difference between 'endangered' and 'extinct' and watch your spelling. But please keep up this brilliant effort! You have all gone above and beyond in this activity!! Well done!!
The comments, in particular for less high-achieving students, were additionally often characterized by empathy and humour:

L, an excellent effort, I would like to join you in your mission to Mars.

At this school, 11-year-old students said that grades or marks were never given. They felt that this helped them work to their own standard and not worry about comparing themselves to other people. They all claimed to read and act upon the comments and suggested that the teacher was always willing to discuss them. In both schools, students claimed to read and act upon comments written on work, and there was some evidence of this, though there were no specific consistent strategies as observed in a few schools elsewhere, such as keeping a list of comments in the front of a book and expecting students to indicate when and where (indicated by a page reference) these have been acted upon. However, teachers and students identified lessons in which time was allocated to making revisions in response to comments given.
In Denmark (Townshend et al., 2005), one case study school put great emphasis on verbal competencies. Goal-setting and oral feedback were strong features of the formative assessment work. Oral assessment was preferred because it was quick and flexible and allowed an immediate response from the student, enabling misunderstandings to be clarified rapidly. Individual student interviews took place several times a year in order to assess progress and set new goals focusing on subject outcomes, work attitudes and social skills. As in the Queensland schools, the lessons often incorporated periods of reflective feedback from students to teachers which resulted in adjustments to teaching.
Students used logbooks to record their reflections and these were used for teacher and student to enter into a dialogue. Forster and Masters (1996-2001) provide further examples of this in their materials developed in Australia on developmental assessment.
Effective feedback seems to be characterized by specific comments that focus on students' understanding rather than on quality of presentation or behaviour. Oral feedback may allow for greater exploration of understanding and be more immediate, but written comments allow teachers greater flexibility to reflect on students' work and allocate more time to this process, though the dialogue may be delayed and stilted. In contexts with a strong oral tradition, such as the Danish case study school, the balance was more in favour of oral than written feedback. Effective oral questioning and feedback seem to require the teacher to be confident and competent in that subject area and to have the flexibility to 'try another way' in their questioning and feedback strategies in order to ensure that the message has been understood. A much fuller account of this issue can be found in Black et al. (2003).
Self- and peer-assessment
In the PROTIC programme in Quebec (Sliwka et al., 2005), teaching is organized around interdisciplinary projects with a strong emphasis on collaborative group exploration. All projects in the programme make extensive use of ICT for research, reporting and assessing. At the start of each project, students identify their individual learning targets and at regular intervals they are given time for writing reflections on their own learning, their team learning and the achievement of their targets. These written reports are the basis for future target setting and choices. Peer assessment is used to give feedback on each other's regular presentations of work and on teamwork skills. Students reported needing to adjust to the level of autonomy expected compared to their previous schools: 'You understand that you are responsible, you are in charge' (Sliwka et al., 2005: 102).
In Saskatchewan (Sliwka et al., 2005), one school uses electronic learning
portfolios with younger children to record their own learning. They keep exemplary
pieces of work, scan in art work and are taught how to assess their own
work. The same researchers noted that in a school in western Newfoundland
portfolios are similarly used by students to record their best work alongside
reflective journals. In pairs, they use criteria provided by the teacher to give
each other feedback on ways of improving their quality of writing in English.
These practices are used formatively to support continuous learning but may
also contribute to summative purposes of assessment.
Teachers and students in Queensland have had to adapt to the development
of an outcomes-based assessment system in which there is no testing. Teachers
are trying to ensure that the students are aware of and understand the outcome-based
statements and can assess themselves against the standards. When interviewed,
the students described reflection time as a feature of most lessons. This
involved the use of their learning journals in which questions to be addressed
Policy and Practice in Assessment for Learning
included 'what do you understand about ...?' They gave examples of marking
each other's work and giving each other feedback on written work. Self- and
peer-assessment was a strong feature of the lessons observed in Queensland.
Every week there was an allocated time for Year 8 and some Year 9 students to
reflect on their learning, working with others and their experiences, and to
write comments about it in their learning journals. Teachers were allowed to
read these but not allowed to write in them.
In another lesson at this school, at the end of each activity, the students were
invited to assess it on difficulty; student feedback determined whether the
teacher moved on to the next activity or gave a further explanation of the previous
one. Students and other staff interviewed confirmed that reflections of
this type were regularly built into lessons.
In Queensland, peer assessment was less well developed than self-assessment
in the lessons observed. This may reflect the additional demands on teachers
and students of peer assessment and the skills that we have noted elsewhere
needing to be taught (for details of the skills, see Chapters 1 and 5; also Kutnick
et al., 2005). Feedback between students tended to be at the level of whether an
outcome was correct or not, rather than indicating how to improve it. Students
were encouraged to reflect on how effectively they had worked as a group, as
well as how well they had completed the task. One student had entered the following
comment into her journal:
Yesterday my group and I made different shapes of a certain size out of newspaper.
I got frustrated when nobody would listen to me. But we finished a square and two
rectangles. Listen. None of our group members listened to each other. We all had
ideas but wouldn't explain them. Then it would all end up in a mess.
In both schools in Queensland there was a strong ethos of developing lifelong
learners rather than only getting students through school together with the
Senior Certificate they received on leaving. This was reflected in a strong focus on
learning to learn and on students taking responsibility for their own actions. Self-
and peer assessment were seen by students and teachers to be contributing to this
but it was acknowledged that they required skills that had to be taught.
The relationship between student grouping strategies
and assessment for learning
Assessment for learning encourages students to develop greater responsibility for
their own learning but also to regard their peers as a potential resource for learning
and thus to become less dependent on the teacher. Assessment for learning
requires effective group work, in particular in the area of peer assessment. Well-developed
peer assessment is very demanding on students' social and communication
skills, in particular listening, turn-taking, clear and concise verbal and
written expression, empathy and sensitivity. There is substantial evidence that
group work skills need to be taught (for example, see Kutnick et al., 2005 for a
review) and that highly effective group work takes a long time to develop. Teaching
students to self-reflect was observed as follows in a school in Queensland:
Sixteen Year 9 pupils in a PSHE [personal and social health education] lesson
worked in self-selected groups of four. The school had a 'buddying' scheme for
incoming Year 7 pupils whereby they had an identified 'buddy' from Year 10 to help
settle them into school. Most pupils in this Year 9 class had applied to be 'buddies'
and were to be interviewed to see if they are suitable for this role. The teacher asked
the groups to identify what characteristics 'buddies' need. She gave them time
to discuss this and draw up a list. She invited the groups to feed back. She
then invited them to spend 10 minutes working out what questions the interviewers
would ask them to draw out whether they had these skills and how they would
answer these questions. At the end of the third activity and feedback she asked them
to assess how they worked in their groups and invited feedback.
Despite the challenges of group work, students and teachers alike in the two
Queensland schools reported beneficial outcomes of using these strategies. In
interviews in one school the students claimed that they worked in groups for
about half the lessons and that this helped them to develop their understanding
through testing out their ideas, examples and explanations on others. They suggested
that the disadvantages of working in groups included having '... to
work with people you don't like, who hold you back or mess about'. Overall,
they felt that the advantages outweighed the disadvantages and favoured the
mixed-ability groups that they usually experienced:
I reckon it's important to have people working together at different levels, then the
people at higher levels can teach the people at lower levels in their own way. In the
real world you work with different people, you can't choose who you work
with and working with other people you don't know helps ... (Sebba and
Maxwell, 2005: 202-3)
In a school in Denmark (Townshend et al., 2005), 'core groups' provided opportunities
for reflection on goals, effort and outcomes. The students gave each
other oral feedback, recorded their views in their logbooks and evaluated one
another's academic achievements and presentational skills. This was done for
each project. School leaders reported that students were
more competent at reflecting on their own learning, identifying their targets
and engaging in social interaction.
Teacher development
The definition of formative assessment in the OECD study refers to adjusting
teaching in response to feedback about learning and in this sense, formative
assessment and teacher development are inextricably linked, as emphasized in
Chapter 2. Many of the case studies refer to evaluation of teaching. For
example, in one school in Denmark Townshend et aI. (2005) noted that teachers
evaluated subject courses in teams as part of their departmental meetings. This
enabled them to compare the same student's progress in different subjects. The
focus of these evaluations was decided by prior agreement between teacher and
student. In this way, teaching as well as students' progress were assessed.
In Quebec, the university teacher educators acknowledged that in the
PROTIC programme (Sliwka et al., 2005) teachers have to adapt to a shift in
control and responsibility for learning from teacher to student, as noted in
earlier chapters. They are also required to recognize that they are not the only
source of knowledge in the classroom. There is evidence that the PROTIC
approach has had an impact on the teaching approaches used by other teachers,
such as those in Saskatchewan who reported that whereas previously they
had worked in complete isolation, they are now much more interested in
working together and sharing resources and have developed a clear focus on
how students learn since introducing formative assessment. This professional
development issue is developed further in Chapter 2.
Teachers at one school in Queensland (Sebba and Maxwell, 2005) shared
pieces of work and discussed comments they had made on them as well as the
work itself. They reported that this challenged their thinking and developed
their practice. Heads of department saw this as professional behaviour for mod-
eration purposes rather than monitoring of marking for accountability pur-
poses. It was seen as relatively easy to do in a small school in which
departments are not isolated.
In the case study school in Denmark (Townshend et al., 2005) teacher development
is supported through a centre for teaching innovation based in Copenhagen,
established specifically to develop innovatory practices and share these
with schools across Denmark. They plan and provide in-service courses for
teachers, support school improvement in other schools and publish materials in
accessible professional journals. Teachers are challenged by self-evaluation in
that they are concerned they may be insufficiently 'objective' but there is evidence
of a continuing drive for more secure and effective teaching approaches.
School leaders in the case study schools acknowledged the cultural change
required to implement formative assessment strategies effectively.
School improvement contextual factors
The role of leadership
It was a feature of a number of the schools in the OECD studies that the head
teacher or principal had recently changed, and that the school had been restructured
and significant new whole-school initiatives introduced. New managers
often provided the impetus for change or were appointed to provide this, but
frequent changes to senior management teams can be a threat to longer-term
sustainability of new initiatives, and assessment for learning strategies are not
exempt from this. There was evidence in one of the schools in Queensland that
the changes brought about through assessment for learning, including student
self-reflection, group work and comment marking, had become embedded in
the infrastructure of the school and could thereby 'survive' some degree of
senior management change.
Developing schools as learning organizations
The schools where significant progress had been made were characterized by
an ongoing commitment to further development. They did not express views
suggesting they thought they had reached their targets and could reduce their
energies. Consistent with research on developing schools as learning communities
(for example, McMahon et al., 2004), these schools recognized the importance
of engaging with the wider community beyond teachers, with many, for
example, having well-developed mechanisms for ongoing dialogue with
parents about formative assessment. The emphasis on teachers working in
teams and helping to develop one another seems to have been another feature
of these schools, which is an issue considered in Chapter 2.
Students as agents of change
There was some evidence of students' awareness that their school was different
to others and of linking this to aspects of formative assessment. For example, in
one school in Queensland students reported that teaching strategies compared
very favourably to those used in other schools attended by their friends, suggesting
that other schools relied more heavily on worksheets and students
received less full explanations from teachers. Students in one of the Denmark
schools noted that they enjoyed better relationships with teachers and that
instead of just 'getting grades', they were engaged in a process with their teachers
which enabled them to get to know them better and to discuss expectations.
There was, however, little evidence from the OECD case studies of well-developed
examples of students acting as change agents in their schools in the
realm of formative assessment. Fielding (2001) has proposed levels of engagement
for students in schools that enable them to become 'true' contributors to
or even leaders of change. For example, students might be expected to challenge
the school on why peer assessment opportunities provided in one subject
were not created in other subjects. This would seem to be a potential next development
for the schools.
The impact and relevance of policy
Australia has a national curriculum framework based on agreed national goals
stated as providing young Australians with the knowledge, skills, attitudes and
values relevant to social, cultural and economic needs in local, national and international
settings. This includes a commitment to an outcomes-based approach
across eight key learning areas and an emphasis on developing lifelong learners.
The Queensland Studies Authority published principles (QSA, 2005) that emphasize
that assessing students is an integral part of the teaching and learning
process and that opportunities should be provided for students to take responsibility
for their own learning and self-monitoring. The syllabus documents recommend
that assessment be continuous and ongoing and be integrated into the
learning cycle - that is, provide the basis for planning, monitoring student
progress, providing feedback on teaching and setting new learning targets. One
of the principles developed by the Assessment Reform Group (ARG, 2002a), and
presented in the Introduction, relates to the need for assessment for learning to be
part of effective planning for teaching and learning.
There is no external testing or examining in secondary schools
in Queensland. Reporting in Years 1-10 is currently a school responsibility and
is unmoderated. School-based assessments for the Senior Certificate (Year 12)
are currently moderated by subject-based panels of expert teachers, providing
advice to schools on the quality of their assessment and judgments based on
sample portfolios.
There are important contextual policy factors that seem likely to support the
practices observed in the two Queensland schools. Perhaps the national curriculum
framework, less prescriptive than that in England, is a useful contextual
factor enabling a strong focus on formative assessment by reducing the
necessity for teachers to determine what to teach. The lack of external tests and
examinations passed without comment in the teacher interviews in the Queensland
schools, yet as a systematic review of research on this has shown (Harlen
and Deakin Crick, 2003), high-stakes testing and publication of results are associated
with teachers adopting a teaching style that favours transmission of
knowledge. This not only reduces the use of teaching approaches consistent
with assessment for learning but is likely to consume energy that
teachers could instead direct into assessment for learning. Finally, the status
ascribed to teacher summative assessment in the Queensland system suggests
that formative assessment is better recognized than in some other systems, for
both its contribution to learning and to summative assessment. Its role as a key
professional skill for teachers, as advocated in the principles in the Introduction,
is recognized in this system.
In Denmark, the 2003 Education Act introduced an outcomes-based curriculum
defining competencies for all students. Schools are required to publish
annually the results of average grades on their websites, though these seemed
not to take account of prior attainment and were therefore regarded by those
interviewed as reflecting intake and not effectiveness. There was no evidence
that this accountability framework was inhibiting the developments in formative
assessment in the schools. At the time the case studies were conducted
(2002-3) the Ministry of Education had just changed the definition of the role of
headteachers, so in one of the case study schools the head had used formative
assessment as part of the change strategy adopted.
Educational policy in Canada is set at province/territory level. At federal
level, monitoring across the provinces takes place but curricular guidelines are
produced in the provinces and territories and these often emphasize learning to
learn skills. In western Newfoundland, the mandate from the Department of
Education and school district that required test (attainment) data to be the basis
for school improvement has influenced the developing focus on analysing
progress and addressing the needs of 'weaker' students, partly through forma-
tive assessment. Initial resistance was followed by a gradual change in culture
and analysing data is now a key focus of staff development activities, closely
linked to evaluation by teachers with individual students. This reflects the
Assessment Reform Group principle (ARG, 2002a) of promoting a commitment
to a shared understanding of the criteria by which students are assessed. The
tension for teachers remains how to reconcile the demands of summative
testing with formative assessment, though this does not seem to be severely
limiting developments in assessment for learning.
Conclusion
Despite considerable differences in cultural contexts across the OECD case
study schools, what teachers do at the classroom level may be surprisingly
similar. For example, the feedback provided to students on their work, the
development of self- and peer-assessment and the implications of formative
assessment for group work have overlapping practices across countries. Perceptions
of these, however, by students, teachers, senior school managers and
teacher educators may differ as a result of the considerable differences in
national policy contexts. The national assessment and curriculum framework,
accountability mechanisms and underlying values reflected in these about the
purposes of education, lifelong learning skills and skills for employability, will
enhance or inhibit to different degrees the teachers' capacity to adopt, implement
and sustain formative assessment practices. Furthermore, as has been the
experience of school improvement in general, schools that are effective at implementing
a specific strategy, in this case formative assessment, are often those
which can redirect the requirements and additional resources made available
through national policies to support their established plans.
Note
1 I would like to acknowledge the extensive writing and editing that Janet
Looney of OECD contributed to the report.
Assessment for Learning: A Compelling
Conceptualization
John Gardner
At a seminar in 1998, hosted by the Nuffield Foundation at their London headquarters,
the Assessment Reform Group launched the Black and Wiliam review
pamphlet Inside the Black Box. The review itself, and the pamphlet, immediately
attracted critical acclaim and have continued to enjoy significant impact on
assessment thinking throughout the UK and further afield to the present day.
However, one moment in the event sticks out clearly in my memory. After the
main presentation, a senior educational policy maker stood up and declared
that he had heard it all before; we had nothing new to offer. Indicating, with a
glance at his watch, that he had commitments elsewhere, he promptly left the
seminar before the discussion proper got underway. My immediate urge was to
rush after him and say 'Yes, you are absolutely right! But it seems to us that,
powerful as it might be, formative assessment is actually off the schools' and
policy-makers' radar! Surely we need to do something quite urgently if we are
to reap the benefits we know are there?' I resisted the urge and instead a year
later, at the same venue and with the same sponsors, we injected the urgency
we all felt was needed. We launched the pamphlet Assessment for Learning:
Beyond the Black Box. This pamphlet deliberately and directly challenged official
complacency and inertia.
Six years on, the Assessment Reform Group can now record an impressive
list of dissemination successes and official endorsements of assessment for
learning from, for example, the Scottish and Welsh governments, the
curriculum and assessment agencies of England, Scotland, Wales and
Northern Ireland, and from overseas jurisdictions as diverse as Hong Kong
and the Canadian province of Alberta. However, in contrast to the situation in
Scotland, Wales and Northern Ireland, the policy agenda in England remains
somewhat hamstrung, with an accountability focus driving assessment policy
and specifically with schools being evaluated on the basis of the performance
of their students on external assessments. The ensuing and controversial
'league tables', which purport to indicate the relative quality of education in
the schools concerned, arguably increase the emphasis on 'teaching to the test'
as schools focus on raising their students' performance in external tests and
assessments. There is evidence that the richness of the delivered curriculum
suffers and that the pedagogic techniques associated with assessment for
learning are neglected.
Paradoxically, assessment for learning's central message, prompted by the
research review of Black and Wiliam (1998a) and disseminated vigorously by
the Assessment Reform Group, is that overall standards and individual performance
may be improved by actually emphasizing formative assessment
techniques such as student self-assessment, negotiation of learning goals and
feedback to identify next steps. This message is now squarely on the 'radar' of
English schools as it continues to generate interest at the grassroots level,
attracting official endorsement in major areas such as the Key Stage 3 national
curriculum and in the professional development publications of the Qualifications
and Curriculum Authority (QCA, 2004).
Much progress is therefore being made, but let me return for a moment to the
observations made by our disappointed seminar guest above. I readily concede
that the principles and processes of assessment for learning are not novel in any
real sense; indeed they have a fairly lengthy pedigree in curriculum and assessment
developments in the UK. I could reflect on Harry Black's work with teachers
in the early 1980s (Black, 1986) or I could cite the work by Harlen that led to
the publication of professional development materials under the title Match and
Mismatch (Harlen et al., 1977), to illustrate the point. Such sources would be in
keeping with the book's primary focus on schools but I will illustrate the
breadth of recognition of the principles we espouse with an example from post-compulsory
(vocational) education. The quotation that follows could conceivably
have appeared at any time in the last seven years since the publication of
Inside the Black Box (Black and Wiliam, 1998b) and the subsequent Assessment
Reform Group outputs: Assessment for Learning: Beyond the Black Box (ARG,
1999); Assessment for Learning: 10 Principles (ARG, 2002a) and Testing, Motivation
and Learning (ARG, 2002b).
However, the quotation I reproduce below was actually written in 1986 by
Pring as part of an analysis of developments in vocational curricula, initially
sponsored by the 1979 publication of the Department of Education and
Science's Further Education Unit's A Basis for Choice. He argued that a number
of implications for assessment had begun to emerge in the wake of the various
initiatives in post-compulsory qualifications and summarized them as follows:
First, what had to be assessed was different. A curriculum that stresses personal
development, social awareness, cooperative learning, problem solving, is seeking to
assess different qualities from those in traditional forms of examination.
Secondly, the purpose of assessment was different ... the main purpose of
assessment was the diagnosis of learning needs with a view to promoting the
process of learning. It is difficult to provide well-informed guidance, and consequent
negotiation of further learning experiences, without some assessment of
what students know or can do. Therefore, it was recommended that the assessment
should be part of a continuous, formative profile of the experiences and
achievements of the student. Furthermore, it was envisaged that this profile would
be the basis of regular teacher/student discussion and guidance of educational
progress. ... The radical difference lies not only in the content of what is taught but
also in the processes of learning and thus the demands upon assessment. In its
Resources Sheet ... the Joint Board [City and Guilds of London Institute and the
Business and Technician Education Council] says:
'If the individual student is to be enabled to make the most of his/her programme,
the quality of the assessment system and its link with supportive guidance will be
critical. Most of the assessing will be formative; that is, a regular feedback on performance
to the students from all those involved ... '
Assessment is thus tied to guidance, negotiation, and the assumption of responsibility
for one's own learning. (Pring, 1986: 13-14, emphases in original)
There are many such examples, over time, of the acceptance that the classroom
assessment techniques comprising assessment for learning are broadly 'good
things' to do. However, the specific intention of this book has been to ground
this 'goodness' in a credible argument that draws its authority and explanatory
power from sound empirical and theoretical contexts. The central arguments
have emerged in various ways throughout the chapters, using research evidence
and theory to explain and support the points made. We attempted
to address the specific education-related aspects of assessment for learning but
clearly there are many more contextual issues that have a bearing on practice,
policy and indeed perception. These include the dominance in some quarters of
summative assessment and the use of data from student assessments for
accountability purposes. The various educational and contextual key issues
may be categorized as follows:
Classroom pedagogy;
The essence of assessment for learning;
Motivation of learners;
A theory of assessment for learning;
Assessment myths, misunderstandings and tensions;
The complexity of influencing policy and policy makers.
Classroom pedagogy
In Chapters 1 and 2, Black and Wiliam, and James and Pedder, respectively,
relate insights gained from major research projects into the practice of assessment
for learning in the classroom. Chapter 1 offered examples of
techniques for feedback, for self-assessment and classroom questioning which
were developed by the teachers, but it particularly struck a chord in illustrating
how '... applying research in practice is much more than a simple process of
translating the findings into the classroom'. In true developmental fashion the
application in practice led directly to more research insights, especially in the
context of teachers' professional development. In Chapter 2, James and Pedder
took up the professional development baton by conceptualizing it in terms of
the ten principles for assessment for learning, with a specific focus on the principle
that it should be regarded as a key professional skill for teachers.
However, they issue stern warnings that radical changes are needed in teaching
and learning roles, that teachers need to learn new practices and that the
various changes need to be encouraged by a supportive culture of continuous
professional development. Their warnings continue: teachers' learning is not
straightforward and there are serious dangers of much of the assessment for
learning gains translating to superficial practice if teachers do not engage
actively with the ideas and practices, and if the environments in which they
work do not actively encourage inquiry-based modes of professional development
in the classroom. To paraphrase one of their messages, a true assessment
for learning context for teachers is one in which they take responsibility for all
aspects of their professional development, giving new meaning to the old
expression 'self-taught'.
The essence of assessment for learning
Almost every chapter in this book addresses at least some of the core issues of
assessment for learning, including practice and theory and what some might
term its antithesis - assessment of learning or summative assessment. However,
it is the policy chapter by Sebba (Chapter 11) that particularly focuses on the
commonality of practice in formative assessment across several national and
cultural boundaries. Key ingredients of assessment for learning including peer
and self-assessment, feedback to support learning and effective questioning are
all in evidence, but here too Sebba issues several warnings.
These include an echo of Black and Wiliam's and James and Pedder's identification
of the crucial need to ensure appropriate teachers' professional development.
In terms of the students themselves, she also identifies the need to
teach peer (specifically group) assessment skills in areas such as verbal expression,
sensitivity, turn-taking and listening. Sebba's chapter demonstrates a
welcome harmony in aspirations and understanding relating to formative
assessment across a variety of cultures and organizational contexts.
Motivation of learners
A theme that plays out through every successful instance of assessment for
learning is the motivation to learn that it generates among students. It is
arguably uncontentious that students are motivated to learn if they participate
in developing their learning activities, if they know how their work will be
assessed and if they are involved in assessing it with their peers. Perhaps it is
also unnecessary to point out that students' motivation is enhanced by the
ability to engage readily with their teacher, to receive feedback that supports
the next steps in their learning or by being involved in drawing up the criteria
against which they will be assessed.
Why then are these classroom processes not more commonplace? Harlen's
Chapter 4 does not seek to answer this question but she focuses research
evidence on the types of circumstances in which assessment has deleterious
effects on students' motivation and provides the Assessment Reform Group's
conclusions on how the worst effects can be avoided and the motivation of
learners enhanced.
A theory of assessment for learning
A central aim of the book has been to explore a theoretical understanding of
assessment for learning and James's Chapter 3 provides an accessible foray into
the main learning theories from which a putative theory of formative assessment
might spring. This daunting task is then taken up by Black and Wiliam in
Chapter 5. Grounding their tentative theory on the basis of the experiences of
the KMOFAP project, and supported by Engeström's Activity System theory,
they make no bones about the complexity of the task. That said, the four-component
model they expound in this chapter offers an approach to theorizing
assessment for learning. Black and Wiliam are, however, the first to concede
that much more needs to be done in terms of understanding practice before a
more comprehensive theory can be achieved.
Assessment myths, misunderstandings and tensions
I see the offerings in this category as addressing the contextual factors that impinge on assessment for learning and its practices. First there is the relationship between assessment for learning and assessment of learning, or put another way, between formative and summative assessment. Harlen's Chapter 6 is about purposes and the manner in which the purpose ordained for any specific assessment activity will ultimately distinguish it as serving learning (assessment for learning) or as providing a measure of learning (summative assessment). This leads to a variety of tensions, not least about whether one assessment can provide evidence for both of these purposes. The optimistic view, that perhaps it can, is unpicked in some detail by Harlen, who concludes that there is an asymmetrical relationship between evidence gathered for summative and formative purposes, and that this curtails the opportunity for dual usage.
Purposes are also key determinants in the arguments for reliability and validity and this link leads to these underpinning concepts being examined in Chapters 7 (Black and Wiliam) and 8 (Stobart) respectively. Black and Wiliam's chapter provides a grounding in reliability theory, illustrated by examples, which debunks the myth that external summative testing is de facto reliable. If there exists criticism that assessment for learning is inherently unreliable because it involves teachers with their subjective biases, for example, then the antithesis, that summative assessment is reliable because it is more 'objective', is seriously undermined. Stobart's message is equally unequivocal. He draws out the purpose of the assessment as the main driver in determining its validity. Formative assessment is by definition designed to support learning. To be considered valid, then, it must lead to further learning. It is as simple as that. Or is it?
Stobart's chapter is liberally sprinkled with caveats and cautions. The 'upstream' factors of national culture and social context can create circumstances in which some of the key assessment for learning features might actually threaten the learning support they are otherwise designed to deliver. For example, in cultures where it is the norm, individual feedback may enhance
learning, while in others where task-related feedback is expected it may impact negatively on the learners. Peer assessment may simply be unacceptable in some cultures. Where high-stakes transitions exist (for example, entry to university) comment-only feedback may be seriously contentious and may struggle to achieve its aim of supporting anxious students (or their parents!) in the next steps in their learning. Nevertheless, as a first approximation, the basic tenet of successfully supporting learning remains the main validity check for assessment for learning.
The complexity of influencing policy-makers
Herein lies the rub. It often matters little if academic researchers 'know what's right' since the resources and other support, which they or practitioners might need, may not be forthcoming if the prevailing policy is blind to the issues, or worse, understands them and deliberately ignores them. Even if the research is irrefutable, Rist warns that:

We are well past the time when it is possible to argue that good research will, because it is good, influence the policy process. That kind of linear relationship of research to action is not a viable way to think about how knowledge can inform decision making. The relation is both more subtle and more tenuous. (2000: 1002)
It certainly helps if the professionals find ways of doing what they think is 'right' anyway and in the UK a variety of bottom-up pressures from grassroots practitioners, not least through the activities of the Assessment Reform Group and our many collaborators, has brought considerable success in influencing policy development. But the process is exceedingly complex.
Daugherty and Ecclestone's Chapter 9 spells out the complexity underlying the successes of assessment for learning to date, with theory-based explanation where appropriate and with factual detail in terms of the four countries of the UK. In essence they argue that the process of change is slow but once it grips, governments can enact quite radical policies. Wiliam's Chapter 10 paints an entirely different picture of the assessment agenda in the USA. Here change is more or less limited to refinements of long-established variations of summative assessment, much of it geared to high-stakes selection. Assessment for learning is barely on the horizon, queued behind other learning-oriented activities such as portfolio assessment. Where formative assessment does appear, it is more or less a collation of frequent or continuous assessments (for example, in a portfolio) that constitute a form of summative assessment, albeit perhaps in a more valid manner than an end-of-year test.
Assessment for learning: the concept
Any book covering the practice, theory and policy relating to a given educational concept might conceivably claim to provide a comprehensive analysis of
that concept. We do not make such a claim for this book on assessment for learning because the extent of existing knowledge and understanding of such a complex process and set of techniques is still in its early stages. We might claim, however, to have assembled an authoritative account of what is known today, however inadequate the extent of this knowledge and understanding might be. Drawing as it does on the work of many researchers and practitioners, as well as our own, this is not an unreasonable claim. We will leave this for others to judge. What we can say categorically about assessment for learning, however, is that it is more often than not a fundamental element of any successful learning context.
A deep appreciation of this fact was brought home to me very clearly in a recent presentation I attended on assessment for learning. The presenters were two teachers, Margo Aksalnik and Bev Hill, from a Rankin Inlet school in the Nunavut Territory, a new territory established in northern Canada in 1999. The main illustration in the talk was of the national symbol of the Inuit people, the Inukshuk. An Inukshuk is a person-like construction of medium-sized rocks, which has been used by the Inuit people for millennia as a means of guiding wayfarers in the treeless and landmark-less expanses of northern Canada. Their various uses include giving directions to good fishing waters or simply reassuring the wayfarer that others have passed the same way, and that they are on the right path. A reproduction of the illustrative model used by the two teachers is presented in Figure 12.1.
Figure 12.1 The Inukshuk model of a school with a culture of success
As can be seen, they placed assessment for learning squarely in the set of main ingredients designed to create a school with a culture of success. The other elements included teachers, their planning of the learning activities, their teaching and assessment strategies, their capacity to reflect about their own and their students' learning, and the resources they bring to the learning environment. Outside of the classroom, additional elements include professional development and team support for the teachers while outside of the school, the positive involvement of parents adds to the recipe for success.
It is arguable that other aspects of a successful school could be found to populate the Inukshuk's frame; successful sporting programmes or a students' council for example. No doubt they and other features of successful schools are also somewhere within the model, but the community-based context in which the two teachers introduced assessment for learning to their school dispelled any notion that its inclusion in the Inukshuk was either whimsical or contrived for the event (a seminar on assessment for learning on Vancouver Island). They recounted that:
rlXountcd that:
The Elders mel 10 consider tlrese new approaches and had the conapt of assessment
for learning e.tplailled to them. They then came lip with a word to identify tire
dynamic - the rCSOllanCe - behllCen teaching, learning and assessment. (AkSlllnik
and Hill, 2(04)
This new word, in the Inuktitut language of the Inuit, is written in Roman form as Ilitaunikuliriniq (or in sound form: nee-qu-lee-ree-nee-kay). Most non-Inuit educationalists will have difficulty articulating this word but they will not fail to empathize with the assessment for learning aspirations of this small community in Canada's frozen north.
Conclusion
Throughout all of the text in this book, the aim has been to argue the case for the importance of assessment as a means of enhancing learning. The argument has been backed up by reasoned explanation, empirical evidence and theoretical analysis. Borrowing Fullan et al.'s phrase, we offer what we hope is a 'compelling conceptualization' (2004: 43) of a type of assessment that is specifically designed to serve learning - and which impacts positively on three key areas of education: classroom pedagogy, the quality of students' learning experiences and the insights that underpin assessment policy formation.
References
AAIA (2005a) A Critique of the Assessment for Learning Materials in 'Excellence and Enjoyment: Learning and Teaching in the Primary Years'. Birmingham: Association for Achievement and Improvement through Assessment.
AAIA (2005b) Managing Assessment for Learning. Birmingham: Association for Achievement and Improvement through Assessment.
ACCAC (2004) Review of the School Curriculum and Assessment Arrangements 5-16: A report to the Welsh Assembly Government. Cardiff: Qualifications, Curriculum and Assessment Authority for Wales.
ACT (2004) Available at http://www.act.org/news/data/04/data.html
AERA/APA/NCME (1985) Standards for Educational and Psychological Testing. American Educational Research Association/American Psychological Association/National Council on Measurement in Education. Washington, DC: American Psychological Association.
Aksalnik, M. and Hill, B. (2004) Oral presentation to 'Making Connections', Assessment for Learning Symposium, Vancouver Island: Classroom Connections, Courtenay, July.
Alexander, R. (2000) Culture and Pedagogy: International comparisons in primary education. Oxford: Blackwell.
Alexander, R. (2004) Towards Dialogic Teaching: Rethinking classroom talk. 2nd edn. Cambridge: Dialogos.
Ames, C. (1984) 'Achievement attributions and self-instructions under competitive and individualistic goal structures'. Journal of Educational Psychology, 76: 478-87.
Ames, C. (1992) 'Classrooms: goals, structures and student motivation'. Journal of Educational Psychology, 84 (3): 261-71.
Ames, C. and Archer, J. (1988) 'Achievement goals in the classroom: students' learning strategies and motivation processes'. Journal of Educational Psychology, 80: 260-67.
Anderson, L. W. and Bourke, S. F. (2001) Assessing Affective Characteristics in the Schools. 2nd edn. Mahwah, NJ: Lawrence Erlbaum.
Angoff, W. H. (ed.) (1971) The College Board Admissions Testing Program: A technical report on research and development activities relating to the Scholastic Aptitude Test and achievement tests. 2nd edn. New York: College Entrance Examination Board.
ARG (1999) Assessment for Learning: Beyond the black box. University of Cambridge: Assessment Reform Group.
ARG (2002a) Assessment for Learning: 10 principles. University of Cambridge: Assessment Reform Group.
ARG (2002b) Testing, Motivation and Learning. University of Cambridge: Assessment Reform Group.
ASF (2004) Working Paper 2 (draft): Summative Assessment by Teachers: Evidence from research and its implications for policy and practice. Assessment Systems for the Future Project (see ARG website: http://www.assessment-reform-group.org).
Askew, M., Brown, M., Rhodes, V., Johnson, D. C. and Wiliam, D. (1997) Effective Teachers of Numeracy: Final report. London: King's College, School of Education.
Askew, S. and Lodge, C. (2000) 'Gifts, ping-pong and loops - linking feedback and learning', in S. Askew (ed.), Feedback for Learning. London: RoutledgeFalmer.
Atkinson, R. C. (2001) The 2001 Robert H. Atwell Distinguished Lecture. Paper presented at the 83rd Annual Meeting of the American Council on Education, Washington, DC. Oakland, CA: University of California.
ATL, NUT and PAT (2004) Assessment for Learning: The future of national curriculum assessment: A way forward. London: Association of Teachers and Lecturers/National Union of Teachers/Professional Association of Teachers.
Ayres, L. P. (1918) 'History and present status of educational measurements', in S. A. Courtis (ed.), The Measurement of Educational Products. The Seventeenth Yearbook of the National Society for the Study of Education. Bloomington, IL: Public School Publishing Company. pp. 9-15.
Baird, J. R. and Northfield, J. R. (1992) Learning from the PEEL Experience. Melbourne: Monash University.
Baker, K. (1993) The Turbulent Years: My life in politics. London: Faber and Faber.
Ball, D. L. and Bass, H. (2000) 'Interweaving content and pedagogy in teaching and learning to teach: knowing and using mathematics', in J. Boaler (ed.), Multiple Perspectives on Teaching and Learning. Westport, CT: Ablex. pp. 83-104.
Ball, S. (1990) Politics and Policy-Making in Education: Explorations in policy sociology. London: Routledge.
Ball, S. (1994) Education Reform: A critical and post-structural approach. Buckingham: Open University Press.
Ball, S. (2000) 'Performativities and fabrications in the education economy: towards the performative society'. Australian Educational Researcher, 27 (2): 1-24.
Bates, R. and Moller, J. (2003) Feedback to a Year 7 Pupil. Unpublished research, Stockport Local Education Authority.
Benmansour, N. (1999) 'Motivational orientations, self-efficacy, anxiety and strategy use in learning high school mathematics in Morocco'. Mediterranean Journal of Educational Studies, 4: 1-15.
Bennis, W. G., Benne, K. D. and Chin, R. (1961) The Planning of Change. London: Holt, Rinehart and Winston.
Berlak, H., Newmann, F. M., Adams, E., Archbald, D. A., Burgess, T., Raven, J. and Romberg, T. A. (1992) Towards a New Science of Educational Testing and Assessment. Albany, NY: State University of New York Press.
Bevan, R. (2004) 'From black boxes to glass boxes: the application of computerised concept-mapping in schools'. Paper presented at the Teaching and Learning Research Programme Annual Conference, Cardiff, November.
Biggs, J. (1996) 'Enhancing teaching through constructive alignment'. Higher Education, 32.
Biggs, J. (1999) 'What the student does: teaching for enhanced learning'. Higher Education Research and Development, 18 (1): 57-75.
Biggs, J. and Tang, C. (1997) 'Assessment by portfolio: constructing learning and designing teaching'. Paper presented at the annual conference of the Higher Education Research and Development Society of Australasia, Adelaide, July.
Binet, A. and Simon, T. (1911) 'La mesure du développement de l'intelligence chez les enfants'. Bulletin de la Société libre pour l'étude psychologique de l'enfant, 70-1.
Bishop, R. and Glynn, T. (1999) Culture Counts: Changing power relations in education. New Zealand: Dunmore Press.
Black, H. (1986) 'Assessment for learning', in D. J. Nuttall (ed.), Assessing Educational Achievement. London: Falmer. pp. 7-18.
Black, P. (1963) Bulletin of the Institute of Physics and the Physical Society, 202-3.
Black, P. (1990) 'APU Science - the past and the future'. School Science Review, 72 (258): 13-28.
Black, P. (1993) 'Formative and summative assessment by teachers'. Studies in Science Education, 21: 49-97.
Black, P. (1995) 'Ideology, evidence and the raising of standards'. Annual Education Lecture. London: King's College.
Black, P. (1997) 'Whatever happened to TGAT?', in C. Cullingford (ed.), Assessment vs. Evaluation. London: Cassell. pp. 24-50.
Black, P. and Wiliam, D. (1998a) 'Assessment and classroom learning'. Assessment in Education, 5: 7-71.
Black, P. and Wiliam, D. (1998b) Inside the Black Box: Raising standards through classroom assessment. London: King's College (see also Phi Delta Kappan, 80: 139-48).
Black, P. and Wiliam, D. (2002) Standards in Public Examinations. London: King's College, School of Education.
Black, P. and Wiliam, D. (2003) 'In praise of educational research: formative assessment'. British Educational Research Journal, 29 (5): 623-37.
Black, P. and Wiliam, D. (2005) 'Changing teaching through formative assessment: research and practice: the King's-Medway-Oxfordshire Formative Assessment Project', in OECD, Formative Assessment: Improving learning in secondary classrooms. Paris: OECD.
Black, P., Harrison, C., Lee, C., Marshall, B. and Wiliam, D. (2002) Working Inside the Black Box: Assessment for learning in the classroom. London: NFER Nelson.
Black, P., Harrison, C., Lee, C., Marshall, B. and Wiliam, D. (2003) Assessment for Learning: Putting it into practice. Buckingham: Open University Press.
Black, P., Harrison, C., Osborne, J. and Duschl, R. (2004) Assessment of Science Learning 14-19. London: Royal Society (www.royalsoc.ac.uk/education).
Bolon, C. (2000) 'School-based standard testing'. Education Policy Analysis Archives, 8 (23) at http://epaa.asu.edu/epaa/v8n23/
Bowe, R., Ball, S. with Gold, A. (1992) Reforming Education and Changing Schools: Case studies in policy sociology. London: Routledge.
Bransford, J. D., Brown, A. L. and Cocking, R. R. (2000) How People Learn: Brain, mind, experience and school. Washington, DC: National Academies Press.
Bredo, E. (1994) 'Reconstructing educational psychology'. Educational Psychologist, 29 (1): 23-45.
Bredo, E. (1997) 'The social construction of learning', in G. D. Phye (ed.), Handbook of Academic Learning: Construction of knowledge. San Diego, CA: Academic Press.
Brigham, C. C. (ed.) (1926) Scholastic Aptitude Tests: A manual for the use of schools. New York, NY: College Entrance Examination Board.
Broadfoot, P. (1986) Profiles and Records of Achievement: A review of issues and practice. London: Holt, Rinehart and Winston.
Broadfoot, P. (1996) Education, Assessment and Society. Buckingham: Open University Press.
Broadfoot, P. (2000) 'Empowerment or performativity? Assessment policy in the late twentieth century', in R. Phillips and J. Furlong (eds), Education, Reform and the State: Twenty-five years of politics, policy and practice. London: RoutledgeFalmer. pp. 136-55.
Broadfoot, P., Osborn, M., Gilly, M. and Bucher, A. (1993) Perceptions of Teaching: Primary school teachers in England and France. London: Cassell.
Brookhart, S. and DeVoge, J. (1999) 'Testing a theory about the role of classroom assessment in student motivation and achievement'. Applied Measurement in Education, 12: 409-25.
Broome, E. C. (1903) A Historical and Critical Discussion of College Admission Requirements. New York, NY: Macmillan.
Brousseau, G. (1984) 'The crucial role of the didactical contract in the analysis and construction of situations in teaching and learning mathematics', in H. G. Steiner (ed.), Theory of Mathematics Education: ICME 5 topic area and miniconference. Bielefeld, Germany: Institut für Didaktik der Mathematik der Universität Bielefeld. pp. 110-19.
Bruner, J. (1996) The Culture of Education. Cambridge, MA: Harvard University Press.
Bryce, T. (1999) 'Could do better? Assessment in Scottish schools', in T. Bryce and W. Humes (eds), Scottish Education. Edinburgh: Edinburgh University Press. pp. 709-20.
Bryce, T. and Humes, W. (1999) Scottish Education. Edinburgh: Edinburgh University Press.
Butler, D. L. and Winne, P. H. (1995) 'Feedback and self-regulated learning: a theoretical synthesis'. Review of Educational Research, 65 (3): 245-81.
Butler, R. (1988) 'Enhancing and undermining intrinsic motivation: the effects of task-involving and ego-involving evaluation on interest and performance'. British Journal of Educational Psychology, 58: 1-14.
Butler, R. (1992) 'What young people want to know when: effects of mastery and ability goals on interest in different kinds of social comparison'. Journal of Personality and Social Psychology, 62: 934-43.
Butler, R. and Neuman, O. (1995) 'Effects of task and ego-achievement goals on help-seeking behaviours and attitudes'. Journal of Educational Psychology, 87 (2): 261-71.
Callaghan, D. (1995) 'The believers: politics, personalities in the making of the 1988 Education Act'. History of Education, 24 (4): 369-85.
Cameron, J. and Pierce, D. P. (1994) 'Reinforcement, reward, and intrinsic motivation: a meta-analysis'. Review of Educational Research, 64 (3): 363-423.
Carless, D. (2005) 'Prospects for the implementation of assessment for learning'. Assessment in Education, 12 (1): 39-54.
Carter, C. R. (1997) 'Assessment: shifting the responsibility'. The Journal of Secondary Gifted Education, 9 (2): 68-75.
Cattell, J. M. (1890) 'Mental tests and measurements'. Mind, 15: 373-81.
Cattell, J. M. and Farrand, L. (1896) 'Physical and mental measurements of the students of Columbia University'. Psychological Review, 3: 618-48.
CCEA (2003) Pathways - Proposals for Curriculum and Assessment at Key Stage 3. Belfast: Council for the Curriculum, Examinations and Assessment.
CCEA (2004) The Revised Northern Ireland Primary Curriculum: Stages 1 and 2. Belfast: Council for the Curriculum, Examinations and Assessment.
Chadwick, E. (1864) 'Statistics of educational results'. The Museum, 3: 479-84.
Chaiklin, S. (2005) 'The zone of proximal development in Vygotsky's analysis of learning and instruction', http://www.education.miami.edu/blantonw/mainsite/componentsfromclmer/Components/ChaiklinTheZoneOfProximalDevelopmentInVygotsky.html.
Chickering, A. W. (1983) 'Grades: one more tilt at the windmill'. American Association for Higher Education Bulletin, 35: 10-13.
Choppin, B. and Orr, L. (1976) Aptitude Testing at 18+. Windsor: NFER Publishing.
Clarke, S. (1998) Targeting Assessment in the Primary School. London: Hodder and Stoughton.
Clarke, S. (2001) Unlocking Formative Assessment. London: Hodder and Stoughton.
Clarke, S. (2005) Formative Assessment in the Secondary Classroom. London: Hodder Murray.
College Board (2004) College-bound Seniors: A profile of SAT program test-takers. New York, NY: College Entrance Examinations Board.
Conant, J. B. (1940) 'Education for a classless society: the Jeffersonian tradition'. The Atlantic, 165 (5): 593-602.
Conner, C. and James, M. (1996) 'The mediating role of LEAs in the interpretation of government assessment policy at school level in England'. Curriculum Journal, 7 (2): 153-66.
Corbett, H. D. and Wilson, B. L. (1991) Testing, Reform and Rebellion. Hillsdale, NJ: Ablex.
Cowie, B. (2004) Commentary on formative assessment. Paper presented at the annual meeting of the National Association for Research in Science Teaching, Vancouver, March.
Cowie, B. and Bell, B. (1999) 'A model of formative assessment in science education'. Assessment in Education, 6 (1).
Crooks, T. J. (1988) 'The impact of classroom evaluation practices on students'. Review of Educational Research, 58: 438-81.
Crooks, T. J. (2001) 'The validity of formative assessments'. Paper presented at the Annual Conference of the British Educational Research Association, Leeds.
Crooks, T. J., Kane, M. T. and Cohen, A. S. (1996) 'Threats to the valid use of assessments'. Assessment in Education, 3 (3): 265-85.
Crossley, M. (1984) 'Strategies for curriculum change and the question of international transfer'. Journal of Curriculum Studies, 16: 75-88.
Cumming, J. and Maxwell, G. S. (2004) 'Assessment in Australian schools: current practice and trends'. Assessment in Education, 11 (1): 89-108.
Dale, R. (1994) 'Applied education politics or political sociology of education: contrasting approaches to the study of recent education reform in England and Wales', in D. Halpin and B. Troyna (eds), Researching Education Policy: Ethical and methodological issues. London: Falmer.
Daugherty, R. (1995) National Curriculum Assessment: A review of policy 1987-1994. London: Falmer.
Daugherty, R. (2000) 'National Curriculum assessment policies in Wales: administrative devolution or indigenous policy development?'. The Welsh Journal of Education, 9 (2): 4-17.
Daugherty, R. (2004) Learning Pathways through Statutory Assessment: Final report of the Daugherty Assessment Review Group. Cardiff: Welsh Assembly Government.
Daugherty, R. and Elfed-Owens, P. (2003) 'A national curriculum for Wales: a case study of policy-making in the era of administrative devolution'. British Journal of Educational Studies, 51 (3): 233-53.
Davidson, J. (2004) Statement in plenary session of the National Assembly for Wales, July. Cardiff: Welsh Assembly Government.
Davies, J. and Brember, I. (1998) 'National curriculum testing and self-esteem in Year 2. The first five years: a cross-sectional study'. Educational Psychology, 18: 365-75.
Davies, J. and Brember, I. (1999) 'Reading and mathematics attainments and self-esteem in Years 2 and 6: an eight year cross-sectional study'. Educational Studies, 25: 145-57.
Deakin Crick, R., Broadfoot, P. and Claxton, G. (2002) Developing ELLI: the Effective Lifelong Learning Inventory in Practice. Bristol: University of Bristol Graduate School of Education.
Deci, E. L. and Ryan, R. M. (1994) 'Promoting self-determined education'. Scandinavian Journal of Educational Research, 38 (1): 3-14.
Deci, E. L., Koestner, R. and Ryan, R. M. (1999) 'A meta-analytic review of experiments examining the effects of extrinsic rewards on intrinsic motivation'. Psychological Bulletin, 125: 627-88.
DES/WO (1988a) Task Group on Assessment and Testing: A report. London: Department of Education and Science and the Welsh Office.
DES/WO (1988b) Task Group on Assessment and Testing: Three supplementary reports. London: Department of Education and Science and the Welsh Office.
DfEE (1997) Education for Excellence. London: Department for Education and Employment.
Dorans, N. J. (1999) Correspondence between ACT and SAT I Scores. Princeton, NJ: Educational Testing Service.
Dorans, N. J., Lyu, C. F., Pommerich, F. and Houston, W. M. (1997) 'Concordance between ACT Assessment and recentered SAT I sum scores'. College and University, 73 (2): 24-34.
Dorr-Bremme, D. W. and Herman, J. L. (1986) Assessing Student Achievement: A profile of classroom practices. Los Angeles, CA: University of California, Center for the Study of Evaluation.
Dorr-Bremme, D. W., Herman, J. L. and Doherty, V. W. (1983) Achievement Testing in American Public Schools: A national perspective. Los Angeles, CA: University of California, Center for the Study of Evaluation.
Duckworth, K., Fielding, G. and Shaughnessy, J. (1986) The Relationship of High School Teachers' Class Testing Practices to Students' Feelings of Efficacy and Efforts to Study. Eugene, OR: University of Oregon.
Dudley, P. (2004) 'Lessons for learning: research lesson study, innovation, transfer and metapedagogy: a design experiment?' Paper presented at the ESRC TLRP Annual Conference, Cardiff, November. Available at http://www.tlrp.org/dspace/retrieve/289/Dudley+full+paper+Nov04+for+conferencev5271004.doc.
Dweck, C. S. (1986) 'Motivational processes affecting learning'. American Psychologist, 41: 1040-48.
Dweck, C. S. (1992) 'The study of goals in psychology'. Psychological Science, 3: 165-7.
Dweck, C. S. (1999) Self-Theories: Their role in motivation, personality and development. Philadelphia: Psychology Press.
Dweck, C. S. and Leggett, E. L. (1988) 'A social-cognitive approach to motivation and personality'. Psychological Review, 95: 256-73.
Earl, L., Fullan, M., Leithwood, K. and Watson, N. (2000) Watching and Learning: Evaluation of the implementation of the national literacy and numeracy strategies: First annual report. London: Department for Education and Employment.
Ecclestone, K. (2002) Autonomy in Post-16 Education. London: RoutledgeFalmer.
Ecclestone, K., Swann, J., Greenwood, M., Vobar, J. and Eldred, J. (2004) Improving Formative Assessment in Vocational Education and Basic Skills. Research project in progress - see www.exeter.ac.uk.
Edwards, A. (2005) 'Let's get beyond community and practice: the many meanings of learning by participating'. The Curriculum Journal, 16 (1): 49-65.
Elliott, E. S. and Dweck, C. S. (1988) 'Goals: an approach to motivation and achievement'. Journal of Personality and Social Psychology, 54: 5-12.
Engeström, Y. (1987) Learning by Expanding: An activity-theoretical approach to developmental research. Helsinki, Finland: Orienta-Konsultit Oy.
Engeström, Y. (1993) 'Developmental studies of work as a testbench of activity theory: the case of primary care in medical education', in S. Chaiklin and J. Lave (eds), Understanding Practice: Perspectives on activity and context. Cambridge, UK: Cambridge University Press. pp. 64-103.
Engeström, Y. (1999) 'Activity theory and individual and social transformation', in Y. Engeström, R. Miettinen and R.-L. Punamäki (eds), Perspectives on Activity Theory. Cambridge: Cambridge University Press.
Entwistle, N. (2005) 'Learning outcomes and ways of thinking across contrasting disciplines and settings in higher education'. The Curriculum Journal, 16 (1): 67-82.
Evans, E. and Engelberg, R. (1988) 'Students' perceptions of school grading'. Journal of Research and Development in Education, 21: 44-54.
Fernandez, C. (2002) 'Learning from Japanese approaches to professional development: the case of lesson study'. Journal of Teacher Education, 53 (5): 393-405.
Fielding, M. (2001) 'Students as radical agents of change'. Journal of Educational Change, 2: 123-41.
Fielding, M., Bragg, S., Craig, J., Cunningham, I., Eraut, M., Gillinson, S., Horne, M., Robinson, C. and Thorp, J. (2005) Factors Influencing the Transfer of Good Practice. London: DfES.
Filer, A. and Pollard, A. (2000) The Social World of Pupil Assessment: Processes and contexts of primary schooling. London: Continuum.
Finlay, I. (2004) 'Evolution or devolution? Distinctive education policies in Scotland'. Paper presented at the Annual Conference of the British Educational Research Association, Manchester.
Flyvbjerg, B. (2001) Making Social Science Matter: Why social inquiry fails and how it can succeed again. Cambridge: Cambridge University Press.
Foos, P. W., Mora, J. J. and Tkacz, S. (1994) 'Student study techniques and the generation effect'. Journal of Educational Psychology, 86: 567-76.
Forster, M. and Masters, G. (1996-2001) Assessment Resource Kit (complete series). Camberwell: Australian Council for Educational Research.
Frederiksen, J. R. and Collins, A. (1989) 'A systems approach to educational testing'. Educational Researcher, 18 (9): 27-32.
Frederiksen, J. R. and White, B. Y. (2004) 'Designing assessment for instruction and accountability: an application of validity theory to assessing scientific inquiry', in M. Wilson (ed.), Towards Coherence between Classroom Assessment and Accountability: 103rd Yearbook of the National Society for the Study of Education, Part II. Chicago: National Society for the Study of Education. pp. 74-104.
Fullan, M., Bertani, A. and Quinn, J. (2004) 'New lessons for districtwide reform'. Educational Leadership, April: 42-6.
Galton, F. (1869) Hereditary Genius: An inquiry into its laws and consequences. London: Macmillan.
Gardner, H. (1989) 'Zero-based arts education: an introduction to Arts PROPEL'. Studies in Art Education: A Journal of Issues and Research, 30 (2): 71-83.
Gardner, H. (1992) 'Assessment in context: the alternative to standardised testing', in B. R. Gifford and M. C. O'Connor (eds), Changing Assessments: Alternative views of aptitude, achievement and instruction. Boston, MA: Kluwer Academic Publishers. pp. 77-117.
Gardner, J. and Cowan, P. (2000) Testing the Test: A study of the reliability and validity of the Northern Ireland transfer procedure test in enabling the selection of pupils for grammar school places. Belfast: Queen's University of Belfast. (See also Assessment in Education, 14 (2): 145-65.)
Gauld, C. F. (1980) 'Subject oriented test construction'. Research in Education, 10: 77-82.
Gibson, J. J. (1979) The Ecological Approach to Visual Perception. London: Houghton Mifflin.
Gipps, C. (1994) Beyond Testing: Towards a theory of educational assessment. London: Falmer.
Gipps, C. and Murphy, P. (1994) A Fair Test? Assessment, achievement and equity. Buckingham: Open University Press.
Gipps, C., McCallum, B. and Hargreaves, E. (2001) What Makes a Good Primary School Teacher? Expert classroom strategies. London: Falmer.
Glassman, M. (2001) 'Dewey and Vygotsky: society, experience and inquiry in educational practice'. Educational Researcher, 30 (4): 3-14.
Glover, P. and Thomas, R. (1999) 'Coming to grips with continuous assessment'. Assessment in Education, 4 (3): 365-80.
Goddard, H. H. (1916) Publication of the Vineland Training School (no. 11). Vineland, NJ: Vineland Training School.
Goldstein, H. (1996) 'Group differences and bias in assessment', in H. Goldstein and T. Lewis (eds), Assessment: Problems, developments and statistical issues. Chichester: John Wiley. pp. 85-93.
Gordon, S. and Reese, M. (1997) 'High stakes testing: worth the price?'. Journal of School Leadership, 7: 345-68.
Graduate Record Examinations Board (2004) Guide to the Use of Scores 2004-2005. Princeton, NJ: Educational Testing Service.
Greeno, J. G. and the Middle-School Mathematics Through Applications Project Group (1998) 'The situativity of knowing, learning and research'. American Psychologist, 53 (1): 5-26.
Greeno, J. G., Pearson, P. D. and Schoenfeld, A. H. (1996) Implications for NAEP of Research on Learning and Cognition: Report of a study by the National Academy of Education. Panel on the NAEP Trial State Assessment, conducted by the Institute for Research on Learning. Stanford, CA: National Academy of Education.
Grossman, P. L. and Stodolsky, S. S. (1994) 'Considerations of content and the circumstances of secondary school teaching'. Review of Research in Education, 4: 179-221.
GTC(E) (2004) Internal and External Assessment: What is the right balance for the future? Advice to the Secretary of State for Education and Skills. London: General Teaching Council for England.
Hacker, D. J., Dunlosky, J. and Graesser, A. C. (1998) Metacognition in Educational Theory and Practice. Mahwah, NJ: Lawrence Erlbaum Associates.
Hall, C. and Harding, A. (2002) 'Level descriptions and teacher assessment in England: towards a community of assessment practice'. Educational Research, 44: 1-15.
Hall, K., Collins, J., Benjamin, S., Nind, M. and Sheehy, K. (2004) 'SATurated models of pupildom: assessment and inclusion/exclusion'. British Educational Research Journal, 30 (6): 801-18.
Hallam, S., Kirton, A., Peffers, J., Robertson, P. and Stobart, G. (2004) Evaluation of Project 1 of the Assessment is for Learning Development Programme: Support for professional practice in formative assessment. Edinburgh: Scottish Executive.
Hargreaves, A. (1995) Curriculum and Assessment Reform. Buckingham: Open University Press.
Hargreaves, D. (1999) 'The knowledge creating school'. British Journal of Educational Studies, 47: 122-44.
Hargreaves, D. (2005) About Learning: Report of the Learning Working Group. London: Demos.
Harlen, W. (1998) 'Classroom assessment: a dimension of purposes and procedures'. Paper presented at the annual conference of the New Zealand Association for Research in Education, Dunedin, December.
Harlen, W. (2000) Teaching, Learning and Assessing Science 5-12. 3rd edn. London: Paul Chapman Publishing.
Harlen, W. (2004) 'A systematic review of the reliability and validity of assessment by teachers used for summative purposes', in Research Evidence in Education Library, Issue 1. London: EPPI-Centre, Social Science Research Unit, Institute of Education.
Harlen, W. (2005) 'Teachers' summative practices and assessment for learning: tensions and synergies'. The Curriculum Journal, 16 (2): 207-23.
Harlen, W. and Deakin Crick, R. (2002) 'A systematic review of the impact of summative assessment and tests on students' motivation for learning (EPPI-Centre Review)', in Research Evidence in Education Library, Issue 1. London: EPPI-Centre, Social Science Research Unit, Institute of Education. Available at: http://eppi.ioe.ac.uk/EPPIWeb/home.aspx?page=/reel/review_groups/assessment/review_one.htm.
Harlen, W. and Deakin Crick, R. (2003) 'Testing and motivation for learning'. Assessment in Education, 10 (2): 169-208.
Harlen, W. and James, M. (1997) 'Assessment and learning: differences and relationships between formative and summative assessment'. Assessment in Education, 4 (3): 365-80.
Harlen, W., Darwin, A. and Murphy, M. (1977) Leader's Guide: Match and Mismatch: Raising Questions and Match and Mismatch: Finding Answers. Edinburgh: Oliver and Boyd.
Harlen, W. et al. (1992) 'Assessment and the improvement of education'. The Curriculum Journal, 3 (3): 215-30.
Harris, S., Wallace, G. and Rudduck, J. (1995) '"It's not that I haven't learnt much. It's just that I don't really understand what I'm doing": metacognition and secondary school students'. Research Papers in Education, 10 (2): 253-71.
Hayward, L., Priestley, M. and Young, M. (2004) 'Ruffling the calm of the ocean floor: merging practice, policy and research in assessment in Scotland'. Oxford Review of Education, 30 (3): 397-415.
Henderson, V. L. and Dweck, C. S. (1990) 'Motivation and achievement', in S. S. Feldman and G. R. Elliott (eds), At the Threshold: The developing adolescent. Cambridge, MA: Harvard University Press. pp. 308-29.
Herman, J. L. and Dorr-Bremme, D. W. (1983) 'Uses of testing in the schools: a national profile'. New Directions for Testing and Measurement, 19: 7-17.
Hidi, S. (2000) 'An interest researcher's perspective: the effects of extrinsic and intrinsic factors on motivation', in C. Sansone and J. M. Harackiewicz (eds), Intrinsic and Extrinsic Motivation: The search for optimal motivation and performance. New York: Academic Press.
Hidi, S. and Harackiewicz, J. M. (2000) 'Motivating the academically unmotivated: a critical issue for the 21st century'. Review of Educational Research, 70 (2): 151-79.
HMI (1999) HM Inspectors of Schools Review of Assessment in Pre-School and 5-14. Available online at: http://www.scotland.gov.uk/3-14assessment/rapm-00.htm.
Hodgen, J. and Marshall, B. (2005) 'Assessment for learning in mathematics and English: a comparison'. The Curriculum Journal, 16 (2): 153-76.
Holland, D., Lachicotte Jr, W., Skinner, D. and Cain, C. (1998) Identity and Agency in Cultural Worlds. Cambridge, MA: Harvard University Press.
Hubin, D. R. (1988) The Scholastic Aptitude Test: Its development and introduction, 1900-1948. Unpublished University of Oregon PhD thesis. Retrieved from http://darkwing.uoregon.edu/~hubin on 13.11.04.
Hufton, N. and Elliott, J. (2001) 'Achievement motivation: cross-cultural puzzles and paradoxes'. Paper presented at the British Educational Research Association Conference, Leeds.
Humes, W. (1997) 'Analysing the policy process'. Scottish Educational Review, 29 (1).
Humes, W. (1999) 'Policy-making in Scottish education', in T. Bryce and W. Humes (eds), Scottish Education. Edinburgh: Edinburgh University Press. pp. 74-85.
Hutchinson, C. and Hayward, L. (2005) 'The journey so far: assessment for learning in Scotland'. The Curriculum Journal (forthcoming).
Intercultural Development Research Association (1999) Longitudinal Attrition Rates in Texas Public High Schools, 1985-1986 to 1998-1999. San Antonio, TX: Intercultural Development Research Association.
Isaac, J., Sansone, C. and Smith, J. L. (1999) 'Other people as a source of interest in an activity'. Journal of Experimental Social Psychology, 35: 239-65.
James, M. (2004) 'Assessment of learning, assessment for learning and personalised learning: synergies and tensions'. Paper presented at the Goldman Sachs US/UK Conference on Urban Education, London, December.
James, M. and Brown, S. (2005) 'Grasping the TLRP nettle: preliminary analysis and some enduring issues surrounding the improvement of learning outcomes'. The Curriculum Journal, 16 (1): 7-30.
James, M., Pedder, D. and Swaffield, S. with Conner, C., Frost, D. and MacBeath, J. (2003) 'A servant of two masters: designing research to advance knowledge and practice'. Paper presented at the annual meeting of the American Educational Research Association, Chicago, in the symposium 'Talking, Working and Learning with Teachers and School Leaders: the Cambridge Symposium'. Available at: http://www.learntolearn.ac.uk/home/009_public_papers/conf-papers/000/121-aera master2003.doc.
James, M., Pollard, A., Rees, G. and Taylor, C. (2005) 'Researching learning outcomes: building confidence in our conclusions'. The Curriculum Journal, 16 (1).
Jessup, G. (1991) Outcomes: NVQs and the emerging model of education and training. London: Falmer.
Johnston, C. (1996) Unlocking the Will to Learn. Thousand Oaks, CA: Corwin Press.
Johnston, J. and McClune, W. (2000) Selection Project Sel 5.1: Pupil motivation and attitudes - locus of control, learning disposition and the impact of selection on teaching and learning. Belfast: Department of Education for Northern Ireland.
Jones, L. V. and Olkin, I. (2004) The Nation's Report Card: Evolution and perspectives. Bloomington, IN: Phi Delta Kappa Educational Foundation.
Kane, M. T., Crooks, T. and Cohen, A. (1999) 'Validating measures of performance'. Educational Measurement: Issues and Practice, 18 (2): 5-17.
Katzell, R. A. and Thompson, D. E. (1990) 'Work motivation: theory and practice'. American Psychologist, 45: 144-53.
Kellaghan, T., Madaus, G. and Raczek, A. (1996) The Use of External Examinations to Improve Student Motivation. Washington, DC: AERA.
King, A. (1992) 'Facilitating elaborative learning through guided student-generated questioning'. Educational Psychologist, 27: 111-26.
Kluger, A. N. and DeNisi, A. (1996) 'The effects of feedback interventions on performance: a historical review, a meta-analysis, and a preliminary feedback intervention theory'. Psychological Bulletin, 119: 252-84.
Kohn, A. (1993) Punished by Rewards: The trouble with gold stars, incentive plans, A's, praise, and other bribes. Boston: Houghton Mifflin.
Koretz, D. M. (1998) 'Large-scale portfolio assessments in the US: evidence pertaining to the quality of measurement'. Assessment in Education: Principles, Policy and Practice, 5 (3): 309-34.
Koretz, D. M., Stecher, B. M., Klein, S. P., McCaffrey, D. and Deibert, E. (1994) Can Portfolios Assess Student Performance and Influence Instruction? The 1991-92 Vermont experience. Santa Monica, CA: RAND Corporation.
Kreisberg, S. (1992) Transforming Power: Domination, empowerment and education. New York: State University of New York Press.
Krug, E. A. (1969) The Shaping of the American High School: 1880-1920. Madison, WI: University of Wisconsin Press.
Kutnick, P., Sebba, J., Blatchford, P. and Galton, M. (2005) The Effects of Pupil Grouping: An extended literature review for DfES (submitted).
Lave, J. and Wenger, E. (1991) Situated Learning: Legitimate peripheral participation. Cambridge: Cambridge University Press.
Lawlor, S. (ed.) (1993) The Dearing Debate: Assessment and the national curriculum. London: Centre for Policy Studies.
Leadbetter, C. (2004) Learning about Personalisation. London: DfES/Demos.
Lee, C. (2000) 'Studying changes in the practice of two teachers'. Paper presented at a symposium entitled 'Getting Inside the Black Box: Formative Assessment in Practice', British Educational Research Association 26th Annual Conference, Cardiff University. London: King's College School of Education.
Lemann, N. (1999) The Big Test: The secret history of the American meritocracy. New York, NY: Farrar, Straus and Giroux.
Leonard, M. and Davey, C. (2001) Thoughts on the 11 Plus. Belfast: Save the Children Fund.
Levine, D. O. (1986) The American College and the Culture of Aspiration, 1915-1940. Ithaca, NY: Cornell University Press.
Lindquist, E. F. (ed.) (1951) Educational Measurement. 1st edn. Washington, DC: American Council on Education.
Linn, R. L. (1989) 'Current perspectives and future directions', in R. L. Linn (ed.), Educational Measurement. 3rd edn. London: Collier Macmillan. pp. 1-10.
Linn, R. L. (2000) 'Assessment and accountability'. Educational Researcher, 29 (2): 4-16.
MacBeath, J. and Mortimore, P. (eds) (2001) Improving School Effectiveness. Buckingham: Open University Press.
Madaus, G. F. and Kellaghan, T. (1992) 'Curriculum evaluation and assessment', in P. W. Jackson (ed.), Handbook of Research on Curriculum. New York, NY: Macmillan. pp. 119-54.
Marshall, B. and Hodgen, J. (2005) Formative Assessment in English. Private communication, in preparation for publication.
Marsland, D. and Seaton, N. (1993) The Empire Strikes Back: Creative subversion of the National Curriculum. York: Campaign for Real Education.
Masters, G. and Forster, M. (1996) Progress Maps. Victoria, Australia: Australian Council for Educational Research.
Mathews, J. (2004) Portfolio Assessment. Retrieved on 30.3.05 from http://www.educationnext.org/20043172.html.
Maxwell, G. S. (2004) 'Progressive assessment for learning and certification: some lessons from school-based assessment in Queensland'. Paper presented at the third Conference of the Association of Commonwealth Examination and Assessment Boards, March, Nadi, Fiji.
McDonald, A. S., Newton, P. E., Whetton, C. and Benefield, P. (2001) Aptitude Testing for University Entrance: A literature review. Slough: National Foundation for Educational Research in England and Wales.
McInerney, D. M., Roche, L., McInerney, V. and Marsh, H. (1997) 'Cultural perspectives on school motivation: the relevance and application of goal theory'. American Educational Research Journal, 34: 207-36.
McKeown, P. (2004) 'Exploring education policy in Northern Ireland'. Paper presented at the annual conference of the British Educational Research Association, Manchester, September.
McMahon, A., Thomas, S., Greenwood, A., Stoll, L., Bolam, R., Hawkey, K., Wallace, M. and Ingram, M. (2004) 'Effective professional learning communities'. Paper presented at the ICSEI conference, Rotterdam.
Meier, C. (2000) 'The influence of educational opportunities on assessment results in a multicultural South Africa'. Paper presented at the 26th IAEA conference, Jerusalem.
Mercer, N. (2000) Words and Minds. London: Routledge.
Mercer, N., Dawes, L., Wegerif, R. and Sams, C. (2004) 'Reasoning as a scientist: ways of helping children to use language to learn science'. British Educational Research Journal, 30 (3): 359-77.
Messick, S. (1980) 'Test validity and the ethics of assessment'. American Psychologist, 35 (11): 1012-27.
Messick, S. (1989) 'Validity', in R. L. Linn (ed.), Educational Measurement. 3rd edn. New York, NY: American Council on Education and Macmillan. pp. 13-103.
Miliband, D. (2004) Speech to the North of England Education Conference, Belfast, January. Available at: http://www.dfes.gov.uk/speeches.
Montgomery, M. (2004) 'Key features of assessment in Northern Ireland'. Paper presented at a seminar of the Assessment Systems for the Future project, Cambridge, March.
National Committee on Science Education Standards and Assessment (1995) National Science Education Standards. Washington, DC: National Academies Press.
Natriello, G. (1987) 'The impact of evaluation processes on students'. Educational Psychologist, 22: 155-75.
NCTM (1989) Curriculum and Evaluation Standards for School Mathematics. Reston, VA: National Council of Teachers of Mathematics.
NCTM (1991) Professional Standards for Teaching Mathematics. Reston, VA: National Council of Teachers of Mathematics.
NUT (2004) The NUT Approach to Assessment for England: Foundation stage and primary. London: National Union of Teachers.
OECD (2005) Formative Assessment: Improving learning in secondary classrooms. Paris: OECD.
OFSTED (2005) The Annual Report of Her Majesty's Chief Inspector of Schools for 2003/04. Office for Standards in Education. Available at: http://www.ofsted.gov.uk/publications/annual_report0304/annual_report.htm.
Osborn, M., McNess, E., Broadfoot, P., Pollard, A. and Triggs, P. (2000) What Teachers Do: Changing policy and practice in primary education. London: Continuum.
Palincsar, A. S. and Brown, A. L. (1984) 'Reciprocal teaching of comprehension-fostering and monitoring activities'. Cognition and Instruction. Hillsdale, NJ: Erlbaum.
Paris, S., Lawton, T., Turner, J. and Roth, J. (1991) 'A developmental perspective on standardised achievement testing'. Educational Researcher, 20: 12-20.
Paterson, L. (2003) Scottish Education in the Twentieth Century. Edinburgh: Edinburgh University Press.
Pedder, D., James, M. and MacBeath, J. (2005) 'How teachers value and practise professional learning'. Research Papers in Education, 20 (3): (forthcoming).
Pellegrino, J. W., Baxter, G. P. and Glaser, R. (1999) 'Addressing the "Two Disciplines" problem: linking theories of cognition with assessment and instructional practice'. Review of Research in Education, 24: 307-53.
Pellegrino, J. W., Chudowsky, N. and Glaser, R. (2001) Knowing What Students Know: The science and design of educational assessment. Washington, DC: National Academies Press.
Perrenoud, P. (1991) 'Towards a pragmatic approach to formative evaluation', in P. Weston (ed.), Assessment of Pupils' Achievement: Motivation and school success. Amsterdam: Swets and Zeitlinger. pp. 79-101.
Perrenoud, P. (1998) 'From formative evaluation to a controlled regulation of learning processes. Towards a wider conceptual field'. Assessment in Education, 5 (1): 85-102.
Perry, N. (1998) 'Young children's self-regulated learning and contexts that support it'. Journal of Educational Psychology, 90: 715-29.
Phillips, M. (1996) All Must Have Prizes. London: Little, Brown and Company.
Pollard, A. and James, M. (eds) (2005) Personalised Learning: A commentary by the Teaching and Learning Research Programme. Swindon: Economic and Social Research Council.
Pollard, A., Triggs, P., Broadfoot, P., McNess, E. and Osborn, M. (2000) What Pupils Say: Changing policy and practice in primary education. London: Continuum.
Popham, W. J. (1997) 'Consequential validity: right concern - wrong concept'. Educational Measurement: Issues and Practice, 16 (2): 9-13.
Pring, R. (1986) 'The developing 14-18 curriculum and changes in assessment', in T. Staden and P. Preece (eds), Issues in Assessment 23. Exeter: University of Exeter. pp. 12-21.
Project Zero (2005) History of Project Zero. Retrieved on 30.03.05 from http://www.pz.harvard.edu/History/History.htm.
QCA (2004) Assessment for Learning: Research into practice. London: Qualifications and Curriculum Authority (CD-ROM package).
QCA (2005) www.nc.uk.net (accessed 22.02.05).
QSA (2005) Queensland Studies Authority, at http://www.qsa.qld.edu.au.
Ramaprasad, A. (1983) 'On the definition of feedback'. Behavioral Science, 28: 4-13.
Raveaud, M. (2004) 'Assessment in French and English infant schools: assessing the work, the child or the culture?'. Assessment in Education, 11 (2): 193-211.
Reay, D. and Wiliam, D. (1999) '"I'll be a nothing": structure, agency and the construction of identity through assessment'. British Educational Research Journal, 25: 343-54.
Rees, G. (2005) 'Democratic devolution and education policy in Wales: the emergence of a national system?'. Contemporary Wales, 17: 28-43.
Reeves, J., McCall, J. and MacGilchrist, B. (2001) 'Change leadership: planning, conceptualization and perception', in J. MacBeath and P. Mortimore (eds), Improving School Effectiveness. Buckingham: Open University Press. pp. 122-37.
Reynolds, D. (2002) 'Developing differently: educational policy in England, Wales, Scotland and Northern Ireland', in J. Adams and P. Robinson (eds), Devolution in Practice: Public policy differences within the UK. London: Institute of Public Policy Research. pp. 93-103.
Rist, R. C. (2000) 'Influencing the policy process with qualitative research', in N. K. Denzin and Y. S. Lincoln (eds), Handbook of Qualitative Research. Thousand Oaks, CA: Sage. pp. 1001-17.
Robson, B. (2004) 'Built to fail: every child left behind, Minneapolis/St Paul'. City Pages, 25 (1214). Retrieved from http://citypages.com/databank/25/1214/article11955.asp on 31.03.05.
Roderick, M. and Engel, M. (2001) 'The grasshopper and the ant: motivational responses of low achieving pupils to high stakes testing'. Educational Evaluation and Policy Analysis, 23: 197-228.
Rogoff, B. (1990) Apprenticeship in Thinking: Cognitive development in social context. Oxford: Oxford University Press.
Rogosa, D. (1999) How Accurate Are the STAR National Percentile Rank Scores for Individual Students? An interpretive guide. CSE Technical Report 509a. Los Angeles, CA: CRESST. Published on website: http://www.cse.ucla.edu/products/reports_set.htm.
Rousseau, J-J. (1762/1961) Emile. London: Dent.
Rowe, M. B. (1974) 'Wait time and rewards as instructional variables, their influence on language, logic and fate control'. Journal of Research in Science Teaching, 11: 81-94.
Sacks, P. (1999) Standardized Minds: The high price of America's testing culture and what we can do to change it. Cambridge, MA: Perseus Books.
Sadler, D. R. (1987) 'Specifying and promulgating achievement standards'. Oxford Review of Education, 13: 191-209.
Sadler, D. R. (1989) 'Formative assessment and the design of instructional systems'. Instructional Science, 18: 119-44.
Sadler, D. R. (1998) 'Formative assessment: revisiting the territory'. Assessment in Education, 5: 77-84.
Salomon, G. (ed.) (1993) Distributed Cognitions: Psychological and educational considerations. Cambridge: Cambridge University Press.
Schon, D. (1983) The Reflective Practitioner. New York: Basic Books.
Schunk, D. H. (1996) 'Goal and self-evaluative influences during children's cognitive skill learning'. American Educational Research Journal, 33 (2): 359-82.
Scriven, M. (1967) 'The methodology of evaluation', in R. W. Tyler (ed.), Perspectives of Curriculum Evaluation. Chicago: Rand McNally. pp. 39-83.
Sebba, J. and Maxwell, G. (2005) 'Queensland, Australia: an outcomes-based curriculum', in Formative Assessment: Improving learning in secondary classrooms. Paris: OECD.
SEED (2004a) Assessment, Testing and Reporting 3-14: Our response. Edinburgh: Scottish Executive.
SEED (2004b) A Curriculum for Excellence: Ministerial response. Edinburgh: Scottish Executive.
Serafini, F. (2000) 'Three paradigms of assessment: measurement, procedure, and inquiry'. The Reading Teacher, 54 (4): 384-93.
Sfard, A. (1998) 'On two metaphors for learning and the dangers of choosing just one'. Educational Researcher, 27 (2): 4-13.
SHA (2002) Examinations and Assessment. Leicester: Secondary Heads' Association.
Shayer, M. (1999) 'Cognitive acceleration through science education II: its effects and scope'. International Journal of Science Education, 21 (5): 883-902.
Shayer, M. and Adey, P. (1993) 'Accelerating the development of formal thinking in middle and high-school students IV: 3 years after a 2-year intervention'. Journal of Research in Science Teaching, 30.
Shepard, L. A. (1997) 'The centrality of test use and consequences for test validity'. Educational Measurement: Issues and Practice, 16 (2): 5-8, 13.
Shulman, L. (1986) 'Those who understand: knowledge growth in teaching'. Educational Researcher, 15 (1): 4-14.
Shulman, L. (1987) 'Knowledge and teaching: foundations of the new reform'. Harvard Educational Review, 57 (1): 1-22.
Sliwka, A., Fushell, M., Gauthier, M. and Johnson, R. (2005) 'Canada: encouraging the use of summative data for formative purposes', in Formative Assessment: Improving learning in secondary classrooms. Paris: OECD.
Smith, E. and Gorard, S. (2005) '"They don't give us our marks": the role of formative feedback in student progress'. Assessment in Education, 12 (1): 21-38.
SOED (1991) Curriculum and Assessment in Scotland: Assessment 5-14. Edinburgh: HMSO.
Starch, D. and Elliott, E. C. (1912) 'Reliability of grading high school work in English'. School Review, 20: 442-57.
Starch, D. and Elliott, E. C. (1913) 'Reliability of grading high school work in mathematics'. School Review, 21: 254-9.
Standards for Success (2003) Mixed Messages: What state high school tests communicate about student readiness for college. Eugene, OR: Association of American Universities.
Stenhouse, L. (1975) An Introduction to Curriculum Research and Development. London: Heinemann Educational Books.
Stiggins, R. J. (2001) Student-Involved Classroom Assessment. 3rd edn. Upper Saddle River, NJ: Merrill Prentice Hall.
Stiggins, R. J. and Bridgeford, N. J. (1985) 'The ecology of classroom assessment'. Journal of Educational Measurement, 22 (4): 271-86.
Stiggins, R. J., Conklin, N. F. and Bridgeford, N. J. (1986) 'Classroom assessment: a key to effective education'. Educational Measurement: Issues and Practice, 5 (2): 5-17.
Stiggins, R. J., Frisbie, D. A. and Griswold, P. A. (1989) 'Inside high-school grading practices: building a research agenda'. Educational Measurement: Issues and Practice, 8 (2): 5-14.
Stigler, J. and Hiebert, J. (1999) The Teaching Gap: The best ideas from the world's teachers for improving education in the classroom. New York, NY: Free Press.
Stobart, G. and Gipps, C. (1990) Assessment: A teacher's guide to the issues. 1st edn. London: Hodder and Stoughton.
Stoll, L., Stobart, G., Martin, S., Freeman, S., Freedman, E., Sammons, P. and Smees, R. (2003) Preparing for Change: Evaluation of the implementation of the Key Stage 3 strategy pilot. London: DfES.
Sutton, R. (1995) Assessment for Learning. Manchester: Ruth Sutton Publications.
Swaffield, S. and Dudley, P. (2002) Assessment Literacy for Wise Decisions. London: Association of Teachers and Lecturers.
Swann, J. and Brown, S. (1997) 'The implementation of a National Curriculum and teachers' classroom thinking'. Research Papers in Education, 12: 91-114.
Tamir, P. (1990) 'Justifying the selection of answers in multiple choice items'. International Journal of Science Education, 12 (5): 563-73.
Taylor, T. (1995) 'Movers and shakers: high politics and the origins of the National Curriculum'. The Curriculum Journal, 6 (2): 161-84.
Terman, L. M. (1916) The Measurement of Intelligence. Boston, MA: Houghton Mifflin.
Terman, L. M. (1921) 'Intelligence tests in colleges and universities'. School and Society (April 28): 482.
Thatcher, M. (1993) The Downing Street Years. London: HarperCollins.
Thomas, G. and Egan, D. (2000) 'Policies on school inspection in Wales and England', in R. Daugherty, R. Phillips and G. Rees (eds), Education Policy-Making in Wales. Cardiff: University of Wales Press. pp. 149-70.
Thorndike, E. L. (1913) Educational Psychology. Volume I: The original nature of man. New York: Columbia University Teachers College.
Torrance, H. (1993) 'Formative assessment - some theoretical problems and empirical questions'. Cambridge Journal of Education, 23 (3): 333-43.
Torrance, H. and Pryor, J. (1998) Investigating Formative Assessment: Teaching, learning and assessment in the classroom. Buckingham: Open University Press.
Toulmin, S. (2001) Return to Reason. Cambridge, MA: Harvard University Press.
Towns, M. H. and Robinson, W. R. (1993) 'Student use of test-wiseness strategies in solving multiple-choice chemistry examinations'. Journal of Research in Science Teaching, 30 (7): 709-22.
Townshend, J., Moos, L. and Skov, P. (2005) 'Denmark: building on a tradition of democracy and dialogue in schools', in Formative Assessment: Improving learning in secondary classrooms. Paris: OECD.
Travers, R. M. W. (1983) How Research Has Changed American Schools: A history from 1840 to the present. Kalamazoo, MI: Mythos Press.
Tunstall, P. and Gipps, C. (1996) 'Teacher feedback to young children in formative assessment: a typology'. British Educational Research Journal, 22: 389-404.
Tymms, P. (2004) 'Are standards rising in English primary schools?'. British Educational Research Journal, 30 (4): 477-94.
Varon, E. J. (1936) 'Alfred Binet's concept of intelligence'. Psychological Review, 43: 32-49.
Vispoel, W. P. and Austin, J. R. (1995) 'Success and failure in junior high school: a critical incident approach to understanding students' attributional beliefs'. American Educational Research Journal, 32 (2): 377-412.
Vulliamy, G. (2004) 'The impact of globalisation on qualitative research in comparative and international education'. Compare, 34: 261-84.
Vulliamy, G., Lewin, K. and Stephens, D. (1990) Doing Educational Research in Developing Countries: Qualitative strategies. London: Falmer.
Vygotsky, L. S. (1978) Mind in Society: The development of higher psychological processes. Cambridge, MA: Harvard University Press.
Vygotsky, L. S. (1986) Thought and Language. Cambridge, MA: Harvard University Press.
Vygotsky, L. S. (1998 [1933/4]) 'The problem of age', in R. W. Rieber (ed.), The Collected Works of L. S. Vygotsky: Vol. 5. Child Psychology (trans. by M. Hall). New York: Plenum Press. pp. 187-205.
Watkins, C., Carnell, E., Lodge, C., Wagner, P. and Whalley, C. (2000) Learning about Learning. London: Routledge.
Watkins, C., Carnell, E., Lodge, C., Wagner, P. and Whalley, C. (2001) NSIN Research Matters No. 13: Learning about learning enhances performance. London: Institute of Education.
Watkins, D. (2000) 'Learning and teaching: a cross-cultural perspective'. School Leadership and Management, 20 (2): 161-73.
Webb, N. L. (1999) Alignment of Science and Mathematics Standards and Assessments in Four States. Washington, DC: Council of Chief State School Officers.
Weeden, P., Winter, J. and Broadfoot, P. (2002) Assessment: What's in it for schools? London: RoutledgeFalmer.
Weiner, B. (1979) 'A theory of motivation for some classroom experiences'. Journal of Educational Psychology, 71: 3-25.
Wenger, E. (1998) Communities of Practice: Learning, meaning and identity. Cambridge: Cambridge University Press.
White, B. Y. and Frederiksen, J. R. (1998) 'Inquiry, modelling and metacognition: making science accessible to all students'. Cognition and Instruction, 16 (1): 3-118.
White, E. E. (1888) 'Examinations and promotions'. Education, 8: 519-22.
White, J. (2004) Unpublished report on the CCEA 'Pathways' proposals. London: University of London Institute of Education.
Whitty, G. (2002) Making Sense of Education Policy. London: Paul Chapman Publishing.
Wiliam, D. (1992) 'Some technical issues in assessment: a user's guide'. British Journal for Curriculum and Assessment, 2 (3): 11-20.
Wiliam, D. (2000) 'Recent developments in educational assessment in England: the integration of formative and summative functions of assessment'. Paper presented at SweMaS, Umeå, Sweden, May.
Wiliam, D. (2001) 'Reliability, validity and all that jazz'. Education 3-13, 29 (3): 17-21.
Wiliam, D. (2003) 'The impact of educational research on mathematics education', in A. Bishop, M. A. Clements, C. Keitel, J. Kilpatrick and F. K. S. Leung (eds), Second International Handbook of Mathematics Education. Dordrecht, Netherlands: Kluwer Academic Publishers. pp. 469-88.
Wiliam, D. and Black, P. (1996) 'Meanings and consequences: a basis for distinguishing formative and summative functions of assessment'. British Educational Research Journal, 22 (5): 537-48.
Wiliam, D., Lee, C., Harrison, C. and Black, P. (2004) 'Teachers developing assessment for learning: impact on student achievement'. Assessment in Education, 11: 49-65.
Wilson, M. (1990) 'Measurement of developmental levels', in T. Husen and T. N. Postlethwaite (eds), International Encyclopedia of Education: Research and studies (Supplementary volume). Oxford: Pergamon Press.
Wilson, M. and Sloane, K. (2000) 'From principles to practice: an embedded assessment system'. Applied Measurement in Education (forthcoming).
Wilson, M., Kennedy, C. and Draney, K. (2004) GradeMap (Version 4.0) [computer program]. Berkeley, CA: University of California, BEAR Center.
Wilson, S. M. and Berne, J. (1999) 'Teacher learning and the acquisition of professional knowledge: an examination of research on contemporary professional development', in A. Iran-Nejad and P. D. Pearson (eds), Review of Research in Education. Washington, DC: American Educational Research Association. pp. 173-209.
Wood, D. (1998) How Children Think and Learn: The social contexts of cognitive development. 2nd edn. Oxford: Blackwell.
Wood, D., Bruner, J. S. and Ross, G. (1976) 'The role of tutoring in problem solving'. Journal of Child Psychology and Psychiatry and Allied Disciplines, 17: 89-100.
Yarroch, W. L. (1991) 'The implications of content versus item validity on science tests'. Journal of Research in Science Teaching, 28 (7): 619-29.
Zenderland, L. (2000) Measuring Minds: Henry Herbert Goddard and the origins of American intelligence testing. Cambridge: Cambridge University Press.
Zimmerman, B. J. and Schunk, D. H. (eds) (1989) Self-Regulated Learning and Academic Achievement: Theory, research, and practice. New York: Springer.
U!!.. 136, m 142.lU.
I..H..!.M.. 157, 166,
I.l!2. ll!Z. m 198
"""" C ""
Bourke. S,f, 6Z
eo-. R. 151, 152
Brolluford. ).0. &L 6Z
Bredo, E.,5,1 56. 'll
Bremoo. L fa
Bridgeford. N,J. lZD.
Brign'lIn. c.c. 174
Bro.odoot. P. g lJi. l.t2..
l>l. I.IO. IBZ
Brookh.art,. S. Z1. Z2. 12
Bl'OOlTIt', E.C. 111
G. 20
Brown. A.L M. 'll
Brown. S. rI. l.59.
Bm........, J. 'll
Bry<oe, T. l.59.
Butler. D.L!l2
Butler, R,1S..6S.. 'R,142
D. J.S,1
J.!l2
Carless, D. ill
Carter, c.R.1lll
CCEA 164
Cllaiklin, S. 'lIl
Olklo:ering. A, W, l.ZD:
Olin.. R, 22
Choppin. B. 127
OlUdowsky, N. 5J..
Clarke, S. 142, 144, 151
Oronan. G. 6Z
Cocking. R.R, 52. &L 6Z
Cohen. AS m.. 136
lZ5.
Collins. A. 135
Conklin. N.F.l.ZD:
eon......... C.1S6
Corbett. I:LD. lZ'l
Cowan. P. rn. 123
B. 'R.. ill
Crooks, T,J. l.ll.. 6J. Z6. rn...
Ul! 140, 157, 170,
'" CT0551ey, M, l.BZ
Cumming.). I.OZ
R. lSIl
Darwin. A,
Daugherty, R. ll!.. 155, 1.fl2"J
Davey; C. ZIl.. Z1 Z6. 18
David5(ln, J.163
Da\o'ies, I. 69:
Deakin Crick. R. fiL M.. l.'lS
Di. E.L. 1I9
DeNi5l A. U. 141. 142, 14J,
144, ill
DES/WO.lQ,!?. 1().l, 153, J.5,I
DeVog.., J. Z12b. <'6.11
DfEE 1Sl
Donerty, V, W, lZD.
Dorans, N.J.1Zti
DorrBremm.., D.W. lZD.
Draney, K. 110
Duckwortn. K. Z2. 71. Z8
Dudley, P.:H. 164
Dunlo61<y, J. !l2
Ow!<, C.s. & Oi,
'R. 142. 1M
Earl LM!
Ecdel;tone, K. M. 140, ill
Edwards, A. S:9'
0.162
Elfed..Qwens, P. 162
Elliot. E.C.1Zl
Elliott. .S. ZS
Elli<;l!t. J. 138
M, 71. Z8
Eng.. R. z:z. Z!I'
Enge$tJVm. Y. !!,1 !l6
N. f.8
E. z:z. Z!I'
Fielding. G. "lL ZlI.
Fil:'lding. M. l8Z
Fill:'1, A. sa
Finlay, L l5'l. 162
Flyvbj<ors. B. II
Foes. P. W. 1.6
Forsll:'r, M. 110, l.!!l1
Frederiksen. J.R. 106,
135
FrisbII:'. D.A. 110
Fullan, M. "lOt
Calton, F. 112
Gardner. H. 118
Gardner, J. 123
Gauld, Cf. 127
Gibson, J.I.II2
Gipps, C 142, 143. 155, 156.
Glaser, R. i!J.. .22-
"'-'"
Glas.sman, M. 56
Glover. I>. I.li.
Glynn, T. 1.88
Coddard.l:i.H.123
Cold. A.1.a ill
Corard,S. ill
Cordon. S. Z2. Z"!. ill
Graesser, A.C. 22
Gn.-eno, I.G. 'll
Griswold, P.A. 1ZO
Grossman, P:L as
cre(E) 1601
Hacker, 0.1. '12
HaiL C. Tl
HalL K. l5iI
Hallam, S. 2Z.. ill
Har.tCkil:'wia. I.M. & M..lJZ
Harding, A. T1
Hargreavt'5. A.1Sll
HargreaV1:'5. D. 20.
Hargreaves, E. 142,. W
Harfen, W. 012. & l.O5..
106.l.llZ, l.ll2. l1Q. 113.
127, 157, JjS. I.9ll
Harris, S. 1J9.
Hayward. L m ill
Hmdel'$On, V.L. M
Hennan, J.L 110
Hid;' S. & 6i.l3Z
Hiebert,. I. .l'.!
HilL B. 203, 204
HMII59
Hodg..... J. ill 52. 85
Holland. D.!l2 83
Hubin, D.R. 174, ll!1182
Hufton.. N. Llll.
Humes. W.lS'l
HUlrnil\llOf\, Cl.6ll
Author index
baac, J. M
Jamel. M. J!).. ,ll .u. tl. 5J...
Zb.l.l!'L l.56. 166
lessup. G. l.Sll.
lohnston, c. 2ll
Iohn!;ton. J. ZQ, Z6
Jones,. L.V. 1811
Ka.ne. M.T. m 1.36
R.A. lil
K<!llaghan. T. & & !.TI!. l1'1
Kenrll:'dy, C lliI
King. A.1.6
Kluger, A.N.!1.. 141, 142,
143,1#, lAS:
KOl!$lner, R. 6)
Kohn.. A. & 142, ill
Koretz, D.M. 118
Kmsberg. 5. !Ill
KUlnick, P.191
Lawlor, 5. l55
Leadbetter, C. lfi
r-, CllZ
E.L.Z5
u.m.ann. N. l16.
lronard, M. Z!!. Z1 & ZlI.
u.vilW!, D.O. I1.L 174
Lewin, K.186-7, 1.88
UndquiSl. E.f.l16.
linn, R.L illl.36
Lodg... C. ill
MacBeath, I. 3.L 31n, J2
McCalL J. l'l
McCallum. B. 142, W
McC1u..... W. Z!.!.. Z6
MacCilchrist B. l.'.l
Mcllll:'fTle)'. D.M. 6S.
McKo'Own.. P. L6.1
McMahon, A. ill
Madaus. G.f. 6J.. 6S. J.Zll. l1'5!
Marshall, B. l.ll,. 52. 85
Masters, G. W!. l!IO
J. 118, l19.
Muwt'll, C.5. l.ll6::Z.llS..
1M. 192, l.'.l1
Ml'iI:'r, C. 1JZ
M"l'CI:'r, N. 9Il
Messick, 5. m l16.
Miliband, D. 165.
Moller Boller, J. 1.0
Mootgomery, M. 1601
MOOII, l. 139, \92, 19J
Mora, J.I. 1.6.
Mortimore, P:31n
Murphy. M.m
N"triello, G. 10
NCTM122
Neuman. O. '12
Northfil:'ld, I.R. 22
NUT ".
OfSTED"
Olkin, L 1.80.
Orr, L 127
Osborn, M. 155, 166
Palirlc:Nr, A.5. 'll
Pari!!, S. Z':I
Patl'rson, L. lS'l
Pe"1'!OI\, P.O. 53
Pt'<lder, D.;n. 32
l'ellegrino, J. W. !IS
l'ellegrino. P. SJ., 56. 5B.
59,39
Pel'Tet\Olld, P. w... l!.L !!Z
lIlL"''''
Perry, N. Z8
I't.illips, M. ill
PiI:'1t'e.. D.P. 22
Pollard, A. ft!.. Z2.. & "lL
""'66
Popham. W.J. l.36.
Prietley, M. 159, ill
l'ring. R. 1'.l'l
ProjKt Zero 178
Pryor, I. &. 2!1 15S..
'66
QCA 140, 198
Q5A '"
Quinn. J. 2l}4
RDCl.<!k, A. 65
Ramaprasad, A. U
Rave3ud, M. m 1J8.
Ruy, D.@.&a 142,l.55
RPe$, G. L6.1
Reege. M. Z6. Z!!.1l5
Reeves. J. l.'.l
R."...,.o:b, D. 1511
Rist, R.C. 202
RobiMon,. W.R. 121
Robson. B. 1.80.
Rod...nck, M."!L ZlI.
Rogoff, B. 56. !Ill
Rogoeoa, D. 122
ROIl&, G.2lJ:
RouSllc.llu. I-J lIJ
RoW\', M.B. 14.
Ruddutk, J. l.3'l
Ryan, R.M. 8'.l
Sadler, D.R. 102, 145, 157
Salomon, G. Eo. IIJ.
s.m..on.... c. M
Schoenffold, A..H. 53
Schil<\ D. ti
Schunk, D.U. Z1. Zft.
"""
ScriVl'1\, M.l
Sl!bba.], l1\f!. 192, ill
SEED l.6b2
Serafini, F.
Sfard, A. 2'.1
SHA 164
Shaughnessy, J. Z1.. 'l1.1ll
Sharer, M. '!!.. ':!lI
Shevard, LA. 1J6
Shulman, L,l& 1l6.
Skov, P. 189. 192, ill
Sliwka, A. l'lll
Sloane, K. 2Z
Smith. E. ill
Smith, J_l. >I
Standanb for Success 1l!.l
Starch, DJ ill
L.:ill
Stephens, D. 1116-7, l8lI.
StiAAiru;, RI_ lZll
Stigler, j. J2
Stobart, G. l.56
Assessment and learning
$lodolsl<y. 55,!l5
Stol1, L rn
SUlt..... R. 38
Swaffield, S. 164
Swann,. j. J.52
Tam,r, P. 127
Tang. C. , 51!
Taylor. T. l5l
Terman. LM, m 174
Thatcher, M, l2i
Thomas, G. 162
Thomas. R. ill
Thompson, D.E. 6.1
Thorndike, E.L 142
TkaCz. S. lli
Torrance. H. R.!it \l!1
l26. 166
Toulmin, S. 2.1
Towns, MJ:L 127
Townshend, J. 189, 192,.l2.1
Travers. R,M.W.W
Tunstall, P. ill
Tymms, P. l.J6
Varon, E.J, ill
Vispoel W.P. 12. 22.
Vul1iam)'. G. lM::Z l88
Vygotsky. L5, g:!lJ:
Wallace, G. 13'l
Watkins. C. 'll. 1..
Walkins, D.R1J2
Webb, N,L 1.Z9.
Wl.'eden, P. Hi
W...mer, B. 66.
Wenger, E. 5fl..ll2. '.l.3
White, B. Y. l..2. !!l. l.ll6.
While, E.E. ill
While, J. 1M
l'IIhitty, G. l5S.
Wiliam. 0.11 L1ll. H. 'll.
&'!:Q.f!B.f!2.Z6.l2.'lZ.
114--15,120,123.127, 1:l(l,
136,142, HJ. J..i6. 155,
157. 166, In. 187.
m
Wilson. B.L 1ZZ
WilliQn, M. 97. lll!.
Wilson, S.M. 20
Winne, Cl:L on
Win\cr, I. ill
Wood, D. 'Kt. 2bl.
Yarroch, W.L.127
Young, M. 159, ill
ZenJerland, L.ID
Zimml'rman, B.).
Subject index
Added to the page reference, 'f' denotes a figure.
activity systems, subject classrooms as !!Hz
2H
activity theory S9.
American College Test (ACT) 176
ARG (Assessment Reform Group) 157, 197
'Assessment is for Learning' (AifL) programme 160-62
assessment for learning 2-3, 197-201
'compelling conceptualization' 204
concept 202-4
as a c)"cle of eV\'rlts J.l:Y,,5
definition 2
distinctions belwee" summalive
assessment and l. J..(),l"ji
educational and rontexlual ;HUes 1!l9 2m
and inquiry-base<! learning by le..chers
bZ
OECD study see OECD study
principles J. 2LI. Z!!. 1lJ8
and professional learning see professional development/learning
and student grooping strategies 191-2
UK policy see UK policy
US policy see US policy
see also formative assessment
for learning in the d3$room
""
ful"re issues 2bS
KMOFAP see KMOFAP (King's-Medway-Oxfordshire Formative Assessment Project)
research review 9-12
assessment of learning
distinctions between assessment for learning and 103-6
see also summative assessment
Assessment Reform Group (ARG) 157, 197
attribution theory
and locus of control 66
Australia
impact and of policy
_ ulso QueeMland
BEAR (Berkeley Evaluation and ....
Research) pro;oo u.o
rontr.u;l with KMOF.... I'(King..Medway
Oxfordshin1 Formati..... Assessment
I'rojec1) '1Z=8
paltems of influ.ence '!5:
bdlaviourist theories Si. 62
unada
impact and of polley JjH
professional development/le,ming 19.1
self- and peer 'sseslimcnl l!llI.
cla$Skalle:lt !IS
classroom di,logue 1.4
framing and guiding 89
sludies2Y
classroom di5C'O\.lfW 11
clusroom practice 2-3. 11199-200
application of researdll1l::2L l'B
linb with dewk>pmenlll
examples 5lb2
,nd KMOfAI' (King's-Medway-
Oxfordshi", Formati..... Assessment
classroom roles
changes l.ld.ll
implications of assessment for le,ming
"='"
"" also sl\.Idenl's role; leacher's role
c1assruornli
assessmenl for learning In _ assessmenl
for learning in the classroom
a. communit;"" of p.mce lI.2:oJ
as figured worldsllb3
_ also subject cIass:rooms
cogniti..... lIoCCeleralion initi,\;w 28
cogni\lvist theories as
_ also COf\5lruclivisllht.'Ories
comment marking a 142-3
in QlIl'Cn5land 189
Gardner's theory of multiple intelligen!l53
goal orientation M::6. '.l'b3.
experimental manipulation ZbS
goals al different levels lQ9...-11 112f
grades
negative effects 11.. 22-
students' underilanding of Z2.,,3.
5.00 tests
Exallnra in SdIoolJ LS!I
explidlleaming l.Jlb!!
dilemll'la5
external 'l. .&&'l
"ffect of f8.
fonnative use 1.llB
s also nationAl curriculum; n.. lional
testing and assessment; ponfolio
llelection tests
exterT\lll control 66.
exlerNll rewards 63.
exlrinsk motivation
distinction between intrinsic and 62=3
negatiw impact 63. Z6
reedback J.lli"S
and C\Iltural diff..>.....-.as ill
experimental manipulation of ZJ,,5
impact'1it:::Z
in the OECD study J..8B",9O
researdllileTature 1L 12
and student-leacher interaction 8Z=!ll
through mar\cing Md5
validity and 141...fl
s I/so or.>l fle.edback; J"!'.'r ..-menl;
self'il5Se5$fIlent
figunod worlds, classrooms as 82:oJ.
fonn,ati"" assessment 2,,28
as a cydo:> of e-=b J..(I,b5
innovation in 100
learning contexl 5 learning context
relationship with summatlw ..-ment
_ relationship between fonnative and
summltiw a5lleSliment
reliability 129-JQ, I.!Cb6
""5'earm review 'bI1
andstudmts
involwmenl in 11
reactions 10 !l2:ol
lension between summative demands and
IB
theory 5 theory of formative assessment
all I Trojan Hol'R I!ll.. 99d.OO
validity 133=46
thr@at:'l to Li6
Ii IJJso _menl for learning
formative as!Ie55ment infonnation. use for
!ummati"", assessment IQ9...-J3. liZ
four componmt model of formati""
assessment 4. M:::2:i.!i'2. 201
fonmtive 1-.nefII 1JZ
national roniculum
__\ _
commltil!s of practices, as
"'"
...lidity 136
'ronstrJft 1rn'levanl variance' J..35,,6
'consttlifl undef"repraelltltion' 1.Y.
oonstrufivisl: lMorieI !L 6Zdl.
J.jQ
infIUC'n<\l! of U6. m !..J!!...
curricuta
between school and university
""""",
192-
relevance of policy l.95
natu and role oHeedback 189--90
development/leaming 00
studera IlS ag.....1'1 of dlang<! lit
of .-ments lli.l3ll
develo tal IndicalOn l..ll:b1.1
assessment 1ll4.
'dldactlc.Jltuarions'. creation of s:z
differenlittion ilL !llhl
di!IIClosurf 1.46.
of a question' 127
,_
policy III 166--7
S QIsolia5llfOOm disrol.IIX
'dislribu.;c. cognltlon'.5Z
dynamic _mentlll::J.5
....
pol.ltiG' l5Il
Ellgl.nd W::6. 166--7, 19'1=8
and dis<.vur-.
Labour tducatkln polldeflIS7-$, 164-6
nationafcurriculum 5
naoMal curriculum
""""",:!observation 58
evaluation
ofte 192-3
evaluatiw J..GI
evalWlliw 142
ct.....ginl bltsis of of
III-Ii
distincti'l" between evidence and
in:tt:tion of I.D'.l.
and formative
_1IlSO kft:natiVl' lIS5e5m'IeOt information;
evidelce
examinatio&s
formati purpose J.ll!b'j
as !twats to valid fomuotive iWeSSment
142-3
use in US schools 1m, l.82.
lTUlrking
group work 200
between a5Sol'SSment for
learning and 191-2
guided participation 911
individual interest 64.
inqulry-ba5l'd learning !lll::1
IlI5idt 1M B/;rd Boz 21 I.a. m
intelligena:
cffl on goal orientation of views
0'15
SIt 1I1t;</ multiple
intelligence testing If!!.. lZ2::Z
Scholastic Aptitude Test (SAT) 17.....7.
""'"
use in university admissions J.1.3"i
interactive formative assessment ill
'interaeli"" ""Sulation'lIZ
interest 6t
intemal ('OfItrol 66
intrinsic motivation
distinction bo!t_..."lrins;c and 62.=J.
promotion of 6.l::i
judgement of teachers llZ.l.46
Key Stage 1 initiative 22. I.li.2. l28.
reliability IUt
key stage tests, reliability 121125-6
KMOFAP (King's-Medway-Oxfordshire
Formative Assessment Project)
XL 8.L !H. 166
change in teacher practiC'e
contrast with the BAR (Berkeley
Evaluation and Research)
project 2Z::d
and impact 2b.'l
outcomes l.6d1I
patterns of influenC'e 2S::6
practices that developed Hd..6
research and practiC'e l.Hl
setting up l2.=l!
leadership, role of 1.'!'6L
learning
assumptions aboutll
motivation for sa motivation for learning
regulation of!it't r<>gUlalion of learning
thl'Oli<os of SIt theories of learning
sa also assessment for learnin.ll;
assessment of learning.: expHcit
learning.: inquiry-based learning.: meta-
learning.: penonaHzed learning.:
profes.olional development/leaming.:
self-""&,,lat..d learning; teacher learning
learning ronte>ct1.J6"t1
insid... th... classroom l3lb41
eJ<plicilleaming In explicit learning
formalivefsummative relatiomhip
Hf),j
trust and motivation l.1B
outside !he classroom ill
sa al5tl cultural contexts
learning environmentlZ::l8
'learning goals' 6S
Learning Huw to learn (l2l1 projecl:L
2J::i.?:L.:K!. 166
responses to staff questionnaire 1lb3
dimensions of classroom a55es5fTlmt
and teacher learnill& J.l::4.
relationships between lISIIe5Sment
practices and teacher learning
practices :Y=Z
learning organiuliOf\ll, developing schools
'learning orientation', distinction betwe.m
'perlurmanceorier\talion' and 92
learning tasks 11
lesson study.19
local asso.'SSment policies Zlb!Ifl
locus of control 66
lower ..chieving students il
impact of summative aS8l'SSment and
lests Z8
management of schooU s school
management
marker error ill
marking
h;oedback through Lfd.5
negative dfeC'ts 92
as a threat 10 valid fonnati\'e :ueessment
142-3
!it't comment marking.: grades; peer
marking
'mastery goals' 6S
mastery learning programmes 12
develOpment of !1.. 92
meta-learning 92
motivation for leaming 61-110, 200
components M::Z
concept 62,,;1
impaClll of assessment 6Z=Z.S
negali''I'1b=l
researdl fiIl::Z.S
ImporlanC'e 61-2. 25
and the role and statUS of e<luClltion
within a society ill
using assessment to promote Z5dlfl
mullipre intelligences, Gardner's 53
multiple-choice leSt!' 121
use in the US l.8l
national curriculum
introduction l.S.lo6
1eveI de:scriptiom J..l9,,4Q
5. Key Stage J initiative; key Stl>ge
'6'
national testing and.' ...mt 2B=8ll
No Child Left Behind (NCLB) Act (2001) 180-81
North America
classroom assessment Zb3
5 fIlSQ US policy
Nllf'lhem Ireland
policy environment 1.6.Y
"",Iection examinations Z!!::l..l.22.=3
OECD study J.8.S",96
definition of formative assessmmt188.
focusing on differences at the expeMe of
similariticsl86.
imp"d and ....lev;once of policy J.2,Y,
influence of cultural coolel<ts 1..ll6=Z
methodologicallimitl>tlons 1.8ZdI
nature and role of feedbKk I.8lb'lfl
problems of trafl!iferability 1BZ
profe$$ionaJ developm<!rlt/leaming 192..,3
khool impwvement a,mtextual factors
,.".
'''''If- and peer J.'lfb.I
student grouping sWllegics and
lIS6C55mmt for learning 191-2
oroll_ment 189-9ll
0I1l1 r..edbad: l!.. 189, 00
oral qul'l'itionir>g lS. 190
peer Z n:i.. 200
and cultural differences 132
in the OECD study J.'lfb.I
role In formative Il$6t'$6ment llL as.
pe.!r ....."'ing l1i
'performance goals' 65
pt':rformanc:e motivation 1.
'performance orientation', distinction
between 'leaming orientation' and!12
performativity l56, L2B... 166
per.;ooal interest 6!1.
per.;ooalized Ieaming H'L l.6.2. 166
plannrd formative assessment ill
policy H. 200
analysing lSlbl
impad and ",levance J.9,b6
influencing 202
5 IlISQ UK policy; US policy
policy '" 152::.l 166-7
policy as te,,1 151=2
'poillial of education' 150
portfolio ..-menl i. 58.
in Canada 00
in the US 178-9
5"/SQ o.-rwand portfolio _ment
ponfolio-ba-t YOClltionlll qualificatioM 1.4ll
po_r, i!5ues of !lIJ
practice 5 dassroom practin>
praise J..U,4
predlctioo\ use of tests for 127...,
profeMional d"""lopmmt/lO'amlng
27-42, 199-.200
links with the application of l'l'lIearm to
practice 2J
in theOECD study 192-3
questionnaire and findings 3lI:::9.
indkalO" for lIo-lI,m
progressive -.nentlO6::Z ill
llft-17, 201
and .....lidity J.Ji
""""'''"''
group work 192
Impact and ....levance of policy
nature and role of feedback 188-9
professional deve:lopmmtlJeaming 1.9.1
role of leadefship I.9.Y
self and peer -.nent l.flOd
students IS agents of ch.ange J.9,i
Queoensland portfolio I.l!Il...
ill. ill. ""
question sampling 12lb1
in c\asIlroom$ _ tlaM-room
dialogue
regulation of learning fl9'
teadlt'f'. role and 86=Z
....lationship between fonnatiYf and
summative 103..JZ WI
didlotomy Qr dimension 113-15
distinctions i. .lDJ::6... U5=l6
efft on validity u.o
fonnalive use of IUmmaliYf
,.."
using fonnative ilIl!leSIlment information
lor iUmmalive as5eIlII"JWI1t lQ9.-JJ. 111
reliability of..-nmts i. I INI, 201
deci5ion romlstency l2.b3O
and the owrlap between validity and
reliability 127-9
and the fonnalive
-,
evidence about l.22=l. J.3lb.1
threat512lb1
_....
applkation in'" pradke J..!!:,1L l.'B
. finq with prof".iona\ 2J
_11I50 lolacher research
re9l'ardllessons J9
reJOllf'l:eI, effec:t on formatiw ..-ment
ill
rewards
negative impact UI
sa also l'lCternal rewards
roil'S sa classroom roll.."
schooling l38
20
Scholastic Aptitude Test (SAT) 174-7, 181-2
school '!L ZIl
relalionship bel.ween pe=ptions of
teacher leamillS practic._", alld
classroom aiISCSSrm.'r11 practices lldI
Srotl.nd 2l. ZIl. 159-62. 166, lbZ
5eledioll, u:;e of lesl ......ults for 12&09:
:;electiolllest"
an.lysis of ",liability U2.::3
impact on for le.millg i'D=l
5elf..o;.:;essment 12 :lli.. 2J
effeet of e.rem.1 assessment 69:
goal and 14.
in the OECD study l..'.llhl
role in formalive asscssm""Il!.L LIS
scll...,fficacy 66=1
dfeCI 0/ cbssroom assessment 1.1.=2
5e1f-<$Il",m 66
impact of national 1'-'515 in England alld
Wak", ff!.
sclf.regulal.'<l le.ming fil. 'l2
effect on lor learning Z1
silual,-d Ih''Ori"" 5fd!
situ.tional inten",l 6:l
conslructivism' S'l
SO<ial,,'lationship gOilI" 'l2.::cJ
wcilKUltural tht.-ori,,,, 56-8, 5':1
inter..."i"n
feedback and IiHl
impru,oing lhrough fonnalin' assessm,."l
100
slud,."ts
as agents of change 1.2:1
a",umptions about leamin)\ 2J
f... tu illS
and formative .....'SSment
im'olvemenl in lJ
noactions to!U.::3
involvement in doosiuns about Z!I
perCl'plions of teacher personal illle"'St 21
understanding of Z2:...l
s alS<J lower achieving stodents; subjl..:t-
interaction
stodenr. role
chanS''''8Z
in learning 2.ld
subjc<:1 classrooms, activity syslems!!H
9H
subjl",t discipline, relalionship I>o."""",n Ih,'
'e""her's role and
subj..",t k""",k"'g", importan,"" uf leach..,s'
""" LN
subjol",t-studenl inleraction, importan"" of
chang., in t!6
summative as,,-'SSme"',t
impact on lower achil'Ving students ZIl
nolationship with formative assessment
sa ndalionship betwem formative and
"""sment
l..1b5
Ihreats 10 & US::::!l
sa of teachers'
summative assessm''fll formali"e
"""
summali"" 1'-'5'"
formalive Us<' 15-16, l.!!. l.J:lZ"li
impact On lower achil'Ving students ZIl
",Iiabilily ll'.b2'l
'Support for in
Furmalive AS!lCSSrm.'flt' 22=J
'Talk Les.suns':!B
Task Group on A......sment Tcstin,o;
(TeAl) report lQi, 153-5, lbZ
leacher d.. profes.sional
de"elop"",nl{1eaming
t..acher W. 22=JO.
Cl'ntralily J2=ll1
relalionship betw,,,,n <loISsroom
oISses,menl prJcti.,.. and
teach...
t.'aclwrs
assumptions Jbout leaming 2J
to l.ll:l::5
importanU' uf subjl"C1 knowll-dge ll5-{"
LN
inquiry-bJs<>d lI!aming br ll!=2
ludg,>m''''1 117, l:l6
I,'vels of support from Zl
assessn"",t by l.l.!:>ol.
StY ul", studenltncher interaction
rul"
and Ihe r"l\ulation of ll6=Z
relatiunship betw"",n subjo.."CI disciplill<'
Jnd 8S=6
t,-,ching
192-3
"'-"Iu"nce of Slag'''; 8ll
teJching 10 the t,,,,1 & 77-8, IO!\, l!!Z
't.,;1 analrsis' l1lil
t.... 1qu,,,,lion5. preparation 1&
't,,,,ling when ready' ZIl
t.'SIS
J.1lIb9.
impact on lower achieving Slud..nlS ZIl
sl"d""ls ill d,'Ci,ions Zl!
use fur pn'<lictic'" 127-8
use for :;election
5 01", t,,,,ls; summati"I!
Ic.;ls; 1<> 111,- tt'St
lexl, poiicy l.5.b2
TC/ T O"',k Group on Asso.o;smmt and
Tc"ing) ,..,port lJ.!:l.l..S.b:!. 1St>, lbZ
theories of leamlng H .iZ::6O
alignment bt'tweet\ aS5eSlill'letlt and
e>,ampres of classroom assessment
practices Sl!::2
beIla""""rist 6Z
cognitive, constructivist 1L :!'11i7-S,
"
possibilities for edectidsm Of synthesis
"""
5OCio-mltural, situated and aaivity
IN.1so activity theory
theory of fol'TT\ative aS$$Sment ill_1m, 201
application of activity theory <y"z
four component model!. 2'1
'"
model for cla5.'lroom transactions
I!k2
",lated """"arm and development studift
"'"
strategies for development 'l2=8.
thinking skills programm<'5 'lll
To...",rd. Dialog'c '.1.8
'traffic lighl!!' 15
training. efkct on formative aS6eSSment ill
tratli;ferabiUty, problenu of lllZ
Trojan Horw. formative as 28..
<"MOO
tru.tlJ8
UJ( policy 149-67,202
in England _ England
in Northern Ireland I..ft.Y
in Srot1and 22 Z!l. 159-62, 166,16Z
in Wall'S 166, lfiZ
US poIi')' 169--83 202
assessment for lKOOUntabilHy l77-11I,
l!b1
No Child Left Behind (NCLB) Act (2001) 180-81
portfo1iOOli 178-9
standards l.Z2::,/!O
assesliment in schools 1ZO
studies of z:b1. ZH
and intelligtmee testing _ inh'lligence
SchoIasllc Aptitude Test (SAT) 174-7
written exmlinations 1.Zlb2
validity .u=!I
of formative D)...;Mi,2fi1-2
and feedback 1414>
and the learning rontext IN learning
conte>.1
overlap between rrliability and 127-9
of summaliV1! .s_m<'Tlt J...l:bS.
ttm.ats to '& ll50di
waitlilN' """"arm. LI.
lmpa.ct of nation;ol teMing and
""""
policy environment 162--.). 166, 16Z
Working BillCk Box 22
Zone of Proximal Development (ZPD)
O<hl